# Twitter Text Report

## EMAT 22110 - Data in Emerging Media and Technology

## Author: Liam Merritt

## Created: 10/19/2021

### Starting the Process

In [134]:
import pandas as pd
import json
import requests
import urllib.parse

In [135]:
ls

'likes' Report - Liam Merritt.ipynb
BLS Data Call.ipynb
Data in EMAT 9-21-21 Notes.ipynb
HipHopAPP_keys.txt
Notes 10-19-21 - Data Scraping.ipynb
Notes 9-23-2021 & 9-28-2021 & 9-30-2021 - Data In Emat.ipynb
Notes 9-30-2021 Data In EMAT.ipynb
Pandas Notes 9-14-21.ipynb
SQL Notes - 10-7-21.ipynb
Twitter API Notes.ipynb
Twitter Text Report - Liam Merritt.ipynb
Twitter_API_key.txt
bls_data.csv
chinook.db
chinook.zip


## Getting the Bearer Token
#### This token is used to access Twitter data through my developer account. My use case is to answer this question: have frontline workers kept a positive or negative attitude towards working on the frontlines of Covid-19? I will attempt to gather data that may answer this question by looking at the 300 most recent tweets using the hashtags #frontlineworker, #healthcareworker, and #frontline.

In [136]:
bearer_token = pd.read_csv('Twitter_API_key.txt', header = 0)

In [137]:
### Code of Bearer Token for Twitter API
bearer_token['Bearer Token'].iloc[0]

'AAAAAAAAAAAAAAAAAAAAAOPnNAEAAAAAt6aQBakY445RAyxlUlfmlbKnPuk%3DUHfbKvswaGkQZHojBrhoC5kGybPldtMebzKwMAvHBODwEU3gOK'

## Header
#### The header is passed with the request. Its Authorization field carries my bearer token so the server can validate it before returning the data that I am requesting about frontline workers in healthcare during the pandemic.

In [138]:
header = {'Authorization' : 'Bearer {}'.format(bearer_token['Bearer Token'].iloc[0])}
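
A minimal sketch of the same header construction, written as a small function so the token never has to be echoed in a cell output. The helper names `build_auth_header` and `mask_token` and the placeholder token are assumptions made for illustration; they are not part of the notebook.

```python
def build_auth_header(token):
    """Return the Authorization header in the 'Bearer <token>' form used above."""
    return {"Authorization": "Bearer {}".format(token)}


def mask_token(token, visible=4):
    """Show only the first few characters, to confirm a token loaded without printing it."""
    return token[:visible] + "*" * (len(token) - visible)


# Placeholder value, not a real credential.
placeholder_token = "EXAMPLE_TOKEN_1234"
header = build_auth_header(placeholder_token)
print(mask_token(placeholder_token))
```

In the notebook, the value read from 'Twitter_API_key.txt' would be passed in place of the placeholder.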

## Query
#### The query is the search term(s) that I'm looking for in order to answer the question of whether frontline Covid-19 workers have kept a positive attitude in their work during the pandemic.

In [139]:
query = urllib.parse.quote('(#frontlineworker OR #healthcareworker OR #frontline) lang:en') 
### This search queries the most recent English-language tweets using
### the hashtags #frontlineworker, #healthcareworker, or #frontline.

In [140]:
query  ### This is the query that is passed to the Twitter server to bring back the data I've requested.

'%28%23frontlineworker%20OR%20%23healthcareworker%20OR%20%23frontline%29%20lang%3Aen'
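
As a quick check on what the encoding step does, this short sketch percent-encodes the same query string and decodes it again. It uses only the standard library; the variable names are chosen for illustration.

```python
from urllib.parse import quote, unquote

raw_query = "(#frontlineworker OR #healthcareworker OR #frontline) lang:en"

# quote() replaces characters that are not URL-safe, such as '#', spaces,
# parentheses, and ':', with their %XX escapes.
encoded = quote(raw_query)
print(encoded)

# unquote() reverses the encoding, which is a handy way to read back
# what a URL is actually asking for.
print(unquote(encoded) == raw_query)
```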

## Tweet Fields
#### The tweet fields are the attributes that I want returned for each tweet that I pull from the Twitter server. The Tweet ID and Tweet text come back by default; the fields I request add the engagements (public_metrics), creation time, author ID, language, and reply settings. The author name and author handle come from the author_id expansion described in the next section. The tweet fields I'll be using are: 'public_metrics,created_at,author_id,lang,reply_settings'.

In [141]:
tweet_fields = 'public_metrics,created_at,author_id,lang,reply_settings'

## Expansions
#### The author_id expansion asks the server to return the full user record for each tweet's author. These user records come back under the 'includes' key of the response rather than inside the author_id field itself.

In [142]:
expansions = 'author_id'

## Urls
#### This section specifies the urls needed to request the correct information from the server about the #frontline, #frontlineworker, and #healthcareworker hashtags.

### The first necessary url is the endpoint that every GET recent tweets request needs to access the server.

In [143]:
endpoint_url = 'https://api.twitter.com/2/tweets/search/recent'

### The next piece is the query string, which carries the query, the tweet fields, and the expansion for the author_id information.

In [144]:
expansion_url = '?query={}&max_results=100&tweet.fields={}&expansions={}&user.fields={}'.format(query, tweet_fields, expansions, 'username')

### The final url combines both pieces into the single url that the server needs to bring back the information that I want about the tweets related to frontline workers in healthcare.

In [145]:
final_url = endpoint_url + expansion_url
final_url

'https://api.twitter.com/2/tweets/search/recent?query=%28%23frontlineworker%20OR%20%23healthcareworker%20OR%20%23frontline%29%20lang%3Aen&max_results=100&tweet.fields=public_metrics,created_at,author_id,lang,reply_settings&expansions=author_id&user.fields=username'
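
An alternative sketch of the same URL construction, keeping the parameters in a dictionary and letting the standard library do the encoding. This is not the approach used in the notebook; the `params` dictionary is an illustrative rearrangement of the values shown above.

```python
from urllib.parse import quote, urlencode

endpoint_url = "https://api.twitter.com/2/tweets/search/recent"

# The same values used above, kept unencoded and in one place.
params = {
    "query": "(#frontlineworker OR #healthcareworker OR #frontline) lang:en",
    "max_results": 100,
    "tweet.fields": "public_metrics,created_at,author_id,lang,reply_settings",
    "expansions": "author_id",
    "user.fields": "username",
}

# quote_via=quote encodes spaces as %20 (the default, quote_plus, would use '+'),
# and safe=',' leaves the commas in the field lists unescaped.
query_string = urlencode(params, quote_via=quote, safe=",")
final_url = endpoint_url + "?" + query_string
print(final_url)
```

With the parameters in a dictionary, adding something like a next_token later becomes a matter of adding one more key. The `requests` library also accepts a `params` argument and encodes it itself, which would be another way to avoid building the string by hand.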

## Requesting the Server
#### Now that I have the header, and the query, expansions, and tweet fields have been combined into a URL, I can request the information I want from the server for the #frontlineworker, #healthcareworker, and #frontline hashtags in English.

In [146]:
data_response_1 = requests.request("GET", url = final_url, headers = header)

#### This shows the raw text of the response that I received back from the server.
I'm not going to run this cell because the output would clog up nbviewer; I'm only including it to show the steps necessary.

In [None]:
data_response_1.text

#### This next variable stores the raw text as a dictionary parsed from the JSON, which is easier to load into a pandas DataFrame.
For the same reason as above, I won't show the dictionary because it is lengthy and would clog up the page on nbviewer.

In [148]:
data_response_1_dict = json.loads(data_response_1.text)
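
Before parsing, it can help to confirm that the request actually succeeded. The sketch below is one possible check, not the notebook's method: `parse_search_response` is a hypothetical helper, and the sample body is a made-up stand-in so the example runs without network access.

```python
import json


def parse_search_response(status_code, body_text):
    """Parse a search response body, raising an error if the request failed."""
    if status_code != 200:
        raise RuntimeError(
            "Request failed with status {}: {}".format(status_code, body_text[:200])
        )
    payload = json.loads(body_text)
    # 'data' may be missing when no tweets match, so fall back to an empty list.
    tweets = payload.get("data", [])
    return payload, tweets


# Made-up sample body with the same general shape as the real response.
sample_body = '{"data": [{"id": "1", "text": "example"}], "meta": {"result_count": 1}}'
payload, tweets = parse_search_response(200, sample_body)
print(payload["meta"]["result_count"], len(tweets))
```

In the notebook this could be called as `parse_search_response(data_response_1.status_code, data_response_1.text)`, since `requests` responses expose both attributes.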

## Getting the First Dataframe
#### Finally, I'm at the step where I can load the raw json data into a pandas DataFrame that will show the tweet fields in a more succinct and optimized way.
#### First I will need the keys of the json dictionary 'data_response_1_dict'

In [149]:
data_response_1_dict.keys()

dict_keys(['data', 'includes', 'meta'])

#### Next, I'll use the data key to create my first DataFrame for the healthcare and frontline worker related tweets.

In [150]:
health_tweets_df = pd.DataFrame(data_response_1_dict['data'])
### I won't show the DataFrame until I get to the end of the code so that I only have to show the 300 tweets at once
### instead of showing 100 each time and then 300 total as it will save space on nbviewer. I'll show the head of the
### DataFrame just to be sure that I'm on the right track.
health_tweets_df.head()

Unnamed: 0,reply_settings,author_id,id,created_at,public_metrics,lang,text
0,everyone,1325833659694407685,1450633234476326912,2021-10-20T01:21:29.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,#Mandates #BREAKING #news #Frontline workers n...
1,everyone,1325833659694407685,1450632511365734401,2021-10-20T01:18:36.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,#Mandates #BREAKING #news #Frontline workers n...
2,everyone,18393144,1450629923022987264,2021-10-20T01:08:19.000Z,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",en,RT @FSantuccisays: #Mandates #BREAKING #news #...
3,everyone,4596301403,1450628663783956483,2021-10-20T01:03:19.000Z,"{'retweet_count': 3, 'reply_count': 0, 'like_c...",en,"RT @reliefweb: .@MSF's report, “Adding salt to..."
4,everyone,34911505,1450625935062020097,2021-10-20T00:52:28.000Z,"{'retweet_count': 3, 'reply_count': 0, 'like_c...",en,"RT @reliefweb: .@MSF's report, “Adding salt to..."


#### I want to show the name and Twitter handle of each tweet's author, so I'll create a new variable called 'user_info' to add these columns to my DataFrame.
#### My first step is to use the key 'includes'; then I'll take two of the columns from the new DataFrame I'll create and add them to my current DataFrame.

In [None]:
data_response_1_dict['includes']
### Again I won't run this cell just for nbviewer optimization to not clog up the page.

#### I'll load the users into their own DataFrame and show its head just to be sure that I'm following the correct process.

In [152]:
user_info = pd.DataFrame(data_response_1_dict['includes']['users'])
user_info.head()

Unnamed: 0,id,name,username
0,1325833659694407685,familiadeSantuccisays,FSantuccisays
1,18393144,TxTiny,TxTiny
2,4596301403,Mariana T.,marianatrobosky
3,34911505,PCDN,pcdnetwork
4,81147404,LillyGrillzit,LillyGrillzit


#### Here's the name and username added to the original DataFrame.

In [153]:
health_tweets_df['name'] = user_info['name']
health_tweets_df['username'] = user_info['username']
health_tweets_df.head()
### I'm using the head() method once again just to show that I'm doing the process correctly.

Unnamed: 0,reply_settings,author_id,id,created_at,public_metrics,lang,text,name,username
0,everyone,1325833659694407685,1450633234476326912,2021-10-20T01:21:29.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,#Mandates #BREAKING #news #Frontline workers n...,familiadeSantuccisays,FSantuccisays
1,everyone,1325833659694407685,1450632511365734401,2021-10-20T01:18:36.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,#Mandates #BREAKING #news #Frontline workers n...,TxTiny,TxTiny
2,everyone,18393144,1450629923022987264,2021-10-20T01:08:19.000Z,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",en,RT @FSantuccisays: #Mandates #BREAKING #news #...,Mariana T.,marianatrobosky
3,everyone,4596301403,1450628663783956483,2021-10-20T01:03:19.000Z,"{'retweet_count': 3, 'reply_count': 0, 'like_c...",en,"RT @reliefweb: .@MSF's report, “Adding salt to...",PCDN,pcdnetwork
4,everyone,34911505,1450625935062020097,2021-10-20T00:52:28.000Z,"{'retweet_count': 3, 'reply_count': 0, 'like_c...",en,"RT @reliefweb: .@MSF's report, “Adding salt to...",LillyGrillzit,LillyGrillzit
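
One caveat about the assignment above: it lines the two tables up by row position, while 'includes' holds one record per unique author rather than one per tweet. In the head shown, row 1 has the same author_id as row 0 but receives the next user's name. A minimal sketch of matching on author_id instead follows, using IDs copied from the heads above; `attach_user_info` is a hypothetical helper name.

```python
# Sample records copied from the heads shown above (only the fields needed here).
tweets = [
    {"id": "1450633234476326912", "author_id": "1325833659694407685"},
    {"id": "1450632511365734401", "author_id": "1325833659694407685"},
    {"id": "1450629923022987264", "author_id": "18393144"},
]
users = [
    {"id": "1325833659694407685", "name": "familiadeSantuccisays", "username": "FSantuccisays"},
    {"id": "18393144", "name": "TxTiny", "username": "TxTiny"},
]


def attach_user_info(tweets, users):
    """Return copies of the tweet records with name/username looked up by author_id."""
    users_by_id = {user["id"]: user for user in users}
    enriched = []
    for tweet in tweets:
        # An author missing from the lookup gets None instead of a wrong name.
        user = users_by_id.get(tweet["author_id"], {})
        enriched.append(
            {**tweet, "name": user.get("name"), "username": user.get("username")}
        )
    return enriched


rows = attach_user_info(tweets, users)
for row in rows:
    print(row["id"], row["username"])
```

Passing `rows` to `pd.DataFrame` would give a table with the name and username filled in for every tweet. With two DataFrames already in hand, a pandas merge with `left_on='author_id'` and `right_on='id'` should achieve the same match.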


## Using the Next Token to get the next 100 tweets.
#### This next step utilizes the 'meta' key to give me the next token for my next url in order to streamline the process of receiving the next 100 tweets that I want to add to my request.

In [154]:
data_response_1_dict['meta']

{'newest_id': '1450633234476326912',
 'oldest_id': '1450072565976743938',
 'result_count': 100,
 'next_token': 'b26v89c19zqg8o3fpdv5s9yg8bvrbqfqtko0hnvamd1tp'}

#### Now I can use the next_token to create my second url for the next set of 100 tweets to get me to 200 total tweets.

In [155]:
second_final_url = final_url + '&next_token={}'.format(data_response_1_dict['meta']['next_token'])
second_final_url

'https://api.twitter.com/2/tweets/search/recent?query=%28%23frontlineworker%20OR%20%23healthcareworker%20OR%20%23frontline%29%20lang%3Aen&max_results=100&tweet.fields=public_metrics,created_at,author_id,lang,reply_settings&expansions=author_id&user.fields=username&next_token=b26v89c19zqg8o3fpdv5s9yg8bvrbqfqtko0hnvamd1tp'

#### Now that I have my second url, I can make the request to the server for my next set of 100 tweets.

In [156]:
data_response_2 = requests.request("GET", url = second_final_url, headers = header)

#### Now I'll repeat the process from above to get the text of data_response_2 turned into a dictionary and then a DataFrame. I'll get the name and username from the 'includes' key in the same way and add those to my second DataFrame before I append it to the first DataFrame.

In [157]:
data_response_2_dict = json.loads(data_response_2.text)

In [158]:
health_tweets_df_2 = pd.DataFrame(data_response_2_dict['data'])
health_tweets_df_2.head()
### Show the head to make sure I'm on the right track

Unnamed: 0,created_at,id,reply_settings,public_metrics,text,author_id,lang
0,2021-10-18T12:00:25.000Z,1450069252862615552,everyone,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",Thanks to Daniel Whelan Tiling &amp; Bathrooms...,1331543838410870784,en
1,2021-10-18T11:58:38.000Z,1450068805896527875,everyone,"{'retweet_count': 4, 'reply_count': 0, 'like_c...",RT @StutterLoudly: Wore my @thisisourshotca sh...,267943190,en
2,2021-10-18T11:42:55.000Z,1450064850277830657,everyone,"{'retweet_count': 4, 'reply_count': 0, 'like_c...",RT @JackieHoare3: Sleep problems are common po...,560185948,en
3,2021-10-18T11:24:43.000Z,1450060267765866501,everyone,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",I was Treated Successfully for COVID-19 by the...,2361438294,en
4,2021-10-18T11:14:39.000Z,1450057736285261825,everyone,"{'retweet_count': 4, 'reply_count': 0, 'like_c...",RT @StutterLoudly: Wore my @thisisourshotca sh...,1381363018060161034,en


In [159]:
user_info_2 = pd.DataFrame(data_response_2_dict['includes']['users'])
user_info_2.head()
### Shows the head of the user info to make sure I'm seeing different names and usernames from above

Unnamed: 0,id,name,username
0,1331543838410870784,ESDAAPP,esdaapp
1,267943190,AA&MDSIF,aamdsif
2,560185948,hali,halimadawood
3,2361438294,Don Feazelle,DFeazelle
4,1381363018060161034,This is Our Shot 🇨🇦 #ThisIsOurShotCA,thisisourshotca


In [166]:
health_tweets_df_2['name'] = user_info_2['name']
health_tweets_df_2['username'] = user_info_2['username']
health_tweets_df_2.head()
### I'm using the head() method just to show that I'm doing the process correctly.

Unnamed: 0,created_at,id,reply_settings,public_metrics,text,author_id,lang,name,username
0,2021-10-18T12:00:25.000Z,1450069252862615552,everyone,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",Thanks to Daniel Whelan Tiling &amp; Bathrooms...,1331543838410870784,en,ESDAAPP,esdaapp
1,2021-10-18T11:58:38.000Z,1450068805896527875,everyone,"{'retweet_count': 4, 'reply_count': 0, 'like_c...",RT @StutterLoudly: Wore my @thisisourshotca sh...,267943190,en,AA&MDSIF,aamdsif
2,2021-10-18T11:42:55.000Z,1450064850277830657,everyone,"{'retweet_count': 4, 'reply_count': 0, 'like_c...",RT @JackieHoare3: Sleep problems are common po...,560185948,en,hali,halimadawood
3,2021-10-18T11:24:43.000Z,1450060267765866501,everyone,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",I was Treated Successfully for COVID-19 by the...,2361438294,en,Don Feazelle,DFeazelle
4,2021-10-18T11:14:39.000Z,1450057736285261825,everyone,"{'retweet_count': 4, 'reply_count': 0, 'like_c...",RT @StutterLoudly: Wore my @thisisourshotca sh...,1381363018060161034,en,This is Our Shot 🇨🇦 #ThisIsOurShotCA,thisisourshotca


## Appending the Second DataFrame
#### Now that I've got the correct DataFrame, I can append the second set of 100 tweets to the first DataFrame.

In [175]:
second_df = pd.concat([health_tweets_df, health_tweets_df_2], ignore_index = True)
second_df.tail()
### pd.concat is used because DataFrame.append was removed in pandas 2.0.
### I'm using tail to show that there are now 200 tweets in the DataFrame.

Unnamed: 0,reply_settings,author_id,id,created_at,public_metrics,lang,text,name,username
195,everyone,707975819667312640,1449311139884683265,2021-10-16T09:47:57.000Z,"{'retweet_count': 11, 'reply_count': 0, 'like_...",en,RT @WHOPhilippines: To our health care workers...,,
196,everyone,1207023905011093504,1449309871560605698,2021-10-16T09:42:54.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,#Wow*\n\nWith #AbsentLeadership* &amp;\n#Hidin...,,
197,everyone,1341201275694804995,1449281195053916165,2021-10-16T07:48:57.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,"Good morning, make sure you make it count.💪🏾😘\...",,
198,everyone,1341201275694804995,1449280329307631621,2021-10-16T07:45:31.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,"Good morning, make sure you make it count.💪🏾😘\...",,
199,everyone,1052522106720673793,1449278978565496832,2021-10-16T07:40:09.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,Compassion Conference \nInternational speakers...,,


## Finishing the DataFrame
#### Now I'll repeat the process from the second DataFrame to get my last set of 100 tweets to get up to 300 tweets.

In [176]:
data_response_2_dict['meta']

{'newest_id': '1450069252862615552',
 'oldest_id': '1449278978565496832',
 'result_count': 100,
 'next_token': 'b26v89c19zqg8o3fpds9swo6zejy31a63nfzda944qrnh'}

#### Here I'll set up the third url for my next request to the server.

In [177]:
third_final_url = final_url + '&next_token={}'.format(data_response_2_dict['meta']['next_token'])
third_final_url

'https://api.twitter.com/2/tweets/search/recent?query=%28%23frontlineworker%20OR%20%23healthcareworker%20OR%20%23frontline%29%20lang%3Aen&max_results=100&tweet.fields=public_metrics,created_at,author_id,lang,reply_settings&expansions=author_id&user.fields=username&next_token=b26v89c19zqg8o3fpds9swo6zejy31a63nfzda944qrnh'

#### Now I'll make the last request to the server for my final set of tweets.

In [178]:
data_response_3 = requests.request("GET", url = third_final_url, headers = header)

In [179]:
data_response_3_dict = json.loads(data_response_3.text)

In [180]:
health_tweets_df_3 = pd.DataFrame(data_response_3_dict['data'])
health_tweets_df_3.head()
### Show the head to make sure I'm on the right track

Unnamed: 0,text,created_at,author_id,lang,public_metrics,reply_settings,id
0,"Good morning, make sure you make it count.💪🏾😘\...",2021-10-16T07:37:54.000Z,1341201275694804995,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",everyone,1449278411311042562
1,"RT @Gillanjenner: Please be patient Londoners,...",2021-10-16T07:31:40.000Z,3618354350,en,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",everyone,1449276841907339264
2,RT @Pamela_Uba: Back to GMIT next Monday as a ...,2021-10-16T07:19:04.000Z,1158289945,en,"{'retweet_count': 5, 'reply_count': 0, 'like_c...",everyone,1449273670979948545
3,RT @Gillanjenner: The police already have 150 ...,2021-10-16T07:01:25.000Z,3618354350,en,"{'retweet_count': 2, 'reply_count': 0, 'like_c...",everyone,1449269230776500226
4,Wishing I was still in bed\n#keyworker\n#Covid...,2021-10-16T06:27:35.000Z,1214498427435925504,en,"{'retweet_count': 0, 'reply_count': 2, 'like_c...",everyone,1449260715022426113


In [181]:
user_info_3 = pd.DataFrame(data_response_3_dict['includes']['users'])
user_info_3.head()
### Shows the head of the user info to make sure I'm seeing different names and usernames from above

Unnamed: 0,id,name,username
0,1341201275694804995,Lovely-Rowe’s Fitness,rowes_fitness
1,3618354350,LndOntRetweets,LndOntRetweets
2,1158289945,Marymcg,mcgrathmf
3,1214498427435925504,ⵣأروى 🇱🇾 🏴󠁧󠁢󠁥󠁮󠁧󠁿,lovemyhaters98
4,519943460,Scotty,ComBarca


In [182]:
health_tweets_df_3['name'] = user_info_3['name']
health_tweets_df_3['username'] = user_info_3['username']
health_tweets_df_3.head()
### I'm using the head() method just to show that I'm doing the process correctly.

Unnamed: 0,text,created_at,author_id,lang,public_metrics,reply_settings,id,name,username
0,"Good morning, make sure you make it count.💪🏾😘\...",2021-10-16T07:37:54.000Z,1341201275694804995,en,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",everyone,1449278411311042562,Lovely-Rowe’s Fitness,rowes_fitness
1,"RT @Gillanjenner: Please be patient Londoners,...",2021-10-16T07:31:40.000Z,3618354350,en,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",everyone,1449276841907339264,LndOntRetweets,LndOntRetweets
2,RT @Pamela_Uba: Back to GMIT next Monday as a ...,2021-10-16T07:19:04.000Z,1158289945,en,"{'retweet_count': 5, 'reply_count': 0, 'like_c...",everyone,1449273670979948545,Marymcg,mcgrathmf
3,RT @Gillanjenner: The police already have 150 ...,2021-10-16T07:01:25.000Z,3618354350,en,"{'retweet_count': 2, 'reply_count': 0, 'like_c...",everyone,1449269230776500226,ⵣأروى 🇱🇾 🏴󠁧󠁢󠁥󠁮󠁧󠁿,lovemyhaters98
4,Wishing I was still in bed\n#keyworker\n#Covid...,2021-10-16T06:27:35.000Z,1214498427435925504,en,"{'retweet_count': 0, 'reply_count': 2, 'like_c...",everyone,1449260715022426113,Scotty,ComBarca


## Appending the Third and Final DataFrame
#### Now that I've got the correct DataFrame I can append the third set of 100 tweets to complete the desired DataFrame of 300 tweets.

In [183]:
final_health_df = pd.concat([second_df, health_tweets_df_3], ignore_index = True)
final_health_df.tail()
### pd.concat is used because DataFrame.append was removed in pandas 2.0.
### I'm using tail to show that there are now 300 tweets in the DataFrame.

Unnamed: 0,reply_settings,author_id,id,created_at,public_metrics,lang,text,name,username
295,everyone,539577630,1448884278666743819,2021-10-15T05:31:45.000Z,"{'retweet_count': 4, 'reply_count': 0, 'like_c...",en,RT @UN_Pasifika: As #TeamFiji prepares to #Ope...,,
296,everyone,750947919273029632,1448883226903146501,2021-10-15T05:27:34.000Z,"{'retweet_count': 165, 'reply_count': 0, 'like...",en,RT @Apollo24x7: Our humble attempt at thanking...,,
297,everyone,226122086,1448880011117748243,2021-10-15T05:14:48.000Z,"{'retweet_count': 11, 'reply_count': 0, 'like_...",en,RT @WHOPhilippines: To our health care workers...,,
298,everyone,1008835429720715264,1448874310869741571,2021-10-15T04:52:09.000Z,"{'retweet_count': 4, 'reply_count': 0, 'like_c...",en,RT @UN_Pasifika: As #TeamFiji prepares to #Ope...,,
299,everyone,1110823710,1448866712972038147,2021-10-15T04:21:57.000Z,"{'retweet_count': 11, 'reply_count': 0, 'like_...",en,RT @WHOPhilippines: To our health care workers...,,


## Creating a .CSV file for the DataFrame

In [184]:
final_health_df.to_csv('health_tweet_data.csv', header = True, sep = ',')

#### Showing that the directory contains the .csv file health_tweet_data.csv

In [187]:
ls

'likes' Report - Liam Merritt.ipynb
BLS Data Call.ipynb
Data in EMAT 9-21-21 Notes.ipynb
HipHopAPP_keys.txt
Notes 10-19-21 - Data Scraping.ipynb
Notes 9-23-2021 & 9-28-2021 & 9-30-2021 - Data In Emat.ipynb
Notes 9-30-2021 Data In EMAT.ipynb
Pandas Notes 9-14-21.ipynb
SQL Notes - 10-7-21.ipynb
Twitter API Notes.ipynb
Twitter Text Report - Liam Merritt.ipynb
Twitter_API_key.txt
bls_data.csv
chinook.db
chinook.zip
health_tweet_data.csv


## Final Report

#### Provide an overview that clearly states the driving question and links the question to the data collected, including a justification of the query you developed:

The driving question behind my query was whether healthcare and front line workers were more positive or negative in their tweets and interactions on Twitter. I used the hashtags #frontline, #frontlineworker, and #healthcareworker, restricted to English, to build a query that could shed some light on that question. I justify this query on the grounds that the tweets it returned were related to healthcare in some way, and the #frontline and #frontlineworker tweets related to Twitter users' experiences of working on the front line during the pandemic. I collected the fields specified in the assignment: Tweet ID, author name, author handle, Tweet text, engagements, and creation time. I also chose to request the reply settings because I wanted to see how open to communication the users were when discussing their attitude toward healthcare and front line work during the pandemic.

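#### Describe the raw data structure and variables included in the data and document the data collection and any data wrangling needed to create a sensible DataFrame:

The raw data structure and variables are described step by step throughout this notebook. In short, each response is a JSON dictionary with three keys: 'data' (one record per tweet, with the requested tweet fields), 'includes' (the user records returned by the author_id expansion), and 'meta' (the result count and the next_token used to request the next page). I loaded each page's 'data' into a DataFrame, added the name and username columns from 'includes', and appended the three pages into a single DataFrame of 300 tweets. Snippets of the DataFrames are shown above.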

#### Include a clear and appropriate visualization:

I'll show the head and tail end of my final DataFrame here just to exemplify that I created a DataFrame with 300 tweets using the queries and tweet fields specified above.

In [190]:
final_health_df.head()

Unnamed: 0,reply_settings,author_id,id,created_at,public_metrics,lang,text,name,username
0,everyone,1325833659694407685,1450633234476326912,2021-10-20T01:21:29.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,#Mandates #BREAKING #news #Frontline workers n...,familiadeSantuccisays,FSantuccisays
1,everyone,1325833659694407685,1450632511365734401,2021-10-20T01:18:36.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",en,#Mandates #BREAKING #news #Frontline workers n...,TxTiny,TxTiny
2,everyone,18393144,1450629923022987264,2021-10-20T01:08:19.000Z,"{'retweet_count': 1, 'reply_count': 0, 'like_c...",en,RT @FSantuccisays: #Mandates #BREAKING #news #...,Mariana T.,marianatrobosky
3,everyone,4596301403,1450628663783956483,2021-10-20T01:03:19.000Z,"{'retweet_count': 3, 'reply_count': 0, 'like_c...",en,"RT @reliefweb: .@MSF's report, “Adding salt to...",PCDN,pcdnetwork
4,everyone,34911505,1450625935062020097,2021-10-20T00:52:28.000Z,"{'retweet_count': 3, 'reply_count': 0, 'like_c...",en,"RT @reliefweb: .@MSF's report, “Adding salt to...",LillyGrillzit,LillyGrillzit


In [191]:
final_health_df.tail()

Unnamed: 0,reply_settings,author_id,id,created_at,public_metrics,lang,text,name,username
295,everyone,539577630,1448884278666743819,2021-10-15T05:31:45.000Z,"{'retweet_count': 4, 'reply_count': 0, 'like_c...",en,RT @UN_Pasifika: As #TeamFiji prepares to #Ope...,,
296,everyone,750947919273029632,1448883226903146501,2021-10-15T05:27:34.000Z,"{'retweet_count': 165, 'reply_count': 0, 'like...",en,RT @Apollo24x7: Our humble attempt at thanking...,,
297,everyone,226122086,1448880011117748243,2021-10-15T05:14:48.000Z,"{'retweet_count': 11, 'reply_count': 0, 'like_...",en,RT @WHOPhilippines: To our health care workers...,,
298,everyone,1008835429720715264,1448874310869741571,2021-10-15T04:52:09.000Z,"{'retweet_count': 4, 'reply_count': 0, 'like_c...",en,RT @UN_Pasifika: As #TeamFiji prepares to #Ope...,,
299,everyone,1110823710,1448866712972038147,2021-10-15T04:21:57.000Z,"{'retweet_count': 11, 'reply_count': 0, 'like_...",en,RT @WHOPhilippines: To our health care workers...,,


#### Assess the quality of the data collected:

The data is useful for gauging attitudes towards healthcare and front line work during the pandemic because the tweets offer direct insight into the thoughts of people who work in these fields, as they choose to express them publicly.

#### Weaknesses & Limitations of the data:

I'll mention a few weaknesses and limitations of the data. First, the data could be skewed, or a person could be lying in their tweet text, which would limit the value of the data as a valid source of information. Second, the data could be biased by political ideology, or a value could have been recorded incorrectly when the request was sent or when the tweet was originally posted. Third, the name and username columns are incomplete and partly mismatched. The 'includes' key returns one user record per unique author rather than one per tweet, and I added those columns by row position instead of matching on author_id, so later rows have no name or username and some rows carry another author's name. That limits how far I can identify the user behind a tweet and look at further metrics about that user, which might skew the analysis of the data. Fourth, the data is found by searching recent hashtags, so a tweet using one of the hashtags could be totally irrelevant to the question I'm asking and would skew the analysis towards outliers. Finally, the data doesn't cover the whole of the pandemic: all of the tweets are from the last few days, and the pandemic has subsided somewhat from how serious it was at the beginning.

#### Alternative approaches and potential next steps:

One alternative approach would be to use different or more specific hashtags when forming the query URL. The query could also require several hashtags to appear in the same tweet, alongside the single hashtags, in order to return more targeted results. Another alternative would be to look beyond English, for example by searching for tweets in Spanish with Spanish hashtags related to healthcare and frontline workers in Spanish-speaking countries, to give a more global view of the pandemic. A potential next step is to work out how to answer my question by analyzing the DataFrame I created further. Another next step, which is also an alternative approach, is to learn how to write a function that creates the original DataFrame, uses the next_token for the second and third DataFrames, and appends them together to create my final DataFrame. I think the major next step, however, is learning how to analyze the Twitter data and how to draw conclusions from that analysis.
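
The function mentioned as a next step could look something like the sketch below. It is only an outline under stated assumptions: `collect_pages`, `fake_fetch`, and the `FAKE_PAGES` data are hypothetical names and made-up values, and the real request is replaced by a stand-in so the example runs without network access.

```python
def collect_pages(fetch_page, max_pages=3):
    """Collect tweets and users across pages by following meta['next_token'].

    fetch_page is any callable that takes a next_token (None for the first
    page) and returns the parsed JSON dictionary for that page.
    """
    tweets, users = [], []
    next_token = None
    for _ in range(max_pages):
        page = fetch_page(next_token)
        tweets.extend(page.get("data", []))
        users.extend(page.get("includes", {}).get("users", []))
        next_token = page.get("meta", {}).get("next_token")
        if next_token is None:
            # No further pages are available.
            break
    return tweets, users


# Stand-in for the real request. In the notebook this would wrap
# requests.request("GET", ...) followed by json.loads on the response text.
FAKE_PAGES = {
    None: {"data": [{"id": "1"}], "includes": {"users": [{"id": "a"}]}, "meta": {"next_token": "t1"}},
    "t1": {"data": [{"id": "2"}], "includes": {"users": [{"id": "b"}]}, "meta": {"next_token": "t2"}},
    "t2": {"data": [{"id": "3"}], "includes": {"users": [{"id": "a"}]}, "meta": {}},
}


def fake_fetch(next_token):
    return FAKE_PAGES[next_token]


all_tweets, all_users = collect_pages(fake_fetch, max_pages=3)
print(len(all_tweets), len(all_users))
```

Passing the fetch step in as a function keeps the paging logic separate from the request itself, so the loop can be checked on small made-up pages before it is pointed at the real server.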