### Purpose of this file

In this file, I will hydrate tweets from this resource: https://ieee-dataport.org/open-access/coronavirus-covid-19-geo-tagged-tweets-dataset

This resource contains a subset of tweets, scraped every day since March 20th, that have geolocation data. 

We can use code from "get_location_from_geocoordinates.py" in order to see how to use lat/long info to get a person's location. 

For this first pass, we'll use the following dates:

1. (NOTE: not using this date, since for this dataset we don't have data from this date) March 9th: Governor DeSantis declares a State of Emergency
2. April 17th: DeSantis issues a statewide stay-at-home order following growing pressure to do so
3. May 18th: Florida begins full Phase 1 of reopening, as announced by DeSantis, allowing gyms and restaurants to operate at 50% capacity.
4. June 5th: DeSantis announces that Florida could move into Phase 2 except south Florida, specifically Miami-Dade, Broward, and Palm Beach, which need to submit plans for reopening. Phase 2 in Florida begins, with bars allowed to open at 50% capacity with social distancing and sanitation.
5. July 2nd: Florida reports 10,000 new coronavirus cases in a single day, the biggest one-day increase in the state since the pandemic started, and more than any European country had at the height of their outbreaks.
6. September 25th: Governor Ron DeSantis fully reopened the state of Florida by executive order. The order also prohibits local governments from imposing fines, shutting down businesses, or enforcing mask mandates.
7. October 17th: Florida reported its highest COVID-19 numbers in two months. The seven-day average was more than 3,300 cases. Reporting anomalies made it more difficult to identify statistical trends. The positivity rate was 5.2%, with over 2,000 hospitalizations.
8. December 17th: Florida reported 13,148 new cases, the largest number since July 16th.

All these dates correspond with important COVID-related events in Florida. I chose Florida since it's had a large range of different COVID-related events (e.g., openings, closings, shutdowns, etc.), rather than some other states that, say, had an initial lockdown and stayed in lockdown. 



In [155]:
import numpy as np
import pandas as pd
import os
import json
import datetime
import re
import nltk 
from nltk.corpus import stopwords
import emoji

pd.set_option('display.max_columns', None) # show all columns

### 1. Load tweets

Due to sharing restrictions, the public dataset doesn't include the tweets themselves, only the tweet IDs (plus a sentiment score per tweet). We can "hydrate" the tweet IDs to recover the actual tweets.

(Downloading the ID files requires an IEEE account. The download URLs below are also time-limited signed links (note `X-Amz-Expires=86400`, i.e. 24 hours), so they will stop working and have to be regenerated from the dataset page linked above.)

In [9]:
# collect tweets from April 16th to April 17th
april16_17 = pd.read_csv("https://ieee-dataport.s3.amazonaws.com/open/14206/april16_april17.csv?response-content-disposition=attachment%3B%20filename%3D%22april16_april17.csv%22&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJOHYI4KJCE6Q7MIQ%2F20201217%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20201217T223856Z&X-Amz-SignedHeaders=Host&X-Amz-Expires=86400&X-Amz-Signature=c4fbf41e249dc1fb9f5f7be962a2ca05d5210d0a9a81293820f49c91efed3826", 
                         names = ["tweet_id", "sentiment_score"])

# collect tweets from April 17th to April 18th
april17_18 = pd.read_csv("https://ieee-dataport.s3.amazonaws.com/open/14206/april17_april18.csv?response-content-disposition=attachment%3B%20filename%3D%22april17_april18.csv%22&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJOHYI4KJCE6Q7MIQ%2F20201217%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20201217T223856Z&X-Amz-SignedHeaders=Host&X-Amz-Expires=86400&X-Amz-Signature=db65792c9a01221fd2e70f90dfa5dcbc2b5fd9649311e7ee6b11e810c69c0c60", 
                          names = ["tweet_id", "sentiment_score"])


In [10]:
april16_17.head()

Unnamed: 0,tweet_id,sentiment_score
0,1250641596887990272,0.125
1,1250646705516707840,0.0
2,1250647034253709315,0.0
3,1250655078744240134,0.170455
4,1250655491904147456,0.0


In [11]:
april17 = pd.concat([april16_17, april17_18])

In [12]:
april17

Unnamed: 0,tweet_id,sentiment_score
0,1250641596887990272,0.125000
1,1250646705516707840,0.000000
2,1250647034253709315,0.000000
3,1250655078744240134,0.170455
4,1250655491904147456,0.000000
...,...,...
368,1251359682712616961,0.000000
369,1251360432079634434,0.000000
370,1251364103618248705,-0.050000
371,1251366441015803912,-0.004444


In [15]:
april17.drop_duplicates(inplace=True)
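
Because both files cover April 17th, the concatenated frame can contain the same tweet ID twice, and its index repeats (each file starts again at 0). A minimal sketch of one way to handle both, on toy frames with made-up IDs; `ignore_index=True` and `subset="tweet_id"` are my choices here, not something the original cells do:

```python
import pandas as pd

# Toy stand-ins for april16_17 and april17_18 (IDs and scores are made up).
first = pd.DataFrame({"tweet_id": [101, 102, 103], "sentiment_score": [0.1, 0.0, -0.2]})
second = pd.DataFrame({"tweet_id": [103, 104], "sentiment_score": [-0.2, 0.5]})

combined = (
    pd.concat([first, second], ignore_index=True)  # fresh 0..n-1 index
    .drop_duplicates(subset="tweet_id")            # keep the first row per tweet ID
    .reset_index(drop=True)                        # close the gap left by dropped rows
)
print(combined)
```

Deduplicating on `tweet_id` alone also catches rows whose sentiment score differs slightly between the two files.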

We then can export these tweet IDs in a .csv file, and then we can use twarc, a command line Python tool, to get the tweets that we need. 

In [18]:
tweet_ids = list(april17["tweet_id"])

In [25]:
TWEET_ID_DIR = "../../data/tweets/tweet_ids/"

In [26]:
with open(TWEET_ID_DIR + "april17_tweets.csv", 'a+') as f: # a+ opens for appending (and reading); re-running this cell appends the IDs again
    for idx, tweet in enumerate(tweet_ids):
        if idx != len(tweet_ids) - 1:
            f.write(f"{tweet},\n")
        else:
            f.write(f"{tweet}")
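
twarc's `hydrate` command reads one tweet ID per line. A minimal sketch of writing such a file, using two IDs from the table above and a hypothetical temporary output path; opening with `"w"` overwrites the file, so re-running the cell does not append the same IDs again:

```python
import os
import tempfile

tweet_ids = [1250641596887990272, 1250646705516707840]  # two IDs from the table above

# Hypothetical output location; the notebook itself writes under TWEET_ID_DIR.
out_path = os.path.join(tempfile.mkdtemp(), "example_ids.txt")

with open(out_path, "w") as f:  # "w" truncates, so re-runs don't duplicate IDs
    f.write("\n".join(str(tweet_id) for tweet_id in tweet_ids))

# Read the file back to confirm there is exactly one ID per line.
with open(out_path) as f:
    lines = f.read().splitlines()
print(lines)
```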
    

Now, using these tweet IDs, let's hydrate them to recover the original tweets

First, you have to configure your Twitter API credentials:

`twarc configure`

Then, enter the credentials when prompted. After doing so successfully, you should get a message like this: 

`The credentials for default have been saved to your configuration file at /Users/mark/.twarc`

Afterwards, you can start hydrating the tweets from the command line. You'd run something like this:

`twarc hydrate ids.txt > tweets.jsonl`

In my case, running the command from the root directory of this project, it looks something like this:

`twarc hydrate data/tweets/tweet_ids/april17_tweets.csv > data/tweets/hydrated_tweets/april17_tweets.jsonl`

In [37]:
HYDRATED_TWEETS_DIR = "../../data/tweets/hydrated_tweets/"

Now, let's load the .jsonl file into a pandas DataFrame

In [34]:
os.listdir(HYDRATED_TWEETS_DIR)

['april17_tweets.jsonl']

In [39]:
jsonObj = pd.read_json(path_or_buf=HYDRATED_TWEETS_DIR + "april17_tweets.jsonl", lines=True)
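
A `.jsonl` file holds one JSON object per line, which is what `lines=True` tells pandas to expect. The sketch below does the same parsing by hand on two made-up records (with far fewer fields than a real tweet), which can help when a single malformed line needs to be located:

```python
import io
import json

# Two made-up records standing in for hydrated tweets.
raw = io.StringIO(
    '{"id": 1, "full_text": "first tweet", "place": null}\n'
    '{"id": 2, "full_text": "second tweet", "place": {"full_name": "Miami, FL"}}\n'
)

records = []
for line_number, line in enumerate(raw, start=1):
    line = line.strip()
    if not line:
        continue  # skip blank lines
    try:
        records.append(json.loads(line))
    except json.JSONDecodeError as err:
        print(f"line {line_number} is not valid JSON: {err}")

print(len(records), records[1]["place"]["full_name"])
```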

In [49]:
jsonObj["place"][0]

{'id': '495a55057ac886b9',
 'url': 'https://api.twitter.com/1.1/geo/id/495a55057ac886b9.json',
 'place_type': 'city',
 'name': 'Montpelier',
 'full_name': 'Montpelier, VT',
 'country_code': 'US',
 'country': 'United States',
 'contained_within': [],
 'bounding_box': {'type': 'Polygon',
  'coordinates': [[[-72.6255728, 44.2348719],
    [-72.544556, 44.2348719],
    [-72.544556, 44.3127017],
    [-72.6255728, 44.3127017]]]},
 'attributes': {}}

In [48]:
jsonObj.head()

Unnamed: 0,created_at,id,id_str,full_text,truncated,display_text_range,entities,source,in_reply_to_status_id,in_reply_to_status_id_str,in_reply_to_user_id,in_reply_to_user_id_str,in_reply_to_screen_name,user,geo,coordinates,place,contributors,is_quote_status,retweet_count,favorite_count,favorited,retweeted,possibly_sensitive,lang,quoted_status_id,quoted_status_id_str,quoted_status_permalink,quoted_status,extended_entities
0,2020-04-16 04:26:07+00:00,1250641596887990272,1250641596887990272,Finally got to a color I love and a length I’m...,False,"[0, 240]","{'hashtags': [{'text': 'quarantine', 'indices'...","<a href=""http://instagram.com"" rel=""nofollow"">...",,,,,,"{'id': 32545952, 'id_str': '32545952', 'name':...","{'type': 'Point', 'coordinates': [44.15253213,...","{'type': 'Point', 'coordinates': [-72.5981195,...","{'id': '495a55057ac886b9', 'url': 'https://api...",,False,0,0,False,False,0.0,en,,,,,
1,2020-04-16 04:46:25+00:00,1250646705516707840,1250646705516707840,#wutang #wutangforever #corona @ Downtown Los ...,False,"[0, 77]","{'hashtags': [{'text': 'wutang', 'indices': [0...","<a href=""http://instagram.com"" rel=""nofollow"">...",,,,,,"{'id': 15293534, 'id_str': '15293534', 'name':...","{'type': 'Point', 'coordinates': [34.03742524,...","{'type': 'Point', 'coordinates': [-118.2487404...","{'id': '3b77caf94bfc81fe', 'url': 'https://api...",,False,0,0,False,False,0.0,en,,,,,
2,2020-04-16 04:47:43+00:00,1250647034253709315,1250647034253709312,"Swirling again @ Corona, California https://t....",False,"[0, 59]","{'hashtags': [], 'symbols': [], 'user_mentions...","<a href=""http://instagram.com"" rel=""nofollow"">...",,,,,,"{'id': 3283791091, 'id_str': '3283791091', 'na...","{'type': 'Point', 'coordinates': [33.8753, -11...","{'type': 'Point', 'coordinates': [-117.566, 33...","{'id': '5e4b6834e36e68fa', 'url': 'https://api...",,False,0,0,False,False,0.0,en,,,,,
3,2020-04-16 05:19:41+00:00,1250655078744240134,1250655078744240128,Does it feel like you want to #Crawl on the wa...,False,"[0, 240]","{'hashtags': [{'text': 'Crawl', 'indices': [30...","<a href=""http://instagram.com"" rel=""nofollow"">...",,,,,,"{'id': 392875184, 'id_str': '392875184', 'name...","{'type': 'Point', 'coordinates': [59.3307, 18....","{'type': 'Point', 'coordinates': [18.0605, 59....","{'id': 'd56c5babcffde8ef', 'url': 'https://api...",,False,0,0,False,False,0.0,en,,,,,
4,2020-04-16 05:21:19+00:00,1250655491904147456,1250655491904147456,Get your stanky booty to Cave Creek &amp; Care...,False,"[0, 184]","{'hashtags': [{'text': 'walmart', 'indices': [...","<a href=""http://instagram.com"" rel=""nofollow"">...",,,,,,"{'id': 524860845, 'id_str': '524860845', 'name...","{'type': 'Point', 'coordinates': [33.8304, -11...","{'type': 'Point', 'coordinates': [-111.964, 33...","{'id': '005e9bd60c4f1337', 'url': 'https://api...",,False,0,1,False,False,0.0,en,,,,,


### 2. Start processing tweets, getting the info that we care about

We likely only care about the following columns:

    • created_at
    • id
    • full_text
    • geo
    • coordinates
    • place (this has the city + state location, as a field called "full_name")
    • retweet_count
    • favorite_count
    
We also want to parse the "created_at" column (we can perhaps create 2 columns, one with the date and one with the hour)

In [50]:
df = jsonObj[["created_at", "id", "full_text", "geo", "coordinates", "place", "retweet_count", "favorite_count"]].copy() # .copy() so later column assignments don't trigger SettingWithCopyWarning

In [53]:
df["place"][0]["full_name"]

'Montpelier, VT'

In [59]:
df["place"][1]["full_name"].split(",")[1].strip()

'CA'

In [56]:
df["place"][1]

{'id': '3b77caf94bfc81fe',
 'url': 'https://api.twitter.com/1.1/geo/id/3b77caf94bfc81fe.json',
 'place_type': 'city',
 'name': 'Los Angeles',
 'full_name': 'Los Angeles, CA',
 'country_code': 'US',
 'country': 'United States',
 'contained_within': [],
 'bounding_box': {'type': 'Polygon',
  'coordinates': [[[-118.668404, 33.704538],
    [-118.155409, 33.704538],
    [-118.155409, 34.337041],
    [-118.668404, 34.337041]]]},
 'attributes': {}}

Now, for each tweet, let's get the states that they're in. We have a `place` column that has a dictionary with place information. For the tweets in the USA, we can get state-level information. 

In [77]:
def get_state_from_location(place):
    """
    Gets state info from place field
    Assumes dict input
    """
    
    if place is None:
        state = "NA"  
    elif place["country_code"] != "US":
        state = "NA"
    else:
        state = place["full_name"].split(",")[1].strip() # e.g., "Los Angeles, CA" --> "CA"
        
    return state
        

In [83]:
states = []

In [66]:
for idx, location_dict in enumerate(df["place"]):
    if idx > 2:
        break
    print(location_dict)
    print("-==============")

{'id': '495a55057ac886b9', 'url': 'https://api.twitter.com/1.1/geo/id/495a55057ac886b9.json', 'place_type': 'city', 'name': 'Montpelier', 'full_name': 'Montpelier, VT', 'country_code': 'US', 'country': 'United States', 'contained_within': [], 'bounding_box': {'type': 'Polygon', 'coordinates': [[[-72.6255728, 44.2348719], [-72.544556, 44.2348719], [-72.544556, 44.3127017], [-72.6255728, 44.3127017]]]}, 'attributes': {}}
{'id': '3b77caf94bfc81fe', 'url': 'https://api.twitter.com/1.1/geo/id/3b77caf94bfc81fe.json', 'place_type': 'city', 'name': 'Los Angeles', 'full_name': 'Los Angeles, CA', 'country_code': 'US', 'country': 'United States', 'contained_within': [], 'bounding_box': {'type': 'Polygon', 'coordinates': [[[-118.668404, 33.704538], [-118.155409, 33.704538], [-118.155409, 34.337041], [-118.668404, 34.337041]]]}, 'attributes': {}}
{'id': '5e4b6834e36e68fa', 'url': 'https://api.twitter.com/1.1/geo/id/5e4b6834e36e68fa.json', 'place_type': 'city', 'name': 'Corona', 'full_name': 'Corona

In [84]:
for location_dict in df["place"]:
    try:
        states.append(get_state_from_location(location_dict))
    except Exception as e:
        print(location_dict)
        print(e)

In [85]:
df.shape

(798, 8)

In [86]:
len(states)

798

In [87]:
df["US_state"] = states



In [89]:
df["US_state"].value_counts()

NA     486
CA      85
NY      47
TX      24
USA     23
FL      20
IL      10
GA      10
NJ       7
AZ       6
PA       6
OH       6
IN       5
MD       5
SC       5
TN       5
DC       4
LA       3
NV       3
AL       3
WA       3
CO       3
VT       3
NC       3
IA       2
MA       2
UT       2
WI       2
MO       2
CT       2
OR       1
KY       1
MS       1
VA       1
ID       1
HI       1
MI       1
MN       1
NE       1
RI       1
OK       1
Name: US_state, dtype: int64
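
The `USA` bucket in the counts above most likely comes from state-level places, whose `full_name` I believe looks like `'Florida, USA'` rather than `'City, ST'`, so the text after the comma is the country rather than a state. A hedged sketch of a variant that separates the two cases; `get_state_from_place` is a hypothetical name, the `"Florida, USA"` format is an assumption, and the dictionaries are trimmed-down examples:

```python
def get_state_from_place(place):
    """Return a state label for a US place dict, or "NA" otherwise."""
    # Missing places may arrive as None or NaN, so accept only dicts.
    if not isinstance(place, dict) or place.get("country_code") != "US":
        return "NA"

    parts = [part.strip() for part in place.get("full_name", "").split(",")]
    if len(parts) != 2:
        return "NA"

    name, suffix = parts
    # Assumed format for state-level places: "Florida, USA" -> keep "Florida".
    return name if suffix == "USA" else suffix


print(get_state_from_place({"country_code": "US", "full_name": "Los Angeles, CA"}))
print(get_state_from_place({"country_code": "US", "full_name": "Florida, USA"}))
print(get_state_from_place(None))
```

State-level rows then carry the full state name, so they would still need to be mapped to the two-letter code before counting.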

Now that we have locations, let's also get the dates of the tweets

In [90]:
df.head()

Unnamed: 0,created_at,id,full_text,geo,coordinates,place,retweet_count,favorite_count,US_state
0,2020-04-16 04:26:07+00:00,1250641596887990272,Finally got to a color I love and a length I’m...,"{'type': 'Point', 'coordinates': [44.15253213,...","{'type': 'Point', 'coordinates': [-72.5981195,...","{'id': '495a55057ac886b9', 'url': 'https://api...",0,0,VT
1,2020-04-16 04:46:25+00:00,1250646705516707840,#wutang #wutangforever #corona @ Downtown Los ...,"{'type': 'Point', 'coordinates': [34.03742524,...","{'type': 'Point', 'coordinates': [-118.2487404...","{'id': '3b77caf94bfc81fe', 'url': 'https://api...",0,0,CA
2,2020-04-16 04:47:43+00:00,1250647034253709315,"Swirling again @ Corona, California https://t....","{'type': 'Point', 'coordinates': [33.8753, -11...","{'type': 'Point', 'coordinates': [-117.566, 33...","{'id': '5e4b6834e36e68fa', 'url': 'https://api...",0,0,CA
3,2020-04-16 05:19:41+00:00,1250655078744240134,Does it feel like you want to #Crawl on the wa...,"{'type': 'Point', 'coordinates': [59.3307, 18....","{'type': 'Point', 'coordinates': [18.0605, 59....","{'id': 'd56c5babcffde8ef', 'url': 'https://api...",0,0,NA
4,2020-04-16 05:21:19+00:00,1250655491904147456,Get your stanky booty to Cave Creek &amp; Care...,"{'type': 'Point', 'coordinates': [33.8304, -11...","{'type': 'Point', 'coordinates': [-111.964, 33...","{'id': '005e9bd60c4f1337', 'url': 'https://api...",0,1,AZ


In [117]:
dates = []
months = []
days = []
hours = []

In [118]:
# format = "2020-04-16"

for timestamp in df["created_at"]:
    ts = pd.to_datetime(timestamp)  # parse once and reuse

    hours.append(ts.hour)
    months.append(ts.month)
    days.append(ts.day)

    # strftime zero-pads both month and day, e.g. "2020-04-05"
    dates.append(ts.strftime("%Y-%m-%d"))

In [119]:
df["date_of_tweet"] = dates
df["month_of_tweet"] = months
df["day_of_tweet"] = days
df["hour_of_tweet"] = hours
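
The loop above can also be written with pandas' `.dt` accessor, which works on the whole column at once. A minimal sketch on a toy frame (`demo` is a hypothetical name, and the second timestamp is made up); times stay in UTC, as in `created_at`:

```python
import pandas as pd

# Two timestamps in the same format as created_at.
demo = pd.DataFrame({
    "created_at": pd.to_datetime(["2020-04-16 04:26:07+00:00", "2020-04-05 23:10:00+00:00"])
})

demo["date_of_tweet"] = demo["created_at"].dt.strftime("%Y-%m-%d")  # zero-padded date string
demo["month_of_tweet"] = demo["created_at"].dt.month
demo["day_of_tweet"] = demo["created_at"].dt.day
demo["hour_of_tweet"] = demo["created_at"].dt.hour
print(demo)
```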


Let's parse the full text and check for counts of certain words as well. 

Let's do the following:

Cleaning steps:

1. Remove punctuation
2. Do string split
3. Remove links

Processing:
1. Make all the words lowercase
2. Remove stopwords
3. Stem/lemmatize (maybe?)

Then, for analysis,

1. Create a new column for all the hashtags (and add all the hashtags, per tweet)
2. Do word counts of specific words
3. Do LDA/topic modeling
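
For the word-count step, one option is `collections.Counter` over the tokenized tweets. A minimal sketch, assuming each tweet has already been reduced to a list of lowercase tokens; the token lists and the words of interest are made up:

```python
from collections import Counter

# Made-up tokenized tweets, one list of tokens per tweet.
tokenized_tweets = [
    ["stay", "home", "#quarantine", "mask"],
    ["mask", "mandate", "florida"],
    ["beach", "open", "florida", "mask"],
]
words_of_interest = ["mask", "florida", "lockdown"]

# Count every token across all tweets, then look up the words we care about.
counts = Counter(token for tokens in tokenized_tweets for token in tokens)
for word in words_of_interest:
    print(word, counts[word])  # Counter returns 0 for words it never saw
```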

In [134]:
PUNCTUATION = r'''!()-[]{};:'"\,<>./?@$%^&*_~''' # keep hashtags; raw string so the backslash is taken literally
STOPWORDS = stopwords.words("english")

In [140]:
def remove_emoji(text):
    """
    Drops every whitespace-separated token that contains an emoji
    """
    # NOTE: emoji.UNICODE_EMOJI exists in older releases of the emoji package;
    # newer releases (2.0+) expose emoji.EMOJI_DATA instead
    emoji_list = [c for c in text if c in emoji.UNICODE_EMOJI]
    cleaned = ' '.join([token for token in text.split() if not any(e in token for e in emoji_list)])
    return cleaned

In [230]:
def clean_text(text):
    """
        Removes punctuation, does string split, and removes links
    """
    
    return_arr = []
    
    # remove punctuation
    text_no_punctuation = ""
    
    for char in text:
        if char not in PUNCTUATION:
            text_no_punctuation = text_no_punctuation + char
            
    # remove emojis
    text_no_punctuation = remove_emoji(text_no_punctuation)
    text_no_punctuation = re.sub(r'\\U[a-zA-Z0-9]{8}', '', text_no_punctuation)
    
    # remove \n and \t
    text_no_punctuation = re.sub(r'\n', '', text_no_punctuation)
    text_no_punctuation = re.sub(r'\t', '', text_no_punctuation)
    
    # remove escape sequences
    text_no_escape = ""
    
    for char in text_no_punctuation:
        try:
            char.encode('ascii') # raises for chars that don't have an ascii equivalent (e.g., emojis)
            text_no_escape = text_no_escape + char
        except UnicodeEncodeError:
            pass # drop the non-ascii char
    
    # add space between # and another char before it (e.g., split yes#baseball into yes #baseball)
    text_no_escape = re.sub(r"([a-zA-Z0-9]){1}#", r"\1 #", text_no_escape)
    
    # other preprocessing
    text_arr = text_no_escape.split(' ')
    
    for word in text_arr:
        
        # clean words
        word = word.lower()
        
        if "http" not in word and word.strip() != '' and word not in STOPWORDS:
            return_arr.append(word)
            
    return return_arr
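
The hashtag-splitting regex is the least obvious step in `clean_text`, so here it is in isolation. A minimal sketch; `split_glued_hashtags` is a hypothetical helper name, and the pattern is the one used above minus the `{1}` quantifier, which has no effect:

```python
import re


def split_glued_hashtags(text):
    # Insert a space before any '#' that directly follows a letter or digit.
    return re.sub(r"([a-zA-Z0-9])#", r"\1 #", text)


print(split_glued_hashtags("yes#baseball"))
print(split_glued_hashtags("#covid#corona stay home"))
```

A `#` at the start of the text or after a space is left alone, so hashtags that are already separated are not changed.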


In [231]:
df["cleaned_text"] = df["full_text"].apply(clean_text)



Now, let's deal with hashtags. Let's create a new column that contains all the hashtags, a column that counts how many hashtags there are, and a third column that has the text array (tokenized text) without hashtags

In [239]:
hashtags_arr = []
num_hashtags_arr = []
text_no_hashtags_arr = []

In [240]:
for tokenized_text in df["cleaned_text"]:
    hashtag_lst = []
    text_no_hashtags_lst = []
    
    for word in tokenized_text:
        if '#' in word:
            hashtag_lst.append(word)
        else:
            text_no_hashtags_lst.append(word)
    
    hashtags_arr.append(hashtag_lst)
    num_hashtags_arr.append(len(hashtag_lst))
    text_no_hashtags_arr.append(text_no_hashtags_lst)

In [241]:
df["hashtags"] = hashtags_arr
df["hashtags_count"] = num_hashtags_arr
df["cleaned_text_no_hashtags"] = text_no_hashtags_arr
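
The same split can be packaged as a small function, which makes it easier to reuse for the other dates. A minimal sketch; `split_hashtags` is a hypothetical name, and it keeps the `'#' in word` test used in the loop above:

```python
def split_hashtags(tokens):
    """Split a token list into (hashtags, other tokens), preserving order."""
    hashtags = [token for token in tokens if "#" in token]
    rest = [token for token in tokens if "#" not in token]
    return hashtags, rest


tokens = ["#wutang", "#wutangforever", "#corona", "downtown", "los", "angeles"]
hashtags, rest = split_hashtags(tokens)
print(hashtags, len(hashtags), rest)
```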



In [243]:
df[["cleaned_text", "hashtags", "hashtags_count", "cleaned_text_no_hashtags"]]

Unnamed: 0,cleaned_text,hashtags,hashtags_count,cleaned_text_no_hashtags
0,"[finally, got, color, love, length, im, okay, ...","[#quarantine, #covid, #corona, #haircolor, #ha...",11,"[finally, got, color, love, length, im, okay, ..."
1,"[#wutang, #wutangforever, #corona, downtown, l...","[#wutang, #wutangforever, #corona]",3,"[downtown, los, angeles]"
2,"[swirling, corona, california]",[],0,"[swirling, corona, california]"
3,"[feel, like, want, #crawl, walls, get, heres, ...","[#crawl, #quarantine, #staygolden, #staysafe, ...",9,"[feel, like, want, walls, get, heres, dont, dr..."
4,"[get, stanky, booty, cave, creek, amp, carefre...","[#walmart, #cavecreek, #toiletpaper, #quaranti...",6,"[get, stanky, booty, cave, creek, amp, carefre..."
...,...,...,...,...
793,"[whats, say, cap, #swipeleft, #swipeleft, #run...","[#swipeleft, #swipeleft, #runnersofinstagram, ...",14,"[whats, say, cap]"
794,"[#musicphillpromotions, #never, leave, home, w...","[#musicphillpromotions, #never, #mask, #protec...",8,"[leave, home, without, virus, jamaica, kingsto..."
795,"[home, workout, hamstringy, like, gym, still, ...",[],0,"[home, workout, hamstringy, like, gym, still, ..."
796,"[thank, much, #lukebryan, playing, dock, coron...","[#lukebryan, #countrymusic, #country, #music, ...",13,"[thank, much, playing, dock, corona, virus]"


Now, let's get the columns that we'll actually use:

In [248]:
df_small = df[["id", "full_text", "retweet_count", "favorite_count", 
               "US_state", "date_of_tweet", "month_of_tweet", "day_of_tweet", 
               "hour_of_tweet", "cleaned_text", "hashtags", "hashtags_count", "cleaned_text_no_hashtags"]]

In [249]:
df_small.head()

Unnamed: 0,id,full_text,retweet_count,favorite_count,US_state,date_of_tweet,month_of_tweet,day_of_tweet,hour_of_tweet,cleaned_text,hashtags,hashtags_count,cleaned_text_no_hashtags
0,1250641596887990272,Finally got to a color I love and a length I’m...,0,0,VT,2020-04-16,4,16,4,"[finally, got, color, love, length, im, okay, ...","[#quarantine, #covid, #corona, #haircolor, #ha...",11,"[finally, got, color, love, length, im, okay, ..."
1,1250646705516707840,#wutang #wutangforever #corona @ Downtown Los ...,0,0,CA,2020-04-16,4,16,4,"[#wutang, #wutangforever, #corona, downtown, l...","[#wutang, #wutangforever, #corona]",3,"[downtown, los, angeles]"
2,1250647034253709315,"Swirling again @ Corona, California https://t....",0,0,CA,2020-04-16,4,16,4,"[swirling, corona, california]",[],0,"[swirling, corona, california]"
3,1250655078744240134,Does it feel like you want to #Crawl on the wa...,0,0,NA,2020-04-16,4,16,5,"[feel, like, want, #crawl, walls, get, heres, ...","[#crawl, #quarantine, #staygolden, #staysafe, ...",9,"[feel, like, want, walls, get, heres, dont, dr..."
4,1250655491904147456,Get your stanky booty to Cave Creek &amp; Care...,0,1,AZ,2020-04-16,4,16,5,"[get, stanky, booty, cave, creek, amp, carefre...","[#walmart, #cavecreek, #toiletpaper, #quaranti...",6,"[get, stanky, booty, cave, creek, amp, carefre..."


Now that we have these steps finalized, let's do this for the other tweets in the dates of interest.

### 3. Perform same steps as above, for other tweets

### 4. Export tweets, with locations
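
One way to export the processed tweets is JSON Lines, which keeps list-valued columns such as `hashtags` as real lists when the file is read back (a plain CSV would turn them into strings). A minimal sketch on a toy frame; the output path is hypothetical, not the project's actual location:

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for df_small, with made-up rows.
toy = pd.DataFrame({
    "id": [1, 2],
    "US_state": ["FL", "CA"],
    "hashtags": [["#corona"], []],
})

# Hypothetical output location.
out_path = os.path.join(tempfile.mkdtemp(), "tweets_with_locations.jsonl")
toy.to_json(out_path, orient="records", lines=True)  # one JSON object per row

restored = pd.read_json(out_path, lines=True)
print(restored)
```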