### Purpose of this file

In this file, I will hydrate tweets from this resource: https://ieee-dataport.org/open-access/coronavirus-covid-19-geo-tagged-tweets-dataset

This resource contains the subset of COVID-19 tweets, collected daily since March 20th, 2020, that carry geolocation data.

The code in "get_location_from_geocoordinates.py" shows how to turn latitude/longitude information into a person's location.

For this first pass, we'll use the following dates:

1. March 9th: Governor DeSantis declares a State of Emergency. (NOTE: not used, since this dataset has no data for this date.)
2. April 17th: DeSantis issues a statewide stay-at-home order following growing pressure to do so
3. May 18th: Florida begins full Phase 1 of reopening, as announced by DeSantis, allowing gyms and restaurants to operate at 50% capacity.
4. June 5th: DeSantis announces that Florida could move into Phase 2 except south Florida, specifically Miami-Dade, Broward, and Palm Beach, which need to submit plans for reopening. Phase 2 in Florida begins, with bars allowed to open at 50% capacity with social distancing and sanitation.
5. July 2nd: Florida reports 10,000 new coronavirus cases in a single day, the biggest one-day increase in the state since the pandemic started, and more than any European country had at the height of their outbreaks.
6. September 25th: Governor Ron DeSantis fully reopens the state of Florida by executive order. The order also prohibits local governments from imposing fines, shutting down businesses, or enforcing mask mandates.
7. October 17th: Florida reported its highest COVID-19 numbers in two months. The seven-day average was more than 3,300 cases. Reporting anomalies made it more difficult to gather statistical trends. The positivity rate was 5.2%, with over 2,000 hospitalizations.
8. December 17th: Florida reported 13,148 new cases, the largest daily count since July 16th.

All these dates correspond with important COVID-related events in Florida. I chose Florida since it's had a large range of different COVID-related events (e.g., openings, closings, shutdowns, etc.), rather than some other states that, say, had an initial lockdown and stayed in lockdown. 



In [17]:
import numpy as np
import pandas as pd
import os

### 1. Load tweets

Due to sharing restrictions, the public dataset contains tweet IDs rather than the tweets themselves. We therefore "hydrate" the tweet IDs to recover the actual tweets.

(Note: downloading the ID files requires an IEEE account, and the signed download links used below expire, so they might not work in the future. The files remain easy to get from the website link above.)

In [9]:
# load tweet IDs and sentiment scores for April 16th to April 17th
april16_17 = pd.read_csv("https://ieee-dataport.s3.amazonaws.com/open/14206/april16_april17.csv?response-content-disposition=attachment%3B%20filename%3D%22april16_april17.csv%22&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJOHYI4KJCE6Q7MIQ%2F20201217%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20201217T223856Z&X-Amz-SignedHeaders=Host&X-Amz-Expires=86400&X-Amz-Signature=c4fbf41e249dc1fb9f5f7be962a2ca05d5210d0a9a81293820f49c91efed3826", 
                         names = ["tweet_id", "sentiment_score"])

# load tweet IDs and sentiment scores for April 17th to April 18th
april17_18 = pd.read_csv("https://ieee-dataport.s3.amazonaws.com/open/14206/april17_april18.csv?response-content-disposition=attachment%3B%20filename%3D%22april17_april18.csv%22&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJOHYI4KJCE6Q7MIQ%2F20201217%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20201217T223856Z&X-Amz-SignedHeaders=Host&X-Amz-Expires=86400&X-Amz-Signature=db65792c9a01221fd2e70f90dfa5dcbc2b5fd9649311e7ee6b11e810c69c0c60", 
                          names = ["tweet_id", "sentiment_score"])


In [10]:
april16_17.head()

Unnamed: 0,tweet_id,sentiment_score
0,1250641596887990272,0.125
1,1250646705516707840,0.0
2,1250647034253709315,0.0
3,1250655078744240134,0.170455
4,1250655491904147456,0.0


In [11]:
april17 = pd.concat([april16_17, april17_18])

In [12]:
april17

Unnamed: 0,tweet_id,sentiment_score
0,1250641596887990272,0.125000
1,1250646705516707840,0.000000
2,1250647034253709315,0.000000
3,1250655078744240134,0.170455
4,1250655491904147456,0.000000
...,...,...
368,1251359682712616961,0.000000
369,1251360432079634434,0.000000
370,1251364103618248705,-0.050000
371,1251366441015803912,-0.004444


In [15]:
april17.drop_duplicates(inplace=True)
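
As a minimal sketch of the combine-and-deduplicate step, here is the same idea on toy data (the IDs and scores below are made up). It makes two assumptions that differ from the cells above: `ignore_index=True` is used so the combined frame gets a fresh 0..n-1 index instead of repeating each file's index, and duplicates are detected on `tweet_id` alone rather than on the full row.

```python
import pandas as pd

# Toy stand-ins for the two daily files; tweet 103 appears in both.
day_a = pd.DataFrame({"tweet_id": [101, 102, 103],
                      "sentiment_score": [0.1, 0.0, -0.2]})
day_b = pd.DataFrame({"tweet_id": [103, 104],
                      "sentiment_score": [-0.2, 0.5]})

# Stack the frames and renumber the rows from 0.
combined = pd.concat([day_a, day_b], ignore_index=True)

# Keep the first occurrence of each tweet ID, then renumber again.
deduped = combined.drop_duplicates(subset=["tweet_id"]).reset_index(drop=True)
```

Deduplicating on `tweet_id` only is a design choice: it also removes a repeated ID whose sentiment score differs between files, which full-row matching would keep.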

We can then export these tweet IDs to a .csv file and use twarc, a command-line Python tool, to get the tweets that we need.

In [18]:
tweet_ids = list(april17["tweet_id"])

In [25]:
TWEET_ID_DIR = "../../data/tweets/tweet_ids/"

In [26]:
# "w" overwrites the file, so rerunning this cell does not duplicate IDs
with open(TWEET_ID_DIR + "april17_tweets.csv", "w") as f:
    for idx, tweet in enumerate(tweet_ids):
        # every ID except the last is followed by a comma and a newline
        if idx != len(tweet_ids) - 1:
            f.write(f"{tweet},\n")
        else:
            f.write(f"{tweet}")
    

Now let's hydrate these tweet IDs to recover the original tweets.
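
One possible way to do the hydration from Python is sketched below. This is not necessarily how the step was run in this project. The helper names `read_tweet_ids` and `hydrate_to_jsonl` are hypothetical. The `client` argument is assumed to be a twarc (version 1) `Twarc` object, whose `hydrate` method takes an iterable of tweet IDs and yields tweet dictionaries; creating it requires Twitter API credentials, which are not shown here. The reader strips the trailing commas written by the export cell above.

```python
import json


def read_tweet_ids(path):
    """Yield tweet IDs from a file with one ID per line.

    Trailing commas and blank lines are dropped, so the file written by the
    export cell can be read as-is.
    """
    with open(path) as f:
        for line in f:
            tweet_id = line.strip().rstrip(",")
            if tweet_id:
                yield tweet_id


def hydrate_to_jsonl(client, id_path, out_path):
    """Hydrate every ID in id_path and write one JSON tweet per line.

    `client` is any object with a hydrate(iterable_of_ids) method that yields
    dictionaries (assumed here to be a twarc v1 Twarc instance).
    Returns the number of tweets written.
    """
    count = 0
    with open(out_path, "w") as out:
        for tweet in client.hydrate(read_tweet_ids(id_path)):
            out.write(json.dumps(tweet) + "\n")
            count += 1
    return count


# Assumed usage (needs the twarc package and your own credentials):
#   from twarc import Twarc
#   client = Twarc(consumer_key, consumer_secret, access_token, access_token_secret)
#   hydrate_to_jsonl(client, TWEET_ID_DIR + "april17_tweets.csv", "april17_tweets.jsonl")
```

Deleted or protected tweets cannot be hydrated, so the count returned can be lower than the number of IDs in the file.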


### 2. Get locations from tweets
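
A minimal sketch of pulling latitude/longitude out of a hydrated tweet, assuming the tweets are Twitter API v1.1 tweet dictionaries. In that format, `coordinates` holds a GeoJSON point in `[longitude, latitude]` order, and `place.bounding_box.coordinates` holds a polygon of `[longitude, latitude]` corners. The function name `extract_lat_lon` is hypothetical, and falling back to the center of the place bounding box is an assumption made for illustration, not the method in "get_location_from_geocoordinates.py".

```python
def extract_lat_lon(tweet):
    """Return (lat, lon) for a hydrated tweet dict, or None if it has no geodata."""
    # Exact point, when the user shared precise coordinates.
    point = tweet.get("coordinates")
    if point and point.get("coordinates"):
        lon, lat = point["coordinates"]  # GeoJSON order: longitude first
        return (lat, lon)

    # Otherwise approximate with the center of the tagged place's bounding box.
    place = tweet.get("place")
    if place and place.get("bounding_box"):
        corners = place["bounding_box"]["coordinates"][0]
        lons = [corner[0] for corner in corners]
        lats = [corner[1] for corner in corners]
        return (sum(lats) / len(lats), sum(lons) / len(lons))

    return None
```

The resulting pair can then be passed to whatever reverse-geocoding step maps coordinates to a state or city.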

### 3. Export tweets, with locations