In [None]:
# This code was used to scrape historic tweets with GetOldTweets3. We later had to go back and use Tweepy after finding
# that we couldn't identify coordinates for an intersection involving a highway.

## Imports ##

In [4]:
import pandas as pd
import GetOldTweets3 as got

## Function for entering the user's search criteria ##

Search Criteria
- query - string to search for (ex: Woosley Fire)
- coordinate_string - latitude and longitude as a single string (ex: 34.0259, -118.7798); we'll need another function to convert the user's input into this format
- distance - search radius in miles (ex: 40)
- datefrom - date to start the search from, in YYYY-MM-DD format (ex: 2018-11-12)
- dateto - date to search up to, exclusive (ex: enter 2018-11-20 to get all tweets posted before 11/20/18)
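Because `dateto` is exclusive, it is easy to accidentally leave out the last day you meant to include. A minimal sketch of a hypothetical helper (`until_date` is not part of the original notebook or of GetOldTweets3) that turns an inclusive last day into the exclusive `dateto` value:

```python
from datetime import date, timedelta


def until_date(last_day: str) -> str:
    """Hypothetical helper: convert an inclusive last day ('YYYY-MM-DD') into
    the exclusive dateto value that TweetSearch expects (the following day)."""
    # date.fromisoformat raises ValueError on malformed input such as '2018-13-01'
    day = date.fromisoformat(last_day)
    return (day + timedelta(days=1)).isoformat()


# To include every tweet posted on 11/19/18, pass the following day as dateto
print(until_date('2018-11-19'))  # 2018-11-20
print(until_date('2018-11-30'))  # 2018-12-01
```

Parsing the date also catches typos early, before any request is sent.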

In [86]:
def TweetSearch(query, coordinate_string, distance, datefrom, dateto):
    """Search historic tweets with GetOldTweets3.

    Returns a DataFrame with one row per tweet matching `query`, posted within
    `distance` miles of `coordinate_string` ('lat, lon') from `datefrom`
    (YYYY-MM-DD) up to, but not including, `dateto`.
    """
    # GetOldTweets3 expects the radius as a string with units, e.g. '40mi'
    radius = str(distance) + 'mi'

    tweetCriteria = (got.manager.TweetCriteria()
                     .setQuerySearch(query)
                     .setNear(coordinate_string)
                     .setWithin(radius)
                     .setSince(datefrom)
                     .setUntil(dateto))
    tweets = got.manager.TweetManager.getTweets(tweetCriteria)

    # Keep the same four fields as before, one row per tweet
    rows = [[tweet.text, tweet.username, tweet.formatted_date, tweet.hashtags]
            for tweet in tweets]
    return pd.DataFrame(rows, columns=['Full Text', 'Author', 'Creation Time', 'Hashtags'])
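To check how the results table is assembled without calling Twitter, the row-building step can be exercised on stand-in objects. This is a sketch: `tweets_to_frame` and the sample tweets are hypothetical and made up for illustration; it only assumes each tweet exposes the `text`, `username`, `formatted_date` and `hashtags` attributes used in `TweetSearch`.

```python
from types import SimpleNamespace

import pandas as pd

COLUMNS = ['Full Text', 'Author', 'Creation Time', 'Hashtags']


def tweets_to_frame(tweets):
    """Build the four-column table TweetSearch returns from any iterable of
    objects with text/username/formatted_date/hashtags attributes."""
    rows = [[t.text, t.username, t.formatted_date, t.hashtags] for t in tweets]
    return pd.DataFrame(rows, columns=COLUMNS)


# Made-up stand-ins mimicking the attributes of GetOldTweets3 tweet objects
fake_tweets = [
    SimpleNamespace(text='Smoke over the hills', username='user_a',
                    formatted_date='Mon Nov 12 22:27:08 +0000 2018', hashtags=''),
    SimpleNamespace(text='Road closed', username='user_b',
                    formatted_date='Mon Nov 12 20:14:31 +0000 2018',
                    hashtags='#woolseyfire'),
]
df = tweets_to_frame(fake_tweets)
print(df.shape)  # (2, 4)
```

Passing `columns=` explicitly means an empty search still returns a (0, 4) frame with the expected headers.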

In [107]:
# Example call: 'Woosley Fire' tweets within 40 miles of (34.0259, -118.7798), from 2018-11-10 up to, but not including, 2018-11-13
dftest = TweetSearch('Woosley Fire','34.0259, -118.7798',40,'2018-11-10','2018-11-13')

In [109]:
dftest.head()

|   | Full Text | Author | Creation Time | Hashtags |
|---|---|---|---|---|
| 0 | The sky and ocean reflect the fury of the Wool... | HomiHormasji | Mon Nov 12 22:27:08 +0000 2018 | |
| 1 | THANK YOU to all the tireless uniformed person... | ladyfaceale | Mon Nov 12 22:20:23 +0000 2018 | #veterans #woosleyfire |
| 2 | ANGELINO CROSSFIT - WOOLSEY FIRE DONATIONS. TO... | angelino_xfit | Mon Nov 12 22:02:20 +0000 2018 | #angelinocrossfitfamily #donations #woolseyfir... |
| 3 | Woolsey Fire from Santa Monica. Over 95,000 ac... | MrHerget | Mon Nov 12 21:16:49 +0000 2018 | |
| 4 | Topanga canyon north and south all lanes close... | TotalTrafficLA | Mon Nov 12 20:14:31 +0000 2018 | #MalibuPacificPalisades |


In [89]:
dftest.shape

(118, 4)

## Using the above function to get all Woolsey Fire tweets ##

In [92]:
dfWoosleyAll = TweetSearch('Woosley Fire','34.0259, -118.7798',40,'2018-11-08','2018-12-01')

In [104]:
dfWoosleyAll.shape

(257, 4)

In [94]:
dfWoosley = TweetSearch('Woosley','34.0259, -118.7798',40,'2018-11-08','2018-12-01')

In [98]:
dfFire = TweetSearch('fire','34.0259, -118.7798',40,'2018-11-08','2018-12-01')

In [96]:
dfWoosley.shape

(214, 4)

In [99]:
dfFire.shape

(1909, 4)

In [105]:
dfWoosleyFull = pd.concat([dfWoosleyAll,dfWoosley,dfFire],ignore_index=True)

In [106]:
dfWoosleyFull.shape

(2380, 4)

In [111]:
dfWoosleyFull.drop_duplicates(subset='Full Text',inplace=True)

In [112]:
dfWoosleyFull.shape

(1964, 4)
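The three queries overlap (257 + 214 + 1,909 = 2,380 rows before deduplication), so the same tweet can appear more than once. A toy sketch with made-up frames showing how `drop_duplicates(subset='Full Text')` behaves. Keying on text alone also collapses identical text posted by different authors, whereas adding `'Author'` to the subset keeps those rows.

```python
import pandas as pd

# Made-up frames standing in for two overlapping query results
a = pd.DataFrame({'Full Text': ['t1', 't2'], 'Author': ['x', 'y']})
b = pd.DataFrame({'Full Text': ['t2', 't3', 't1'], 'Author': ['y', 'z', 'w']})

combined = pd.concat([a, b], ignore_index=True)
print(len(combined))  # 5

# Keeps the first row for each distinct text: t1, t2, t3
by_text = combined.drop_duplicates(subset='Full Text')
print(len(by_text))  # 3

# 't1' posted by 'w' now survives as a separate row
by_text_and_author = combined.drop_duplicates(subset=['Full Text', 'Author'])
print(len(by_text_and_author))  # 4
```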

In [113]:
dfWoosleyFull.to_csv('FullWoosleyTweets.csv',index=False)
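The saved CSV keeps `Creation Time` as text in Twitter's format (e.g. `Mon Nov 12 22:27:08 +0000 2018`), so it needs parsing before any time-based analysis. A minimal sketch, assuming every value matches the format in the preview rows above; `parse_creation_time` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# strptime-style format for strings like 'Mon Nov 12 22:27:08 +0000 2018'
TWITTER_TIME_FORMAT = '%a %b %d %H:%M:%S %z %Y'


def parse_creation_time(df):
    """Return a copy of df with 'Creation Time' converted to UTC timestamps."""
    out = df.copy()
    out['Creation Time'] = pd.to_datetime(out['Creation Time'],
                                          format=TWITTER_TIME_FORMAT, utc=True)
    return out


# Usage after reloading the saved file:
# tweets = parse_creation_time(pd.read_csv('FullWoosleyTweets.csv'))
sample = pd.DataFrame({'Creation Time': ['Mon Nov 12 22:27:08 +0000 2018']})
print(parse_creation_time(sample)['Creation Time'].iloc[0])
```

`%a` and `%b` expect English day and month abbreviations, which is what these strings contain.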

## Notes / Reasoning ##

We explored using the official Twitter API via the Tweepy library but ended up not using it, for two reasons:
1) The official (standard) search API only lets you search roughly the past 7 days. Since the disaster we're training our model on is the Woolsey Fire (November 2018), we had to use the GetOldTweets3 library to find and pull in those tweets.
2) We couldn't find a way to filter by location in the official API. Because our search terms are generic (e.g. 'fire'), we needed to be able to filter our tweets by location.

Note
1) We input a latitude/longitude pair because badly formatted location strings break the function.
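Since badly formatted location strings break the function, one option is to validate the user's input before building `coordinate_string`. A minimal sketch: `make_coordinate_string` is hypothetical (the notebook mentions needing a conversion function but doesn't define one), and it only checks that the values are numeric and in range. It does not geocode place names or intersections.

```python
def make_coordinate_string(lat, lon):
    """Hypothetical helper: validate a latitude/longitude pair and format it
    as the 'lat, lon' string passed to TweetSearch as coordinate_string."""
    lat, lon = float(lat), float(lon)  # raises ValueError for non-numeric input
    # NaN fails both comparisons below, so it is rejected too
    if not -90 <= lat <= 90:
        raise ValueError(f'latitude out of range: {lat}')
    if not -180 <= lon <= 180:
        raise ValueError(f'longitude out of range: {lon}')
    return f'{lat}, {lon}'


print(make_coordinate_string('34.0259', '-118.7798'))  # 34.0259, -118.7798
```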