# Project 5: Leveraging Social Media to Map Disasters
## Data Collection

In [1]:
# Import libraries
from twitterscraper import query_tweets
import pandas as pd
import datetime as dt

Each team member was responsible for gathering tweets about a different natural disaster. The natural disasters included in the data collection were:
- Hurricane Harvey 
- Major floods during July 2019
- Montecito mudslides of January 2018
- Nor'easter of March 2018
- Tornado outbreak during April 2019

Each of these incidents represents a different type of natural disaster. This range was chosen deliberately so that the model is trained on many kinds of emergencies and can be applied in many situations, not just to a single type of natural disaster. Each incident was also chosen because it was recent and because Twitter was used heavily while it unfolded; Hurricane Harvey even had a dedicated hashtag for people to use when tweeting about it. Because Twitter use has been growing over time, we chose fairly recent disasters to capture this upward trend and collect as many tweets as possible.

### Hurricane Harvey Tweets

This search uses the hashtag #hurricaneharvey. The date range covers the two days when the hurricane was most active in the Houston area.

In [None]:
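# Search window: the two days of peak hurricane activity around Houston
begin_date = dt.date(2017, 8, 25)
end_date = dt.date(2017, 8, 26)

limit = 3000   # maximum number of tweets to retrieve
lang = "en"    # twitterscraper takes a language code, not a language name

tweets = query_tweets("#hurricaneharvey",
                      begindate=begin_date,
                      enddate=end_date,
                      limit=limit,
                      lang=lang)

# Each Tweet object stores its fields as attributes; turn them into DataFrame rows
hh = pd.DataFrame([t.__dict__ for t in tweets])
hh.to_csv('hurricane_harvey_general.csv')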

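Run the cell above four times to collect enough data for the period of Hurricane Harvey, shifting the date range forward by two days on each run. Because the cell writes to the same file every time, keep each run's DataFrame (or save it under a new file name), then merge the DataFrames into one and export the result as a CSV file.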
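The repeated runs described above can also be scripted. The sketch below assumes that "increase the date range by 2" means moving both dates two days later on each run, and it takes a `fetch` callable instead of calling `query_tweets` directly, so it can run without network access. `shifted_windows`, `collect_and_merge` and `fetch` are hypothetical names, and dropping duplicates assumes the merged frame has a `tweet_id` column.

```python
import datetime as dt

import pandas as pd


def shifted_windows(begin, end, runs, step_days=2):
    """Return (begin, end) pairs, each one step_days later than the previous pair."""
    step = dt.timedelta(days=step_days)
    return [(begin + i * step, end + i * step) for i in range(runs)]


def collect_and_merge(fetch, query, begin, end, runs=4, step_days=2):
    """Call fetch once per window and merge all results into one DataFrame.

    fetch is any callable (query, begindate, enddate) -> list of dicts,
    for example a thin (hypothetical) wrapper around query_tweets.
    """
    frames = []
    for window_begin, window_end in shifted_windows(begin, end, runs, step_days):
        frames.append(pd.DataFrame(fetch(query, window_begin, window_end)))
    merged = pd.concat(frames, ignore_index=True)
    # Tweets returned by more than one run are kept once (assumes a 'tweet_id' column)
    if "tweet_id" in merged.columns:
        merged = merged.drop_duplicates(subset="tweet_id")
    return merged.reset_index(drop=True)
```

In the notebook, `fetch` could be a small function that calls `query_tweets(query, begindate=..., enddate=..., limit=limit, lang=lang)` and returns `[t.__dict__ for t in tweets]`; the merged frame can then be written with `to_csv` as before.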

### Flood Tweets

This search uses the keywords flood, flooding, or floods. The date range covers the two days with the heaviest rainfall and the resulting floods.
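The flood keywords are combined into a single search string joined by `OR`. A small helper such as the hypothetical `build_or_query` below makes that step explicit and keeps query strings consistent across searches. Wrapping multi-word terms in double quotes follows Twitter's exact-phrase search syntax; none of the queries in this notebook need it, so that branch is an illustrative extra.

```python
def build_or_query(keywords):
    """Combine search terms into one query string joined by OR.

    Multi-word terms are wrapped in double quotes so they are searched as exact phrases.
    """
    terms = []
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue  # skip empty entries
        terms.append(f'"{keyword}"' if " " in keyword else keyword)
    if not terms:
        raise ValueError("at least one keyword is required")
    return " OR ".join(terms)


print(build_or_query(["flood", "flooding", "floods"]))  # flood OR flooding OR floods
```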

In [None]:
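# Search window: the two days with the heaviest rainfall and flooding
begin_date = dt.date(2019, 7, 21)
end_date = dt.date(2019, 7, 22)

limit = 3000   # maximum number of tweets to retrieve
lang = "en"    # twitterscraper takes a language code, not a language name

tweets = query_tweets("flood OR flooding OR floods",
                      begindate=begin_date,
                      enddate=end_date,
                      limit=limit,
                      lang=lang)
df = pd.DataFrame([t.__dict__ for t in tweets])
df.to_csv("flood_072119-072219.csv")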

### Tornado Tweets

This search uses the keywords tornado or #tornado. The date range targets the three days with the most tornado activity in the southern US.
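Since the date range is meant to capture the days of peak tornado activity, one way to check that the collected tweets actually cluster in that window is to count tweets per day in the exported CSV. This sketch assumes a `timestamp` column in `YYYY-MM-DD HH:MM:SS` form, which is how a datetime column is typically written by `to_csv`; the column name and format should be checked against the actual file. `tweets_per_day` and `tweets_per_day_from_csv` are hypothetical helpers.

```python
import csv
from collections import Counter
from datetime import datetime


def tweets_per_day(rows, column="timestamp"):
    """Count tweets per calendar day from rows holding an ISO-format timestamp string."""
    counts = Counter()
    for row in rows:
        counts[datetime.fromisoformat(row[column]).date()] += 1
    return dict(sorted(counts.items()))


def tweets_per_day_from_csv(path, column="timestamp"):
    """Read an exported tweet CSV and count its rows per day."""
    with open(path, newline="", encoding="utf-8") as f:
        return tweets_per_day(csv.DictReader(f), column)
```

For example, `tweets_per_day_from_csv("tornados_41319-41519.csv")` would show whether the tweets fall on the intended days or are concentrated on fewer of them.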

In [None]:
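# Search window: the days of peak tornado activity in the southern US
begin_date = dt.date(2019, 4, 13)
end_date = dt.date(2019, 4, 15)

limit = 3000   # maximum number of tweets to retrieve
lang = "en"    # twitterscraper takes a language code, not a language name

tweets = query_tweets("tornado OR #tornado",
                      begindate=begin_date,
                      enddate=end_date,
                      limit=limit,
                      lang=lang)
df = pd.DataFrame([t.__dict__ for t in tweets])
df.to_csv('tornados_41319-41519.csv')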

### Nor'easter Tweets

This search uses the keywords noreaster or #noreaster. The date range covers five days: the period of heavy storm activity and the days of disruption that followed.
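The storm name is often spelled "Nor'easter", and not every returned tweet is guaranteed to mention the storm in its text. An optional post-filter, sketched below with hypothetical helpers `normalize` and `mentions_keyword`, keeps only tweets whose text contains a keyword as a whole word, ignoring case, the `#` sign, and apostrophes (straight or curly). It assumes the exported data has a `text` column and is an extra cleaning step, not part of the original collection.

```python
import re


def normalize(text):
    """Lowercase and drop apostrophes so "Nor'easter" and "noreaster" compare equal."""
    return re.sub("['\u2019]", "", text.lower())


def mentions_keyword(text, keywords):
    """Return True if any keyword (leading '#' ignored) appears as a whole word in text."""
    normalized = normalize(text)
    for keyword in keywords:
        word = normalize(keyword).lstrip("#")
        if re.search(r"\b" + re.escape(word) + r"\b", normalized):
            return True
    return False
```

With pandas, the filter could be applied as `df = df[df["text"].astype(str).apply(lambda s: mentions_keyword(s, ["noreaster", "#noreaster"]))]`.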

In [None]:
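# Search window: the storm and the days of disruption that followed
begin_date = dt.date(2018, 3, 1)
end_date = dt.date(2018, 3, 5)

limit = 3000   # maximum number of tweets to retrieve
lang = "en"    # twitterscraper takes a language code, not a language name

tweets = query_tweets("noreaster OR #noreaster",
                      begindate=begin_date,
                      enddate=end_date,
                      limit=limit,
                      lang=lang)
df = pd.DataFrame([t.__dict__ for t in tweets])
df.to_csv('noreaster_030118-030518.csv')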

### Mudslide Tweets

This search uses the keywords mudslide, mudslides, or montecito. The date range starts the day before the mudslides began and extends through the days after the main mudslide activity.

In [None]:
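# Search window: the day before the mudslides through the days after the main activity
begin_date = dt.date(2018, 1, 8)
end_date = dt.date(2018, 1, 12)

limit = 3000   # maximum number of tweets to retrieve
lang = "en"    # twitterscraper takes a language code, not a language name

tweets = query_tweets("montecito OR mudslide OR mudslides",
                      begindate=begin_date,
                      enddate=end_date,
                      limit=limit,
                      lang=lang)
# Build the DataFrame from this search before saving it
df = pd.DataFrame([t.__dict__ for t in tweets])
df.to_csv('mudslides.csv')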
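The five search cells differ only in their query, date range and output file. A config-driven sketch like the one below, with hypothetical names `SEARCHES`, `run_searches`, `fetch` and `save`, runs all of them with one shared set of settings, which reduces the chance of copy-paste inconsistencies between cells. `fetch` and `save` are passed in so the loop can be tested without network access; the `lang="en"` default assumes twitterscraper's language-code convention.

```python
import datetime as dt

# Query, date range and output file for each search in this notebook
SEARCHES = [
    {"query": "#hurricaneharvey", "begin": dt.date(2017, 8, 25),
     "end": dt.date(2017, 8, 26), "outfile": "hurricane_harvey_general.csv"},
    {"query": "flood OR flooding OR floods", "begin": dt.date(2019, 7, 21),
     "end": dt.date(2019, 7, 22), "outfile": "flood_072119-072219.csv"},
    {"query": "tornado OR #tornado", "begin": dt.date(2019, 4, 13),
     "end": dt.date(2019, 4, 15), "outfile": "tornados_41319-41519.csv"},
    {"query": "noreaster OR #noreaster", "begin": dt.date(2018, 3, 1),
     "end": dt.date(2018, 3, 5), "outfile": "noreaster_030118-030518.csv"},
    {"query": "montecito OR mudslide OR mudslides", "begin": dt.date(2018, 1, 8),
     "end": dt.date(2018, 1, 12), "outfile": "mudslides.csv"},
]


def run_searches(fetch, save, searches=SEARCHES, limit=3000, lang="en"):
    """Run every search with the same settings, pass each result to save(),
    and return the number of records collected per output file."""
    counts = {}
    for search in searches:
        records = fetch(search["query"], begindate=search["begin"],
                        enddate=search["end"], limit=limit, lang=lang)
        save(records, search["outfile"])
        counts[search["outfile"]] = len(records)
    return counts
```

In the notebook, `fetch` could wrap `query_tweets` and return `[t.__dict__ for t in tweets]`, and `save` could be `lambda records, path: pd.DataFrame(records).to_csv(path)`.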

## Sources

- https://blog.twitter.com/en_sea/topics/insights/2018/5-Tips-for-using-Twitter-during-emergencies-and-natural-disaster.html