# Contents
1. [Tweets](#Collecting-Tweets-with-TweetScraper)
2. [Weather Data](#Weather-Data)
3. [Outage Data](#Outage-Data)

## Collecting Tweets with TweetScraper

We collected tweets using [TweetScraper](https://github.com/jonbakerfish/TweetScraper). We arrived at this solution only after spending about a week on other techniques, none of which produced a usable collection of historical tweets.

Here is a brief summary of the issues we had:

- *Twitter's free API only allowed access to the seven most recent days of historical tweets. Further access required a costly subscription.*
- *While using the [TwitterScraper](https://github.com/taspinar/twitterscraper) package, we quickly exceeded the maximum number of API requests allowed and were prevented from pulling additional tweets.*
- *Before our access was blocked, TwitterScraper appeared to return 400 tweets per request, but once we loaded the tweets into a pandas dataframe, we discovered that each request pulled only 20 unique tweets (each repeated 20 times).*
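
The duplication described in the last bullet can be checked with pandas before relying on a scrape. The following is a minimal sketch, not the code we ran at the time: the sample records and the variable names are made up for illustration, and it assumes each tweet's `url` uniquely identifies it.

```python
import pandas as pd

# Hypothetical sample: 3 unique tweets, each returned twice by a scraper.
records = [
    {"url": "/user_a/status/1", "text": "power outage in queens"},
    {"url": "/user_b/status/2", "text": "lights out again"},
    {"url": "/user_c/status/3", "text": "conedison outage map"},
] * 2

scraped = pd.DataFrame(records)

# Compare the raw row count with the number of distinct tweets,
# using the URL as the tweet identifier (an assumption).
n_rows = len(scraped)
n_unique = scraped["url"].nunique()

# Keep only the first copy of each tweet.
deduped = scraped.drop_duplicates(subset="url").reset_index(drop=True)

print(f"{n_rows} rows, {n_unique} unique tweets")
```

A large gap between the two counts is a sign that requests are returning repeated results.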

#### Collecting Tweets

We ran TweetScraper in the terminal, using the query "power outage conedison". 

![](./images/tweetscaper_terminal.png)

This returned 4,375 tweets ranging from 2007-07-20 to the time of the scrape. TweetScraper saved each tweet as an individual JSON file in the folder *"../TweetScraper/Data/tweet"*. We copied this folder to the present directory and used the following code to extract the tweets and load them into a pandas DataFrame.

#### Converting JSON files into a Pandas dataframe

In [1]:
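import json
import os

tweets = []
for file in os.listdir('tweet/'):
    filename = os.path.join('tweet', file)
    # Only read entries whose names have digits in positions 2-4,
    # which skips non-tweet entries in the folder.
    if file[1:4].isdigit():
        # Each file holds one tweet as a JSON object.
        with open(filename) as tweetfile:
            tweets.append(json.load(tweetfile))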

In [2]:
import pandas as pd
df = pd.DataFrame(tweets)
df.head()

| | ID | datetime | has_media | is_reply | is_retweet | medias | nbr_favorite | nbr_reply | nbr_retweet | text | url | user_id | usernameTweet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 267716394043465728 | 2012-11-11 14:52:16 | | False | False | | 0 | 0 | 0 | ConEd : NY Sandy power outages slip; costs... | /RealJezzy/status/267716394043465728 | 504279674 | RealJezzy |
| 1 | 298660085255770112 | 2013-02-05 00:11:27 | | False | False | | 0 | 0 | 0 | @ ConEdison \n Power outage in queens | /meirBGNY/status/298660085255770112 | 909594764 | meirBGNY |
| 2 | 21860028236 | 2010-08-22 17:33:44 | | False | False | | 0 | 0 | 0 | Power outage in the Whitestone section of Q... | /olgushka1/status/21860028236 | 68886571 | olgushka1 |
| 3 | 1069021302667833344 | 2018-12-01 19:11:54 | | False | False | | 0 | 1 | 0 | about that power outage that is now over..... | /jimcasale/status/1069021302667833344 | 12393522 | jimcasale |
| 4 | 767077182124486656 | 2016-08-20 15:13:46 | | False | False | | 3 | 0 | 7 | .@ConEdison is responding to a power outag... | /NotifyNYC/status/767077182124486656 | 16145875 | NotifyNYC |


In [3]:
len(df)

4374

In [5]:
df.to_csv('./data/tweets_scrape_first_df.csv', index=False)
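
When the saved CSV is read back, the `datetime` column arrives as plain strings unless it is parsed explicitly. The sketch below shows one way to restore it and check the date range. It is illustrative only: the in-memory CSV stands in for the saved file, it reuses three rows from the preview above, and `tweets_df`, `earliest`, and `latest` are names chosen for this example.

```python
import io
import pandas as pd

# Stand-in for the saved CSV, limited to three columns from the preview above.
csv_text = """ID,datetime,usernameTweet
267716394043465728,2012-11-11 14:52:16,RealJezzy
21860028236,2010-08-22 17:33:44,olgushka1
1069021302667833344,2018-12-01 19:11:54,jimcasale
"""

# parse_dates converts the datetime column to timestamps;
# reading ID as a string keeps it as an identifier rather than a number.
tweets_df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["datetime"],
    dtype={"ID": str},
)

earliest = tweets_df["datetime"].min()
latest = tweets_df["datetime"].max()
print(earliest, latest)
```

With the real file, `io.StringIO(csv_text)` would be replaced by the path passed to `df.to_csv` above.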

## Weather Data

Weather data was gathered from Kaggle's [Historical Hourly Weather Data](https://www.kaggle.com/selfishgene/historical-hourly-weather-data#weather_description.csv) dataset, which includes hourly weather data for New York City from 10/1/2012 to 10/27/2017.
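
As a rough sketch of how the New York City series could be pulled from one of the dataset's files: this assumes the CSVs use a wide layout with a `datetime` column and one column per city (including `New York`), which should be verified against the downloaded files. The sample values and the second city column are made up for illustration.

```python
import io
import pandas as pd

# Made-up excerpt in the assumed wide layout: one row per hour, one column per city.
csv_text = """datetime,New York,Boston
2012-10-01 13:00:00,few clouds,sky is clear
2012-10-01 14:00:00,light rain,sky is clear
2012-10-01 15:00:00,light rain,few clouds
"""

weather = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["datetime"],
    index_col="datetime",
)

# Keep only the New York column and give it a descriptive name.
nyc_weather = weather["New York"].rename("weather_description")

print(nyc_weather.value_counts())
```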

## Outage Data

Power outage data was gathered from NYC Open Data's [OEM Emergency Notifications](https://data.cityofnewyork.us/Public-Safety/OEM-Emergency-Notifications/8vv7-7wx3/data) database, which includes data on official NYC Office of Emergency Management notifications dating back to 2009.

Power outage notifications contain the term 'Power Outage' in the *Notification Title* column.
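
A minimal sketch of that filter in pandas follows. Only the `Notification Title` column name comes from the description above; the sample rows are invented, and matching case-insensitively is a choice made here in case capitalization varies.

```python
import pandas as pd

# Made-up rows; only the "Notification Title" column name comes from the text above.
notifications = pd.DataFrame({
    "Notification Title": [
        "Power Outage - Queens",
        "Mass Transit Disruption",
        "power outage in Brooklyn",
    ],
})

# Case-insensitive substring match; na=False treats missing titles as non-matches.
is_outage = notifications["Notification Title"].str.contains(
    "power outage", case=False, na=False
)
outages = notifications[is_outage]

print(len(outages))
```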

<a id="#Collecting-Tweets-with-TweetScraper"></a>