# 4: Power Outage Identification: Outage Map and Next Steps

### Contents:
- [Imports](#Imports)
- [Map Creation](#Map-Creation)
- [Conclusion and Next Steps](#Conclusion-and-Next-Steps)

## Imports

In [None]:
#importing the packages
import numpy as np
import pandas as pd
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, Slider
from bokeh.plotting import figure, show, output_file, output_notebook
from bokeh.tile_providers import get_provider, Vendors
from geopy.geocoders import Nominatim

## Map Creation

Because we collected tweets with a scraper rather than Twitter’s API, the most detailed location we could pull was the city. Plotting individual tweets would therefore stack many dots on top of each other, so we instead plot outage tweets over time for our selected cities.

The following code generates and exports an interactive HTML map. Use the sliders at the top to select a year and month. Each city’s circle grows with the number of tweets classified as power outages in that month.

In [11]:
#reading in the file
tweet_df = pd.read_csv('./datasets/final_scored_data.csv')
tweet_df.drop(columns=['Unnamed: 0'], inplace = True)
tweet_df.head()

,id,text,timestamp,user,location,datestamp,date_place,combined_text,all_text,score_power_out,score_not_out,blackout
0,9.732586e+17,Power outage in the area causing delays. treat...,2018-03-12 18:04:57,TotalTrafficAUS,Austin,2018-03-12,"2018-03-12, Austin","dry, hot, cold ||",Power outage in the area causing delays. treat...,0.827468,0.815104,power_out
1,1.036794e+18,Aaaaaand the power is out. (@ La Casa De Los K...,2018-09-04 1:52:41,Daragaya,Austin,2018-09-04,"2018-09-04, Austin","downpour, hot, hot ||",Aaaaaand the power is out. (@ La Casa De Los K...,0.792333,0.781662,power_out
2,1.035702e+18,Lack of #fridaynightlights here... #powerouta...,2018-09-01 1:33:58,MartinGarza,Austin,2018-09-01,"2018-09-01, Austin","dry, scorching, hot ||",Lack of #fridaynightlights here... #powerouta...,0.802746,0.796409,power_out
3,1.021838e+18,Multiple signals on flash due to a power outag...,2018-07-24 19:23:17,TotalTrafficAUS,Austin,2018-07-24,"2018-07-24, Austin","downpour, scorching, hot ||",Multiple signals on flash due to a power outag...,0.826952,0.812297,power_out
4,6.906223e+17,ATXoutage update: outage was caused by a hit p...,2016-01-22 19:49:32,Austin_CP,Austin,2016-01-22,"2016-01-22, Austin","downpour, warm, cold ||",ATXoutage update: outage was caused by a hit p...,0.82209,0.827145,not_out


In [12]:
#assigning a lat/long for each city
geolocator = Nominatim(user_agent = 'power_outage_app')
city_locations = {}
for city in tweet_df['location'].unique():
    location = geolocator.geocode(city)
    #geocode returns None when no match is found
    if location is None:
        raise ValueError(f'Could not geocode city: {city}')
    city_locations[city] = {'lat':location.latitude,
                            'long':location.longitude}

In [13]:
#creating lat/long columns for city locations
tweet_df['lat'] = tweet_df['location'].apply(lambda x: city_locations[x]['lat'])
tweet_df['long'] = tweet_df['location'].apply(lambda x: city_locations[x]['long'])

In [14]:
#confirming that lat/longs imported correctly
tweet_df.head()

,id,text,timestamp,user,location,datestamp,date_place,combined_text,all_text,score_power_out,score_not_out,blackout,lat,long
0,9.732586e+17,Power outage in the area causing delays. treat...,2018-03-12 18:04:57,TotalTrafficAUS,Austin,2018-03-12,"2018-03-12, Austin","dry, hot, cold ||",Power outage in the area causing delays. treat...,0.827468,0.815104,power_out,30.271129,-97.7437
1,1.036794e+18,Aaaaaand the power is out. (@ La Casa De Los K...,2018-09-04 1:52:41,Daragaya,Austin,2018-09-04,"2018-09-04, Austin","downpour, hot, hot ||",Aaaaaand the power is out. (@ La Casa De Los K...,0.792333,0.781662,power_out,30.271129,-97.7437
2,1.035702e+18,Lack of #fridaynightlights here... #powerouta...,2018-09-01 1:33:58,MartinGarza,Austin,2018-09-01,"2018-09-01, Austin","dry, scorching, hot ||",Lack of #fridaynightlights here... #powerouta...,0.802746,0.796409,power_out,30.271129,-97.7437
3,1.021838e+18,Multiple signals on flash due to a power outag...,2018-07-24 19:23:17,TotalTrafficAUS,Austin,2018-07-24,"2018-07-24, Austin","downpour, scorching, hot ||",Multiple signals on flash due to a power outag...,0.826952,0.812297,power_out,30.271129,-97.7437
4,6.906223e+17,ATXoutage update: outage was caused by a hit p...,2016-01-22 19:49:32,Austin_CP,Austin,2016-01-22,"2016-01-22, Austin","downpour, warm, cold ||",ATXoutage update: outage was caused by a hit p...,0.82209,0.827145,not_out,30.271129,-97.7437


In [15]:
#looking at counts for blackouts
tweet_df['blackout'].value_counts()

power_out    3355
not_out      2329
Name: blackout, dtype: int64

In [16]:
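#function to project lat/long (WGS84 degrees) to Web Mercator x/y (meters)
def wgs84_to_web_mercator(df, lon="long", lat="lat"):
    #adds x and y columns to df in place and returns it
    k = 6378137 #Earth's equatorial radius in meters
    df["x"] = df[lon] * (k * np.pi/180.0)
    df["y"] = np.log(np.tan((90 + df[lat]) * np.pi/360.0)) * k
    return df
wgs84_to_web_mercator(tweet_df)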

,id,text,timestamp,user,location,datestamp,date_place,combined_text,all_text,score_power_out,score_not_out,blackout,lat,long,x,y
0,9.732586e+17,Power outage in the area causing delays. treat...,2018-03-12 18:04:57,TotalTrafficAUS,Austin,2018-03-12,"2018-03-12, Austin","dry, hot, cold ||",Power outage in the area causing delays. treat...,0.827468,0.815104,power_out,30.271129,-97.743700,-1.088078e+07,3.538449e+06
1,1.036794e+18,Aaaaaand the power is out. (@ La Casa De Los K...,2018-09-04 1:52:41,Daragaya,Austin,2018-09-04,"2018-09-04, Austin","downpour, hot, hot ||",Aaaaaand the power is out. (@ La Casa De Los K...,0.792333,0.781662,power_out,30.271129,-97.743700,-1.088078e+07,3.538449e+06
2,1.035702e+18,Lack of #fridaynightlights here... #powerouta...,2018-09-01 1:33:58,MartinGarza,Austin,2018-09-01,"2018-09-01, Austin","dry, scorching, hot ||",Lack of #fridaynightlights here... #powerouta...,0.802746,0.796409,power_out,30.271129,-97.743700,-1.088078e+07,3.538449e+06
3,1.021838e+18,Multiple signals on flash due to a power outag...,2018-07-24 19:23:17,TotalTrafficAUS,Austin,2018-07-24,"2018-07-24, Austin","downpour, scorching, hot ||",Multiple signals on flash due to a power outag...,0.826952,0.812297,power_out,30.271129,-97.743700,-1.088078e+07,3.538449e+06
4,6.906223e+17,ATXoutage update: outage was caused by a hit p...,2016-01-22 19:49:32,Austin_CP,Austin,2016-01-22,"2016-01-22, Austin","downpour, warm, cold ||",ATXoutage update: outage was caused by a hit p...,0.822090,0.827145,not_out,30.271129,-97.743700,-1.088078e+07,3.538449e+06
5,6.906182e+17,"Power is out for ~1,004 customers in the follo...",2016-01-22 19:33:06,Austin_CP,Austin,2016-01-22,"2016-01-22, Austin","downpour, warm, cold ||","Power is out for ~1,004 customers in the follo...",0.825847,0.814874,power_out,30.271129,-97.743700,-1.088078e+07,3.538449e+06
6,6.769004e+17,"Power out for ~1,361 customers in the N Lamar/...",2015-12-15 23:03:37,Austin_CP,Austin,2015-12-15,"2015-12-15, Austin","drizzle, hot, cold ||","Power out for ~1,361 customers in the N Lamar/...",0.813399,0.783403,power_out,30.271129,-97.743700,-1.088078e+07,3.538449e+06
7,6.659410e+17,"Power outage affecting 3,197 customers Burnet/...",2015-11-15 17:14:33,Austin_CP,Austin,2015-11-15,"2015-11-15, Austin","dry, hot, warm ||","Power outage affecting 3,197 customers Burnet/...",0.837226,0.827736,power_out,30.271129,-97.743700,-1.088078e+07,3.538449e+06
8,6.601950e+17,"About 2,500 Austin Energy customers without po...",2015-10-30 20:42:04,Austin_CP,Austin,2015-10-30,"2015-10-30, Austin","drizzle, hot, warm ||","About 2,500 Austin Energy customers without po...",0.798741,0.785283,power_out,30.271129,-97.743700,-1.088078e+07,3.538449e+06
9,6.579008e+17,Today's races canceled and power is out at the...,2015-10-24 12:45:51,logandj,Austin,2015-10-24,"2015-10-24, Austin","torrential, hot, hot ||",Today's races canceled and power is out at the...,0.795371,0.786216,power_out,30.271129,-97.743700,-1.088078e+07,3.538449e+06
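
As a sanity check on the projection, the sketch below re-implements the same spherical Web Mercator formulas for a single point using only the standard library, together with the inverse transform. The names `to_web_mercator` and `from_web_mercator` are illustrative and not part of the notebook.

```python
import math

EARTH_RADIUS_M = 6378137  # same constant as k in wgs84_to_web_mercator

def to_web_mercator(lon, lat):
    """Project WGS84 degrees to Web Mercator meters (hypothetical helper)."""
    x = math.radians(lon) * EARTH_RADIUS_M
    y = math.log(math.tan(math.radians(90 + lat) / 2)) * EARTH_RADIUS_M
    return x, y

def from_web_mercator(x, y):
    """Invert the projection back to WGS84 degrees (hypothetical helper)."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M))) - 90
    return lon, lat

# Austin's coordinates as shown in the lat/long columns of this notebook
x, y = to_web_mercator(-97.7437, 30.271129)
lon, lat = from_web_mercator(x, y)
print(x, y)      # should agree with the x and y columns in the output
print(lon, lat)  # should recover the original longitude and latitude
```

A round trip that recovers the input coordinates is a quick way to confirm the forward formula before trusting the positions on the map.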


In [17]:
#sorting values by date
tweet_df.sort_values(by = ['datestamp'],inplace = True)

In [18]:
#creating columns for day, month, and year from the date column
tweet_df['year'] = tweet_df['datestamp'].apply(lambda x: int(x.split('-')[0]))
tweet_df['month'] = tweet_df['datestamp'].apply(lambda x: int(x.split('-')[1]))
tweet_df['day'] = tweet_df['datestamp'].apply(lambda x: int(x.split('-')[2]))

In [19]:
#creating a dummy column: 1 for power_out tweets, 0 otherwise
tweet_df['dum_power_out'] = tweet_df['blackout'].apply(lambda x: 1 if x == 'power_out' else 0)

In [20]:
#counting power_out tweets per city, year, and month
tweet_df_group = tweet_df.groupby(by = ['location','year','month'])[['dum_power_out']].sum().reset_index()
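
An alternative to splitting the date strings by hand is to let pandas parse them. The sketch below uses a small made-up frame (the rows are illustrative, not taken from the dataset) with the same `location`, `datestamp`, and `blackout` columns, and reproduces the monthly count of `power_out` tweets per city.

```python
import pandas as pd

# Toy rows for illustration only; not taken from final_scored_data.csv
toy_df = pd.DataFrame({
    'location': ['Austin', 'Austin', 'Austin', 'Austin', 'Example City'],
    'datestamp': ['2018-03-12', '2018-03-20', '2018-03-25',
                  '2018-09-04', '2018-03-01'],
    'blackout': ['power_out', 'not_out', 'power_out',
                 'power_out', 'power_out'],
})

# Parse once, then read year and month from the datetime accessor
dates = pd.to_datetime(toy_df['datestamp'])
toy_df['year'] = dates.dt.year
toy_df['month'] = dates.dt.month

# A boolean comparison replaces the apply/lambda dummy column
toy_df['dum_power_out'] = (toy_df['blackout'] == 'power_out').astype(int)

# One row per city, year, and month with the number of power_out tweets
toy_group = (toy_df
             .groupby(['location', 'year', 'month'])[['dum_power_out']]
             .sum()
             .reset_index())
print(toy_group)
```

Parsing with `pd.to_datetime` also raises an error on malformed dates, which string splitting could silently mishandle.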

In [21]:
#re-attaching lat/long to the grouped data
tweet_df_group['lat'] = tweet_df_group['location'].apply(lambda x: city_locations[x]['lat'])
tweet_df_group['long'] = tweet_df_group['location'].apply(lambda x: city_locations[x]['long'])

In [22]:
#projecting the grouped data to Web Mercator x/y
wgs84_to_web_mercator(tweet_df_group)

,location,year,month,dum_power_out,lat,long,x,y
0,Austin,2014,1,12,30.271129,-97.743700,-1.088078e+07,3.538449e+06
1,Austin,2014,2,9,30.271129,-97.743700,-1.088078e+07,3.538449e+06
2,Austin,2014,3,9,30.271129,-97.743700,-1.088078e+07,3.538449e+06
3,Austin,2014,4,21,30.271129,-97.743700,-1.088078e+07,3.538449e+06
4,Austin,2014,5,14,30.271129,-97.743700,-1.088078e+07,3.538449e+06
5,Austin,2014,6,36,30.271129,-97.743700,-1.088078e+07,3.538449e+06
6,Austin,2014,7,16,30.271129,-97.743700,-1.088078e+07,3.538449e+06
7,Austin,2014,8,16,30.271129,-97.743700,-1.088078e+07,3.538449e+06
8,Austin,2014,9,10,30.271129,-97.743700,-1.088078e+07,3.538449e+06
9,Austin,2014,10,2,30.271129,-97.743700,-1.088078e+07,3.538449e+06


In [None]:
# creating a Bokeh map
tweets = tweet_df_group
visible_tweets = tweet_df_group[(tweet_df_group['year']==2014) & (tweet_df_group['month']==1)]

visible_source = ColumnDataSource(data = dict(x = visible_tweets['x'],
                                              y = visible_tweets['y'],
                                              location = visible_tweets['location'],
                                              year = visible_tweets['year'],
                                              month = visible_tweets['month'],
                                              power_out_count = visible_tweets['dum_power_out'],
                                              radii = (visible_tweets['dum_power_out']*2000)+200
                                           )
                                 )
tweet_source = ColumnDataSource(data = dict(x = tweets['x'],
                                            y = tweets['y'],
                                            location = tweets['location'],
                                            year = tweets['year'],
                                            month = tweets['month'],
                                            power_out_count = tweets['dum_power_out'],
                                            radii = (tweets['dum_power_out']*2000)+200
                                           )
                               )
    
TOOLS="hover,pan,wheel_zoom,zoom_in,zoom_out,box_zoom,undo,redo,reset,tap,save,box_select"
# setting the hover text
TOOLTIPS =[
    ('City', '@location'),
    ('# of power outage tweets','@power_out_count'),
    ('Month','@month'),
    ('Year','@year')
]

# callback function: rebuilds the visible data for the selected month and year
callback = CustomJS(args=dict(tweet_source=tweet_source,visible_source = visible_source), code="""
    
    var tweets_data = tweet_source.data;
    var visible_data = visible_source.data;
    // month and year are the slider widgets added to callback.args below
    var selected_month = month.value;
    var selected_year = year.value;
    
    visible_data.x = []
    visible_data.y = []
    visible_data.power_out_count = []
    visible_data.radii = []
    visible_data.location = []
    visible_data.month = []
    visible_data.year = []
    
    for(var i =0; i < tweets_data.x.length; i++) {
        if ((tweets_data.year[i] == selected_year) && (tweets_data.month[i] == selected_month)) {
            visible_data.x.push(tweets_data.x[i]);
            visible_data.y.push(tweets_data.y[i]);
            visible_data.power_out_count.push(tweets_data.power_out_count[i]);
            visible_data.radii.push(tweets_data.radii[i]);
            visible_data.location.push(tweets_data.location[i]);
            visible_data.month.push(tweets_data.month[i]);
            visible_data.year.push(tweets_data.year[i])
        }
    }
    
    visible_source.change.emit();
""")

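# sliders for month and year; js_on_change runs the callback whenever a value changes
month_slider = Slider(start=1,end=12, value=1, step=1, title="Month")
year_slider = Slider(start=2014, end=2019, value=2014, step=1,title="Year")
callback.args['month'] = month_slider
callback.args['year'] = year_slider
month_slider.js_on_change('value', callback)
year_slider.js_on_change('value', callback)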


p = figure(title='Map of Tweets About Power Outages',x_range=(-15187814, -6458032), y_range=(2505715, 6567666),
           x_axis_type="mercator", y_axis_type="mercator",plot_height = 500,tools = TOOLS, tooltips = TOOLTIPS)
p.add_tile(get_provider(Vendors.STAMEN_TONER_BACKGROUND))

# one circle per city; radius is in map units (Web Mercator meters)
p.circle('x','y',source = visible_source, radius = 'radii',fill_color = (29,161,242), fill_alpha = 0.5)


layout = column(year_slider,month_slider,p)
curdoc().add_root(layout)
output_file("./visualizations/outage_map.html", title="outage_map.py")
output_notebook()
show(layout)  # open a browser
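
The `radii` column in the map code turns a tweet count into a circle radius in Web Mercator meters with a linear rule, `count * 2000 + 200`. With a linear radius, a circle's area grows with the square of the count, so busy months can look disproportionately large. The sketch below compares that rule with a square-root alternative that keeps area roughly proportional to count. The function `sqrt_radius` and its `scale` value are assumptions for illustration, not something the project used.

```python
import math

def linear_radius(count, scale=2000, base=200):
    """Radius rule used in the notebook: count * 2000 + 200."""
    return count * scale + base

def sqrt_radius(count, scale=10000, base=200):
    """Hypothetical alternative: area grows roughly in proportion to count."""
    return math.sqrt(count) * scale + base

# Compare both rules for a few monthly counts
for count in (0, 1, 9, 36):
    print(count, linear_radius(count), round(sqrt_radius(count)))
```

Either rule keeps the `+ 200` offset so that a city with zero outage tweets still shows a small marker.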

## Conclusion and Next Steps

We built a prototype that can classify whether a tweet reports a legitimate power outage. However, it has limitations that would need to be addressed before rolling it out to classify more data. Because we had to use a Twitter scraper instead of Twitter’s API, our location data is only as precise as the city. Even with the API, not all users list a location on their Twitter profile, often out of privacy concerns. In addition, evaluating accuracy is a manual process, so a countrywide rollout would demand considerable time and resources for review.

However, we do see great possibilities in a more widespread rollout based on live data. With more time, these are the next steps we would have taken and recommend considering:

1. Test the model on live data: tweets from Twitter’s API and weather data from Dark Sky’s API.

2. Use K-means clustering to group power outage findings so we can better confirm that an entire area is without power. Sectioning out clusters by region and by weather would be interesting to look at as well. We only looked at the weather options that were consistently available for all our selected cities, but looking in more detail at things like wind speed would be beneficial here.

3. Explore dimensionality and data reduction methods other than t-SNE, such as using principal component analysis before data preprocessing.
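
To make the clustering idea in step 2 concrete, here is a minimal sketch of K-means (Lloyd's algorithm) on latitude/longitude pairs, written with the standard library only. The points, the choice of `k = 2`, and the function name `kmeans` are all illustrative assumptions. A real rollout would need tweet-level coordinates, which this dataset does not have, and would likely use a library implementation such as scikit-learn's `KMeans`.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        # Euclidean distance on degrees is only a rough approximation.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels, centroids

# Made-up (lat, long) points forming two loose groups
points = [(30.27, -97.74), (29.76, -95.37), (30.30, -97.70),
          (29.80, -95.40), (30.25, -97.80), (29.70, -95.30)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Seeding from the first `k` points keeps the sketch deterministic; library implementations usually use smarter initialization and multiple restarts.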