## Mapping Disasters Based on Location Keyword Extraction

Only 30% of tweets scraped from Twitter carry location tags. The remaining 70% have no built-in location information, which makes it difficult to aggregate tweets by location and flag a suspected disaster. The text of a tweet, however, often mentions a location (e.g. "Forest fire in LA County") while a disaster is occurring.

The goal of this notebook, therefore, is to develop an algorithm that can scan all scraped tweets for location keywords in real time. Once relevant locations have been extracted, they are paired with their latitude and longitude coordinates and plotted on a Google map.
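
As a rough illustration of the idea, the sketch below scans a tweet for known city names and returns their coordinates. The `toy_cities` dictionary, its approximate coordinates, and the `find_locations` helper are made up for this example; the notebook builds its lookup from a world cities file instead.

```python
import re

# Hypothetical lookup table: lowercase city name -> approximate (lat, lng)
toy_cities = {
    "paradise": (39.76, -121.62),
    "houston": (29.76, -95.37),
}

def find_locations(tweet, city_lookup):
    """Return (city, (lat, lng)) pairs for every known city named in the tweet."""
    # Lowercase first so the match does not depend on how the tweet is capitalized
    tokens = re.findall(r"\w+", tweet.lower())
    return [(tok, city_lookup[tok]) for tok in tokens if tok in city_lookup]

hits = find_locations("Wildfire spreading fast near Paradise tonight", toy_cities)
print(hits)
```

Using a dictionary keeps each lookup at constant time, which matters when every token of every incoming tweet has to be checked.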

The Google Maps functionality relies on a paid API with a ~7-day trial period, so the maps will not be immediately reproducible.

In [None]:
#!pip install gmplot

In [None]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import itertools

import nltk
from nltk.tokenize import RegexpTokenizer
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

import re
from collections import defaultdict
from gmplot import gmplot

import warnings
from statistics import mean

In [None]:
#read in cleaned Tweets

df = pd.read_csv('data/sample_tweets_to_predict.csv')
df.rename(columns = {'text':'tweet'}, inplace = True)



In [None]:
def process_tweet(df):
    """Tokenize, lemmatize, and remove stopwords from each tweet."""
    #raw string so the regex escapes are not read as string escapes
    tokenizer = RegexpTokenizer(r'\w+|\$[\d\.]\S+')
    lem = WordNetLemmatizer()
    STOPWORDS = set(stopwords.words('english'))
    df['processed_tweets'] = df['tweet'].apply(tokenizer.tokenize)
    df['processed_tweets'] = df['processed_tweets'].apply(lambda row: [lem.lemmatize(i) for i in row])
    #the stopword list is lowercase, so compare lowercased tokens
    df['processed_tweets'] = df['processed_tweets'].apply(lambda x: [i for i in x if i.lower() not in STOPWORDS])

    return df

#call the function
df = process_tweet(df)
df.head()

In [None]:
#read in city data: 

cities = pd.read_csv('https://raw.git.generalassemb.ly/noahszuckerman/project-5-natty-ds/master/data/worldcities.csv?token=AAAH7Z45CMEZ3KD2BQCU23LAGU4UQ', encoding = 'latin')
cities.drop(columns = ['city_ascii', 'iso2', 'iso3', 'admin_name', 'capital',
       'population', 'id'], inplace = True)

#Create a dictionary to decrease computational complexity when searching over tweets for keywords.
#Names and coordinates come from the same filtered rows so each city keeps its own coordinates.
select_cities_df = cities.loc[cities['country'].isin(['United States', 'Canada'])]
select_cities = [x.lower() for x in select_cities_df['city'].tolist()]

latitude = select_cities_df['lat'].tolist()
longitude = select_cities_df['lng'].tolist()

select_coords = list(zip(latitude, longitude))
#maps lowercase city name -> (lat, lng); duplicate names keep the last entry
city_dict = dict(zip(select_cities, select_coords))

#same mapping under the name used by the coordinate lookup later on
coords_dict = dict(city_dict)

In [None]:
rows = []

for tweet in df['processed_tweets']:
    #city_dict keys are lowercase, so lowercase each token before the lookup
    matches = [word.lower() for word in tweet if word.lower() in city_dict]

    if matches:
        #one row per city mentioned in the tweet
        for city in matches:
            rows.append([tweet, city])
    else:
        rows.append([tweet, 'No Location Available'])

In [None]:
X = pd.DataFrame(rows, columns = ['tweet', 'location'])

#join the tokens back into a single string
X['tweet'] = [' '.join(map(str, l)) for l in X['tweet']]

X.drop_duplicates(inplace = True)
X.head()
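
A single-token lookup cannot match multi-word names such as "new york" or "los angeles". One possible extension, sketched here with a made-up `toy_cities` dictionary and a hypothetical `match_cities` helper, tries the longest word sequences first. It is an illustration only and is not used elsewhere in this notebook.

```python
def match_cities(tokens, city_lookup, max_words=3):
    """Greedily match the longest run of tokens that forms a known city name."""
    tokens = [t.lower() for t in tokens]
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate starting at position i, then shorter ones
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in city_lookup:
                found.append(candidate)
                i += n  # skip past the words that were consumed
                break
        else:
            i += 1  # no city name starts at this token
    return found

# Hypothetical lookup table: lowercase city name -> approximate (lat, lng)
toy_cities = {
    "new york": (40.71, -74.01),
    "york": (39.96, -76.73),
    "los angeles": (34.05, -118.24),
}

matched = match_cities(["Flooding", "in", "New", "York", "and", "Los", "Angeles"], toy_cities)
print(matched)
```

Matching the longest sequence first prevents "new york" from being reported as the shorter "york".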

## Location Coordinates 

For each location extracted from the tweets, the code below looks up the corresponding latitude and longitude coordinates.
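
The same kind of lookup can also be expressed with pandas, which keeps each coordinate pair aligned with its tweet. This is a sketch on made-up data: `toy_coords` and the example rows are hypothetical, and locations without a match become missing values that are filtered out.

```python
import pandas as pd

# Hypothetical lookup and rows, for illustration only
toy_coords = {"houston": (29.76, -95.37), "paradise": (39.76, -121.62)}
toy = pd.DataFrame({
    "tweet": ["flood houston", "big storm", "fire paradise"],
    "location": ["houston", "No Location Available", "paradise"],
})

# Series.map with a dict yields NaN for values that are not keys of the dict
coords = toy["location"].map(toy_coords)

# Keep only the rows that matched, then split each (lat, lng) tuple into columns
located = toy[coords.notna()].copy()
located["lat"] = [c[0] for c in coords.dropna()]
located["lng"] = [c[1] for c in coords.dropna()]
print(located)
```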

In [None]:
flat_lat = []
flat_long = []

for word in X['location']:
    #.get returns None for 'No Location Available' and any other unknown name
    coords = coords_dict.get(word)
    if coords:
        flat_lat.append(coords[0])
        flat_long.append(coords[1])

## Mapping with Google Maps

The code below plots each extracted location on a Google map using the gmplot module.
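
Rather than hard-coding the map center, one option is to center the map on the mean of the plotted points. The snippet below is a sketch with made-up coordinates; `map_center` is a hypothetical helper, and no Google Maps call is made.

```python
from statistics import mean

def map_center(lats, lngs, default=(0.0, 0.0)):
    """Return the mean (lat, lng) of the points, or a default when there are none."""
    if not lats or not lngs:
        return default
    return mean(lats), mean(lngs)

# Made-up coordinates for illustration
center = map_center([30.0, 40.0], [-100.0, -120.0])
print(center)
```

The resulting pair could be passed as the first two arguments of `gmplot.GoogleMapPlotter`. A plain mean is adequate for points within one region, but not for points that straddle the 180° meridian.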



In [None]:

gmap3 = gmplot.GoogleMapPlotter(39.822949,-121.41392, 26, apikey = "INSERT YOUR API KEY HERE")
  
# scatter method of map object  
# scatter points on the google map 
gmap3.scatter(flat_lat, flat_long, size = 1000, marker = True, color = 'red' ) 
# Get an output html file of all plots  
gmap3.draw("maps/my_map.html")


In [None]:
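#Create GMapOptions object with map zoom, centered on the mean coordinate
#(GMapOptions is assumed to be Bokeh's class; it is not imported above)
import os
from bokeh.models import GMapOptions
map_options = GMapOptions(lat = mean(flat_lat), lng = mean(flat_long), map_type = 'roadmap', zoom = 8)
api_key = os.environ['APIKey']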


In [None]:
#Create a heat map of the locations

#declare the center of the map, and how much we want the map zoomed in
gmap = gmplot.GoogleMapPlotter(0, 0, 2, apikey = "INSERT YOUR API KEY HERE")
#plot heatmap
gmap.heatmap(flat_lat, flat_long)
#overlay a red marker on each location
gmap.scatter(flat_lat, flat_long, c = 'r', marker = True)
#save it to html
gmap.draw("./maps/country_heatmap.html")