# Getting Twitter Data for Cities that Have Not Declared a Climate Emergency #

This notebook focuses on getting Twitter data (all tweets) from the 10 largest cities (population-wise) that have not declared a climate emergency. 

    -Philadelphia
    -Houston
    -Phoenix
    -San Antonio
    -Dallas
    -Jacksonville
    -Fort Worth
    -Charlotte
    -Columbus
    -Indianapolis

The first portion of this notebook is dedicated to building the commands we'll pass to a library called "Twitterscraper." This package collects data from the command line (CL). We'll then load the data back into this notebook. 

https://github.com/taspinar/twitterscraper
    
Once the data from twitterscraper is loaded, the last portion merges all of the cities' data into one large dataset for analysis. 

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import json      # library for working with JSON-formatted text strings
import pprint as pp    # library for cleanly printing Python data structures
import seaborn as sns
import twitterscraper as ts
from twitterscraper import query_tweets #library downloaded
import os

import subprocess #this enables us to pass CL code directly from Jupyter Notebooks 
from subprocess import Popen

INFO: {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; x64; fr; rv:1.9.2.13) Gecko/20101203 Firebird/3.6.13'}


## Creating a Twitterscraper Command ## 

The code below takes the Twitter accounts for each city, scrapes *all* of their tweets, and writes one JSON file per account. Rather than pasting the command into the CL, the code uses "subprocess" (a module in Python's standard library) to pass the command directly from Jupyter Notebooks. 
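
As a minimal sketch of that idea, the snippet below runs a command from Python and sends its output to a log file. The helper name `run_and_log` and the stand-in command are assumptions made for illustration; the stand-in only asks the Python interpreter to print a line, so it runs without twitterscraper installed.

```python
import os
import subprocess
import sys
import tempfile


def run_and_log(command, log_path):
    """Run a command given as a list of arguments and log its output to a file."""
    with open(log_path, "w") as log_file:
        # stderr=subprocess.STDOUT sends error messages to the same log file
        completed = subprocess.run(command, stdout=log_file, stderr=subprocess.STDOUT)
    return completed.returncode


# Stand-in command (not twitterscraper): the current Python interpreter prints one line
demo_command = [sys.executable, "-c", "print('hello from a subprocess')"]

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "demo_log.txt")
    exit_code = run_and_log(demo_command, log_path)
    with open(log_path) as f:
        logged_text = f.read()

print(exit_code, logged_text.strip())
```

Passing the command as a list of arguments, as the notebook does, avoids shell quoting problems when a query contains spaces.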


In [11]:
def json_to_df(json_files):
    data_frames = []
    
    for file in json_files:
        print (file)
        if os.path.isfile(file): 
            with open(file) as f:
                data = json.load(f)
        
            d = {'username': [x['username'] for x in data],
            'time': [x['timestamp'] for x in data],
            'tweet': [x['text'] for x in data],
            'likes': [x['likes'] for x in data],
            'replies': [x['replies'] for x in data],
            'user_ID' : [x['screen_name'] for x in data]}
    
            data_frames.append(pd.DataFrame.from_dict(d))
    return data_frames

def combine_data(data_frames): #this merges a list of dataframes into one dataframe
    return pd.concat(data_frames)


def buildQuery(accounts):
    #this builds our search query: one "from:<account>" term per account,
    #joined with " OR " so that we don't have an extra "OR" at the end
    return " OR ".join("from:" + each_account for each_account in accounts)

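def launch(command, output):
    print (command)
    
    # stdout and stderr both go to the log file, so communicate() returns (None, None);
    # check the exit code instead to detect a failed run
    with open(output, 'w+') as outputFile:
        p = Popen(command, stdout=outputFile, stderr=outputFile, universal_newlines=True)
        p.communicate() # wait for the subprocess to finish before moving on
    
    if p.returncode != 0:
        print ("twitterscraper exited with code", p.returncode, "- see", output)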
            
def scrape(accounts):
    data_files = []
    
    for user in accounts:
        path_to_output_file = user + ".txt" #we'll get both txt and json, but just ignore txt
        path_to_data_file = user + ".JSON"
        data_files.append(path_to_data_file)
        
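        # NOTE: the space after "from:" may make this a keyword search rather than an
        # author filter, which would explain the mentions & replies filtered out later
        query = 'from: ' + user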
        command = ["twitterscraper", query, 
                   "--lang", "en", "--all", "-ow", "-p", "40", "-o", path_to_data_file]
        launch(command, path_to_output_file)
 
    return data_files 

Below, I created a list of all the accounts I wish to scrape (I broke it up into 3 "searches" because this process is extremely time-consuming). However, using "scrape()" you can input all the accounts at once; it'll just take an hour or so to get all the data.
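
One way to split a long account list into a few "searches" is sketched below. The helper `split_into_batches` is hypothetical, not part of the original notebook. The sketch also removes repeated handles first, since a handle listed twice would be scraped twice; the sample list is a shortened stand-in for the full list.

```python
def split_into_batches(accounts, n_batches):
    """Split a list into n_batches consecutive chunks of near-equal size."""
    batch_size, remainder = divmod(len(accounts), n_batches)
    batches = []
    start = 0
    for i in range(n_batches):
        # The first `remainder` batches get one extra account
        end = start + batch_size + (1 if i < remainder else 0)
        batches.append(accounts[start:end])
        start = end
    return batches


# Shortened stand-in list with repeated handles
sample_accounts = ["HoustonTX", "METROHouston", "HPARD", "METROHouston",
                   "PhoenixParks", "PhoenixParks", "Indy_CIO"]

# dict.fromkeys() drops repeats while keeping the original order
unique_accounts = list(dict.fromkeys(sample_accounts))

batches = split_into_batches(unique_accounts, 3)
for batch in batches:
    print(batch)
```

Each batch could then be passed to "scrape()" in its own cell, so one slow or failed search does not hold up the rest.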

In [8]:
climate_emergency_accounts = ["PhiladelphiaGov", "GreenworksPhila", "PhillyOTIS", "PhilaParkandRec",
                              "PhilaHsgAuthPHA", "PHLPlanDevelop", "PhilaOEM", "SEPTA",
                              "HoustonTX", "GreenHoustonTx", "METROHouston", "HPARD", "HoustonHCDD", 
                              "HoustonPlanning", "HoustonOEM", "HouPublicWorks", "METROHouston", 
                              "CityofPhoenixAZ", "phxenvironment", "StreetsPHX", "PhoenixParks", "PhoenixParks", 
                              "PHXPlanandDev", "ResilientPHX", "TalkingTrashPHX", "PhoenixMetroBus",
                              "COSAGOV", "COSAsustainable", "SAParksandRec", "SATomorrow2040", "SanAntonioOEM", 
                              "CityOfDallas", "DallasClimate", "DallasParkRec", "DallasPlanUD", 
                              "DallasOEM", "dartmedia", "dallaszerowaste", 
                              "CityofJax", "JaxReady", "JTAFLA", 
                              "CityofFortWorth", "FortWorthParks", "FWOEM", "TrinityMetro", 
                              "TarrantCountyTX",
                              "CLTgov", "CLTSustainable", "CharlotteDOT", "HNScharlotte", 
                              "CharMeckEM", "CATSRideTransit", "CLTWater", 
                              "ColumbusGov", "SustainableCol1", "CbusMetroParks", "ColumbusDP", "COTABus", 
                              "Indy_CIO", "SustainIndy", "IndyParksandRec", "IndyDPW", 
                              "IndyGoBus", "IndyDMD", "Indy_CIO", "IndyDBNS"]
                             
climate_emergency_output = scrape(climate_emergency_accounts) 

['twitterscraper', 'from: PhiladelphiaGov', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhiladelphiaGov.JSON']
['twitterscraper', 'from: GreenworksPhila', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'GreenworksPhila.JSON']
['twitterscraper', 'from: PhillyOTIS', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhillyOTIS.JSON']
['twitterscraper', 'from: PhilaParkandRec', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhilaParkandRec.JSON']
['twitterscraper', 'from: PhilaHsgAuthPHA', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhilaHsgAuthPHA.JSON']
['twitterscraper', 'from: PHLPlanDevelop', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PHLPlanDevelop.JSON']
['twitterscraper', 'from: PhilaOEM', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhilaOEM.JSON']
['twitterscraper', 'from: SEPTA', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'SEPTA.JSON']
['twitterscraper', 'from: HoustonTX', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'HoustonTX.JSO

## Converting JSONs to DataFrames ##

json_to_df() takes the list of JSON file paths returned by scrape() above and converts the data into a list of dataframes, one per file. 
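
To make the conversion concrete, here is a small sketch that writes two made-up tweet records to a temporary JSON file and loads them into a dataframe with the same column names json_to_df() produces. The records, the file name, and the `column_map` dictionary are assumptions for illustration; only the field names mirror the ones read above.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical records; only the field names mirror what json_to_df() reads
sample_records = [
    {"username": "Example City", "timestamp": "2019-01-01T12:00:00",
     "text": "First example tweet", "likes": 3, "replies": 1,
     "screen_name": "ExampleCity"},
    {"username": "Example City", "timestamp": "2019-01-02T09:30:00",
     "text": "Second example tweet", "likes": 0, "replies": 0,
     "screen_name": "ExampleCity"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "ExampleCity.JSON")
    with open(sample_path, "w") as f:
        json.dump(sample_records, f)

    # Load the file back, as json_to_df() does for each scraped file
    with open(sample_path) as f:
        data = json.load(f)

# Map the JSON field names to the notebook's column names
column_map = {"username": "username", "timestamp": "time", "text": "tweet",
              "likes": "likes", "replies": "replies", "screen_name": "user_ID"}
sample_df = pd.DataFrame(data)[list(column_map)].rename(columns=column_map)
print(sample_df)
```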

In [12]:
dataframe_1 = json_to_df(climate_emergency_output)

PhiladelphiaGov.JSON
GreenworksPhila.JSON
PhillyOTIS.JSON
PhilaParkandRec.JSON
PhilaHsgAuthPHA.JSON
PHLPlanDevelop.JSON
PhilaOEM.JSON
SEPTA.JSON
HoustonTX.JSON
GreenHoustonTx.JSON
METROHouston.JSON
HPARD.JSON
HoustonHCDD.JSON
HoustonPlanning.JSON
HoustonOEM.JSON
HouPublicWorks.JSON
METROHouston.JSON
CityofPhoenixAZ.JSON
phxenvironment.JSON
StreetsPHX.JSON
PhoenixParks.JSON
PhoenixParks.JSON
PHXPlanandDev.JSON
ResilientPHX.JSON
TalkingTrashPHX.JSON
PhoenixMetroBus.JSON
COSAGOV.JSON
COSAsustainable.JSON
SAParksandRec.JSON
SATomorrow2040.JSON
SanAntonioOEM.JSON
CityOfDallas.JSON
DallasClimate.JSON
DallasParkRec.JSON
DallasPlanUD.JSON
DallasOEM.JSON
dartmedia.JSON
dallaszerowaste.JSON
CityofJax.JSON
JaxReady.JSON
JTAFLA.JSON
CityofFortWorth.JSON
FortWorthParks.JSON
FWOEM.JSON
TrinityMetro.JSON
TarrantCountyTX.JSON
CLTgov.JSON
CLTSustainable.JSON
CharlotteDOT.JSON
HNScharlotte.JSON
CharMeckEM.JSON
CATSRideTransit.JSON
CLTWater.JSON
ColumbusGov.JSON
SustainableCol1.JSON
CbusMetroParks.JSON
C

In [13]:
merge1 = combine_data(dataframe_1)

Unnamed: 0,username,time,tweet,likes,replies,user_ID
0,City of Philadelphia,2009-04-29T11:54:23,Statement from Mayor Nutter on Arlen Spector h...,0,0,PhiladelphiaGov
1,Avenue of the Arts,2009-04-29T11:27:17,Thx4RT @princss6 RT @PhiladelphiaGov Statement...,0,0,avenueofthearts
2,Bob Garrett,2009-04-29T11:24:27,RT @princss6: RT @PhiladelphiaGov Statement fr...,0,0,BobGarrett
3,🇺🇸🇺🇸princss6 @cbs #cancelbobheartsabishola,2009-04-29T11:22:40,RT @PhiladelphiaGov Statement from Mayor Nutte...,0,0,princss6
4,Avenue of the Arts,2009-04-29T11:19:56,RT @PhiladelphiaGov Statement from Mayor Nutte...,0,0,avenueofthearts
...,...,...,...,...,...,...
271,Indianapolis City-County Council,2019-04-09T00:29:42,"Brian Madison, Director of @IndyDBNS, has step...",3,0,IndyCouncil
272,Indianapolis City-County Council,2019-04-09T00:27:05,@JacksonforIndy and @IndyBlakeJ have pointed o...,2,0,IndyCouncil
273,Indianapolis City-County Council,2019-04-09T00:17:25,@mike_mcquillen has offered an amendment to Pr...,0,0,IndyCouncil
274,Indianapolis City-County Council,2019-04-09T00:15:40,"Council is considering Proposal 149, which app...",4,0,IndyCouncil
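
The index in the output above appears to restart for each account's file, which is what `pd.concat` does by default: it keeps every frame's original index. The sketch below uses two tiny made-up frames to show the difference `ignore_index=True` makes; whether a fresh index is wanted here is a judgment call.

```python
import pandas as pd

# Two small made-up frames standing in for two accounts' dataframes
frame_a = pd.DataFrame({"user_ID": ["AccountA", "AccountA"], "likes": [1, 2]})
frame_b = pd.DataFrame({"user_ID": ["AccountB"], "likes": [5]})

# Default: each frame keeps its own index, so labels repeat
kept_index = pd.concat([frame_a, frame_b])

# ignore_index=True renumbers the rows from 0 in the combined frame
fresh_index = pd.concat([frame_a, frame_b], ignore_index=True)

print(kept_index.index.tolist())
print(fresh_index.index.tolist())
```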


In [16]:
to_keep = ["PhiladelphiaGov", "GreenworksPhila", "PhillyOTIS", "PhilaParkandRec",
          "PhilaHsgAuthPHA", "PHLPlanDevelop", "PhilaOEM", "SEPTA",
          "HoustonTX", "GreenHoustonTx", "METROHouston", "HPARD", "HoustonHCDD", 
          "HoustonPlanning", "HoustonOEM", "HouPublicWorks", "METROHouston", 
          "CityofPhoenixAZ", "phxenvironment", "StreetsPHX", "PhoenixParks", "PhoenixParks", 
          "PHXPlanandDev", "ResilientPHX", "TalkingTrashPHX", "PhoenixMetroBus",
          "COSAGOV", "COSAsustainable", "SAParksandRec", "SATomorrow2040", "SanAntonioOEM", 
          "CityOfDallas", "DallasClimate", "DallasParkRec", "DallasPlanUD", 
          "DallasOEM", "dartmedia", "dallaszerowaste", 
          "CityofJax", "JaxReady", "JTAFLA", 
          "CityofFortWorth", "FortWorthParks", "FWOEM", "TrinityMetro", 
          "TarrantCountyTX",
          "CLTgov", "CLTSustainable", "CharlotteDOT", "HNScharlotte", 
          "CharMeckEM", "CATSRideTransit", "CLTWater", 
          "ColumbusGov", "SustainableCol1", "CbusMetroParks", "ColumbusDP", "COTABus", 
          "Indy_CIO", "SustainIndy", "IndyParksandRec", "IndyDPW", 
          "IndyGoBus", "IndyDMD", "Indy_CIO", "IndyDBNS"]

final_results = merge1[merge1['user_ID'].isin(to_keep)] # the scrape also returned mentions & replies from other accounts, so keep only tweets from the accounts listed above
final_results

Unnamed: 0,username,time,tweet,likes,replies,user_ID
0,City of Philadelphia,2009-04-29T11:54:23,Statement from Mayor Nutter on Arlen Spector h...,0,0,PhiladelphiaGov
5,City of Philadelphia,2009-04-29T11:54:23,Statement from Mayor Nutter on Arlen Spector h...,0,0,PhiladelphiaGov
10,City of Philadelphia,2009-08-31T10:52:31,STATEMENT FROM MAYOR NUTTER ON TODAY’S PENNSYL...,0,0,PhiladelphiaGov
12,City of Philadelphia,2009-08-27T10:59:46,STATEMENT FROM MAYOR NUTTER ON SENATE APPROVAL...,0,0,PhiladelphiaGov
13,City of Philadelphia,2009-08-06T10:58:26,STATEMENT FROM MAYOR NUTTER ON PASSAGE OF HOUS...,0,0,PhiladelphiaGov
...,...,...,...,...,...,...
264,Indy DBNS,2019-07-18T16:38:45,Good afternoon #Indy! We’re counting down to t...,8,0,IndyDBNS
266,Indy DBNS,2019-05-24T18:05:16,Update from our friendly neighbors of @Speedwa...,2,0,IndyDBNS
267,Indy DBNS,2019-05-16T00:44:05,"If available, join us 2morrow @IndyParksandRec...",2,0,IndyDBNS
268,Indy DBNS,2019-05-10T14:42:22,The board is requesting clarification on the u...,1,0,IndyDBNS
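
The filter in the cell above keeps only the rows whose user_ID is in the list. Rows 0 and 5 of the output look identical, so a deduplication step may also be worth considering. The sketch below shows both steps on a small made-up frame; the tweet text is invented and the choice of `subset` columns is an assumption.

```python
import pandas as pd

# Made-up rows: two official tweets (one repeated), plus a retweet and an unrelated account
sample = pd.DataFrame({
    "user_ID": ["PhiladelphiaGov", "avenueofthearts", "PhiladelphiaGov", "IndyCouncil"],
    "tweet": ["statement A", "RT statement A", "statement A", "council update"],
})
accounts_to_keep = ["PhiladelphiaGov", "IndyDBNS"]

# Boolean mask: True where the row's user_ID is one of the accounts to keep
is_official = sample["user_ID"].isin(accounts_to_keep)
official_only = sample[is_official]

# Drop rows that repeat the same account and tweet text, keeping the first copy
deduplicated = official_only.drop_duplicates(subset=["user_ID", "tweet"])

print(official_only.index.tolist())
print(deduplicated.index.tolist())
```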


In [17]:
final_results.to_csv("Final Results ND.csv")