# Getting Twitter Data for Cities that Have NOT Declared a Climate Emergency #

This notebook focuses on getting Twitter data (all tweets) from the 10 largest cities (population-wise) that have not declared a climate emergency. 

  

The first portion of this notebook builds the commands for a library called "Twitterscraper," which collects data through the command line (CL). We then load the scraped data back into this notebook. 

https://github.com/taspinar/twitterscraper
    
In the last portion, once the data from twitterscraper is loaded, we merge all of the cities' data into one large dataset for analysis. 

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import json      # library for working with JSON-formatted text strings
import pprint as pp    # library for cleanly printing Python data structures
import seaborn as sns
import twitterscraper as ts
from twitterscraper import query_tweets #library downloaded
import os as os

import subprocess #this enables us to pass CL code directly from Jupyter Notebooks 
from subprocess import Popen

## Creating a Twitterscraper Command ## 

The code below takes the Twitter accounts for each city, scrapes *all* of their tweets, and writes one JSON file per account. Rather than pasting the command into the CL, the "launch()" function uses "subprocess" (a module in Python's standard library) to run the command directly from Jupyter Notebooks. 
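
As a rough illustration of handing a command to "subprocess" and saving its output to a log file, here is a minimal sketch. It runs a harmless stand-in command (the Python interpreter printing one line) instead of twitterscraper; the helper name `run_command` and the log file name are made up for this example.

```python
import os
import subprocess
import sys
import tempfile

def run_command(command, log_path):
    """Run `command` (a list of arguments) and send stdout and stderr to `log_path`."""
    with open(log_path, "w") as log_file:
        # subprocess.run waits for the command to finish before returning
        completed = subprocess.run(command, stdout=log_file, stderr=subprocess.STDOUT)
    return completed.returncode

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "demo_log.txt")  # hypothetical log file name
    # stand-in command, used here in place of the real twitterscraper call
    demo_command = [sys.executable, "-c", "print('hello from a subprocess')"]
    exit_code = run_command(demo_command, log_path)
    with open(log_path) as f:
        log_text = f.read()

print(exit_code, log_text.strip())
```

Passing the command as a list of arguments (rather than one long string) means no shell quoting is needed for arguments that contain spaces.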


In [5]:
def json_to_df(json_files):
    data_frames = []
    
    for file in json_files:
        print (file)
        with open(file) as f:
            data = json.load(f)
        
        d = {'username': [x['username'] for x in data],
        'time': [x['timestamp'] for x in data],
        'tweet': [x['text'] for x in data],
        'likes': [x['likes'] for x in data],
        'replies': [x['replies'] for x in data],
        'user_ID' : [x['screen_name'] for x in data]}
    
        data_frames.append(pd.DataFrame.from_dict(d))
    return data_frames

def combine_data(data_frames): #this merges a list of dataframes into one dataframe
    return pd.concat(data_frames)


def buildQuery(accounts):
    scraper_query = ''
    
    #this builds our search query
    for index, each_account in enumerate (accounts):
        next_index = index + 1 #this is so that we don't have an extra "OR" at the end, it "knows" the last thing
        if next_index > len(accounts) - 1: 
            scraper_query = scraper_query + "from:"+ each_account
        else:
            scraper_query = scraper_query + "from:"+ each_account + " OR "
            
    return scraper_query

def launch(command, output):
    print (command)
    
    #stdout and stderr both go to the log file, which is closed automatically
    with open(output, 'w+') as outputFile:
        p = Popen(command, stdout=outputFile, stderr=outputFile, universal_newlines=True)
        p.communicate() #wait for the subprocess to finish before moving on to make the frame

    #errors are written to the log file, so check the exit code instead of captured output
    if p.returncode != 0:
        print ("command exited with code", p.returncode, "- see", output)
            
def scrape(accounts):
    data_files = []
    
    for user in accounts:
        path_to_output_file = user + ".txt" #we'll get both txt and json, but just ignore txt
        path_to_data_file = user + ".JSON"
        data_files.append(path_to_data_file)
        
        query = 'from: ' + user
        command = ["twitterscraper", query, 
                   "--lang", "en", "--all", "-ow", "-p", "40", "-o", path_to_data_file]
        launch(command, path_to_output_file)
 
    return data_files 
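
For reference, the "OR" query that buildQuery() assembles can also be written with a string join, which handles the "no trailing OR" case without tracking indexes. This is only a sketch of an equivalent approach; the name `build_query` is my own and is not part of twitterscraper.

```python
def build_query(accounts):
    """Join account names into one search string: from:a OR from:b OR ..."""
    return " OR ".join("from:" + account for account in accounts)

example_accounts = ["PhiladelphiaGov", "SEPTA", "PhilaOEM"]
query = build_query(example_accounts)
print(query)  # from:PhiladelphiaGov OR from:SEPTA OR from:PhilaOEM
```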

Below, I created a list of all the accounts I wish to scrape. I broke it up into 3 "searches" because this process is extremely time-consuming. However, you can also pass all the accounts to "scrape()" at once; it'll just take an hour or so to get all the data.

In [11]:
#1
phila_accounts = ["PhiladelphiaGov", "GreenworksPhila", "PhillyOTIS", "PhilaParkandRec", "PhilaHsgAuthPHA", "PHLPlanDevelop", "PhilaOEM", "SEPTA"] #add accounts to scrape here
houston_accounts = ["HPARD","HoustonHCDD", "HoustonPlanning", "HoustonOEM", "HouPublicWorks", "HoustonTX", "GreenHoustonTx", "METROHouston", "HoustonOEM"] #add accounts to scrape here
phx_accounts = ["CityofPhoenixAZ", "phxenvironment" , "StreetsPHX", "PhoenixParks", "PhoenixParks", "PHXPlanandDev", "ResilientPHX", "TalkingTrashPHX", "PhoenixMetroBus"] #add accounts to scrape here

#2
sanantonio_accounts = ["COSAGOV", "COSAsustainable", "SAParksandRec", "SATomorrow2040", "SanAntonioOEM"]
dallas_accounts = ["CityOfDallas", "DallasClimate", "DallasParkRec", "DallasPlanUD", "DallasOEM", "dartmedia", "dallaszerowaste"] 
jax_accounts = ["CityofJax", "JaxReady", "JTAFLA"] 

#3
ftworth_accounts = ["CityofFortWorth", "FortWorthParks", "FWOEM", "TrinityMetro", "TarrantCountyTX"] 
charlotte_accounts = ["CLTgov", "CLTSustainable", "CharlotteDOT", "HNScharlotte", "CharMeckEM", "CATSRideTransit", "CLTWater"] 
columbus_accounts = ["ColumbusGov", "SustainableCol1", "CbusMetroParks", "ColumbusDP", "COTABus"] 
indianapolis_accounts = ["Indy_CIO", "SustainIndy", "IndyParksandRec", "IndyDPW", "IndyGoBus", "IndyDMD", "Indy_CIO", "IndyDBNS"] 
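
A few handles appear twice in the lists above (for example "HoustonOEM", "PhoenixParks", and "Indy_CIO"), so those accounts get scraped twice. A small sketch of removing repeats while keeping the original order is below; the helper name `unique_in_order` is made up for this example.

```python
def unique_in_order(accounts):
    """Drop repeated account names, keeping the first occurrence of each."""
    # dict keys are unique and remember insertion order
    return list(dict.fromkeys(accounts))

houston_accounts = ["HPARD", "HoustonHCDD", "HoustonPlanning", "HoustonOEM", "HouPublicWorks",
                    "HoustonTX", "GreenHoustonTx", "METROHouston", "HoustonOEM"]

unique_houston = unique_in_order(houston_accounts)
print(len(houston_accounts), len(unique_houston))  # 9 8
```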


In [6]:
# one big file

nonclimate_accounts = ["PhiladelphiaGov", "GreenworksPhila", "PhillyOTIS", "PhilaParkandRec", "PhilaHsgAuthPHA", 
                       "PHLPlanDevelop", "PhilaOEM", "SEPTA", "HPARD","HoustonHCDD", "HoustonPlanning", "HoustonOEM", 
                       "HouPublicWorks", "HoustonTX", "GreenHoustonTx", "METROHouston", "HoustonOEM", "CityofPhoenixAZ", 
                       "phxenvironment" , "StreetsPHX", "PhoenixParks", "PhoenixParks", "PHXPlanandDev", "ResilientPHX", 
                       "TalkingTrashPHX", "PhoenixMetroBus", "COSAGOV", "COSAsustainable", "SAParksandRec", "SATomorrow2040", 
                       "SanAntonioOEM", "CityOfDallas", "DallasClimate", "DallasParkRec", "DallasPlanUD", "DallasOEM", 
                       "dartmedia", "dallaszerowaste", "CityofJax", "JaxReady", "JTAFLA", "CityofFortWorth", "FortWorthParks", 
                       "FWOEM", "TrinityMetro", "TarrantCountyTX", "CLTgov", "CLTSustainable", "CharlotteDOT", "HNScharlotte", 
                       "CharMeckEM", "CATSRideTransit", "CLTWater", "ColumbusGov", "SustainableCol1", "CbusMetroParks", 
                       "COTABus", "Indy_CIO", "SustainIndy", "IndyParksandRec", "IndyDPW", "IndyGoBus", "IndyDMD", "Indy_CIO", 
                       "IndyDBNS"] 
 
 
                             

nonclimate_output = scrape(nonclimate_accounts) 

['twitterscraper', 'from: PhiladelphiaGov', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhiladelphiaGov.JSON']
['twitterscraper', 'from: GreenworksPhila', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'GreenworksPhila.JSON']
['twitterscraper', 'from: PhillyOTIS', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhillyOTIS.JSON']
['twitterscraper', 'from: PhilaParkandRec', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhilaParkandRec.JSON']
['twitterscraper', 'from: PhilaHsgAuthPHA', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhilaHsgAuthPHA.JSON']
['twitterscraper', 'from: PHLPlanDevelop', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PHLPlanDevelop.JSON']
['twitterscraper', 'from: PhilaOEM', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhilaOEM.JSON']
['twitterscraper', 'from: SEPTA', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'SEPTA.JSON']
['twitterscraper', 'from: HPARD', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'HPARD.JSON']
['tw

In [53]:
# broken into 3 groups

nonclimate_accounts1 = ["PhiladelphiaGov", "GreenworksPhila", "PhillyOTIS", "PhilaParkandRec", "PhilaHsgAuthPHA", "PHLPlanDevelop", "PhilaOEM", "SEPTA", "HPARD","HoustonHCDD", "HoustonPlanning", "HoustonOEM", "HouPublicWorks", "HoustonTX", "GreenHoustonTx", "METROHouston", "HoustonOEM", "CityofPhoenixAZ", "phxenvironment" , "StreetsPHX", "PhoenixParks", "PhoenixParks", "PHXPlanandDev", "ResilientPHX", "TalkingTrashPHX", "PhoenixMetroBus"] 
                             

nonclimate1_output = scrape(nonclimate_accounts1) 

['twitterscraper', 'from: PhiladelphiaGov', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhiladelphiaGov.JSON']
['twitterscraper', 'from: GreenworksPhila', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'GreenworksPhila.JSON']
['twitterscraper', 'from: PhillyOTIS', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhillyOTIS.JSON']
['twitterscraper', 'from: PhilaParkandRec', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhilaParkandRec.JSON']
['twitterscraper', 'from: PhilaHsgAuthPHA', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhilaHsgAuthPHA.JSON']
['twitterscraper', 'from: PHLPlanDevelop', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PHLPlanDevelop.JSON']
['twitterscraper', 'from: PhilaOEM', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'PhilaOEM.JSON']
['twitterscraper', 'from: SEPTA', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'SEPTA.JSON']
['twitterscraper', 'from: HPARD', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'HPARD.JSON']
['tw

In [67]:
nonclimate_accounts_2 = ["COSAGOV", "COSAsustainable", "SAParksandRec", "SATomorrow2040", "SanAntonioOEM", "CityOfDallas", "DallasClimate", "DallasParkRec", "DallasPlanUD", "DallasOEM", "dartmedia", "dallaszerowaste", "CityofJax", "JaxReady", "JTAFLA"] 

nonclimate_output_2 = scrape(nonclimate_accounts_2)

['twitterscraper', 'from: COSAGOV', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'COSAGOV.JSON']
['twitterscraper', 'from: COSAsustainable', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'COSAsustainable.JSON']
['twitterscraper', 'from: SAParksandRec', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'SAParksandRec.JSON']
['twitterscraper', 'from: SATomorrow2040', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'SATomorrow2040.JSON']
['twitterscraper', 'from: SanAntonioOEM', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'SanAntonioOEM.JSON']
['twitterscraper', 'from: CityOfDallas', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'CityOfDallas.JSON']
['twitterscraper', 'from: DallasClimate', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'DallasClimate.JSON']
['twitterscraper', 'from: DallasParkRec', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'DallasParkRec.JSON']
['twitterscraper', 'from: DallasPlanUD', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'Dall

In [62]:
nonclimate_accounts_3 = ["CityofFortWorth", "FortWorthParks", "FWOEM", "TrinityMetro", "TarrantCountyTX", "CLTgov", "CLTSustainable", "CharlotteDOT", "HNScharlotte", "CharMeckEM", "CATSRideTransit", "CLTWater", "ColumbusGov", "SustainableCol1", "CbusMetroParks", "COTABus", "Indy_CIO", "SustainIndy", "IndyParksandRec", "IndyDPW", "IndyGoBus", "IndyDMD", "Indy_CIO", "IndyDBNS"] 


nonclimate_output_3 = scrape(nonclimate_accounts_3)


['twitterscraper', 'from: CityofFortWorth', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'CityofFortWorth.JSON']
['twitterscraper', 'from: FortWorthParks', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'FortWorthParks.JSON']
['twitterscraper', 'from: FWOEM', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'FWOEM.JSON']
['twitterscraper', 'from: TrinityMetro', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'TrinityMetro.JSON']
['twitterscraper', 'from: TarrantCountyTX', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'TarrantCountyTX.JSON']
['twitterscraper', 'from: CLTgov', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'CLTgov.JSON']
['twitterscraper', 'from: CLTSustainable', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'CLTSustainable.JSON']
['twitterscraper', 'from: CharlotteDOT', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'CharlotteDOT.JSON']
['twitterscraper', 'from: HNScharlotte', '--lang', 'en', '--all', '-ow', '-p', '40', '-o', 'HNScharlotte.JSON'

In [22]:
#ignore the below: my original function didn't return the list of .JSON file names, and I didn't want to re-run the scraping process, so this rebuilds the list by hand.

climate_emergency_accounts = ["SeattleOPCD", "CityofSeattle", "seattledot", "SeattleOSE", "kcmetrobus", 
                             "LACity", "LADOTofficial", "lacountyparks", "HCIDLA", "Planning4LA", "metrolosangeles", "PortofLA", 
                             "NYC_DOT", "NYCParks", "NYCHA", "NYCPlanning", "nycemergencymgt", "MTA"]

climate_emergency_output_1 = []
for account in climate_emergency_accounts:
    climate_emergency_output_1.append(account + ".JSON")
    
print (climate_emergency_output_1)

['SeattleOPCD.JSON', 'CityofSeattle.JSON', 'seattledot.JSON', 'SeattleOSE.JSON', 'kcmetrobus.JSON', 'LACity.JSON', 'LADOTofficial.JSON', 'lacountyparks.JSON', 'HCIDLA.JSON', 'Planning4LA.JSON', 'metrolosangeles.JSON', 'PortofLA.JSON', 'NYC_DOT.JSON', 'NYCParks.JSON', 'NYCHA.JSON', 'NYCPlanning.JSON', 'nycemergencymgt.JSON', 'MTA.JSON']


## Converting JSONs to DataFrames ##

json_to_df() takes the list of JSON file names returned by scrape() above and converts each file into a dataframe, returning a list of dataframes. 
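
To make the conversion concrete, here is a small self-contained sketch that writes two made-up records to a temporary JSON file and loads them into a dataframe with the same column names json_to_df() produces. The records and the file name are invented for illustration; only the field names (username, timestamp, text, likes, replies, screen_name) come from the function above.

```python
import json
import os
import tempfile

import pandas as pd

# made-up records, standing in for one account's scraped JSON file
sample_records = [
    {"username": "Example City", "timestamp": "2020-01-01T12:00:00",
     "text": "first made-up tweet", "likes": 1, "replies": 0, "screen_name": "ExampleCity"},
    {"username": "Example City", "timestamp": "2020-01-02T09:30:00",
     "text": "second made-up tweet", "likes": 4, "replies": 2, "screen_name": "ExampleCity"},
]

# JSON field name -> dataframe column name, mirroring json_to_df()
column_map = {"username": "username", "timestamp": "time", "text": "tweet",
              "likes": "likes", "replies": "replies", "screen_name": "user_ID"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ExampleCity.JSON")  # hypothetical file name
    with open(path, "w") as f:
        json.dump(sample_records, f)
    with open(path) as f:
        data = json.load(f)

# keep only the fields we need, then rename them
df = pd.DataFrame(data)[list(column_map)].rename(columns=column_map)
print(df.columns.tolist())
```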

In [68]:
dataframe_1 = json_to_df(nonclimate1_output)

dataframe_2 = json_to_df(nonclimate_output_2)

dataframe_3 = json_to_df(nonclimate_output_3)

PhiladelphiaGov.JSON
GreenworksPhila.JSON
PhillyOTIS.JSON
PhilaParkandRec.JSON
PhilaHsgAuthPHA.JSON
PHLPlanDevelop.JSON
PhilaOEM.JSON
SEPTA.JSON
HPARD.JSON
HoustonHCDD.JSON
HoustonPlanning.JSON
HoustonOEM.JSON
HouPublicWorks.JSON
HoustonTX.JSON
GreenHoustonTx.JSON
METROHouston.JSON
HoustonOEM.JSON
CityofPhoenixAZ.JSON
phxenvironment.JSON
StreetsPHX.JSON
PhoenixParks.JSON
PhoenixParks.JSON
PHXPlanandDev.JSON
ResilientPHX.JSON
TalkingTrashPHX.JSON
PhoenixMetroBus.JSON
COSAGOV.JSON
COSAsustainable.JSON
SAParksandRec.JSON
SATomorrow2040.JSON
SanAntonioOEM.JSON
CityOfDallas.JSON
DallasClimate.JSON
DallasParkRec.JSON
DallasPlanUD.JSON
DallasOEM.JSON
dartmedia.JSON
dallaszerowaste.JSON
CityofJax.JSON
JaxReady.JSON
JTAFLA.JSON
CityofFortWorth.JSON
FortWorthParks.JSON
FWOEM.JSON
TrinityMetro.JSON
TarrantCountyTX.JSON
CLTgov.JSON
CLTSustainable.JSON
CharlotteDOT.JSON
HNScharlotte.JSON
CharMeckEM.JSON
CATSRideTransit.JSON
CLTWater.JSON
ColumbusGov.JSON
SustainableCol1.JSON
CbusMetroParks.JSON
COT

In [43]:
len(dataframe_3)

24

In [None]:
merge1 = combine_data(dataframe_1)
merge2 = combine_data(dataframe_2)
merge3 = combine_data(dataframe_3)

frames = [merge1, merge2, merge3]

result = pd.concat(frames)
result

to_keep = ["PhiladelphiaGov", "GreenworksPhila", "PhillyOTIS", "PhilaParkandRec", "PhilaHsgAuthPHA", "PHLPlanDevelop", 
           "PhilaOEM", "SEPTA", "HPARD","HoustonHCDD", "HoustonPlanning", "HoustonOEM", "HouPublicWorks", "HoustonTX", 
           "GreenHoustonTx", "METROHouston", "HoustonOEM", "CityofPhoenixAZ", "phxenvironment" , "StreetsPHX", "PhoenixParks", 
           "PhoenixParks", "PHXPlanandDev", "ResilientPHX", "TalkingTrashPHX", "PhoenixMetroBus", "COSAGOV", "COSAsustainable",
           "SAParksandRec", "SATomorrow2040", "SanAntonioOEM", "CityOfDallas", "DallasClimate", "DallasParkRec", "DallasPlanUD", 
           "DallasOEM", "dartmedia", "dallaszerowaste", "CityofJax", "JaxReady", "JTAFLA", "CityofFortWorth", "FortWorthParks", 
           "FWOEM", "TrinityMetro", "TarrantCountyTX", "CLTgov", "CLTSustainable", "CharlotteDOT", "HNScharlotte", "CharMeckEM", 
           "CATSRideTransit", "CLTWater", "ColumbusGov", "SustainableCol1", "CbusMetroParks", "ColumbusDP", "COTABus", "Indy_CIO", 
           "SustainIndy", "IndyParksandRec", "IndyDPW", "IndyGoBus", "IndyDMD", "Indy_CIO", "IndyDBNS"] 

                        

final_results = result[result['user_ID'].isin(to_keep)] # the scrape also returned mentions & replies, so keep only tweets posted by the accounts in to_keep
len(final_results)
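
The filtering step can be tried on a tiny made-up dataframe. This sketch assumes only that the data has a "user_ID" column, as above; the rows and the non-official handles are invented. Note that isin() matches names exactly, including upper and lower case.

```python
import pandas as pd

# made-up rows: two from official accounts, two from other users
tweets = pd.DataFrame({
    "user_ID": ["PhiladelphiaGov", "SomeResident", "SEPTA", "AnotherUser"],
    "tweet": ["official update", "reply to the city", "service alert", "mention of SEPTA"],
})

accounts_to_keep = ["PhiladelphiaGov", "SEPTA"]

# True for rows whose user_ID is one of the accounts we scraped
is_official = tweets["user_ID"].isin(accounts_to_keep)
official_only = tweets[is_official]

print(official_only["user_ID"].tolist())  # ['PhiladelphiaGov', 'SEPTA']
print(len(tweets) - len(official_only))   # 2
```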

In [10]:
dataframe_1 = json_to_df(nonclimate_output)

result = pd.concat(dataframe_1)
result

to_keep = ["PhiladelphiaGov", "GreenworksPhila", "PhillyOTIS", "PhilaParkandRec", "PhilaHsgAuthPHA", "PHLPlanDevelop", 
           "PhilaOEM", "SEPTA", "HPARD","HoustonHCDD", "HoustonPlanning", "HoustonOEM", "HouPublicWorks", "HoustonTX", 
           "GreenHoustonTx", "METROHouston", "HoustonOEM", "CityofPhoenixAZ", "phxenvironment" , "StreetsPHX", "PhoenixParks", 
           "PhoenixParks", "PHXPlanandDev", "ResilientPHX", "TalkingTrashPHX", "PhoenixMetroBus", "COSAGOV", "COSAsustainable",
           "SAParksandRec", "SATomorrow2040", "SanAntonioOEM", "CityOfDallas", "DallasClimate", "DallasParkRec", "DallasPlanUD", 
           "DallasOEM", "dartmedia", "dallaszerowaste", "CityofJax", "JaxReady", "JTAFLA", "CityofFortWorth", "FortWorthParks", 
           "FWOEM", "TrinityMetro", "TarrantCountyTX", "CLTgov", "CLTSustainable", "CharlotteDOT", "HNScharlotte", "CharMeckEM", 
           "CATSRideTransit", "CLTWater", "ColumbusGov", "SustainableCol1", "CbusMetroParks", "ColumbusDP", "COTABus", "Indy_CIO", 
           "SustainIndy", "IndyParksandRec", "IndyDPW", "IndyGoBus", "IndyDMD", "Indy_CIO", "IndyDBNS"] 

                        

final_results = result[result['user_ID'].isin(to_keep)] # the scrape also returned mentions & replies, so keep only tweets posted by the accounts in to_keep
final_results.head()

PhiladelphiaGov.JSON
GreenworksPhila.JSON
PhillyOTIS.JSON
PhilaParkandRec.JSON
PhilaHsgAuthPHA.JSON
PHLPlanDevelop.JSON
PhilaOEM.JSON
SEPTA.JSON
HPARD.JSON
HoustonHCDD.JSON
HoustonPlanning.JSON
HoustonOEM.JSON
HouPublicWorks.JSON
HoustonTX.JSON
GreenHoustonTx.JSON
METROHouston.JSON
HoustonOEM.JSON
CityofPhoenixAZ.JSON
phxenvironment.JSON
StreetsPHX.JSON
PhoenixParks.JSON
PhoenixParks.JSON
PHXPlanandDev.JSON
ResilientPHX.JSON
TalkingTrashPHX.JSON
PhoenixMetroBus.JSON
COSAGOV.JSON
COSAsustainable.JSON
SAParksandRec.JSON
SATomorrow2040.JSON
SanAntonioOEM.JSON
CityOfDallas.JSON
DallasClimate.JSON
DallasParkRec.JSON
DallasPlanUD.JSON
DallasOEM.JSON
dartmedia.JSON
dallaszerowaste.JSON
CityofJax.JSON
JaxReady.JSON
JTAFLA.JSON
CityofFortWorth.JSON
FortWorthParks.JSON
FWOEM.JSON
TrinityMetro.JSON
TarrantCountyTX.JSON
CLTgov.JSON
CLTSustainable.JSON
CharlotteDOT.JSON
HNScharlotte.JSON
CharMeckEM.JSON
CATSRideTransit.JSON
CLTWater.JSON
ColumbusGov.JSON
SustainableCol1.JSON
CbusMetroParks.JSON
COT

Unnamed: 0,username,time,tweet,likes,replies,user_ID
1,City of Philadelphia,2010-04-13T11:01:49,MILITARY TRAINING EXERCISE TO TAKE PLACE IN GR...,0,0,PhiladelphiaGov
7,City of Philadelphia,2010-03-12T11:50:39,STATEMENT FROM MAYOR NUTTER FOLLOWING PRESIDEN...,0,0,PhiladelphiaGov
9,City of Philadelphia,2010-02-24T12:58:03,CITY EMPLOYEE SENTENCED FOR THEFT FROM NON-PRO...,0,0,PhiladelphiaGov
15,City of Philadelphia,2010-02-12T13:43:45,@SEPTA has the 65 Bus been detoured up 63rd St...,0,0,PhiladelphiaGov
16,City of Philadelphia,2010-02-11T15:50:33,Update from Mayor Nutter on the City's storm r...,0,0,PhiladelphiaGov


In [65]:
final_results.to_csv("Non-Climate_Final Results.csv")

In [None]:
df = pd.read_csv("Non-Climate_Final Results.csv")
len(df)
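
One thing to watch for when saving and re-loading: to_csv() writes the dataframe index as an extra first column by default, and read_csv() then names it "Unnamed: 0" (which looks like where the "Unnamed: 0" column in the preview above comes from). A small sketch with made-up data and temporary file names shows the difference that index=False makes.

```python
import os
import tempfile

import pandas as pd

# made-up data with a non-default index, similar to a filtered dataframe
df_out = pd.DataFrame({"user_ID": ["PhiladelphiaGov", "SEPTA"], "likes": [0, 3]}, index=[1, 7])

with tempfile.TemporaryDirectory() as tmp_dir:
    with_index = os.path.join(tmp_dir, "with_index.csv")        # hypothetical file names
    without_index = os.path.join(tmp_dir, "without_index.csv")

    df_out.to_csv(with_index)                   # default: the index is written too
    df_out.to_csv(without_index, index=False)   # the index is left out

    columns_with_index = pd.read_csv(with_index).columns.tolist()
    columns_without_index = pd.read_csv(without_index).columns.tolist()

print(columns_with_index)     # ['Unnamed: 0', 'user_ID', 'likes']
print(columns_without_index)  # ['user_ID', 'likes']
```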

In [4]:
len(nonclimate_output)

65