# Getting Twitter Data for Cities that Have Declared a Climate Emergency #

This notebook collects Twitter data (all tweets) for the 10 largest cities, by population, that have declared a climate emergency:

- Los Angeles
- Seattle
- Denver
- New York
- Chicago
- San Diego
- San Jose
- Austin
- San Francisco
- Boston

The first portion of this notebook builds and prints the command that we'll use with a library called "Twitterscraper." This package collects data through the command line (CL), and we'll load its output back into this notebook.

https://github.com/taspinar/twitterscraper
    
In the last portion, once the data from twitterscraper is loaded, we'll merge all of the cities' data into one large dataset for analysis.
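
As a rough illustration of that final merge step, here is a minimal sketch using only the standard library. It assumes each city's scrape is a JSON array of tweet records; the file names, the sample tweet text, and the `city` key are hypothetical and exist only for illustration. With DataFrames, `pd.concat` plays the same role.

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for per-city scrape results.
sample = {
    "Seattle": [{"username": "seattledot", "text": "Road closure update"}],
    "Los Angeles": [{"username": "LADOT", "text": "New bus lanes"},
                    {"username": "LA Metro", "text": "Service alert"}],
}

def merge_city_files(paths_by_city):
    """Read each city's JSON file and return one list, tagging records by city."""
    merged = []
    for city, path in paths_by_city.items():
        with open(path) as f:
            records = json.load(f)
        for record in records:
            # copy the record and add the (hypothetical) city label
            merged.append({**record, "city": city})
    return merged

with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    for city, records in sample.items():
        path = os.path.join(tmp, city.replace(" ", "_") + ".json")
        with open(path, "w") as f:
            json.dump(records, f)
        paths[city] = path
    merged = merge_city_files(paths)

print(len(merged), "records from", len(paths), "cities")
```

Tagging each record with its city before merging keeps the source of every tweet recoverable once everything sits in one dataset.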

In [304]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import json      # library for working with JSON-formatted text strings
import pprint as pp    # library for cleanly printing Python data structures
import seaborn as sns
import twitterscraper as ts
from twitterscraper import query_tweets  # third-party package, installed separately
import os

import subprocess  # lets us pass CL commands directly from the Jupyter notebook
from subprocess import Popen

## Creating a Twitterscraper Command ## 

The code below scrapes *all* of the tweets from each city's Twitter accounts and saves the results as JSON. Rather than pasting the command into the CL, the `launch` function uses "subprocess" (a module in Python's standard library) to run the command directly from the Jupyter notebook.
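
To make the subprocess idea concrete, here is a minimal sketch that runs a command given as an argument list and captures its output. The command is a stand-in (the Python interpreter printing one line) so the example runs without twitterscraper installed; it is an assumption for illustration, not the scraper call itself.

```python
import subprocess
import sys

# Stand-in command: each argument is its own list item, so no shell quoting is needed.
command = [sys.executable, "-c", "print('hello from a subprocess')"]

# run() waits for the command to finish; capture_output collects stdout and stderr.
result = subprocess.run(command, capture_output=True, text=True)

print("exit code:", result.returncode)
print("stdout:", result.stdout.strip())
```

Passing a list instead of a single string avoids going through a shell, which is why a query containing spaces can be handed over as one item.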


In [1]:
def json_to_df(*json_files):
    """Load each twitterscraper JSON file into its own DataFrame; returns a list."""
    data_frames = []

    for json_file in json_files:
        with open(json_file) as f:
            data = json.load(f)

        # one column per tweet field we keep
        d = {'username': [x['username'] for x in data],
             'time': [x['timestamp'] for x in data],
             'tweet': [x['text'] for x in data],
             'likes': [x['likes'] for x in data],
             'replies': [x['replies'] for x in data],
             'user_ID': [x['screen_name'] for x in data]}

        data_frames.append(pd.DataFrame.from_dict(d))
    return data_frames

def combine_data(*data_frames):  # "*" lets us pass any number of dataframes to merge
    return pd.concat(data_frames)

def buildQuery(accounts):
    # this builds our search query: "from:a OR from:b ..."
    # join() only puts " OR " between accounts, so there is no extra "OR" at the end
    return " OR ".join("from:" + each_account for each_account in accounts)

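def launch(command, output):
    print(command)

    # send the scraper's stdout and stderr to the log file; "with" closes it for us
    with open(output, 'w+') as outputFile:
        p = Popen(command, stdout=outputFile, stderr=outputFile, universal_newlines=True)
        p.communicate()  # wait for the subprocess to finish before moving on

    # both streams go to the file, so check the exit code rather than captured text
    if p.returncode != 0:
        print("command exited with code", p.returncode, "- see", output)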
            
def scrape(accounts):
    data_files = []
    
    for user in accounts:
        path_to_output_file = user + ".txt" #we'll get both txt and json, but just ignore txt
        path_to_data_file = user + ".JSON"
        data_files.append(path_to_data_file)
        
        query = 'from:' + user  # no space after "from:", matching buildQuery
        command = ["twitterscraper", query, 
                   "--lang", "en", "--all", "-ow", "-p", "40", "-o", path_to_data_file]
        launch(command, path_to_output_file)
 

    #twitterscraper_query = buildQuery(accounts)
    #command = ["twitterscraper", twitterscraper_query, "--lang", "en", "-o", path_to_data_file, "--all", "-ow", "-p", "40"]
    #launch(command, path_to_output_file)
    
    return data_files 
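
Before launching a long scrape, it can help to preview the exact command line. The sketch below mirrors the argument list assembled in `scrape` and renders it with `shlex.join` (Python 3.8+). The `build_command` helper, the account names, and the output file names are hypothetical; the flags are the ones used above.

```python
import shlex

def build_command(query, data_file):
    """Hypothetical helper mirroring the argument list that scrape() assembles."""
    return ["twitterscraper", query,
            "--lang", "en", "--all", "-ow", "-p", "40", "-o", data_file]

# A single-account query contains no spaces, so it needs no quoting.
single = build_command("from:example_account", "example_account.JSON")
print(shlex.join(single))

# A multi-account query contains spaces, so shlex.join quotes it as one argument.
multi = build_command("from:account_a OR from:account_b", "combined.JSON")
print(shlex.join(multi))
```

The printed line is what you would paste into the CL by hand, which makes it easy to compare against what `launch` prints.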

In [377]:

# json_to_df returns a list of DataFrames, so merge them before selecting a column
climate_emergency_frame = combine_data(*json_to_df(climate_emergency_output))
climate_emergency_frame['username'].value_counts()


King County Metro 🚏 🚌🚎⛴🚐                              2921
seattledot                                            2884
LA Metro                                              2808
NYC DOT                                               1270
NYC Parks                                             1075
NYCHA                                                  678
MTA. Stay Home. Stop the Spread.                       511
NYC Emergency Management                               332
Los Angeles County Parks & Recreation                  228
LADOT                                                  200
Port of Los Angeles                                    152
Seattle Office of Planning & Community Development     114
NYCPlanning                                             80
City of Seattle                                         72
Seattle OSE                                             63
City of Los Angeles                                     62
Los Angeles City Planning                               