# CA Gubernatorial and State Elections

This is a new Twitter scraping notebook for the California gubernatorial race and other state elections.

### Gubernatorial
The project tracks the announced and possible candidates listed in the _LA Times_ article [_California's next governor: Who's running, who's on the fence?_](http://www.latimes.com/politics/la-pol-ca-california-governor-list-2018-htmlstory.html)

This is a list of the candidates listed in the article as of September 6th, 2017:

**Declared**
* [Gavin Newsom - D](https://twitter.com/GavinNewsom) [02/10/2015](http://www.latimes.com/local/politics/la-me-pol-gavin-newsom-20150212-story.html)
* [John Chiang - D](https://twitter.com/JohnChiangCA) [05/17/2016](http://www.latimes.com/politics/la-pol-sac-essential-poli-john-chiang-jumps-into-californias-2018-governor-1463506797-htmlstory.html)
* [Antonio Villaraigosa - D](https://twitter.com/antonio4ca) [11/10/2016](http://www.dailynews.com/2016/11/10/former-la-mayor-antonio-villaraigosa-launches-bid-for-california-governor/)
* [Delaine Eastin - D](https://twitter.com/DelaineEastin) [11/01/2016](https://ballotpedia.org/Delaine_Eastin)
* [John Cox - R](https://twitter.com/TheRealJohnHCox) [03/07/2017](https://en.wikipedia.org/wiki/John_H._Cox#2018_California_gubernatorial_election)
* [Travis Allen - R](https://twitter.com/JoinTravisAllen) [06/22/2017](https://ballotpedia.org/Travis_Allen)
* [Zoltan Istvan - L](https://twitter.com/zoltan_istvan) [02/12/2017](http://www.newsweek.com/zoltan-istvan-california-governor-libertarian-555088)

**Potential**
* [Kevin Faulconer - R](https://twitter.com/Kevin_Faulconer)
* [Eric Garcetti - D](https://twitter.com/ericgarcetti)
* [Tom Steyer - D](https://twitter.com/TomSteyer)
* [Ashley Swearengin - R](https://twitter.com/ashleycvcf)
* [Steve Westly - D](https://twitter.com/SteveWestly)

### Senate
This is a list, compiled from our staff meetings and [Wikipedia](https://en.wikipedia.org/wiki/United_States_Senate_election_in_California,_2018), of likely candidates for the 2018 Senate race, along with people of interest in CA politics such as Kamala Harris and Jerry Brown.

**Declared**
* [Topher Brennan - D](https://twitter.com/tophertbrennan)
* [Dianne Feinstein - D](https://twitter.com/SenFeinstein)
* [Pat Harris - D](https://twitter.com/PatHarrisCA)
* [David Hildebrand - D](https://twitter.com/David4SenateCA)
* Douglas Howard Pierce - D (no twitter account)
* [John Melendez - D](https://twitter.com/stutteringjohnm)
* [Joe Sanberg - D](https://twitter.com/JosephNSanberg)
* [Steve Stokes - D](https://twitter.com/Stokes4Senate)
* [Kevin de León - D](https://twitter.com/kdeleon)
* Timothy Charles Kalemkarian - R (no twitter account)
* [Caren Lancona - R](https://twitter.com/Carenlancona4Se)
* Stephen James Schrader - R (no twitter account)

**Potential**
* [Eric Garcetti - D](https://twitter.com/ericgarcetti)
* [Loretta Sanchez - D](https://twitter.com/LorettaSanchez)
* [Brad Sherman - D](https://twitter.com/BradSherman)
* [Tom Steyer - D](https://twitter.com/TomSteyer)
* [Eric Swalwell - D](https://twitter.com/RepSwalwell)
* [Kevin Faulconer - R](https://twitter.com/Kevin_Faulconer)
* [Caitlyn Jenner - R](https://twitter.com/Caitlyn_Jenner)
* [Ashley Swearengin - R](https://twitter.com/ashleycvcf)

### People of Interest
This is a list of top California officials and politicians that may be useful for this research. 

**CA Exec**
* [Jerry Brown - D](https://twitter.com/jerrybrowngov)
* [Xavier Becerra - D](https://twitter.com/AGBecerra)
* [Betty Yee - D](https://twitter.com/BettyYeeforCA)
* [Dave Jones - D](https://twitter.com/CA_DaveJones)
* [Tom Torlakson - D](https://twitter.com/TomTorlakson)

**CA Legislature**
* [Toni G. Atkins - D](https://twitter.com/toniatkins)
* [Bill Monning - D](https://twitter.com/billmonning)
* [Jean Fuller - R](https://twitter.com/JeanFuller)
* [Anthony Rendon - D](https://twitter.com/Rendon63rd)
* [Chris Holden - D](https://twitter.com/ChrisHoldenNews)
* [Chad Mayes - R](https://twitter.com/ChadMayesCA)

**U.S. Senate (CA)**
* [Kamala Harris - D](https://twitter.com/KamalaHarris)

In [2]:
# coding: utf-8

In [3]:
# import necessary python packages
import sys
#sys.path.append("/usr/local/lib/python2.7/site-packages")
import tweepy #https://github.com/tweepy/tweepy
import dropbox #https://www.dropbox.com/developers-v1/core/docs/python
import csv
import time
import os
from datetime import datetime
from collections import defaultdict
import logging
import gspread
import pandas as pd
import numpy as np
from openpyxl import load_workbook
from unidecode import unidecode

#Twitter and Dropbox API credentials
import api_cred as ac

In [4]:
# setup debug logging
logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

In [5]:
# modify print precision for easier debugging
np.set_printoptions(precision=20)

# Helper Functions

In [6]:
def authenticate_twitter():
  auth = tweepy.OAuthHandler(ac.consumer_key, ac.consumer_secret)
  auth.set_access_token(ac.access_key, ac.access_secret)
  api = tweepy.API(auth)
  return api

In [7]:
def get_new_tweets(tweet_name, since_id):
  # download every tweet from tweet_name that is newer than since_id, oldest first
  api = authenticate_twitter()
  tweets = []
  # the first request returns the newest tweets, up to 200 per page
  new_tweets = api.user_timeline(screen_name = tweet_name, since_id = since_id, count = 200)
  tweets.extend(new_tweets)
  # page backwards: each request asks only for tweets older than the oldest one seen so far
  while (len(new_tweets) > 0):
    max_id = tweets[-1].id - 1
    new_tweets = api.user_timeline(screen_name = tweet_name, since_id = since_id, count = 200, max_id = max_id)
    tweets.extend(new_tweets)
  
  # keep the id, timestamp, text, three empty columns, retweet count, and favorite count
  tweets = [[tweet.id_str, tweet.created_at, tweet.text, "", "", "",tweet.retweet_count, tweet.favorite_count] for tweet in tweets]
  logger.info("Downloading %d tweets from %s" % (len(tweets), tweet_name))
  return tweets[::-1]
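
The paging logic in `get_new_tweets` can be tried without network access. The sketch below is only an illustration: `FakeTweet`, `fake_timeline`, and `fetch_all` are hypothetical names that do not exist in tweepy, and the fake timeline is assumed to behave like a newest-first endpoint that honors `since_id` and `max_id`.

```python
from collections import namedtuple

# Hypothetical stand-in for a tweet object; only the `id` field matters here.
FakeTweet = namedtuple("FakeTweet", ["id"])

def fake_timeline(all_ids, since_id, max_id=None, count=200):
    """Mimic a timeline endpoint: newest first, ids in (since_id, max_id]."""
    ids = [i for i in sorted(all_ids, reverse=True)
           if i > since_id and (max_id is None or i <= max_id)]
    return [FakeTweet(i) for i in ids[:count]]

def fetch_all(all_ids, since_id, count=200):
    """Page backwards with max_id until a request returns nothing."""
    tweets = []
    max_id = None
    while True:
        page = fake_timeline(all_ids, since_id, max_id=max_id, count=count)
        if not page:
            break
        tweets.extend(page)
        # The next page must end strictly below the oldest tweet seen so far.
        max_id = page[-1].id - 1
    # Return oldest first, matching get_new_tweets.
    return tweets[::-1]

ids = range(1, 11)
result = [t.id for t in fetch_all(ids, since_id=3, count=4)]
print(result)  # [4, 5, 6, 7, 8, 9, 10]
```

Because the result is ordered oldest first, its last element carries the newest ID, which is the value to store as the next `since_id`.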

In [8]:
def get_lists(df):
  # collect the twitter handles, last acquired tweet IDs, and tweet counts into lists
  # keep only non-empty handles (a blank cell is read as NaN)
  names = [x for x in df.iloc[:, 1] if pd.notnull(x)]
  max_ids = df.iloc[:, 2]
  counts = df.iloc[:, 3]
  
  # number the entries starting from 1
  indices = range(1,len(names)+1)
  
  # build a real list so that the first entry can be deleted
  lists = list(zip(names, max_ids, counts, indices))
  del lists[0] # the first one is column title
  return lists

In [9]:
def load_sheets(path):
  sheet_book = load_workbook(path)
  sheet_writer = pd.ExcelWriter(path, engine='openpyxl')
  sheet_writer.book = sheet_book
  sheet_writer.sheets = dict((ws.title, ws) for ws in sheet_book.worksheets)
  logger.info("Downloaded %s" % path)
  return sheet_writer

# Write to Sheets ↓

The functions below write data to the local copies of the sheets.

## Scrape tweets and save them to their respective Excel files

*Twitter_List.xlsx* contains the list of accounts for each tweets Excel file, along with the metadata (last tweet ID pulled, tweet count, and last pull date).

*cand_tweets.xlsx* contains the tweets for all announced candidates.

*spec_tweets.xlsx* contains the tweets for all speculated candidates.

*rep_tweets.xlsx* contains the tweets for all current CA representatives of interest for the CA 2018 elections.

In [66]:
#############################################
# collect_data(tweet_sheet, twitter_list, twitter_sheet)
# This function pulls the latest tweets and appends them to the correct excel file
# params: tweet_sheet - the path to the excel file to save the new tweets 
#         twitter_list - path to file that contains the twitter handles, last tweet pulled id, tweet counts, and 
#                        last pull date
#         twitter_sheet - the correct sheet of accounts corresponding to the tweet sheet passed in
# returns: n/a
def collect_data(tweet_sheet, twitter_list, twitter_sheet):
  # start timer
  start = time.time()
  logger.info("Start...")
  # dp_client = authenticate_dropbox()
  
  # process the paths so they are passable to load_sheets
  tweets_path = os.path.expanduser(tweet_sheet)
  twitter_path = os.path.expanduser(twitter_list)
  
  # load and prepare list of twitter accounts    
  list_writer = load_sheets(twitter_path)
  list_df = pd.read_excel(twitter_path, twitter_sheet)
  list_df = list_df.dropna(thresh=4)
  
  # list_df['Last_Pulled'] = pd.to_datetime(list_df['Last_Pulled'], errors='coerce') 
  # properly load spreadsheet to append new data
  tweet_writer = load_sheets(tweets_path)
  
  # loop through the list of accounts and update each tweet sheet appropriately
  for index, row in list_df.iterrows():       
    name, since_id, count = row.iloc[1], row.iloc[2], row.iloc[3]
    
    # only request tweets newer than the last one already pulled
    new_tweets = get_new_tweets(name, since_id)
    # if there are no new tweets continue to the next account
    if (len(new_tweets) > 0):
      # turn the new tweets into a dataframe and write them to the corresponding excel sheet
      df = pd.DataFrame(new_tweets)
      
      # update since_id, count, and last_pull date in tweet list
      # iterrows yields index labels, so use label-based .at (positional .iat can hit the wrong row after dropna)
      cols = list_df.columns
      list_df.at[index, cols[2]] = new_tweets[-1][0] # since_id
      list_df.at[index, cols[3]] = count + len(new_tweets) # tweet count
      list_df.at[index, cols[4]] = pd.to_datetime(time.strftime("%m/%d/%Y %H:%M:%S"), errors='coerce') # last_pull date
      
      logger.info("Updated new tweets on spreadsheet for %s" % name)
      time.sleep(100)
  
  # write the updated list and save the changes to the excel sheets
  list_df.to_excel(list_writer, sheet_name=twitter_sheet, index=False)
  tweet_writer.save()
  list_writer.save()
  
  logger.info("Done appending new tweets")
  # stop timer and print time elapsed for the current data pull
  end = time.time()
  logger.info("Time Elapsed: %.2f minutes", (end - start) / 60.0)

#### Tweets Pull Variables

In [67]:
# set file pathway variables and expand to HOME
data_dir = '~/Dropbox/Summer_of_Tweets/ca_working_sheets/'

# the excel sheets containing the tweets
cand_tweets = "cand_tweets.xlsx"
spec_tweets = "spec_tweets.xlsx"
rep_tweets = "rep_tweets.xlsx"

# the excel file containing the accounts and its sheets
twitter_list = "Twitter_List.xlsx"
cand_sheet = "cand"
spec_sheet = "speculated"
rep_sheet = "reps"
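
The paths above are built by string concatenation and only expanded inside `collect_data`. As a minimal sketch of an alternative, the hypothetical helper `sheet_path` below (not part of the notebook) expands `~` and joins the directory and file name in one step, so a missing trailing slash in `data_dir` would not matter.

```python
import os

# Same directory as in the notebook; the trailing slash is optional here.
data_dir = "~/Dropbox/Summer_of_Tweets/ca_working_sheets/"

def sheet_path(filename, base=data_dir):
    """Expand "~" in the base directory and join it with the file name."""
    return os.path.join(os.path.expanduser(base), filename)

path = sheet_path("cand_tweets.xlsx")
print(path)
```

A path built this way can be passed straight to `load_sheets` or `pd.read_excel`.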

In [None]:
collect_data(data_dir + cand_tweets, data_dir + twitter_list, cand_sheet)
#print 
collect_data(data_dir + spec_tweets, data_dir + twitter_list, spec_sheet)
#print 
collect_data(data_dir + rep_tweets, data_dir + twitter_list, rep_sheet)

INFO:__main__:Start...
INFO:__main__:Downloaded /Users/SoloMune/Dropbox/Summer_of_Tweets/ca_working_sheets/Twitter_List.xlsx
INFO:__main__:Downloaded /Users/SoloMune/Dropbox/Summer_of_Tweets/ca_working_sheets/cand_tweets.xlsx
INFO:__main__:Downloading 3248 tweets from GavinNewsom
INFO:__main__:Updated new tweets on spreadsheet for GavinNewsom
INFO:__main__:Downloading 2101 tweets from JohnChiangCA
INFO:__main__:Updated new tweets on spreadsheet for JohnChiangCA


In [13]:
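# testing this to fix the encoding issue with Antonio Villaraigosa's tweets
def FormatString(s):
  # transliterate non-ASCII unicode strings to ASCII; return everything else unchanged
  if isinstance(s, unicode):
    try:
      s.encode('ascii')
      return s
    except UnicodeEncodeError:
      return unidecode(s)
  else:
    return s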
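
For comparison, a similar ASCII fallback can be sketched with the standard library alone. This is a different technique from `unidecode`: it relies on Unicode NFKD decomposition and simply drops any character that has no ASCII base letter, whereas `unidecode` transliterates a much wider range of characters. The helper name `to_ascii` is hypothetical.

```python
import unicodedata

def to_ascii(text):
    """Return an ASCII-only approximation of `text` using only the standard library."""
    # NFKD splits an accented character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with "ignore" drops the combining marks and any other non-ASCII character.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii(u"Kevin de Le\u00f3n"))  # Kevin de Leon
```

Characters without a decomposition, such as emoji, are silently removed, so this sketch only suits cases where losing them is acceptable.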
  