# Filename: get_data.ipynb

# Purpose:

This notebook downloads the tweets we need for our project and stores them in a pandas DataFrame.

We currently collect the data in two ways: by searching for specific keywords related to the 2020 Presidential Election, and by collecting tweets with hashtags about the 2020 Presidential Election and its candidates.

## Install Libraries

In [2]:
#!pip install tweepy # Run this line only if you don't have tweepy installed

Collecting tweepy
  Downloading https://files.pythonhosted.org/packages/36/1b/2bd38043d22ade352fc3d3902cf30ce0e2f4bf285be3b304a2782a767aec/tweepy-3.8.0-py2.py3-none-any.whl
Collecting requests-oauthlib>=0.7.0 (from tweepy)
  Downloading https://files.pythonhosted.org/packages/c2/e2/9fd03d55ffb70fe51f587f20bcf407a6927eb121de86928b34d162f0b1ac/requests_oauthlib-1.2.0-py2.py3-none-any.whl
Collecting oauthlib>=3.0.0 (from requests-oauthlib>=0.7.0->tweepy)
  Downloading https://files.pythonhosted.org/packages/05/57/ce2e7a8fa7c0afb54a0581b14a65b56e62b5759dbc98e80627142b8a3704/oauthlib-3.1.0-py2.py3-none-any.whl (147kB)
Installing collected packages: oauthlib, requests-oauthlib, tweepy
Successfully installed oauthlib-3.1.0 requests-oauthlib-1.2.0 tweepy-3.8.0


In [1]:
import pandas as pd

import tweepy 
from tweepy import OAuthHandler

### Fetch Tweets using Twitter API with `Tweepy`

The class below is adapted from [this code](https://www.kaggle.com/amar09/sentiment-analysis-on-scrapped-tweets?source=post_page-----1804db3478ac----------------------) by Kaggle user [Amardeep Chauhan](https://www.kaggle.com/amar09).

The class uses `tweepy` to access the Twitter API and fetch tweets relating to a specified keyword. 

The keywords we will use are:

- 2020 Presidential Election
- #2020Election
- #2020PresidentialElection
- #Election2020
- #KnowThe2020Candidates
- #POTUS2020
- #2020America
- political candidates
- Names of all candidates

In [37]:
# Keys and tokens
consumer_key = ''
consumer_secret = ''

access_token = ''
access_token_secret = ''
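
Hard-coding keys in a notebook makes it easy to publish them by accident. The following is a minimal sketch of one alternative, assuming the four values are exported as environment variables. The environment variable names and the `load_twitter_credentials` helper are hypothetical and are not part of the original notebook.

```python
import os

# Hypothetical environment variable names, keyed by the notebook's variable names.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_credentials(environ=None):
    """Return the four credentials as a dict, or raise KeyError if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [env for env in ENV_NAMES.values() if not environ.get(env)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {key: environ[env] for key, env in ENV_NAMES.items()}


# Usage with a stand-in mapping; pass nothing to read the real os.environ.
example_env = {env: "placeholder" for env in ENV_NAMES.values()}
credentials = load_twitter_credentials(example_env)
print(sorted(credentials))
```

The returned values could then be assigned to `consumer_key`, `consumer_secret`, `access_token`, and `access_token_secret` before `TwitterClient` is created.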

In [12]:
class TwitterClient(object):
    def __init__(self):
        """
        Initialization method. Creates a tweepy API object that is used to fetch tweets.
        """
        try:
            # Create OAuthHandler Object
            auth = OAuthHandler(consumer_key, consumer_secret)
            
            # Set access token and secret token
            auth.set_access_token(access_token, access_token_secret)
            
            # Create tweepy API object to fetch tweets
            self.api = tweepy.API(auth, wait_on_rate_limit = True, wait_on_rate_limit_notify = True)
            
        except tweepy.TweepError as e:
            print(f'Error: Twitter Authentication Failed - \n{str(e)}')
            
    """
    Fetches tweets using a specified query. Stores the tweets in a list after
    parsing them, which means extracting only the text and appending only unique tweets 
    to the resultant list.
    
    self: The TwitterClient object that will help us use the Twitter API.
    query: The specified query to search for.
    max_tweets: The maximum number of tweets in total to fetch. The default is 1000.
    
    Returns a list of unique tweets relating to the keyword. The length is not
    necessarily going to have max_tweets entries due to retweets. 
    """
    def get_tweets(self, query, max_tweets = 1000):
        tweets = []
        since_Id = None
        max_id = -1
        tweet_count = 0
        tweets_per_query = 100
        
        print('Fetching tweets for query:', query + '...')
        
        while tweet_count < max_tweets:
            try:
                # First request: max_id is still -1, so no upper bound on tweet IDs has been set yet.
                if(max_id <= 0):
                    if(not since_Id): # Get any tweets relating to the query
                        new_tweets = self.api.search(q = query, count = tweets_per_query)
                        
                    else: # Get tweets more recent than since_Id
                        new_tweets = self.api.search(q = query, count = tweets_per_query, since_id = since_Id)
                else:
                    if(not since_Id):
                        new_tweets = self.api.search(q = query, count = tweets_per_query, max_id = str(max_id - 1))
                        
                    else:
                        new_tweets = self.api.search(q = query, count = tweets_per_query, max_id = str(max_id - 1), 
                                                     since_id = since_Id)
                
                if not new_tweets:
                    print('No more tweets found.')
                    break
                
                # Start parsing the list of tweets
                for tweet in new_tweets:
                    parsed_tweet = {}
                    parsed_tweet['tweets'] = tweet.text
                    
                    # Append parsed tweet to tweets list
                    if tweet.retweet_count > 0: 
                        if parsed_tweet not in tweets: # If tweet has retweets, ensure that it's appended only once
                            tweets.append(parsed_tweet)
                    else:
                        tweets.append(parsed_tweet)
                
                tweet_count += len(new_tweets)
                print('\tDownloaded {0} tweets'.format(tweet_count))
                max_id = new_tweets[-1].id # Prepare to get tweets that are older than the returned tweets 
                
            except tweepy.TweepError as e: # Exit if there are any errors
                print('Tweepy Error: ' + str(e))
                break
                
        print('Finished!\n')
        
        # Return list of tweets
        return tweets
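
The loop in `get_tweets` pages backwards through search results: after each batch it records the ID of the oldest tweet returned and asks the next request for IDs strictly below it. The sketch below illustrates that cursor pattern without any network access. It assumes a stand-in search function over made-up integer IDs; `paginate_by_max_id` and `make_fake_search` are hypothetical names and are not part of tweepy or the original class.

```python
def paginate_by_max_id(search, page_size=100, max_items=1000):
    """Collect items newest-first, walking backwards with a max_id cursor."""
    collected = []
    max_id = None  # None means "no upper bound yet" (the notebook uses -1 for this)
    while len(collected) < max_items:
        page = search(count=page_size, max_id=max_id)
        if not page:
            break  # nothing older is left
        collected.extend(page)
        # Next request: only IDs strictly below the oldest one seen so far.
        max_id = page[-1]["id"] - 1
    return collected[:max_items]


def make_fake_search(ids):
    """Return a stand-in for a search endpoint over a fixed set of IDs."""
    ordered = sorted(ids, reverse=True)  # newest (largest ID) first

    def search(count, max_id=None):
        eligible = [i for i in ordered if max_id is None or i <= max_id]
        return [{"id": i} for i in eligible[:count]]

    return search


fake_search = make_fake_search(range(1, 251))
items = paginate_by_max_id(fake_search, page_size=100, max_items=1000)
print(len(items), items[0]["id"], items[-1]["id"])  # prints: 250 250 1
```

Unlike the notebook's loop, this sketch trims the result to `max_items`, so the final batch cannot push the total past the requested maximum.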
                    

In [13]:
twitter_client = TwitterClient()

In [33]:
# List of queries
queries = ['2020 Presidential Election', '#2020Election', '#2020PresidentialElection', 
            '#Election2020', '#KnowThe2020Candidates', '#POTUS2020', '#2020America', 'political candidates', 
            'michael bennet', 'joe biden', 'cory booker', 'steve bullock', 'pete buttigieg', 'julian castro', 
            'bill de blasio', 'john delaney', 'tulsi gabbard', 'kamala harris', 'amy klobuchar', 'wayne messam', 'beto o rourke', 
            'tim ryan', 'bernie sanders', 'joe sestak', 'tom steyer', 'elizabeth warren', 'marianne williamson', 
            'andrew yang', '#yanggang', 'mark sanford', 'donald trump', 'joe walsh', 'william weld']

In [34]:
# Start fetching tweets
all_tweets = []

for query in queries:
    tweets = twitter_client.get_tweets(query, max_tweets = 5000)
    all_tweets.extend(tweets)

Fetching tweets for query: 2020 Presidential Election...
	Downloaded 99 tweets
	Downloaded 199 tweets
	Downloaded 299 tweets
	Downloaded 399 tweets
	Downloaded 499 tweets
	Downloaded 599 tweets
	Downloaded 699 tweets
	Downloaded 799 tweets
	Downloaded 899 tweets
	Downloaded 999 tweets
	Downloaded 1099 tweets
	Downloaded 1199 tweets
	Downloaded 1299 tweets
	Downloaded 1399 tweets


Rate limit reached. Sleeping for: 318


	Downloaded 1499 tweets
	Downloaded 1599 tweets
	Downloaded 1699 tweets
	Downloaded 1799 tweets
	Downloaded 1899 tweets
	Downloaded 1999 tweets
	Downloaded 2099 tweets
	Downloaded 2199 tweets
	Downloaded 2299 tweets
	Downloaded 2399 tweets
	Downloaded 2499 tweets
	Downloaded 2599 tweets
	Downloaded 2699 tweets
	Downloaded 2799 tweets
	Downloaded 2899 tweets
	Downloaded 2999 tweets
	Downloaded 3099 tweets
	Downloaded 3199 tweets
	Downloaded 3299 tweets
	Downloaded 3399 tweets
	Downloaded 3499 tweets
	Downloaded 3599 tweets
	Downloaded 3699 tweets
	Downloaded 3799 tweets
	Downloaded 3899 tweets
	Downloaded 3999 tweets
	Downloaded 4099 tweets
	Downloaded 4199 tweets
	Downloaded 4299 tweets
	Downloaded 4399 tweets
	Downloaded 4499 tweets
	Downloaded 4599 tweets
	Downloaded 4699 tweets
	Downloaded 4799 tweets
	Downloaded 4899 tweets
	Downloaded 4999 tweets
	Downloaded 5099 tweets
Finished!

Fetching tweets for query: #2020Election...
	Downloaded 100 tweets
	Downloaded 200 tweets
	Downloaded

Rate limit reached. Sleeping for: 760


	Downloaded 3000 tweets
	Downloaded 3100 tweets
	Downloaded 3200 tweets
	Downloaded 3300 tweets
	Downloaded 3400 tweets
	Downloaded 3500 tweets
	Downloaded 3600 tweets
	Downloaded 3700 tweets
	Downloaded 3800 tweets
	Downloaded 3900 tweets
	Downloaded 4000 tweets
	Downloaded 4100 tweets
	Downloaded 4200 tweets
	Downloaded 4300 tweets
	Downloaded 4400 tweets
	Downloaded 4500 tweets
	Downloaded 4600 tweets
	Downloaded 4700 tweets
	Downloaded 4800 tweets
	Downloaded 4900 tweets
	Downloaded 5000 tweets
Finished!

Fetching tweets for query: michael bennet...
	Downloaded 100 tweets
	Downloaded 200 tweets
	Downloaded 300 tweets
	Downloaded 400 tweets
	Downloaded 500 tweets
	Downloaded 600 tweets
	Downloaded 700 tweets
	Downloaded 800 tweets
	Downloaded 900 tweets
	Downloaded 1000 tweets
	Downloaded 1100 tweets
	Downloaded 1200 tweets
	Downloaded 1300 tweets
	Downloaded 1400 tweets
	Downloaded 1500 tweets
	Downloaded 1600 tweets
	Downloaded 1700 tweets
	Downloaded 1800 tweets
	Downloaded 1900 

Rate limit reached. Sleeping for: 762


	Downloaded 3100 tweets
	Downloaded 3200 tweets
	Downloaded 3300 tweets
	Downloaded 3400 tweets
	Downloaded 3500 tweets
	Downloaded 3600 tweets
	Downloaded 3700 tweets
	Downloaded 3800 tweets
	Downloaded 3900 tweets
	Downloaded 4000 tweets
	Downloaded 4100 tweets
	Downloaded 4200 tweets
	Downloaded 4300 tweets
	Downloaded 4400 tweets
	Downloaded 4500 tweets
	Downloaded 4600 tweets
	Downloaded 4700 tweets
	Downloaded 4800 tweets
	Downloaded 4900 tweets
	Downloaded 5000 tweets
Finished!

Fetching tweets for query: pete buttigieg...
	Downloaded 100 tweets
	Downloaded 200 tweets
	Downloaded 300 tweets
	Downloaded 400 tweets
	Downloaded 500 tweets
	Downloaded 600 tweets
	Downloaded 700 tweets
	Downloaded 800 tweets
	Downloaded 900 tweets
	Downloaded 1000 tweets
	Downloaded 1100 tweets
	Downloaded 1200 tweets
	Downloaded 1300 tweets
	Downloaded 1400 tweets
	Downloaded 1500 tweets
	Downloaded 1600 tweets
	Downloaded 1700 tweets
	Downloaded 1800 tweets
	Downloaded 1900 tweets
	Downloaded 2000 

Rate limit reached. Sleeping for: 685


	Downloaded 1100 tweets
	Downloaded 1200 tweets
	Downloaded 1300 tweets
	Downloaded 1400 tweets
	Downloaded 1500 tweets
	Downloaded 1600 tweets
	Downloaded 1700 tweets
	Downloaded 1800 tweets
	Downloaded 1900 tweets
	Downloaded 2000 tweets
	Downloaded 2100 tweets
	Downloaded 2200 tweets
	Downloaded 2300 tweets
	Downloaded 2400 tweets
	Downloaded 2500 tweets
	Downloaded 2551 tweets
No more tweets found.
Finished!

Fetching tweets for query: tulsi gabbard...
	Downloaded 100 tweets
	Downloaded 200 tweets
	Downloaded 300 tweets
	Downloaded 400 tweets
	Downloaded 500 tweets
	Downloaded 600 tweets
	Downloaded 700 tweets
	Downloaded 800 tweets
	Downloaded 900 tweets
	Downloaded 1000 tweets
	Downloaded 1100 tweets
	Downloaded 1200 tweets
	Downloaded 1300 tweets
	Downloaded 1400 tweets
	Downloaded 1500 tweets
	Downloaded 1600 tweets
	Downloaded 1700 tweets
	Downloaded 1800 tweets
	Downloaded 1900 tweets
	Downloaded 2000 tweets
	Downloaded 2100 tweets
	Downloaded 2200 tweets
	Downloaded 2300 twe

Rate limit reached. Sleeping for: 722


	Downloaded 400 tweets
	Downloaded 500 tweets
	Downloaded 600 tweets
	Downloaded 700 tweets
	Downloaded 800 tweets
	Downloaded 900 tweets
	Downloaded 999 tweets
	Downloaded 1099 tweets
	Downloaded 1199 tweets
	Downloaded 1299 tweets
	Downloaded 1399 tweets
	Downloaded 1499 tweets
	Downloaded 1599 tweets
	Downloaded 1699 tweets
	Downloaded 1799 tweets
	Downloaded 1899 tweets
	Downloaded 1999 tweets
	Downloaded 2099 tweets
	Downloaded 2199 tweets
	Downloaded 2299 tweets
	Downloaded 2399 tweets
	Downloaded 2499 tweets
	Downloaded 2599 tweets
	Downloaded 2699 tweets
	Downloaded 2799 tweets
	Downloaded 2899 tweets
	Downloaded 2999 tweets
	Downloaded 3099 tweets
	Downloaded 3199 tweets
	Downloaded 3299 tweets
	Downloaded 3399 tweets
	Downloaded 3499 tweets
	Downloaded 3599 tweets
	Downloaded 3699 tweets
	Downloaded 3799 tweets
	Downloaded 3899 tweets
	Downloaded 3999 tweets
	Downloaded 4099 tweets
	Downloaded 4199 tweets
	Downloaded 4299 tweets
	Downloaded 4399 tweets
	Downloaded 4499 tweets

Rate limit reached. Sleeping for: 754


	Downloaded 2757 tweets
	Downloaded 2857 tweets
	Downloaded 2957 tweets
	Downloaded 3057 tweets
	Downloaded 3157 tweets
	Downloaded 3257 tweets
	Downloaded 3357 tweets
	Downloaded 3457 tweets
	Downloaded 3557 tweets
	Downloaded 3657 tweets
	Downloaded 3757 tweets
	Downloaded 3857 tweets
	Downloaded 3957 tweets
	Downloaded 4057 tweets
	Downloaded 4157 tweets
	Downloaded 4257 tweets
	Downloaded 4357 tweets
	Downloaded 4457 tweets
	Downloaded 4557 tweets
	Downloaded 4657 tweets
	Downloaded 4757 tweets
	Downloaded 4857 tweets
	Downloaded 4957 tweets
	Downloaded 5057 tweets
Finished!

Fetching tweets for query: andrew yang...
	Downloaded 100 tweets
	Downloaded 200 tweets
	Downloaded 300 tweets
	Downloaded 400 tweets
	Downloaded 500 tweets
	Downloaded 600 tweets
	Downloaded 700 tweets
	Downloaded 800 tweets
	Downloaded 900 tweets
	Downloaded 1000 tweets
	Downloaded 1100 tweets
	Downloaded 1200 tweets
	Downloaded 1300 tweets
	Downloaded 1400 tweets
	Downloaded 1500 tweets
	Downloaded 1600 twe

Rate limit reached. Sleeping for: 769


	Downloaded 400 tweets
	Downloaded 500 tweets
	Downloaded 600 tweets
	Downloaded 700 tweets
	Downloaded 800 tweets
	Downloaded 900 tweets
	Downloaded 1000 tweets
	Downloaded 1100 tweets
	Downloaded 1200 tweets
	Downloaded 1300 tweets
	Downloaded 1400 tweets
	Downloaded 1500 tweets
	Downloaded 1600 tweets
	Downloaded 1700 tweets
	Downloaded 1800 tweets
	Downloaded 1900 tweets
	Downloaded 2000 tweets
	Downloaded 2100 tweets
	Downloaded 2200 tweets
	Downloaded 2300 tweets
	Downloaded 2400 tweets
	Downloaded 2500 tweets
	Downloaded 2600 tweets
	Downloaded 2700 tweets
	Downloaded 2800 tweets
	Downloaded 2900 tweets
	Downloaded 3000 tweets
	Downloaded 3100 tweets
	Downloaded 3200 tweets
	Downloaded 3300 tweets
	Downloaded 3400 tweets
	Downloaded 3500 tweets
	Downloaded 3600 tweets
	Downloaded 3700 tweets
	Downloaded 3800 tweets
	Downloaded 3900 tweets
	Downloaded 4000 tweets
	Downloaded 4100 tweets
	Downloaded 4200 tweets
	Downloaded 4300 tweets
	Downloaded 4400 tweets
	Downloaded 4500 tweet

In [35]:
all_tweets_df = pd.DataFrame(all_tweets)
all_tweets_df.shape

(36696, 1)
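
The row count above may include the same tweet text more than once: `get_tweets` skips duplicates only within a single query, and only for tweets that have been retweeted, while the loop over `queries` simply concatenates the results. A minimal sketch of removing duplicates across queries before building the DataFrame follows. It assumes the list-of-dicts shape with a `'tweets'` key used above; `drop_duplicate_tweets` is a hypothetical helper and the sample rows are made up.

```python
def drop_duplicate_tweets(tweets):
    """Keep the first occurrence of each tweet text, preserving order."""
    seen = set()   # texts already kept; set membership is O(1) on average
    unique = []
    for tweet in tweets:
        text = tweet["tweets"]
        if text not in seen:
            seen.add(text)
            unique.append(tweet)
    return unique


# Made-up sample: the same retweet was matched by two different queries.
sample = [
    {"tweets": "RT @someone: election news"},
    {"tweets": "an original tweet"},
    {"tweets": "RT @someone: election news"},
]
unique_sample = drop_duplicate_tweets(sample)
print(len(sample), len(unique_sample))  # prints: 3 2
```

Using a set for the membership test avoids scanning the whole list for every tweet, which is what `parsed_tweet not in tweets` does.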

In [38]:
all_tweets_df.head()

|   | tweets |
|---|--------|
| 0 | @belcherjody1 IF no voter ID required by 2020,... |
| 1 | RT @matthewjdowd: “As we approach this 2020 pr... |
| 2 | RT @SisiLiliDidi: #AndrewYang polls 14%--18% a... |
| 3 | RT @PRIMONUTMEG: Are you paying attention to t... |
| 4 | RT @davidsirota: Fact check: Zero Pinnochios, ... |


In [36]:
# Save to a .csv file
all_tweets_df.to_csv('tweets2020.csv')
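
By default, `to_csv` writes the DataFrame index as an extra first column, and reading the file back with `pd.read_csv` labels that column `Unnamed: 0`. The sketch below shows the difference that `index=False` makes. It is an illustration only: it uses in-memory buffers and two made-up rows rather than `tweets2020.csv`.

```python
import io

import pandas as pd

# Tiny stand-in for all_tweets_df; the two rows are made-up sample text.
sample_df = pd.DataFrame([{'tweets': 'first sample tweet'},
                          {'tweets': 'second sample tweet'}])

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
sample_df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# With index=False only the 'tweets' column is written.
without_index = io.StringIO()
sample_df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)     # prints: ['Unnamed: 0', 'tweets']
print(cols_without_index)  # prints: ['tweets']
```

Whether to keep the index column depends on how the file is read later. If downstream code expects it, passing `index_col=0` to `pd.read_csv` is another option.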