# Gathering Tweets

This notebook shows how we gathered all the tweets for this project. 
We pulled the tweets from the Twitter API using Tweepy, a Python library that wraps it.
Most of our time was spent here because these tweets are the data for the rest of the project. 
The cell below imports the libraries used in this notebook. 

In [1]:
import json
import tweepy
import pandas as pd
import numpy as np
import datetime
import time

# Research and code usage
Since this was the first time either of us had used the Twitter API, we had to do some initial research to get the project started. Much of the code is adapted from the sources below. 
- [Twitter Tutorial](http://socialmedia-class.org/twittertutorial.html): A basic tutorial on how to use Tweepy. It is a great source on how to get API access and authorize your keys.
- [Code for 'tweets_to_dataframe' function](https://www.youtube.com/watch?v=WX0MDddgpA4&list=PL5tcWHG-UPH2zBfOz40HSzcGUPAVOOnu1&index=3): A video that explains in detail how to use the Tweepy library. 
- [Tweepy doc](http://docs.tweepy.org/en/v3.8.0/api.html): This is the Tweepy documentation.

# Authorizing Twitter API

In [2]:
# Set up our keys so we can start pulling tweets.
# You need all four keys for authentication to work.
ACCESS_TOKEN = ''  # ACCESS TOKEN (removed for privacy)
ACCESS_SECRET = ''  # ACCESS SECRET (removed for privacy)
CONSUMER_KEY = ''  # CONSUMER KEY (removed for privacy)
CONSUMER_SECRET = ''  # CONSUMER SECRET (removed for privacy)

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
api = tweepy.API(auth)
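
Hard-coding keys in a notebook makes them easy to leak. One common alternative is to read them from environment variables. The sketch below is only an illustration: the variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_twitter_keys` are hypothetical and not part of Tweepy or of the original notebook.

```python
import os

# Hypothetical environment variable names; adjust to whatever you exported.
KEY_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_twitter_keys(env=None):
    """Return the four keys as a dict, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in KEY_NAMES if not env.get(name)]
    if missing:
        raise KeyError("Missing Twitter credentials: " + ", ".join(missing))
    return {name: env[name] for name in KEY_NAMES}


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {name: "placeholder" for name in KEY_NAMES}
keys = load_twitter_keys(demo_env)
print(sorted(keys))
```

The returned values could then be passed to `tweepy.OAuthHandler` and `set_access_token` in the same way as the constants in the cell above.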

# Functions to pull tweets and turn them into dataframes

In [3]:
# The code for this cell is adapted from the YouTube video linked below.
# The function converts the pulled tweets, which the API returns as JSON,
# into a dataframe with one row per tweet.
# https://www.youtube.com/watch?v=WX0MDddgpA4&list=PL5tcWHG-UPH2zBfOz40HSzcGUPAVOOnu1&index=3
# Some adaptations were made with the help of Shreya Shenoy.
def tweets_to_dataframe(tweets):
    df = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['tweets'])
    df['id'] = np.array([tweet.id for tweet in tweets])
    df['name'] = np.array([tweet.author.name for tweet in tweets])
    df['location'] = np.array([tweet.user.location for tweet in tweets])
    df['coordinates'] = np.array([tweet.coordinates for tweet in tweets])
    df['created_at'] = np.array([tweet.created_at for tweet in tweets])
    df['favorite_count'] = np.array([tweet.favorite_count for tweet in tweets])
    df['geo'] = np.array([tweet.geo for tweet in tweets])
    df['place'] = np.array([tweet.place for tweet in tweets])
    df['source'] = np.array([tweet.source for tweet in tweets])

    return df
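
To try the conversion without calling the API, the function can be fed stand-in objects that expose the same attributes it reads. The sketch below is an assumption-laden illustration: the fake tweet is built with `SimpleNamespace`, its values are made up, and `tweets_to_dataframe` is redefined in a compact dict-per-row form rather than copied from the cell above.

```python
import datetime
from types import SimpleNamespace

import pandas as pd

COLUMNS = ["tweets", "id", "name", "location", "coordinates",
           "created_at", "favorite_count", "geo", "place", "source"]


def tweets_to_dataframe(tweets):
    """Compact variant of the function above: build one dict per tweet."""
    rows = [{
        "tweets": tweet.text,
        "id": tweet.id,
        "name": tweet.author.name,
        "location": tweet.user.location,
        "coordinates": tweet.coordinates,
        "created_at": tweet.created_at,
        "favorite_count": tweet.favorite_count,
        "geo": tweet.geo,
        "place": tweet.place,
        "source": tweet.source,
    } for tweet in tweets]
    # Passing columns keeps the layout stable even for an empty result.
    return pd.DataFrame(rows, columns=COLUMNS)


# Made-up stand-in for a Tweepy status object.
fake_tweet = SimpleNamespace(
    text="Restock! #clorox",
    id=1,
    author=SimpleNamespace(name="Demo account"),
    user=SimpleNamespace(location="Berkeley, CA"),
    coordinates=None,
    created_at=datetime.datetime(2020, 5, 12, 14, 30),
    favorite_count=0,
    geo=None,
    place=None,
    source="Twitter Web App",
)

df = tweets_to_dataframe([fake_tweet])
print(df.shape)
```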

In [4]:
# This function pulls the tweets and calls 'tweets_to_dataframe' on each batch,
# so that a whole pull becomes one line of code. 
# The for loop was influenced by Tim Book from his office hours on April 16th, 2020.
def grabbing_tweets(query, date, num_iter):
    # Collect one dataframe per request
    df_list = []
    # Use the date as the upper bound so that each request pulls older tweets
    current_date = date
    
    # Make num_iter requests of up to 100 tweets each
    for _ in range(num_iter):
        # Search for tweets created before current_date
        raw_json = api.search(q=query, count=100, until=current_date)
        
        # Turn the search results into a dataframe
        df = tweets_to_dataframe(raw_json)
        
        # Convert the timestamps to date objects. 
        # This also removes the time the tweet was made.
        # The API's 'until' parameter only accepts YYYY-MM-DD.
        df['created_at'] = df['created_at'].apply(datetime.datetime.date)
        
        # Append the newly created dataframe to df_list
        df_list.append(df)
        
        # Use the earliest date in this batch as the bound for the next, older pull
        current_date = df['created_at'].min()
        
        # Pause between requests so we do not send too many in a short time
        time.sleep(60)
    
    # Combine all the dataframes in df_list into one dataframe
    return pd.concat(df_list, axis=0)
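
The date-based paging in `grabbing_tweets` can be traced offline with stand-in data. This is a minimal sketch, not the notebook's code: `FAKE_TWEETS`, `fake_search`, and `page_by_date` are hypothetical names, and it assumes that `until` excludes the given date, as the Twitter search documentation describes. Under that assumption, tweets that share the earliest date of a batch but did not fit into it are skipped by the next request.

```python
import datetime

# Stand-in data, newest first: three tweets per day from 2020-05-12 back to 2020-05-09.
FAKE_TWEETS = [
    (i, datetime.date(2020, 5, 12) - datetime.timedelta(days=i // 3))
    for i in range(12)
]


def fake_search(until, count):
    """Mimic `until`: up to `count` tweets created strictly before `until`."""
    matches = [tweet for tweet in FAKE_TWEETS if tweet[1] < until]
    return matches[:count]


def page_by_date(start, num_iter, count=4):
    """Repeat the search, feeding the earliest date of each batch back as `until`."""
    collected, current = [], start
    for _ in range(num_iter):
        batch = fake_search(current, count)
        if not batch:
            break  # nothing older is left
        collected.extend(batch)
        current = min(created for _, created in batch)
    return collected


rows = page_by_date(datetime.date(2020, 5, 13), num_iter=5)
print([tweet_id for tweet_id, _ in rows])  # [0, 1, 2, 3, 6, 7, 8, 9]
```

In this toy run, ids 4 and 5 (dated 2020-05-11) and ids 10 and 11 (dated 2020-05-09) are never returned, which may partly explain why real pulls come back with fewer rows than `count * num_iter`.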

# Pulling the tweets

This section contains all the tweet pulls. 
There are a lot of tables; we used them to check which dates each pull covered.
The standard Twitter search API only returns tweets from about the past week. 
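
As a lighter alternative to scanning whole tables, the date coverage of a pull can be summarised directly. The sketch below uses a small made-up dataframe named `demo` in place of a real pull; only the `created_at` column name comes from the notebook.

```python
import datetime

import pandas as pd

# Made-up stand-in for the 'created_at' column of a pulled dataframe.
demo = pd.DataFrame({
    "created_at": [
        datetime.date(2020, 5, 12),
        datetime.date(2020, 5, 12),
        datetime.date(2020, 5, 4),
    ]
})

# Oldest and newest dates covered by the pull.
oldest = demo["created_at"].min()
newest = demo["created_at"].max()

# Number of tweets per day, in chronological order.
per_day = demo["created_at"].value_counts().sort_index()

print(oldest, newest)
print(per_day)
```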

## Pulling for Clorox

In [15]:
df_clorox = grabbing_tweets('#clorox', '2020-05-13', 15)

In [16]:
df_clorox.shape

(561, 10)

In [17]:
df_clorox

Unnamed: 0,tweets,id,name,location,coordinates,created_at,favorite_count,geo,place,source
0,@Jim_Howard_13 @RepAndyBiggsAZ @RandPaul just ...,1.260339e+18,corporationsgonewild 🌊,any town USA,,2020-05-12,1.0,,,Twitter Web App
1,Latest study of #Trump's proposed #COVID19 dru...,1.260334e+18,Darius,"Berkeley, CA",,2020-05-12,0.0,,,Twitter for Android
2,Restock!!\n\nClorox Fraganzia Multi-Purpose Cl...,1.260322e+18,Find 😷 Essentials & Save 💰 Shopping Online,,,2020-05-12,0.0,,,Twitter Web App
3,Who knew the #NewNormal would be bartering #to...,1.260322e+18,Jeff Harvey,"Washington, DC",,2020-05-12,0.0,,,Twitter Web App
4,RT @ABC7: Searching high and low for #Clorox d...,1.260316e+18,jeaned62803,,,2020-05-12,0.0,,,Twitter for iPhone
...,...,...,...,...,...,...,...,...,...,...
65,@trish_regan - For sure #Clorox #MadeInAmerica...,1.257260e+18,Trading Range EURUSD,,,2020-05-04,0.0,,,Twitter Web App
66,RT @AedynBrooks: Bwhahaha #toofunny #Clorox ht...,1.257251e+18,Sophi Frost,North Carolina,,2020-05-04,0.0,,,Twitter Web App
67,RT @ledsjam_trump: POTUS readies for COVID-19 ...,1.257241e+18,Blink2XAmerica,,,2020-05-04,0.0,,,Twitter for iPhone
68,RT @AedynBrooks: Bwhahaha #toofunny #Clorox ht...,1.257237e+18,JeanMead12,"Pocklington, England",,2020-05-04,0.0,,,Twitter for Android


In [18]:
# Check how many of the tweets we pulled include coordinates.
df_clorox.loc[~df_clorox['coordinates'].isnull()]

Unnamed: 0,tweets,id,name,location,coordinates,created_at,favorite_count,geo,place,source
41,I have searched from Knoxville to Ooltewah to ...,1.260204e+18,Wendy King 🇺🇸😀🍊,"Chattanooga, TN","{'type': 'Point', 'coordinates': [-85.68161077...",2020-05-12,0.0,"{'type': 'Point', 'coordinates': [35.04049246,...",Place(_api=<tweepy.api.API object at 0x11430af...,Instagram
36,Mother’s Day basket for joyciejoel. Look close...,1.259539e+18,Scottie A.,"Alameda, CA","{'type': 'Point', 'coordinates': [-122.0615, 3...",2020-05-10,0.0,"{'type': 'Point', 'coordinates': [37.948, -122...",Place(_api=<tweepy.api.API object at 0x11430af...,Instagram
18,Getting the office ready for grand re-opening!...,1.259213e+18,Cheryl Lee-Pow,"Rockville, MD","{'type': 'Point', 'coordinates': [-77.14630461...",2020-05-09,0.0,"{'type': 'Point', 'coordinates': [39.08103807,...",Place(_api=<tweepy.api.API object at 0x11430af...,Instagram
23,We only get a few here and there but we like t...,1.258489e+18,Kendalls Hardware,"St. Paul, MN","{'type': 'Point', 'coordinates': [-93.07325315...",2020-05-07,0.0,"{'type': 'Point', 'coordinates': [44.96629517,...",Place(_api=<tweepy.api.API object at 0x11430af...,Instagram
27,We have a few of these in with our order today...,1.258061e+18,Kendalls Hardware,"St. Paul, MN","{'type': 'Point', 'coordinates': [-93.07325315...",2020-05-06,0.0,"{'type': 'Point', 'coordinates': [44.96629517,...",Place(_api=<tweepy.api.API object at 0x11430af...,Instagram
46,Hangout with my best friends🙂 #Lysol #Clorox #...,1.257334e+18,St-Onge Jean,"Shawinigan, Québec","{'type': 'Point', 'coordinates': [-72.7519499,...",2020-05-04,1.0,"{'type': 'Point', 'coordinates': [46.5425899, ...",Place(_api=<tweepy.api.API object at 0x11430af...,Instagram
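
In the output above, the `coordinates` column holds a Point dictionary that lists longitude first, while `geo` lists latitude first. A small helper can normalise the `coordinates` value into a `(lat, lon)` pair. This is a sketch only: `point_to_lat_lon` is a hypothetical name and the demo values are illustrative.

```python
def point_to_lat_lon(point):
    """Return (lat, lon) from a Point dict that stores [lon, lat], or None."""
    if not point or point.get("type") != "Point":
        return None
    lon, lat = point["coordinates"]
    return lat, lon


# Illustrative value shaped like the 'coordinates' column above.
demo_point = {"type": "Point", "coordinates": [-122.0615, 37.948]}

print(point_to_lat_lon(demo_point))  # (37.948, -122.0615)
print(point_to_lat_lon(None))        # None
```

Applied to a dataframe, this could look like `df['coordinates'].apply(point_to_lat_lon)`, which leaves rows without coordinates as `None`.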


In [23]:
# We already pulled tweets matching "#clorox",
# but I wanted to see whether dropping the "#" character would return more tweets.
# As the shape below shows, this query returned 900 tweets, compared with 561 for "#clorox".
# There is a likelihood of duplicates, but those can be removed in the cleaning process.

df_clorox_2 = grabbing_tweets('clorox', '2020-05-13', 10)

In [24]:
df_clorox_2.shape

(900, 10)

In [26]:
df_clorox_2.loc[~df_clorox_2['coordinates'].isnull()]

Unnamed: 0,tweets,id,name,location,coordinates,created_at,favorite_count,geo,place,source


In [28]:
df_clorox_2.tail()

Unnamed: 0,tweets,id,name,location,coordinates,created_at,favorite_count,geo,place,source
95,@realDonaldTrump So what? Fauci knows the tru...,1.257459e+18,Karen Diedo,,,2020-05-04,0.0,,,Twitter for iPhone
96,RT @gtconway3d: The wall hasn’t been built. Th...,1.257459e+18,loma1954,,,2020-05-04,0.0,,,Twitter for Android
97,🔥Hot Items back on stock🔥\n\nPlus many more es...,1.257459e+18,WholesaleLiquidation,"695 RED OAK RD STOCKBRIDGE, GA",,2020-05-04,0.0,,,Hootsuite Inc.
98,RT @gtconway3d: The wall hasn’t been built. Th...,1.257459e+18,Jeanne C. 🌲🌳🌿🌱🌷🌸🌺🥀,Oregon,,2020-05-04,0.0,,,Twitter for iPad
99,RT @gtconway3d: The wall hasn’t been built. Th...,1.257459e+18,Monctonscout,"Moncton, NB",,2020-05-04,0.0,,,Twitter for Android


## Pulling for Clorox Wipes

In [29]:
df_clorox_wipes = grabbing_tweets('#cloroxwipes', '2020-05-13', 15)

In [31]:
df_clorox_wipes.tail()

Unnamed: 0,tweets,id,name,location,coordinates,created_at,favorite_count,geo,place,source
71,If you can't find #cloroxwipes get some dishwa...,1.258166e+18,☄Starlight Tigress☄,,,2020-05-06,0.0,,,Twitter for Android
72,RT @OSUcoachD: Walked into Walmart at 6pm and ...,1.258025e+18,Mr. ORANGE-POWER,"Edmond, OK",,2020-05-06,0.0,,,Twitter Web App
73,Walked into Walmart at 6pm and was totally blo...,1.257853e+18,Coach Davis,"Stillwater, OK",,2020-05-06,35.0,,,Twitter for iPhone
74,🐝They are the bees knees!💀#memezlab #murderhor...,1.257828e+18,Memezlab☣️,"Missouri, USA",,2020-05-06,1.0,,,Twitter for iPhone
75,RT @SellableStuff: Clorox Healthcare Hydrogen ...,1.257429e+18,Dunn Designs LLC,"Florida, USA",,2020-05-04,0.0,,,Twitter Web App


In [32]:
df_clorox_wipes.shape

(76, 10)

In [33]:
df_cloroxwipes_nohash = grabbing_tweets('cloroxwipes', '2020-05-13', 15)

In [35]:
df_cloroxwipes_nohash.shape

(80, 10)

In [36]:
df_cloroxwipes_nohash.head()

Unnamed: 0,tweets,id,name,location,coordinates,created_at,favorite_count,geo,place,source
0,RT @LaRayeBrown: They came!!!!\n#CloroxWipes h...,1.26035e+18,Helen,,,2020-05-12,0.0,,,Twitter Web App
1,@realDonaldTrump https://t.co/qBDHtCt2zs\nHomi...,1.260283e+18,Somebody,in an undisclosed location,,2020-05-12,2.0,,,Twitter for iPad
2,Americans don’t care about testing. We just wa...,1.260234e+18,tleighlu,"Baltimore, MD",,2020-05-12,0.0,,,Twitter for iPhone
3,Here’s When #CloroxWipes Will Be Back In Store...,1.260119e+18,Matthew Williams,"Robinson, TX",,2020-05-12,0.0,,,Twitter Web App
4,#repost @tarzianhardware\n・・・\n🚨CLOROX WIPES 🚨...,1.260022e+18,Park Slope 5th Ave,"iPhone: 40.731445,-74.276581",,2020-05-12,0.0,,,Instagram


## Pulling for Lysol

In [37]:
df_lysol = grabbing_tweets('#lysol', '2020-05-13', 15)

In [38]:
df_lysol.shape

(540, 10)

In [39]:
df_lysol

Unnamed: 0,tweets,id,name,location,coordinates,created_at,favorite_count,geo,place,source
0,@juakogallardo De eso al #lysol sólo hay un pa...,1.260358e+18,😷@DENUNCIA CIUDADANA🤧,Colombia,,2020-05-12,1.0,,,Twitter for Android
1,Dear @Lysol can you please speed up your facto...,1.260348e+18,AntoniettaNacc,,,2020-05-12,0.0,,,Twitter for iPhone
2,"Sad but honestly, who did not know this was co...",1.260335e+18,Tamara McLanahan,Ethereally here...,,2020-05-12,0.0,,,Twitter Web App
3,"Lysol All-Purpose Cleaner Trigger, Lemon Breez...",1.260324e+18,Find 😷 Essentials & Save 💰 Shopping Online,,,2020-05-12,0.0,,,Twitter Web App
4,Kroger has had it! Don't even ask about the da...,1.260307e+18,AmyEller,"Atlanta, GA",,2020-05-12,0.0,,,Twitter for Android
...,...,...,...,...,...,...,...,...,...,...
40,It's bad enough that #Dems &amp; #China conspi...,1.257359e+18,Skeptic,,,2020-05-04,0.0,,,Twitter Web App
41,@Ya_Boi_Striker @bonesisvegan @marchtbood @rea...,1.257357e+18,Roy Phillips,NYC,,2020-05-04,1.0,,,Twitter for Android
42,@Lysol where are you hiding all the sponges?? ...,1.257354e+18,The Borderline Bombshell,"Missouri, USA",,2020-05-04,0.0,,,Twitter Web App
43,When you’re using #Lysol and see this! @z100Ne...,1.257351e+18,Robby Bridges,New England,,2020-05-04,1.0,,,Twitter for iPhone


## Pulling for Hydrogen Peroxide

In [40]:
df_hydrogen_peroxide = grabbing_tweets('hydrogenperoxide', '2020-05-13', 15)

In [41]:
df_hydrogen_peroxide.shape

(47, 10)

In [43]:
df_hydrogen_peroxide.head()

Unnamed: 0,tweets,id,name,location,coordinates,created_at,favorite_count,geo,place,source
0,RT @themachinemaker: Indian Peroxide Limited (...,1.260119e+18,Nuberg Engineering Ltd.,Global,,2020-05-12,0.0,,,Twitter for Android
1,RT @Machinist_WWM: #INDIANPEROXIDELIMITED dona...,1.260119e+18,Nuberg Engineering Ltd.,Global,,2020-05-12,0.0,,,Twitter for Android
2,RT @Machinist_WWM: #INDIANPEROXIDELIMITED dona...,1.260116e+18,Arun Tyagi,New Delhi,,2020-05-12,0.0,,,Twitter for Android
3,Available from Santa Cruz Animal Health!\n\n• ...,1.259907e+18,Santa Cruz Biotechnology,"Dallas, TX",,2020-05-11,0.0,,,TweetDeck
4,Available from Santa Cruz Animal Health!\n\n• ...,1.259907e+18,ChemCruz,"Dallas, TX",,2020-05-11,0.0,,,TweetDeck


In [45]:
df_hydrogen_peroxide['tweets'][3]

'Available from Santa Cruz Animal Health!\n\n• UltraCruz® #HandSanitizing Gel\n• UltraCruz® Isopropyl Alcohol (70% &amp; 99… https://t.co/xuKXdlAgqk'
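
The tweet text above still contains the HTML entity `&amp;`. One way to decode such entities during cleaning is Python's standard `html.unescape`; the sketch below applies it to a shortened copy of that text and is not part of the original notebook.

```python
import html

# Shortened copy of the tweet text shown above.
raw = "UltraCruz® Isopropyl Alcohol (70% &amp; 99…"

# Decode HTML entities such as &amp; into plain characters.
clean = html.unescape(raw)

print(clean)  # UltraCruz® Isopropyl Alcohol (70% & 99…
```

On a dataframe this could be applied with `df['tweets'].apply(html.unescape)`.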

## Pulling for Facemask

In [46]:
df_facemask = grabbing_tweets('facemask', '2020-05-13', 15)

In [47]:
df_facemask.shape

(865, 10)

In [50]:
df_facemask.tail()

Unnamed: 0,tweets,id,name,location,coordinates,created_at,favorite_count,geo,place,source
95,RT @Koreaboo: B.I. Makes Generous 1000 Mask Do...,1.257456e+18,flowerbin131,"Porto, Portugal",,2020-05-04,0.0,,,Twitter for Android
96,"Where's her facemask???? Arrest her, officer! ...",1.257456e+18,Pyrnassius RS,Australia,,2020-05-04,0.0,,,Twitter Web App
97,76 in Lagos already? Ncdc &amp; the presidenc...,1.257456e+18,F.A.O,"Abuja, Nigeria",,2020-05-04,32.0,,,Twitter for Android
98,Available po ngayon po. Salamat po. Pick up on...,1.257456e+18,💙,,,2020-05-04,0.0,,,Twitter for Android
99,RT @Koreaboo: B.I. Makes Generous 1000 Mask Do...,1.257456e+18,cabbage,,,2020-05-04,0.0,,,Twitter for iPhone


## Pulling for Isopropyl Alcohol

In [51]:
df_isopropyl_alcohol = grabbing_tweets('isopropyl alcohol', '2020-05-13', 15)

In [52]:
df_isopropyl_alcohol.shape

(876, 10)

In [53]:
df_isopropyl_alcohol.tail()

Unnamed: 0,tweets,id,name,location,coordinates,created_at,favorite_count,geo,place,source
72,RT @AkhileshSingi: @SayftyCom @HeForShe A6. Us...,1.25735e+18,srinu,"Hyderabad, India",,2020-05-04,0.0,,,Twitter for iPhone
73,@ricketyoldshack i heard something about isopr...,1.257349e+18,Harry Tuttle,cyberspace,,2020-05-04,0.0,,,Twitter Web App
74,RT @AkhileshSingi: @SayftyCom @HeForShe A6. Us...,1.257348e+18,Rohit Kankalla,,,2020-05-04,0.0,,,Twitter for Android
75,@PGenium @lakin1013 @osullivanauthor @itsJeffT...,1.257348e+18,Rudy Colludy,,,2020-05-04,0.0,,,Twitter for Android
76,Highline Wellness CBD Hand Sanitizer contains ...,1.257347e+18,Shape Magazine,"New York, NY",,2020-05-04,0.0,,,True Anthem


## Pulling for Paper Towels

In [54]:
df_papertowels = grabbing_tweets('#papertowels', '2020-05-13', 15)

In [56]:
df_papertowels.shape

(45, 10)

In [55]:
df_papertowels.tail()

Unnamed: 0,tweets,id,name,location,coordinates,created_at,favorite_count,geo,place,source
40,Look what I scored this morning! Take time to ...,1.25771e+18,Linda Takahashi,,,2020-05-05,5.0,,Place(_api=<tweepy.api.API object at 0x11430af...,Twitter for iPhone
41,Do not flush #wipes or #papertowels down the #...,1.257698e+18,Irela Bague,"Coral Gables, Florida",,2020-05-05,0.0,,,Buffer
42,Cont 5- Now people are going out in crowds #Sh...,1.257671e+18,Angel,"New Bern,N.C",,2020-05-05,1.0,,,Twitter for Android
43,Bounty Quick-Size White/16 is in stock.. Get i...,1.257426e+18,papertoweljan,,,2020-05-04,0.0,,,Zapier.com
44,My drawing \npaper towels 🖇\n#jimin #jiminfana...,1.257356e+18,seesaw.jikook ⁷,sope,,2020-05-04,5.0,,,Twitter for Android


## Pulling for Hand Sanitizer

In [57]:
hand_sanitizer = grabbing_tweets('#handsanitizer', '2020-05-13', 10)

In [58]:
hand_sanitizer.shape

(896, 10)

In [59]:
hand_sanitizer.tail()

Unnamed: 0,tweets,id,name,location,coordinates,created_at,favorite_count,geo,place,source
95,La Palm Hand Sanitizer 8oz is NOW AVAILABLE!!\...,1.257389e+18,Triple Vitamin,,,2020-05-04,0.0,,,Social Media Publisher App
96,NEW Hand Sanitizer Packets Single Use. Keep a ...,1.257389e+18,Gel II Manicure,,,2020-05-04,0.0,,,Social Media Publisher App
97,RT @stevengregory: #handsanitizer missing from...,1.257389e+18,Christine Klein - Text TRUMP to 88022,"California, USA",,2020-05-04,0.0,,,Twitter Web App
98,EASIEST work or home sanitizer recipe:\n\n1. R...,1.257389e+18,OK Truckee,,,2020-05-04,0.0,,,Twitter Web App
99,RT @stevengregory: #handsanitizer missing from...,1.257388e+18,John and Ken,Southern California,,2020-05-04,0.0,,,Twitter Web App


## Pulling for Disinfectant

In [60]:
disinfectant = grabbing_tweets('disinfectant', '2020-05-13', 10)

In [61]:
disinfectant.shape

(900, 10)

In [62]:
disinfectant.tail()

Unnamed: 0,tweets,id,name,location,coordinates,created_at,favorite_count,geo,place,source
95,@WillBlackWriter Great article Will! Spot on. ...,1.257457e+18,💧Denise Allen 🧚‍♀️🧚‍🧚‍♀️Taungurung country 💜,Socialist Republic Victoria,,2020-05-04,2.0,,,Twitter for Android
96,"@PolitiBunny @OrdyPackard CDC be damned, it’s ...",1.257457e+18,Wuhan O’Houlihan,"Oklahoma City, OK",,2020-05-04,2.0,,,Twitter for iPhone
97,RT @JoeBiden: UV light? Injecting disinfectant...,1.257456e+18,joejustjoe,United States,,2020-05-04,0.0,,,Twitter for Android
98,@bradpickett444 @WordySmyth @PatriotKimmie @An...,1.257456e+18,Rumpturds,,,2020-05-04,0.0,,,Twitter Web App
99,RT @kia_kan: @riversshx @DanRather In a surgic...,1.257456e+18,Cynthia Seym ®™Text Trump to 88022,"North Carolina, USA",,2020-05-04,0.0,,,Twitter for Android


## Pulling for Toilet Paper

In [63]:
toilet_paper = grabbing_tweets('toiletpaper', '2020-05-13', 10)

In [64]:
toilet_paper.shape

(875, 10)

In [66]:
toilet_paper.tail()

Unnamed: 0,tweets,id,name,location,coordinates,created_at,favorite_count,geo,place,source
75,Scott Rapid-Dissolving Toilet Paper 8 rolls in...,1.257415e+18,Toilet Paper Tracker,,,2020-05-04,0.0,,,xboxonelocator
76,"#Teachers, #Librarians &amp; #Parents: Here's ...",1.257414e+18,Donna Gephart,,,2020-05-04,6.0,,,Twitter Web App
77,RT @ReinasWorld2020: @rebel_fla @deplorable_ru...,1.257413e+18,Hurricane,,,2020-05-04,0.0,,,Twitter for Android
78,That explains a lot 😂 ...,1.257412e+18,Jessica Weber,"Ontario, Canada",,2020-05-04,0.0,,,Instagram
79,RT @ReinasWorld2020: @rebel_fla @deplorable_ru...,1.257412e+18,Melinda Roberts ⭐⭐⭐,The Right,,2020-05-04,0.0,,,Twitter for Android


# Concatenating All the Dataframes into One

In [69]:
all_commodities = pd.concat([
    df_clorox,
    df_clorox_2,
    df_clorox_wipes,
    df_cloroxwipes_nohash,
    df_lysol,
    df_hydrogen_peroxide,
    df_facemask,
    df_isopropyl_alcohol,
    df_papertowels,
    hand_sanitizer,
    disinfectant,
    toilet_paper,
], axis=0)

In [70]:
all_commodities.shape

(6661, 10)
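
After concatenation it is no longer visible which search produced each row, and the index labels repeat because every batch starts again at 0. A possible variation, sketched here with two tiny made-up dataframes and a hypothetical `query` column, labels each frame before combining and resets the index.

```python
import pandas as pd

# Tiny made-up stand-ins for two of the pulled dataframes.
frames = {
    "clorox": pd.DataFrame({"id": [1, 2], "tweets": ["a", "b"]}),
    "lysol": pd.DataFrame({"id": [2, 3], "tweets": ["b", "c"]}),
}

# Tag each frame with its search term, then stack them with a fresh 0..n-1 index.
labelled = pd.concat(
    [df.assign(query=label) for label, df in frames.items()],
    ignore_index=True,
)

print(labelled)
```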

# Checking for Duplicates and Removing Them

In [71]:
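# Flag every row whose tweet id appears more than once.
# Note: keep=False flags all copies, so this filter drops every copy of a duplicated
# tweet rather than keeping one; drop_duplicates(subset='id') would keep the first copy.
bool_series = all_commodities['id'].duplicated(keep=False)
all_commodities = all_commodities[~bool_series]
all_commodities.shape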

(6189, 10)
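
The choice of `keep` changes which rows survive. The toy comparison below uses a made-up four-row dataframe to contrast `duplicated(keep=False)`, as used in the cell above, with `drop_duplicates`, which retains one copy of each repeated id.

```python
import pandas as pd

# Made-up data: id 2 appears twice.
demo = pd.DataFrame({"id": [1, 2, 2, 3], "tweets": ["a", "b", "b", "c"]})

# keep=False marks every copy of a repeated id, so all copies are removed.
dropped_all = demo[~demo["id"].duplicated(keep=False)]

# drop_duplicates keeps the first copy of each repeated id.
kept_first = demo.drop_duplicates(subset="id", keep="first")

print(dropped_all["id"].tolist())  # [1, 3]
print(kept_first["id"].tolist())   # [1, 2, 3]
```

Which behaviour is appropriate depends on whether a tweet that matched several searches should stay in the dataset once or not at all.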

# Turning all the Tweets into a CSV

In [72]:
all_commodities.to_csv('datasets/all_commodities_tweets.csv', index=False)
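
`to_csv` raises an error if the target folder does not exist, so it can help to create `datasets/` first. The sketch below shows one way to do that with `pathlib`; it writes a made-up dataframe into a temporary directory so that it does not touch the project's real files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up stand-in for all_commodities.
demo = pd.DataFrame({"id": [1, 2], "tweets": ["a", "b"]})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "datasets" / "all_commodities_tweets.csv"
    # Create the folder (and any parents) if it is missing.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    demo.to_csv(out_path, index=False)
    # Read the file back to confirm the write worked.
    round_trip = pd.read_csv(out_path)

print(round_trip.shape)
```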