# Access Twitter API

You are free to use or adapt this notebook for any purpose you'd like. However, please respect the [Simplified BSD License](https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/LICENSE.txt) that governs its use.

## Twitter API Access

Twitter implements OAuth 1.0A as its standard authentication mechanism. To use it to make requests to Twitter's API, you'll need to go to https://apps.twitter.com/ and create a sample application.

Choose any name for your application, write a description, and use `http://google.com` for the website. Depending on your country, you may first need to add your phone number to your Twitter profile.

Click the **Keys and Access Tokens** tab. There are four primary identifiers you'll need to note for an OAuth 1.0A workflow:

* consumer key,
* consumer secret,
* access token, and
* access token secret (click **Create Access Token** to generate the access token and its secret).

Note that you will need an ordinary Twitter account in order to log in, create an app, and get these credentials.

The first time you run the notebook, enter all four credentials in the cell below so that they are saved to the `pkl` file. After that, you can delete the secret values from the notebook, because they will be loaded from the `pkl` file instead.

The `pkl` file contains sensitive information that can be used to take control of your Twitter account. **Do not share it.**

In [142]:
import pickle
import os
import twitter  # the "twitter" package (Python Twitter Tools): pip install twitter
import json
from collections import Counter
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

In [4]:
if not os.path.exists('secret_twitter_credentials.pkl'):
    Twitter = {}
    Twitter['Consumer Key'] = ''         # enter your consumer key here; delete it once it is stored in the pkl file
    Twitter['Consumer Secret'] = ''      # your consumer secret
    Twitter['Access Token'] = ''         # your access token
    Twitter['Access Token Secret'] = ''  # your access token secret
    with open('secret_twitter_credentials.pkl', 'wb') as f:
        pickle.dump(Twitter, f)
else:
    # close the file as soon as the credentials are loaded
    with open('secret_twitter_credentials.pkl', 'rb') as f:
        Twitter = pickle.load(f)
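
The cell above stores whatever values it finds the first time it runs, so if it runs with the empty placeholders, every later run silently loads empty credentials. The following is a minimal sketch of a hypothetical `load_credentials` helper (not part of the original notebook) that fails early when any of the four keys is missing or empty. Also keep in mind that `pickle.load` is not safe on untrusted data, so only load pickle files you created yourself.

```python
import os
import pickle

# Key names used by the credentials dictionary in the cell above.
REQUIRED_KEYS = ('Consumer Key', 'Consumer Secret',
                 'Access Token', 'Access Token Secret')

def load_credentials(path):
    """Hypothetical helper: load the pickled credentials and validate them."""
    if not os.path.exists(path):
        raise IOError('No credentials file at ' + path)
    with open(path, 'rb') as f:
        creds = pickle.load(f)
    # An empty string or a missing key usually means the file was written
    # before the real values were filled in.
    missing = [k for k in REQUIRED_KEYS if not creds.get(k)]
    if missing:
        raise ValueError('Missing or empty credentials: ' + ', '.join(missing))
    return creds
```

With such a helper, the `else` branch above could become `Twitter = load_credentials('secret_twitter_credentials.pkl')`; if the error is raised, delete the file and rerun the cell with your real credentials.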

## Authorizing an application to access Twitter account data

In [9]:
# creates an authentication object
auth = twitter.oauth.OAuth(Twitter['Access Token'],
                           Twitter['Access Token Secret'],
                           Twitter['Consumer Key'],
                           Twitter['Consumer Secret'])

# authentication is used to create a Twitter API object
twitter_api = twitter.Twitter(auth=auth)

# Nothing to see by displaying twitter_api except that it's now a
# defined variable

type(twitter_api)

twitter.api.Twitter

## Retrieving trends in a particular location

Trending topics are hashtags (such as #example) or words and phrases that are currently popular. Twitter reports them per location, for example by country or by city.

Twitter identifies locations using the [Yahoo! Where On Earth ID](http://woeid.rosselliot.co.nz). The Yahoo! Where On Earth ID for the entire world is 1.

In [33]:
WORLD_WOE_ID = 1
US_WOE_ID = 23424977
LOCAL_WOE_ID = 2486340 #The WOEID for Sacramento - you can change it to your local WOEID

`trends.place` returns the top 50 trending topics for the specified location:

In [34]:
world_trends = twitter_api.trends.place(_id=WORLD_WOE_ID)
us_trends = twitter_api.trends.place(_id=US_WOE_ID)
local_trends = twitter_api.trends.place(_id=LOCAL_WOE_ID)

In [65]:
print type(us_trends)

trends = us_trends[0]
print type(trends)
print trends.keys()

<class 'twitter.api.TwitterListResponse'>
<type 'dict'>
[u'created_at', u'trends', u'as_of', u'locations']


In [68]:
print trends['locations']
print trends['created_at']
print trends['as_of']

trends_tr = trends['trends']
print type(trends_tr)
print 'Number of trends:', len(trends_tr)
print 'The most popular trend:', trends_tr[0]

[{u'woeid': 23424977, u'name': u'United States'}]
2018-05-04T23:13:18Z
2018-05-04T23:19:10Z
<type 'list'>
Number of trends: 50
The most popular trend: {u'url': u'http://twitter.com/search?q=%22Matt+Harvey%22', u'query': u'%22Matt+Harvey%22', u'tweet_volume': 18595, u'name': u'Matt Harvey', u'promoted_content': None}


## Displaying API responses as JSON

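The API sends its responses as JSON, a text format widely used to exchange data on the web. The `twitter` package parses that JSON into nested Python dictionaries and lists (the `TwitterListResponse` and `TwitterDictResponse` types shown in the outputs), and `json.dumps` converts them back into indented JSON text that is easier to read.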
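
As a small illustration of that correspondence, the sketch below round-trips one trend record (values copied from the output further down) through `json.dumps` and `json.loads`. It uses only the standard `json` module; note how Python's `None` is written as JSON `null`.

```python
import json

# One trend record, copied from the API output shown in this notebook.
trend = {'name': 'Monomoy Girl',
         'query': '%22Monomoy+Girl%22',
         'url': 'http://twitter.com/search?q=%22Monomoy+Girl%22',
         'tweet_volume': None,
         'promoted_content': None}

# Python -> JSON text: None becomes null, dict keys become object keys.
text = json.dumps(trend, indent=1, sort_keys=True)
print(text)

# JSON text -> Python: parsing the text again gives back an equal dict.
parsed = json.loads(text)
print(parsed == trend)  # True
```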

In [77]:
print(json.dumps(trends_tr, indent=1))  # indent each nesting level by one space

[
 {
  "url": "http://twitter.com/search?q=%22Matt+Harvey%22", 
  "query": "%22Matt+Harvey%22", 
  "tweet_volume": 18595, 
  "name": "Matt Harvey", 
  "promoted_content": null
 }, 
 {
  "url": "http://twitter.com/search?q=%22DJ+Khaled%22", 
  "query": "%22DJ+Khaled%22", 
  "tweet_volume": 102294, 
  "name": "DJ Khaled", 
  "promoted_content": null
 }, 
 {
  "url": "http://twitter.com/search?q=%22Monomoy+Girl%22", 
  "query": "%22Monomoy+Girl%22", 
  "tweet_volume": null, 
  "name": "Monomoy Girl", 
  "promoted_content": null
 }, 
 {
  "url": "http://twitter.com/search?q=%23MillennialBedtimeStories", 
  "query": "%23MillennialBedtimeStories", 
  "tweet_volume": null, 
  "name": "#MillennialBedtimeStories", 
  "promoted_content": null
 }, 
 {
  "url": "http://twitter.com/search?q=%23NRAConvention", 
  "query": "%23NRAConvention", 
  "tweet_volume": 34500, 
  "name": "#NRAConvention", 
  "promoted_content": null
 }, 
 {
  "url": "http://twitter.com/search?q=%23YeahYoureWelcome", 
  "query
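
The order of the trends list does not follow `tweet_volume`: in the output above, DJ Khaled has a larger volume than Matt Harvey but is listed second, and some trends have a `null` volume. If you want to rank by volume, a minimal sketch (the helper name `by_volume` is hypothetical) is to sort with a key that treats a missing volume as 0:

```python
# Trend records trimmed to two fields, values copied from the output above.
trends_sample = [
    {'name': 'Matt Harvey', 'tweet_volume': 18595},
    {'name': 'DJ Khaled', 'tweet_volume': 102294},
    {'name': 'Monomoy Girl', 'tweet_volume': None},
    {'name': '#MillennialBedtimeStories', 'tweet_volume': None},
    {'name': '#NRAConvention', 'tweet_volume': 34500},
]

def by_volume(trends):
    """Hypothetical helper: sort trends by tweet_volume, largest first."""
    # `or 0` maps None to 0, so trends without a volume go to the end;
    # sorted() is stable, so equal keys keep the API's original order.
    return sorted(trends, key=lambda t: t['tweet_volume'] or 0, reverse=True)

for t in by_volume(trends_sample):
    print('{}: {}'.format(t['name'], t['tweet_volume']))
```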

## Computing the intersection of two sets of trends

Collect the names of the most popular trends in each location:

In [70]:
trends_set = {}
trends_set['world'] = {trend['name'] for trend in world_trends[0]['trends']}

trends_set['us'] = {trend['name'] for trend in us_trends[0]['trends']}

trends_set['local'] = {trend['name'] for trend in local_trends[0]['trends']}

Print the trend names:

In [80]:
for loc in ['world','us','local']:
    print '-'*10,loc
    print(','.join(trends_set[loc]))

---------- world
#تكفي_ياولي_العهد_مكرمه_نقل,井上堯之さん死去,#الاهلي_الترجي,#تيار_المستقبل,#propagandalive,#FosStoTounel,#KohLanta,#YeahYoureWelcome,#シンカリオン,#LOS40PrimaveraPop,#quartogrado,#PetroJuventudYFuturo,#كلمه_لاعداء_السعوديه,#MillennialBedtimeStories,#هل_تتزوج_مطلقه_او_عانس,#NRAConvention,#NoRolêMTVHits,Luana Piovani,#ASCPSG,#Aldosivi,Monomoy Girl,#latelate,#NewSongJulianSERRANO,Brighton,#VolverteAVer,#GipsyKings3,こどもの日,DJ Khaled,#MeVieneLaWeaCuando,EnAnlamlıİyiKi UlasAstepe,#FridaySdvFcs,#العربي_السالميه,Rashford,#DeusSalveORei100,Matt Harvey,#FridayNightDinner,#سعوديات_نطلب_اسقاط_الولايه667,HACKED BY MB,#NosOrganizamosY,#MeadeConoceMisCausas,Arda Turan,#onstorm,#FSRADIOSUR,Copa América,#poweroflovegr,#NinjaWarrior5,#الحب_مع_الوقت_يكون,#Aitana1M,#RoarForChange,#LaCorrida
---------- us
MSU 0,Courtney Barnett,Richard Marx,#WPXIStorm,Smash Mouth,Amber Alert,Riley Nash,Larry Hunter,#MayThe4BeWithYou,Jason Adam,Fort Mill,Michael Skakel,Tedy Bruschi,Manafort,Rushing Fall,#CyberAware,#Mille

In [74]:
print('='*10 + ' intersection of world and US')
print(trends_set['world'].intersection(trends_set['us']))

print('='*10 + ' intersection of US and local')
print(trends_set['local'].intersection(trends_set['us']))

set([u'DJ Khaled', u'#MillennialBedtimeStories', u'#NRAConvention', u'Monomoy Girl', u'#YeahYoureWelcome', u'Matt Harvey', u'Brighton'])
set([u'Courtney Barnett', u'Richard Marx', u'#WPXIStorm', u'Smash Mouth', u'Amber Alert', u'Riley Nash', u'Larry Hunter', u'#MayThe4BeWithYou', u'Fort Mill', u'Michael Skakel', u'Tedy Bruschi', u'David Turpin', u'Manafort', u'Rushing Fall', u'#MillennialBedtimeStories', u'#VRLA2018', u'#NRAConvention', u'#USOW2018', u'#JEGS200', u'Kim Reynolds', u'Monomoy Girl', u'Charlie Rose and CBS News', u'DJ Khaled', u'Brighton', u'#CarbLoadASongOrBand', u'57,000 Hondurans', u'Coach Rocks', u'#CrimeCon', u'#GGGVanes', u'#JunotDiaz', u'#SouthernNats', u'Cliff Avril', u'Jason Adam', u'#YeahYoureWelcome', u'Waymo', u'#BHAMUN', u'Mitt Romney', u'Billy Turner', u'#ECU18', u'Alvin Gentry', u'#MexicoSeries', u'Matt Harvey', u'#dearwhitepeopleseason2'])
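
The same comparisons can be written with Python's set operators: `&` is intersection and `-` is difference, and several sets can be intersected at once. The sketch below uses small toy sets built from a few of the trend names printed above, not the full API results; sets are unordered, so the results are passed through `sorted` before printing.

```python
# Toy sets built from a few trend names printed above (not the full results).
world = {'DJ Khaled', 'Matt Harvey', 'Brighton', 'Arda Turan'}
us = {'DJ Khaled', 'Matt Harvey', 'Brighton', 'Manafort'}
local = {'DJ Khaled', 'Manafort', 'Waymo'}

print(sorted(world & us))          # same as world.intersection(us)
print(sorted(us - world))          # US trends that are not world trends
print(sorted(world & us & local))  # trending in all three places
print(sorted(set.intersection(world, us, local)))  # same, as a function call
```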


## Collecting search results

Set the variable `topic`, which is passed to the API as the search query `q`, to a trending topic or to any other search term.
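
Each trend returned earlier also carries a `query` field, which is the search query in URL-encoded form (`%23` is `#`, `%22` is `"`, and `+` is a space). A small sketch using only the standard library decodes it back into plain text that could be assigned to `topic`. The search cell in this section passes the plain hashtag as `q`, which suggests the `twitter` package handles the encoding itself; treat that as an observation rather than documented behavior.

```python
try:
    from urllib.parse import quote_plus, unquote_plus  # Python 3
except ImportError:
    from urllib import quote_plus, unquote_plus        # Python 2

# `query` values copied from the trends output earlier in this notebook.
for query in ['%22Matt+Harvey%22', '%23NRAConvention']:
    print(unquote_plus(query))

# Encoding goes the other way.
print(quote_plus('#MillennialBedtimeStories'))  # %23MillennialBedtimeStories
```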

In [104]:
topic = '#MillennialBedtimeStories' 
number = 100

search_results = twitter_api.search.tweets(q=topic, count=number)
statuses = search_results['statuses']

In [105]:
print type(search_results)
print search_results.keys()
print len(statuses)
print statuses[0].keys()
print statuses[0]['text']

<class 'twitter.api.TwitterDictResponse'>
[u'search_metadata', u'statuses']
100
[u'contributors', u'truncated', u'text', u'is_quote_status', u'in_reply_to_status_id', u'id', u'favorite_count', u'entities', u'retweeted', u'coordinates', u'source', u'in_reply_to_screen_name', u'in_reply_to_user_id', u'retweet_count', u'id_str', u'favorited', u'retweeted_status', u'user', u'geo', u'in_reply_to_user_id_str', u'possibly_sensitive', u'lang', u'created_at', u'in_reply_to_status_id_str', u'place', u'extended_entities', u'metadata']
RT @MrRaceBannon: The Three Little Pigs  #MillennialBedtimeStories https://t.co/wSsicFEeQx


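The search results often contain duplicates, for example several retweets of the same tweet, which all have identical text. We can filter them out by keeping only the first status with each distinct text: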

In [106]:
all_text = []
filtered_statuses = []
for s in statuses:
    if not s["text"] in all_text:
        all_text.append(s["text"])
        filtered_statuses.append(s)
statuses = filtered_statuses     

In [107]:
len(statuses)

82
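
The filtering cell above checks `s["text"] in all_text` against a list, which scans the whole list for every status. A set gives the same result with constant average lookup time. Below is a minimal sketch of a reusable, order-preserving version; the `dedupe` helper and its `key` parameter are hypothetical, and the statuses are toy dictionaries with only the fields used here.

```python
def dedupe(statuses, key=lambda s: s['text']):
    """Hypothetical helper: keep the first status for each distinct key."""
    seen = set()  # membership checks on a set are O(1) on average
    unique = []
    for s in statuses:
        k = key(s)
        if k not in seen:
            seen.add(k)
            unique.append(s)
    return unique

# Toy statuses: the first two share the same text, as retweets of one tweet do.
sample = [
    {'id': 1, 'text': 'RT @MrRaceBannon: The Three Little Pigs'},
    {'id': 2, 'text': 'RT @MrRaceBannon: The Three Little Pigs'},
    {'id': 3, 'text': '#MillennialBedtimeStories\nABC Land'},
]
print([s['id'] for s in dedupe(sample)])                         # [1, 3]
print([s['id'] for s in dedupe(sample, key=lambda s: s['id'])])  # [1, 2, 3]
```

Deduplicating by `id` instead of `text` keeps every distinct status, including separate retweets of the same tweet, so the choice of key decides what counts as a duplicate.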

In [109]:
all_text = [s['text'] for s in statuses]
all_text[:10]

[u'RT @MrRaceBannon: The Three Little Pigs  #MillennialBedtimeStories https://t.co/wSsicFEeQx',
 u'#MillennialBedtimeStories Once upon a ending millennium convinced the world was going to end due to software, some\u2026 https://t.co/598meHPpeN',
 u'#MillennialBedtimeStories\nABC Land',
 u'RT @kevinwxgg: A gender-fluid blonde and the three furry beings who prefer not to be labled. \n#MillennialBedtimeStories',
 u'@ #Robot #Vacuum Cleaner with 1000PA Power Suction for Thin Carpet!!\n\nproduct link: https://t.co/rLxf3Eefet\n specif\u2026 https://t.co/pdk6XSlApV',
 u'Nancy Drew and the Nigerian Prince Mystery\n  #MillennialBedtimeStories',
 u'RT @billy4ever9: #MillennialBedtimeStories Beauty And The Beast\U0001f607\U0001f47f https://t.co/AK7A3tRDvP',
 u'RT @HashtagRoundup: HOT HASHTAGS\n\n#4 USA #MillennialBedtimeStories w/@BrieHxC \n\n#5 USA #WhatStarWarsTaughtMe w/@HappyHourTags \n\n#6 USA #Yea\u2026',
 u'RT @WahkersRevolt: \u201cThe Very Hungry &amp; Greedy Generation Before Us That Des

Explore one of the tweets:

In [114]:
# Show one sample search result by indexing into the list
t = statuses[0]
print(json.dumps(t, indent=1))

{
 "contributors": null, 
 "truncated": false, 
 "text": "RT @MrRaceBannon: The Three Little Pigs  #MillennialBedtimeStories https://t.co/wSsicFEeQx", 
 "is_quote_status": false, 
 "in_reply_to_status_id": null, 
 "id": 992559856061214722, 
 "favorite_count": 0, 
 "entities": {
  "symbols": [], 
  "user_mentions": [
   {
    "id": 40444923, 
    "indices": [
     3, 
     16
    ], 
    "id_str": "40444923", 
    "screen_name": "MrRaceBannon", 
    "name": "Mister Race Bannon"
   }
  ], 
  "hashtags": [
   {
    "indices": [
     41, 
     66
    ], 
    "text": "MillennialBedtimeStories"
   }
  ], 
  "urls": [], 
  "media": [
   {
    "source_user_id": 40444923, 
    "source_status_id_str": "992514426061312000", 
    "expanded_url": "https://twitter.com/MrRaceBannon/status/992514426061312000/photo/1", 
    "display_url": "pic.twitter.com/wSsicFEeQx", 
    "url": "https://t.co/wSsicFEeQx", 
    "media_url_https": "https://pbs.twimg.com/media/DcYemxEXkAQ3RJG.jpg", 
    "source_user_id_s
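
The `indices` in each entity mark where that entity appears in the tweet text as a `[start, end)` pair, so slicing the text recovers the exact substring, including the `@` or `#` sign. A small sketch using the text and indices of the tweet above follows. It lines up directly for this plain-ASCII tweet; for tweets containing emoji or HTML entities such as `&amp;`, a naive slice may not match, so treat it as a sketch.

```python
# Text and entity indices copied from the tweet printed above.
text = ('RT @MrRaceBannon: The Three Little Pigs  '
        '#MillennialBedtimeStories https://t.co/wSsicFEeQx')
entities = {
    'user_mentions': [{'screen_name': 'MrRaceBannon', 'indices': [3, 16]}],
    'hashtags': [{'text': 'MillennialBedtimeStories', 'indices': [41, 66]}],
}

# Slice the text with each entity's [start, end) indices.
for kind in ('user_mentions', 'hashtags'):
    for entity in entities[kind]:
        start, end = entity['indices']
        print('%s: %s' % (kind, text[start:end]))
```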

In [115]:
print(t['retweet_count'])
print(t['retweeted'])

28
False


In [116]:
## Extract the text, screen names (Twitter usernames), and hashtags from the tweets

status_texts = [status['text'] for status in statuses ]

screen_names = [user_mention['screen_name'] for status in statuses for user_mention in status['entities']['user_mentions'] ]

hashtags = [hashtag['text'] for status in statuses for hashtag in status['entities']['hashtags'] ]

In [131]:
print 'Texts:\n', status_texts[:5]
print '\nUsernames:', screen_names[:5]
print '\nHashtags:', hashtags[:5]

Texts:
[u'RT @MrRaceBannon: The Three Little Pigs  #MillennialBedtimeStories https://t.co/wSsicFEeQx', u'#MillennialBedtimeStories Once upon a ending millennium convinced the world was going to end due to software, some\u2026 https://t.co/598meHPpeN', u'#MillennialBedtimeStories\nABC Land', u'RT @kevinwxgg: A gender-fluid blonde and the three furry beings who prefer not to be labled. \n#MillennialBedtimeStories', u'@ #Robot #Vacuum Cleaner with 1000PA Power Suction for Thin Carpet!!\n\nproduct link: https://t.co/rLxf3Eefet\n specif\u2026 https://t.co/pdk6XSlApV']

Usernames: [u'MrRaceBannon', u'kevinwxgg', u'billy4ever9', u'HashtagRoundup', u'BrieHxC']

Hashtags: [u'MillennialBedtimeStories', u'MillennialBedtimeStories', u'MillennialBedtimeStories', u'MillennialBedtimeStories', u'Robot']
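
The `screen_names` and `hashtags` comprehensions above each contain two `for` clauses. These read left to right in the same order as nested loops. The sketch below shows the equivalence on two toy statuses that contain only the fields used.

```python
# Two toy statuses shaped like the search results (only the fields used here).
statuses_sample = [
    {'entities': {'hashtags': [{'text': 'Robot'}, {'text': 'Vacuum'}]}},
    {'entities': {'hashtags': [{'text': 'MillennialBedtimeStories'}]}},
]

# Comprehension form, as in the cell above.
hashtags_comp = [h['text'] for s in statuses_sample
                 for h in s['entities']['hashtags']]

# Equivalent explicit loops: the outer loop comes first, as in the comprehension.
hashtags_loop = []
for s in statuses_sample:
    for h in s['entities']['hashtags']:
        hashtags_loop.append(h['text'])

print(hashtags_comp == hashtags_loop)  # True
```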


In [134]:
# collection of all words from all tweets
words = [w for t in status_texts for w in t.split() ]
print words[:15]

[u'RT', u'@MrRaceBannon:', u'The', u'Three', u'Little', u'Pigs', u'#MillennialBedtimeStories', u'https://t.co/wSsicFEeQx', u'#MillennialBedtimeStories', u'Once', u'upon', u'a', u'ending', u'millennium', u'convinced']


## Basic frequency distribution of the words in tweets

In [137]:
word_counter = Counter(words)
word_counter.most_common()[:10]

[(u'#MillennialBedtimeStories', 66),
 (u'RT', 41),
 (u'the', 31),
 (u'The', 24),
 (u'and', 18),
 (u'a', 11),
 (u'A', 9),
 (u'#MillennialBedTimeStories', 8),
 (u'Little', 8),
 (u'to', 6)]
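
The raw counts are dominated by function words and by `RT`, and `the` and `The` are counted separately. A minimal sketch that lowercases words and skips stop words before counting follows. The stop-word list here is a tiny, hypothetical one chosen for illustration; a real analysis would use a fuller list, such as the one in NLTK's stopwords corpus.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (all lowercase).
STOPWORDS = {'rt', 'the', 'a', 'an', 'and', 'to', 'of'}

def content_word_counts(words):
    """Count words case-insensitively, skipping stop words."""
    lowered = (w.lower() for w in words)
    return Counter(w for w in lowered if w not in STOPWORDS)

sample_words = ['RT', 'The', 'Three', 'Little', 'Pigs', 'The', 'Little', 'engine']
print(content_word_counts(sample_words).most_common(1))  # [('little', 2)]
```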

In [140]:
sorted_word_counts = sorted(word_counter.values(), reverse=True)

In [146]:
for item in [words, screen_names, hashtags]:
    c = Counter(item)
    print(c.most_common()[:10]) # top 10
    print

[(u'#MillennialBedtimeStories', 66), (u'RT', 41), (u'the', 31), (u'The', 24), (u'and', 18), (u'a', 11), (u'A', 9), (u'#MillennialBedTimeStories', 8), (u'Little', 8), (u'to', 6)]

[(u'eatpraystyle', 6), (u'Nalahru', 6), (u'Rizember', 6), (u'FioNoire', 6), (u'BrushingOff', 2), (u'MrRaceBannon', 2), (u'billy4ever9', 2), (u'JeffHendrix88', 1), (u'Superbokka', 1), (u'kevinwxgg', 1)]

[(u'MillennialBedtimeStories', 66), (u'MillennialBedTimeStories', 8), (u'WhatStarWarsTaughtMe', 1), (u'Vacuum', 1), (u'Robot', 1), (u'millennialbedtimestories', 1)]
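
The hashtag counts split one tag into three spellings (`MillennialBedtimeStories`, `MillennialBedTimeStories`, and an all-lowercase one). Because the search for `#MillennialBedtimeStories` returned all three, Twitter evidently treats them as the same hashtag. Folding case before counting merges them. The sketch below rebuilds the hashtag list from the counts printed above (78 hashtags in total) rather than from live data.

```python
from collections import Counter

# Rebuilt from the hashtag counts printed above.
hashtags_sample = (['MillennialBedtimeStories'] * 66 +
                   ['MillennialBedTimeStories'] * 8 +
                   ['WhatStarWarsTaughtMe', 'Vacuum', 'Robot',
                    'millennialbedtimestories'])

# Lowercase every hashtag so that different spellings count as one entry.
folded = Counter(h.lower() for h in hashtags_sample)
print(folded.most_common(1))  # [('millennialbedtimestories', 75)]
```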



In [158]:
##Create a prettyprint function to display tuples in a nice tabular format

def prettyprint_counts(label, list_of_tuples):
    # Table header: center the label in a 20-character field,
    # then center the word "Count" in a 6-character field.
    print("\n{:^20} | {:^6}".format(label, "Count"))
    # print a row of 40 asterisks
    print("*"*40)
    for k,v in list_of_tuples:
        # for each word and its count, print the word as left-aligned and the count as right-aligned. 
        print("{:20} | {:>6}".format(k,v))

In [156]:
for label, data in (('Word', words), ('Screen Name', screen_names), ('Hashtag', hashtags)):
    c = Counter(data)
    prettyprint_counts(label, c.most_common()[:10])


        Word         | Count 
****************************************
#MillennialBedtimeStories |     66
RT                   |     41
the                  |     31
The                  |     24
and                  |     18
a                    |     11
A                    |      9
#MillennialBedTimeStories |      8
Little               |      8
to                   |      6

    Screen Name      | Count 
****************************************
eatpraystyle         |      6
Nalahru              |      6
Rizember             |      6
FioNoire             |      6
BrushingOff          |      2
MrRaceBannon         |      2
billy4ever9          |      2
JeffHendrix88        |      1
Superbokka           |      1
kevinwxgg            |      1

      Hashtag        | Count 
****************************************
MillennialBedtimeStories |     66
MillennialBedTimeStories |      8
WhatStarWarsTaughtMe |      1
Vacuum               |      1
Robot                |      1
millennialbedtim
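
In `str.format`, a width such as `{:20}` is only a minimum, so entries longer than 20 characters (like the 25-character hashtag above) push the `|` separator out of line. One option, sketched below as a hypothetical variant named `prettyprint_counts_fit`, is to size the first column from the data. It returns the table as a string so that it is easy to inspect.

```python
def prettyprint_counts_fit(label, list_of_tuples):
    """Hypothetical variant of prettyprint_counts: the first column is as
    wide as its longest entry, so the '|' separators line up."""
    width = max([len(label)] + [len(k) for k, _ in list_of_tuples])
    # The nested {w} field supplies the column width at run time.
    lines = ['{:^{w}} | {:^6}'.format(label, 'Count', w=width),
             '*' * (width + 9)]  # width + len(' | ') + 6
    for k, v in list_of_tuples:
        lines.append('{:{w}} | {:>6}'.format(k, v, w=width))
    return '\n'.join(lines)

print(prettyprint_counts_fit('Hashtag', [('MillennialBedtimeStories', 66),
                                         ('Robot', 1)]))
```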

## Finding the most popular retweets

In [170]:
retweets = [
            # Store a tuple of these three values ...
            (status['retweet_count'],
             status['retweeted_status']['user']['screen_name'],
             status['text'].replace("\n", "\\"))  # replace each newline with a backslash

            # ... for each status ...
            for status in statuses

            # ... so long as the status meets this condition.
            if 'retweeted_status' in status
           ]

In [173]:
retweets[:5]

[(28,
  u'MrRaceBannon',
  u'RT @MrRaceBannon: The Three Little Pigs  #MillennialBedtimeStories https://t.co/wSsicFEeQx'),
 (3,
  u'kevinwxgg',
  u'RT @kevinwxgg: A gender-fluid blonde and the three furry beings who prefer not to be labled. \\#MillennialBedtimeStories'),
 (2,
  u'billy4ever9',
  u'RT @billy4ever9: #MillennialBedtimeStories Beauty And The Beast\U0001f607\U0001f47f https://t.co/AK7A3tRDvP'),
 (8,
  u'HashtagRoundup',
  u'RT @HashtagRoundup: HOT HASHTAGS\\\\#4 USA #MillennialBedtimeStories w/@BrieHxC \\\\#5 USA #WhatStarWarsTaughtMe w/@HappyHourTags \\\\#6 USA #Yea\u2026'),
 (1,
  u'WahkersRevolt',
  u'RT @WahkersRevolt: \u201cThe Very Hungry &amp; Greedy Generation Before Us That Destroyed an Economy &amp; Environment.\u201d It\u2019s a vey scary story. #Millen\u2026')]

In [176]:
sorted_retweets = sorted(retweets, reverse=True)
sorted_retweets[:5]

[(207,
  u'DaSkrambledEgg',
  u"RT @DaSkrambledEgg: The Little engine that couldn't even #MillennialBedTimeStories"),
 (121,
  u'BrushingOff',
  u'RT @BrushingOff: Bi-Curious George. #MillennialBedTimeStories https://t.co/4oQXYhsND3'),
 (85,
  u'MrRaceBannon',
  u'RT @MrRaceBannon: Hansel and Regretel #MillennialBedtimeStories https://t.co/U06Pr4Ou60'),
 (79,
  u'colbywinters',
  u'RT @colbywinters: Little red riding hood AF\\ #MillennialBedtimeStories'),
 (62,
  u'iamalmostlegend',
  u'RT @iamalmostlegend: Organic Green Eggs &amp; Free Range Ham \\#MillennialBedtimeStories')]
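
`sorted(retweets, reverse=True)` compares whole tuples: first the count, then (only for equal counts) the screen name and the text, all in descending order. If only the count should decide the order, a `key` function makes that explicit, and `heapq.nlargest` returns the top items without sorting the whole list. In the sketch below, the counts and names are copied from the outputs above, and the texts are shortened.

```python
import heapq
from operator import itemgetter

# (retweet_count, screen_name, text) tuples; texts shortened for brevity.
retweets_sample = [
    (28, 'MrRaceBannon', 'The Three Little Pigs'),
    (3, 'kevinwxgg', 'A gender-fluid blonde and the three furry beings'),
    (207, 'DaSkrambledEgg', "The Little engine that couldn't even"),
    (121, 'BrushingOff', 'Bi-Curious George.'),
]

# Sort on the count alone; sorted() is stable, so ties keep their input order.
by_count = sorted(retweets_sample, key=itemgetter(0), reverse=True)

# heapq.nlargest gives the same top-n items as sorting and slicing.
top_two = heapq.nlargest(2, retweets_sample, key=itemgetter(0))

print([name for _, name, _ in top_two])  # ['DaSkrambledEgg', 'BrushingOff']
```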

In [185]:
##Create a prettyprint function to display tweets with their retweet counts

row_template = "{:^7} | {:^15} | {:50}"
def prettyprint_tweets(list_of_tuples):
    print('')  # blank line; in Python 2, print() would print "()"
    print(row_template.format("Count", "Screen Name", "Text"))
    print("*"*60)
    for count, screen_name, text in list_of_tuples:
        print(row_template.format(count, screen_name, text[:50].encode('utf-8')))
        # split the text of the tweet into up to 3 lines, if needed
        if len(text) > 50:
            print(row_template.format("", "", text[50:100].encode('utf-8')))
            if len(text) > 100:
                print(row_template.format("", "", text[100:].encode('utf-8')))


In [186]:
prettyprint_tweets(sorted_retweets)


 Count  |   Screen Name   | Text                                              
************************************************************
  207   | DaSkrambledEgg  | RT @DaSkrambledEgg: The Little engine that couldn'
        |                 | t even #MillennialBedTimeStories                  
  121   |   BrushingOff   | RT @BrushingOff: Bi-Curious George. #MillennialBed
        |                 | TimeStories https://t.co/4oQXYhsND3               
  85    |  MrRaceBannon   | RT @MrRaceBannon: Hansel and Regretel #MillennialB
        |                 | edtimeStories https://t.co/U06Pr4Ou60             
  79    |  colbywinters   | RT @colbywinters: Little red riding hood AF\ #Mill
        |                 | ennialBedtimeStories                              
  62    | iamalmostlegend | RT @iamalmostlegend: Organic Green Eggs &amp; Free
        |                 |  Range Ham \#MillennialBedtimeStories             
  57    |   BrushingOff   | RT @BrushingOff: Charlotte's Webcam. #M
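
Slicing the text at fixed offsets 50 and 100, as `prettyprint_tweets` does, can cut a word in two (`couldn'` / `t even` above). The standard `textwrap.wrap` function instead breaks lines at whitespace, with each line at most `width` characters. Below is a minimal sketch, with `wrap_tweet` as a hypothetical helper name. Note that by default `textwrap` still splits a single word that is longer than `width`, such as a very long URL.

```python
import textwrap

def wrap_tweet(text, width=50):
    """Hypothetical helper: split tweet text into lines of at most `width`
    characters, breaking at whitespace rather than inside a word."""
    return textwrap.wrap(text, width=width)

# Tweet text copied from the output above.
text = "RT @DaSkrambledEgg: The Little engine that couldn't even #MillennialBedTimeStories"

print([text[:50], text[50:100]])  # fixed slices split "couldn't"
for line in wrap_tweet(text):
    print(line)
```

Inside `prettyprint_tweets`, the first wrapped line could be printed next to the count and screen name, and any remaining lines with empty first and second columns.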