# Twitter data

## Copyright and Licensing

You are free to use or adapt this notebook for any purpose you'd like. However, please respect the [Simplified BSD License](https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/LICENSE.txt) that governs its use.

# Twitter API Access

Twitter implements OAuth 1.0A as its standard authentication mechanism, and in order to use it to make requests to Twitter's API, you'll need to go to https://dev.twitter.com/apps and create a sample application.

Choose any name for your application, write a description and use `http://google.com` for the website.

Under **Key and Access Tokens**, there are four primary identifiers you'll need to note for an OAuth 1.0A workflow: 
* consumer key, 
* consumer secret, 
* access token, and 
* access token secret (Click on Create Access Token to create those).

Note that you will need an ordinary Twitter account in order to log in, create an app, and get these credentials.

The first time you execute the notebook, add all four credentials so that they are saved in the `pkl` file. After that you can remove the secret keys from the notebook, because they will be loaded from the `pkl` file.

The `pkl` file contains sensitive information that can be used to take control of your Twitter account. **Do not share it**.

In [1]:
import pickle
import os

As you can see in the `if` statement in the next code block, we create an object called `Twitter` (a plain Python dictionary) and use it to store our access credentials: the consumer key, consumer secret, access token, and access token secret. If the pickled credentials file already exists, the `else` branch simply loads the credentials from the pickle file into the `Twitter` object.

What does pickle do? It's a cute name for a Python standard-library module that saves almost any Python object or data structure to disk. Pickle performs serialization: it converts a Python object, in this case the `Twitter` object, into a byte stream so that the object can be recreated later in Python when we need it. Reconstructing the object from that stream is called deserialization. So pickle does this for your Twitter access credentials, and when we come back, it loads them back into the `Twitter` object.
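The round trip described above can be sketched on its own. This is a minimal illustration, not the notebook's credential handling: the dictionary contents, the file name `example_credentials.pkl`, and the use of a temporary directory are all assumptions made for the example.

```python
import os
import pickle
import tempfile

# Hypothetical placeholder values; never put real keys in an example.
credentials = {
    'Consumer Key': 'example-key',
    'Consumer Secret': 'example-secret',
}

# Work in a temporary directory so nothing is left on disk afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example_credentials.pkl')

    # Serialization: write the dictionary to disk as a byte stream.
    with open(path, 'wb') as f:
        pickle.dump(credentials, f)

    # Deserialization: rebuild a dictionary from that byte stream.
    with open(path, 'rb') as f:
        restored = pickle.load(f)

# The restored object has the same content but is a separate object.
print(restored == credentials)
print(restored is credentials)
```

Note that `pickle.load` can execute arbitrary code while rebuilding objects, so only unpickle files you created yourself.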

In [2]:
if not os.path.exists('secret_twitter_credentials.pkl'):
    Twitter = {}  # dictionary that holds the access credentials
    Twitter['Consumer Key'] = 'pupiHYsEGgi363iUyH3wsrgQL'
    Twitter['Consumer Secret'] = '4zkm8MMgyAjP9MnHlA3naHdmctOsEs3Wf03DvNEe9wCojc2XEF'
    Twitter['Access Token'] = '1092500612711882752-mIywgatuE9SK86DISdIVXJ9X3ccuMc'
    Twitter['Access Token Secret'] = '9u7dT9mgYmNH5bqiYlkWfMuXV8dUD2vFVI6ldHOHtuQS8'
    with open('secret_twitter_credentials.pkl','wb') as f:
        pickle.dump(Twitter, f)
else:
    # Use a context manager so the file handle is closed after loading
    with open('secret_twitter_credentials.pkl','rb') as f:
        Twitter = pickle.load(f)

In the next code block, we use the `Twitter` object we just created and start working with the Twitter API. First we import the `twitter` package. Then we call `twitter.oauth.OAuth` with our four credentials to create an authentication object, `auth`. Finally, we pass `auth` to `twitter.Twitter` to create a Twitter API object, `twitter_api`.

In [3]:
import twitter

auth = twitter.oauth.OAuth(Twitter['Access Token'],
                           Twitter['Access Token Secret'],
                           Twitter['Consumer Key'],
                           Twitter['Consumer Secret'])

twitter_api = twitter.Twitter(auth=auth)

# There is nothing to see when displaying twitter_api except that it is now
# a defined variable

print(twitter_api)

<twitter.api.Twitter object at 0x7f57940db860>


## Example 2. Retrieving trends

Twitter identifies locations using the Yahoo! Where On Earth ID.

The Yahoo! Where On Earth ID for the entire world is 1.
See https://dev.twitter.com/docs/api/1.1/get/trends/place and
http://developer.yahoo.com/geo/geoplanet/

Look at the BOSS PlaceFinder here: https://developer.yahoo.com/boss/placefinder/

This last link is for looking up the ID number of a place.

In [4]:
WORLD_WOE_ID = 1
US_WOE_ID = 23424977
LOCAL_WOE_ID = 2487889  # San Diego
#LTU_WOE_ID = 598544
# KAUNAS_WOE_ID = 55848084

Look up the WOEID for [San Diego](http://woeid.rosselliot.co.nz/lookup/san%20diego%20%20ca)

You can change it to another location.

In [5]:
# Prefix ID with the underscore for query string parameterization.
# Without the underscore, the twitter package appends the ID value
# to the URL itself as a special case keyword argument.

local_trends = twitter_api.trends.place(_id=LOCAL_WOE_ID)
world_trends = twitter_api.trends.place(_id=WORLD_WOE_ID)
us_trends = twitter_api.trends.place(_id=US_WOE_ID)
#ltu_trends = twitter_api.trends.place(_id=LTU_WOE_ID)
# kns_trends = twitter_api.trends.place(_id=KAUNAS_WOE_ID)

In [6]:
#print(world_trends[:2])

In [7]:
trends = world_trends
print(type(trends))
# print(list(trends[0].keys()))
# print(trends[0]['trends'])

<class 'twitter.api.TwitterListResponse'>


## Example 3. Displaying API responses as pretty-printed JSON


In this line we use the `dumps` function of the `json` module to produce a more readable ("pretty-printed") version of the same output. The `indent=1` argument sets the indentation format: every new nesting level in the JSON (each new bracket or brace) is indented by one additional character.
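To see what `indent` changes, here is a small sketch that serializes the same structure with and without it. The `sample` data is hand-made for illustration and is not an actual API response.

```python
import json

# Hand-made stand-in for a trends response; None becomes JSON null.
sample = [{'name': '#Example', 'tweet_volume': None}]

# Default output: everything on a single line.
compact = json.dumps(sample)

# indent=1: each nesting level starts on a new line, indented by
# one more space than its parent.
pretty = json.dumps(sample, indent=1)

print(compact)
print(pretty)
```

Both strings decode back to the same Python structure with `json.loads`; only the whitespace differs.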

In [8]:
import json

print(json.dumps(us_trends[:2], indent=1))

[
 {
  "trends": [
   {
    "name": "#NGWSD",
    "url": "http://twitter.com/search?q=%23NGWSD",
    "promoted_content": null,
    "query": "%23NGWSD",
    "tweet_volume": 11078
   },
   {
    "name": "#WednesdayWisdom",
    "url": "http://twitter.com/search?q=%23WednesdayWisdom",
    "promoted_content": null,
    "query": "%23WednesdayWisdom",
    "tweet_volume": 106888
   },
   {
    "name": "#ThingsIKnowIllNeverDo",
    "url": "http://twitter.com/search?q=%23ThingsIKnowIllNeverDo",
    "promoted_content": null,
    "query": "%23ThingsIKnowIllNeverDo",
    "tweet_volume": null
   },
   {
    "name": "Matt Bryant",
    "url": "http://twitter.com/search?q=%22Matt+Bryant%22",
    "promoted_content": null,
    "query": "%22Matt+Bryant%22",
    "tweet_volume": null
   },
   {
    "name": "#NationalSigningDay",
    "url": "http://twitter.com/search?q=%23NationalSigningDay",
    "promoted_content": null,
    "query": "%23NationalSigningDay",
    "tweet_volume": 12558
   },
   {
    "name": 

## Example 4. Computing the intersection of two sets of trends

In [9]:
trends_set = {}
trends_set['world'] = set([trend['name']
                           for trend in world_trends[0]['trends']])

trends_set['us'] = set([trend['name']
                        for trend in us_trends[0]['trends']])

trends_set['san diego'] = set([trend['name']
                               for trend in local_trends[0]['trends']])


What we are doing here is first creating a `for` loop that joins all the trends for a particular location and prints them in a readable format.

In [10]:
for loc in ['world', 'us', 'san diego']:
    print(('-'*10, loc))
    print(','.join(trends_set[loc]))

('----------', 'world')
#Vurgun,#NGWSD,#HalaMadrid,Malcom,#EFA_Violates_Decisions_of_CAF,#NSD19,CanlaBaşla TahsinBABAŞla,Matt Bryant,Virginia,#TürkHürriyetsizOlmaz,#LulaLivre2043,#MengoniSanremo2019,#Gasco,#الكلاسيكو,#HazalAliWedding,#GSvHTY,#برشلونه_ريال_مدريد,#FCVBPSG,#YoVoy,#NefeseTeslim,#ZiyaHocam1200UcretliyiAta,#AgrubuKadroBekliyor,#specialplaceinhell,Donald Tusk,Gabriela Hardt,#Mehdi,#BSCFCB,#DirilişTurgutAltınok,Villefranche,#Assauer,#ChauNetflix,Ibagué,#EVEMCI,#6Feb,#EtimesgutaAylinBaşkan,#HanaNaCasaDeVidro,#ThingsIKnowIllNeverDo,#ElClásico,Aysti Kola,#GeziciAnketRezaleti,#النصر_القادسيه,#وفاه_الفريق_سعود_الهلال,ElClasico Hilbettvde,#NationalSigningDay,#askmeek,#şuleçetiçinadalet,#ไทยรักษาชาติ,Mais 12,TwitarttırCom Açıldı,Atibaia
('----------', 'us')
#NGWSD,#WallpaperWednesday,#wednesdaythoughts,#MannequinComplaints,#HalaMadrid,Signing Day,#NSD19,#WednesdayMotivation,Matt Bryant,#SadComicBooks,Vanessa Tyson,#globalplayday,Charlie Collier,#HowIGetRedemption,#KISDBragChat,#ipada

In [11]:
print(('='*10, 'intersection of world and us'))
print(trends_set['world'].intersection(trends_set['us']))

print(('='*10, 'intersection of us and san-diego'))
print(trends_set['san diego'].intersection(trends_set['us']))

{'#NGWSD', '#NationalSigningDay', '#HalaMadrid', '#askmeek', '#NSD19', 'Matt Bryant', '#EVEMCI', '#ThingsIKnowIllNeverDo', '#ElClásico'}
{'#WallpaperWednesday', '#NGWSD', 'Brian Boyle', '#wednesdaythoughts', '#HalaMadrid', '#MannequinComplaints', '#NSD19', '#WednesdayMotivation', 'Matt Bryant', '#SadComicBooks', 'Vanessa Tyson', '#HowIGetRedemption', '#KISDBragChat', '#PelosiClap', 'Mark Herring', 'World War Z', '#UnitedWayChat', '#mespamn', '#GSPD2019', 'Ishmael Sopsher', '#FunkoTwitterLive', '#WednesdayWisdom', 'Stanley Johnson', '#CarinaPitch', 'Bob Marley', '#LeadHerForward', 'Kurtis Blow', '#EVEMCI', '#TakeItBack', '#frank2019', '#ThingsIKnowIllNeverDo', '#NSD2019', 'Bob Stoops', '#ElClásico', '#Team211', 'devonta lee', 'Thon Maker', '#NationalSigningDay', 'Malachi Richardson', 'World Bank', 'Kirk Cox', '#askmeek', 'Cody McLeod', 'Virginia Democrats'}
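The set intersection used above can be illustrated on a tiny example. The two sets below are hand-picked from trend names that appear in the outputs above, purely for illustration; they are not complete trend lists.

```python
# Hand-picked subsets of trend names, for illustration only.
world = {'#NSD19', '#HalaMadrid', 'Donald Tusk'}
us = {'#NSD19', '#HalaMadrid', 'Signing Day'}

# intersection() keeps only the names present in both sets.
common = world.intersection(us)

# The & operator is an equivalent spelling for two sets.
same_result = (world & us) == common

# Sets are unordered, so sort before printing for a stable display.
print(sorted(common))
print(same_result)
```

Because sets have no order, the trend names in the outputs above can appear in a different order each time the notebook is run.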


## Example 5. Collecting search results

Set the variable `q` to a trending topic,
or anything else for that matter. The example query below
was a trending topic when this content was being developed
and is used throughout the remainder of this chapter.

In [12]:
q = '#NSD19'
number = 100
# See https://dev.twitter.com/docs/api/1.1/get/search/tweets

search_results = twitter_api.search.tweets(q=q, count=number)
statuses = search_results['statuses']

In [15]:
print(len(statuses))
#print(statuses)

100


Twitter often returns duplicate results. We can filter them out by checking for duplicate texts:

In [16]:
all_text = []
filtered_statuses = []
for s in statuses:
    if s['text'] not in all_text:
        filtered_statuses.append(s)
        all_text.append(s['text'])
statuses = filtered_statuses

If the text of a tweet, which is the tweet message, is not already in the `all_text` list that we are using to keep track of texts seen so far, we append the tweet to `filtered_statuses`. When the `for` loop is done, we assign `filtered_statuses` to `statuses`, the object that held the response from Twitter before.
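As a variation on the loop described above, the texts already seen can be kept in a `set` instead of a list, which makes each membership check cheaper as the number of tweets grows. This is only a sketch on hand-made data; `sample_statuses`, `seen_texts`, and `unique_statuses` are names invented for the example.

```python
# Hand-made statuses; the first two share the same text.
sample_statuses = [
    {'id': 1, 'text': 'RT @a: hello'},
    {'id': 2, 'text': 'RT @a: hello'},
    {'id': 3, 'text': 'something else'},
]

seen_texts = set()     # texts encountered so far
unique_statuses = []   # first status seen for each distinct text

for s in sample_statuses:
    if s['text'] not in seen_texts:
        seen_texts.add(s['text'])
        unique_statuses.append(s)

print(len(unique_statuses))
```

Like the original loop, this keeps the first occurrence of each text and preserves the order of the results.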

In [17]:
print(len(statuses))

90


In [18]:
[s['text'] for s in search_results['statuses']]

['RT @NickRalston22: 🚨 New addition to the #cULture on Feb. 6, 2019!\n\n🗣 Nick Ralston\n📍 Argyle, Texas\n📝 Arizona State University\n🏈 Tight End…',
 'RT @vypehouston: #NSD19 at @NSSHS_gpisd was a busy one with 16 players signing. Caught up with a few. \n\n@Tbradford97 - @TexasTechFB \n@Ajani…',
 'RT @king00770975: 🚨 New addition to the #cULture on Feb. 6, 2019!\n\n🗣 King McGowen\n📍 Willis, Texas\n📝 Willis HS\n🏈 Offensive Line \n\n#NSD19 | #…',
 'RT @mcmichael20: NSD Q&amp;A with Louisville signee @bigdes01 : https://t.co/dx1RrESLg6 #NSD19',
 'RT @MichaelEhrlich: PUNTERS ARE BRANDS TOO.\n\n🙌 @louishedley1 🙌\n\n#NSD19\n#NationalSigningDay\n#BrandFood https://t.co/KmVNiymrFv',
 'RT @BSCFootball: Dylon Kelley can do it all! Versatile athlete from Gulf Breeze, FL.\n\n#Excellence\n#NSD19 https://t.co/B1Pwnff82H',
 'RT @SWOSUFootball: Welcome @sauccy_k to the family! First #TXDawg! \n"Kingsley is a QB from the Lone Star State that will bring us great lea…',
 'RT @FoleyLionsFB: Congratulations

In [19]:
# Show one sample search result by indexing into the list...

print(json.dumps(statuses[0], indent=1))

{
 "created_at": "Wed Feb 06 20:09:28 +0000 2019",
 "id": 1093240277945458689,
 "id_str": "1093240277945458689",
 "text": "RT @NickRalston22: \ud83d\udea8 New addition to the #cULture on Feb. 6, 2019!\n\n\ud83d\udde3 Nick Ralston\n\ud83d\udccd Argyle, Texas\n\ud83d\udcdd Arizona State University\n\ud83c\udfc8 Tight End\u2026",
 "truncated": false,
 "entities": {
  "hashtags": [
   {
    "text": "cULture",
    "indices": [
     41,
     49
    ]
   }
  ],
  "symbols": [],
  "user_mentions": [
   {
    "screen_name": "NickRalston22",
    "name": "Nick Ralston",
    "id": 1348366854,
    "id_str": "1348366854",
    "indices": [
     3,
     17
    ]
   }
  ],
  "urls": []
 },
 "metadata": {
  "iso_language_code": "en",
  "result_type": "recent"
 },
 "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
 "in_reply_to_status_id": null,
 "in_reply_to_status_id_str": null,
 "in_reply_to_user_id": null,
 "in_reply_to_user_id_str": null,
 "in_reply_

In [20]:
# Take the first status and assign it to the variable t

t = statuses[0]

# Alternatively, select a status by its id with a list comprehension; the
# result is a list with only one element that can be accessed by its index:
#t = [ status for status in statuses 
#          if status['id'] == 316948241264549888 ][0]

# Explore the variable t to get familiarized with the data structure...

print(t['retweet_count'])
print(t['retweeted'])

12
False


## Example 6. Extracting text, screen names, and hashtags from tweets

We'll again pull the text, screen names, and hashtags out of all these records and assign them to lists. The first list, `status_texts`, holds the text of every status. The next one, `screen_names`, holds the screen name of every user mention in every status. And then `hashtags`: we definitely want to keep track of the hashtags.

In [22]:
status_texts = [status['text']
                for status in statuses]

screen_names = [user_mention['screen_name']
                for status in statuses
                    for user_mention in status['entities']['user_mentions']]

hashtags = [hashtag['text']
            for status in statuses
                for hashtag in status['entities']['hashtags']]

# Compute a collection of all words from all tweets

words = [w
         for t in status_texts
            for w in t.split()]

We are using the data structure of the tweet records to retrieve the data, just as a summary. The fourth list here is interesting because we split each message to create a list of all the words. You've seen this in the bag of words. Here we are just using the `split` method of the string class again. In the next code cell, we just use `json.dumps` to display the first five items of each list.
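The nested list comprehensions above can be easier to follow on a very small input. The sketch below uses two hand-made statuses that mimic only the fields used in this example (`text`, `entities`, `user_mentions`, `hashtags`); real statuses contain many more fields, and the `sample_` names are invented here.

```python
# Two hand-made statuses with only the fields this example needs.
sample_statuses = [
    {'text': 'RT @alice: go #team',
     'entities': {'user_mentions': [{'screen_name': 'alice'}],
                  'hashtags': [{'text': 'team'}]}},
    {'text': 'no mentions here',
     'entities': {'user_mentions': [], 'hashtags': []}},
]

# Outer loop over statuses, inner loop over each status's mentions.
sample_screen_names = [mention['screen_name']
                       for status in sample_statuses
                       for mention in status['entities']['user_mentions']]

# Same pattern for hashtags.
sample_hashtags = [hashtag['text']
                   for status in sample_statuses
                   for hashtag in status['entities']['hashtags']]

# split() with no arguments breaks each text on runs of whitespace.
sample_words = [w
                for status in sample_statuses
                for w in status['text'].split()]

print(sample_screen_names)
print(sample_hashtags)
print(sample_words)
```

A status with no mentions or hashtags simply contributes nothing to the flattened lists, and `split()` leaves punctuation attached to words, as in `'@alice:'`.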

In [23]:
# Explore the first 5 items for each...

print(json.dumps(status_texts[0:5], indent=1))
print(json.dumps(screen_names[0:5], indent=1)) 
print(json.dumps(hashtags[0:5], indent=1))
print(json.dumps(words[0:5], indent=1))

[
 "RT @NickRalston22: \ud83d\udea8 New addition to the #cULture on Feb. 6, 2019!\n\n\ud83d\udde3 Nick Ralston\n\ud83d\udccd Argyle, Texas\n\ud83d\udcdd Arizona State University\n\ud83c\udfc8 Tight End\u2026",
 "RT @vypehouston: #NSD19 at @NSSHS_gpisd was a busy one with 16 players signing. Caught up with a few. \n\n@Tbradford97 - @TexasTechFB \n@Ajani\u2026",
 "RT @king00770975: \ud83d\udea8 New addition to the #cULture on Feb. 6, 2019!\n\n\ud83d\udde3 King McGowen\n\ud83d\udccd Willis, Texas\n\ud83d\udcdd Willis HS\n\ud83c\udfc8 Offensive Line \n\n#NSD19 | #\u2026",
 "RT @mcmichael20: NSD Q&amp;A with Louisville signee @bigdes01 : https://t.co/dx1RrESLg6 #NSD19",
 "RT @MichaelEhrlich: PUNTERS ARE BRANDS TOO.\n\n\ud83d\ude4c @louishedley1 \ud83d\ude4c\n\n#NSD19\n#NationalSigningDay\n#BrandFood https://t.co/KmVNiymrFv"
]
[
 "NickRalston22",
 "vypehouston",
 "NSSHS_gpisd",
 "Tbradford97",
 "TexasTechFB"
]
[
 "cULture",
 "NSD19",
 "cULture",
 "NSD19",
 "NSD19"
]
[
 "RT",
 "@NickRalston22