<div align="center">
    <h1><a href="index.ipynb">Knowledge Discovery in Digital Humanities</a></h1>
</div>

<div align="center">
    <h2>Class 09. Mining Twitter</h2>
    <img src="img/twitter-logo.png" width="300">
</div>

###Table of contents

- [Why Twitter?](#Why-Twitter?)
- [Creating a connection to Twitter](#Creating-a-connection-to-Twitter)
- [Trending topics](#Trending-topics)
- [Searching for tweets](#Searching-for-tweets)
- [Tweet analysis](#Tweet-analysis)

###Why Twitter?

[Twitter](http://twitter.com/) is a microblogging service that allows people to communicate with 140-character messages (called *tweets*).

- Tweets reflect people's thoughts in near real time

Twitter's *following* system connects people and creates networks. Its asymmetric model allows users to follow any other user even if there is no reciprocation. Other social media such as Facebook and LinkedIn, by contrast, require the mutual acceptance of a connection between users (which usually implies some kind of real-world connection).

- Twitter's asymmetric *following* model allows people to keep up with their interests

Interest graphs are a way of modeling connections between people and their interests. They can be mined to measure correlations between users and interests and to make recommendations, ranging from whom to follow on Twitter to what to purchase online or whom to date.

- Mining Twitter provides a way to discover people's opinions and interests 

###Creating a connection to Twitter

1. Create an app on Twitter
    1. Go to [https://apps.twitter.com/](https://apps.twitter.com/)
    2. Log in with your user account
    3. Click on Create new app
    4. Fill in the form, accept the agreement and click on Create your Twitter application
    5. Go to Keys and Access Tokens tab
    6. Scroll down and click on Create my access token
    7. Create a script named `credentials.py` (do not share it with anyone else) that contains this code:
```
TW_CONSUMER_KEY = 'Consumer Key (API Key)'
TW_CONSUMER_SECRET = 'Consumer Secret (API Secret)'
TW_ACCESS_TOKEN = 'Access Token'
TW_ACCESS_TOKEN_SECRET = 'Access Token Secret'
```
2. Authorize your application to access Twitter

In [11]:
import credentials
import twitter

CONSUMER_KEY = credentials.TW_CONSUMER_KEY
CONSUMER_SECRET = credentials.TW_CONSUMER_SECRET
ACCESS_TOKEN = credentials.TW_ACCESS_TOKEN
ACCESS_TOKEN_SECRET = credentials.TW_ACCESS_TOKEN_SECRET

auth = twitter.oauth.OAuth(
    ACCESS_TOKEN,
    ACCESS_TOKEN_SECRET,
    CONSUMER_KEY,
    CONSUMER_SECRET
)

twitter_api = twitter.Twitter(auth=auth)

###Trending topics

Two options:
- Global
- Specific location (via [Yahoo! GeoPlanet](https://developer.yahoo.com/geo/geoplanet/)'s Where On Earth (WOE) ID)

In [19]:
GLOBAL_WOE_ID = 1 # Worldwide
CA_WOE_ID = 23424775 # Canada

global_trends = twitter_api.trends.place(_id=GLOBAL_WOE_ID)
ca_trends = twitter_api.trends.place(_id=CA_WOE_ID)

In [33]:
# Helper function to print JSON in a readable format
import json

def print_friendly_json(d):
    # print() with a single argument works in both Python 2 and Python 3
    print(json.dumps(d, indent=2))

In [34]:
print_friendly_json(global_trends)

[
  {
    "created_at": "2015-04-04T02:34:43Z", 
    "trends": [
      {
        "url": "http://twitter.com/search?q=%23SosFlorDePelotudoSi", 
        "query": "%23SosFlorDePelotudoSi", 
        "name": "#SosFlorDePelotudoSi", 
        "promoted_content": null
      }, 
      {
        "url": "http://twitter.com/search?q=%23BatuquemosPorAmandaFinalista", 
        "query": "%23BatuquemosPorAmandaFinalista", 
        "name": "#BatuquemosPorAmandaFinalista", 
        "promoted_content": null
      }, 
      {
        "url": "http://twitter.com/search?q=%23BSBTheMovie", 
        "query": "%23BSBTheMovie", 
        "name": "#BSBTheMovie", 
        "promoted_content": null
      }, 
      {
        "url": "http://twitter.com/search?q=%23%D9%81%D8%B1%D9%8A%D9%82_%D8%AA%D8%AD%D8%AA%D8%B1%D9%85%D9%87_%D9%84%D9%83%D9%86_%D9%84%D8%A7%D8%AA%D8%B4%D8%AC%D8%B9%D9%87", 
        "query": "%23%D9%81%D8%B1%D9%8A%D9%82_%D8%AA%D8%AD%D8%AA%D8%B1%D9%85%D9%87_%D9%84%D9%83%D9%86_%D9%84%D8%A7%D8%AA%D8%B4%D8%AC

In [35]:
print_friendly_json(ca_trends)

[
  {
    "created_at": "2015-04-04T02:34:43Z", 
    "trends": [
      {
        "url": "http://twitter.com/search?q=%23CuffMeDanny", 
        "query": "%23CuffMeDanny", 
        "name": "#CuffMeDanny", 
        "promoted_content": null
      }, 
      {
        "url": "http://twitter.com/search?q=%23MakeAQuoteWhiny", 
        "query": "%23MakeAQuoteWhiny", 
        "name": "#MakeAQuoteWhiny", 
        "promoted_content": null
      }, 
      {
        "url": "http://twitter.com/search?q=%23BlueJaysMTL", 
        "query": "%23BlueJaysMTL", 
        "name": "#BlueJaysMTL", 
        "promoted_content": null
      }, 
      {
        "url": "http://twitter.com/search?q=Tokarski", 
        "query": "Tokarski", 
        "name": "Tokarski", 
        "promoted_content": null
      }, 
      {
        "url": "http://twitter.com/search?q=%23GraceShow", 
        "query": "%23GraceShow", 
        "name": "#GraceShow", 
        "promoted_content": null
      }, 
      {
        "url": "http://twit

####Exercise 1
Which Canadian trending topics are also trending worldwide?
1. Calculate the set of global trending topics (use list comprehensions)
2. Calculate the set of Canadian trending topics (use list comprehensions)
3. Calculate the intersection of both sets

In [27]:
global_trends_list = [trend['name'] for trend in global_trends[0]['trends']]
global_trends_set = set(global_trends_list)

ca_trends_list = [trend['name'] for trend in ca_trends[0]['trends']]
ca_trends_set = set(ca_trends_list)

common_trends_set = ca_trends_set.intersection(global_trends_set)
print(common_trends_set)

set([])
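
The empty result above simply means that no Canadian trend was also a global trend at that moment. As a minimal sketch of the same three steps on data that does overlap, the snippet below uses two small hand-made responses with the same shape as the `trends.place` results shown earlier. The `sample_*` variables, their contents, and the `trend_names` helper are made up for illustration; they are not real API output.

```python
# Hand-made sample data mimicking the shape of a trends.place response.
# These values are illustrative only, not real API output.
sample_global_trends = [{"trends": [{"name": "#BSBTheMovie"}, {"name": "Tokarski"}]}]
sample_ca_trends = [{"trends": [{"name": "Tokarski"}, {"name": "#GraceShow"}]}]

def trend_names(response):
    """Return the set of trend names in a trends.place-style response."""
    return set([trend["name"] for trend in response[0]["trends"]])

# Intersection of both sets: topics trending in Canada and worldwide
sample_common = trend_names(sample_ca_trends) & trend_names(sample_global_trends)
print(sorted(sample_common))
```

With this sample data the only common topic is `Tokarski`.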


###Searching for tweets

- Search results contain a special `search_metadata` node that embeds a `next_results` field: a query string that provides the basis of the subsequent query
- Twitter uses a *cursor* approach instead of *pagination* because of the highly dynamic state of its resources (more information about *cursor vs pagination* [here](https://dev.twitter.com/rest/public/timelines))
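
To see why a cursor suits a fast-changing timeline better than page numbers, here is a toy simulation that does not call Twitter at all. The `timeline` list and the helpers `page_by_offset` and `page_by_max_id` are hypothetical names chosen for this sketch; only the idea of paging with a maximum ID comes from the Twitter documentation linked above.

```python
# Toy timeline: tweet IDs, newest first (a larger ID means a newer tweet).
timeline = [110, 109, 108, 107, 106, 105]

def page_by_offset(timeline, page, size):
    """Classic pagination: skip page * size items, then take size items."""
    start = page * size
    return timeline[start:start + size]

def page_by_max_id(timeline, max_id, size):
    """Cursor-style paging: take the newest items whose ID is <= max_id."""
    return [tweet_id for tweet_id in timeline if tweet_id <= max_id][:size]

# First request: the three newest tweets.
first = page_by_offset(timeline, page=0, size=3)

# Two new tweets arrive before the second request.
timeline = [112, 111] + timeline

# Offset pagination shifts with the new arrivals and repeats seen tweets.
second_by_offset = page_by_offset(timeline, page=1, size=3)

# The cursor asks only for tweets older than the oldest one already seen.
second_by_cursor = page_by_max_id(timeline, max_id=min(first) - 1, size=3)

print(sorted(set(first) & set(second_by_offset)))  # repeated IDs
print(sorted(set(first) & set(second_by_cursor)))  # no repeats
```

In this toy run the offset-based second page repeats two tweets from the first page, while the cursor-based one repeats none.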

Search for a trending topic, or anything else for that matter (variable `q`).

In [41]:
q = 'Tokarski' 
count = 100
search_results = twitter_api.search.tweets(q=q, count=count)
tweets = search_results['statuses']

# Iterate through 5 more batches of results by following the cursor
for _ in range(5):
    print("Number of tweets retrieved: %d" % len(tweets))
    try:
        next_results = search_results['search_metadata']['next_results']
    except KeyError:  # No more results when next_results doesn't exist
        break
    
    # If next_results exists, search again.
    # next_results has the following form:
    # ?max_id=313519052523986943&q=NCAA&include_entities=1
    # Drop the leading '?', split the pairs on '&' and each pair on the
    # first '=', then unpack the resulting dictionary into keyword
    # arguments for the next search
    kwargs = dict(kw.split('=', 1) for kw in next_results[1:].split('&'))
    search_results = twitter_api.search.tweets(**kwargs)
    tweets += search_results['statuses']

Number of tweets retrieved: 100
Number of tweets retrieved: 200
Number of tweets retrieved: 300
Number of tweets retrieved: 400
Number of tweets retrieved: 500
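
The loop above splits `next_results` by hand, which would leave any percent-encoded value (for example a hashtag query starting with `%23`) still encoded. As a hedged alternative, the sketch below builds the keyword arguments with the standard library's `parse_qsl`, which also decodes the values. The helper name `next_results_to_kwargs` is hypothetical, and the first sample string is the one quoted in the code comment above.

```python
try:
    from urllib.parse import parse_qsl  # Python 3
except ImportError:
    from urlparse import parse_qsl      # Python 2

def next_results_to_kwargs(next_results):
    """Turn a next_results query string into a dict of keyword arguments."""
    # Drop the leading '?' and let parse_qsl split and percent-decode the pairs.
    return dict(parse_qsl(next_results.lstrip("?")))

kwargs = next_results_to_kwargs("?max_id=313519052523986943&q=NCAA&include_entities=1")
print(sorted(kwargs.items()))

# Percent-encoded values are decoded, so '%23' becomes '#'.
print(next_results_to_kwargs("?q=%23GraceShow")["q"])
```

The resulting dictionary could then be passed on as `twitter_api.search.tweets(**kwargs)`, in the same way as in the loop.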


In [40]:
print_friendly_json(tweets[0])

{
  "contributors": null, 
  "truncated": false, 
  "text": "Elias talks about SO move that turned Tokarski inside\u00a0out http://t.co/SF4NO11yCA", 
  "in_reply_to_status_id": null, 
  "id": 584224863260278784, 
  "favorite_count": 0, 
  "source": "<a href=\"http://publicize.wp.com/\" rel=\"nofollow\">WordPress.com</a>", 
  "retweeted": false, 
  "coordinates": null, 
  "entities": {
    "symbols": [], 
    "user_mentions": [], 
    "hashtags": [], 
    "urls": [
      {
        "url": "http://t.co/SF4NO11yCA", 
        "indices": [
          58, 
          80
        ], 
        "expanded_url": "http://wp.me/p4Vz4N-1xoF", 
        "display_url": "wp.me/p4Vz4N-1xoF"
      }
    ]
  }, 
  "in_reply_to_screen_name": null, 
  "in_reply_to_user_id": null, 
  "retweet_count": 0, 
  "id_str": "584224863260278784", 
  "favorited": false, 
  "user": {
    "follow_request_sent": false, 
    "profile_use_background_image": true, 
    "profile_text_color": "333333", 
    "default_profile_image":
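
The output above is cut off, but the fields it does show are enough to sketch how a tweet can be read. The snippet below builds a trimmed copy of that tweet by hand (only fields visible above, so it is not a complete tweet object) and pulls out the URL it mentions. The names `sample_tweet` and `expanded_urls` are hypothetical.

```python
# Trimmed, hand-copied subset of the tweet printed above (not a full tweet object).
sample_tweet = {
    "text": u"Elias talks about SO move that turned Tokarski inside\u00a0out http://t.co/SF4NO11yCA",
    "id_str": "584224863260278784",
    "retweet_count": 0,
    "entities": {
        "hashtags": [],
        "urls": [
            {
                "url": "http://t.co/SF4NO11yCA",
                "indices": [58, 80],
                "expanded_url": "http://wp.me/p4Vz4N-1xoF",
            }
        ],
    },
}

def expanded_urls(tweet):
    """Return the expanded form of every URL entity in a tweet."""
    return [entity["expanded_url"] for entity in tweet["entities"]["urls"]]

# The indices locate each entity inside the tweet text.
start, end = sample_tweet["entities"]["urls"][0]["indices"]
print(sample_tweet["text"][start:end])
print(expanded_urls(sample_tweet))
```

The slice taken with `indices` is the shortened `t.co` link as it appears in the text, while `expanded_url` holds the address it points to.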

###Tweet analysis