- Go to http://dev.twitter.com/apps/new to create an app and get values for the four credentials below. Paste them in place of the empty-string placeholders.


- See https://dev.twitter.com/docs/auth/oauth for more information on Twitter's OAuth implementation.

In [1]:
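# pip install twitter -> run that command in your remote machine's terminal window to install the Twitter wrapper
# (the "twitter" package, Python Twitter Tools, provides twitter.Twitter and twitter.oauth.OAuth used below)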
import twitter

# Twitter API keys go here
CONSUMER_KEY = ''
CONSUMER_SECRET = ''

OAUTH_TOKEN = ''
OAUTH_TOKEN_SECRET = ''


auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                           CONSUMER_KEY, CONSUMER_SECRET)

twitter_api = twitter.Twitter(auth=auth)

# Nothing to see by displaying twitter_api except that it's now a
# defined variable

print(twitter_api)

<twitter.api.Twitter object at 0x7f998df0a390>


Woot! We've successfully used our OAuth credentials to gain authorization to query Twitter's API!
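Pasting keys directly into a notebook makes them easy to leak when the notebook is shared. Below is a minimal sketch of an alternative. It assumes you export the four values as environment variables before starting Jupyter. The variable names (`TW_OAUTH_TOKEN` and so on) and the `load_twitter_credentials` helper are hypothetical; the `twitter` package does not look for them.

```python
import os

# Hypothetical environment-variable names, listed in the same order
# as the arguments passed to twitter.oauth.OAuth above
CREDENTIAL_VARS = ('TW_OAUTH_TOKEN', 'TW_OAUTH_TOKEN_SECRET',
                   'TW_CONSUMER_KEY', 'TW_CONSUMER_SECRET')

def load_twitter_credentials(environ=os.environ):
    """Return (token, token_secret, consumer_key, consumer_secret),
    raising KeyError that names every variable that is unset or empty."""
    missing = [name for name in CREDENTIAL_VARS if not environ.get(name)]
    if missing:
        raise KeyError('missing credentials: ' + ', '.join(missing))
    return tuple(environ[name] for name in CREDENTIAL_VARS)
```

Once the variables are set, `auth = twitter.oauth.OAuth(*load_twitter_credentials())` could replace the hard-coded constants, because the tuple follows the argument order of the `OAuth` call above.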

# Searching Tweets

Let's take one of the hashtags common across trends and use it as the basis of a search query to fetch some tweets for further analysis. Here's a link to the [`GET search/tweets` resource](http://bit.ly/1a1l398).

In [5]:
import json

q = '#PopeinNYC'
count = 100

# See https://dev.twitter.com/docs/api/1.1/get/search/tweets
search_results = twitter_api.search.tweets(q=q, count=count)

In [6]:
# here's what a single tweet object looks like

print(json.dumps(search_results['statuses'][0], indent=1))

{
 "contributors": null, 
 "truncated": false, 
 "text": "RT @ApoliticComedy: Pope Francis should drive his popemobile right into MSG. It'd be the first defense the garden's seen since we got Carme\u2026", 
 "is_quote_status": false, 
 "in_reply_to_status_id": null, 
 "id": 649451326439563265, 
 "favorite_count": 0, 
 "source": "<a href=\"https://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android Tablets</a>", 
 "retweeted": false, 
 "coordinates": null, 
 "entities": {
  "symbols": [], 
  "user_mentions": [
   {
    "id": 1853428015, 
    "indices": [
     3, 
     18
    ], 
    "id_str": "1853428015", 
    "screen_name": "ApoliticComedy", 
    "name": "Apolitical Comedy"
   }
  ], 
  "hashtags": [
   {
    "indices": [
     139, 
     140
    ], 
    "text": "PopeinNYC"
   }
  ], 
  "urls": []
 }, 
 "in_reply_to_screen_name": null, 
 "in_reply_to_user_id": null, 
 "retweet_count": 2, 
 "id_str": "649451326439563265", 
 "favorited": false, 
 "retweeted_status": {
  "

In [7]:
# all statuses
statuses = search_results['statuses']

In [13]:
import datetime

# parse the first tweet's created_at timestamp and reformat it to hour resolution
datetime.datetime.strptime(statuses[0]['created_at'], '%a %b %d %H:%M:%S +0000 %Y').strftime('%Y-%m-%d %H')

'2015-10-01 05'
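You can apply the same `strptime` pattern to every status to see how tweet volume is spread over time. This is a minimal sketch. It assumes each status is a dict with a `created_at` string in the format parsed above. The sample statuses are made up for illustration.

```python
import datetime
from collections import Counter

# Format of the created_at field, as parsed in the cell above
CREATED_AT_FORMAT = '%a %b %d %H:%M:%S +0000 %Y'

def tweets_per_hour(statuses):
    """Count statuses per hour bucket, keyed as 'YYYY-MM-DD HH'."""
    hours = (datetime.datetime.strptime(s['created_at'], CREATED_AT_FORMAT)
             .strftime('%Y-%m-%d %H') for s in statuses)
    return Counter(hours)

# Hypothetical sample statuses, trimmed to the one field used here
sample = [{'created_at': 'Thu Oct 01 05:10:00 +0000 2015'},
          {'created_at': 'Thu Oct 01 05:45:12 +0000 2015'},
          {'created_at': 'Thu Oct 01 06:01:30 +0000 2015'}]

for hour, n in sorted(tweets_per_hour(sample).items()):
    print(hour, n)
```

In the notebook, `tweets_per_hour(statuses)` would give the same per-hour breakdown for the real search results.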

In [None]:
# all users
tweet_users = [x['user']['screen_name'] for x in statuses]
tweet_users[:20]

In [None]:
# this is metadata returned along with our search results -> it includes the parameters for continuing to query for 
# additional results (just like we did with IG)

search_results['search_metadata']

In [None]:
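# if we provide max_id -> we'll get tweets older than the current ones
# next_results is a query string of the form '?max_id=...&q=...'; strip the
# leading '?' and split it into a dict of parameters

params = {a:b for a,b in [x.split('=') for x in search_results['search_metadata']['next_results'][1:].split('&')]}
max_id = int(params['max_id'])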
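Splitting on `=` and `&` by hand works here. The standard library can also parse query strings, and it decodes URL escapes such as `%23` for `#` along the way. The sketch below uses Python 3's `urllib.parse.parse_qs`; in Python 2 the same function is `urlparse.parse_qs`. It assumes `next_results` has the `?key=value&...` shape handled above. The example string is illustrative, not real API output.

```python
from urllib.parse import parse_qs

def max_id_from_next_results(next_results):
    """Extract max_id from a search_metadata['next_results'] query string."""
    # lstrip('?') drops the leading '?'; parse_qs maps each key to a list of values
    params = parse_qs(next_results.lstrip('?'))
    return int(params['max_id'][0])

# Illustrative value in the same shape as next_results
example = '?max_id=123456789&q=%23PopeinNYC&count=100&include_entities=1'
print(max_id_from_next_results(example))
print(parse_qs(example.lstrip('?'))['q'])  # the %23 escape is decoded back to '#'
```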

In [None]:
# pass max_id to fetch the next page of (older) results and append them to statuses

search_results = twitter_api.search.tweets(q=q, count=count, max_id=max_id)
statuses += search_results['statuses']

In [None]:
len(statuses)

In [None]:
# use a loop -> iterate multiple times to get many more tweets

num_iterations = 30

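for i in range(num_iterations):
    # next_results is missing once there are no older results; stop instead of raising KeyError
    next_results = search_results['search_metadata'].get('next_results')
    if not next_results:
        break
    params = {a:b for a,b in [x.split('=') for x in next_results[1:].split('&')]}
    max_id = int(params['max_id'])
    search_results = twitter_api.search.tweets(q=q, count=count, max_id=max_id)
    statuses += search_results['statuses']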
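The loop above is cursor-style paging. Each response describes how to request the next, older page, and collection stops when no further page is offered. The sketch below factors that pattern into a reusable function. It assumes `search` is any callable with the keyword interface of `twitter_api.search.tweets` used above, and it treats a missing `next_results` key as the end of the results. `fake_search` is a hypothetical stand-in so the logic can run offline.

```python
from urllib.parse import parse_qs

def collect_statuses(search, q, count=100, max_pages=30):
    """Fetch up to max_pages pages of results, following next_results links."""
    results = search(q=q, count=count)
    statuses = list(results['statuses'])
    for _ in range(max_pages - 1):
        next_results = results['search_metadata'].get('next_results')
        if not next_results:  # no older page offered: stop
            break
        max_id = int(parse_qs(next_results.lstrip('?'))['max_id'][0])
        results = search(q=q, count=count, max_id=max_id)
        statuses.extend(results['statuses'])
    return statuses

# Hypothetical offline stand-in: ids 10..1, served newest first, `count` per page
def fake_search(q, count, max_id=None):
    ids = [i for i in range(10, 0, -1) if max_id is None or i <= max_id]
    page = ids[:count]
    meta = {}
    if len(ids) > count:
        meta['next_results'] = '?max_id=%d&q=%s' % (page[-1] - 1, q)
    return {'statuses': [{'id': i} for i in page], 'search_metadata': meta}

print([s['id'] for s in collect_statuses(fake_search, 'x', count=4)])
```

Passing the search callable in as a parameter keeps the paging logic testable without network access. In the notebook you would call `collect_statuses(twitter_api.search.tweets, q)`.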

In [None]:
len(statuses)

In [None]:
# save our data in pickled format so that we don't have to grab it again if our machine crashes

import pickle
path = '/class/itpmssd/datasets/'

# the with-block makes sure the file is flushed and closed
with open(path + '%s_tw.p' % q, 'wb') as f:
    pickle.dump(statuses, f)
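After a restart you can load the list back with `pickle.load`. The sketch below round-trips a small stand-in list through a temporary file so that it runs anywhere. In the notebook you would open `path + '%s_tw.p' % q` in `'rb'` mode instead.

```python
import os
import pickle
import tempfile

# Stand-in data; in the notebook this would be the statuses list
statuses_sample = [{'id': 1, 'text': 'hello'}, {'id': 2, 'text': 'world'}]

tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, 'sample_tw.p')

# Write with a context manager so the file is flushed and closed
with open(filename, 'wb') as f:
    pickle.dump(statuses_sample, f)

# Read it back; only unpickle files you created yourself, since unpickling can run code
with open(filename, 'rb') as f:
    restored = pickle.load(f)

print(len(restored), restored[0]['text'])
```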

This was a simple hashtag search. Twitter's search API also supports more advanced queries; see https://dev.twitter.com/docs/using-search

# Tweet Entities

In [None]:
status_texts = [ status['text'] 
                 for status in statuses ]

print(json.dumps(status_texts[0:5], indent=1))

In [None]:
screen_names = [ user_mention['screen_name'] 
                 for status in statuses
                     for user_mention in status['entities']['user_mentions'] ]

print(json.dumps(screen_names[0:5], indent=1))

In [None]:
hashtags = [ hashtag['text'] 
             for status in statuses
                 for hashtag in status['entities']['hashtags'] ]

print(json.dumps(hashtags[0:15], indent=1))
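The three comprehensions above all walk the same `entities` structure shown in the sample tweet earlier. The minimal sketch below collects texts, mentions and hashtags in a single pass. Lowercasing hashtags is our own assumption: without it, tags that differ only in case (`#PopeinNYC` vs `#popeinnyc`) are counted separately. The sample statuses are hypothetical and trimmed to the fields used.

```python
def extract_entities(statuses, lowercase_hashtags=True):
    """Return (texts, screen_names, hashtags) gathered from a list of statuses."""
    texts, screen_names, hashtags = [], [], []
    for status in statuses:
        texts.append(status['text'])
        entities = status.get('entities', {})
        screen_names.extend(m['screen_name'] for m in entities.get('user_mentions', []))
        for h in entities.get('hashtags', []):
            hashtags.append(h['text'].lower() if lowercase_hashtags else h['text'])
    return texts, screen_names, hashtags

# Hypothetical statuses trimmed to the fields used here
sample = [
    {'text': 'RT @ApoliticComedy: ... #PopeinNYC',
     'entities': {'user_mentions': [{'screen_name': 'ApoliticComedy'}],
                  'hashtags': [{'text': 'PopeinNYC'}]}},
    {'text': 'Crowds everywhere #popeinnyc',
     'entities': {'user_mentions': [], 'hashtags': [{'text': 'popeinnyc'}]}},
]

texts, names, tags = extract_entities(sample)
print(names)
print(tags)
```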

In [None]:
# Compute a collection of all words from all tweets
words = [ w 
          for t in status_texts 
              for w in t.split() ]

print(json.dumps(words[0:5], indent=1))

In [None]:
status_texts[0].split()

In [None]:
len(words)
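`str.split()` leaves punctuation and case attached, so `Pope`, `pope` and `Pope,` count as different words. Retweet markers such as `RT` and `@mentions` also end up in the word list. The minimal sketch below is a slightly stricter tokenizer. The regular expression and the choice to drop `rt`, mentions and URLs are assumptions for illustration, not part of the original analysis.

```python
import re

# After lowercasing: hashtags, or runs of letters, digits and apostrophes
TOKEN_RE = re.compile(r"#\w+|[a-z0-9']+")

def tokenize(text):
    """Lowercase a tweet, drop @mentions and URLs, and return word tokens."""
    text = re.sub(r'@\w+|https?://\S+', ' ', text.lower())
    return [t for t in TOKEN_RE.findall(text) if t != 'rt']

print(tokenize("RT @ApoliticComedy: Pope Francis should drive his popemobile right into MSG."))
```

In the notebook, `[w for t in status_texts for w in tokenize(t)]` would replace the `t.split()` comprehension.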

## Analyzing Tweets and Tweet Entities with Frequency Analysis

From an empirical standpoint, counting observable things is the starting point for just about everything, and thus the starting point for any kind of statistical filtering or manipulation that strives to find what may be a faint signal in noisy data. Whereas we just extracted the first few items of each unranked list to get a feel for the data, let's now take a closer look at what's in the data by computing a frequency distribution and looking at the top 10 items in each list.

The result of the frequency distribution is a map of key/value pairs corresponding to terms and their frequencies, so let's make reviewing the results a little easier on the eyes by emitting a tabular format. You can install a package called `prettytable` by typing `pip install prettytable` in a terminal; this package provides a convenient way to emit a fixed-width tabular format that can be easily copied and pasted.

In [None]:
from collections import Counter

for item in [words, screen_names, hashtags]:
    c = Counter(item)
    print(c.most_common(10)) # top 10
    print()

In [None]:
from prettytable import PrettyTable

for label, data in (('Word', words), 
                    ('Screen Name', screen_names), 
                    ('Hashtag', hashtags)):
    pt = PrettyTable(field_names=[label, 'Count']) 
    c = Counter(data)
    for kv in c.most_common(10): # top 10 (term, count) pairs
        pt.add_row(kv)
    pt.align[label], pt.align['Count'] = 'l', 'r' # Set column alignment
    print(pt)
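If `prettytable` isn't available, a comparable fixed-width layout takes only a few lines of standard-library code. The minimal sketch below uses a hypothetical `format_counts` helper. It left-aligns the label column and right-aligns the counts, mirroring the alignment set above.

```python
from collections import Counter

def format_counts(label, data, n=10):
    """Render the n most common items of data as a fixed-width text table."""
    rows = Counter(data).most_common(n)
    # Column widths fit the header and the widest value in each column
    key_width = max([len(label)] + [len(str(k)) for k, _ in rows])
    count_width = max([len('Count')] + [len(str(v)) for _, v in rows])
    lines = ['%-*s  %*s' % (key_width, label, count_width, 'Count'),
             '-' * (key_width + 2 + count_width)]
    lines += ['%-*s  %*d' % (key_width, k, count_width, v) for k, v in rows]
    return '\n'.join(lines)

print(format_counts('Hashtag', ['PopeinNYC', 'PopeinNYC', 'NYC']))
```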

Code and examples are based on Mining the Social Web, 2nd Edition - https://rawgit.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/master/ipynb/html/Chapter%201%20-%20Mining%20Twitter.html

Neat Link - https://github.com/lennerd/TwitterGraph