<DIV ALIGN=CENTER>

# Introduction to Social Media: Twitter
## Professor Robert J. Brunner
  
</DIV>  
-----
-----


## Introduction


When looking for data to use for text data processing, one of the more
popular data sources is [Twitter][tw]. In this Notebook, we introduce
the Twitter API, and demonstrate how to use the Twitter API from within
a Python program to acquire and process tweets, or Twitter messages.


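In this IPython Notebook, we explore the following topics:

1. Authenticating to Twitter by using OAuth.
2. Creating a Twitter client.
3. Processing Twitter information, including the raw Tweet JSON.
4. Twitter rate limits.
5. Sentiment analysis of tweets.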

-----
[tw]: https://www.twitter.com

## Python and Twitter

To work with the Twitter API from within a Python program, we need a
Python library that wraps the official [Twitter API][twapi]. A number
of Python libraries provide this capability; we will use the
[tweepy][tpy] library, which is popular and provides a fairly complete
interface.

The full Twitter API is large and robust (and continues to evolve). For
this course we will restrict our attention to several basic concepts,
namely authenticating to Twitter, searching for Tweets, and digesting
the messages.

----
[twapi]: https://dev.twitter.com
[tpy]: http://www.tweepy.org

In [1]:
import tweepy as tw

-----

## Reading Twitter Data

To read Twitter data, you first need to be a registered Twitter user,
and you need to create a new _Twitter Application_ in order to obtain
credentials for connecting to Twitter and querying the Twitter data.
You create (and later manage) Twitter applications by visiting the
[Twitter Application Management](https://apps.twitter.com) website.

![Twitter App Sign-in](images/twitter-app-signin.png)

At this point you need to authenticate with Twitter. If you are already
logged in to Twitter on your computer (for instance by using the Twitter
website), you should already be authenticated. If you are not
authenticated, click the _sign in_ link to be directed to the Twitter
sign-in page where you can enter your credentials (if you do not have
Twitter credentials, you will need to obtain a Twitter account to
proceed).

![Twitter Sign-in](images/twitter-signin.png)

After you have been authenticated, you will be redirected to the Twitter
apps page. If you have never created a Twitter application, you will
have nothing listed. To create a new application, press the _Create New
App_ button, as shown in the following screenshot.

![Twitter Create App](images/twitter-create.png)

This will open up the Twitter _Create an application_ webpage, where you
need to supply some basic information for your Twitter application such
as an application name, description, and website.

![Twitter Application details](images/twitter-appdetails.png)

Scroll to the bottom of this webpage where the **Developer Agreement**
is located. Following this agreement is a check box that you should
click to signify you agree to be bound by the agreement (of course, you
should read it first to be sure you do _agree_ with it). Next, press
the _Create your Twitter application_ button as shown in the following
screenshot.

![Twitter Agree](images/twitter-agree.png)

This will create your new application, and provide you with your
application webpage, which will be similar to the following screenshot.

![Twitter Apppage](images/twitter-apppage.png)

While you can control a number of application features from this
webpage, the most important tasks to complete include:

1. Change your application to _read-only_ in case it is set to
read-write.

2. Obtain the application **Consumer Key** and **Consumer Secret**.

3. Obtain your personal **Access Token** and **Access Token Secret**.

You should change your application to read-only to ensure you don't
accidentally send data out to Twitter. You change this by selecting the
_Permissions_ tab and selecting _Read only_, as shown in the following
screenshot. To save this setting, scroll down this webpage and click the
_Update Settings_ button at the bottom of the page.

![Twitter Read Only Setting](images/twitter-ro.png)

The consumer and access credentials can be found by selecting the
_Keys and Access Tokens_ tab and scrolling down as needed, as shown in
the following two screenshots.

![Twitter Consumer Application Credentials](images/twitter-consume.png)

![Twitter User Credentials](images/twitter-access.png)

<font color='red'>Warning: Never share these credentials with others or
they will be able to fully impersonate you on Twitter!</font>

You can directly copy these credentials into your Notebook, or,
alternatively, save them into a file (for example, by opening a terminal
window and using `vim` to create a text file). In the rest of this
Notebook, I demonstrate this functionality by using my credentials,
which I have saved into a file called `twitter.cred`. An empty version
of this file is in your GitHub repository; in my copy, I have saved the
following four credentials in order:

1. Access Token
2. Access Token Secret
3. Consumer Key
4. Consumer Secret
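
The file layout listed above can be read with a small helper. The following is a minimal sketch rather than the Notebook's own code: the function name `read_credentials`, the dictionary keys, and the placeholder values are all assumptions made for illustration, and the demonstration writes a temporary file instead of touching a real `twitter.cred`.

```python
import os
import tempfile

# Hypothetical key names, listed in the same order as the file layout above.
NAMES = ["access_token", "access_token_secret",
         "consumer_key", "consumer_secret"]

def read_credentials(path):
    """Return the four credentials as a dict, skipping comments and blanks."""
    with open(path, "r") as fin:
        values = [line.strip() for line in fin
                  if line.strip() and not line.lstrip().startswith("#")]
    if len(values) != len(NAMES):
        raise ValueError("Expected {0} credentials, found {1}".format(
            len(NAMES), len(values)))
    return dict(zip(NAMES, values))

# Demonstrate with placeholder values written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "twitter.cred")
    with open(demo_path, "w") as fout:
        fout.write("# placeholder values only\n"
                   "TOKEN\nTOKEN_SECRET\n\nKEY\nKEY_SECRET\n")
    creds = read_credentials(demo_path)

print(sorted(creds))
```

Returning a dictionary means later code can refer to each credential by name rather than by its position in the file, and the length check reports a malformed file early.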

You can tell `git` to ignore changes to the `twitter.cred` file by
using the following command:

```bash
git update-index --assume-unchanged Week8/notebooks/twitter.cred 
```

The following code cell demonstrates how these credentials are read from
the file and used to properly authenticate our application with Twitter.

-----




In [2]:
tokens = []

# Order: Access Token, Access Token Secret, Consumer Key, Consumer Secret

with open("twitter.cred", 'r') as fin:
    for line in fin:
        line = line.strip()
        if line and not line.startswith('#'):  # Skip blank and comment lines
            tokens.append(line)

auth = tw.OAuthHandler(tokens[2], tokens[3])
auth.set_access_token(tokens[0], tokens[1])

api = tw.API(auth)

user = api.me()

print("Twitter Screen Name: ", user.screen_name)
print("Twitter Follower Count: ", user.followers_count)

Twitter Screen Name:  ProfBrunner
Twitter Follower Count:  148


-----

If the previous code cell runs without an error, you have successfully
connected to Twitter. If you are new to Twitter and are not following
anyone, you can instead display the user information for a different
Twitter user. For example, the following code would display my Twitter
information.

```python
user = api.get_user('ProfBrunner')
```

Replacing `ProfBrunner` with any valid Twitter screen name will display
that user's information. You can find examples by looking at those
Twitter users you (or `ProfBrunner`) follow. Alternatively, you could
choose a specific Twitter account; for example, to analyze the
_NY Times_ Twitter account you would use the following statement:

```python
user = api.get_user('NYTimes')
```

This is demonstrated in the following code cell.

In [3]:
user = api.get_user('nytimes')

print("Twitter Screen Name: ", user.screen_name)
print("Twitter Follower Count: ", user.followers_count)

print("\nThis user follows:\n--------------")
for friend in user.friends():
    print(friend.screen_name)

Twitter Screen Name:  nytimes
Twitter Follower Count:  25080526

This user follows:
--------------
emrosenberg
azamsahmed
BenWeiserNYT
Wesley_Morris
Yamiche
ChanellePrice
TwitterMoments
poniewozik
thunderwooddd
migold
timrace
NYT_IR
SciFleur
geminiimatt
mbeditor
gregfwinter
meredith_levien
mmcintire
mccarthyryanj
MikaylaBouchard


At any point, you can return to your Twitter application management
webpage to view your new application. You can now view and manage your
existing application, or create a new application as shown in the
following screenshot.

![Twitter new app management](images/twitter-manage.png)


-----

-----

## Student Activity

To run the Twitter application in the preceding cells, you will need to register your own Twitter Application. To do so, complete the following steps.

1. Create a New Twitter application.

2. Save your Twitter credentials and Application credentials into the
provided `twitter.cred` file.

3. Run the _tweepy_ sample code to connect to Twitter and display your
Twitter user information.

Finally, try running the preceding code, but for someone famous (if you do not know the twitter handle for someone famous, google will be your friend). 

-----

### Obtaining Tweets

Once you have authenticated with Twitter, you can begin to [search the
Twitter stream][stw] for tweets of interest. The easiest way to get
started is to begin with your own (or another specific Twitter user's)
Twitter feed. To access your own Twitter feed, you can use your
`home_timeline` to retrieve your own Tweets and Tweets from those whom
you follow. This is demonstrated in the following code cell, where we
display the `text` values from the ten most recent Tweets from our
timeline.

-----
[stw]: https://dev.twitter.com/rest/public/search

In [4]:
for tweet in tw.Cursor(api.home_timeline).items(10):
    # Process a single status
    print(tweet.text) 

When did Data Science become cool? I want to know whether I can say I was doing it before that.
RT @FiveThirtyEight: Catch up on everything you missed from #SuperTuesday: https://t.co/TITZAoovrB https://t.co/EXeiUAuBUF
'Choice of an appropriate family of distributions may be the most challenging phase of analysis.' -- David Cox
"I have a few 'special' friends on that floor." --What I hear at lunch...
Encyclopedia of graph classes https://t.co/YIP8YjwBU1
I suppose I'm on team make-Trump-fight-for-it-till-California-but-not-necessarily-till-Cleveland-see-what-voters-do. https://t.co/Rk2JYlINNi
Chebyshev's inequality: P( |X - mu| &gt; k sigma) &lt; 1/k^2 https://t.co/hX14dknEjQ
What's different between Clinton and Trump? Clinton has 60% of the Dem vote so far. Trump has 34% of the GOP vote. https://t.co/Rk2JYlINNi
RT @FiveThirtyEight: Can Republicans still take the nomination away from Trump? https://t.co/sc7qbo1NXN https://t.co/fvlyYrlJhx
I'm particularly interested in seeing @ApacheKudu
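
Notice that some of the text above contains HTML entities such as `&gt;` and `&lt;`. As a minimal sketch, assuming you want plain characters before any further text processing, the standard library `html.unescape` function converts them back. The sample string is copied from the output above; the variable names are illustrative.

```python
import html

# Text copied from one of the tweets displayed above.
raw_text = "Chebyshev's inequality: P( |X - mu| &gt; k sigma) &lt; 1/k^2"

# Convert HTML entities (&gt;, &lt;, &amp;, ...) back to plain characters.
clean_text = html.unescape(raw_text)

print(clean_text)
```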

-----

### Searching

Twitter also provides the capability to search for specific tweets by
using the Tweepy [`search` method][twse]. In this method, you supply a
query string (and optional arguments) and are returned a list of Tweets.
The query string should follow the [Twitter Search API][tsa]. In brief,
you can search for specific text by using the text of interest, for a
person by using the `@` character followed by their Twitter username,
and for a hashtag by using the `#` character followed by the tag text.
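
The query conventions described above can be sketched as a small helper that assembles a query string. This is an illustration only: `build_query` and its parameters are hypothetical names, not part of Tweepy or the Twitter API, and the helper simply joins the pieces with spaces.

```python
def build_query(keywords=(), hashtags=(), users=(), positive=False):
    """Assemble a search string from keywords, hashtags, and user names."""
    parts = list(keywords)
    # Hashtags are prefixed with '#', user names with '@'.
    parts += ["#" + tag.lstrip("#") for tag in hashtags]
    parts += ["@" + name.lstrip("@") for name in users]
    # A trailing ':)' asks for tweets with a positive attitude.
    if positive:
        parts.append(":)")
    return " ".join(parts)

print(build_query(keywords=["data science"], positive=True))
print(build_query(hashtags=["python"]))
print(build_query(users=["nytimes"]))
```

The resulting string is what you would pass as the `q` argument of the search call.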

-----

[twse]: http://docs.tweepy.org/en/stable/api.html#API.search
[tsa]: https://dev.twitter.com/rest/public/search

In [5]:
# Hashtag search: term = '#python'
# User search: term = '@nytimes'
# Keyword search: term = 'data science'
# Keyword and Sentiment: term ='data science :)' # Positive attitude

term ='data science :)'
num_tweets = 5

for tweet in tw.Cursor(api.search, q=term).items(num_tweets):
    # Process a single status
    print("Tweet ID:", tweet.id)
    print('Tweeted by ', tweet.user.screen_name)
    print("Created at ",tweet.created_at)
    # The source attribute holds the client used to post the tweet
    print("Source: ",tweet.source)
    print('Tweet Text: ', tweet.text)
    print('-------------------------')

Tweet ID: 705108162555322370
Tweeted by  iamidakwo
Created at  2016-03-02 19:11:00
Source:  Twitter for Android
Tweet Text:  RT @Springboard: Learn Python for Data Science: here's a free tutorial :)
https://t.co/pOJFFVmVOq
#datascience #data #startup #tech https:/…
-------------------------
Tweet ID: 705107930203475969
Tweeted by  alexip
Created at  2016-03-02 19:10:05
Source:  Twitter Web Client
Tweet Text:  That's one of a catchy title! :)
"Open Data Science Is the David Beckham of Your Data Team" by @MCanalytics https://t.co/qnW4wowOc9
-------------------------
Tweet ID: 705099921385353216
Tweeted by  Springboard
Created at  2016-03-02 18:38:16
Source:  Twitter Web Client
Tweet Text:  Learn Python for Data Science: here's a free tutorial :)
https://t.co/pOJFFVmVOq
#datascience #data #startup #tech https://t.co/qUs50Dga9M
-------------------------
Tweet ID: 705099166121242624
Tweeted by  Springboard
Created at  2016-03-02 18:35:15
Source:  Twitter Web Client
Tweet Text:  @Joe

-----

We can view the available attributes by using Python's built-in `dir`
function to perform introspection. In the following code cells we
explicitly remove names that contain a double underscore (the special
_class_ methods) to shorten the display list and focus on the items of
interest. After this, we display the Tweet in its raw JSON format by
accessing the `_json` attribute.

-----

In [6]:
import pprint

pp = pprint.PrettyPrinter(indent=2, depth=2, width=80, compact=True)

tweets = api.search(q='ProfBrunner', rpp=1)

pp.pprint([att for att in dir(tweets) if '__' not in att])

[ '_max_id', '_since_id', 'append', 'clear', 'completed_in', 'copy', 'count',
  'extend', 'ids', 'index', 'insert', 'max_id', 'next_results', 'parse', 'pop',
  'query', 'refresh_url', 'remove', 'reverse', 'since_id', 'sort']


In [7]:
# Pick a single tweet to analyze

tweet = tweets[1]
pp.pprint([att for att in dir(tweet) if '__' not in att])

[ '_api', '_json', 'author', 'contributors', 'coordinates', 'created_at',
  'destroy', 'entities', 'favorite', 'favorite_count', 'favorited', 'geo', 'id',
  'id_str', 'in_reply_to_screen_name', 'in_reply_to_status_id',
  'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str',
  'is_quote_status', 'lang', 'metadata', 'parse', 'parse_list', 'place',
  'retweet', 'retweet_count', 'retweeted', 'retweets', 'source', 'source_url',
  'text', 'truncated', 'user']


In [8]:
# We can display the message data in JSON format

pp.pprint(tweet._json)

{ 'contributors': None,
  'coordinates': None,
  'created_at': 'Sun Feb 28 22:54:23 +0000 2016',
  'entities': {'hashtags': [], 'symbols': [], 'urls': [], 'user_mentions': []},
  'favorite_count': 6,
  'favorited': False,
  'geo': None,
  'id': 704077214724333569,
  'id_str': '704077214724333569',
  'in_reply_to_screen_name': None,
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str': None,
  'is_quote_status': False,
  'lang': 'en',
  'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
  'place': None,
  'retweet_count': 2,
  'retweeted': False,
  'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web '
            'Client</a>',
  'text': 'I received an interesting email today, it started off:\n'
          '\n'
          '"Dear Astrology Advisors,"\n'
          '\n'
          'Never saw it coming.',
  'truncated': False,
  'user': { 'contributors_enabled': False,
            'created_
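
The raw JSON values shown above often need light cleanup before analysis. The following is a minimal sketch, assuming the `created_at` and `source` fields keep the formats shown in this output; the dictionary is a hand-copied subset of that output, and the variable names are illustrative.

```python
import re
from datetime import datetime

# A subset of the fields from the JSON output shown above.
record = {
    'created_at': 'Sun Feb 28 22:54:23 +0000 2016',
    'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
    'retweet_count': 2,
    'favorite_count': 6,
}

# Parse the timestamp string into a timezone-aware datetime object.
created = datetime.strptime(record['created_at'], '%a %b %d %H:%M:%S %z %Y')

# Strip the HTML anchor tags to leave only the client name.
client = re.sub(r'<[^>]+>', '', record['source'])

print(created.isoformat())
print(client)
```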

----

### Trending

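Twitter also reports trending topics, both worldwide and for specific
locations. Each location is identified by a Yahoo! _Where On Earth ID_
(WOEID). The following references provide more detail:

https://developer.yahoo.com/geo/geoplanet/guide/concepts.html

https://dev.twitter.com/rest/reference/get/trends/available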

----

In [9]:
# Returns a JSON object that contains (a large number of) locations
# for which Twitter has trending topic information.

top_display = 20
trending = api.trends_available()

# We skip the first value, which is the entry for the World in JSON.
for trend in trending[1:top_display]:
    print('WOEID Code ({2:d}): {0}, {1}'.format(
        trend['name'], trend['country'], trend['woeid']))

WOEID Code (2972): Winnipeg, Canada
WOEID Code (3369): Ottawa, Canada
WOEID Code (3444): Quebec, Canada
WOEID Code (3534): Montreal, Canada
WOEID Code (4118): Toronto, Canada
WOEID Code (8676): Edmonton, Canada
WOEID Code (8775): Calgary, Canada
WOEID Code (9807): Vancouver, Canada
WOEID Code (12723): Birmingham, United Kingdom
WOEID Code (12903): Blackpool, United Kingdom
WOEID Code (13383): Bournemouth, United Kingdom
WOEID Code (13911): Brighton, United Kingdom
WOEID Code (13963): Bristol, United Kingdom
WOEID Code (15127): Cardiff, United Kingdom
WOEID Code (17044): Coventry, United Kingdom
WOEID Code (18114): Derby, United Kingdom
WOEID Code (19344): Edinburgh, United Kingdom
WOEID Code (21125): Glasgow, United Kingdom
WOEID Code (25211): Hull, United Kingdom


In [10]:
pp.pprint(trending[10])

{ 'country': 'United Kingdom',
  'countryCode': 'GB',
  'name': 'Blackpool',
  'parentid': 23424975,
  'placeType': {'code': 7, 'name': 'Town'},
  'url': 'http://where.yahooapis.com/v1/place/12903',
  'woeid': 12903}
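
Because each location is a dictionary, looking up a WOEID by place name is a simple search over the list. The following sketch uses a few records copied from the output above in place of a live API call; `find_woeid` is a hypothetical helper, not a Tweepy function.

```python
# A few records in the shape shown above; values copied from the output.
locations = [
    {'name': 'Toronto', 'country': 'Canada', 'woeid': 4118},
    {'name': 'Blackpool', 'country': 'United Kingdom', 'woeid': 12903},
    {'name': 'Glasgow', 'country': 'United Kingdom', 'woeid': 21125},
]

def find_woeid(places, name):
    """Return the WOEID of the first place whose name matches, else None."""
    for place in places:
        if place['name'].lower() == name.lower():
            return place['woeid']
    return None

# Select every place name that belongs to a given country.
uk_places = [p['name'] for p in locations if p['country'] == 'United Kingdom']

print(find_woeid(locations, 'Blackpool'))
print(uk_places)
```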


In [11]:
# We can use a WOEID to find location specific trends.
# Here we use the WOEID for the UK (from previous example)

top_display = 10

print("UK Trends")
print(10*'-')

for trends in api.trends_place(id = 23424975):
    for trend in trends["trends"][:top_display]:
        print("  {0:s}".format(trend["name"]))

UK Trends
----------
  Adam Johnson
  #TwoWordTrump
  #GIDC
  #AskMarcusAnything
  Ann Coulter
  Finding Dory
  #WednesdayWisdom
  #AskLauraPrepon
  Gouffran
  Nancy Banks-Smith


-----
## Twitter text analysis

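We use the tweet corpus that is included with NLTK, which contains
tweets labeled as positive or negative, to train a simple sentiment
classifier. We then apply this classifier to unlabeled tweets. For more
information, see the NLTK documentation:

http://www.nltk.org/howto/twitter.html#Using-a-Tweet-Corpus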

-----

In [12]:
import numpy as np
import nltk

tws = nltk.corpus.twitter_samples

pos_tweets = np.array(tws.strings('positive_tweets.json'))
neg_tweets = np.array(tws.strings('negative_tweets.json'))

pos_labels = np.ones(pos_tweets.shape[0])
neg_labels = np.zeros(neg_tweets.shape[0])

targets = np.concatenate((pos_labels, neg_labels), axis=0)
data = np.concatenate((pos_tweets, neg_tweets), axis = 0)

print('{0} Positive Tweets'.format(pos_tweets.shape[0]))
print('{0} Negative Tweets'.format(neg_tweets.shape[0]))

5000 Positive Tweets
5000 Negative Tweets


In [13]:
# In recent scikit-learn releases, train_test_split is in model_selection
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    data, targets, test_size=0.25, random_state=23)
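
To make the idea of a holdout split concrete, the following is a minimal standard-library sketch. It is not scikit-learn's implementation and will not reproduce the same partition; `holdout_split` and the toy data are hypothetical names and values chosen for illustration.

```python
import random

def holdout_split(items, labels, test_size=0.25, seed=23):
    """Shuffle indices reproducibly, then hold out a fraction for testing."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(items) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(items, train_idx), pick(items, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))

# Toy data: eight placeholder messages with binary labels.
texts = ['tweet {0}'.format(i) for i in range(8)]
marks = [1, 1, 1, 1, 0, 0, 0, 0]

train_texts, test_texts, train_marks, test_marks = holdout_split(texts, marks)
print(len(train_texts), len(test_texts))
```

Fixing the seed makes the split repeatable, which is the same role `random_state` plays in the cell above.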

In [14]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn import metrics

tools = [('cv', CountVectorizer()), ('nb', MultinomialNB())]
pclf = Pipeline(tools)


# Remove English stop words, use unigrams and bigrams, and lowercase text
pclf.set_params(cv__stop_words = 'english',
                cv__ngram_range=(1,2),
                cv__lowercase=True)

pclf.fit(x_train, y_train)
y_pred = pclf.predict(x_test)

# Labels are reported in sorted order: 0 (negative), then 1 (positive)
print(metrics.classification_report(y_test, y_pred, target_names = ['Negative', 'Positive']))

             precision    recall  f1-score   support

   Negative       0.73      0.79      0.76      1240
   Positive       0.78      0.71      0.74      1260

avg / total       0.75      0.75      0.75      2500
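
The columns of the report above can be computed by hand from true and predicted labels. The following is a minimal pure-Python sketch on made-up toy labels (not the Notebook's data); `class_scores` is a hypothetical helper, not a scikit-learn function.

```python
def class_scores(y_true, y_pred, label):
    """Return (precision, recall, f1) for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)

    # Precision: fraction of predictions for this class that were right.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of true members of this class that were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy labels: 1 is positive, 0 is negative.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 0, 1, 0, 1, 0, 1, 0]

for label in (0, 1):
    print(label, class_scores(toy_true, toy_pred, label))
```

The support column in the report is simply the number of true instances of each class in the test set.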



In [15]:
unknown_tweets = np.array(tws.strings('tweets.20150430-223406.json'))
unknown_pred = pclf.predict(unknown_tweets)

unknown_pos = unknown_tweets[unknown_pred == 1]
unknown_neg = unknown_tweets[unknown_pred == 0]

In [16]:
tweet_idx = 101

print('{0} tweets to classify.'.format(unknown_tweets.shape[0]))
print('{0} tweets classified as positive.'.format(unknown_pos.shape[0]))
print('{0} tweets classified as negative.'.format(unknown_neg.shape[0]))

print(75*'-')
print('Sample Positive Tweet:')
print(75*'-')
print(unknown_pos[tweet_idx])

print(75*'-')
print('Sample Negative Tweet:')
print(75*'-')
print(unknown_neg[tweet_idx])

20000 tweets to classify.
8508 tweets classified as positive.
11492 tweets classified as negative.
---------------------------------------------------------------------------
Sample Positive Tweet:
---------------------------------------------------------------------------
"David Cameron: smooth, smiley but unconvincing" #bbcqt http://t.co/mJ2ZkX1TjB
---------------------------------------------------------------------------
Sample Negative Tweet:
---------------------------------------------------------------------------
RT @DouglasDaniel: Miliband's new line 'if you don't vote Labour in Scotland I will punish you by letting the Tories in'.


-----

### Apply classifier on new tweets

We now apply the trained classifier to recent tweets from a single,
arbitrarily chosen user, in this case the CNN Politics Twitter feed.

-----

In [17]:
newtweets = api.user_timeline(screen_name='@CNNPolitics', include_rts=False, count=100)
                           
print('We obtained a sample of {0} tweets'.format(len(newtweets)))

We obtained a sample of 90 tweets


In [18]:
messages = []

for tweet in newtweets:
    messages.append(tweet.text)
    
new_tweets = np.array(messages)
new_pred = pclf.predict(new_tweets)

new_pos = new_tweets[new_pred == 1]
new_neg = new_tweets[new_pred == 0]

In [19]:
tweet_idx = 13

print('{0} tweets to classify.'.format(new_tweets.shape[0]))
print('{0} tweets classified as positive.'.format(new_pos.shape[0]))
print('{0} tweets classified as negative.'.format(new_neg.shape[0]))

print(75*'-')
print('Sample Positive Tweet:')
print(75*'-')
print(new_pos[tweet_idx])

print(75*'-')
print('Sample Negative Tweet:')
print(75*'-')
print(new_neg[tweet_idx])

90 tweets to classify.
63 tweets classified as positive.
27 tweets classified as negative.
---------------------------------------------------------------------------
Sample Positive Tweet:
---------------------------------------------------------------------------
NY Daily News again takes aim at Donald Trump, offers advice to those looking to flee U.S. https://t.co/R3xOE3b32Q https://t.co/9vrDGyUjK1
---------------------------------------------------------------------------
Sample Negative Tweet:
---------------------------------------------------------------------------
#Breaking: CNN projects @HillaryClinton will win the Massachusetts Democratic primary https://t.co/DkPyle0Wrv https://t.co/uV2pSUUQLN


-----

## Student Activity

In the preceding cells, we trained a classifier on labeled tweets and
applied it to new tweets. Now that you have run the Notebook, try making
the following changes.

1. Replace the classifier used in the pipeline with a different
classifier.

2. Apply the classifier to a different Twitter user's timeline, or to
the results of a search for new tweets (for example, election results).

-----