# Twitter HOWTO

## Overview

This document is an overview of how to use NLTK to collect and process Twitter data. It was adapted from the NLTK GitHub repository (http://www.nltk.org/howto/twitter.html).

## <a name="first_steps">First Steps</a>

To collect data from Twitter, you first need to register a new *application*, which is Twitter's term for any computer program that interacts with the Twitter API. Alternatively, if you just want to play around with the Twitter data that is distributed as part of NLTK, head over to the section on using the [`twitter_samples` corpus reader](#corpus_reader).

### <a name="twython">Install Twython</a>

The NLTK Twitter package relies on a third-party library called [Twython](https://twython.readthedocs.org/). Install Twython via [pip](https://pip.pypa.io):

```bash
$ pip install twython
```

or with [easy_install](https://pythonhosted.org/setuptools/easy_install.html):

```bash
$ easy_install twython
```

In [1]:
!pip install twython



## <a name="simple">Using the simple `Twitter` class</a>

### Dipping into the Public Stream

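The `Twitter` class is intended as a simple means of interacting with the Twitter data stream. The cells below use the `Query` client directly instead: `credsfromfile` loads the OAuth credentials from a text file, and `Query.search_tweets` returns an iterator over the Tweets that match the given keywords.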

In [2]:
from nltk.twitter import credsfromfile, Query

oauth = credsfromfile(creds_file="twitterCredentials.txt", subdir="D:\\Dropbox\\UMN\\Teaching\\Predictive Analytics\\Fall 2016\\Code", verbose=True)
client = Query(**oauth)

Reading credentials file D:\Dropbox\UMN\Teaching\Predictive Analytics\Fall 2016\Code\twitterCredentials.txt
Credentials file "twitterCredentials.txt" looks good
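
The dictionary returned by `credsfromfile` is unpacked into `Query(**oauth)`. The NLTK documentation describes the credentials file as four `key=value` lines. The sketch below only illustrates that layout; `parse_creds` and the placeholder values are hypothetical and are not part of NLTK.

```python
# Illustrative stand-in for reading a key=value credentials file.
# parse_creds is a hypothetical helper, not an NLTK function.
SAMPLE_CREDS = """\
app_key=YOUR_CONSUMER_KEY
app_secret=YOUR_CONSUMER_SECRET
oauth_token=YOUR_ACCESS_TOKEN
oauth_token_secret=YOUR_ACCESS_TOKEN_SECRET
"""

REQUIRED_KEYS = {"app_key", "app_secret", "oauth_token", "oauth_token_secret"}


def parse_creds(text):
    """Turn key=value lines into a dict and check that all four keys exist."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        creds[key.strip()] = value.strip()
    missing = REQUIRED_KEYS - set(creds)
    if missing:
        raise ValueError("missing credential keys: " + ", ".join(sorted(missing)))
    return creds


example_oauth = parse_creds(SAMPLE_CREDS)
print(sorted(example_oauth))
```

Keep the real file out of version control, since the four values together grant access to your Twitter application.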


In [3]:
tweets = client.search_tweets(keywords='hilary clinton, donald trump', limit=10)
tweet = next(tweets)

# Each Tweet is converted into a Python dictionary built from Twitter's JSON object.
# Twitter's own documentation provides a useful overview of all the fields (https://dev.twitter.com/overview/api/tweets).
# pprint with depth=1 prints only the top-level keys; nested objects are shown as {...}.
# To show a single selected field, such as the value of the 'text' key, use tweet['text'].
from pprint import pprint
pprint(tweet, depth=1)

{u'contributors': None,
 u'coordinates': None,
 u'created_at': u'Mon Oct 31 21:11:44 +0000 2016',
 u'entities': {...},
 u'favorite_count': 0,
 u'favorited': False,
 u'geo': None,
 u'id': 793198795932704768L,
 u'id_str': u'793198795932704768',
 u'in_reply_to_screen_name': None,
 u'in_reply_to_status_id': None,
 u'in_reply_to_status_id_str': None,
 u'in_reply_to_user_id': None,
 u'in_reply_to_user_id_str': None,
 u'is_quote_status': False,
 u'lang': u'en',
 u'metadata': {...},
 u'place': None,
 u'retweet_count': 1,
 u'retweeted': False,
 u'retweeted_status': {...},
 u'source': u'<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
 u'text': u"RT @itgoeszaysway: Don't like Hilary Clinton but I hate Donald Trump... So yea..\U0001f643",
 u'truncated': False,
 u'user': {...}}
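
Because a Tweet is an ordinary dictionary, a few fields can be pulled into a smaller record. The sketch below is written for Python 3 and uses a hand-built dictionary with some of the top-level keys from the output above, so it runs without a Twitter connection. `summarize` is a hypothetical helper, and the nested `retweeted_status` object is left empty because only its presence is checked.

```python
from datetime import datetime

# Hand-built stand-in; values copied from the top-level fields printed above.
sample_tweet = {
    "created_at": "Mon Oct 31 21:11:44 +0000 2016",
    "id_str": "793198795932704768",
    "lang": "en",
    "retweet_count": 1,
    "text": "RT @itgoeszaysway: Don't like Hilary Clinton but I hate Donald Trump... So yea..\U0001f643",
    "retweeted_status": {},  # contents not shown above; only its presence matters here
}


def summarize(tweet):
    """Return a small record with a parsed timestamp and a retweet flag."""
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "id": tweet["id_str"],
        "lang": tweet.get("lang"),
        "created": created.isoformat(),
        # Retweets carry the original Tweet under the 'retweeted_status' key.
        "is_retweet": "retweeted_status" in tweet,
        "text": tweet["text"],
    }


summary = summarize(sample_tweet)
print(summary["created"], summary["is_retweet"])
```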


In [4]:
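# `tweets` is an iterator, and the first Tweet was already consumed by next() above,
# so this loop prints only the remaining Tweets.
for tweet in tweets: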
    print(tweet['text'])
    print("")

Sooooooo tired of hearing about Donald trump and Hilary Clinton

Don't like Hilary Clinton but I hate Donald Trump... So yea..🙃

RT @penguin_univers: UNBELIEVABLE! HILARY CLINTON AND DONALD TRUMP SPOTTED AT LOCAL HIGH SCHOOL! https://t.co/z3CoYTAa55

RT @V_of_Europe: Donald Trump will win US Presidential election, predicts hugely accurate AI machine https://t.co/Ct6WmAdImn https://t.co/h…

RT @penguin_univers: UNBELIEVABLE! HILARY CLINTON AND DONALD TRUMP SPOTTED AT LOCAL HIGH SCHOOL! https://t.co/z3CoYTAa55

Does anybody have a or know anybody with a Hilary Clinton or Donald Trump face mask I need it NOT for today.

RT @penguin_univers: UNBELIEVABLE! HILARY CLINTON AND DONALD TRUMP SPOTTED AT LOCAL HIGH SCHOOL! https://t.co/z3CoYTAa55

RT @_Ungagged: Ungagged USA! Podcast-"The Trumpet Tower of Clinton Cards" https://t.co/5vFrA8as4Q #POTUS2016 #yes2 #punjab #trump #clinton…

UNBELIEVABLE! HILARY CLINTON AND DONALD TRUMP SPOTTED AT LOCAL HIGH SCHOOL! https://t.co/z3CoYTAa55
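
The results above contain the same message several times, because each retweet is returned as a Tweet of its own. One possible way to thin the list is sketched below, assuming that retweets can be recognised by the `retweeted_status` key and that repeated text should be shown only once. `originals_only` is a hypothetical helper, and the sample dictionaries are made up.

```python
def originals_only(tweets):
    """Yield Tweets that are not retweets and whose text has not been seen yet."""
    seen = set()
    for tweet in tweets:
        if "retweeted_status" in tweet:
            continue  # skip retweets
        if tweet["text"] in seen:
            continue  # skip exact duplicates
        seen.add(tweet["text"])
        yield tweet


# Made-up sample data standing in for the search results.
sample = [
    {"text": "RT @a: hello", "retweeted_status": {}},
    {"text": "hello"},
    {"text": "hello"},
    {"text": "world"},
]
kept = [t["text"] for t in originals_only(sample)]
print(kept)
```

The function accepts any iterable of Tweet dictionaries, so it can wrap the iterator returned by `search_tweets` directly.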



In [5]:
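#!pip install unirest
# Python 2 cell: score the sentiment of a Tweet with the AlchemyAPI text-sentiment endpoint on Mashape.
import urllib
import unirest

# Read the Mashape API key from a local file; `with` closes the file afterwards.
with open('D:\\Dropbox\\UMN\\Teaching\\Predictive Analytics\\Fall 2016\\Code\\mashapeKey.txt', 'r') as keyfile:
    mashapeKey = keyfile.read().strip()

tweets = client.search_tweets(keywords='hilary clinton, donald trump', limit=1)
for tweet in tweets:
    # Drop non-ASCII characters, then percent-encode the text so that '#', '&' and spaces do not break the query string.
    text = urllib.quote(tweet['text'].encode("ascii", "ignore"))
    apiurl = "https://alchemy.p.mashape.com/text/TextGetTextSentiment?outputMode=json&showSourceText=false&text=" + text
    response = unirest.get(apiurl,
      headers={
        "X-Mashape-Key": mashapeKey,
        "Accept": "text/plain"
      }
    )
    print(tweet['text'], response.body["docSentiment"])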

(u"RT @itgoeszaysway: Don't like Hilary Clinton but I hate Donald Trump... So yea..\U0001f643", {u'score': u'-0.851597', u'type': u'negative'})
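
Tweet text often contains characters such as `#`, `&`, and spaces that have a special meaning in a URL, so it needs to be percent-encoded before it goes into a query string. The sketch below shows one way to build the request URL in Python 3 with the standard library (in Python 2, `urlencode` lives in `urllib`). `build_sentiment_url` is a hypothetical helper; the endpoint and parameter names are the ones used in the cell above, and no request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://alchemy.p.mashape.com/text/TextGetTextSentiment"


def build_sentiment_url(text):
    """Return the endpoint URL with all query parameters safely encoded."""
    query = urlencode({
        "outputMode": "json",
        "showSourceText": "false",
        "text": text,
    })
    return BASE_URL + "?" + query


raw_text = "Trump & Clinton #POTUS2016 https://t.co/x"
url = build_sentiment_url(raw_text)
print(url)

# Decoding the query string recovers the original text unchanged.
decoded = parse_qs(urlsplit(url).query)["text"][0]
```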


### <a name="streaming">Using the streaming API</a>

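The Streaming API delivers Tweets in near real time as they are posted, whereas the Search API used above returns Tweets that have already been published. In the cell below, a `TweetViewer` handler prints incoming Tweets and stops once `limit` is reached, and `filter(track=...)` restricts the stream to Tweets that match the given terms. For more detail, see the blog post [The difference between the Twitter Firehose API, the Twitter Search API, and the Twitter Streaming API](http://www.brightplanet.com/2013/06/twitter-firehose-vs-twitter-api-whats-the-difference-and-why-should-you-care/).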

In [6]:
from nltk.twitter import Query, Streamer, Twitter, TweetViewer, TweetWriter, credsfromfile

client = Streamer(**oauth)
client.register(TweetViewer(limit=10))
client.filter(track='clinton, trump')

@SenateGOP A truly strong, well-researched on Trump explains deception, lies&amp;racism going back to the 70s. https://t.co/E4DjPmzoVc
RT @steph93065: All the prominent Republicans that are silent now are complicit in govt corruption and fear being exposed in a Trump admini…
i am sure, after another PRIVATE MEETING with bill clinton, DOJ, lynch will make the final decision and this time,… https://t.co/X4i1a79USG
@deusvulteuropa So you think Bernie and Clinton would have had significantly different stances on issues w/advanced ?'s ?
RT @paulkrugman: There's a real scandal here -- but it's about Comey, not Clinton https://t.co/ZE11kaVIQt
RT @FOX2News: Before #donaldtrump took the podium, Bobby Knight told the crowd there's 'no BS' with Trump. He didn't abbreviate. https://t.…
sósia da hillary clinton senta na jaralha do padre num cinema com o filme da kéfera de fundo
RT @chrislhayes: Trump's true innovation was showing how few voters actually crave the small government/let the market dec

### <a name="sore">Storing tweets</a>

In [7]:
# To store data that Twitter sends via the Streaming API, we register a `TweetWriter` instance.
client = Streamer(**oauth)
client.register(TweetWriter(limit=10))
client.statuses.sample()

Writing to C:\Users\padam\twitter-files\tweets.20161031-161720.json
Written 10 Tweets
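
The stored file can be read back with the standard `json` module. The sketch below assumes the file holds one JSON object per line, which is how NLTK's bundled Tweet files are laid out. `read_tweets` is a hypothetical helper, and the demo writes two made-up Tweets to a temporary file instead of using the path shown above.

```python
import json
import os
import tempfile


def read_tweets(path):
    """Yield one Tweet dictionary per non-empty line of a line-delimited JSON file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo data: two made-up Tweets written in the assumed one-object-per-line layout.
demo_tweets = [{"id_str": "1", "text": "first"}, {"id_str": "2", "text": "second"}]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    for tweet in demo_tweets:
        tmp.write(json.dumps(tweet) + "\n")
    demo_path = tmp.name

loaded = list(read_tweets(demo_path))
os.remove(demo_path)  # clean up the temporary file
print([t["text"] for t in loaded])
```

Reading lazily with a generator keeps memory use low when the stored file holds many Tweets.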


## <a name="corpus_reader">Using a Tweet Corpus from NLTK</a>

NLTK's Twitter corpus, `twitter_samples`, currently contains a sample of 20k Tweets retrieved from the Twitter Streaming API, together with another 10k Tweets that are divided by sentiment into negative and positive.

In [8]:
from nltk.corpus import twitter_samples
twitter_samples.fileids()

[u'negative_tweets.json',
 u'positive_tweets.json',
 u'tweets.20150430-223406.json']

In [9]:
strings = twitter_samples.strings('tweets.20150430-223406.json')
for string in strings[:15]:
    print(string)

RT @KirkKus: Indirect cost of the UK being in the EU is estimated to be costing Britain £170 billion per year! #BetterOffOut #UKIP
VIDEO: Sturgeon on post-election deals http://t.co/BTJwrpbmOY
RT @LabourEoin: The economy was growing 3 times faster on the day David Cameron became Prime Minister than it is today.. #BBCqt http://t.co…
RT @GregLauder: the UKIP east lothian candidate looks about 16 and still has an msn addy http://t.co/7eIU0c5Fm1
RT @thesundaypeople: UKIP's housing spokesman rakes in £800k in housing benefit from migrants.  http://t.co/GVwb9Rcb4w http://t.co/c1AZxcLh…
RT @Nigel_Farage: Make sure you tune in to #AskNigelFarage tonight on BBC 1 at 22:50! #UKIP http://t.co/ogHSc2Rsr2
RT @joannetallis: Ed Milliband is an embarrassment. Would you want him representing the UK?!  #bbcqt vote @Conservatives
RT @abstex: The FT is backing the Tories. On an unrelated note, here's a photo of FT leader writer Jonathan Ford (next to Boris) http://t.c…
RT @NivenJ1: “@George_Osborne: Ed Mi

The default tokenizer for Tweets (`casual.py`) is specialised for 'casual' text, and
the `tokenized()` method returns a list of lists of tokens.

In [10]:
tokenized = twitter_samples.tokenized('tweets.20150430-223406.json')
for toks in tokenized[:5]:
    print(toks)

[u'RT', u'@KirkKus', u':', u'Indirect', u'cost', u'of', u'the', u'UK', u'being', u'in', u'the', u'EU', u'is', u'estimated', u'to', u'be', u'costing', u'Britain', u'\xa3', u'170', u'billion', u'per', u'year', u'!', u'#BetterOffOut', u'#UKIP']
[u'VIDEO', u':', u'Sturgeon', u'on', u'post-election', u'deals', u'http://t.co/BTJwrpbmOY']
[u'RT', u'@LabourEoin', u':', u'The', u'economy', u'was', u'growing', u'3', u'times', u'faster', u'on', u'the', u'day', u'David', u'Cameron', u'became', u'Prime', u'Minister', u'than', u'it', u'is', u'today', u'..', u'#BBCqt', u'http://t.co\u2026']
[u'RT', u'@GregLauder', u':', u'the', u'UKIP', u'east', u'lothian', u'candidate', u'looks', u'about', u'16', u'and', u'still', u'has', u'an', u'msn', u'addy', u'http://t.co/7eIU0c5Fm1']
[u'RT', u'@thesundaypeople', u':', u"UKIP's", u'housing', u'spokesman', u'rakes', u'in', u'\xa3', u'800k', u'in', u'housing', u'benefit', u'from', u'migrants', u'.', u'http://t.co/GVwb9Rcb4w', u'http://t.co/c1AZxcLh\u2026']
