# Twitter Stream with Python

This notebook connects to a (filtered) real-time Twitter stream.  
Incoming tweets are classified by sentiment, either positive or negative.  
We'll use a Naive Bayes analyzer (pre-trained on movie reviews).  
We'll also train a classifier ourselves, based on human-classified tweets.  

### Setup

To use this notebook, you need to get credentials from Twitter.  
How to acquire credentials is described here: https://www.slickremix.com/docs/how-to-get-api-keys-and-tokens-for-twitter/  
Put the credentials in a file called `twitter_credentials.py` in the same folder as this notebook.  
That file needs the following format:  

```
consumer_key = "THE_ACTUAL_CONSUMER_KEY"
consumer_secret = "THE_ACTUAL_CONSUMER_SECRET"
access_token = "THE_ACTUAL_ACCESS_TOKEN"
access_token_secret = "THE_ACTUAL_ACCESS_TOKEN_SECRET"
```
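
A common slip is to leave one of the placeholders in the file. The sketch below shows one way to check for that before authenticating. The helper name `missing_credentials` and the stand-in object `fake` are assumptions for illustration; in the notebook you would pass the imported `twitter_credentials` module instead.

```python
from types import SimpleNamespace

REQUIRED = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")

def missing_credentials(module):
    """Return the names that are absent, empty, or still set to a placeholder."""
    problems = []
    for name in REQUIRED:
        value = getattr(module, name, None)
        if not isinstance(value, str) or not value or value.startswith("THE_ACTUAL_"):
            problems.append(name)
    return problems

# Hypothetical stand-in for `import twitter_credentials`, with one placeholder left in
fake = SimpleNamespace(
    consumer_key="abc",
    consumer_secret="def",
    access_token="ghi",
    access_token_secret="THE_ACTUAL_ACCESS_TOKEN_SECRET",
)
print(missing_credentials(fake))
```

An empty list means all four values look filled in; it does not prove that Twitter will accept them.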

Your Conda installation might not have all the necessary modules.  
Run the commands below once by uncommenting them and running the cell.  
Afterwards, comment the lines out again so the cell runs faster.

In [None]:
# If needed
#!pip install tweepy
#!pip install textblob
#!pip install nltk

# If needed
# import nltk
# nltk.download()  # Select twitter_samples under tab 'Corpora'

In [None]:
# Imports always go at the top
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
from textblob import TextBlob
from textblob.classifiers import NaiveBayesClassifier
from textblob.sentiments import NaiveBayesAnalyzer
from nltk.corpus import twitter_samples
import json
import random

# Import the credentials
import twitter_credentials

### Authentication

Create the authentication object; this is only needed once per session.

In [None]:
auth = OAuthHandler(twitter_credentials.consumer_key, twitter_credentials.consumer_secret)
auth.set_access_token(twitter_credentials.access_token, twitter_credentials.access_token_secret)

### Training

We start by training a classifier with the NLTK module, with the provided samples.  
These samples have been classified by humans as having positive or negative sentiment.  
This might take some time, so only do this once per session.

In [None]:
# List of 2-tuples, each holding a list of tokens and a label
train = []

# First add the negative tweets
for tokens in twitter_samples.tokenized('negative_tweets.json'):
    train.append((tokens, 'neg'))
    
# Then add the positive tweets
for tokens in twitter_samples.tokenized('positive_tweets.json'):
    train.append((tokens, 'pos'))

# Shuffle and take a subset; this speeds up training
random.shuffle(train)
train = train[0:600]

cl = NaiveBayesClassifier(train)
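
Since training uses only a random subset, some of the remaining labelled tweets can be held back to check the classifier. The sketch below shows a seeded shuffle and split using only the standard library. The helper name `split_train_test`, the seed, and the toy data are assumptions for illustration and are not part of the original notebook.

```python
import random

def split_train_test(samples, n_train, n_test, seed=42):
    """Shuffle a copy of `samples` reproducibly and return (train, test)."""
    rng = random.Random(seed)      # local generator, leaves the global state alone
    shuffled = list(samples)       # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    return train, test

# Toy stand-in for the (tokens, label) tuples built in the training cell
toy = [(["word%d" % i], "pos" if i % 2 else "neg") for i in range(20)]
toy_train, toy_test = split_train_test(toy, n_train=15, n_test=5)
print(len(toy_train), len(toy_test))
```

With the real samples, the held-out part could be passed to the TextBlob classifier's `accuracy` method to get a rough idea of how well it generalizes.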

### Tweet class

Let's define a class that receives the tweet, and classifies the text.  
Also define methods to print the sentiment, language and the tweet itself.

In [None]:
class Tweet:
    """This class creates a tweet from a JSON string"""
    def __init__(self, data, cl):
        # Hint : print(self._tweet.keys()) for all keys in the tweet
        self._tweet = json.loads(data)
        self.blob1 = TextBlob(self._tweet["text"], classifier=cl)
        self.blob2 = TextBlob(self._tweet["text"], analyzer=NaiveBayesAnalyzer())
        
    def print_tweet(self):
        print()
        print("-" * 80)
        print(self._tweet["id_str"], self._tweet["created_at"])
        print(self._tweet["text"])
        
    def print_language(self):
        print("language", self.blob1.detect_language())
        
    def print_sentiment(self):
        print("sentiment", self.blob1.classify())
        print(self.blob2.sentiment)
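
The stream can also deliver messages that are not tweets, such as delete or rate-limit notices. These have no `text` key, so `Tweet.__init__` would raise a `KeyError` on them. The sketch below shows one possible guard that could run before a `Tweet` is built; the helper name `extract_text` and the sample strings are assumptions for illustration.

```python
import json

def extract_text(data):
    """Return the tweet text from a raw JSON string, or None if there is none."""
    try:
        message = json.loads(data)
    except json.JSONDecodeError:
        return None                   # not valid JSON, e.g. an empty line
    if not isinstance(message, dict):
        return None                   # valid JSON, but not a message object
    return message.get("text")        # None when the key is missing

print(extract_text('{"id_str": "1", "text": "Hello world"}'))  # a tweet
print(extract_text('{"limit": {"track": 5}}'))                 # a notice without text
```

A listener could skip any message for which this returns `None` instead of letting the exception end the stream.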

### Listener class

Here we define a listener that processes the stream.  
When it receives data, it creates a Tweet object and classifies the tweet.  
It also prints the various characteristics and checks whether it needs to continue.

In [None]:
class MyListener(StreamListener):
    """Listener class that processes a Twitter Stream"""
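    def __init__(self, max_count, cl):
        super().__init__()  # Let StreamListener set up its own state
        self.max_count = max_count
        self.count = 0
        self.cl = cl
    
    def on_data(self, data):
        # Use the classifier passed to this listener, not the global one
        self.tweet = Tweet(data, self.cl)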
        self.tweet.print_tweet()
        self.tweet.print_language()
        self.tweet.print_sentiment()
                
        self.count += 1
        if self.count >= self.max_count:
            return False
        return True
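
The stopping rule can be tried out without a network connection. The sketch below keeps only the counting logic of the listener and feeds it a list of dummy messages. The names `CountingListener` and `fake_stream` are assumptions for illustration; the loop only imitates the way a stream stops once `on_data` returns `False`.

```python
class CountingListener:
    """Stand-in for MyListener: only the counting logic, no tweepy, no classifier."""
    def __init__(self, max_count):
        self.max_count = max_count
        self.count = 0

    def on_data(self, data):
        self.count += 1
        # False asks the caller to stop, True asks it to keep going
        return self.count < self.max_count

def fake_stream(listener, messages):
    """Feed messages to the listener until it returns False; return how many were sent."""
    sent = 0
    for message in messages:
        sent += 1
        if listener.on_data(message) is False:
            break
    return sent

listener = CountingListener(max_count=3)
print(fake_stream(listener, ["a", "b", "c", "d", "e"]))
```

With `max_count=3` the listener handles exactly three messages and ignores the rest, which is the behaviour the real listener relies on to stop after a fixed number of tweets.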

### Main

First, instantiate a listener that stops after 10 tweets.  
We pass it our trained classifier.

In [None]:
mylistener = MyListener(10, cl)

Connect the listener to the stream and pass the authentication object.

In [None]:
mystream = Stream(auth, listener=mylistener)

Create a list of keywords to filter the stream of new tweets.

In [None]:
keywords = ['Econometrics', 'Operations Research', 'Erasmus']

And with the keywords, start the stream.

In [None]:
mystream.filter(track=keywords)
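
As far as we understand Twitter's `track` parameter, each entry in the list is a phrase. A phrase matches when all of its space-separated terms occur in the tweet, in any order and ignoring case, and a tweet is delivered when any phrase matches. The sketch below is a rough local approximation of that rule, not Twitter's actual matching. The helper name `matches_track` and the sample texts are assumptions for illustration.

```python
import re

def matches_track(text, keywords):
    """Approximate check: does any phrase have all of its terms in the text?"""
    words = set(re.findall(r"\w+", text.lower()))
    return any(
        all(term in words for term in phrase.lower().split())
        for phrase in keywords
    )

keywords = ['Econometrics', 'Operations Research', 'Erasmus']
print(matches_track("Research in operations is fun", keywords))  # both terms present
print(matches_track("Operations only", keywords))                # one term missing
```

Under this reading, 'Operations Research' also matches tweets where the two words appear far apart, which can let in more unrelated tweets than expected.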

Once we're done, disconnect from the stream.

In [None]:
mystream.disconnect()