# Joel's Pokemon Twitter Analysis

Hello! I thought it would be neat to try some unsupervised learning on Pokemon tweets, since Pokemon Go has been so popular recently. This is my exploratory analysis of the tweets I collected.

## Connecting to the Twitter API

Here is the code to connect to Twitter's API.

In [None]:
# Note: StreamListener exists only in tweepy < 4.0 (it was merged into Stream in 4.0)
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream


def values():
    """Prompt for the four Twitter API credentials."""
    access_token = input("Please input access_token: ")
    access_token_secret = input("Please input access_token_secret: ")
    consumer_key = input("Please input consumer_key: ")
    consumer_secret = input("Please input consumer_secret: ")

    return (access_token, access_token_secret, consumer_key, consumer_secret)

# Basic listener that prints each raw tweet (a JSON string) to stdout

class StdOutListener(StreamListener):
    def on_data(self, data):
        print(data)
        return True

    def on_error(self, status):
        # Print the HTTP status code reported by the stream
        print(status)


if __name__ == "__main__":
    try:
        # This handles Twitter auth
        access_token, access_token_secret, consumer_key, consumer_secret = values()
        listener = StdOutListener()
        auth = OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)
        stream = Stream(auth, listener)

        # This line filters the Twitter stream to capture the keywords
        stream.filter(track=["Pokemon Go", "pokemon", "Pokemon"], languages=["en"])
    except KeyboardInterrupt:
        # Press Control-C to stop streaming
        pass


The way I set this up, the output of the Python file (from the print() statements) is redirected into a text file. Here is the bash command that I used to run it:

python file_name.py > tweets.txt 

## Transform Text Data to JSON  

In [None]:
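import json

tweets = []

# Each line of the text file should hold one tweet as a JSON string
with open("filename.txt", "r") as poke_tweet:
    for line in poke_tweet:
        try:
            tweet = json.loads(line)
            tweets.append(tweet)
        except ValueError:
            # Skip blank or malformed lines (json.JSONDecodeError subclasses ValueError)
            continue

with open("filename.json", "w") as objectfile:
    json.dump(tweets, objectfile, indent=4)

print("Done")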
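To show why the try/except matters, here is a minimal sketch that runs the same parsing loop on a few in-memory lines. The `parse_tweet_lines` helper and the sample lines are made up for this example. I am assuming the redirected file can contain blank lines or a record that was cut off when the stream stopped, which is what the skip logic guards against.

```python
import json


def parse_tweet_lines(lines):
    """Parse an iterable of text lines, keeping only those that hold valid JSON."""
    tweets, skipped = [], 0
    for line in lines:
        try:
            tweets.append(json.loads(line))
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            skipped += 1
    return tweets, skipped


# Hypothetical sample: two tweets, a blank line, and a truncated record
sample_lines = [
    '{"text": "I love Pokemon Go"}\n',
    '\n',
    '{"text": "Caught a rare one"}\n',
    '{"text": "cut off mid-',
]

tweets, skipped = parse_tweet_lines(sample_lines)
print(len(tweets), "tweets parsed,", skipped, "lines skipped")
```

Counting the skipped lines is an easy way to check that you are not silently throwing away most of the file.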

## Training the K-Means Algorithm  

I decided to use the K-Means clustering algorithm. It starts with k cluster centers (centroids), assigns each sample to the nearest centroid, recomputes each centroid as the mean of the samples assigned to it, and repeats until the assignments stop changing or the iteration limit is reached. Here is the code that I used to train the K-Means algorithm.

In [None]:
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd
import pickle
import time

start = time.time()
df = pd.read_json("/Users/Joel/Desktop/Tweets/final_poke_tweets.json")
text_data = df["text"].fillna('')

# Turn each tweet into a TF-IDF vector, ignoring English stop words
vect = TfidfVectorizer(stop_words='english')
final_text_data = vect.fit_transform(text_data)


print('Training....')
k = 5
model = KMeans(n_clusters=k, init='k-means++', max_iter=100, n_init=1)
model.fit(final_text_data)

# Save the trained model so later cells can reload it
with open('/Users/Joel/Desktop/Tweets/kmeans_ipy.pkl', 'wb') as f:
    pickle.dump(model, f)
total_time = time.time() - start
print("The algorithm ran in %.3f seconds" % total_time)
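To make the assign-and-update loop concrete, here is a minimal pure-Python sketch of plain K-Means. It is only an illustration, not what scikit-learn does internally: it picks random starting centroids instead of k-means++, and it works on small dense tuples rather than sparse TF-IDF vectors. The `kmeans` and `squared_distance` functions and the toy points are made up for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids, not k-means++
    labels = None
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical 2-D toy data: two well-separated groups
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The first three points should end up sharing one label and the last three the other. Which group gets label 0 depends on the random start, which is also why cluster numbers can change between training runs.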

### Top Ten Terms Per Cluster  

In [None]:
k = 5

with open('/Users/Joel/Desktop/Tweets/kmeans_ipy.pkl', 'rb') as f:
    model = pickle.load(f)
# Sort each centroid's term weights in descending order
order_centroids = model.cluster_centers_.argsort()[:, ::-1]
# scikit-learn >= 1.0; older versions use vect.get_feature_names()
terms = vect.get_feature_names_out()

print("Top ten terms per cluster.")
for i in range(k):
    print("Cluster %d: " % i)
    for ind in order_centroids[i, :10]:
        print(' %s' % terms[ind])
    print()

Here is the output: 

Top ten terms per cluster. 
Cluster0: 
 pokemon
 https
 rt
 catch
 pokemongo
 playing
 like
 video
 youtube
 just

Cluster1: 
 people
 pokemon
 rt
 playing
 fact
 asap
 troubling
 deleting
 https
 like

Cluster2: 
 play
 pokemon
 rt
 https
 told
 omgtsn
 1wontbddqv
 parents
 lol
 wanna

Cluster3: 
 rare
 rt
 https
 broken
 step
 nearby
 catchemali
 bmchh7mwn1
 qabriels
 takei

Cluster4: 
 rt
 glockachu
 yard
 21savaage
 gone
 come
 trained
 looking
 night
 pokemon
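The ranking behind these lists comes from `argsort()` on the cluster centers. Here is a small sketch of that step on its own, using a made-up four-word vocabulary and made-up centroid weights (none of these numbers come from the real model).

```python
import numpy as np

# Hypothetical vocabulary and centroid weights (2 clusters x 4 terms)
terms = np.array(["catch", "pokemon", "rare", "video"])
centers = np.array([
    [0.10, 0.70, 0.05, 0.30],
    [0.20, 0.10, 0.60, 0.00],
])

# argsort() gives column indices in ascending weight order;
# [:, ::-1] reverses each row so the heaviest terms come first
order = centers.argsort()[:, ::-1]

top_n = 2
for i, row in enumerate(order):
    print("Cluster %d:" % i, ", ".join(terms[row[:top_n]]))
```

Each centroid has one weight per vocabulary term, so the highest-weighted columns are the terms that best characterize that cluster.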

## Predict a new tweet

In [None]:
with open('/Users/Joel/Desktop/Tweets/kmeans_ipy.pkl', 'rb') as f:
    model = pickle.load(f)

df = pd.read_json("/Users/Joel/Desktop/Tweets/final_poke_tweets.json")
text_data = df["text"].fillna('')

pred_tweet = "Niantic Labs has made a public statement regarding their recent banning of accounts in"


# Refit the vectorizer on the training tweets so its vocabulary matches the model
vect = TfidfVectorizer(stop_words='english')
vect.fit(text_data)

# transform() expects an iterable of documents, so wrap the tweet in a list
final_pred_data = vect.transform([pred_tweet])

# predict() returns an array with one cluster index per input tweet
prediction = model.predict(final_pred_data)
print(prediction)
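The cell above refits the vectorizer on the full training file every time a prediction is made. One possible alternative, sketched here on a tiny made-up corpus, is to pickle the fitted vectorizer together with the model so both can be restored in one step. The corpus, the dictionary keys, and the in-memory `blob` are all assumptions for this illustration, not part of my actual pipeline.

```python
import pickle

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the tweet corpus
corpus = [
    "caught a rare pokemon in the park",
    "rare pokemon nearby come catch it",
    "watch my new youtube video",
    "new video is up on youtube",
]

vect = TfidfVectorizer(stop_words="english")
features = vect.fit_transform(corpus)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# Pickle both objects together (in memory here; a file works the same way)
blob = pickle.dumps({"vectorizer": vect, "model": model})

# Later: restore both and predict without touching the training data
restored = pickle.loads(blob)
new_tweet = "just saw a rare pokemon"
new_features = restored["vectorizer"].transform([new_tweet])
prediction = restored["model"].predict(new_features)
print(prediction)
```

Keeping the two objects together guarantees that the new tweet is mapped onto the same vocabulary columns the model was trained on.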

That's all I have for now. Later on I will be including visualizations. 