## Introduction

Twitch.tv is a platform where users stream video of themselves playing video games live. A prominent feature of Twitch is that each channel has its own IRC chat room, which lets the viewers of that channel communicate. Because chat users are anonymous, though, civil discussion rarely takes place. More often than not, and especially in channels with many viewers, the chat devolves into a large share of the users concurrently spamming one or two different messages. The picture below is an example of this:

![Image of Chat](https://i.gyazo.com/554f05d25f4c18d5dd8e64a4cd009840.png)

In addition to text, Twitch has a large number of humorous built-in emoticons that feature prominent streamers' faces. Because of this, the chat often devolves into all of the users spamming the same face. For the sake of this project, we will refer to messages that are commonly spammed as memes.

With this, we pose the question: How do the content and virality of memes affect the overall composition of Twitch chat rooms and the advent of new and future memes?

We have sampled Twitch chat data from 10 popular League of Legends streamers, all of whom pull in on average 20,000+ viewers per streaming session. These viewers may be neither concurrent nor unique per stream, but we attempt to assess the content of the memes to see whether individual streaming personalities can create their own community memes that don't carry over from channel to channel. We first see if we can classify streams by their Twitch chat content: we run a multinomial Naive Bayes classifier on text data from different streams, with the documents being different stream sessions. We then attempt to generate memes with an ngram model to see if these memes differ in scale or content.

### Getting Data

The first task is to pull data from a Twitch chat that's currently active. Twitch does not make chat logs available to the average user, so we need a bot that sits in streamers' chat rooms, pulls all incoming messages, and stores them in a file.

Our group didn't have much experience mining IRC chats with text and sockets, so we found a Twitch chat bot online and modified it for our needs. The link to the GitHub repository we used is in the code block.

We made an account 'dankmemebot1' and generated an OAuth key for that account using Twitch's API. We then registered a list of channels to pull data from and set up files to store the extracted text in. The account was used to enter the IRC of each channel and, later on, to send messages to see whether our ngram-generated memes would pass as user-generated memes.

In [3]:
# Set-up: imports, channel list, credentials, and helper methods
# Adapted from https://github.com/RubbixCube/Twitch-Chat-Bot-V2

import datetime
import socket
import select
import re

channels = [
    'tsm_bjergsen',
    'c9sneaky',
    'trick2g',
    'tsm_doublelift',
    'nightblue3',
    'imaqtpie',
    'rush',
    'admiralbulldog',
    'tsm_theoddone',
    'wingsofdeath'
    ]
username = 'dankmemebot1'
oauth = 'oauth:f1d7mm17vlzjols100etso2zg9jqru'

channelfiles = {}

for name in channels:
    channelfiles[name] = open(name+'.txt', 'a+')
#     channelfiles[name].write("\n \n------------ \n \n \n \n")
#     channelfiles[name].write("new session\n")
#     channelfiles[name].write("\n \n \n ------------ \n \n \n")

def ping():
    ''' Respond to the server 'pinging' (Stays connected) '''
    socks[0].send('PONG :pingis\n')
    print('PONG: Client > tmi.twitch.tv')

def sendmsg(chan,msg):
    ''' Send specified message to the channel '''
    socks[0].send('PRIVMSG '+chan+' :'+msg+'\n')
    print('[BOT] -> '+chan+': '+msg+'\n')

def sendwhis(user,msg):
    socks[1].send('PRIVMSG #jtv :/w '+user+' '+msg+'\n')
    print('[BOT] -> '+user+': '+msg+'\n')

def getmsg(msg):
    ''' GET IMPORTANT MESSAGE '''
    if(re.findall('@(.*).tmi.twitch.tv PRIVMSG (.*) :(.*)',msg)):
        msg_edit = msg.split(':',2)
        if(len(msg_edit) > 2):
            user = msg_edit[1].split('!',1)[0] # User
            message = msg_edit[2] # Message
            channel = re.findall('PRIVMSG (.*)',msg_edit[1]) # Channel
#             print channel[0][1:-1]
            privmsg = re.findall('@(.*).tmi.twitch.tv PRIVMSG (.*) :(.*)',msg)
            ''' CONVERT TO ARRAY '''
            privmsg = [x for xs in privmsg for x in xs]

            datelog = datetime.datetime.now()

            ''' PRINT TO CONSOLE '''
            if(len(privmsg) > 0):
                if len(channel) > 0:
                    chan = channel[0] 
                    if len(chan) >= 3:
    
                        #print ('['+str(datelog.hour)+':'+str(datelog.minute)+':'
#         +str(datelog.second)+'] '+user+' @ '+channel[0][:-1]+': '+message+'\n')
                        channelfiles[channel[0][1:-1]].write('['+str(datelog.day)+':'
                                                             +str(datelog.hour)+':'+str(datelog.minute)+':'
                                                             +str(datelog.second)+'] '+user+' @ '+channel[0][:-1]
                                                             +': '+message+'\n')

print "finished"
socks = [socket.socket(),socket.socket()]
socks[0].connect(('irc.twitch.tv',6667))

socks[0].send('PASS '+oauth+'\n')
socks[0].send('NICK '+username+'\n')


finished


18

Next, we used these methods and the API to save the chat while we monitored it. Again, we had little experience with this, but the GitHub repository included an example of connecting to a server, so we followed it: we read chat from every channel in the registered list and saved each channel's data in its own text file.
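As a rough illustration of what the bot has to pull out of each incoming line, here is a minimal sketch of parsing one raw chat line. The line shape is an assumption inferred from the regular expression used in the bot code, the sample line is made up, and `PRIVMSG_RE` and `parse_privmsg` are hypothetical names that are not part of the original bot.

```python
import re

# Assumed shape of a raw chat line: ":user!user@user.tmi.twitch.tv PRIVMSG #channel :text"
PRIVMSG_RE = re.compile(r'^:(\w+)!\w+@\w+\.tmi\.twitch\.tv PRIVMSG #(\w+) :(.*)$')

def parse_privmsg(raw_line):
    """Return (user, channel, message), or None if the line is not a chat message."""
    match = PRIVMSG_RE.match(raw_line.strip('\r\n'))
    if match is None:
        return None
    return match.groups()

# Made-up sample line for illustration only
sample = ':delriopie!delriopie@delriopie.tmi.twitch.tv PRIVMSG #rush :PogChamp'
print(parse_privmsg(sample))
print(parse_privmsg('PING :tmi.twitch.tv'))
```

Anchoring the pattern at the start of the line avoids matching text that merely appears inside somebody's message, which the looser `re.findall` pattern could do.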

In [None]:
#https://github.com/RubbixCube/Twitch-Chat-Bot-V2

socks = [socket.socket(),socket.socket()]

socks[0].connect(('irc.twitch.tv',6667))

socks[0].send('PASS '+oauth+'\n')
socks[0].send('NICK '+username+'\n')

for val in channels:

    socks[0].send('JOIN #'+val+'\n')
    
print('Connected to irc.twitch.tv on port 6667')
print('USER: '+username)
print('OAUTH: oauth:'+'*'*len(oauth))
print('\n')

temp = 0
count = 0
while True:
  
    (sread,swrite,sexc) = select.select(socks,socks,[],120)
    for sock in sread:
  
        ''' Receive data from the server '''
        msg = sock.recv(2048)
        if(msg == ''):
            temp += 1
            if(temp > 5):
                print('Connection might have been terminated')
    
        ''' Remove any linebreaks from the message '''
        msg = msg.strip('\n\r')

        ''' DISPLAY MESSAGE IN SHELL '''
        getmsg(msg)
#         print(msg)

        # ANYTHING TO DO WITH CHAT FROM CHANNELS
        ''' GET THE INFO FROM THE SERVER '''
        check = re.findall('@(.*).tmi.twitch.tv PRIVMSG (.*) :(.*)',msg)
        if(len(check) > 0):
            msg_edit = msg.split(':',2)
            if(len(msg_edit) > 2):
                user = msg_edit[1].split('!',1)[0] # User
                message = msg_edit[2] # Message
                channel = msg_edit[1].split(' ',2)[2][:-1] # Channel
                msg_split = str.split(message)
                        
        # ANYTHING TO DO WITH WHISPERS RECEIVED FROM USERS
        check = re.findall('@(.*).tmi.twitch.tv WHISPER (.*) :(.*)',msg)
        if(len(check) > 0):
            msg_edit = msg.split(':',2)
            if(len(msg_edit) > 2):
                user = msg_edit[1].split('!',1)[0] # User
                message = msg_edit[2] # Message
                channel = msg_edit[1].split(' ',2)[2][:-1] # Channel

                whis_split = str.split(message)                           

        ''' Respond to server pings '''
        if msg.find('PING :') != -1:
            print('PING: tmi.twitch.tv > Client')
            ping()
            
print "stopped"


### Preprocessing

Now we need to reduce the raw output from the IRC to just the messages themselves, because we don't care about who sent a message or when. An example line of logged output looks like:

[11:12:22:40] delriopie @ #rush: PogChamp

So this is really just a matter of stripping everything up to and including the channel name and its trailing colon (the timestamp itself contains colons, so cutting at the first colon would not work) and storing the rest. We put each channel's parsed text into further text files of up to 1,000 lines each for later use. All of this is simple string processing.
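The stripping step can be sketched on the example line above. This is a minimal illustration that assumes every logged line has the four-token prefix shown (timestamp, user, `@`, `#channel:`); `strip_log_prefix` is a hypothetical helper, not the function used in the notebook.

```python
def strip_log_prefix(line):
    """Drop '[timestamp] user @ #channel:' and return only the message text."""
    tokens = line.split()
    # The first four whitespace-separated tokens are the timestamp, the user,
    # the '@' separator, and '#channel:'; everything after is the message.
    if len(tokens) <= 4:
        return ''
    return ' '.join(tokens[4:])

print(strip_log_prefix('[11:12:22:40] delriopie @ #rush: PogChamp'))
```

Splitting on whitespace rather than on colons sidesteps both the colons in the timestamp and any colons inside the message itself.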

In [5]:
def parseTwitchChat(textpath):
    #returns an array of text
    text = []
    with open(textpath, 'r') as f:
        for line in f:
            text.append(line.split())
    count = 0
    newtext = []
    for line in text:
        
        if len(line) > 2:
            count += 1
            if line[1] == 'PRIVMSG':
                line = line[3:]
                
                line[0] = line[0][1:]
                newtext.append(line)
#                 if count <= 10:
#                     print line
            else:    
                line = line[4:]
                newtext.append(line)
#                 if count <= 10:
#                     print line


    return newtext
def parseAllFiles():
    
    textFiles = {}
    labels = []

    for i in channels:
        textFiles[i] = (parseTwitchChat(i + ".txt"))
        labels.append(i)

    for i in channels:

        for k in range(len(textFiles[i])): #for array inside textfile
            if (k % 1000) == 0:
                with open('league/'+i+'/' + i + '_' + str(k) +'_parsed.txt', 'a+') as f: #open every 1000 lines
                    f.truncate()
                    lenleftover = len(textFiles[i]) - k
                    if lenleftover >= 1000:
                        lenleftover = 1000

                    for j in range(lenleftover): #indivudal word in array
                        for l in textFiles[i][j+k]: #iterate through 1000 times for every 1000 lines
                            f.write(l +' ') #l = word
                        f.write('\n')
with open('league/c9sneaky/c9sneaky_0_parsed.txt','r') as f:
    for i,b in enumerate(f):
        print b
        if i > 5: 
            break



LONGER gachiGASM 

SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS SourPls qtpDANCE qtpPLS Sou\rPls qtpDANCE qtpPLS SourPls qtpDANCE . 

qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls 

DOO DOO DOO DOOO bunWeeb 

YES 

UNSUBBED 

TT 



## Meme Classification

The first thing we want to check is whether we can classify streams by their Twitch chat content. We run a multinomial Naive Bayes classifier on text data from different streams, with the documents being different stream sessions. The different streams (by channel) are the labels we give our classifier. We then take more Twitch chat data and attempt to predict which stream it came from based on its content. This task is not easy: two chatrooms can often look very similar despite belonging to different channels. Below is an example of two completely different chatrooms that look identical:

![Image of Comparison](http://i.imgur.com/RLgwivV.png)
 
We believe that Naive Bayes is an appropriate test for our question because, despite the massive quantity of shared memes, different streams also have a fair number of unique memes. The uniqueness of memes (which can often make up large portions of chat) is in part due to the composition of the Twitch chat rooms, which differs per streamer. Many streamers also have unique emotes for their own chatrooms, which makes classifying messages that contain these emotes fairly easy (although emotes from larger streamers often spill over into the chat of other streamers).

If we have a high success rate, we can conclude that memes do correlate with the composition of Twitch chat rooms. However, if our classifier cannot correctly classify Twitch chat data by streamer, we assess that memes are not unique and are in fact shared between different streamers. We would then conclude that Twitch chat is homogeneous in nature and highly similar between streams. Although Naive Bayes is a simple classifier, it can be quite accurate on text, which complicates our analysis: if documents from two streams are very similar, the classifier may still tell them apart based on small idiosyncrasies of each stream rather than on genuinely distinct memes. At this point it is hard to say what those minor differences are, but we must keep this caveat in mind when interpreting accuracy.
 
We use scikit-learn's MultinomialNB as our classifier. We also use other scikit-learn tools, CountVectorizer and TfidfTransformer (tf-idf), to transform our text into numeric features that a learning algorithm can work with.
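To make the idea behind the classifier concrete, here is a from-scratch sketch of multinomial Naive Bayes on toy data. It is not the scikit-learn implementation: it uses raw token counts with add-one smoothing and skips the tf-idf weighting applied in the notebook. The channel labels and chat lines are made up, and `train_nb` and `predict_nb` are hypothetical names.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Collect per-label token counts, per-label document counts, and the vocabulary."""
    token_counts = defaultdict(Counter)
    doc_counts = Counter(labels)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.split()
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return token_counts, doc_counts, vocab

def predict_nb(model, doc):
    """Return the label with the highest log posterior for `doc`."""
    token_counts, doc_counts, vocab = model
    total_docs = float(sum(doc_counts.values()))
    best_label, best_score = None, float('-inf')
    for label in sorted(doc_counts):
        # log prior plus the sum of log likelihoods with add-one (Laplace) smoothing
        score = math.log(doc_counts[label] / total_docs)
        denom = float(sum(token_counts[label].values()) + len(vocab))
        for token in doc.split():
            score += math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data; labels are placeholders, not real channels
docs = ['qtpDANCE SneakyPls qtpDANCE', 'SneakyPls LUL', 'bunWeeb BEST SONG', 'bunWeeb Ban']
labels = ['channel_a', 'channel_a', 'channel_b', 'channel_b']
model = train_nb(docs, labels)
print(predict_nb(model, 'SneakyPls qtpDANCE'))
print(predict_nb(model, 'bunWeeb SONG'))
```

Channel-specific emotes dominate the likelihood terms here, which is the same effect that makes streamer-specific emotes such strong signals in the real data.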

In [23]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.datasets import load_files

#load files from very specific format check 
#http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_files.html#sklearn-datasets-load-files
files_train = load_files('league')

#vectorize the data
count_vect = CountVectorizer()

#vector transform 
# print unicode(files_train.data)
X_train_counts = count_vect.fit_transform([unicode(s, errors='replace') for s in files_train.data])

tf_transformer = TfidfTransformer().fit(X_train_counts)

X_train_tfidf = tf_transformer.transform(X_train_counts)

clf = MultinomialNB().fit(X_train_tfidf, files_train.target)

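# load the held-out sessions (same one-folder-per-channel layout as 'league')
files_test = load_files('test')

# reuse the vocabulary and idf weights fitted on the training data
test_train_counts = count_vect.transform(files_test.data)

Test_train_tfidf = tf_transformer.transform(test_train_counts)

ypred = clf.predict(Test_train_tfidf)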
print "errors: " + str((files_test.target != ypred).sum())
print "total: " + str(len(files_test.target))
print "error rate: " +  str((files_test.target != ypred).sum()/(float(len(ypred))))


errors: 10
total: 27
error rate: 0.37037037037


This corresponds to an error rate of 10/27, which is approximately 37%. That is quite good considering how often chats look identical and how many different streams we were trying to differentiate between. If we were blindly guessing among the 10 channels, our error rate would be around 90%! An error rate of 37% leads us to believe that our sample of Twitch chat data can be differentiated by stream, and thus that the content of memes does affect the composition of Twitch chat. With only 27 test documents, though, this estimate is rough.
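The comparison against blind guessing is simple arithmetic, sketched below. It assumes the test documents are spread evenly over the 10 channels, which the notebook does not state; with unbalanced classes the chance-level figure would differ.

```python
n_channels = 10
errors, total = 10, 27

# Uniform random guessing over balanced classes is right 1/n_channels of the time
chance_error = 1.0 - 1.0 / n_channels
observed_error = errors / float(total)

print('chance error:   %.3f' % chance_error)
print('observed error: %.3f' % observed_error)
```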

Apart from this, we cannot conclude much more. We can hypothesize that the responses streamers give to viewers generate similar reactions across channels. We could surmise that viewers are not locked to one stream but watch many streams concurrently, and thus take their memes with them. The specificity of the game can also be a factor in determining chat meme content. It is also possible that memes of the lowest common denominator appeal to a wider audience and overwhelm the more localized memes that are prevalent per streamer.

If we were doing further analysis, we could probably group the chatrooms into categories; streamers have varying levels of moderation and tolerate different memes differently. For example, a few of the chatrooms we mined contained a lot of offensive content, whereas other chatrooms were heavily moderated and did not have that content. Other chatrooms disallow heavy spam. If we classified based on this, we could probably group the chatrooms accurately and have most of our error fall between groups.




## Meme Generation

In order to analyze how the content of memes affects future memes, we need a way to use old memes to create new memes and determine whether these new memes are viral. An ngram language model is a simple language model that fits this need. It trains on past data and extracts the ngrams (word tuples of length n) from the text. The model captures the relationship between words that frequently occur together, which it uses to generate phrases.

To make an ngram model, we first need to process the data. We added an end marker to every chat line to indicate the end of a message (the code represents it as None). We then tagged all rare tokens, those occurring fewer than a threshold number of times in the data (the call below uses 15), as !UNKNOWN! words, which the generator skips. We create our ngram model from our collected data in order to create the most wholesome memes. To generate a phrase, we start with a random ngram and look at all the words that followed it in our data. We then randomly choose a word from that list, so that words that occurred more often are more likely to be chosen. We keep generating until the maximum length or an end marker has been reached.
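The procedure described above can be sketched on toy data as follows. This is a simplified illustration rather than the notebook's class: it omits the unknown-word tagging, takes the seed as an argument instead of drawing it at random, and uses made-up messages. `build_ngram_model` and `generate` are hypothetical names.

```python
import random
from collections import defaultdict

END = None  # end-of-message marker

def build_ngram_model(messages, n):
    """Map each n-token tuple to the list of tokens that followed it."""
    tokens = []
    for message in messages:
        tokens.extend(message.split())
        tokens.append(END)
    model = defaultdict(list)
    for i in range(len(tokens) - n):
        model[tuple(tokens[i:i + n])].append(tokens[i + n])
    return model

def generate(model, seed, max_tokens=20, rng=random):
    """Extend `seed` by sampling followers; repeated followers are proportionally likelier."""
    output = list(seed)
    n = len(seed)
    for _ in range(max_tokens):
        followers = model.get(tuple(output[-n:]))
        if not followers:
            break
        next_token = rng.choice(followers)
        if next_token is END:
            break
        output.append(next_token)
    return output

# Made-up chat lines for illustration
messages = ['BEST SONG bunWeeb', 'BEST SONG bunWeeb', 'BEST SONG ever']
model = build_ngram_model(messages, 2)
print(model[('BEST', 'SONG')])
print(' '.join(generate(model, ('BEST', 'SONG'))))
```

Keeping duplicates in the follower list is what makes frequent continuations more likely: `bunWeeb` appears twice and `ever` once, so a uniform draw from the list picks `bunWeeb` two times out of three.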



In [24]:
import math
import os
import nltk
import glob
import random
from collections import Counter
from itertools import groupby as g

def tokenize_by_word(sentence):
    return nltk.word_tokenize(sentence)

def most_common(L):
    return max(g(sorted(L)), key=lambda(x, v):(len(list(v)),-L.index(x)))[0]
class NgramModel:
    """
    A simple ngram language model.

    Each chat line is expected to end with a None token, which marks the
    end of a message (tokenize_from_files adds it automatically).

    Parameters:
    n               the n used in the ngram model
    tokens          a pre-tokenized training list; if empty, tokens are read from filenames
    train           unused
    filenames       a glob pattern of training files, read only when tokens is empty
    UnknownCount    tokens that appear fewer than this many times are replaced with '!UNKNOWN!'
    """
    def __init__(self, n,tokens=[], train="",filenames="", UnknownCount=0):
        self.filenames = filenames
        self.UnknownCount = UnknownCount
        if (n < 0):
            raise Exception("N must be greater than or equal to zero to make an Ngram model.")
        if len(tokens) == 0:
            self.tokens = self.tokenize_from_files()
            self.tag_with_unknown()
        else:
            self.tokens = tokens
            self.tag_with_unknown()

        self.n = n
        
        self.model = self.build_model()

    def tag_with_unknown(self):
        token_count = Counter(self.tokens)
        def mark(token):
            if token_count[token] < self.UnknownCount:
                return "!UNKNOWN!"
            else:
                return token
        tokens = [mark(token) for token in self.tokens ]
        self.tokens = tokens
    def tokenize_from_files(self):
        file_tokens = []
        for fname in glob.glob(self.filenames):
            with open(fname, 'r') as f:

                for i in f:
                    tokens = i.split()
                    tokens.append(None)
                    file_tokens.extend(tokens)
        return file_tokens
    def build_model(self):
    #     build a language model from ngrams
        tokens = self.tokens
        n = self.n
        model = dict()
        if len(tokens) < n:
            return model
        for i in range(len(tokens) - n):
            gram = tuple(tokens[i:i+n])
            next_tok = tokens[i+n]
            if gram in model:
                model[gram].append(next_tok)
            else: 
                model[gram] = [next_tok]
        final_gram = tuple(tokens[len(tokens)-n:])

        return model
    def generate(self, seed=None, max_iterations=50):
        model = self.model

        n = self.n
        seed = random.choice(model.keys())
        flag = True
        while flag:
            flag = False
            for i in seed:
                if i is None or i == "!UNKNOWN!":
                    flag = True
            if flag:
                seed = random.choice(model.keys())

        output = list(seed)
        current = tuple(seed)
        for i in range(max_iterations):
            next_token = random.choice(model.keys())[0]
            while next_token == "!UNKNOWN!":
                next_token = random.choice(model.keys())[0]
            if current in model:
                possible_next_tokens = model[current]
                if len(possible_next_tokens) > 1:
                    next_token = random.choice(possible_next_tokens)
                    while next_token == "!UNKNOWN!":
                        possible_next_tokens.remove(next_token)
                        try:
                            next_token = random.choice(possible_next_tokens)
                        except:
                            next_token = random.choice(model.keys())[0]
                            while next_token == "!UNKNOWN!":
                                next_token = random.choice(model.keys())[0]

            if next_token is None: break
            output.append(next_token)
            current = tuple(output[-n:])
            
        return output

files = []
file_tokens = []
for fname in glob.glob("league/*/*.txt"):
    with open(fname, 'r') as f:
        
        for i in f:
            tokens = i.split()
            if len(tokens) > 0 and "@" not in tokens[0] and "!" not in tokens[0] :
                tokens.append(None)
                file_tokens.extend(tokens)

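    # NOTE: this break leaves the loop after the first file returned by glob,
    # so file_tokens only holds that one file's messages
    break

# n=3 makes this a trigram model, despite the variable name
bigram = NgramModel(3, file_tokens, filenames="league/*/*.txt", UnknownCount=15)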
counter = 0

for i in bigram.model:
    print i,":", bigram.model[i]
    if counter > 5:
        break
    counter += 1
print "Model Created"


('haHAA', '!UNKNOWN!', None) : ['haHAA']
('!UNKNOWN!', 'ANELE', None) : ['ANELE', 'in', '!UNKNOWN!']
('BEST', 'SONG', 'bunWeeb') : [None, 'BEST', 'BEST', None, 'BEST', 'BEST', 'BEST', None, 'BEST', 'BEST', 'BEST', 'BEST', None, 'BEST', 'BEST', 'BEST', None, 'BEST', 'BEST', None]
('Kreygasm', None, 'TT') : ['Kreygasm']
('gachiGASM', None, 'SourPls') : ['qtpDANCE']
('cmonBruh', 'cmonBruh', 'cmonBruh') : ['cmonBruh', None]
('cmonBruh', None, 'Kreygasm') : ['Kreygasm']
Model Created


This just creates a model with which we can generate memes; to generate a list of memes we call the code below:

In [25]:
fresh_memes = []
for i in range(10):
    meme = " ".join(bigram.generate())
    print meme
    print 
    fresh_memes.append(meme)



qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE

Ban bunWeeb .

bunWeeb Me Spam bunWeeb Mods Baka bunWeeb If Ban bunWeeb bunWeeb Me Weeb bunWeeb Me Spam bunWeeb Mods Baka bunWeeb If Ban bunWeeb

VoHiYo PogChamp Kreygasm LOL If cmonBruh

IT Jebaited LINK IT Jebaited LINK IT Jebaited LINK IT Jebaited LINK IT Jebaited LINK IT Jebaited gachiGASM bunWeeb LUL SneakyPls

LINK IT Jebaited LINK IT Jebaited LINK IT Jebaited LINK IT Jebaited LINK IT Jebaited LINK IT Jebaited LINK IT Jebaited LINK IT Jebaited LINK IT Jebaited LINK IT Jebaited LINK IT Jebaited LINK IT Jebaited LINK IT Jebaited LINK IT Jebait

As you can see, a majority of our generated messages contain emotes, like "SourPls." This is expected, as a majority of Twitch messages are just emotes with some loosely related text in between. However, one problem that arises because of this is that our generated messages are often just incomplete nonsense with random emotes as spacing. Another issue is that the chatrooms with more spam have a significantly larger effect on the generated memes. As you can see, a majority of the emotes start with 'Sneaky' or 'qtp', which means they are associated with the streamers Sneaky and QTPie. These streamers have notoriously low moderation and a ridiculous amount of spam passing through their chats. For example, despite the fact that we pulled the chat from each stream for a similar amount of time and that a majority of streamers had similar numbers of viewers while we were recording their chats, Sneaky and QTPie each had over 5,000 more messages than the next highest streamers. Clearly this is reflected in our memes.

Next we want to measure how our computer-generated memes stack up against human-generated memes. Typically a successful human-generated meme will gain traction and be spammed by other users after being entered into the chat a few times, so we wrote a script that does just that. Twitch can detect a chatbot if it enters messages too fast, so we set up our script to send the input message once every 30 seconds for a few minutes (the loop below is capped at 180 seconds) and store all of the resulting output in a text file. We then counted how many times the message was spammed.

In [22]:
import time

def spam(chan, fname, message1, username, oauth):

    socks[0].send('JOIN '+chan+'\n')
        
    print('Connected to irc.twitch.tv on port 6667')
    print('USER: '+username)
    print('OAUTH: oauth:'+'*'*30)
    print('\n')

    temp = 0
    count = 0
    # time.time() is wall-clock time; time.clock() measures CPU time on Unix,
    # which would stretch the intervals below
    starttime = time.time()
    endtime = starttime
    spamtime = -1  # -1 means nothing has been sent yet

    while (endtime-starttime) <= 180:

        (sread,swrite,sexc) = select.select(socks,socks,[],120)
        for sock in sread:

            # Send the meme right away, then again once 30 seconds have accumulated
            if spamtime == -1 or spamtime >= 30:
                socks[0].send('PRIVMSG '+chan+' :'+message1+'\n')
#                 print "spam"+fname
                spamtime = 0

            # Receive data from the server
            msg = sock.recv(2048)
            if(msg == ''):
                temp += 1
                if(temp > 5):
                    print('Connection might have been terminated')

            # Remove any linebreaks from the message
            msg = msg.strip('\n\r')

            # Store the parsed message in the channel file and append the raw line to fname
            getmsg(msg)
            with open(fname, 'a+') as f:
                f.write(msg + ' \n')
    #         print(msg)

            # ANYTHING TO DO WITH CHAT FROM CHANNELS
            # Pull the user, message, and channel out of chat lines
            check = re.findall('@(.*).tmi.twitch.tv PRIVMSG (.*) :(.*)',msg)
            if(len(check) > 0):
                msg_edit = msg.split(':',2)
                if(len(msg_edit) > 2):
                    user = msg_edit[1].split('!',1)[0] # User
                    message = msg_edit[2] # Message
                    channel = msg_edit[1].split(' ',2)[2][:-1] # Channel

                    msg_split = str.split(message)

            # Respond to server pings
            if msg.find('PING :') != -1:
                print('PING: tmi.twitch.tv > Client')
                ping()

            # Add the time since the last iteration toward the next send
            now = time.time()
            spamtime += now - endtime
#             print spamtime
            endtime = now

    print "stopped"

for i, meme in enumerate(fresh_memes[1:3]):
    print meme
    chan = "#tsm_theoddone"

    filename = chan+"_"+str(i)+"_testspam.txt"
    spam(chan, filename, meme, username, oauth)

SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls qtpDANCE SneakyPls
Connected to irc.twitch.tv on port 6667
USER: dankmemebot1
OAUTH: oauth:******************************


sneakyFedora sneakyFedora .
Connected to irc.twitch.tv on port 6667
USER: dankmemebot1
OAUTH: oauth:******************************




Now we have a method that lets us spam our generated memes in a particular chat. All that remains is to count how many times each meme appeared in the chat that followed.

In [29]:
counter = 0
memes= ["it doesnt give EleGiggle", "jungler ?? !!" ]
for meme, fname in zip(memes, glob.glob("spam/*")):
    counter = 0
    with open(fname, 'r') as f:

        for i in f:

            if meme in i:
                
                counter += 1
    print meme + ": " + str(counter)

it doesnt give EleGiggle: 0
jungler ?? !!: 0


As we can clearly see above, the generator did not work. While some of the generated memes looked similar to nonsensical Twitch chat, nobody rechatted the lines.

One of the major problems was probably our ngram model. The trigram model we used only stored messages that had 3 tokens or more. While trigrams are usually good for analyzing text, Twitch chat messages tend to be either very short or extremely long, which a trigram model does not capture well. Another problem with the trigram model was that many chat messages consisted of unique tokens, such as ASCII art or unique emotes that behave like proper nouns. These unique tokens skew the generator: since most of these messages are only sent once, the model has only one message to generate from, which leads to unoriginal memes.

We also hypothesize that memes that occur naturally in Twitch chat often occur at specific times, in relation to events that happen on the stream. We can see this in the dense bursts of chat messages sent close together in time. Because our ngram model performed poorly, we cannot confirm that memes generated from Twitch chat itself will take hold as new memes. This is consistent with our idea that memes arise as a response to content from the streamer rather than from the chat itself, so Twitch chat on its own probably does not create new memes.

For further exploration in generating memes, one should look into ways to augment the ngram model, or into other, better predictive models. An easy augmentation could be a backoff model. A backoff model is an ngram model that combines multiple ngram models into one. It always tries to match on the longest ngram possible, but when that fails, it backs off and uses an ngram with a lower n. Such a model would capture more of the short messages while still providing context for longer ones.
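A backoff lookup of this kind could be sketched as below. This is a deliberately simple version under stated assumptions: it only falls back to shorter contexts and does no probability discounting, unlike formal schemes such as Katz backoff. The token stream is made up, and `build_backoff_models` and `next_token` are hypothetical names.

```python
import random
from collections import defaultdict

def build_backoff_models(tokens, max_n):
    """Build one follower table per context length 1..max_n."""
    models = {}
    for n in range(1, max_n + 1):
        table = defaultdict(list)
        for i in range(len(tokens) - n):
            table[tuple(tokens[i:i + n])].append(tokens[i + n])
        models[n] = table
    return models

def next_token(models, history, max_n, rng=random):
    """Try the longest context first, backing off to shorter ones.

    Returns (token, context_length_used), or (None, 0) if nothing matches.
    """
    for n in range(min(max_n, len(history)), 0, -1):
        followers = models[n].get(tuple(history[-n:]))
        if followers:
            return rng.choice(followers), n
    return None, 0

# Made-up token stream for illustration
tokens = 'LINK IT Jebaited LINK IT Jebaited PogChamp'.split()
models = build_backoff_models(tokens, 3)
print(next_token(models, ['IT', 'Jebaited', 'LINK'], 3))  # full trigram context matches
print(next_token(models, ['Kappa', 'LINK'], 3))           # backs off to the 1-token context
```

Returning the context length alongside the token makes it easy to check how often generation actually had to back off, which would indicate how sparse the longer ngrams are in the chat data.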
