# Light Beer Network Analysis
---
### APRD 6346
### Author: Matt Hardwick

This notebook performs network analysis on tweets involving the competing light beer brands *Bud Light*, *Coors Light*, and *Miller Lite*. The first set of networks are mention networks, which show which Twitter users are tweeting at which brand. The second set are semantic networks, which associate specific words from tweets with their respective brands.

## Data Extraction
---

In [1]:
import glob
import os
import shutil
import zipfile
import json
import csv
import re
import string
import itertools

import nltk # text processor
wn = nltk.WordNetLemmatizer()
ps = nltk.PorterStemmer()
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()

from time import sleep
from textblob import TextBlob

In [2]:
# create temporary directory to extract zip files to
TMPDIR = 'tmp'

# exist_ok=True makes this a no-op when the directory already exists
os.makedirs(TMPDIR, exist_ok=True)

In [3]:
tweetzipfiles = glob.glob('*.zip')
tweetzipfiles

['miller_lite_OR_millerlite.zip',
 'bud_light_OR_budlight.zip',
 'coors_light_OR_coorslight.zip']

In [4]:
# extract all tweets to temp directory
for tweetzipfile in tweetzipfiles:
    with zipfile.ZipFile(tweetzipfile, 'r') as f:
        print('Unzipping to tmp directory: %s' % tweetzipfile)
        f.extractall(TMPDIR)

Unzipping to tmp directory: miller_lite_OR_millerlite.zip
Unzipping to tmp directory: bud_light_OR_budlight.zip
Unzipping to tmp directory: coors_light_OR_coorslight.zip


## Mention Network
---
The Mention Network will display the users who are tweeting at each light beer brand. Because there are so many tweets, we keep only the users who have tweeted about a brand more than once.

In [5]:
# create dictionary with profile names and number of tweets
uniqueusers = {}
count = 0

for fn in os.listdir(TMPDIR):
    fn = os.path.join(TMPDIR, fn)
    with open(fn) as f:
        count += 1
        if count % 1000 == 0:
            print("Tweets opened: %s" % count)
            
        tweetjson = json.load(f)
        userwhotweeted = tweetjson['user']['screen_name']
        
        if userwhotweeted in uniqueusers:
            uniqueusers[userwhotweeted] += 1
        else:
            uniqueusers[userwhotweeted] = 1

Tweets opened: 1000
Tweets opened: 2000
Tweets opened: 3000
Tweets opened: 4000
Tweets opened: 5000
Tweets opened: 6000
Tweets opened: 7000
Tweets opened: 8000
Tweets opened: 9000
Tweets opened: 10000


In [6]:
len(uniqueusers)

9130

In [7]:
# users who tweet more than once
userstoinclude = set()
usercount = 0

for user in uniqueusers:
    if uniqueusers[user] > 1:
        usercount += 1
        userstoinclude.add(user)
        
print(len(userstoinclude))

662
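
The counting and filtering steps in the two cells above can also be written more compactly with `collections.Counter`. The following is a minimal sketch, not the code that produced the numbers above: `count_users`, `repeat_users`, and the sample tweets are hypothetical, and the tweets are assumed to be already loaded as dictionaries with the same `user.screen_name` field.

```python
from collections import Counter


def count_users(tweets):
    """Count tweets per screen name from an iterable of tweet dicts."""
    return Counter(tweet['user']['screen_name'] for tweet in tweets)


def repeat_users(counts, min_tweets=2):
    """Return the set of users with at least `min_tweets` tweets."""
    return {user for user, n in counts.items() if n >= min_tweets}


# Hypothetical sample data shaped like the tweet JSON used in this notebook
sample_tweets = [
    {'user': {'screen_name': 'alice'}},
    {'user': {'screen_name': 'bob'}},
    {'user': {'screen_name': 'alice'}},
]

counts = count_users(sample_tweets)
print(repeat_users(counts))
```

With `min_tweets=2`, the filter is equivalent to the `> 1` comparison used above.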


To create a network in Gephi, the data needs to be formatted as an edge list.
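
As an illustration of the format, the sketch below turns one tweet into `Source,Target` rows, one row per mentioned user. It assumes the tweet JSON fields used throughout this notebook (`user.screen_name` and `entities.user_mentions`); the helper names and the sample tweet are hypothetical, and an in-memory buffer stands in for the output file.

```python
import csv
import io


def mention_edges(tweet):
    """Yield (source, target) pairs: the author and each mentioned user."""
    source = tweet['user']['screen_name']
    for mention in tweet['entities']['user_mentions']:
        yield source, mention['screen_name']


def write_edge_list(tweets, handle):
    """Write a header row and one row per mention to an open file handle."""
    writer = csv.writer(handle)
    writer.writerow(['Source', 'Target'])
    for tweet in tweets:
        writer.writerows(mention_edges(tweet))


# Hypothetical sample tweet with two mentions
sample_tweet = {
    'user': {'screen_name': 'alice'},
    'entities': {'user_mentions': [{'screen_name': 'budlight'},
                                   {'screen_name': 'CoorsLight'}]},
}

buffer = io.StringIO()
write_edge_list([sample_tweet], buffer)
print(buffer.getvalue())
```

When writing to a real file, the `csv` module documentation recommends opening it with `newline=''` so that no extra blank lines appear on some platforms.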

In [8]:
# initializing edge list csv
edgelist = open('lightbeer.mention.full.gephi.csv', 'w', newline='')  # newline='' as the csv docs recommend
csvwriter = csv.writer(edgelist)
header = ['Source', 'Target']
csvwriter.writerow(header)

15

In [9]:
count = 0
print('Writing edge list')

for fn in os.listdir(TMPDIR):
    fn = os.path.join(TMPDIR, fn)
    with open(fn) as f:
        tweetjson = json.load(f)
        userwhotweeted = tweetjson['user']['screen_name']
        if userwhotweeted in userstoinclude:
            count += 1
            if count % 1000 == 0:
                print("Tweets written: %s" % count)
                
            users = tweetjson['entities']['user_mentions']
            if len(users) > 0:
                for user in users:
                    screenname = user['screen_name']
                    row = [userwhotweeted, screenname]
                    csvwriter.writerow(row)

edgelist.close()

Writing edge list
Tweets written: 1000
Tweets written: 2000


<img src="files/Mention Full.png">

#### Top Users Per Brand

*Coors Light*: marketingdive, sullnmika, Doritos, MKEBizJournal, CarpeZytha, SLeskMBJ, lewbryson, oldmudgie, nwi_jsp, jamesbwxm

The main users tweeting @CoorsLight are beer reviewers, pubs, and journalists. A good marketing strategy would be to supply these users with beer samples and merchandise so they may promote Coors products. A promotion with Doritos could also improve sales, as beer and chips are commonly consumed together.

*Bud Light*: rkbennet, Tyne_Ag, MillerCoors, Cornfrmr, NationalCorn, paulwhittington, BlueJacketsNHL, CarpeZytha, lewbryson, MKEBizJournal

The main users tweeting @budlight are farmers and beer reviewers. Advertising *Bud Light* to regions with dense agricultural populations could lead to increased sales. Many of the reviewers also reviewed *Coors Light*, so Bud is competing with Coors for their mentions. A promotion with the Columbus Blue Jackets would appeal to sports fans and may even reach other NHL teams.

*Miller Lite*: AdamDCollins, dennismonsewicz, M2Third, KCLiveBlock, csbev, Cornlrmr, ComClassic, mattcham37, vogeliowa, Morning_Ag

The main users tweeting @MillerLite are farmers and beer distributors. A marketing strategy similar to the *Bud Light* recommendation above would improve sales to farmers, and would lead to increased competition. Interestingly, MillerCoors bridges *Bud Light* and *Miller Lite* together. This account can be promoted to improve Miller's image while hurting Bud's.

### Verified Users
We can perform the same network analysis on verified Twitter users. These users generally have more social media influence than unverified users.

In [10]:
verifiedusers = {}
count = 0

for fn in os.listdir(TMPDIR):
    fn = os.path.join(TMPDIR, fn)
    with open(fn) as f:
        count += 1
        if count % 1000 == 0:
            print("Tweets opened: %s" % count)
        tweetjson = json.load(f)
        userwhotweeted = tweetjson['user']['screen_name']

        verified = tweetjson['user']['verified']
        if verified:
            if userwhotweeted in verifiedusers:
                verifiedusers[userwhotweeted] += 1
            else:
                verifiedusers[userwhotweeted] = 1

Tweets opened: 1000
Tweets opened: 2000
Tweets opened: 3000
Tweets opened: 4000
Tweets opened: 5000
Tweets opened: 6000
Tweets opened: 7000
Tweets opened: 8000
Tweets opened: 9000
Tweets opened: 10000


In [11]:
len(verifiedusers)

224

In [12]:
# initializing edge list csv
edgelist = open('lightbeer.mention.verified.gephi.csv', 'w', newline='')  # newline='' as the csv docs recommend
csvwriter = csv.writer(edgelist)
header = ['Source', 'Target']
csvwriter.writerow(header)

15

In [13]:
print('Writing edge list')

for fn in os.listdir(TMPDIR):
    fn = os.path.join(TMPDIR, fn)
    with open(fn) as f:
        tweetjson = json.load(f)
        userwhotweeted = tweetjson['user']['screen_name']
        if userwhotweeted in verifiedusers:
            users = tweetjson['entities']['user_mentions']
            if len(users) > 0:
                for user in users:
                    screenname = user['screen_name']
                    row = [userwhotweeted, screenname]
                    csvwriter.writerow(row)

edgelist.close()

Writing edge list


<img src="files/Mention Verified.png">

#### Top Verified Users Per Brand

*Coors Light*: kentjlewis, iDarija, NBCSPhilly, AdAgeIn, beerbabe, TheDrum, DuffersTavern, lewbryson

Marketers, beer writers, and pubs again dominate *Coors Light* mentions. Coors will compete with Bud over writers and marketing agencies, but NBC Sports Philly offers opportunities for good promotions. Coors can improve its sales by marketing to sports fans on a major news platform such as NBC.

*Bud Light*: AdAgeIn, beerbabe, rkbennet, paulwhittington, AmerksHockey, BlueJacketsNHL, peterfrost, MillerCoors

Again, *Bud Light* cements itself between Coors and Miller, drawing competition from their mutual tweeters. Marketers, beer writers, and hockey teams are tweeting at Bud the most. Sending swag to the writers and promoting with hockey teams can help improve sales.

*Miller Lite*: SyracuseCrunch, Molson_Canadian, Stephens2727, MillerCoors, Beer_Notes, peterfrost

Hockey teams and writers are the two groups tweeting at *Miller Lite*. Miller can compete with Bud over the hockey market and its fans. MillerCoors again bridges *Bud Light* with *Miller Lite*, and its platform can help advertise Miller and boost its market share.

### Retweets and Favorites
Many Tweets go ignored, so keeping only Tweets with more than one retweet and more than one favorite can remove users whose Tweets are less relevant.
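
The engagement filter described above can be sketched as a small predicate. This is a minimal illustration rather than the notebook's own code: `is_engaged` and its parameters are hypothetical names, and the default thresholds mirror the `> 1` comparisons used in the notebook's filtering cell.

```python
def is_engaged(tweet, min_retweets=2, min_favorites=2):
    """Return True when a tweet meets both engagement thresholds."""
    return (tweet['retweet_count'] >= min_retweets
            and tweet['favorite_count'] >= min_favorites)


# Hypothetical sample tweets using the same JSON field names
ignored = {'retweet_count': 0, 'favorite_count': 5}
popular = {'retweet_count': 3, 'favorite_count': 2}

print(is_engaged(ignored), is_engaged(popular))
```

Keeping the thresholds as parameters makes it easy to compare how the network changes when the cutoff is loosened to one retweet and one favorite.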

In [14]:
rtusers = {}
count = 0

for fn in os.listdir(TMPDIR):
    fn = os.path.join(TMPDIR, fn)
    #print(fn)
    with open(fn) as f:
        count += 1
        if count % 1000 == 0:
            print("Tweets opened: %s" % count)
        tweetjson = json.load(f)
        userwhotweeted = tweetjson['user']['screen_name']
        
        rtcount = tweetjson['retweet_count']
        favcount = tweetjson['favorite_count']
        if rtcount > 1 and favcount > 1:
        
            if userwhotweeted in rtusers:
                rtusers[userwhotweeted] += 1
            else:
                rtusers[userwhotweeted] = 1

Tweets opened: 1000
Tweets opened: 2000
Tweets opened: 3000
Tweets opened: 4000
Tweets opened: 5000
Tweets opened: 6000
Tweets opened: 7000
Tweets opened: 8000
Tweets opened: 9000
Tweets opened: 10000


In [15]:
len(rtusers)

271

In [16]:
# initializing edge list csv
edgelist = open('lightbeer.mention.rtfav.gephi.csv', 'w', newline='')  # newline='' as the csv docs recommend
csvwriter = csv.writer(edgelist)
header = ['Source', 'Target']
csvwriter.writerow(header)

15

In [17]:
print('Writing edge list')

for fn in os.listdir(TMPDIR):
    fn = os.path.join(TMPDIR, fn)
    with open(fn) as f:
        tweetjson = json.load(f)
        userwhotweeted = tweetjson['user']['screen_name']
        if userwhotweeted in rtusers:
            users = tweetjson['entities']['user_mentions']
            if len(users) > 0:
                for user in users:
                    screenname = user['screen_name']
                    row = [userwhotweeted, screenname]
                    csvwriter.writerow(row)

edgelist.close()

Writing edge list


<img src="files/Mention RTfav.png">

#### Top Retweeted and Favorited Users Per Brand

*Coors Light*: lewbryson, WhiskeyRiff, andimariieee, JobmanAg

Country is the theme for Coors. The beer writer @lewbryson appears again, but the other three users enjoy the country lifestyle. Coors does a great job of marketing itself to southern states and the country market, and it should continue to do so to maximize its reach.

*Bud Light*: AmerksHockey, BlueJacketsNHL, lewbryson, kscornfed1, Cornfrmr, Tyne_Ag, MillerCoors

Hockey clubs, farmers, and writers are the three groups of Twitter users that interact with *Bud Light* the most across all three mention networks. Partnerships and promotions catered to these groups will help Bud reach its largest audience.

*Miller Lite*: AdamDCollins, csbev, BroughtTheDog, Cornfrmr, hopnotes

*Miller Lite* is all over the board, with users ranging from farmers to writers to distributors to a rock band. Rather than competing with Bud, Miller should focus its marketing efforts on rock groups and the music scene. No other bands have appeared in the mention networks, so Miller has an opportunity to reach a new Twitter market.

## Semantic Network
---
The semantic network will display key words from Tweets associated with each brand. Rather than showing who is talking about each brand, this network shows what people are saying.

In [18]:
# setting up stopwords
punctuation = string.punctuation
stopwordsset = set(stopwords.words("english"))
stopwordsset.add("'s")

In [19]:
#Removing urls
def removeURL(text):
    result = re.sub(r"http\S+", "", text)
    return result

#Extracting contextual words from a sentence
def tokenize(text):
    #lower case
    text = text.lower()
    #split into individual words
    words = word_tokenize(text)
    return words

def stem(tokenizedtext):
    rootwords = []
    for aword in tokenizedtext:
        aword = ps.stem(aword)
        rootwords.append(aword)
    return rootwords

def stopWords(tokenizedtext):
    goodwords = []
    for aword in tokenizedtext:
        if aword not in stopwordsset:
            goodwords.append(aword)
    return goodwords

def lemmatizer(tokenizedtext):
    lemmawords = []
    for aword in tokenizedtext:
        aword = wn.lemmatize(aword)
        lemmawords.append(aword)
    return lemmawords

def removePunctuation(tokenizedtext):
    nopunctwords = []
    for aword in tokenizedtext:
        if aword not in punctuation:
            nopunctwords.append(aword)
    cleanedwords = []
    for aword in nopunctwords:
        aword = aword.translate(str.maketrans('', '', string.punctuation))
        cleanedwords.append(aword)
    return cleanedwords
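
One detail worth noting about punctuation removal: a token made up entirely of punctuation, such as `...` or `''`, becomes an empty string once its characters are stripped, and empty strings can then be counted as words. The sketch below is one possible variant that drops such tokens. It is not the function used to produce the results in this notebook, and `strip_punctuation` is a hypothetical name.

```python
import string

# Translation table that deletes every ASCII punctuation character
PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def strip_punctuation(tokens):
    """Remove punctuation characters and drop tokens that end up empty."""
    cleaned = (token.translate(PUNCT_TABLE) for token in tokens)
    return [token for token in cleaned if token]


print(strip_punctuation(['cold', '...', "''", "don't", '!', 'beer']))
# ['cold', 'dont', 'beer']
```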

In [20]:
# create dictionary with words and weights
uniquewords = {}
count = 0

for fn in os.listdir(TMPDIR):
    fn = os.path.join(TMPDIR, fn)
    with open(fn) as f:
        tweetjson = json.load(f)
        count += 1
        if count % 1000 == 0:
            print("Tweets parsed: %s" % count)
            
        text = tweetjson['text']
        nourlstext = removeURL(text) #remove url
        tokenizedtext = tokenize(nourlstext) #separate text by each word
        nostopwordstext = stopWords(tokenizedtext) #remove irrelevant words
        lemmatizedtext = lemmatizer(nostopwordstext) #root words
        nopuncttext = removePunctuation(lemmatizedtext) #remove punctuation

        for aword in nopuncttext:
            if aword in uniquewords:
                uniquewords[aword] += 1
            else:
                uniquewords[aword] = 1

Tweets parsed: 1000
Tweets parsed: 2000
Tweets parsed: 3000
Tweets parsed: 4000
Tweets parsed: 5000
Tweets parsed: 6000
Tweets parsed: 7000
Tweets parsed: 8000
Tweets parsed: 9000
Tweets parsed: 10000


In [21]:
# keep words that appear more than 25 times
wordstoinclude = set()
wordcount = 0

for aword in uniquewords:
    if uniquewords[aword] > 25:
        wordcount += 1
        wordstoinclude.add(aword)
    
print(wordcount)

687


In [22]:
edgelist = open('lightbeer.semantic.full.gephi.csv', 'w', newline='')  # newline='' as the csv docs recommend
csvwriter = csv.writer(edgelist)
header = ['Source', 'Target', 'Type'] #undirected network
csvwriter.writerow(header)

print("Writing Edge List")

uniquewords = {}
count = 0

for fn in os.listdir(TMPDIR):
    fn = os.path.join(TMPDIR, fn)
    with open(fn) as f:
        tweetjson = json.load(f)
        count += 1
        if count % 1000 == 0:
            print("Tweets parsed: %s" % count)
            
        text = tweetjson['text']
        nourlstext = removeURL(text) #remove url
        tokenizedtext = tokenize(nourlstext) #separate text by each word
        nostopwordstext = stopWords(tokenizedtext) #remove irrelevant words
        lemmatizedtext = lemmatizer(nostopwordstext) #root words
        nopuncttext = removePunctuation(lemmatizedtext) #remove punctuation
        
        goodwords = []
        for aword in nopuncttext:
            if aword in wordstoinclude:
                goodwords.append(aword.replace(',',''))
        
        allcombos = itertools.combinations(goodwords, 2)
        for acombo in allcombos:
            row = []
            for anode in acombo:
                row.append(anode)
            row.append('Undirected')
            csvwriter.writerow(row)
            
edgelist.close()

Writing Edge List
Tweets parsed: 1000
Tweets parsed: 2000
Tweets parsed: 3000
Tweets parsed: 4000
Tweets parsed: 5000
Tweets parsed: 6000
Tweets parsed: 7000
Tweets parsed: 8000
Tweets parsed: 9000
Tweets parsed: 10000
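
The cell above writes one row per word pair per tweet, so a pair that co-occurs in many tweets appears as many repeated rows. An alternative, sketched below under the assumption that each tweet has already been reduced to a list of kept words, is to aggregate the pairs into weights first. `cooccurrence_weights` and the sample data are hypothetical; deduplicating words within a tweet is a design choice of this sketch, not something the notebook does.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(token_lists):
    """Count how many tweets each unordered word pair appears in."""
    weights = Counter()
    for tokens in token_lists:
        # sorted(set(...)) removes repeats within a tweet (so no self-pairs)
        # and gives each pair a single canonical ordering
        unique = sorted(set(tokens))
        weights.update(combinations(unique, 2))
    return weights


# Hypothetical cleaned tweets
sample = [['cold', 'beer', 'beer', 'game'], ['beer', 'cold']]
weights = cooccurrence_weights(sample)

for (source, target), weight in sorted(weights.items()):
    print(source, target, weight)
```

Each pair could then be written as a single row with a `Weight` column, which should keep the edge list much smaller than one row per occurrence.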


<img src="files/Semantic Full.jpg">

#### Top Words Per Brand

*Coors Light*: another, hard, use, thought, ad, im, commercial, syrup, put, remember, ve, bowl

*Bud Light*: white, always, better, think, say, good, right, make, last, cold, fan, college

*Miller Lite*: anheuserbusch, summer, millercoors, away, way, mean, orange, top

Unfortunately, this basic semantic network isn't very intuitive. At a glance, *Coors Light* is commercialized, *Bud Light* is good for college fans, and *Miller Lite* is a summer beer. If that is how each brand wants to be perceived, then our work here is done. However, each light beer brand is competing to be the tastiest for all occasions. Adding filters to the Tweets can create more intuitive networks.

### Sentiment Analysis
Using the TextBlob package we can assign each Tweet a sentiment score. Modeling semantic networks for both positively and negatively scored words can provide insights on how each brand is perceived.
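
TextBlob reports polarity as a number between -1.0 and 1.0. The split into positive and negative Tweets can be sketched as a small bucketing function. `polarity_bucket` and its `neutral_band` parameter are hypothetical; with the default of `0.0` the buckets correspond to the `score > 0` and `score < 0` tests used in the cells below, while a wider band would treat weakly scored Tweets as neutral.

```python
def polarity_bucket(polarity, neutral_band=0.0):
    """Label a polarity score in [-1.0, 1.0] as positive, negative, or neutral."""
    if polarity > neutral_band:
        return 'positive'
    if polarity < -neutral_band:
        return 'negative'
    return 'neutral'


# Hypothetical scores; in the notebook they come from TextBlob(text).sentiment.polarity
for score in (0.5, 0.0, -0.3):
    print(score, polarity_bucket(score))
```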

In [23]:
# positive sentiment
positivewords = {}
count = 0

for fn in os.listdir(TMPDIR):
    fn = os.path.join(TMPDIR, fn)
    with open(fn) as f:
        tweetjson = json.load(f)

        text = tweetjson['text']
        
        sentimentscore = TextBlob(text)
        score = sentimentscore.sentiment.polarity
        
        if score > 0:
        
            nourlstext = removeURL(text) #remove url
            tokenizedtext = tokenize(nourlstext) #separate text by each word
            nostopwordstext = stopWords(tokenizedtext) #remove irrelevant words
            lemmatizedtext = lemmatizer(nostopwordstext) #root words
            nopuncttext = removePunctuation(lemmatizedtext) #remove punctuation

            for aword in nopuncttext:
                if aword in positivewords:
                    positivewords[aword] += 1
                else:
                    positivewords[aword] = 1

In [24]:
positivewordstoinclude = set()
wordcount = 0

for aword in positivewords:
    if positivewords[aword] > 25:
        wordcount += 1
        positivewordstoinclude.add(aword)
    
print(wordcount)

500


In [25]:
edgelist = open('lightbeer.semantic.positive.gephi.csv', 'w', newline='')  # newline='' as the csv docs recommend
csvwriter = csv.writer(edgelist)
header = ['Source', 'Target', 'Type'] #undirected network
csvwriter.writerow(header)

print("Writing Edge List")

positivewords = {}
count = 0

for fn in os.listdir(TMPDIR):
    fn = os.path.join(TMPDIR, fn)
    with open(fn) as f:
        tweetjson = json.load(f)
            
        text = tweetjson['text']
        
        sentimentscore = TextBlob(text)
        score = sentimentscore.sentiment.polarity
        
        if score > 0:
    
            nourlstext = removeURL(text) #remove url
            tokenizedtext = tokenize(nourlstext) #separate text by each word
            nostopwordstext = stopWords(tokenizedtext) #remove irrelevant words
            lemmatizedtext = lemmatizer(nostopwordstext) #root words
            nopuncttext = removePunctuation(lemmatizedtext) #remove punctuation

            goodwords = []
            for aword in nopuncttext:
                if aword in positivewordstoinclude:
                    goodwords.append(aword.replace(',',''))

            allcombos = itertools.combinations(goodwords, 2)
            for acombo in allcombos:
                row = []
                for anode in acombo:
                    row.append(anode)
                row.append('Undirected')
                csvwriter.writerow(row)
            
edgelist.close()

Writing Edge List


<img src="files/Semantic Positive.jpg">

#### Top Positive Words Per Brand

*Coors Light*: bowl, think, look, said, even, might, anyone, brand, someone, corn, commercial

*Bud Light*: new, right, love, first, super, going, guy, everyone, lime, good, time, game

*Miller Lite*: thank, sale, nice, lot, tap, ipa, sound, craft

Each of these nodes stems from what TextBlob considers to be positive Tweets. *Coors Light* and *Bud Light* are close together and compete for some of the words. Both seem to have received attention from Super Bowl ads, but Coors is still more commercialized. On the other hand, Bud appeals to everyone.

*Miller Lite* stands away from the competition at the bottom of the network. It's perceived as a nicer beer than Coors and Bud. Miller should reinforce this perception and market to drinkers who want a light beer that isn't as low-end as *Coors Light* or *Bud Light*.

In [26]:
# negative sentiment
negativewords = {}
count = 0

for fn in os.listdir(TMPDIR):
    fn = os.path.join(TMPDIR, fn)
    with open(fn) as f:
        tweetjson = json.load(f)
        
        text = tweetjson['text']
        
        sentimentscore = TextBlob(text)
        score = sentimentscore.sentiment.polarity
        
        if score < 0:
        
            nourlstext = removeURL(text) #remove url
            tokenizedtext = tokenize(nourlstext) #separate text by each word
            nostopwordstext = stopWords(tokenizedtext) #remove irrelevant words
            lemmatizedtext = lemmatizer(nostopwordstext) #root words
            nopuncttext = removePunctuation(lemmatizedtext) #remove punctuation

            for aword in nopuncttext:
                if aword in negativewords:
                    negativewords[aword] += 1
                else:
                    negativewords[aword] = 1

In [27]:
negativewordstoinclude = set()
wordcount = 0

for aword in negativewords:
    if negativewords[aword] > 25:
        wordcount += 1
        negativewordstoinclude.add(aword)
    
print(wordcount)

91


In [28]:
edgelist = open('lightbeer.semantic.negative.gephi.csv', 'w', newline='')  # newline='' as the csv docs recommend
csvwriter = csv.writer(edgelist)
header = ['Source', 'Target', 'Type'] #undirected network
csvwriter.writerow(header)

print("Writing Edge List")

negativewords = {}
count = 0

for fn in os.listdir(TMPDIR):
    fn = os.path.join(TMPDIR, fn)
    with open(fn) as f:
        tweetjson = json.load(f)
            
        text = tweetjson['text']
        
        sentimentscore = TextBlob(text)
        score = sentimentscore.sentiment.polarity
        
        if score < 0:
    
            nourlstext = removeURL(text) #remove url
            tokenizedtext = tokenize(nourlstext) #separate text by each word
            nostopwordstext = stopWords(tokenizedtext) #remove irrelevant words
            lemmatizedtext = lemmatizer(nostopwordstext) #root words
            nopuncttext = removePunctuation(lemmatizedtext) #remove punctuation

            goodwords = []
            for aword in nopuncttext:
                if aword in negativewordstoinclude:
                    goodwords.append(aword.replace(',',''))

            allcombos = itertools.combinations(goodwords, 2)
            for acombo in allcombos:
                row = []
                for anode in acombo:
                    row.append(anode)
                row.append('Undirected')
                csvwriter.writerow(row)
            
edgelist.close()

Writing Edge List


<img src="files/Semantic Negative.png">

#### Top Negative Words Per Brand

*Coors*: get, shit, know, need, right, crack, drunk

*Bud*: light, come, night, got, game

*Miller*: drink, lite, amp, cold

TextBlob rated only a handful of Tweets as negative, so it's difficult to truly gain any insights from this network. Additionally, the full Twitter handles displayed in previous networks were filtered out. What remains are the shortened names for each beer brand. *Coors* seems to get the worst of the negative words, but it's difficult to tell what users negatively think about *Bud* and *Miller*.

A better method of modeling negative sentiment is shown below.

### Specific Words
Light beers are often considered low-quality and are said to taste like urine. To find out which light beer tastes the most like pee, we can create a list of specific words to be modeled in our network.
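
A plain substring test on the raw Tweet text is case-sensitive and also matches inside longer words (for example, "pee" inside "speed"). One way to tighten the match is a case-insensitive, whole-word regular expression, sketched below. `build_matcher` is a hypothetical helper and not the matching logic used to produce the results in this section.

```python
import re


def build_matcher(words):
    """Compile a case-insensitive pattern that matches any word as a whole word."""
    alternatives = '|'.join(re.escape(word) for word in words)
    return re.compile(r'\b(?:' + alternatives + r')\b', re.IGNORECASE)


matcher = build_matcher(['pee', 'piss', 'urine'])

print(bool(matcher.search('Tastes like PISS water')))  # True
print(bool(matcher.search('Need for speed')))          # False
```

The trade-off is that whole-word matching also skips inflected forms such as "pissed", so those would need to be added to the word list explicitly if they are wanted.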

In [29]:
# list of words to match
specificwords = ['pee', 'piss', 'urine']

uniquewords = {}
count = 0

for fn in os.listdir(TMPDIR):
    fn = os.path.join(TMPDIR, fn)
    with open(fn) as f:
        tweetjson = json.load(f)
            
        text = tweetjson['text']
        
        # matching specific words list
        match = 0
        for word in specificwords:
            if word in text:
                match += 1

        if match > 0:
        
            nourlstext = removeURL(text) #remove url
            tokenizedtext = tokenize(nourlstext) #separate text by each word
            nostopwordstext = stopWords(tokenizedtext) #remove irrelevant words
            lemmatizedtext = lemmatizer(nostopwordstext) #root words
            nopuncttext = removePunctuation(lemmatizedtext) #remove punctuation

            for aword in nopuncttext:
                if aword in uniquewords:
                    uniquewords[aword] += 1
                else:
                    uniquewords[aword] = 1

In [30]:
wordstoinclude = set()
wordcount = 0

for aword in uniquewords:
    if uniquewords[aword] > 1:
        wordcount += 1
        wordstoinclude.add(aword)
    
print(wordcount)

128


In [31]:
edgelist = open('lightbeer.semantic.piss.gephi.csv', 'w', newline='')  # newline='' as the csv docs recommend
csvwriter = csv.writer(edgelist)
header = ['Source', 'Target', 'Type'] #undirected network
csvwriter.writerow(header)

print("Writing Edge List")

uniquewords = {}
count = 0

for fn in os.listdir(TMPDIR):
    fn = os.path.join(TMPDIR, fn)
    with open(fn) as f:
        tweetjson = json.load(f)
            
        text = tweetjson['text']
        nourlstext = removeURL(text) #remove url
        tokenizedtext = tokenize(nourlstext) #separate text by each word
        nostopwordstext = stopWords(tokenizedtext) #remove irrelevant words
        lemmatizedtext = lemmatizer(nostopwordstext) #root words
        nopuncttext = removePunctuation(lemmatizedtext) #remove punctuation
        
        goodwords = []
        for aword in nopuncttext:
            if aword in wordstoinclude:
                goodwords.append(aword.replace(',',''))
        
        allcombos = itertools.combinations(goodwords, 2)
        for acombo in allcombos:
            row = []
            for anode in acombo:
                row.append(anode)
            row.append('Undirected')
            csvwriter.writerow(row)
            
edgelist.close()

Writing Edge List


<img src="files/Semantic Piss.jpg">

#### Top Urinary Words Per Brand

*Coors*: taste, amp, water, horse, natty, pbr, weekend, garbage

*Bud*: piss, drinking, see, get, well, bar, cold

*Miller*: make, would, people, bottle, good

*Coors* takes home the trophy for being the worst light beer. Although "piss" is slightly closer to *Bud*, *Coors* is being compared to other low quality beers and is closest to "pee". Again, *Miller* separates itself from its competitors by being thought of as higher quality. *Coors* needs to work to improve its image, while *Miller* should market itself as a premium light beer.

Unsurprisingly, the bridging words between the three brands are "drink", "light", "lite", and "beer". This leads to no market insights, but it shows there is a fairly even distribution of Tweets amongst the three brands.