### First Analysis of Cleaned Data
This analysis includes frequency lists from individual candidates, collocation analysis, and KWIC concordance analysis.
The purpose of the analysis is to get an initial view of the difference between tweets about male vs female politicians.

Import the necessary modules, create the sentiment intensity analyzer, build the list of stopwords, and name the tweet tokenizer "tt"

In [1]:
import os
import json
from nltk.tokenize import TweetTokenizer, WordPunctTokenizer
import datetime
from collections import Counter
import string
from nltk.corpus import stopwords
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import random

In [2]:
sid = SentimentIntensityAnalyzer()

In [3]:
stoplist=stopwords.raw('english').split('\n')

In [4]:
tt = TweetTokenizer(preserve_case=False)

In [5]:
Biden_tweets_nrt = json.load(open('move_data/corpus_data1.json'))

Klob_tweets_nrt = json.load(open('move_data/corpus_data2.json'))

Sanders_tweets_nrt = json.load(open('move_data/corpus_data3.json'))

Warren_tweets_nrt = json.load(open('move_data/corpus_data4.json'))

Load the helper functions by running the functions notebook

In [6]:
%run functions.ipynb

### Analysis of Tweets: Frequency lists
#### For all of the tweets by candidate
* Create counters for the frequency lists
* Loop over the tweets and tokenize the text of each one with the tweet tokenizer
    * Remove punctuation from the tokens
    * Update the unigram counter with the remaining tokens
    * Build bigrams and trigrams from the tokens and update their counters
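
The steps above can be sketched as a small standalone example. This is a minimal sketch, not the notebook's actual helper: `ngrams_sketch` is a hypothetical stand-in for `get_ngram_tokens` (defined in `functions.ipynb` and not shown here), plain `str.split()` stands in for the tweet tokenizer, and the tweets are invented. It filters punctuation with a set, because `t not in string.punctuation + '’'` is a substring test against one long string and would also drop multi-character tokens such as `()` that happen to be substrings of it.

```python
import string
from collections import Counter

# Set membership tests whole tokens; the curly apostrophe matches the notebook's filter.
PUNCT = set(string.punctuation) | {'’'}

def ngrams_sketch(tokens, n):
    """Return space-joined n-grams; hypothetical stand-in for get_ngram_tokens."""
    return [' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy tweets, invented for illustration only.
tweets = [{'text': 'i will vote for him'}, {'text': 'please vote for him !'}]

unigram_freq, bigram_freq, trigram_freq = Counter(), Counter(), Counter()
for tweet in tweets:
    toks = tweet['text'].lower().split()          # the notebook uses tt.tokenize here
    toks_np = [t for t in toks if t not in PUNCT]  # drop punctuation-only tokens
    unigram_freq.update(toks_np)
    bigram_freq.update(ngrams_sketch(toks_np, 2))
    trigram_freq.update(ngrams_sketch(toks_np, 3))

print(bigram_freq.most_common(2))
print(trigram_freq.most_common(1))
```

This sketch returns no n-grams for a tweet shorter than `n` tokens. The trigram outputs in this notebook contain single-word entries such as 'biden', which suggests the real helper handles short tweets differently.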

In [7]:
Biden_freq = Counter()
Biden_bigram_freq= Counter()
Biden_trigram_freq= Counter()

for tweet in Biden_tweets_nrt:
    Biden_toks = tt.tokenize(tweet['text'])
    toks_np = [t for t in Biden_toks if t not in string.punctuation + '’']
    Biden_freq.update(toks_np)
    Biden_bigrams = get_ngram_tokens(toks_np, n=2)
    Biden_bigram_freq.update(Biden_bigrams)
    Biden_trigrams = get_ngram_tokens(toks_np, n=3)
    Biden_trigram_freq.update(Biden_trigrams)

In [8]:
Biden_freq.most_common(30)

[('the', 6393),
 ('to', 4780),
 ('a', 3615),
 ('and', 3474),
 ('you', 3118),
 ('is', 3069),
 ('i', 2806),
 ('of', 2693),
 ('biden', 2574),
 ('in', 2445),
 ('for', 2415),
 ('that', 2090),
 ('it', 1749),
 ('he', 1713),
 ('not', 1405),
 ('this', 1327),
 ('are', 1304),
 ('s', 1198),
 ('be', 1182),
 ('on', 1133),
 ('bernie', 1112),
 ('have', 1049),
 ('with', 1020),
 ('@joebiden', 995),
 ('trump', 963),
 ('t', 959),
 ('but', 954),
 ('if', 947),
 ('will', 945),
 ('we', 943)]

In [9]:
Biden_bigram_freq.most_common(30)

[('in the', 494),
 ('of the', 451),
 ('joe biden', 417),
 ('is a', 309),
 ('don t', 304),
 ('it s', 290),
 ('to be', 289),
 ('vote for', 286),
 ('for the', 268),
 ('going to', 256),
 ('is the', 254),
 ('this is', 244),
 ('i m', 236),
 ('to the', 232),
 ('you are', 222),
 ('if you', 212),
 ('he is', 207),
 ('biden is', 205),
 ('on the', 203),
 ('he s', 184),
 ('need to', 164),
 ('i think', 159),
 ('that s', 158),
 ('and the', 154),
 ('is not', 152),
 ('will be', 144),
 ('have to', 143),
 ('out of', 143),
 ('have a', 141),
 ('want to', 138)]

In [10]:
Biden_trigram_freq.most_common(30)

[('i don t', 99),
 ('to vote for', 79),
 ('a lot of', 65),
 ('is going to', 58),
 ('i m not', 43),
 ('joe biden is', 42),
 ('you don t', 40),
 ('out of the', 39),
 ('it s not', 39),
 ('vote for him', 35),
 ('don t think', 34),
 ('do you think', 33),
 ('is not a', 31),
 ('this is a', 31),
 ('we need to', 31),
 ('vote for biden', 31),
 ('you need to', 30),
 ('it s a', 30),
 ('you want to', 30),
 ('for joe biden', 30),
 ('biden is a', 29),
 ("i don't think", 29),
 ('going to be', 29),
 ('biden is the', 29),
 ('you have to', 28),
 ('this is the', 28),
 ('biden', 28),
 ('the democratic party', 27),
 ('will never be', 27),
 ('yes', 27)]

These frequency lists are revealing. Biden's unigram list includes 'i', 'bernie', and 'trump'. The frequency of 'i' shows that people tweeting about Biden often refer to themselves and their personal opinions. Other politicians are also common in these tweets because, during this period, Biden was becoming more viable as a candidate and was being compared with his biggest Democratic primary competitor and his potential future opponent. Two bigrams that stood out were 'vote for' and 'going to', which show people tweeting their support and saying what they think they, or the candidate, will do. The trigrams that stood out to me were 'the democratic party' and the many trigrams containing 'don t'. Joe Biden is aligned with the Democratic Party, which people feel both positively and negatively about, so it makes sense that it is a common thing to tweet about. 'don t' is interesting because it appears in 'i don t', 'you don t', "i don't think", and 'don t think', so it seems common to talk about Biden in negative terms.

In [11]:
Klob_freq = Counter()
Klob_bigram_freq= Counter()
Klob_trigram_freq= Counter()

for tweet in Klob_tweets_nrt:  
    toks = tt.tokenize(tweet['text'])
    toks_np = [t for t in toks if t not in string.punctuation + '’']
    Klob_freq.update(toks_np)
    Klob_bigrams = get_ngram_tokens(toks_np, n=2)
    Klob_bigram_freq.update(Klob_bigrams)
    Klob_trigrams = get_ngram_tokens(toks_np, n=3)
    Klob_trigram_freq.update(Klob_trigrams)

In [12]:
Klob_freq.most_common(30)

[('the', 7123),
 ('to', 5428),
 ('and', 4755),
 ('klobuchar', 4417),
 ('a', 3992),
 ('i', 3487),
 ('is', 3226),
 ('of', 3113),
 ('in', 3086),
 ('you', 2898),
 ('for', 2758),
 ('that', 2038),
 ('it', 1848),
 ('biden', 1783),
 ('not', 1577),
 ('warren', 1571),
 ('@amyklobuchar', 1557),
 ('be', 1556),
 ('s', 1422),
 ('but', 1403),
 ('amy', 1378),
 ('are', 1313),
 ('this', 1256),
 ('she', 1255),
 ('on', 1244),
 ('he', 1190),
 ('bernie', 1186),
 ('buttigieg', 1184),
 ('have', 1166),
 ('out', 1163)]

In [13]:
Klob_bigram_freq.most_common(30)

[('amy klobuchar', 879),
 ('in the', 632),
 ('of the', 609),
 ('and klobuchar', 432),
 ('to be', 409),
 ('klobuchar and', 376),
 ('drop out', 369),
 ('vote for', 353),
 ('is a', 322),
 ('it s', 318),
 ('i think', 300),
 ('for the', 292),
 ('don t', 291),
 ('need to', 259),
 ('i m', 257),
 ('this is', 245),
 ('is the', 241),
 ('would be', 241),
 ('going to', 228),
 ('to the', 222),
 ('on the', 219),
 ('klobuchar is', 216),
 ('to win', 205),
 ('to get', 203),
 ('buttigieg and', 202),
 ('out of', 197),
 ('you are', 188),
 ('if you', 184),
 ('warren and', 183),
 ('but i', 176)]

In [14]:
Klob_trigram_freq.most_common(30)

[('buttigieg and klobuchar', 159),
 ('to drop out', 148),
 ('warren and klobuchar', 105),
 ('klobuchar', 97),
 ('i don t', 92),
 ('klobuchar and buttigieg', 87),
 ('out of the', 84),
 ('to vote for', 81),
 ('should not be', 81),
 ('official language of', 78),
 ('a lot of', 75),
 ('english should not', 74),
 ('says english should', 70),
 ('klobuchar says english', 66),
 ('amy klobuchar says', 65),
 ('in the race', 61),
 ('amy klobuchar is', 61),
 ('be official language', 58),
 ('of the race', 56),
 ('of u s', 56),
 ('language of u', 55),
 ('not be official', 53),
 ('one of the', 52),
 ('and amy klobuchar', 52),
 ('going to be', 51),
 ('the white house', 50),
 ('we need to', 49),
 ('would be a', 47),
 ('flip-flop amy klobuchar', 47),
 ('klobuchar and warren', 46)]

Klobuchar's unigrams include 'biden', 'bernie', 'warren', and 'buttigieg', which makes it evident that Klobuchar is often compared with the other politicians in the Democratic primary. The bigrams 'buttigieg and' and 'warren and' reinforce this. The bigram 'drop out' is also very common, as people were tweeting about Klobuchar dropping out of the race during this period. Many trigrams relate to Klobuchar saying that English should not be the official language of the US, a controversial news story from this period. They include 'official language of', 'english should not', 'says english should', and 'klobuchar says english', and they show how much one news story can dominate tweets. 'flip-flop amy klobuchar' is also an interesting trigram, as it shows a common criticism of Klobuchar.

In [15]:
Sanders_freq= Counter()
Sanders_bigram_freq= Counter()
Sanders_trigram_freq= Counter()

for tweet in Sanders_tweets_nrt:
    toks = tt.tokenize(tweet['text'])
    toks_np = [t for t in toks if t not in string.punctuation + '’']
    Sanders_freq.update(toks_np)
    Sanders_bigrams = get_ngram_tokens(toks_np, n=2)
    Sanders_bigram_freq.update(Sanders_bigrams)
    Sanders_trigrams = get_ngram_tokens(toks_np, n=3)
    Sanders_trigram_freq.update(Sanders_trigrams)

In [16]:
Sanders_freq.most_common(30)

[('the', 7118),
 ('to', 5135),
 ('bernie', 4101),
 ('a', 3844),
 ('and', 3577),
 ('is', 3385),
 ('you', 3308),
 ('i', 3055),
 ('of', 2894),
 ('for', 2670),
 ('in', 2458),
 ('that', 2339),
 ('it', 1924),
 ('he', 1623),
 ('are', 1585),
 ('not', 1533),
 ('this', 1338),
 ('be', 1231),
 ('s', 1199),
 ('have', 1166),
 ('but', 1123),
 ('on', 1101),
 ('will', 1098),
 ('t', 1069),
 ('we', 1067),
 ('they', 1062),
 ('if', 1051),
 ('trump', 1020),
 ('with', 997),
 ('people', 887)]

In [17]:
Sanders_bigram_freq.most_common(30)

[('in the', 549),
 ('of the', 487),
 ('bernie sanders', 382),
 ('bernie is', 374),
 ('vote for', 368),
 ('is a', 355),
 ('don t', 322),
 ('is the', 317),
 ('it s', 316),
 ('to be', 316),
 ('for bernie', 307),
 ('for the', 281),
 ('going to', 272),
 ('i m', 265),
 ('this is', 243),
 ('if you', 239),
 ('to the', 228),
 ('on the', 225),
 ('you are', 218),
 ('that s', 187),
 ('i think', 180),
 ('will be', 177),
 ('the same', 175),
 ('have to', 170),
 ('he s', 170),
 ('he is', 168),
 ('want to', 165),
 ('need to', 162),
 ('is not', 161),
 ('to vote', 160)]

In [18]:
Sanders_trigram_freq.most_common(30)

[('vote for bernie', 107),
 ('i don t', 98),
 ('to vote for', 92),
 ('a lot of', 81),
 ('bernie is the', 65),
 ('bernie is a', 65),
 ('i m not', 62),
 ('is going to', 52),
 ('bernie sanders is', 49),
 ('you don t', 46),
 ('vote for him', 45),
 ('you want to', 42),
 ('we need to', 41),
 ('medicare for all', 40),
 ('the democratic party', 39),
 ('thank you for', 39),
 ('it s not', 38),
 ('this is a', 38),
 ('bernie', 38),
 ('is the only', 37),
 ('is the nominee', 36),
 ('be able to', 35),
 ('is not a', 34),
 ('@berniesanders', 34),
 ('are going to', 34),
 ('do you think', 31),
 ('one of the', 31),
 ('the fact that', 31),
 ('he is a', 30),
 ('will vote for', 30)]

In the Sanders unigrams, the only other politician mentioned is 'trump', as Sanders was seen as a front runner during this time and potentially Trump's biggest competition. 'people' is also common, which reflects Bernie's message of standing for the common American. The bigram 'the same' shows that people are drawing similarities between Bernie and, probably, other politicians. The trigrams 'medicare for all' and 'is the nominee' show that tweets about Bernie discuss his policies and his future in the race.

In [19]:
Warren_freq = Counter()
Warren_bigram_freq= Counter()
Warren_trigram_freq= Counter()

for tweet in Warren_tweets_nrt:
    toks = tt.tokenize(tweet['text'])
    toks_np = [t for t in toks if t not in string.punctuation + '’']
    Warren_freq.update(toks_np)
    Warren_bigrams = get_ngram_tokens(toks_np, n=2)
    Warren_bigram_freq.update(Warren_bigrams)
    Warren_trigrams = get_ngram_tokens(toks_np, n=3)
    Warren_trigram_freq.update(Warren_trigrams)

In [20]:
Warren_freq.most_common(30)

[('the', 7073),
 ('to', 5635),
 ('warren', 5126),
 ('and', 4531),
 ('a', 4445),
 ('i', 4385),
 ('is', 3593),
 ('for', 3068),
 ('of', 3022),
 ('in', 2886),
 ('you', 2778),
 ('that', 2302),
 ('it', 1921),
 ('bernie', 1729),
 ('she', 1724),
 ('not', 1543),
 ('but', 1491),
 ('her', 1455),
 ('s', 1393),
 ('this', 1371),
 ('on', 1343),
 ('be', 1333),
 ('are', 1307),
 ('@ewarren', 1266),
 ('with', 1224),
 ('have', 1216),
 ('if', 1129),
 ('he', 1100),
 ('as', 1069),
 ('t', 1003)]

In [21]:
Warren_bigram_freq.most_common(30)

[('in the', 667),
 ('elizabeth warren', 616),
 ('of the', 496),
 ('warren is', 422),
 ('i m', 367),
 ('is a', 357),
 ('vote for', 357),
 ('to be', 341),
 ('is the', 315),
 ('don t', 312),
 ('this is', 303),
 ('it s', 298),
 ('i think', 272),
 ('for the', 266),
 ('warren and', 263),
 ('on the', 256),
 ('for warren', 242),
 ('to the', 239),
 ('going to', 236),
 ('but i', 233),
 ('would be', 223),
 ('and warren', 217),
 ('and i', 210),
 ('she s', 201),
 ('i am', 197),
 ('drop out', 190),
 ('she is', 184),
 ('that s', 179),
 ('you are', 175),
 ('need to', 173)]

In [22]:
Warren_trigram_freq.most_common(30)

[('i don t', 121),
 ('to vote for', 93),
 ('a lot of', 81),
 ('in the race', 78),
 ('warren', 69),
 ('elizabeth warren is', 67),
 ('to drop out', 58),
 ('i voted for', 57),
 ('a warren supporter', 56),
 ('warren is the', 53),
 ('out of the', 52),
 ('for elizabeth warren', 50),
 ('is going to', 49),
 ('i like warren', 48),
 ('warren is a', 48),
 ('one of the', 46),
 ('at this point', 44),
 ('vote for warren', 43),
 ('i m not', 42),
 ('we need to', 41),
 ('vote for her', 40),
 ('this is a', 39),
 ('this is the', 39),
 ('thank you for', 39),
 ('warren and sanders', 37),
 ('in the primary', 37),
 ('it would be', 36),
 ('will vote for', 36),
 ('to be a', 35),
 ('don t think', 35)]

The other politician mentioned in Warren's unigrams is 'bernie', which makes sense, as she is often compared with the other progressive in the race, Bernie Sanders. Two bigrams of particular interest are 'need to' and 'drop out'. 'need to' is interesting because it shows what people on Twitter believe is necessary when it comes to Warren. 'drop out' reflects the fact that Warren dropped out of the race at the end of the time period. The trigram 'out of the' suggests that Warren is often being compared with the rest of the field, and 'at this point' probably refers to what people on Twitter believe Warren should do 'in the race', which is also a popular trigram.

#### Thoughts on the frequency lists overall
Based on the frequency lists, the female candidates' lists have noticeably more mentions of the other candidates, which makes it seem that the female candidates are being compared with their competitors more than the male candidates are. 'i' is also common for all the candidates, showing that people are sharing their own personal opinions of the candidates. The male candidates also have 'trump' in their most common unigrams, which may show that people see the male candidates as more formidable opponents to Trump and therefore compare them with him more, or that people are showing their support for Trump instead of the male candidates. People also seem to be predicting the future of the campaigns, which was completely up in the air at this point, as the bigram 'going to' is common for all the candidates. People also use Twitter to express who they support, as the bigram 'vote for' and the trigram 'to vote for' are common for all the candidates.
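
One way to check the claim about rival mentions is to count how many other candidates' names appear in each top-30 unigram list. The sketch below is illustrative only: `rivals_in_top` is a hypothetical helper, the name list is my own choice, and the counters hold just a handful of entries copied from the lists above rather than the full `Counter` objects.

```python
from collections import Counter

# Name tokens to look for; this list is an assumption and can be extended.
CANDIDATE_NAMES = {'biden', 'bernie', 'warren', 'klobuchar', 'buttigieg', 'trump'}

def rivals_in_top(freq, own_names, top_n=30):
    """Return candidate names, other than the corpus's own, among the top_n unigrams."""
    top_words = [word for word, _ in freq.most_common(top_n)]
    return [w for w in top_words if w in CANDIDATE_NAMES and w not in own_names]

# Partial counters built from counts reported in the lists above.
klob_freq = Counter({'the': 7123, 'klobuchar': 4417, 'biden': 1783,
                     'warren': 1571, 'bernie': 1186, 'buttigieg': 1184})
sanders_freq = Counter({'the': 7118, 'bernie': 4101, 'trump': 1020, 'people': 887})

print(rivals_in_top(klob_freq, own_names={'klobuchar', 'amy'}))
print(rivals_in_top(sanders_freq, own_names={'bernie', 'sanders'}))
```

With the real data, the same function could be called on `Biden_freq`, `Klob_freq`, `Sanders_freq`, and `Warren_freq` to compare the four lists side by side.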

#### Calculating keyness
I am using the calculate_keyness function to compare the words in the corpus of tweets about one candidate with the words in the tweets about the other candidates.
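
For readers unfamiliar with keyness, the sketch below shows one common way to score it: the log-likelihood (G2) statistic, which compares a word's observed counts in two corpora with the counts expected if the word were equally likely in both. Whether `calculate_keyness` in `functions.ipynb` uses exactly this formula is an assumption on my part; `keyness_sketch` and the toy counts are hypothetical.

```python
import math
from collections import Counter

def keyness_sketch(freq_a, freq_b, top_n=5):
    """Rank words overrepresented in corpus A by log-likelihood (G2)."""
    size_a, size_b = sum(freq_a.values()), sum(freq_b.values())
    results = []
    for word, a in freq_a.items():
        b = freq_b.get(word, 0)
        # Expected counts if the word were spread across corpora by size alone.
        expected_a = size_a * (a + b) / (size_a + size_b)
        expected_b = size_b * (a + b) / (size_a + size_b)
        g2 = 2 * a * math.log(a / expected_a)
        if b > 0:
            g2 += 2 * b * math.log(b / expected_b)
        # Keep only words that are relatively more frequent in corpus A.
        if a / size_a > b / size_b:
            results.append((word, a, b, round(g2, 3)))
    return sorted(results, key=lambda row: row[3], reverse=True)[:top_n]

# Toy counts, invented for illustration only.
corpus_a = Counter({'hunter': 8, 'the': 60, 'vote': 12})
corpus_b = Counter({'hunter': 1, 'the': 190, 'vote': 39})

for word, a, b, score in keyness_sketch(corpus_a, corpus_b):
    print(f'{word:<25}{a:<10}{b:<10}{score}')
```

In the cells below, the comparison corpus is built with `Counter` addition (for example `Klob_freq + Sanders_freq + Warren_freq`), which sums the counts of each word across the three counters.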

In [23]:
calculate_keyness(Biden_freq, Klob_freq + Sanders_freq + Warren_freq)

WORD                     Corpus A Freq.  Corpus B Freq.  Keyness
biden                    2574      3227      1338.018
@joebiden                995       659       1071.364
joe                      888       594       948.398
obama                    269       255       204.332
hunter                   80        5         202.323
he                       1713      3913      183.319
his                      868       1891      113.785
biden's                  110       84        104.605
#biden2020               58        19        96.098
ad                       88        72        78.298
ukraine                  48        19        72.516
you                      3118      8984      66.134
son                      69        53        65.258
south                    132       174       63.187
gun                      56        35        63.144
#biden                   68        68        48.467
guns                     42        26        47.709
war                      69        75        

Two categories of words that stand out to me as distinctive for Biden are words related to his current and past policy positions and words related to personal issues. Words connected to Biden's policy positions are 'gun', 'guns', 'war', 'children', and 'cages', which are associated with him more than with the other Democrats in the race. Words about Biden's family are 'hunter' and 'ukraine', which relate to criticism of his son's dealings in Ukraine. 'creepy' and 'dementia' reflect some of the most common personal criticisms of the politician. 'south' and 'carolina' are important to Biden at this time in the race because South Carolina is where he stood out against the other moderate candidates.

In [24]:
calculate_keyness(Klob_freq, Biden_freq + Sanders_freq + Warren_freq)

WORD                     Corpus A Freq.  Corpus B Freq.  Keyness
klobuchar                4417      381       9127.046
@amyklobuchar            1557      248       2782.359
amy                      1378      208       2501.373
buttigieg                1184      425       1505.740
steyer                   568       200       730.296
bloomberg                871       867       414.851
@petebuttigieg           497       329       396.287
language                 191       20        379.684
english                  170       12        363.571
minnesota                140       13        285.254
drop                     412       333       262.860
gabbard                  208       83        248.238
#klobuchar               117       9         246.720
harris                   292       192       234.402
pete                     569       693       187.578
@tomsteyer               181       102       167.596
official                 112       35        153.529
mn                       93       

Many of the key words for Klobuchar are the names of other politicians: 'buttigieg', 'steyer', 'bloomberg', 'gabbard', 'harris', 'abrams', 'booker', 'beto', and 'gillibrand'. These names are used much more in tweets about Klobuchar than in tweets about the other politicians in the list, which tells me that Klobuchar is being seen in terms of how she relates to others, some of whom are out of the race or were never in the race.

In [25]:
calculate_keyness(Sanders_freq, Klob_freq + Biden_freq + Warren_freq)

WORD                     Corpus A Freq.  Corpus B Freq.  Keyness
bernie                   4101      4027      2736.412
@berniesanders           832       874       507.758
socialism                170       110       174.590
communist                118       74        124.359
bernie's                 166       166       108.200
socialist                177       186       107.969
free                     192       214       107.703
healthcare               156       165       94.310
you                      3308      8794      88.911
people                   887       1944      87.136
#bernie2020              111       97        85.153
he                       1623      4003      81.016
are                      1585      3924      77.052
#bernie                  87        75        67.762
#notmeus                 87        79        63.870
the                      7118      20589     63.845
bros                     85        79        60.706
bro                      80        71        60

Bernie has many policy and ideology words that are used more for him than for others, including 'socialism', 'free', 'communist', 'healthcare', 'establishment', 'college', 'capitalism', 'economy', 'wage', 'billionaires', 'income', '#notmeus', 'venezuela', and 'cuba'. This shows that when people talk about Bernie, they focus on the policies that are specific to him, and that they discuss topics that may be universal to Democrats, like the economy and healthcare, more for him than for others.

In [26]:
calculate_keyness(Warren_freq, Klob_freq + Sanders_freq + Biden_freq)

WORD                     Corpus A Freq.  Corpus B Freq.  Keyness
warren                   5126      2161      6024.477
@ewarren                 1266      659       1279.937
elizabeth                697       203       1006.535
her                      1455      1463      707.178
she                      1724      1918      707.000
#warren2020              125       21        223.015
warren's                 143       39        212.883
i                        4385      9348      185.745
progressive              300       264       176.922
native                   65        10        119.033
liz                      128       79        111.964
@senwarren               124       85        97.959
pac                      90        66        66.197
progressives             81        60        58.921
plan                     163       202       54.231
plans                    120       129       52.264
#warren                  72        54        51.654
she's                    191       260   

Warren's key words include words that she has tried to associate with herself, like 'progressive', 'woman', 'fight', 'plan', and 'policy'. She used these words positively during the campaign, as two of her slogans were "dream big, fight hard" and "she has a plan for that." There are also words that have been applied to her negatively, like 'native' and 'pocahontas', which resulted from others commenting on her claimed heritage.

#### Thoughts on keyness analysis
The words found by the keyness analysis are very interesting. They show a lot about what made each candidate distinct to voters on Twitter, and they show the candidates' messages at work. The keyness scores make it easy to see what people on Twitter most associate with specific candidates.

#### KWIC (Key Word in Context) analysis
I want to see how certain key words are used for the different politicians. These words come from the frequency lists, and most of them appear in more than one politician's list.
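
To make the KWIC step concrete, here is a minimal sketch of how such lines can be built, sampled, and sorted. The names `make_kwic_sketch` and `sort_by_r1` are hypothetical stand-ins for the `make_kwic` and `sort_kwic` helpers from `functions.ipynb`, whose actual signatures and return types are not shown in this notebook. The window of four tokens, the seeded sampler, and the toy token lists are also my assumptions.

```python
import random

def make_kwic_sketch(keyword, tokens, window=4):
    """Return (left context, keyword, right context) for each hit in tokens."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((left, tok, right))
    return hits

def sort_by_r1(kwic_lines):
    """Sort by the first token to the right of the keyword (R1)."""
    return sorted(kwic_lines, key=lambda line: line[2][0] if line[2] else '')

# Toy token lists, invented for illustration only.
tweets = [['well', 'i', 'think', 'so'], ['i', 'am', 'sure'], ['no', 'i', "don't"]]

lines = []
for tokens in tweets:
    lines.extend(make_kwic_sketch('i', tokens))

rng = random.Random(0)        # fixed seed so the sample is repeatable
sample = rng.sample(lines, 2)

for left, kw, right in sort_by_r1(sample):
    print(f"{' '.join(left):>30}  {kw}  {' '.join(right)}")
```

The `random.sample` calls in the cells below are unseeded, so rerunning the notebook draws a different set of 50 lines each time. A fixed seed, as in this sketch, is one way to make the samples reproducible.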

In [27]:
Biden_kwic_I = []

for tweet in Biden_tweets_nrt:
    tokens = tt.tokenize(tweet['text'])
    Biden_kwic_I.extend(make_kwic('i', tokens))

In [28]:
Biden_I_sample = random.sample(Biden_kwic_I, 50)

In [29]:
print_kwic(sort_kwic(Biden_I_sample, order=['R1']))

                         it , probably .  i  ' not sure 
                            we ( you and  i  ) don't need them
                             " so what "  i  am certain that is
                                          i  am prepared to make
                             fun . btw ,  i  am for biden or
               govt running healthcare .  i  believe that you are
                       voices be heard .  i  called my senators multiple
                        be this dense so  i  can only assume this
                                          i  can afford . no
                                          i  can ’ t stop
                                          i  can think of is
                                          i  do think a lot
                   politician here too .  i  don ’ t need
                    bernie doesn't - and  i  don't think those folks
                                          i  don't see how that
                 biden would approve but  i  dont like

In [30]:
Klob_kwic_I = []

for tweet in Klob_tweets_nrt:
    tokens = tt.tokenize(tweet['text'])
    Klob_kwic_I.extend(make_kwic('i', tokens))

In [31]:
Klob_I_sample = random.sample(Klob_kwic_I, 50)

In [32]:
print_kwic(sort_kwic(Klob_I_sample, order=['R1']))

                                          i  am a born-again christian
                                          i  am just a messenger
                                          i  assume this is sarcasm
he did . https://www.politifact.com/factchecks/2017/feb/07/reince-priebus/were-7-nations-identified-donald-trumps-travel-ban/  i  can site my sources
                 wins without cheating .  i  could never vote for
                                          i  don ’ t know
                         america . and ,  i  don ’ t want
                                          i  don ’ t see
                       on the ticket and  i  don't think that combo
                                          i  dont think i've ever
                                          i  encourage you to have
                            oh man , can  i  ever ! ! 
                                          i  fear bernie alienates needed
                    candidate than amy .  i  feel amy should ’
              

In [33]:
Sanders_kwic_I = []

for tweet in Sanders_tweets_nrt:
    tokens = tt.tokenize(tweet['text'])
    Sanders_kwic_I.extend(make_kwic('i', tokens))

In [34]:
Sanders_I_sample = random.sample(Sanders_kwic_I, 50)

In [35]:
print_kwic(sort_kwic(Sanders_I_sample, order=['R1']))

                           . toxic . and  i  agree in principle with
                            ’ m not sure  i  believe this  
               govt running healthcare .  i  believe that you are
                 am very diligent before  i  choose . i want
                                          i  do own a framed
                      enough . however ,  i  don ’ t have
                alot of conspiracies but  i  don ’ t see
                          war in yemen .  i  don ’ t think
                                          i  don't know how this
                    bernie doesn't - and  i  don't think those folks
                          to live . also  i  don't care for fox
                           bit of both ,  i  dont typically hone in
                                          i  guess putin's strategy has
                                          i  had to block about
                          it ’ s because  i  have a low tolerance
                      remind us all that 

In [36]:
Warren_kwic_I = []

for tweet in Warren_tweets_nrt:
    tokens = tt.tokenize(tweet['text'])
    Warren_kwic_I.extend(make_kwic('i', tokens))

In [37]:
Warren_I_sample = random.sample(Warren_kwic_I, 50)

In [38]:
print_kwic(sort_kwic(Warren_I_sample, order=['R1']))

                       , however , today  i  : • paid bills
              initial positive comment .  i  acknowledge failures of both
                                          i  am voting for @ewarren
                                          i  am a @ewarren democrat
                   see . @maggieullman &  i  are honored to support
                                          i  believe we had so
                                          i  cannot stop watching this
                                          i  disagree , i was
                      and his policies .  i  don ’ t give
                                          i  don ’ t know
                                          i  don't do gulags ..
                      i'm middle aged so  i  earned that . 
                then everyone but warren  i  feel   
                                          i  find common ground with
                                          i  guess i missed something
                                  

I think the use of the word 'I' shows how people use Twitter to give their personal opinions about the candidates, with phrases including 'I don't', 'I like', 'I hope', 'I think', and 'I would'. It is so common because people like to talk about how politics affects them. They talk about what they hope for in terms of policies, things they would like to see, and things they think about Biden, Klobuchar, Warren, and Sanders. Some tweets are very positive, like 'I just voted for' or 'I just caucused for' a candidate, showing support during the primary elections. Some common themes are people trying to understand politicians, agreeing and disagreeing with politicians, and saying what they would do if they were that politician.
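
The impression of what follows 'I' can also be checked against the bigram counters rather than a 50-line sample. The sketch below is a rough illustration: `right_neighbours` is a hypothetical helper, it assumes bigrams are stored as space-joined strings (as the printed lists suggest), and the counter holds only a few Biden bigram counts copied from the list above, not the full `Biden_bigram_freq`.

```python
from collections import Counter

def right_neighbours(bigram_freq, keyword, top_n=5):
    """Count the words that directly follow keyword in space-joined bigrams."""
    following = Counter()
    for bigram, count in bigram_freq.items():
        parts = bigram.split(' ')
        if len(parts) == 2 and parts[0] == keyword:
            following[parts[1]] += count
    return following.most_common(top_n)

# A few Biden bigram counts copied from the frequency list above.
biden_bigrams = Counter({'in the': 494, 'is a': 309, 'i m': 236, 'i think': 159})

print(right_neighbours(biden_bigrams, 'i'))
```

Run on the full counters, this would give the complete ranking of words after 'i' for each candidate, which complements the random KWIC samples.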

In [39]:
Biden_kwic_Trump = []

for tweet in Biden_tweets_nrt:
    tokens = tt.tokenize(tweet['text'])
    Biden_kwic_Trump.extend(make_kwic('trump', tokens))

In [40]:
Biden_Trump_sample = random.sample(Biden_kwic_Trump, 50)

In [41]:
print_kwic(sort_kwic(Biden_Trump_sample, order=['L1']))

                                          trump  2.0 .  
                                          trump  can you charge biden
                       the green stuff -  trump  is good for their
                        for his sister .  trump  lies . trump is
                       money from govt .  trump  family would have been
                        bernie in 2016 .  trump  reached out for help
                          4 more years .  trump  / pence 2020 
                    vote for sanders ...  trump  wins and the nra
                have zero chance against  trump  , right ? now
              performs the worst against  trump  . i am a
                            we ? you and  trump  ?   
                          and orgs . and  trump  ’ s tweets .
                       ad damn .. anyway  trump  2020   
                      left can only beat  trump  with a centrist candidate
                     biden will not beat  trump  . at all 
                   months needed to beat  

In [42]:
Klob_kwic_Trump = []

for tweet in Klob_tweets_nrt:
    tokens = tt.tokenize(tweet['text'])
    Klob_kwic_Trump.extend(make_kwic('trump', tokens))

In [43]:
Klob_Trump_sample = random.sample(Klob_kwic_Trump, 50)

In [44]:
print_kwic(sort_kwic(Klob_Trump_sample, order=['L1']))

                                          trump  calls for buttigieg and
                                          trump  won michigan by 10,000-
                      tops trump 47-43 %  trump  is underwater at 42
                          sanders 49 % ,  trump  42 % bloomberg 48
                          the thing is ,  trump  is going to play
                        bloomberg 48 % ,  trump  40 % warren 46
                            biden 52 % -  trump  44 % ( +
                        is running for .  trump  will eat him alive
                    rigged against you .  trump  claimed that the election
                     / buttigieg — 47/42  trump  / klobuchar — 46/41
                 trump / buttigieg 47/42  trump  / klobuchar 46/41 https://twitter.com/Avenger2Toxic/status/1228796178336276480
                campaign against him and  trump  both .  
                          if you win and  trump  refuses to leave the
                    mike will be another  trump  ( but with no

In [45]:
Sanders_kwic_Trump = []

for tweet in Sanders_tweets_nrt:
    tokens = tt.tokenize(tweet['text'])
    Sanders_kwic_Trump.extend(make_kwic('trump', tokens))

In [46]:
Sanders_Trump_sample = random.sample(Sanders_kwic_Trump, 50)

In [47]:
print_kwic(sort_kwic(Sanders_Trump_sample, order=['L1']))

                                          trump  will be the death
                                          trump  loves bernie . don
                                          trump  ?   
                                          trump  is betting on a
                                          trump  ’ s not scared
                                          trump  2020 kag  
                                          trump  has a ___ is
                                          trump  didnt say coronovirus was
                                          trump  gets over 10k in
                                          trump  are 2 sides of
                         anybody , but ,  trump  ... even bernie sanders
                    can vote for against  trump  and for bernie in
                      ’ m voting against  trump  . bernie ’ s
                    s between bernie and  trump  .   
                 they should be arrested  trump  ( most ) supporters
                            is

In [48]:
Warren_kwic_Trump = []

for tweet in Warren_tweets_nrt:
    tokens = tt.tokenize(tweet['text'])
    Warren_kwic_Trump.extend(make_kwic('trump', tokens))

In [49]:
Warren_Trump_sample = random.sample(Warren_kwic_Trump, 50)

In [50]:
print_kwic(sort_kwic(Warren_Trump_sample, order=['L1']))

                                          trump  is also an idiot
                                          trump  changed parties ? 
                                          trump  would be elected either
                                          trump  and cover the spread
                                          trump  and can ’ t
                                          trump  has already taken care
                                          trump  is a better alternative
                           warren 46 % ,  trump  43 % buttigieg 45
                           warren 46 % ,  trump  43 % buttigieg 45
                          hillary 90 % ,  trump  10 % . but
                         . biden sucks .  trump  sucks . all other
                       pundits not get ?  trump  feared hillary , he
                she couldn't win against  trump  ' ' ! trump
                     need to run against  trump  . not another person
                      the sand and allow  trump  to go u

Uses of 'trump' include many comparisons with Trump for all candidates and many horse-race poll numbers that put a particular primary candidate at a certain polling level against Trump. 'defeat trump' and 'beat trump' appear often, as beating Trump is a common goal for the Democratic candidates, and many people believe that their chosen candidate is the best for the job. Some are pro-Trump tweets, which claim that a candidate will not beat him and is no competition for our current president.