# ANALYSIS 3: LOVED ONES

In the last notebook, we established that *family* and *friends* are more distinctive to the 2019 corpus. However, we want to see whether these words appear in different contexts, or for different reasons, in the two years.

In this notebook I look more closely at the part of my topic that concerns invocations of loved ones by:

- exploring word frequencies and, by comparing them to the keyness results from the last notebook, deciding which words to use to test the hypothesis
- looking at words/phrases in context with KWIC
- looking at the collocates around those words

**See the logic of this roadmap after the frequency lists below.**

In [1]:
%run functions.ipynb

In [2]:
tweets_2019 = json.load(open("../data/cleaned/tweets_2019.json"))

In [3]:
tweets_2020 = json.load(open("../data/cleaned/tweets_2020.json"))

## Cleaning

To look at word frequencies, we need to combine the tweets into a single string and then tokenize it.

First, join the tweets into one string.

In [4]:
string_2019 = ''.join(tweets_2019)

In [5]:
string_2020 = ''.join(tweets_2020)

Tokenize.

In [6]:
tokens_2019 = tokenize(string_2019, lowercase = True, strip_chars = '.,!')

In [7]:
tokens_2020 = tokenize(string_2020, lowercase = True, strip_chars = '.,!')
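The `tokenize` helper comes from `functions.ipynb`, which is not shown here. As a rough illustration of the kind of processing the calls above rely on, here is a minimal stand-in. The name `simple_tokenize` and its exact behavior are assumptions, not the notebook's actual implementation.

```python
def simple_tokenize(text, lowercase=False, strip_chars=''):
    """Hypothetical stand-in for the notebook's tokenize() helper."""
    if lowercase:
        text = text.lower()
    # Remove each unwanted character, then split on whitespace.
    for char in strip_chars:
        text = text.replace(char, '')
    return text.split()


sample = simple_tokenize("Thankful for my family, and my friends!",
                         lowercase=True, strip_chars='.,!')
print(sample)
```

A tokenizer like this only sees the word boundaries present in the string, so if the tweets are joined with an empty separator, the last word of one tweet and the first word of the next would be fused unless each tweet already ends in whitespace.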

Save freq lists.

In [8]:
word_freq_2019 = Counter(tokens_2019)
bigram_freq_2019 = Counter(get_bigram_tokens(tokens_2019))

In [9]:
word_freq_2020 = Counter(tokens_2020)
bigram_freq_2020 = Counter(get_bigram_tokens(tokens_2020))
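Since `get_bigram_tokens` is also defined elsewhere, here is a small sketch of how bigram counts of this kind can be built. It assumes bigrams are stored as space-joined strings, which matches how `"for you"` is looked up later in this notebook. The function name `bigram_tokens` is hypothetical.

```python
from collections import Counter


def bigram_tokens(tokens):
    """Hypothetical stand-in for get_bigram_tokens(): join adjacent token pairs."""
    return [f"{first} {second}" for first, second in zip(tokens, tokens[1:])]


sample_tokens = "thankful for you and grateful for you".split()
sample_bigram_freq = Counter(bigram_tokens(sample_tokens))
print(sample_bigram_freq["for you"])
```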

# LOVED ONES

We will start by comparing invocations of family and other loved ones in the two corpora.

We will look at these words as describing loved ones:

- #family 
- family
- friends 
- you
- boyfriend
- girlfriend
- mother 
- father 
- dad
- mom
- brother
- sister
- son
- daughter 
- ily

And this bigram:
- for you

### We can run a filtered frequency.

In [10]:
loved_ones_words = ['#family','family','friends','you','boyfriend', 'girlfriend','mother', 'father', 'dad', 
                    'mom', 'brother', 'sister', 'son', 'daughter','ily']

In [11]:
filter_freq_list(word_freq_2019, loved_ones_words)

[('you', 6574),
 ('family', 1710),
 ('friends', 984),
 ('#family', 156),
 ('mom', 98),
 ('brother', 59),
 ('dad', 51),
 ('sister', 43),
 ('mother', 41),
 ('son', 37),
 ('father', 33),
 ('daughter', 28),
 ('boyfriend', 22),
 ('girlfriend', 18),
 ('ily', 15)]
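For readers without `functions.ipynb` at hand, a filtered frequency list like the one above can be approximated as follows. This is only a sketch: `filter_counts` is a hypothetical name, and the real `filter_freq_list` may handle missing words or ties differently.

```python
from collections import Counter


def filter_counts(freq, words):
    """Hypothetical stand-in for filter_freq_list(): keep only `words`, sorted by count."""
    selected = [(word, freq[word]) for word in words if word in freq]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Toy counts for illustration only; these are not the notebook's corpora.
toy_freq = Counter({"you": 5, "family": 3, "turkey": 4, "mom": 1})
toy_filtered = filter_counts(toy_freq, ["family", "mom", "you", "dad"])
print(toy_filtered)
```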

In [12]:
bigram_freq_2019["for you"]

562

In [13]:
filter_freq_list(word_freq_2020, loved_ones_words)

[('you', 7138),
 ('family', 1413),
 ('friends', 767),
 ('mom', 97),
 ('#family', 79),
 ('brother', 72),
 ('sister', 51),
 ('son', 48),
 ('daughter', 45),
 ('father', 45),
 ('mother', 44),
 ('ily', 39),
 ('dad', 37),
 ('boyfriend', 10),
 ('girlfriend', 9)]

In [14]:
bigram_freq_2020["for you"]

750

We can look at these frequencies in the context of the length of the corpus.

In [15]:
word_freq_2019["family"] * 100 / len(tokens_2019) - word_freq_2020["family"] * 100 / len(tokens_2020)

0.08394140552309237

In this way, we can see that the share of all tokens taken up by 'family' is only about 0.08 percentage points higher in 2019 than in 2020.

In [16]:
word_freq_2019["friends"] * 100 / len(tokens_2019) - word_freq_2020["friends"] * 100 / len(tokens_2020)

0.06321728133157434

In this way, we can see that the share of all tokens taken up by 'friends' is only about 0.06 percentage points higher in 2019 than in 2020.
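The subtraction above gives a gap in percentage points, which is not the same thing as a relative difference. The sketch below makes the two measures explicit on toy counts. The helper `percent_of_corpus` and the numbers are made up for illustration and are not taken from the corpora.

```python
from collections import Counter


def percent_of_corpus(freq, word, total_tokens):
    """Share of all tokens taken up by `word`, as a percentage."""
    return freq[word] * 100 / total_tokens


# Toy counts for illustration only; these are not the notebook's corpora.
toy_a = Counter({"family": 30, "other": 970})
toy_b = Counter({"family": 20, "other": 980})

share_a = percent_of_corpus(toy_a, "family", sum(toy_a.values()))
share_b = percent_of_corpus(toy_b, "family", sum(toy_b.values()))

point_difference = share_a - share_b  # gap in percentage points
relative_ratio = share_a / share_b    # how many times as frequent
print(share_a, share_b, point_difference, relative_ratio)
```

A small gap in percentage points can still correspond to a noticeable relative difference when the word is rare overall, so it can help to look at both.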

### Let's remember the distinctive words from Keyness relevant to the question of loved ones:

Notable distinctive words to 2019:

- #family 
- family
- friends

Notable distinctive words to 2020:

- you

### So, what do we know so far, given Frequencies and Keyness?

1. We can see that *family* and *friends* are the most frequent/distinctive words that signal loved ones.

2. *you* and *for you* are also very common. We don't know that these refer to loved ones, but it is still notable that the tweets are addressing other people. We will continue to look at this in this notebook and later when we look at pronouns.

3. We can see that other members of one's family/relationship network are **not** very common. This suggests that on Twitter people are not calling these types of loved ones out by name. This could be because they just tag them.

### What do we need to do next?

**Look at these keywords we've isolated.**

A. Look at the context of these words through KWIC:
 -  family
 -  friends
 -  you
 
 
(We are not going to look at #family because hashtags are not always used in the format "I am grateful for _" and can instead just be added to the end of a tweet. For the purpose of our analysis, we will focus on words being used in a sentence.)


B. Collocates around the keywords we have selected

C. Keyness of these collocates to see what is distinctive for each year about invocations of these ideas.
    

# KWIC: 

### Now looking at mentions of loved ones in context.
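As a reminder of what a KWIC (keyword in context) listing involves, here is a minimal sketch. The helpers `simple_kwic` and `sort_by_r1` are hypothetical stand-ins for `make_kwic` and `sort_kwic` from `functions.ipynb`. The four-word window is an assumption based on the printed output below, and "R1" is taken to mean the first word to the right of the keyword.

```python
def simple_kwic(keyword, tokens, win=4):
    """Hypothetical stand-in for make_kwic(): one (left, keyword, right) triple per hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - win):i]
            right = tokens[i + 1:i + 1 + win]
            lines.append((left, token, right))
    return lines


def sort_by_r1(lines):
    """Sort by the first word to the right of the keyword (the R1 position)."""
    return sorted(lines, key=lambda line: line[2][0] if line[2] else "")


toy_tokens = "thankful for my family and friends and for family too".split()
toy_hits = simple_kwic("family", toy_tokens, win=2)
for left, kw, right in sort_by_r1(toy_hits):
    print(f"{' '.join(left):>20}  {kw}  {' '.join(right)}")
```

Sorting on R1 groups lines with the same following word together, which is why hits with nothing to the right of the keyword appear first in the listings.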

### Family (2019)

In [17]:
kwic_family_19 = []

for tweet in tweets_2019:
    tokens_19 = tokenize(tweet, lowercase = True)
    kwic_19 = make_kwic("family", tokens_19)
    kwic_family_19.extend(kwic_19)

In [18]:
kwic_family_19_sample = random.sample(kwic_family_19, 50)

In [19]:
print_kwic(sort_kwic(kwic_family_19_sample, order=['R1']))

                coming together food and  family     
                       my friends and my  family     
                    sending you and your  family  a very happy thanksgiving
               amazing dinner with their  family  amp friends ♥️ thanksgiving
                        we sit down with  family  and give thanks it
               bless president trump his  family  and all those who
        wonderful thanksgiving with your  family  and friends happythanksgiving2019 gratitude
                      am thankful for my  family  and especially that adorable
                          me and i guess  family  and peace and health
                  love from your dearest  family  and friends may god
                                          family  and the wonderful blessing
                                          family  and friends a happy
               at thanksgiving there’s a  family  bet between me and
                      enjoying time w my  family  but still hold 

### Family (2020)

In [20]:
kwic_family_20 = []

for tweet in tweets_2020:
    tokens_20 = tokenize(tweet, lowercase = True)
    kwic_20 = make_kwic("family", tokens_20)
    kwic_family_20.extend(kwic_20)

In [21]:
family_20_sample = random.sample(kwic_family_20, 50)

In [22]:
print_kwic(sort_kwic(family_20_sample , order=['R1']))

                    very grateful for my  family     
               blessings for your entire  family     
                     you and your entire  family     
                         to you and your  family     
                   they shall kill their  family  amp friends” httpstcoc2l8hnd1gy 
                          for the lord 💕  family  amp a "working" holiday
                    i’m grateful for the  family  and community support that
                thankful for our friends  family  and of course festive
                    also thankful for my  family  and my wonderful friends
                           and my job my  family  and friends im thankful
                thanksgiving to you your  family  and everyone around you
                    to have a supportive  family  and friends alot of
            forever grateful for friends  family  and the communities i
            eric amp newschannelnine our  family  are grateful wtvc9 team
             happy thanksgiving 

### Friends (2019)

In [23]:
kwic_friends_19 = []

for tweet in tweets_2019:
    tokens_19 = tokenize(tweet, lowercase = True)
    kwic_19 = make_kwic("friends", tokens_19)
    kwic_friends_19.extend(kwic_19)

In [24]:
kwic_friends_19_sample = random.sample(kwic_friends_19, 50)

In [25]:
print_kwic(sort_kwic(kwic_friends_19_sample, order=['R1']))

                       to our family and  friends  across the ocean crowsgin
        much this thanksgivingfamily and  friends  all of you who
                  really grateful for my  friends  also me a tsundere
                         all of you your  friends  amp families 🤗🧡🦃 httpstconncnmmzws1
             our thankfulness for family  friends  and freedoms  
                                          friends  and relatives please have
                  time spent with family  friends  and rescue alumni wonka
                    blessings to all our  friends  and loved ones xoxo
            holiday surrounded by family  friends  and good food thanksgiving
                        to have fun with  friends  and family today and
                     for all the amazing  friends  and the support they
               everyday thankful for the  friends  and family that are
             of my childrenmyself family  friends  and gods many blessings
                    enjoy time wit

### Friends (2020)

In [26]:
kwic_friends_20 = []

for tweet in tweets_2020:
    tokens_20 = tokenize(tweet, lowercase = True)
    kwic_20 = make_kwic("friends", tokens_20)
    kwic_friends_20.extend(kwic_20)

In [27]:
kwic_friends_20_sample = random.sample(kwic_friends_20, 50)

In [28]:
print_kwic(sort_kwic(kwic_friends_20_sample, order=['R1']))

                  behind with my twitter  friends     
                      my children and my  friends     
                        2 family 3 close  friends  4 shelter 5 food
                        to my family and  friends  all around the globe
               a great thanksgiving with  friends  amp family may the
                   reach limit in adding  friends  and run out of
                       for my family our  friends  and our health gratitude
                           to all of our  friends  and family we are
               thankful for my beautiful  friends  and mutuals that always
                   i would make internet  friends  and here i am
                           for all of my  friends  and family thankful to
           happy thanksgiving2020 to our  friends  family colleagues and business
                taken for granted family  friends  foodcall it thanksliving 
                          of u have been  friends  for years and i
                 grateful for

### You (2019)

In [29]:
kwic_you_19 = []

for tweet in tweets_2019:
    tokens_19 = tokenize(tweet, lowercase = True)
    kwic_19 = make_kwic("you", tokens_19)
    kwic_you_19.extend(kwic_19)

In [30]:
kwic_you_19_sample = random.sample(kwic_you_19, 50)

In [31]:
print_kwic(sort_kwic(kwic_you_19_sample, order=['R1']))

                         are with all of  you     
                      back at home texts  you  about missing you on
                 gobble i’m grateful for  you  all httpstcobe6fs2gpzr  
                                          you  and yours a wonderful
         thanksgiving coach blessings to  you  and your family 
              appreciate of how thankful  you  are to your close
                  mindanao so happy that  you  are well taken care
                                          you  are having a great
                      have no doubt that  you  are making a difference
         gratitude challenge tag someone  you  are grateful for happyholidays
        texancod teddyrecks i appreciate  you  brother   
                       will come then if  you  can say a quick
                        can kill you and  you  can only throw stones
                  thanksgiving to all of  you  celebrating it 🍁 let’s
                      could all have and  you  could ever hope

### You (2020)

In [32]:
kwic_you_20 = []

for tweet in tweets_2020:
    tokens_20 = tokenize(tweet, lowercase = True)
    kwic_20 = make_kwic("you", tokens_20)
    kwic_you_20.extend(kwic_20)

In [33]:
kwic_you_20_sample = random.sample(kwic_you_20, 50)

In [34]:
print_kwic(sort_kwic(kwic_you_20_sample, order=['R1']))

              paying attention god bless  you     
               soundly tonight thanks to  you     
                   appreciate you i love  you  all 💖💖💖🤞🏼  
                       love and light to  you  all blessings and grace
                        good book i wish  you  all well httpstcodlzgbqycpp 
                     i cant express what  you  all mean to me
                        should be in bed  you  always make me feel
        happy thanksgiving chris wishing  you  and yours many more
                      planned it we love  you  and we will forever
           covidnj happy thanksgiving to  you  and thanks for all
                   grateful to just know  you  and be able to
           thanksgiving and blessings to  you  and your family 
                                          you  are a kind hearted
                    life you lead maisie  you  are so very lucky
                                          you  as well as difficult
                  help 

### What does this tell us?

One thing that is clear from these KWIC is that "you" is not a good keyword: it includes both appeals to specific people and generic statements to the public like "you all", so it is not exactly working towards our hypothesis.

It is difficult to compare KWIC for friends and family, so we can:

A. **look at the collocates around these words to see in what contexts they function.**  
B. **look at the collocates' keyness to see if these contexts are distinctive.**
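To make "collocates around a word" concrete, here is a minimal sketch of a window-based collocate search. The function `window_collocates` is a hypothetical stand-in for `collocates` in `functions.ipynb`. Reading `win=[4,4]` as four words to the left and four to the right is an assumption.

```python
from collections import Counter


def window_collocates(tokens, keyword, win=(4, 4)):
    """Hypothetical stand-in for collocates(): words inside the window around each hit."""
    left_span, right_span = win
    found = []
    for i, token in enumerate(tokens):
        if token == keyword:
            found.extend(tokens[max(0, i - left_span):i])
            found.extend(tokens[i + 1:i + 1 + right_span])
    return found


toy_tokens = "stay safe with family this year and stay safe at home".split()
toy_collocates = window_collocates(toy_tokens, "family", win=(2, 2))

# Two different ways of counting a collocate such as "safe":
within_window = Counter(toy_collocates)  # only occurrences near "family"
collocate_set = set(toy_collocates)
corpus_wide = Counter(t for t in toy_tokens if t in collocate_set)  # every occurrence
print(within_window["safe"], corpus_wide["safe"])
```

The two counts answer different questions. The cells that follow use the second kind: they count how often each collocate occurs anywhere in the corpus, so the frequencies in the keyness tables are corpus-wide totals for words that appear near the keyword at least once.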

## Keyness of Collocates Around Family and Friends.

## Family 

In [35]:
coll_family_19 = collocates(tokens_2019,"family", win=[4,4])

In [36]:
coll_family_19_freq = Counter([word for word in tokens_2019 if (word in coll_family_19)])

In [37]:
coll_family_20 = collocates(tokens_2020,"family", win=[4,4])

In [38]:
coll_family_20_freq = Counter([word for word in tokens_2020 if (word in coll_family_20)])

In [39]:
calculate_keyness(coll_family_19_freq, coll_family_20_freq, top=50)

WORD                     Corpus A Freq.  Corpus B Freq.  Keyness
day                      1700      1265      51.527
full                     224       120       28.861
@                        266       158       24.634
loving                   100       41        23.705
#family                  156       79        23.418
wonderful                414       278       22.934
thanksgiving             4123      3599      21.456
friends                  984       767       20.777
family                   1710      1413      19.990
her                      350       242       16.671
season                   186       115       14.820
🦃                        213       141       12.642
#grateful                261       183       11.508
amazing                  366       272       11.193
spend                    134       82        11.113
grace                    63        30        10.988
part                     149       94        10.943
course                   59        28        10.371


In [40]:
calculate_keyness(coll_family_20_freq, coll_family_19_freq, top=50)

WORD                     Corpus A Freq.  Corpus B Freq.  Keyness
safe                     331       120       109.186
stay                     233       83        78.778
year                     1060      720       76.077
you                      7138      6574      43.488
through                  303       182       34.294
but                      1532      1266      34.047
im                       271       166       28.766
u                        464       329       27.376
it                       1793      1567      22.848
sorry                    102       47        22.494
healthy                  109       52        22.392
has                      706       568       19.461
lot                      284       196       19.007
been                     706       574       17.939
as                       915       768       17.707
his                      632       508       17.542
tough                    54        21        16.048
during                   140       84        15.901
cr

### Collocates around family distinctive to 2019 corpus:
 - adding to our hypothesis: friends, family, #family
 - descriptions: full, loving, wonderful, amazing
 - verbs: spend, visit, work, give
 - pronouns: her, our
 - time: day, tomorrow, season
 - nouns: life, babies, home, grace


### Collocates around family distinctive to 2020 corpus:
 - negative descriptions and modifiers: through, but, sorry, tough, crazy
 - modifiers: only, much, too, other, like, same
 - command: stay
 - pronouns: you, myself, i
 - descriptions: safe
 - time: year
 - verb: help, cook, hear, thinking
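The formula behind `calculate_keyness` lives in `functions.ipynb` and is not shown here. One widely used keyness statistic is the log-likelihood (G2) score, and the sketch below computes it for a single word. Whether the helper uses exactly this statistic is an assumption, and `log_likelihood` and the toy numbers are illustrative only.

```python
import math


def log_likelihood(count_a, count_b, total_a, total_b):
    """Log-likelihood (G2) keyness of one word across two corpora."""
    combined = count_a + count_b
    # Expected counts if the word were equally frequent in both corpora.
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Toy counts for illustration only; these are not the notebook's corpora.
same_rate = log_likelihood(10, 10, 1000, 1000)
different_rate = log_likelihood(30, 10, 1000, 1000)
print(same_rate, round(different_rate, 3))
```

A score of zero means the word is used at the same rate in both corpora, and larger scores mean a bigger imbalance. The score by itself does not say which corpus the word favors, which is why the tables are produced twice with the corpora swapped.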

## Friends

In [41]:
coll_friends_19 = collocates(tokens_2019,"friends", win=[4,4])

In [42]:
coll_friends_19_freq = Counter([word for word in tokens_2019 if (word in coll_friends_19)])

In [43]:
coll_friends_20 = collocates(tokens_2020,"friends", win=[4,4])

In [44]:
coll_friends_20_freq = Counter([word for word in tokens_2020 if (word in coll_friends_20)])

In [45]:
calculate_keyness(coll_friends_19_freq, coll_friends_20_freq, top=50)

WORD                     Corpus A Freq.  Corpus B Freq.  Keyness
day                      1700      1265      41.821
@                        266       158       22.046
#family                  156       79        21.535
wonderful                414       278       19.765
friends                  984       767       16.090
her                      350       242       14.179
family                   1710      1413      13.989
season                   186       115       13.132
thanksgiving             4123      3599      12.125
friday                   73        35        11.598
spend                    134       82        9.874
#grateful                261       183       9.718
course                   59        28        9.609
he                       558       434       9.321
amazing                  366       272       9.094
babies                   25        8         8.258
opportunities            57        30        7.062
fun                      84        50        6.906
home       

In [46]:
calculate_keyness(coll_friends_20_freq, coll_friends_19_freq, top=50)

WORD                     Corpus A Freq.  Corpus B Freq.  Keyness
2020                     198       36        132.818
safe                     331       120       114.968
year                     1060      720       85.995
stay                     233       83        82.887
you                      7138      6574      66.060
but                      1532      1266      42.605
through                  303       182       37.739
u                        464       329       31.363
it                       1793      1567      30.679
healthy                  109       52        23.976
has                      706       568       23.800
moots                    34        6         23.250
as                       915       768       22.514
this                     2811      2615      22.503
been                     706       574       22.125
lot                      284       196       21.585
his                      632       508       21.438
i                        5850      5684      20.996
s

We see similar dynamics around friends in the two corpora:

### Collocates around friends distinctive to 2019 corpus:

- hypothesis: #family, friends, family, babies
- verbs: spend
- positive descriptions: wonderful, amazing, fun
- nouns: home, season, course, day
- pronouns: her, he

### Collocates around friends distinctive to 2020 corpus:

- pronouns: you, i
- descriptions having to do with pandemic: safe, stay, workers, healthy
- negative modifiers: but, through, lot, other 
- verbs: helped
- people: neighbors, parents, colleagues, guys, followers, god


### What does this mean for our hypothesis?

Before we started the analysis in this notebook, our updated hypothesis was that health was more distinctive to 2020 and family/friends to 2019.

Our further analysis of these ideas in context, both in KWIC and in the keyness of the collocates, **supports this hypothesis**: **the collocates around these words are laden with more descriptions of health in 2020 and of family/friends in 2019**.

These analyses also introduce some interesting new ideas on the comparisons between the corpora:

We see two big comparisons:

1. The tweets in 2019 are a lot more positive:
 -  2019: wonderful, amazing, incredible
 -  2020: but, help, though, despite, sorry
   
     
2. Following this, the context in which gratitude appears in 2020 seems to have a lot more modifiers, which suggests that people may have to couch gratitude in an acknowledgement of the troubles of the year, with words like: but, through, lot, much.