## KWIC Concordance & Collocation Analyses

### In this notebook, you will find:
- Corpora loaded from JSON files containing several song dictionaries
- Detailed text analysis of lyrics, separated by section headers
- KWIC concordances, which help detect patterns and relationships around specific words and show each word in context
- Collocation analysis, which supplements the KWIC analysis by summarizing the KWIC data and providing context for how certain words are used
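
To make the two techniques concrete, here is a minimal sketch of what a KWIC function and a collocate function could look like. The names `kwic_sketch` and `collocates_sketch` are hypothetical; they are not the `make_kwic` and `collocates` helpers loaded from `functions.ipynb`, whose implementations are not shown in this notebook and may differ.

```python
def kwic_sketch(keyword, tokens, window=4):
    """Return (left, keyword, right) tuples for each occurrence of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]    # up to `window` tokens before
            right = tokens[i + 1:i + 1 + window]   # up to `window` tokens after
            lines.append((left, tok, right))
    return lines


def collocates_sketch(tokens, keyword, span=(3, 3)):
    """Return words within span[0] tokens left and span[1] tokens right of keyword."""
    found = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            found.extend(tokens[max(0, i - span[0]):i])
            found.extend(tokens[i + 1:i + 1 + span[1]])
    return found


sample = "you'll never find a love as true as mine".split()
for left, kw, right in kwic_sketch("love", sample):
    print(f"{' '.join(left):>30}  {kw}  {' '.join(right)}")
print(collocates_sketch(sample, "love"))
```

The KWIC view keeps each occurrence as a readable line, while the collocate view flattens the same windows into a bag of words that can be counted.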

In [1]:
%run functions.ipynb

## Additional Modules

In [2]:
#Additional modules
import os
import pandas as pd
import re
import json
import requests
from bs4 import BeautifulSoup
import lyricsgenius
from collections import Counter
import nltk
from nltk import Text
nltk.download('stopwords')
from nltk.corpus import stopwords
stop_words = stopwords.words('english')
sect_stoppers = ['pre-chorus','refrain','chorus','verse','intro','outro','bridge','verse 1','verse 2','verse 3','verse 4','1','2','3','4','Tim McGraw','Faith Hill','Tim McGraw & Faith Hill']
stop_words.extend(sect_stoppers)
# pos tagging
from nltk import pos_tag, pos_tag_sents, FreqDist, ConditionalFreqDist

[nltk_data] Downloading package stopwords to /Commjhub/jupyterhub/comm
[nltk_data]     318_fall2019/jpasik123/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [3]:
char_to_strip = '.,!][?;$"-()'

In [4]:
# use a context manager so the file handle is closed after loading
with open('../data/charts/all_charts.json') as f:
    all_charts = json.load(f)

## KWIC Concordance Analysis & Collocation Analysis

Given that the word "love" frequently appears in all corpora for this project, I've decided to start off my KWIC analysis using this term and analyzing it across the decades and genders. 

## All 1990s - "Love"

In [8]:
#create list of KWIC concordance lines of "love" from 'all_90s'
kwic_love_90s = []
for song in all_charts['all_90s']:
    if song.get('tokens'):
        kwic_rel_love90s = make_kwic('love', song['tokens'])
        kwic_love_90s.extend(kwic_rel_love90s)
        
print(f'"love" occurs {len(kwic_love_90s)} times in your lyrics')

#sort and view them
love_sorted_90s = sort_kwic(kwic_love_90s, ['L1','R1'])
print_kwic(love_sorted_90s)

"love" occurs 108 times in your lyrics
                                          love  is unconditional we knew
                     you'll never find a  love  as true as mine
                     you'll never find a  love  as true as mine
                     you'll never find a  love  as true as mine
                          true as mine a  love  as true as mine
                         and then it's a  love  without end amen it's
                         end amen it's a  love  without end amen when
                      and a little about  love  a lot about livin'
                     worn out line about  love  at first sight well
                      and a little about  love  aw haw well we
                      and a little about  love  well way down yonder
                      and a little about  love  yeaheee that's right 
                      she can't help but  love  him for the way
                         and how you can  love  like this 'cause i'm
                      

In [10]:
##love collocation 90s
love_90s_colls = Counter()
for song in all_charts['all_90s']:
    love_90s_colls.update(collocates(tokenize(song['lyrics']), 'love', [3, 3]))

In [12]:
love_90s_colls.most_common(25)

[('I', 33),
 ('you', 28),
 ('the', 26),
 ('boy', 21),
 ('in', 19),
 ('(Oh,', 18),
 ('girl)', 18),
 ("It's", 15),
 ("She's", 13),
 ('with', 13),
 ('a', 13),
 ('Marie,', 12),
 ('(Maria)', 12),
 ('it,', 12),
 ('way', 11),
 ('like', 9),
 ('as', 9),
 ('your', 9),
 ('our', 8),
 ('about', 7),
 ('A', 7),
 ("Don't", 7),
 ('let', 7),
 ('start', 7),
 ("slippin'", 7)]

Above, I ran both KWIC and collocation analyses to see whether the word "love" is used differently across songs in the `all_90s` chart. Scanning the KWIC output, the heavy repetition of certain lines suggests that "love" recurs mainly in the choruses of the '90s songs. "Love" is either used as a direct profession of love, preceded by "I" (as in "I love you"), or more generally, when the artists are telling a love story. I predict that these direct and indirect uses will appear in the other corpora as well.
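
One way to check how much of the count comes from repeated lines is to tally identical concordance lines. The sketch below is only an illustration: it assumes each concordance line has been joined into a plain string, which may not match the structure that `make_kwic` actually returns, and `summarize_repetition` is a hypothetical helper.

```python
from collections import Counter

# Illustrative concordance lines written as plain strings (an assumption;
# the real KWIC lines may be stored in a different structure).
example_lines = [
    "you'll never find a love as true as mine",
    "you'll never find a love as true as mine",
    "you'll never find a love as true as mine",
    "she can't help but love him for the way",
]


def summarize_repetition(lines):
    """Return the number of distinct lines and a dict of lines seen more than once."""
    counts = Counter(lines)
    repeated = {line: n for line, n in counts.items() if n > 1}
    return len(counts), repeated


distinct, repeated = summarize_repetition(example_lines)
print(f"{len(example_lines)} lines, {distinct} distinct")
for line, n in repeated.items():
    print(f"{n}x  {line}")
```

A large gap between the total and the distinct count would support the reading that chorus repetition drives the frequency of the keyword.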

## All 2010s - "Love"

In [13]:
#create list of KWIC concordance lines of "love" from 'all_2010s'
kwic_love_2010s = []
for song in all_charts['all_2010s']:
    if song.get('tokens'):
        kwic_rel_love2010s = make_kwic('love', song['tokens'])
        kwic_love_2010s.extend(kwic_rel_love2010s)
        
print(f'"love" occurs {len(kwic_love_2010s)} times in your lyrics')

#sort and view them
love_sorted_2010s = sort_kwic(kwic_love_2010s, ['L1','R1'])
print_kwic(love_sorted_2010s)

"love" occurs 19 times in your lyrics
                   well sunsets fade and  love  does too yeah we
                    you sunsets fade and  love  does too we had
                    you sunsets fade and  love  does too though we
                      and rescue me felt  love  pouring down from above
                  you thought you'd find  love  isn't all that it
                      someone you love i  love  you ain't no pickup
                      we were fallin' in  love  in the sweet heart
                          time i fell in  love  with a careless man's
                          all we need is  love  just as free free
                       love who you love  love  who you love 'cause
                        went up in smoke  love  isn't all that it
                    take for granted the  love  this life gives you
                      say what you think  love  who you love love
                      why we bother with  love  if it never lasts
                       

In [15]:
##love collocation 2010s
love_2010s_colls = Counter()
for song in all_charts['all_2010s']:
    love_2010s_colls.update(collocates(tokenize(song['lyrics']), 'love', [3, 3]))

In [16]:
love_2010s_colls.most_common(25)

[('you', 4),
 ('in', 3),
 ('fade,', 3),
 ('and', 3),
 ('does,', 3),
 ('too', 3),
 ('with', 2),
 ('who', 2),
 ('Sunsets', 2),
 ('the', 2),
 ('"I', 2),
 ('love', 2),
 ('you"', 2),
 ('I', 1),
 ('fell', 1),
 ('a', 1),
 ('careless', 1),
 ('Love', 1),
 ('(Love', 1),
 ('rescue', 1),
 ('me."', 1),
 ('Felt', 1),
 ('pouring', 1),
 ('down', 1),
 ('from', 1)]

Given that "love" appears far less often in the `all_2010s` data (19 times, compared to 108 in `all_90s`), I expected the context to differ as well. In the '90s data, "love" seemed to carry primarily positive connotations; in the 2010s data, it tends to occur in lines with a more negative, dismal context. The `all_2010s` data places "love" near words and phrases about love fading, love not being all it is expected to be, and so on. The collocations show the same pattern: "love" is surrounded by words such as "fade", "rescue", and "careless".
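
Raw counts such as 19 versus 108 depend partly on how many tokens each corpus contains, so a rate per 1,000 tokens can make the comparison fairer. The following is a minimal sketch, assuming each song is a dictionary with a `tokens` list as in the cells above; `per_thousand` and the toy corpus are hypothetical and stand in for the real chart data.

```python
def per_thousand(keyword, songs):
    """Occurrences of keyword per 1,000 tokens across a list of song dicts."""
    total = 0
    hits = 0
    for song in songs:
        tokens = song.get("tokens") or []  # skip songs with no tokens
        total += len(tokens)
        hits += tokens.count(keyword)
    return 1000 * hits / total if total else 0.0


# Toy corpus for illustration only; not taken from all_charts.
toy_corpus = [
    {"tokens": ["i", "love", "you", "and", "love", "is", "true"]},
    {"tokens": ["sunsets", "fade", "and"]},
    {"tokens": None},
]
rate = per_thousand("love", toy_corpus)
print(f'"love" occurs {rate:.1f} times per 1,000 tokens')
```

Calling the same function on `all_charts['all_90s']` and `all_charts['all_2010s']` would show whether the drop holds once corpus size is accounted for.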

## All Female - "Love"

In [17]:
#create list of KWIC concordance lines of "love" from 'all_female'
kwic_love_f = []
for song in all_charts['all_female']:
    if song.get('tokens'):
        kwic_rel_love_f = make_kwic('love', song['tokens'])
        kwic_love_f.extend(kwic_rel_love_f)
        
print(f'"love" occurs {len(kwic_love_f)} times in your lyrics')

#sort and view them
love_sorted_f = sort_kwic(kwic_love_f, ['L1','R1'])
print_kwic(love_sorted_f)

"love" occurs 49 times in your lyrics
                   well sunsets fade and  love  does too yeah we
                    you sunsets fade and  love  does too we had
                    you sunsets fade and  love  does too though we
                      she can't help but  love  him for the way
                         and how you can  love  like this 'cause i'm
                        you in a country  love  song summer night beauty
                     snow white how does  love  get so off course
                      and rescue me felt  love  pouring down from above
                  you thought you'd find  love  isn't all that it
                restless summer we found  love  growing wild on the
                      like we're even in  love  it matters to me
                      like we're even in  love  it matters to me
                 tommy anywhere she's in  love  with the boy she's
                        the boy she's in  love  with the boy she's
                        

In [19]:
##love collocation all female
love_f_colls = Counter()
for song in all_charts['all_female']:
    love_f_colls.update(collocates(tokenize(song['lyrics']), 'love', [3, 3]))

In [20]:
love_f_colls.most_common(25)

[('the', 23),
 ('boy', 21),
 ('in', 18),
 ('with', 14),
 ('you', 14),
 ("She's", 13),
 ('way', 11),
 ('a', 6),
 ('me', 6),
 ('(The', 6),
 ("It's", 5),
 ('me,', 5),
 ('[Chorus]', 4),
 ('fade,', 3),
 ('and', 3),
 ('does,', 3),
 ('too', 3),
 ("we're", 2),
 ('even', 2),
 ('It', 2),
 ('matters', 2),
 ('to', 2),
 ('way)', 2),
 ('baby', 2),
 ('baby)', 2)]

Here, only three collocates from the `all_female` songs really stood out to me: "in", "with", and "you". They are notable because they link to the message of "being in love" or "falling in love" with someone.

## All Male - "Love"

In [21]:
#create list of KWIC concordance lines of "love" from 'all_male'
kwic_love_m = []
for song in all_charts['all_male']:
    if song.get('tokens'):
        kwic_rel_love_m = make_kwic('love', song['tokens'])
        kwic_love_m.extend(kwic_rel_love_m)
        
print(f'"love" occurs {len(kwic_love_m)} times in your lyrics')

#sort and view them
love_sorted_m = sort_kwic(kwic_love_m, ['L1','R1'])
print_kwic(love_sorted_m)

"love" occurs 78 times in your lyrics
                                          love  is unconditional we knew
                     you'll never find a  love  as true as mine
                     you'll never find a  love  as true as mine
                     you'll never find a  love  as true as mine
                          true as mine a  love  as true as mine
                         and then it's a  love  without end amen it's
                         end amen it's a  love  without end amen when
                      and a little about  love  a lot about livin'
                     worn out line about  love  at first sight well
                      and a little about  love  aw haw well we
                      and a little about  love  well way down yonder
                      and a little about  love  yeaheee that's right 
                   of pretending i don't  love  you anymore let me
                 keep pretending i don't  love  you anymore i've got
                   o

In [22]:
##love collocation all male
love_m_colls = Counter()
for song in all_charts['all_male']:
    love_m_colls.update(collocates(tokenize(song['lyrics']), 'love', [3, 3]))

In [23]:
love_m_colls.most_common(25)

[('I', 33),
 ('(Oh,', 18),
 ('you', 18),
 ('girl)', 18),
 ('Marie,', 12),
 ('(Maria)', 12),
 ('it,', 12),
 ("It's", 10),
 ('as', 10),
 ('your', 9),
 ('a', 8),
 ('our', 8),
 ('like', 8),
 ('about', 7),
 ('A', 7),
 ("Don't", 7),
 ('let', 7),
 ('start', 7),
 ("slippin'", 7),
 ('Love', 7),
 ('you,', 6),
 ('[Verse', 6),
 ('want', 6),
 ('the', 5),
 ('My', 4)]

One difference between the `all_male` and `all_female` data is that "I" appears alongside "love" far more often in songs by male artists (33 times, whereas "I" does not appear among the top 25 collocates for female artists). Again, the concordance lines show a lot of repetition, which stems from "love" appearing in core parts of the songs such as the choruses.
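
The collocation window used above spans three words on each side, so it cannot by itself show that "I" comes directly before "love". A narrower count of only the immediately preceding word (the L1 position) could test that. This is a sketch on a toy token list; `left_neighbors` is a hypothetical helper, not part of `functions.ipynb`.

```python
from collections import Counter


def left_neighbors(keyword, tokens):
    """Count the word that appears immediately before each occurrence of keyword."""
    return Counter(
        tokens[i - 1]
        for i, tok in enumerate(tokens)
        if tok == keyword and i > 0
    )


# Toy tokens for illustration only.
toy = "i love you and i love her but they love him".split()
print(left_neighbors("love", toy).most_common())
```

Summing these counters over every song's `tokens` in `all_male` and `all_female` would give a direct comparison of how often "i" sits in the L1 slot.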

For the next part of this notebook, I have decided to analyze the word "heart" in the same dictionaries examined above. 

## All 1990s - "Heart"

In [25]:
#create list of KWIC concordance lines of "heart" from 'all_90s'
kwic_heart_90s = []
for song in all_charts['all_90s']:
    if song.get('tokens'):
        kwic_rel_heart90s = make_kwic('heart', song['tokens'])
        kwic_heart_90s.extend(kwic_rel_heart90s)
        
print(f'"heart" occurs {len(kwic_heart_90s)} times in your lyrics')

#sort and view them
heart_sorted_90s = sort_kwic(kwic_heart_90s, ['L1','R1'])
print_kwic(heart_sorted_90s)

"heart" occurs 30 times in your lyrics
                    heart my achy breaky  heart  he might blow up
                    heart my achy breaky  heart  he might blow up
                    heart my achy breaky  heart  he might blow up
                    heart my achy breaky  heart  he might blow up
                    heart my achy breaky  heart  i just don't think
                    heart my achy breaky  heart  i just don't think
                    heart my achy breaky  heart  i just don't think
                    heart my achy breaky  heart  i just don't think
                      knight with a good  heart  soft touch fast horse
                       tied the knot his  heart  wasn't in it he
                        allow i cross my  heart  and promise to give
                           be i cross my  heart  and promise to give
                        to you mmmhmm my  heart  can't take the beating
                         feel it from my  heart  from here on after
           

In [26]:
##heart collocation 90s
heart_90s_colls = Counter()
for song in all_charts['all_90s']:
    heart_90s_colls.update(collocates(tokenize(song['lyrics']), 'heart', [3, 3]))

In [27]:
heart_90s_colls.most_common(25)

[('my', 14),
 ('I', 8),
 ('achy', 8),
 ('breaky', 8),
 ('and', 4),
 ('the', 4),
 ('just', 4),
 ("don't", 4),
 ('He', 4),
 ('might', 4),
 ('blow', 4),
 ('to', 3),
 ('your', 3),
 ('from', 3),
 ('it', 3),
 ('a', 2),
 ('[Chorus]', 2),
 ('in', 2),
 ('cross', 2),
 ('promise', 2),
 ('right', 1),
 ('Without', 1),
 ('saying', 1),
 ('said', 1),
 ('between', 1)]

"Heart" is frequently preceded by "my"; the collocates "achy" and "breaky" clearly stem from Billy Ray Cyrus' '90s hit, "Achy Breaky Heart". That being said, there are not really any significant contextual differences in the way "heart" was used by artists in the '90s.

## All 2010s - "Heart"

In [28]:
#create list of KWIC concordance lines of "heart" from 'all_2010s'
kwic_heart_2010s = []
for song in all_charts['all_2010s']:
    if song.get('tokens'):
        kwic_rel_heart2010s = make_kwic('heart', song['tokens'])
        kwic_heart_2010s.extend(kwic_rel_heart2010s)
        
print(f'"heart" occurs {len(kwic_heart_2010s)} times in your lyrics')

#sort and view them
heart_sorted_2010s = sort_kwic(kwic_heart_2010s, ['L1','R1'])
print_kwic(heart_sorted_2010s)

"heart" occurs 27 times in your lyrics
                   bet he'd understand a  heart  like mine daddy cried
                   bet he'd understand a  heart  like mine i'll fly
            he'd understand understand a  heart  like mine oh yes
                       gets my cold cold  heart  burnin' hotter than a
                   chances and wears her  heart  on her sleeve yeah
                   chances and wears her  heart  on her sleeve yeah
                   chances and wears her  heart  on her sleeve yeah
                       funny i broke his  heart  and i took his
                           got joy in my  heart  angels on my side
                     letting you drag my  heart  around and oh i'm
                     letting you drag my  heart  around and oh i'm
                     letting you drag my  heart  around and oh i'm
                           my name on my  heart  get you wrapped in
                         fit in broke my  heart  on the playground mm
           

In [29]:
##heart collocation 2010s
heart_2010s_colls = Counter()
for song in all_charts['all_2010s']:
    heart_2010s_colls.update(collocates(tokenize(song['lyrics']), 'heart', [3, 3]))

In [30]:
heart_2010s_colls.most_common(25)

[('my', 9),
 ('her', 6),
 ('I', 5),
 ('on', 5),
 ('you', 5),
 ('the', 4),
 ('understand', 3),
 ('a', 3),
 ('like', 3),
 ('mine', 3),
 ('drag', 3),
 ('around', 3),
 ('And,', 3),
 ('oh,', 3),
 ('know', 3),
 ('will', 3),
 ('never', 3),
 ('be', 3),
 ('And', 3),
 ('wears', 3),
 ('sleeve', 3),
 ("he'd", 2),
 ('[Verse', 2),
 ('in', 2),
 ('Understand', 1)]

Again, with the `all_2010s` chart, "my" precedes "heart" most frequently. Looking at the KWIC lines, you can see repetition in lyrics which makes "heart" stand out.

## All Female - "Heart"

In [41]:
#create list of KWIC concordance lines of "heart" from 'all_female'
kwic_heart_f = []
for song in all_charts['all_female']:
    if song.get('tokens'):
        kwic_rel_heart_f = make_kwic('heart', song['tokens'])
        kwic_heart_f.extend(kwic_rel_heart_f)
        
print(f'"heart" occurs {len(kwic_heart_f)} times in your lyrics')

#sort and view them
heart_sorted_f = sort_kwic(kwic_heart_f, ['L1','R1'])
print_kwic(heart_sorted_f)

"heart" occurs 24 times in your lyrics
                   bet he'd understand a  heart  like mine daddy cried
                   bet he'd understand a  heart  like mine i'll fly
            he'd understand understand a  heart  like mine oh yes
                       gets my cold cold  heart  burnin' hotter than a
                      knight with a good  heart  soft touch fast horse
                       funny i broke his  heart  and i took his
                           got joy in my  heart  angels on my side
                     letting you drag my  heart  around and oh i'm
                     letting you drag my  heart  around and oh i'm
                     letting you drag my  heart  around and oh i'm
                      you're my world my  heart  my soul and if
                         fit in broke my  heart  on the playground mm
                     because you left my  heart  on the floor baby
                        change i know my  heart  will never be the
              

In [42]:
##heart collocation all female
heart_f_colls = Counter()
for song in all_charts['all_female']:
    heart_f_colls.update(collocates(tokenize(song['lyrics']), 'heart', [3, 3]))

In [43]:
heart_f_colls.most_common(25)

[('my', 10),
 ('I', 6),
 ('a', 5),
 ('mine', 4),
 ('you', 4),
 ('the', 3),
 ('understand', 3),
 ('like', 3),
 ('drag', 3),
 ('around', 3),
 ('And,', 3),
 ('oh,', 3),
 ('know', 3),
 ('will', 3),
 ('never', 3),
 ('be', 3),
 ('and', 2),
 ("he'd", 2),
 ('[Verse', 2),
 ('on', 2),
 ('right', 1),
 ('to', 1),
 ('Without', 1),
 ('saying', 1),
 ('said', 1)]

The `all_female` chart yielded similar results to those found in the `all_2010s` data analyzed above. 

## All Male - "Heart"

In [46]:
#create list of KWIC concordance lines of "heart" from 'all_male'
kwic_heart_m = []
for song in all_charts['all_male']:
    if song.get('tokens'):
        kwic_rel_heart_m = make_kwic('heart', song['tokens'])
        kwic_heart_m.extend(kwic_rel_heart_m)
        
print(f'"heart" occurs {len(kwic_heart_m)} times in your lyrics')

#sort and view them
heart_sorted_m = sort_kwic(kwic_heart_m, ['L1','R1'])
print_kwic(heart_sorted_m)

"heart" occurs 33 times in your lyrics
                    heart my achy breaky  heart  he might blow up
                    heart my achy breaky  heart  he might blow up
                    heart my achy breaky  heart  he might blow up
                    heart my achy breaky  heart  he might blow up
                    heart my achy breaky  heart  i just don't think
                    heart my achy breaky  heart  i just don't think
                    heart my achy breaky  heart  i just don't think
                    heart my achy breaky  heart  i just don't think
                   chances and wears her  heart  on her sleeve yeah
                   chances and wears her  heart  on her sleeve yeah
                   chances and wears her  heart  on her sleeve yeah
                       tied the knot his  heart  wasn't in it he
                        allow i cross my  heart  and promise to give
                           be i cross my  heart  and promise to give
                  

In [47]:
##heart collocation all male
heart_m_colls = Counter()
for song in all_charts['all_male']:
    heart_m_colls.update(collocates(tokenize(song['lyrics']), 'heart', [3, 3]))

In [48]:
heart_m_colls.most_common(25)

[('my', 13),
 ('achy', 8),
 ('breaky', 8),
 ('I', 7),
 ('her', 6),
 ('the', 5),
 ('just', 4),
 ("don't", 4),
 ('He', 4),
 ('might', 4),
 ('blow', 4),
 ('on', 4),
 ('in', 3),
 ('it', 3),
 ('from', 3),
 ('and', 3),
 ('And', 3),
 ('wears', 3),
 ('sleeve', 3),
 ('your', 2),
 ('cross', 2),
 ('promise', 2),
 ('to', 2),
 ('knot,', 1),
 ('his', 1)]

As in the earlier analysis, many of the occurrences of "heart" stem from Billy Ray Cyrus' song "Achy Breaky Heart."
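
To quantify how much a single song contributes, the keyword can be counted per song rather than per chart. The sketch below assumes each song dictionary has a `title` key in addition to `tokens`; the `title` key is a guess about the data layout, and `hits_per_song` and the toy songs are hypothetical.

```python
from collections import Counter


def hits_per_song(keyword, songs):
    """Count keyword occurrences per song, keyed by a (hypothetical) 'title' field."""
    per_song = Counter()
    for song in songs:
        tokens = song.get("tokens") or []
        n = tokens.count(keyword)
        if n:
            per_song[song.get("title", "<untitled>")] += n
    return per_song


# Toy songs for illustration only; not taken from all_charts.
toy_songs = [
    {"title": "Song A", "tokens": ["my", "heart", "my", "achy", "heart"]},
    {"title": "Song B", "tokens": ["cross", "my", "heart"]},
    {"title": "Song C", "tokens": ["no", "match", "here"]},
]
counts = hits_per_song("heart", toy_songs)
total = sum(counts.values())
for title, n in counts.most_common():
    print(f"{title}: {n} of {total} ({n / total:.0%})")
```

If one title accounts for most of the total, the chart-level pattern is better described as a property of that song than of the decade or the artists' gender.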

For the last part of this notebook, I decided to analyze the word "little" in the same dictionaries examined above.

## All 1990s - "Little"

In [49]:
#create list of KWIC concordance lines of "little" from 'all_90s'
kwic_little_90s = []
for song in all_charts['all_90s']:
    if song.get('tokens'):
        kwic_rel_little90s = make_kwic('little', song['tokens'])
        kwic_little_90s.extend(kwic_rel_little90s)
        
print(f'"little" occurs {len(kwic_little_90s)} times in your lyrics')

#sort and view them
little_sorted_90s = sort_kwic(kwic_little_90s, ['L1','R1'])
print_kwic(little_sorted_90s)


"little" occurs 45 times in your lyrics
                      about livin' and a  little  about love aw haw
                      about livin' and a  little  about love well way
                      about livin' and a  little  about love a lot
                      about livin' and a  little  about love yeaheee that's
                      down mama dabbed a  little  bit of perfume on
                      sofa she'll move a  little  closer she can't get
                        asphalt we got a  little  crazy but we never
                        asphalt we got a  little  crazy but we never
                     a little reckless a  little  desperate but i think
                     two months bought a  little  diamond tonight’s the night
                   prerogative to have a  little  fun and ohohoh go
                   prerogative to have a  little  fun and ohohoh go
                   prerogative to have a  little  fun fun fun ohohoh
                       eight years old a  littl

In [51]:
##little collocation 90s
little_90s_colls = Counter()
for song in all_charts['all_90s']:
    little_90s_colls.update(collocates(tokenize(song['lyrics']), 'little', [3, 3]))

In [52]:
little_90s_colls.most_common(25)

[('a', 23),
 ('good-byes', 11),
 ('My', 8),
 ('and', 6),
 ('But', 5),
 ('[Chorus', 5),
 ('the', 5),
 ('have', 4),
 ("livin'", 4),
 ('about', 4),
 ('love', 4),
 ('we', 3),
 ('to', 3),
 ('fun', 3),
 ("I'm", 3),
 ('Took', 3),
 ('dance', 3),
 ('make', 3),
 ('romance', 3),
 ('Honey', 3),
 ("it'll", 3),
 ('is', 3),
 ("'bout", 3),
 ('that', 3),
 ("gal's", 3)]

Setting aside function words such as "a", "little" is used most frequently around "good-byes" (11), "love" (4), and "romance" (3).

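Function words dominate the top of the collocate list, so filtering them out can make the content words easier to see. The sketch below uses a small illustrative stop list and a handful of counts copied from the `all_90s` output above; `content_collocates` is a hypothetical helper, and the notebook's own `stop_words` list could be swapped in for `toy_stops`.

```python
from collections import Counter

# Small illustrative stop list (an assumption for this sketch).
toy_stops = {"a", "my", "and", "but", "the", "have", "about"}


def content_collocates(counts, stops):
    """Drop collocates whose lowercased form is in the stop list."""
    return Counter({w: n for w, n in counts.items() if w.lower() not in stops})


# A subset of the "little" collocate counts shown for all_90s.
raw = Counter({"a": 23, "good-byes": 11, "My": 8, "and": 6,
               "But": 5, "the": 5, "love": 4, "romance": 3})
filtered = content_collocates(raw, toy_stops)
print(filtered.most_common())
```

Lowercasing before the lookup matters here because the collocate list keeps capitalized forms such as "My" and "But" as separate entries.
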
## All 2010s - "Little"

In [54]:
#create list of KWIC concordance lines of "little" from 'all_2010s'
kwic_little_2010s = []
for song in all_charts['all_2010s']:
    if song.get('tokens'):
        kwic_rel_little2010s = make_kwic('little', song['tokens'])
        kwic_little_2010s.extend(kwic_rel_little2010s)
        
print(f'"little" occurs {len(kwic_little_2010s)} times in your lyrics')

#sort and view them
little_sorted_2010s = sort_kwic(kwic_little_2010s, ['L1','R1'])
print_kwic(little_sorted_2010s)

"little" occurs 62 times in your lyrics
                                          little  kid in a small
                            face i got a  little  bit stronger riding in
                        it i'm getting a  little  bit stronger just a
                     bit stronger just a  little  bit stronger and i'm
                            days i get a  little  bit stronger it doesn't
                            days i get a  little  bit stronger getting along
                            days i get a  little  bit stronger and just
                     stronger and just a  little  bit stronger a little
                   little bit stronger a  little  bit a little bit
                          a little bit a  little  bit a little bit
                          a little bit a  little  bit stronger get a
                      bit stronger get a  little  bit stronger  
                         the grain and a  little  bit of money we
                      house might have a  little  di

In [55]:
##little collocation 2010s
little_2010s_colls = Counter()
for song in all_charts['all_2010s']:
    little_2010s_colls.update(collocates(tokenize(song['lyrics']), 'little', [3, 3]))

In [56]:
little_2010s_colls.most_common(25)

[('a', 35),
 ('bit', 15),
 ('thing', 14),
 ('I', 14),
 ('on', 14),
 ('my', 14),
 ('stronger', 12),
 ('remember', 11),
 ('dirt', 11),
 ('Every', 10),
 ('Got', 9),
 ('every', 8),
 ('Might', 8),
 ('have', 8),
 ('A', 7),
 ('get', 6),
 ('[Chorus]', 6),
 ('the', 6),
 ('The', 5),
 ('high,', 5),
 ('sting', 5),
 ('little', 4),
 ('middle', 4),
 ('farm', 4),
 ('town', 4)]

The words most frequently used around "little" are "a" (as in "a little" X), "bit" (as in "a little bit"), and "thing" (as in "every little thing").  
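
The collocate list counts "a" and "A", and "Every" and "every", as separate items, and it keeps punctuation attached to words such as "high,". Merging those variants gives a cleaner ranking. This sketch reuses the `char_to_strip` characters defined earlier and a few counts copied from the output above; `normalize_counts` is a hypothetical helper.

```python
from collections import Counter

char_to_strip = '.,!][?;$"-()'  # same characters as defined earlier in the notebook


def normalize_counts(counts):
    """Merge collocates that differ only by case or surrounding punctuation."""
    merged = Counter()
    for word, n in counts.items():
        cleaned = word.strip(char_to_strip).lower()
        if cleaned:  # skip entries that were punctuation only
            merged[cleaned] += n
    return merged


# A subset of the "little" collocate counts shown for all_2010s.
raw = Counter({"a": 35, "A": 7, "Every": 10, "every": 8,
               "[Chorus]": 6, "high,": 5})
print(normalize_counts(raw).most_common())
```

This only tidies the counts after the fact. It does not change which occurrences of the keyword were matched, so normalizing the tokens before calling `collocates` may be the more thorough option.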

## All Female - "Little"

In [59]:
#create list of KWIC concordance lines of "little" from 'all_female'
kwic_little_f = []
for song in all_charts['all_female']:
    if song.get('tokens'):
        kwic_rel_little_f = make_kwic('little', song['tokens'])
        kwic_little_f.extend(kwic_rel_little_f)
        
print(f'"little" occurs {len(kwic_little_f)} times in your lyrics')

#sort and view them
little_sorted_f = sort_kwic(kwic_little_f, ['L1','R1'])
print_kwic(little_sorted_f)

"little" occurs 64 times in your lyrics
                                          little  kid in a small
                      down mama dabbed a  little  bit of perfume on
                            face i got a  little  bit stronger riding in
                        it i'm getting a  little  bit stronger just a
                     bit stronger just a  little  bit stronger and i'm
                            days i get a  little  bit stronger it doesn't
                            days i get a  little  bit stronger getting along
                            days i get a  little  bit stronger and just
                     stronger and just a  little  bit stronger a little
                   little bit stronger a  little  bit a little bit
                          a little bit a  little  bit a little bit
                          a little bit a  little  bit stronger get a
                      bit stronger get a  little  bit stronger  
                     a little reckless a  little  

In [61]:
##little collocation all female
little_f_colls = Counter()
for song in all_charts['all_female']:
    little_f_colls.update(collocates(tokenize(song['lyrics']), 'little', [3, 3]))

In [62]:
little_f_colls.most_common(25)

[('a', 26),
 ('I', 16),
 ('bit', 15),
 ('thing', 14),
 ('stronger', 12),
 ('good-byes', 11),
 ('remember', 11),
 ('every', 10),
 ('My', 8),
 ('to', 6),
 ('[Chorus]', 6),
 ('the', 6),
 ('little', 6),
 ('get', 6),
 ('Every', 6),
 ('A', 6),
 ('[Chorus', 5),
 ("I'm", 5),
 ('The', 5),
 ('high,', 5),
 ('sting', 5),
 ('have', 4),
 ('Now', 4),
 ('Get', 3),
 ('fun', 3)]

Again, the words that most frequently surround "little" are "a" (as in "a little" X), "bit" (as in "a little bit"), and "thing" (as in "every little thing"). In this `all_female` chart, we also see references to "good-byes" and "remember".

## All Male - "Little"

In [65]:
#create list of KWIC concordance lines of "little" from 'all_male'
kwic_little_m = []
for song in all_charts['all_male']:
    if song.get('tokens'):
        kwic_rel_little_m = make_kwic('little', song['tokens'])
        kwic_little_m.extend(kwic_rel_little_m)
        
print(f'"little" occurs {len(kwic_little_m)} times in your lyrics')

#sort and view them
little_sorted_m = sort_kwic(kwic_little_m, ['L1','R1'])
print_kwic(little_sorted_m)

"little" occurs 43 times in your lyrics
                      about livin' and a  little  about love aw haw
                      about livin' and a  little  about love well way
                      about livin' and a  little  about love a lot
                      about livin' and a  little  about love yeaheee that's
                         the grain and a  little  bit of money we
                      sofa she'll move a  little  closer she can't get
                        asphalt we got a  little  crazy but we never
                        asphalt we got a  little  crazy but we never
                     two months bought a  little  diamond tonight’s the night
                      house might have a  little  dirt on my boots
                     lights might have a  little  dirt on my boots
                      them tonight got a  little  dirt on my boots
                      porch might have a  little  dirt on my boots
                     lights might have a  little  dirt on 

In [66]:
##little collocation all male
little_m_colls = Counter()
for song in all_charts['all_male']:
    little_m_colls.update(collocates(tokenize(song['lyrics']), 'little', [3, 3]))

In [67]:
little_m_colls.most_common(25)

[('a', 32),
 ('on', 14),
 ('my', 14),
 ('dirt', 11),
 ('Got', 9),
 ('Might', 8),
 ('have', 8),
 ('and', 6),
 ('the', 5),
 ("livin'", 4),
 ('about', 4),
 ('love', 4),
 ('middle', 4),
 ('Every', 4),
 ('farm', 4),
 ('town', 4),
 ('with', 4),
 ('A', 3),
 ('dance', 3),
 ('make', 3),
 ('romance', 3),
 ('Honey', 3),
 ("it'll", 3),
 ('is', 3),
 ("'bout", 3)]

With the `all_male` chart, there is much more lyrical variety around the word "little". More specifically, words such as "dance", "dirt", "love", "farm", "town", and "romance" surround "little", making its collocates much more distinctive than those in the `all_female` corpus.
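
A comparison like this can be made explicit by listing the collocates that appear in one corpus's top results but not the other's. The sketch below uses a few counts copied from the `all_male` and `all_female` outputs above; `distinctive` is a hypothetical helper, and the `top` cutoff of 25 simply mirrors the `most_common(25)` calls.

```python
from collections import Counter


def distinctive(a, b, top=25):
    """Return (only in a's top list, only in b's top list), each sorted alphabetically."""
    a_top = {w for w, _ in a.most_common(top)}
    b_top = {w for w, _ in b.most_common(top)}
    return sorted(a_top - b_top), sorted(b_top - a_top)


# Subsets of the "little" collocate counts shown above.
male = Counter({"a": 32, "dirt": 11, "farm": 4, "town": 4})
female = Counter({"a": 26, "bit": 15, "thing": 14, "stronger": 12})
only_m, only_f = distinctive(male, female)
print("only in male top list:", only_m)
print("only in female top list:", only_f)
```

Passing `little_m_colls` and `little_f_colls` directly would apply the same comparison to the full collocate counts.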