## Data Descriptives and Keyness Analysis of Pop and Hyperpop

This notebook builds dictionaries of lyrics for a set of pop albums and a set of hyperpop albums, along with word counters for both groups. It first reports data descriptives, such as the most common words and the total token count of each group, and then runs a keyness analysis comparing the pop and hyperpop lyrics.

### Corpus Organization

The code blocks below take the lyric files from the "data" folder and turn them into a dictionary of counters. First, they tokenize each file, stripping stopwords and non-alphanumeric symbols, and store the resulting token list under the artist's short name. Pop albums occupy the first ten positions of each list and hyperpop albums the last ten, so the list index is enough to tell the two groups apart. Finally, each artist's token list is converted into a counter.

In [1]:
import os
%run functions.ipynb
import pandas as pd
import re
import json
import requests
import random
from bs4 import BeautifulSoup
import lyricsgenius
from collections import Counter
from nltk.corpus import stopwords

In [2]:
albums = ['When we all fall asleep where do we go','Thank U Next', 'Cuz I love you', 'Igor', 'Norman Fucking Rockwell', 'Lover', 'Future Nostalgia', 'Chromatica', 'Beauty behind the madness', 'divide', 'Charli', '1000gecs', "Apple", "Flamboyant", "Alias", "Slayyyter", "Pang", "Reflections", "Product", "GFOTYBUCKS"]
artists = ['Billie Eilish', 'Ariana Grande', 'Lizzo', 'Tyler the Creator', 'Lana Del Rey', 'Taylor Swift', 'Dua Lipa', 'Lady Gaga', 'the weeknd', 'ed sheeran', 'Charli XCX', '100 gecs', "A. G. Cook", "Dorian Electra", "Shygirl", "Slayyyter", "Caroline Polachek", "Hannah Diamond", "SOPHIE", "GFOTY"]
ez_artists = ['billie', 'ariana', 'lizzo', 'tyler', 'lana', 'taylor', 'dua', 'gaga', 'weeknd', 'ed', 'charli', 'gecs', 'ag', 'dorian', 'shygirl', 'slayyyter', 'caroline', 'hannah', 'sophie', 'gfoty']
HYPERPOP_INDEX = 10  # Hyperpop albums start at index 10 in the lists above.

In [3]:
def tokenize_albums(album, hyperpop=False):
    # Pop and hyperpop lyric files live in separate folders.
    tmp_dir = '../data/hyperpop/' if hyperpop else '../data/pop/'
    # Read the file inside a context manager so the handle is always closed.
    with open(tmp_dir + album + '_lyrics_stripped.txt') as lyric_file:
        raw_lyrics = lyric_file.read()
    album_lyrics = tokenize(raw_lyrics, True, '"#“.${):?-—”!,/’~;(}\'')
    stop_words = set(stopwords.words('english'))
    # Keep only the tokens that are not English stopwords.
    return [lyric for lyric in album_lyrics if lyric not in stop_words]

In [4]:
def counter_creator(lyrics):
    # Counter accepts any iterable of tokens directly.
    return Counter(lyrics)

In [5]:
lyrics_dict = {}
for index, (album, artist) in enumerate(zip(albums, ez_artists)):
    # Albums from HYPERPOP_INDEX onward are hyperpop.
    lyrics_dict[artist] = tokenize_albums(album, hyperpop=index >= HYPERPOP_INDEX)

In [6]:
counter_dict = {}
for key in lyrics_dict:
    counter_dict[key] = counter_creator(lyrics_dict[key])
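
The `tokenize` helper called above comes from `functions.ipynb`, which is not shown in this notebook. The sketch below is only a guess at what such a helper might do, assuming its arguments mean "lowercase the text" and "delete these characters"; `tokenize_sketch` and the tiny stopword set are hypothetical stand-ins, not the notebook's actual code.

```python
from collections import Counter


def tokenize_sketch(text, lowercase=True, strip_chars=''):
    """Hypothetical stand-in for the `tokenize` helper in functions.ipynb."""
    if lowercase:
        text = text.lower()
    # Delete the listed characters outright, so "I'm" becomes "im".
    text = text.translate(str.maketrans('', '', strip_chars))
    # Split on whitespace; str.split() never returns empty strings.
    return text.split()


# A small hand-written set stands in for NLTK's English stopword list.
stop_words = {'i', 'in', 'you', 'the', 'and', 'me'}

line = "I'm in love, and you don't know me!"
tokens = [t for t in tokenize_sketch(line, True, "'!,") if t not in stop_words]
counts = Counter(tokens)
print(tokens)
```

If the real helper behaves similarly, it would explain why forms such as "im" and "dont" survive stopword filtering: once the apostrophe is deleted, they no longer match apostrophe-bearing stopword entries such as "don't".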

### Pop Data Descriptives

The code block below builds the pop lyrics list, which contains every token in the pop lyrics (with stopwords removed). It then creates a counter and displays the 10 most common words in the analyzed pop albums.

In [7]:
pop_lyrics = []
for index in range(0, HYPERPOP_INDEX):
    pop_lyrics += lyrics_dict[ez_artists[index]]
    
print(len(pop_lyrics))

pop_counter = counter_creator(pop_lyrics)
pop_counter.most_common(10)

26248


[('im', 678),
 ('yeah', 586),
 ('love', 539),
 ('dont', 436),
 ('know', 396),
 ('like', 390),
 ('oh', 326),
 ('got', 265),
 ('one', 234),
 ('cause', 227)]

The results above are not surprising: most of them are "filler" words such as "like" or "oh", typical of pop music. One interesting observation is "love", the only word in the top 10 that carries any specific sentiment.

### Hyperpop Data Descriptives

The code block below builds the hyperpop lyrics list, which contains every token in the hyperpop lyrics (with stopwords removed). It then creates a counter and displays the 10 most common words in the analyzed hyperpop albums.

In [8]:
hyperpop_lyrics = []
for index in range(HYPERPOP_INDEX, len(albums)):
    hyperpop_lyrics += lyrics_dict[ez_artists[index]]
    
print(len(hyperpop_lyrics))

hyperpop_counter = counter_creator(hyperpop_lyrics)
hyperpop_counter.most_common(10)

18247


[('like', 380),
 ('im', 362),
 ('yeah', 338),
 ('dont', 331),
 ('know', 324),
 ('get', 298),
 ('go', 236),
 ('oh', 226),
 ('want', 212),
 ('wanna', 209)]

This is an interesting contrast with the pop top 10. For instance, "love" does not appear here. Instead, "want" and "wanna" (which express the same sentiment) are the most prominent sentiment-bearing words in this list.
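
Because the two corpora differ in size (26,248 pop tokens versus 18,247 hyperpop tokens), raw counts are not directly comparable across the two top-10 lists. A minimal sketch of one way to normalize them is shown here; `per_thousand` is a hypothetical helper, and the hyperpop count for "love" (172) is taken from the keyness table later in this notebook.

```python
def per_thousand(count, corpus_size):
    """Occurrences of a word per 1,000 tokens in its corpus."""
    return 1000 * count / corpus_size


POP_TOKENS = 26248       # len(pop_lyrics), printed above
HYPERPOP_TOKENS = 18247  # len(hyperpop_lyrics), printed above

love_pop = per_thousand(539, POP_TOKENS)
love_hyperpop = per_thousand(172, HYPERPOP_TOKENS)  # count from the keyness table
like_pop = per_thousand(390, POP_TOKENS)
like_hyperpop = per_thousand(380, HYPERPOP_TOKENS)

print(f"love: pop {love_pop:.1f}, hyperpop {love_hyperpop:.1f}")
print(f"like: pop {like_pop:.1f}, hyperpop {like_hyperpop:.1f}")
```

On this scale, "love" occurs roughly 20.5 times per 1,000 tokens in pop against about 9.4 in hyperpop, while "like" is relatively more frequent in hyperpop even though its raw count is slightly lower there.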

### General Data Descriptives

The code below combines the pop and hyperpop tokens into a single list and reports the same descriptives for the full corpus.

In [9]:
all_lyrics = hyperpop_lyrics + pop_lyrics
print(len(all_lyrics))

lyrics_counter = counter_creator(all_lyrics)
lyrics_counter.most_common(10)

44495


[('im', 1040),
 ('yeah', 924),
 ('like', 770),
 ('dont', 767),
 ('know', 720),
 ('love', 711),
 ('oh', 552),
 ('get', 472),
 ('got', 424),
 ('want', 414)]

### Hyperpop vs. Pop Keyness

The code below analyzes words that appear more in hyperpop than in pop.

In [10]:
calculate_keyness(hyperpop_counter, pop_counter, top=50)

WORD                     Corpus A Freq.  Corpus B Freq.  Keyness
body                     142       45        94.272
get                      298       174       93.561
let                      162       63        88.478
close                    83        13        85.552
going                    74        14        69.586
hard                     83        23        61.358
drink                    58        9         60.032
feel                     182       109       54.617
big                      58        11        54.469
turn                     55        11        50.189
go                       236       174       45.430
money                    59        16        44.321
youve                    49        10        44.211
running                  60        18        41.694
done                     53        14        40.580
door                     37        5         40.578
lets                     80        41        30.955
wanna                    209       173       29.043
mak

In this keyness analysis of hyperpop vs. pop, words like "body", "drink", "money", "running", and even "tongue" are clearly used more in hyperpop. The words "f * ck" and "f * cking" are also used more in hyperpop, which makes sense because hyperpop music rarely achieves radio play. The top-ranked words here also tend to have higher keyness scores than the top-ranked words in the pop vs. hyperpop analysis, which suggests that hyperpop's characteristic words are overrepresented relative to pop more strongly than pop's characteristic words are overrepresented relative to hyperpop.
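
The `calculate_keyness` function is defined in `functions.ipynb` and is not shown here, so its exact method is an assumption. The scores it prints do, however, appear consistent with the log-likelihood (G2) statistic commonly used for keyness. A minimal sketch of that statistic for a single word follows, with `log_likelihood` as a hypothetical name and the corpus sizes taken from the token counts printed earlier.

```python
import math


def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Log-likelihood (G2) keyness of one word across two corpora."""
    total = freq_a + freq_b
    # Expected counts if the word were equally likely in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


HYPERPOP_TOKENS = 18247
POP_TOKENS = 26248

# "body": 142 occurrences in hyperpop, 45 in pop.
body = log_likelihood(142, 45, HYPERPOP_TOKENS, POP_TOKENS)
# "love": 539 occurrences in pop, 172 in hyperpop.
love = log_likelihood(539, 172, POP_TOKENS, HYPERPOP_TOKENS)
print(round(body, 3), round(love, 3))
```

Both values should land close to the scores in the printed tables (94.272 for "body" and 88.831 for "love"). Note that G2 on its own is symmetric in direction, so the function presumably also filters for words whose relative frequency is higher in the first corpus.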

### Pop vs. Hyperpop Keyness

The code below analyzes words that appear more in pop than in hyperpop.

In [11]:
calculate_keyness(pop_counter, hyperpop_counter, top=50)

WORD                     Corpus A Freq.  Corpus B Freq.  Keyness
love                     539       172       88.831
aint                     115       18        48.035
id                       75        10        35.420
rain                     61        8         29.143
ima                      73        13        27.181
bad                      96        22        27.035
life                     98        24        25.251
girl                     121       34        25.250
break                    46        5         24.753
come                     160       53        24.374
dance                    55        8         24.361
well                     55        8         24.361
ill                      212       82        21.917
cause                    227       90        21.812
real                     86        24        18.154
im                       678       362       16.840
hope                     54        12        15.808
even                     54        12        15.808
wor

In this keyness analysis of pop vs. hyperpop, the only word used far more in pop than in hyperpop is "love" (keyness 88.831); the next highest, "aint", scores only 48.035. This is interesting because love is, in a sense, easily contrasted with party culture, which often emphasizes the ideal of casual sex. The word "dance" could be considered a partying word, but it is relatively tame compared to "drink" or "body". It is also interesting to note that "boy" is used more in hyperpop, while "girl" is used more in pop.