# Song Lyrics Analysis using spaCy
## Luiza Filip (s5183685)


### Introduction

### Dataset: Sleep Token Lyrics (2019–2025)

This project uses a small corpus of lyrics from six Sleep Token songs. The first three tracks, Blood Sports, Dark Signs, and Sugar, come from the band’s 2019 album Sundowning, while the latter three, Dangerous, Ghestomane, and Look at Windward, are drawn from their most recent album, Even in Arcadia, released in 2025. All lyrics are in English. The goal of this dataset is to explore how Sleep Token’s lyrical style may have shifted over time, and to identify distinctive linguistic patterns across songs. The lyrics were taken from the website Genius.

### Possible Research Questions

Because the dataset reflects two different creative periods in the band’s discography, it opens several potential avenues for linguistic or stylistic investigation. One research question that could be explored (especially with a larger corpus) is:

- How has Sleep Token’s lyrical complexity evolved between 2019 and 2025?



### Installing, Importing and Preprocessing

In [101]:
# Install English language model
!spacy download en_core_web_sm

# Import os to read the lyrics files from disk
import os

# Import spaCy and its visualizer
import spacy
from spacy import displacy
from IPython.display import display, HTML

# Import pandas DataFrame packages
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'

# Import graphing package
import plotly.express as px


In [102]:
# Create empty lists for file names and contents
texts = []
file_names = []

# Iterate through each file in the folder (sorted, since os.listdir returns files in arbitrary order)
for _file_name in sorted(os.listdir('txt_files_ST')):
    # Look for only text files
    if _file_name.endswith('.txt'):
        # Open each file with a context manager so it is closed after reading
        with open(os.path.join('txt_files_ST', _file_name), 'r', encoding='utf-8') as f:
            texts.append(f.read())
        file_names.append(_file_name)

In [103]:
# Create dictionary object associating each file name with its text
d = {'Filename':file_names,'Text':texts}

In [104]:
# Turn dictionary into a dataframe
paper_df = pd.DataFrame(d)
paper_df.head(6)

Unnamed: 0,Filename,Text
0,Song1.txt,\nI wanna roll the numbers\nI wanna feel my st...
1,Song2.txt,"\nWhere I was raised, there was no streetlight..."
2,Song3.txt,\nAnd you play a twisted little game\nBut I kn...
3,Song4.txt,\nI wish I could have known that\nLook in your...
4,Song5.txt,"\nI wanted you to know, I've learned to live w..."
5,Song6.txt,\nWill you listen just as my form starts to fi...


In [105]:
# Remove extra spaces from the files
paper_df['Text'] = paper_df['Text'].str.replace(r'\s+', ' ', regex=True).str.strip()
paper_df.head(6)

Unnamed: 0,Filename,Text
0,Song1.txt,I wanna roll the numbers I wanna feel my stars...
1,Song2.txt,"Where I was raised, there was no streetlights ..."
2,Song3.txt,And you play a twisted little game But I know ...
3,Song4.txt,I wish I could have known that Look in your ey...
4,Song5.txt,"I wanted you to know, I've learned to live wit..."
5,Song6.txt,Will you listen just as my form starts to fiss...


In [106]:
# Load metadata.
metadata_df = pd.read_csv('metadata_ST.csv')
metadata_df.head(6)

Unnamed: 0,FILENAME,TITLE,ALBUM,ARTIST,YEAR
0,Song1,Blood Sports,Sundowning,Sleep Token,2019
1,Song2,Dark Signs,Sundowning,Sleep Token,2019
2,Song3,Sugar,Sundowning,Sleep Token,2019
3,Song4,Dangerous,Even in Arcadia,Sleep Token,2025
4,Song5,Ghestomane,Even in Arcadia,Sleep Token,2025
5,Song6,Look at Windward,Even in Arcadia,Sleep Token,2025


In [107]:
# Remove the .txt extension from each file name (literal match, not a regex)
paper_df['Filename'] = paper_df['Filename'].str.replace('.txt', '', regex=False)

# Rename the FILENAME column to Filename so it matches paper_df
metadata_df.rename(columns={"FILENAME": "Filename"}, inplace=True)

# Merge metadata and papers into new DataFrame
final_paper_df = metadata_df.merge(paper_df,on='Filename')
final_paper_df.head(6)

Unnamed: 0,Filename,TITLE,ALBUM,ARTIST,YEAR,Text
0,Song1,Blood Sports,Sundowning,Sleep Token,2019,I wanna roll the numbers I wanna feel my stars...
1,Song2,Dark Signs,Sundowning,Sleep Token,2019,"Where I was raised, there was no streetlights ..."
2,Song3,Sugar,Sundowning,Sleep Token,2019,And you play a twisted little game But I know ...
3,Song4,Dangerous,Even in Arcadia,Sleep Token,2025,I wish I could have known that Look in your ey...
4,Song5,Ghestomane,Even in Arcadia,Sleep Token,2025,"I wanted you to know, I've learned to live wit..."
5,Song6,Look at Windward,Even in Arcadia,Sleep Token,2025,Will you listen just as my form starts to fiss...
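
A merge on `Filename` silently drops any row that has no match on both sides, so it can be worth checking that every lyrics file found its metadata row. The sketch below is one possible check, using two small made-up DataFrames (the `Song7` entry and the placeholder texts are hypothetical and not part of the corpus) and the `indicator=True` option of pandas' `merge`.

```python
import pandas as pd

# Toy stand-ins for metadata_df and paper_df; the values are illustrative only
metadata = pd.DataFrame({"Filename": ["Song1", "Song2"],
                         "TITLE": ["Blood Sports", "Dark Signs"]})
lyrics = pd.DataFrame({"Filename": ["Song1", "Song2", "Song7"],
                       "Text": ["lyrics one", "lyrics two", "lyrics seven"]})

# An outer merge keeps unmatched rows; indicator=True adds a '_merge' column
check = metadata.merge(lyrics, on="Filename", how="outer", indicator=True)

# Rows that appear on only one side point to a naming mismatch
unmatched = check.loc[check["_merge"] != "both", "Filename"].tolist()
print(unmatched)
```

An empty list would mean that every file name matched a metadata row.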


In [108]:
# Load nlp pipeline
nlp = spacy.load('en_core_web_sm')
# Check what functions it performs
print(nlp.pipe_names)

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']


In [109]:
# Define a function that runs the nlp pipeline on any given input text
def process_text(text):
    return nlp(text)

In [110]:
# Apply the function to the "Text" column, so that the nlp pipeline is called on each song's lyrics
final_paper_df['Doc'] = final_paper_df['Text'].apply(process_text)

### Text Reduction

#### Tokenization

A critical first step spaCy performs is tokenization, or the segmentation of strings into individual words and punctuation markers. Tokenization enables spaCy to parse the grammatical structures of a text and identify characteristics of each word, such as its part of speech.

To retrieve a tokenized version of each text in the DataFrame, we’ll write a function that iterates through any given Doc object and returns all tokens found within it.

In [111]:
# Define a function to retrieve tokens from a doc object
def get_token(doc):
    return [(token.text) for token in doc]

In [112]:
# Run the token retrieval function on the doc objects in the dataframe
final_paper_df['Tokens'] = final_paper_df['Doc'].apply(get_token)
final_paper_df.head(6)

Unnamed: 0,Filename,TITLE,ALBUM,ARTIST,YEAR,Text,Doc,Tokens
0,Song1,Blood Sports,Sundowning,Sleep Token,2019,I wanna roll the numbers I wanna feel my stars...,"(I, wanna, roll, the, numbers, I, wanna, feel,...","[I, wanna, roll, the, numbers, I, wanna, feel,..."
1,Song2,Dark Signs,Sundowning,Sleep Token,2019,"Where I was raised, there was no streetlights ...","(Where, I, was, raised, ,, there, was, no, str...","[Where, I, was, raised, ,, there, was, no, str..."
2,Song3,Sugar,Sundowning,Sleep Token,2019,And you play a twisted little game But I know ...,"(And, you, play, a, twisted, little, game, But...","[And, you, play, a, twisted, little, game, But..."
3,Song4,Dangerous,Even in Arcadia,Sleep Token,2025,I wish I could have known that Look in your ey...,"(I, wish, I, could, have, known, that, Look, i...","[I, wish, I, could, have, known, that, Look, i..."
4,Song5,Ghestomane,Even in Arcadia,Sleep Token,2025,"I wanted you to know, I've learned to live wit...","(I, wanted, you, to, know, ,, I, 've, learned,...","[I, wanted, you, to, know, ,, I, 've, learned,..."
5,Song6,Look at Windward,Even in Arcadia,Sleep Token,2025,Will you listen just as my form starts to fiss...,"(Will, you, listen, just, as, my, form, starts...","[Will, you, listen, just, as, my, form, starts..."


In [113]:
tokens = final_paper_df[['Text', 'Tokens']].copy()
tokens.head(6)

Unnamed: 0,Text,Tokens
0,I wanna roll the numbers I wanna feel my stars...,"[I, wanna, roll, the, numbers, I, wanna, feel,..."
1,"Where I was raised, there was no streetlights ...","[Where, I, was, raised, ,, there, was, no, str..."
2,And you play a twisted little game But I know ...,"[And, you, play, a, twisted, little, game, But..."
3,I wish I could have known that Look in your ey...,"[I, wish, I, could, have, known, that, Look, i..."
4,"I wanted you to know, I've learned to live wit...","[I, wanted, you, to, know, ,, I, 've, learned,..."
5,Will you listen just as my form starts to fiss...,"[Will, you, listen, just, as, my, form, starts..."
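
The token lists above show that contractions are split into separate tokens (for example `I` and `'ve`). As a minimal sketch of this behaviour, the snippet below tokenizes a short phrase from the opening of Song5. It assumes only that spaCy is installed and uses `spacy.blank("en")`, a tokenizer-only pipeline, instead of the `en_core_web_sm` model loaded earlier; the variable names are illustrative.

```python
import spacy

# A blank English pipeline contains only the tokenizer, so no model download is needed
blank_nlp = spacy.blank("en")

# Short phrase taken from the opening of Song5 as shown in the table above
doc_example = blank_nlp("I've learned to live")

# Collect the text of each token, as get_token does
token_texts = [token.text for token in doc_example]
print(token_texts)
```

Splitting contractions this way matters for later counts, because `'ve` is tagged and lemmatized as its own token.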


#### Lemmatization
Another process performed by spaCy is lemmatization, or the retrieval of the dictionary root word of each word (for example “brighten” for “brightening”). We’ll perform a similar set of steps to those above to create a function to call the lemmas from the Doc object, then apply it to the DataFrame.

In [114]:
# Define a function to retrieve lemmas from a doc object
def get_lemma(doc):
    return [(token.lemma_) for token in doc]

# Run the lemma retrieval function on the doc objects in the dataframe
final_paper_df['Lemmas'] = final_paper_df['Doc'].apply(get_lemma)

Lemmatization can help reduce noise and refine results for researchers who are conducting keyword searches. For example, let’s compare counts of the word “know” in the original Tokens column and in the lemmatized Lemmas column.

In [115]:
# Count occurrences of "know" in each column, then report both totals
token_count = final_paper_df['Tokens'].apply(lambda x: x.count('know')).sum()
lemma_count = final_paper_df['Lemmas'].apply(lambda x: x.count('know')).sum()
print(f'"know" appears in the text tokens column {token_count} times.')
print(f'"know" appears in the lemmas column {lemma_count} times.')

"know" appears in the text tokens column 13 times.
"know" appears in the lemmas column 15 times.


As expected, there are more instances of “know” in the Lemmas column, as the lemmatization process has grouped inflected word forms (knew, known, etc.) into the base word “know”.
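
To see which surface forms were grouped under a lemma, the tokens and lemmas of a text can be compared pairwise. The sketch below uses two short hypothetical lists in the same shape as one row of the Tokens and Lemmas columns; they are made up for illustration and are not taken from the lyrics.

```python
from collections import Counter

# Hypothetical token/lemma lists in the same shape as the Tokens and Lemmas columns
tokens = ["I", "know", "you", "knew", "what", "I", "have", "known"]
lemmas = ["I", "know", "you", "know", "what", "I", "have", "know"]

# Count the surface forms whose lemma is "know"
forms = Counter(tok.lower() for tok, lem in zip(tokens, lemmas) if lem == "know")
print(forms)
```

Applied to a real row, the same comparison would show which inflected forms account for the difference between the two counts.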

### Text Annotation

#### Part of Speech Tagging

spaCy facilitates two levels of part-of-speech tagging: coarse-grained tagging, which predicts the simple universal part-of-speech of each token in a text (such as noun, verb, adjective, adverb), and detailed tagging, which uses a larger, more fine-grained set of part-of-speech tags (for example 3rd person singular present verb). The part-of-speech tags used are determined by the English language model we use. In this case, we’re using the small English model, and you can explore the differences between the models on spaCy’s website.

We can call the part-of-speech tags in the same way as the lemmas. Create a function to extract them from any given Doc object and apply the function to each Doc object in the DataFrame. The function we’ll create will extract both the coarse- and fine-grained part-of-speech for each token (token.pos_ and token.tag_, respectively).

In [116]:
# Define a function to retrieve parts of speech from a doc object
def get_pos(doc):
    # Return the coarse- and fine-grained part of speech text for each token in the doc
    return [(token.pos_, token.tag_) for token in doc]

# Run the part-of-speech retrieval function on the doc objects in the dataframe
final_paper_df['POS'] = final_paper_df['Doc'].apply(get_pos)

In [117]:
# Create a list of part of speech tags
list(final_paper_df['POS'])

[[('PRON', 'PRP'),
  ('VERB', 'VBP'),
  ('VERB', 'VBP'),
  ('DET', 'DT'),
  ('NOUN', 'NNS'),
  ('PRON', 'PRP'),
  ('VERB', 'VBP'),
  ('VERB', 'VBP'),
  ('PRON', 'PRP$'),
  ('NOUN', 'NNS'),
  ('VERB', 'VB'),
  ('ADV', 'RB'),
  ('ADV', 'RB'),
  ('SCONJ', 'IN'),
  ('DET', 'DT'),
  ('NOUN', 'NN'),
  ('VERB', 'VBZ'),
  ('ADP', 'IN'),
  ('VERB', 'VBN'),
  ('NOUN', 'NN'),
  ('CCONJ', 'CC'),
  ('DET', 'DT'),
  ('NOUN', 'NNS'),
  ('ADV', 'RB'),
  ('AUX', 'MD'),
  ('PART', 'RB'),
  ('VERB', 'VB'),
  ('ADP', 'RP'),
  ('ADP', 'IN'),
  ('PRON', 'PRP'),
  ('AUX', 'MD'),
  ('PRON', 'PRP'),
  ('VERB', 'VB'),
  ('PRON', 'PRP'),
  ('ADP', 'RP'),
  ('ADV', 'RB'),
  ('PUNCT', '.'),
  ('AUX', 'MD'),
  ('PART', 'RB'),
  ('PRON', 'PRP'),
  ('VERB', 'VB'),
  ('ADP', 'IN'),
  ('PRON', 'PRP$'),
  ('NOUN', 'NN'),
  ('PUNCT', '.'),
  ('AUX', 'MD'),
  ('PART', 'RB'),
  ('PRON', 'PRP'),
  ('VERB', 'VB'),
  ('PRON', 'PRP'),
  ('PRON', 'PRP$'),
  ('NOUN', 'NN'),
  ('PUNCT', '.'),
  ('PRON', 'PRP'),
  ('VERB', 'VBD'),

In [131]:
spacy.explain("IN")

'conjunction, subordinating or preposition'
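
A long list of tag pairs is hard to read directly. One way to summarise it, sketched here with the first five pairs from the output above as sample input, is to tally the coarse and fine tags separately with `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# First five (coarse, fine) pairs from the output above, used as sample input
pos_pairs = [("PRON", "PRP"), ("VERB", "VBP"), ("VERB", "VBP"),
             ("DET", "DT"), ("NOUN", "NNS")]

# Tally coarse-grained and fine-grained tags separately
coarse_counts = Counter(coarse for coarse, _ in pos_pairs)
fine_counts = Counter(fine for _, fine in pos_pairs)
print(coarse_counts.most_common(2))
```

The same two lines could be applied to any row of the POS column to get a per-song tally at either level of detail.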

#### Proper Nouns

In [119]:
# Define function to extract proper nouns from Doc object
def extract_proper_nouns(doc):
    return [token.text for token in doc if token.pos_ == 'PROPN']

# Apply function to Doc column and store resulting proper nouns in new column
final_paper_df['Proper_Nouns'] = final_paper_df['Doc'].apply(extract_proper_nouns)

Listing the proper nouns in each text can help us ascertain the texts’ subjects. Let’s list the proper nouns in two different texts, the text located in row 1 of the DataFrame and the text located in row 3.

In [120]:
list(final_paper_df.loc[[1, 3], 'Proper_Nouns'])

[['Omens', 'Alarm', 'oh', 'Alarm'], ['Whеn', 'Paradise']]

#### Named Entity Recognition

spaCy can tag named entities in the text, such as names, dates, organizations, and locations. Call the full list of named entities and their descriptions using this code:

In [121]:
# Get all NE labels and assign to variable
labels = nlp.get_pipe("ner").labels

# Print each label and its description
for label in labels:
    print(label + ' : ' + spacy.explain(label))

CARDINAL : Numerals that do not fall under another type
DATE : Absolute or relative dates or periods
EVENT : Named hurricanes, battles, wars, sports events, etc.
FAC : Buildings, airports, highways, bridges, etc.
GPE : Countries, cities, states
LANGUAGE : Any named language
LAW : Named documents made into laws.
LOC : Non-GPE locations, mountain ranges, bodies of water
MONEY : Monetary values, including unit
NORP : Nationalities or religious or political groups
ORDINAL : "first", "second", etc.
ORG : Companies, agencies, institutions, etc.
PERCENT : Percentage, including "%"
PERSON : People, including fictional
PRODUCT : Objects, vehicles, foods, etc. (not services)
QUANTITY : Measurements, as of weight or distance
TIME : Times smaller than a day
WORK_OF_ART : Titles of books, songs, etc.


We’ll create a function to extract the named entity tags from each Doc object and apply it to the Doc objects in the DataFrame, storing the named entities in a new column:

In [122]:
# Define function to extract named entities from doc objects
def extract_named_entities(doc):
    return [ent.label_ for ent in doc.ents]

# Apply function to Doc column and store resulting named entities in new column
final_paper_df['Named_Entities'] = final_paper_df['Doc'].apply(extract_named_entities)
final_paper_df['Named_Entities']

0                                             [PERSON]
1    [PERSON, PERSON, PERSON, PERSON, PERSON, GPE, ...
2    [PERSON, GPE, GPE, GPE, TIME, ORDINAL, GPE, GP...
3                                 [CARDINAL, ORG, LOC]
4                                                [ORG]
5                                      [CARDINAL, ORG]
Name: Named_Entities, dtype: object

In [123]:
# Define function to extract the text spans tagged as named entities from doc objects
# (named differently from the label function above so that it does not overwrite it)
def extract_ne_words(doc):
    return [ent for ent in doc.ents]

# Apply function to Doc column and store resulting text in new column
final_paper_df['NE_Words'] = final_paper_df['Doc'].apply(extract_ne_words)
final_paper_df['NE_Words']

0                                            [(Stuck)]
1    [(‚), (Omens), (‚), (Remain), (‚), (Alarm), (m...
2    [(Believe), (Sugar), (Sugar), (Sugar), (Tonigh...
3                          [(one), (Whеn), (Paradise)]
4                                        [(favouritе)]
5                                    [(half), (Sodom)]
Name: NE_Words, dtype: object
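
Reading the labels and the tagged words side by side makes questionable tags easier to spot. The sketch below pairs the two lists for row 5, with the values copied from the outputs above as plain strings; the variable names are illustrative.

```python
# Values copied from row 5 of the two columns above (entity texts as plain strings)
ne_labels = ["CARDINAL", "ORG"]
ne_words = ["half", "Sodom"]

# Pair each entity text with its predicted label
pairs = list(zip(ne_words, ne_labels))
print(pairs)
```

Here the pairing shows that “Sodom” was labelled as an organization, which suggests the small model’s entity labels should be read with caution on lyrics. Inside the pipeline, the same pairs can be collected in one pass with `[(ent.text, ent.label_) for ent in doc.ents]`.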

In [124]:
# Extract the Doc object at index 1 (the second song, Dark Signs)
doc = final_paper_df['Doc'][1]

# Visualize named entity tagging in a single song
# displacy.render(doc, style='ent', jupyter=True)  # left commented out: requires IPython >=8.0 and <9.0

#### Download Enriched Dataset
To save the dataset of doc objects, text reductions and linguistic annotations generated with spaCy, download the final_paper_df DataFrame to your local computer as a .csv file:

In [125]:
# Save DataFrame as a .csv file in the current working directory
final_paper_df.to_csv('annotated_data_ST.csv')
final_paper_df.head(6)


Unnamed: 0,Filename,TITLE,ALBUM,ARTIST,YEAR,Text,Doc,Tokens,Lemmas,POS,Proper_Nouns,Named_Entities,NE_Words
0,Song1,Blood Sports,Sundowning,Sleep Token,2019,I wanna roll the numbers I wanna feel my stars...,"(I, wanna, roll, the, numbers, I, wanna, feel,...","[I, wanna, roll, the, numbers, I, wanna, feel,...","[I, wanna, roll, the, number, I, wanna, feel, ...","[(PRON, PRP), (VERB, VBP), (VERB, VBP), (DET, ...",[],[PERSON],[(Stuck)]
1,Song2,Dark Signs,Sundowning,Sleep Token,2019,"Where I was raised, there was no streetlights ...","(Where, I, was, raised, ,, there, was, no, str...","[Where, I, was, raised, ,, there, was, no, str...","[where, I, be, raise, ,, there, be, no, street...","[(SCONJ, WRB), (PRON, PRP), (AUX, VBD), (VERB,...","[Omens, Alarm, oh, Alarm]","[PERSON, PERSON, PERSON, PERSON, PERSON, GPE, ...","[(‚), (Omens), (‚), (Remain), (‚), (Alarm), (m..."
2,Song3,Sugar,Sundowning,Sleep Token,2019,And you play a twisted little game But I know ...,"(And, you, play, a, twisted, little, game, But...","[And, you, play, a, twisted, little, game, But...","[and, you, play, a, twisted, little, game, but...","[(CCONJ, CC), (PRON, PRP), (VERB, VBP), (DET, ...","[Believe, Sugar, Sugar, Sugar, Sugar, Sugar, S...","[PERSON, GPE, GPE, GPE, TIME, ORDINAL, GPE, GP...","[(Believe), (Sugar), (Sugar), (Sugar), (Tonigh..."
3,Song4,Dangerous,Even in Arcadia,Sleep Token,2025,I wish I could have known that Look in your ey...,"(I, wish, I, could, have, known, that, Look, i...","[I, wish, I, could, have, known, that, Look, i...","[I, wish, I, could, have, know, that, look, in...","[(PRON, PRP), (VERB, VBP), (PRON, PRP), (AUX, ...","[Whеn, Paradise]","[CARDINAL, ORG, LOC]","[(one), (Whеn), (Paradise)]"
4,Song5,Ghestomane,Even in Arcadia,Sleep Token,2025,"I wanted you to know, I've learned to live wit...","(I, wanted, you, to, know, ,, I, 've, learned,...","[I, wanted, you, to, know, ,, I, 've, learned,...","[I, want, you, to, know, ,, I, have, learn, to...","[(PRON, PRP), (VERB, VBD), (PRON, PRP), (PART,...",[Thought],[ORG],[(favouritе)]
5,Song6,Look at Windward,Even in Arcadia,Sleep Token,2025,Will you listen just as my form starts to fiss...,"(Will, you, listen, just, as, my, form, starts...","[Will, you, listen, just, as, my, form, starts...","[will, you, listen, just, as, my, form, start,...","[(AUX, MD), (PRON, PRP), (VERB, VB), (ADV, RB)...","[god, Sodom, Bridge]","[CARDINAL, ORG]","[(half), (Sodom)]"


### Analysis of Linguistic Annotation
Why are spaCy’s linguistic annotations useful to researchers? Below is an example of how data about the corpus, produced through spaCy, can be used to draw conclusions about stylistic conventions in song lyrics. We will use the enriched dataset generated with spaCy for this example.

#### Part of Speech Analysis
In this section, we’ll analyze the part-of-speech tags extracted by spaCy.

In [126]:
# Store dictionary with indexes and POS counts in a variable (doc is still the Doc at index 1, Dark Signs)
num_pos = doc.count_by(spacy.attrs.POS)

dictionary = {}

# Create a new dictionary which replaces the index of each part of speech for its label (NOUN, VERB, ADJECTIVE)
for k,v in sorted(num_pos.items()):
    dictionary[doc.vocab[k].text] = v

dictionary

{'ADJ': 42,
 'ADP': 43,
 'ADV': 1,
 'AUX': 47,
 'CCONJ': 40,
 'DET': 12,
 'INTJ': 7,
 'NOUN': 50,
 'NUM': 5,
 'PART': 29,
 'PRON': 97,
 'PROPN': 4,
 'PUNCT': 48,
 'SCONJ': 13,
 'VERB': 87}

In [127]:
# Create new DataFrame for analysis purposes
pos_analysis_df = final_paper_df[['Filename', 'TITLE', 'Doc']].copy()

# Define a function that returns a dictionary of part of speech labels and counts for a doc
def get_pos_tags(doc):
    dictionary = {}
    num_pos = doc.count_by(spacy.attrs.POS)
    for k, v in sorted(num_pos.items()):
        dictionary[doc.vocab[k].text] = v
    return dictionary

# Apply function to each doc object and collect the dictionaries in a list
num_list = pos_analysis_df['Doc'].apply(get_pos_tags).tolist()

From here, we’ll take the part-of-speech counts and put them into a new DataFrame where we can calculate the frequency of each part-of-speech per document. In the new DataFrame, if a song does not contain a particular part-of-speech, the cell will read NaN (Not a Number).

In [128]:
# Create new dataframe with part of speech counts
pos_counts = pd.DataFrame(num_list)
columns = list(pos_counts.columns)

# Add title of each song as new column to dataframe
idx = 0
new_col = pos_analysis_df['TITLE']
pos_counts.insert(loc=idx, column='TITLE', value=new_col)

pos_counts

Unnamed: 0,TITLE,ADJ,ADP,ADV,AUX,CCONJ,DET,NOUN,PART,PRON,PUNCT,SCONJ,VERB,INTJ,NUM,PROPN,X
0,Blood Sports,5,18,22,18,6,27,61,11,79,18,2,76,,,,
1,Dark Signs,42,43,1,47,40,12,50,29,97,48,13,87,7.0,5.0,4.0,
2,Sugar,8,25,32,33,2,25,35,3,72,46,14,54,3.0,,18.0,
3,Dangerous,6,20,13,25,6,10,31,11,53,10,11,41,1.0,1.0,2.0,
4,Ghestomane,24,54,34,59,17,43,74,23,134,48,16,93,8.0,1.0,1.0,
5,Look at Windward,12,54,22,39,12,48,62,7,93,38,9,70,7.0,,3.0,1.0


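We can calculate the number of times, on average, that each part-of-speech appears per song title, using the .groupby() and .mean() functions to group the part-of-speech counts by title and calculate the mean usage of each part-of-speech. Because each title occurs only once in this corpus, the means are identical to the raw counts; the grouping only reorders the rows alphabetically by title.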

In [129]:
# Get average part of speech counts for each song title
average_pos_df = pos_counts.groupby(['TITLE']).mean()

# Round calculations to the nearest whole number
average_pos_df = average_pos_df.round(0)

# Reset index to improve DataFrame readability
average_pos_df = average_pos_df.reset_index()

# Show dataframe
average_pos_df

Unnamed: 0,TITLE,ADJ,ADP,ADV,AUX,CCONJ,DET,NOUN,PART,PRON,PUNCT,SCONJ,VERB,INTJ,NUM,PROPN,X
0,Blood Sports,5.0,18.0,22.0,18.0,6.0,27.0,61.0,11.0,79.0,18.0,2.0,76.0,,,,
1,Dangerous,6.0,20.0,13.0,25.0,6.0,10.0,31.0,11.0,53.0,10.0,11.0,41.0,1.0,1.0,2.0,
2,Dark Signs,42.0,43.0,1.0,47.0,40.0,12.0,50.0,29.0,97.0,48.0,13.0,87.0,7.0,5.0,4.0,
3,Ghestomane,24.0,54.0,34.0,59.0,17.0,43.0,74.0,23.0,134.0,48.0,16.0,93.0,8.0,1.0,1.0,
4,Look at Windward,12.0,54.0,22.0,39.0,12.0,48.0,62.0,7.0,93.0,38.0,9.0,70.0,7.0,,3.0,1.0
5,Sugar,8.0,25.0,32.0,33.0,2.0,25.0,35.0,3.0,72.0,46.0,14.0,54.0,3.0,,18.0,
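
To compare the two eras directly, the counts can instead be grouped by release year. The following is a minimal sketch under stated assumptions: the PRON, NOUN, and VERB counts are copied from the table above, the YEAR values come from the metadata, and the small `counts` DataFrame is rebuilt by hand here rather than derived from `final_paper_df`.

```python
import pandas as pd

# PRON, NOUN and VERB counts copied from the pos_counts table; YEAR from the metadata
counts = pd.DataFrame({
    "TITLE": ["Blood Sports", "Dark Signs", "Sugar",
              "Dangerous", "Ghestomane", "Look at Windward"],
    "YEAR": [2019, 2019, 2019, 2025, 2025, 2025],
    "PRON": [79, 97, 72, 53, 134, 93],
    "NOUN": [61, 50, 35, 31, 74, 62],
    "VERB": [76, 87, 54, 41, 93, 70],
})

# Mean count per release year, rounded to one decimal place
by_year = counts.groupby("YEAR")[["PRON", "NOUN", "VERB"]].mean().round(1)
print(by_year)
```

With only three songs per year, these means are sensitive to a single outlier, so they are best read as a rough indication rather than a firm result.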


Across these six Sleep Token tracks, we can observe substantial differences in the linguistic makeup of their lyrics, reflected in the part-of-speech frequencies. These distinctions help highlight each song’s stylistic tendencies and the ways in which the band constructs mood, narrative perspective, and emotional texture.

**“Ghestomane”** stands out as the longest and most narratively heavy piece, with the highest raw counts in most categories. Its exceptionally high pronoun count (134) suggests a strong focus on perspective, intimacy, and relational dynamics, common themes in Sleep Token’s writing. The high noun (74) and verb (93) frequencies reinforce the sense that *Ghestomane* is rich in concrete imagery and action, with complex clause structures that make the lyrics feel expansive and emotionally charged.

By contrast, **“Dangerous”** features some of the leanest linguistic structures: very few adverbs (13), determiners (10), nouns (31), and verbs (41). This may reflect a more minimalistic, repetitive lyrical style. The relatively low POS counts indicate a track that uses fewer syntactic elements to convey its message, relying more on atmosphere and musical delivery than on dense verbal content.

**“Dark Signs”** shows a very different pattern. High adjective use (42) and frequent coordinating conjunctions (40) point to highly descriptive passages and a tendency to string together ideas or emotional states. The elevated counts of auxiliaries (47) and adpositions (43) suggest more complex verbal constructions and more layered spatial or metaphorical relationships, which fits a song that deals with nuanced emotional undercurrents.

**“Look at Windward”** displays high determiner use (48), substantial pronoun presence (93), and solid noun counts (62). This combination often signals lyrics that are specific, referential, and narratively clear. The structural density of the text hints at a song with detailed imagery or a developed emotional narrative.

**“Blood Sports”** is moderate overall, though the relatively high verb count (76) paired with solid pronoun usage (79) suggests a strong emphasis on action and interpersonal themes. This linguistic pattern fits with the song’s urgent, confessional tone.

Finally, **“Sugar”** appears distinctive in its use of proper nouns (18), far more than any of the other tracks. The Proper_Nouns column shows that these include many repetitions of the capitalised word “Sugar”, so the count reflects a repeating motif that functions like a proper name within the lyrical world rather than a range of distinct named entities. Otherwise, its POS distribution is relatively balanced, suggesting a lyrically steady but reference-heavy song.

Across all songs, numerals (NUM) remain consistently low (between 0 and 5 per song), indicating that Sleep Token lyrics rarely draw on quantitative or literal detail. Instead, the band’s writing leans heavily into emotional expression, metaphor, intimacy, and imagery, reflected in the high concentrations of pronouns, verbs, and descriptive structures in many of the tracks.
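
Raw counts grow with the length of a song, so a longer lyric will show higher numbers in almost every category. One common adjustment, sketched here as an assumption rather than as part of the original analysis, is to divide each count by the total number of tagged tokens in the song. The counts for the two songs are copied from the table above; the helper `relative_share` is a hypothetical name introduced for illustration.

```python
# Raw POS counts for two songs, copied from the pos_counts table above
pos_counts_by_song = {
    "Dangerous": {"ADJ": 6, "ADP": 20, "ADV": 13, "AUX": 25, "CCONJ": 6,
                  "DET": 10, "NOUN": 31, "PART": 11, "PRON": 53, "PUNCT": 10,
                  "SCONJ": 11, "VERB": 41, "INTJ": 1, "NUM": 1, "PROPN": 2},
    "Ghestomane": {"ADJ": 24, "ADP": 54, "ADV": 34, "AUX": 59, "CCONJ": 17,
                   "DET": 43, "NOUN": 74, "PART": 23, "PRON": 134, "PUNCT": 48,
                   "SCONJ": 16, "VERB": 93, "INTJ": 8, "NUM": 1, "PROPN": 1},
}

def relative_share(counts, tag):
    """Return the share of all tagged tokens that carry the given POS tag."""
    return counts.get(tag, 0) / sum(counts.values())

# Share of pronouns among all tagged tokens, per song
pron_share = {title: round(relative_share(c, "PRON"), 3)
              for title, c in pos_counts_by_song.items()}
print(pron_share)
```

On this measure the pronoun share of the two songs is almost the same (about 0.22 and 0.21), even though their raw pronoun counts differ widely (53 versus 134), so relative frequencies may be the safer basis for comparing style across songs of different lengths.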


In [130]:
# Use plotly to plot adjective, verb, and numeral counts per song
fig = px.bar(average_pos_df, x="TITLE", y=["ADJ", 'VERB', "NUM"], title="Average Part-of-Speech Use in Sleep Token Lyrics", barmode='group')
fig.show()

### Conclusions
When comparing the linguistic profiles of the 2019 tracks to those from 2025, a tentative trend emerges: Sleep Token’s lyrics appear to have become somewhat denser and more elaborate over time, although the pattern is not uniform. On average, the 2019 songs show lower counts of pronouns, nouns, determiners, and adpositions. This suggests a more minimalistic or streamlined writing style early in their career, with lyrics that rely more on emotional suggestion and atmosphere than on detailed verbal construction. Dark Signs is a clear exception, with the highest adjective and coordinating conjunction counts in the corpus.

In contrast, two of the 2025 songs, Ghestomane and Look at Windward, exhibit markedly higher frequencies of nouns, determiners, and adpositions than any of the 2019 tracks, while Dangerous has the lowest pronoun, noun, verb, and determiner counts in the corpus. This shift implies that the band’s newer lyrical approach can be more narrative, more descriptive, and more introspective. The very high pronoun count in Ghestomane (134), for instance, hints at a deeper focus on identity, relationships, and internal dialogue, while the expanded use of nouns points to richer imagery and more dynamic emotional storytelling.

The spike in proper nouns in Sugar (18) does not fit this trend: Sugar is one of the 2019 tracks, so its symbolic naming and repeated motifs belong to the early era rather than to the newer material.

Taken together, and with the caveat that six songs are too few for firm conclusions, this pattern suggests that Sleep Token’s lyrical evolution may mirror their musical evolution: over time, the writing tends to become more detailed, expressive, and linguistically layered, reflecting a band growing more ambitious and explorative in how they construct meaning through language.