## 1. Importing Python Libraries 

We start by importing the essential Python libraries.

In [1]:
### IMPORTING LIBRARIES
import numpy as np
import pandas as pd
import lyricsgenius as lg

## 2. Connecting to the API

We now use the Client Access Token we received when signing up for the Genius API, together with the lyricsgenius library, to access the data available on genius.com. There are a few parameters to keep in mind:

1. timeout: the time in seconds to wait for a response before giving up
2. retries: the number of retries in case of timeouts and errors
3. remove_section_headers: whether to remove the headers that mark sections such as verses or choruses from the lyrics
4. skip_non_songs: whether to keep only songs or to also include other content such as interviews
5. excluded_terms: filters out remixes, live recordings and other versions of existing songs by skipping titles that contain these terms

Here, we exclude songs whose titles contain the terms _'Remix'_, _'Mix'_, _'Version'_ and _'Overdub'_ so that, as far as possible, we get only the original studio versions of the songs. With these settings, we create an object _genius_ of the _Genius_ class.

In [2]:
### USING API TOKEN TO GET DATA
api_token = 'Client Token Access from Genius API'  # placeholder: use your own Client Access Token
genius = lg.Genius(api_token, timeout = 100, retries = 100,
                   remove_section_headers = True, skip_non_songs = True,
                   excluded_terms = ['(Remix)', '(Mix)', '(Version)', '(Overdub)'])
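
To make the effect of _excluded_terms_ more concrete, here is a minimal standalone sketch of title filtering. It is not the internal code of lyricsgenius: the helper name `is_excluded` and the case-insensitive substring rule are assumptions made purely for illustration, and the sample titles are taken from the log printed in Section 3.

```python
# Illustrative only: a simplified stand-in for title filtering,
# not the actual lyricsgenius implementation.
EXCLUDED_TERMS = ['Remix', 'Mix', 'Version', 'Overdub']

def is_excluded(title, terms=EXCLUDED_TERMS):
    """Return True if the title contains any excluded term (case-insensitive)."""
    lowered = title.lower()
    return any(term.lower() in lowered for term in terms)

titles = [
    'Across the Universe',
    'Across the Universe (1970 Glyn Johns Mix)',
    'Across the Universe (Take 2)',
    'A Day in the Life (2017 Remix)',
    'A Day in the Life (LOVE Version)',
]

# Keep only the titles that contain none of the excluded terms.
kept = [t for t in titles if not is_excluded(t)]
print(kept)
```

A plain substring test like this is coarse: a hypothetical title such as "Mixed Emotions" would also be dropped because it contains "mix", so the terms should be chosen with some care.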

## 3. Pulling the Data for Any Artist from the API

To store data properly and to make searches efficient, genius.com gives a unique _id_ to every artist. So, to pull data for a specific artist, we first need their _id_. For this, we can call the _search_artists_ method of _genius_ with the name of the artist whose data we require. Here, we look up The Beatles.

In [3]:
### PULLING ARTIST DATA
artist_data = genius.search_artists('The Beatles')
print(artist_data)

{'sections': [{'type': 'artist', 'hits': [{'highlights': [], 'index': 'artist', 'type': 'artist', 'result': {'_type': 'artist', 'api_path': '/artists/586', 'header_image_url': 'https://images.genius.com/817d7fb288bb1c8456140d7e4987e7e7.400x226x148.gif', 'id': 586, 'image_url': 'https://images.genius.com/c771d3ee1c0969503cdaf34edf76f38a.400x400x1.jpg', 'index_character': 'b', 'is_meme_verified': False, 'is_verified': False, 'name': 'The Beatles', 'slug': 'The-beatles', 'url': 'https://genius.com/artists/The-beatles'}}, {'highlights': [], 'index': 'artist', 'type': 'artist', 'result': {'_type': 'artist', 'api_path': '/artists/377486', 'header_image_url': 'https://assets.genius.com/images/default_avatar_300.png?1645115871', 'id': 377486, 'image_url': 'https://assets.genius.com/images/default_avatar_300.png?1645115871', 'index_character': 'b', 'is_meme_verified': False, 'is_verified': False, 'name': 'The Beatles Revival Band', 'slug': 'The-beatles-revival-band', 'url': 'https://genius.com/

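Looking carefully, we see that the first hit in the returned list holds the details for The Beatles, including their _'id'_, which is 586. With this unique _'id'_, we can now use the _search_artist_ method of _genius_ to get the data for all the songs recorded by The Beatles. The artist is identified through _artist_id_, and the first line of the output reports the name being changed from the placeholder _'586'_ to _'The Beatles'_.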
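
The nested lookup described above can be sketched as follows. The dictionary `sample_response` is a trimmed copy of the printed response (only the fields needed here are kept), and `find_artist_id` is a hypothetical helper, not part of lyricsgenius.

```python
# Trimmed copy of the response printed above; only 'id' and 'name' are kept.
sample_response = {
    'sections': [{
        'type': 'artist',
        'hits': [
            {'result': {'id': 586, 'name': 'The Beatles'}},
            {'result': {'id': 377486, 'name': 'The Beatles Revival Band'}},
        ],
    }]
}

def find_artist_id(response, name):
    """Return the id of the first hit whose name matches exactly, else None."""
    for section in response.get('sections', []):
        for hit in section.get('hits', []):
            result = hit.get('result', {})
            if result.get('name') == name:
                return result.get('id')
    return None

artist_id = find_artist_id(sample_response, 'The Beatles')
print(artist_id)  # 586
```

Matching on the exact name rather than always taking the first hit is a design choice here: it avoids silently picking a similarly named act such as a tribute band.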

In [4]:
### PULLING DATA OF ALL SONGS PERFORMED BY THE ARTIST
artist = genius.search_artist('586', artist_id = 586, sort = 'title')

Changing artist name to 'The Beatles'
Song 1: "12-Bar Original"
Couldn't find the lyrics section. Please report this if the song has lyrics.
Song URL: https://genius.com/The-beatles-12-bar-original-take-2-edited-lyrics
Song 2: "12-Bar Original (Take 2, Edited)"
Song 3: "1822!"
Couldn't find the lyrics section. Please report this if the song has lyrics.
Song URL: https://genius.com/The-beatles-a-beginning-lyrics
Song 4: "A Beginning"
Song 5: "A Beginning (Take 4) / Don’t Pass Me By (Take 7)"
Song 6: "Across the Universe"
"Across the Universe (1970 Glyn Johns Mix)" is not valid. Skipping.
"Across The Universe (2021 Mix)" is not valid. Skipping.
Song 7: "Across the Universe (Take 2)"
Song 8: "Across the Universe (Take 6)"
"Across the Universe (Wildlife Version)" is not valid. Skipping.
Song 9: "Act Naturally"
Song 10: "A Day in the Life"
"A Day in the Life (2017 Remix)" is not valid. Skipping.
"A Day in the Life (LOVE Version)" is not valid. Skipping.
"A Day in the Life (Orchestra Overdub

Song 139: "Don’t Pass Me By"
"Don’t Pass Me By (2018 Mix)" is not valid. Skipping.
Song 140: "Don’t Pass Me By (Takes 3 & 5)"
Song 141: "Down in Eastern Australia"
Song 142: "Do You Want to Know a Secret"
Song 143: "Do You Want to Know a Secret (Live)"
Song 144: "Dream Baby"
Song 145: "Drive My Car"
Song 146: "Drive My Car/The Word/What You’re Doing"
"Drive My Car/The Word/What You’re Doing (LOVE Version)" is not valid. Skipping.
Song 147: "Eight Days a Week"
Song 148: "Eleanor Rigby"
"Eleanor Rigby/Julia (LOVE Version)" is not valid. Skipping.
Song 149: "Eleanor Rigby / Julia (Transition)"
Song 150: "Eleanor Rigby (Strings Only)"
Song 151: "Everybody’s Got Something to Hide Except Me and My Monkey"
"Everybody’s Got Something to Hide Except Me and My Monkey (2018 Mix)" is not valid. Skipping.
Song 152: "Everybody’s Got Something to Hide Except Me and My Monkey (Esher Demo)"
Couldn't find the lyrics section. Please report this if the song has lyrics.
Song URL: https://genius.com/The-bea

Song 268: "I’ll Get You"
Song 269: "I’ll Get You (Live At The BBC For “Saturday Club” / 5th October, 1963)"
Song 270: "I’m a Loser"
Song 271: "I’m A Loser [Live at the BBC Disk 2]"
Song 272: "I’m Down"
Song 273: "I’m Down (Take 1)"
Song 274: "I Me Mine"
"I Me Mine (1970 Glyn Johns Mix)" is not valid. Skipping.
"I Me Mine (2021 Mix)" is not valid. Skipping.
"I Me Mine (Rehearsal) [Mono]" is not valid. Skipping.
Song 275: "I Me Mine (Take 16)"
Song 276: "I’m Gonna Sit Right Down And Cry (Over You) (Live at the BBC)"
Song 277: "I’m Gonna Sit Right Down and Cry (Over You) [Live at the BBC for “Pop Go The Beatles” / 6th August, 1963]"
Song 278: "I’m Happy Just to Dance with You"
Song 279: "I’m in Love"
Song 280: "I’m Looking Through You"
Song 281: "I’m Looking Through You (Take 1)"
Song 282: "I’m Only Sleeping"
Couldn't find the lyrics section. Please report this if the song has lyrics.
Song URL: https://genius.com/The-beatles-im-only-sleeping-rehearsal-lyrics
Song 283: "I’m Only Sleeping (

Song 410: "My Bonnie (Mein Herz Ist Bei Dir Nur)"
Song 411: "Nobody’s Child"
Song 412: "No Reply"
Song 413: "No Reply (Demo)"
Song 414: "Norwegian Wood (This Bird Has Flown)"
Song 415: "Norwegian Wood (This Bird Has Flown) [Take 1]"
Song 416: "Not a Second Time"
Song 417: "Not Guilty"
Song 418: "Not Guilty (Esher Demo)"
Song 419: "Not Guilty (Take 102)"
Song 420: "Nothin’ Shakin’"
Song 421: "Nowhere Man"
Song 422: "Ob-La-Di, Ob-La-Da"
Song 423: "Ob-La-Di, Ob-La-Da (Esher Demo)"
Song 424: "Ob-La-Di, Ob-La-Da (Take 3)"
Song 425: "Ob-La-Di, Ob-La-Da (Take 5)"
Song 426: "Octopus’s Garden"
"Octopus’s Garden (2019 Mix)" is not valid. Skipping.
"Octopus’s Garden (LOVE Version)" is not valid. Skipping.
"Octopus’s Garden (Rehearsal) [Mono]" is not valid. Skipping.
Song 427: "Octopus’s Garden (Take 9)"
Song 428: "Oh! Darling"
"Oh! Darling (2019 Mix)" is not valid. Skipping.
"Oh! Darling (Jam)" is not valid. Skipping.
Song 429: "Oh! Darling (Take 4)"
Song 430: "Old Brown Shoe"
Song 431: "Old Brow

Song 547: "Suzy Parker"
Song 548: "Swanee River"
Song 549: "Sweet Georgia Brown"
Song 550: "Sweet Little Sixteen [Live at the BBC Disk 2]"
Song 551: "Sweet Little Sixteen (Live in Germany)"
Song 552: "Take Good Care of My Baby"
Song 553: "Talkin’ ’Bout You (Live in Germany)"
Song 554: "Taxman"
Song 555: "Taxman (Take 11)"
"Teddy Boy (1969 Glyn Johns Mix)" is not valid. Skipping.
Song 556: "Teddy Boy (Savile Row Sessions)"
Song 557: "Tell Me What You See"
Song 558: "Tell Me Why"
Song 559: "Tell Me Why (EP)"
Song 560: "Thank You Girl"
Song 561: "Thank You Girl (Live at the BBC for “Easy Beat” / 23rd June, 1963)"
Song 562: "That Means a Lot (Take 1)"
Song 563: "That’s All Right (Mama) [Live at the BBC for “Pop Go The Beatles” / 16th July, 1963]"
Song 564: "That’s Alright (Mama)"
Song 565: "The Ballad of John and Yoko"
Song 566: "The Ballad of John and Yoko (Take 7)"
Song 567: "The Beatles’ 1968 Christmas Record"
Song 568: "The Beatles’ Christmas Record"
Song 569: "The Beatles Compilation 

Song 683: "You’ve Got to Hide Your Love Away"
Song 684: "You’ve Got To Hide Your Love Away - Take 5, Mono"
Song 685: "You’ve Got to Hide Your Love Away (Takes 1, 2 & 5)"
Song 686: "You Won’t See Me"
Song 687: "Yo’ve Got to Hide Your Love Away"
Done. Found 687 songs.


## 4. Data Exploration

Let us now take a look at the object that was returned.

In [5]:
print(artist)

The Beatles, 687 songs


So the object _artist_ contains data for 687 songs recorded by The Beatles. Keep in mind that The Beatles actually recorded only 213 songs in total, so the extra entries, which are likely different versions of the same songs, will need to be removed.
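
One rough way to estimate how many distinct songs hide behind the alternate versions is to strip trailing qualifiers such as "(Take 2)" or "[Take 1]" and count the unique base titles. The sketch below is only an illustration under that assumption: `base_title` is a hypothetical helper, and the sample titles come from the log above.

```python
import re

def base_title(title):
    """Strip trailing '(...)' or '[...]' qualifiers and normalise case."""
    stripped = title
    # Repeatedly remove one trailing parenthesised or bracketed group.
    while True:
        shorter = re.sub(r'\s*(\([^()]*\)|\[[^\[\]]*\])\s*$', '', stripped)
        if shorter == stripped:
            break
        stripped = shorter
    return stripped.strip().lower()

sample_titles = [
    'Across the Universe',
    'Across the Universe (Take 2)',
    'Across the Universe (Take 6)',
    'Act Naturally',
    'Norwegian Wood (This Bird Has Flown)',
    'Norwegian Wood (This Bird Has Flown) [Take 1]',
]

unique_titles = sorted({base_title(t) for t in sample_titles})
print(unique_titles)
```

Note that this heuristic also removes genuine subtitles, as with "(This Bird Has Flown)", so it groups versions together but does not preserve the official titles.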

Now let's look at a song title and the song lyrics contained in _artist_.

In [6]:
### PRINTING THE SONG TITLE
song = artist.songs[8].title
print(song)

Act Naturally


We have our song title. Next, we have its lyrics.

In [7]:
### PRINTING THE SONG LYRICS
lyrics = artist.songs[8].lyrics
print(lyrics)

Act Naturally Lyrics
They're gonna put me in the movies
They're gonna make a big star out of me
We'll make a film about a man that's sad and lonely
And all I gotta do is act naturally

Well I'll bet you I'm gonna be a big star
Might win an Oscar, you can never tell
The movies gonna make me a big star
'Cause I can play the part so well

Well I hope you'll come and see me in the movies
Then I know that you will plainly see
The biggest fool that ever hit the big time
And all I gotta do is act naturally

We'll make the scene about a man that's sad and lonely
And begging down upon his bended knee
I'll play the part and I won't need rehearsing
All I have to do is act naturally

Well I'll bet you I'm gonna be a big star
Might win an Oscar, you can never tell
The movies gonna make me a big star
'Cause I can play the part so well
Well I hope you'll come and see me in the movies
Then I know that you will plainly see
The biggest fool that ever hit the big time
And all I gotta do is act naturally4

Notice that the lyrics have the song title at the top, followed by the word _'Lyrics'_. They also end with the word _'Embed'_, preceded by a number. This is true for the lyrics of all the other songs. The formatting also needs to be dealt with, since the lyrics contain many _'\n'_ (newline) characters.

## 5. Cleaning the Lyrics Data I

Let us try to resolve these issues for this particular song. To get rid of the song title on the first line, we split the lyrics at the word _'Lyrics'_ and keep only the second portion. For the extra word on the last line, we split again at the word _'Embed'_ and keep only the first portion. Then, in the remaining lyrics, we replace each _'\n'_ with a blank space. Let's find out whether this solves these issues.

In [8]:
### CLEANING THE LYRICS OF A SINGLE SONG
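lyrics1 = lyrics.split('Lyrics', 1)[1]  # drop the 'Title Lyrics' header (split at the first occurrence only)
lyrics2 = lyrics1.split('Embed')[0]     # drop the trailing 'Embed' footer
lyrics3 = lyrics2.replace('\n', ' ')    # flatten newlines into spaces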
print(lyrics3)

 They're gonna put me in the movies They're gonna make a big star out of me We'll make a film about a man that's sad and lonely And all I gotta do is act naturally  Well I'll bet you I'm gonna be a big star Might win an Oscar, you can never tell The movies gonna make me a big star 'Cause I can play the part so well  Well I hope you'll come and see me in the movies Then I know that you will plainly see The biggest fool that ever hit the big time And all I gotta do is act naturally  We'll make the scene about a man that's sad and lonely And begging down upon his bended knee I'll play the part and I won't need rehearsing All I have to do is act naturally  Well I'll bet you I'm gonna be a big star Might win an Oscar, you can never tell The movies gonna make me a big star 'Cause I can play the part so well Well I hope you'll come and see me in the movies Then I know that you will plainly see The biggest fool that ever hit the big time And all I gotta do is act naturally4


The lyrics look cleaner than before, although they still end with a number; this will be dealt with later. We now need to do this for all the songs, but first we create a dataframe of the song titles and lyrics.
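
As a preview of how the trailing number could be handled, here is a minimal sketch using a regular expression. The helper name `strip_trailing_count` is hypothetical, and the approach assumes the number is always attached to the very end of the text.

```python
import re

def strip_trailing_count(text):
    """Remove digits stuck to the very end of the text, plus surrounding whitespace."""
    return re.sub(r'\s*\d+\s*$', '', text)

sample = "And all I gotta do is act naturally4"
cleaned = strip_trailing_count(sample)
print(cleaned)
```

A caveat with this approach: lyrics that legitimately end in a number would lose it as well, so the result is worth spot-checking.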

## 6. Creating a Lyrics Dataframe


First, let us create a list of rows, each holding a song title and its respective lyrics.

In [9]:
### CREATING A LIST 
lyrics_list = []
for song in artist.songs:
    # one [title, lyrics] row per song
    lyrics_list.append([song.title, song.lyrics])

We shall now convert this list into a pandas dataframe.

In [10]:
### CREATING A PANDAS DATAFRAME
lyrics_df = pd.DataFrame(lyrics_list, columns = ['title', 'lyrics'])
lyrics_df.head(10)

| | title | lyrics |
|---|---|---|
| 0 | 12-Bar Original | 12-Bar Original Lyrics\nOne, two, three, four!... |
| 1 | 12-Bar Original (Take 2, Edited) | |
| 2 | 1822! | 1822! Lyrics This is a Dorsey Burnette number,... |
| 3 | A Beginning | |
| 4 | A Beginning (Take 4) / Don’t Pass Me By (Take 7) | A Beginning (Take 4) / Don’t Pass Me By (Take ... |
| 5 | Across the Universe | Across the Universe Lyrics\nWords are flowing ... |
| 6 | Across the Universe (Take 2) | Across the Universe (Take 2) Lyrics\nWords are... |
| 7 | Across the Universe (Take 6) | Across the Universe (Take 6) Lyrics\nWords are... |
| 8 | Act Naturally | Act Naturally Lyrics\nThey're gonna put me in ... |
| 9 | A Day in the Life | A Day in the Life Lyrics\nI read the news toda... |


Notice that the dataframe does not contain lyrics for every song title, i.e. there are missing values in the data.
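
To see how many titles lack lyrics, the empty entries can be counted. The sketch below uses a small stand-in dataframe named `sample_df` (an assumption for illustration) rather than the real _lyrics_df_, and treats both empty strings and NaN as missing.

```python
import pandas as pd

# Small stand-in for lyrics_df; the real dataframe comes from the cell above.
sample_df = pd.DataFrame(
    [['12-Bar Original (Take 2, Edited)', ''],
     ['A Beginning', ''],
     ['Act Naturally', "Act Naturally Lyrics\nThey're gonna put me in the movies"]],
    columns=['title', 'lyrics'],
)

# Treat both empty strings and NaN/None as missing.
is_missing = sample_df['lyrics'].isna() | (sample_df['lyrics'].str.strip() == '')
n_missing = int(is_missing.sum())

print(n_missing)
print(sample_df.loc[is_missing, 'title'].tolist())
```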

## 7. Cleaning the Lyrics Data II


Now, let us clean the lyrics of all the songs contained in the dataframe using the same process as above but this time in a loop.

In [11]:
### CLEANING THE LYRICS OF ALL THE SONGS
for i in range(len(lyrics_df)):
    raw_lyrics = lyrics_df.loc[i, 'lyrics']
    if raw_lyrics:
        # keep the text after the 'Title Lyrics' header and before the 'Embed' footer
        song_lyrics = raw_lyrics.split('Lyrics', 1)[1]
        song_lyrics1 = song_lyrics.split('Embed')[0]
        # .loc avoids pandas' chained-assignment warning
        lyrics_df.loc[i, 'lyrics'] = song_lyrics1.replace('\n', ' ')

We check whether we were indeed able to clean the lyrics as intended.

In [12]:
lyrics_df.head(10)

| | title | lyrics |
|---|---|---|
| 0 | 12-Bar Original | One, two, three, four! |
| 1 | 12-Bar Original (Take 2, Edited) | |
| 2 | 1822! | This is a Dorsey Burnette number, brother of ... |
| 3 | A Beginning | |
| 4 | A Beginning (Take 4) / Don’t Pass Me By (Take 7) | This is the introduction to Ringo's "Don't Pa... |
| 5 | Across the Universe | Words are flowing out like endless rain into ... |
| 6 | Across the Universe (Take 2) | Words are flowing out like endless rain into ... |
| 7 | Across the Universe (Take 6) | Words are flowing out like endless rain into ... |
| 8 | Act Naturally | They're gonna put me in the movies They're go... |
| 9 | A Day in the Life | I read the news today—oh, boy About a lucky m... |
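
The cleaning steps used in the loop can also be packaged as a small reusable function. This is a sketch under stated assumptions rather than a definitive implementation: `clean_lyrics` is a hypothetical helper, and the sample string imitates the raw format described earlier (title plus 'Lyrics' at the top, a number plus 'Embed' at the end).

```python
def clean_lyrics(raw):
    """Drop the 'Title Lyrics' header and the 'Embed' footer, then flatten whitespace."""
    if not raw:
        return ''
    # Keep everything after the first 'Lyrics'; leave the text unchanged if it is absent.
    body = raw.split('Lyrics', 1)[1] if 'Lyrics' in raw else raw
    # Keep everything before the last 'Embed'; unchanged if 'Embed' is absent.
    body = body.rsplit('Embed', 1)[0]
    # Collapse newlines and repeated spaces into single spaces.
    return ' '.join(body.split())

# Assumed raw format, modelled on the "Act Naturally" example.
sample = ("Act Naturally Lyrics\nThey're gonna put me in the movies\n"
          "And all I gotta do is act naturally4Embed")
print(clean_lyrics(sample))
```

Unlike a plain `replace('\n', ' ')`, joining on whitespace also removes the double spaces left by blank lines between verses. A function like this could be applied to the whole column with `lyrics_df['lyrics'].apply(clean_lyrics)`.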


## 8. Saving the Lyrics Dataframe

Now, let us save this dataframe in CSV format so that it can be used later to build models.

In [13]:
lyrics_df.to_csv('beatles_lyrics_data.csv', index = False)
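
One thing to keep in mind when loading the file again: by default, `pd.read_csv` reads empty fields back as NaN rather than as empty strings. The sketch below demonstrates this on a small stand-in dataframe (`sample_df` is an assumption for illustration) and writes to an in-memory buffer instead of a real file.

```python
import io
import pandas as pd

sample_df = pd.DataFrame(
    [['A Beginning', ''],
     ['Act Naturally', "They're gonna put me in the movies"]],
    columns=['title', 'lyrics'],
)

# Write to an in-memory buffer so the sketch has no side effects on disk.
buffer = io.StringIO()
sample_df.to_csv(buffer, index=False)

# Default behaviour: empty fields are read back as NaN.
buffer.seek(0)
default_read = pd.read_csv(buffer)

# keep_default_na=False keeps them as empty strings.
buffer.seek(0)
string_read = pd.read_csv(buffer, keep_default_na=False)

print(default_read['lyrics'].isna().sum())
print((string_read['lyrics'] == '').sum())
```

Which behaviour is preferable depends on the later modelling code; the point is only that a check such as `len(lyrics) > 0` would fail on NaN values after a default reload.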