# Spotify Lyrics

Tonight we're going to enrich the [Top Spotify Tracks of 2017](https://www.kaggle.com/nadintamer/top-tracks-of-2017) dataset with lyrics from the [Lyrics API](https://lyricsovh.docs.apiary.io).

❓ Start by importing `requests`, `numpy` and `pandas` the usual way.

In [1]:
%matplotlib inline
import matplotlib
import requests
import numpy as np
import pandas as pd
import re

❓ Load the `tracks.csv` file into a `tracks_df` variable (the CSV file is in the _same_ directory as this notebook).

In [2]:
tracks_df = pd.read_csv("tracks.csv")

❓ Get familiar with the dataset with the usual `.shape`, `.info()`, `.head()`, etc.

In [3]:
tracks_df.head(5)

Unnamed: 0,id,name,artists,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
0,7qiZfU4dY1lWllzX7mPBI,Shape of You,Ed Sheeran,0.825,0.652,1.0,-3.183,0.0,0.0802,0.581,0.0,0.0931,0.931,95.977,233713.0,4.0
1,5CtI0qwDJkDQGwXD1H1cL,Despacito - Remix,Luis Fonsi,0.694,0.815,2.0,-4.328,1.0,0.12,0.229,0.0,0.0924,0.813,88.931,228827.0,4.0
2,4aWmUDTfIPGksMNLV2rQP,Despacito (Featuring Daddy Yankee),Luis Fonsi,0.66,0.786,2.0,-4.757,1.0,0.17,0.209,0.0,0.112,0.846,177.833,228200.0,4.0
3,6RUKPb4LETWmmr3iAEQkt,Something Just Like This,The Chainsmokers,0.617,0.635,11.0,-6.769,0.0,0.0317,0.0498,1.4e-05,0.164,0.446,103.019,247160.0,4.0
4,3DXncPQOG4VBw3QHh3S81,I'm the One,DJ Khaled,0.609,0.668,7.0,-4.284,1.0,0.0367,0.0552,0.0,0.167,0.811,80.924,288600.0,4.0
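The `Unnamed: 0` column in the output above is the name pandas gives to a CSV column whose header is empty, which usually means a previously saved index. A minimal sketch of how `index_col=0` avoids it, assuming a tiny in-memory CSV (`csv_text` is a made-up stand-in, not the real `tracks.csv`):

```python
import io

import pandas as pd

# Hypothetical two-row CSV whose first column has an empty header,
# mimicking the layout seen in the output above.
csv_text = ",name,artists\n0,Shape of You,Ed Sheeran\n1,I'm the One,DJ Khaled\n"

# Default behaviour: the unnamed column becomes a regular "Unnamed: 0" column.
default_df = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column is used as the row index instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_df.columns))
print(list(indexed_df.columns))
```

Since the next step keeps only three columns, the extra column is harmless here; this is just one way to keep the loaded `DataFrame` tidy.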


❓ Re-assign `tracks_df` to only keep the columns `artists`, `name` & `duration_ms`.

In [4]:
tracks_df = tracks_df[['artists', 'name', 'duration_ms']]

---

## First contact with the API

The goal is now to **add a new `lyrics` column** to the `DataFrame` using the [Lyrics API](https://lyricsovh.docs.apiary.io/).

❓ Read the documentation and perform an API call with your browser for a song you like.

❓ Use `requests` to perform an API call with Python

In [5]:
url = "https://api.lyrics.ovh/v1/The Beatles/Come Not Together"
response = requests.get(url)
if response.status_code == 200:
    lyrics = response.json()['lyrics']
    print(lyrics)
else:
    print("No lyrics found")

No lyrics found


❓ Create a function `fetch_lyrics(artists, title)` which encapsulates the behavior from the cell above 👆

❓ Make sure you handle the `404` error (no lyrics found for the song)


In [6]:
def fetch_lyrics(artists, title):
    """Return the lyrics for a song, or None if the API has no match."""
    url = f"https://api.lyrics.ovh/v1/{artists}/{title}"
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()['lyrics']
    # Any other status (e.g. 404) means no lyrics were found
    return None
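Artist names and titles can contain characters such as `/` or `?` that change the meaning of a URL, and a slow server can leave a request hanging. The sketch below is one possible hardened variant, not the exercise's expected answer: `build_lyrics_url`, `fetch_lyrics_safe` and the `get` parameter are hypothetical names introduced here, and whether the API resolves every percent-encoded name correctly is an assumption.

```python
from urllib.parse import quote

import requests

BASE_URL = "https://api.lyrics.ovh/v1"  # same endpoint as in the cells above


def build_lyrics_url(artists, title):
    """Percent-encode both path segments (spaces, slashes, accents...)."""
    return f"{BASE_URL}/{quote(artists, safe='')}/{quote(title, safe='')}"


def fetch_lyrics_safe(artists, title, get=requests.get, timeout=10):
    """Hypothetical variant of fetch_lyrics: returns None on any failure.

    `get` defaults to requests.get and can be swapped for a stub in tests.
    """
    try:
        response = get(build_lyrics_url(artists, title), timeout=timeout)
    except requests.RequestException:
        return None  # network problem or timeout
    if response.status_code != 200:
        return None  # e.g. 404 when the song is unknown
    return response.json().get("lyrics")


print(build_lyrics_url("The Beatles", "Come Together"))
```

Passing `timeout` makes `requests` raise an exception instead of waiting indefinitely, which matters once the function is called for every row of the dataset.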

❓ Call the `fetch_lyrics` function with a song you like and print the returned value to check that it works.

In [7]:
print(fetch_lyrics("The Beatles", "Come Together"))

Shoot me
Shoot me
Shoot me
Shoot me

Here come old flat-top
He come groovin' up slowly
He got joo-joo eyeball
He one holy roller
He got hair down to his knee
Got to be a joker he just do what he please

Shoot me
Shoot me
Shoot me
Shoot me

He wear no shoeshine
He got toe-jam football
He got monkey finger
He shoot Coca-Cola
He say I know you, you know me
One thing I can tell you is you got to be free
Come together, right now
Over me

Shoot me
Shoot me
Shoot me

He bag production
He got walrus gumboot
He got Ono sideboard
He one spinal cracker
He got feet down below his knee
Hold you in his armchair, you can feel his disease
Come together, right now
Over me

Right!

Heh, heh, heh, heh, heh, heh, heh, heh
Heh, heh, heh, heh, heh, heh, heh
Come, come, come, come, come

He roller coaster
He got muddy water
He one mojo filter
He say one and one and one is three
Got to be good-looking
'Cause he's so hard to see
Come together, right now
Over me

Shoot me
Shoot me
Shoot me

Oh!

Come together, 

---

## Enriching the dataset

Run this cell to add a new empty column `lyrics`:

In [8]:
tracks_df = tracks_df.copy()
tracks_df['lyrics'] = None


❓ Time to use `fetch_lyrics` to fill that new `lyrics` column on `tracks_df`!

<details>
    <summary>💡 <strong>Hint</strong> - Click to reveal</summary>
    Try using the following <code>DataFrame</code> functions: <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html"><code>iterrows()</code></a> (<a href="https://stackoverflow.com/a/23146038">example</a>) and
    <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.at.html"><code>at[index, column]</code></a>
</details>

In [None]:
%%time
for index, row in tracks_df.iterrows():
    lyrics = fetch_lyrics(row['artists'], row['name'])
    tracks_df.at[index, 'lyrics'] = lyrics
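The `iterrows()` loop above works well at this scale. As an alternative sketch, the same column can be built in one pass with a list comprehension over the two input columns. `add_lyrics` and `fake_fetch` are hypothetical helpers: `fake_fetch` stands in for `fetch_lyrics` so the example runs without network access.

```python
import pandas as pd


def add_lyrics(df, fetch):
    """Return a copy of df with a 'lyrics' column filled by fetch(artists, name)."""
    out = df.copy()
    out["lyrics"] = [fetch(a, n) for a, n in zip(out["artists"], out["name"])]
    return out


def fake_fetch(artists, title):
    """Offline stub: pretends the API knows every song except 'Unknown'."""
    if title == "Unknown":
        return None
    return f"la la la ({artists} - {title})"


demo = pd.DataFrame({
    "artists": ["Ed Sheeran", "Nobody"],
    "name": ["Shape of You", "Unknown"],
})
demo = add_lyrics(demo, fake_fetch)
print(demo)
```

In the notebook, the call would be `add_lyrics(tracks_df, fetch_lyrics)`. Either way the running time is dominated by the HTTP requests, so the choice is mostly about readability.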

In [None]:
tracks_df.head(10)

❓ **[BONUS]** Which songs talk about `love`?

<details>
    <summary>💡 <strong>Hint</strong> - Click to reveal</summary>
    Use <em>Boolean Indexing</em> with <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html"><code>pandas.Series.str.contains()</code></a>!
</details>


In [None]:
# case=False ignores letter case; na=False treats songs without lyrics as non-matches
love_songs = tracks_df['lyrics'].str.contains('love', case=False, na=False)
love_df = tracks_df[love_songs]
print(love_df.shape)
love_df.head(3)
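A plain substring search also matches words that merely contain `love`, such as "gloves". The sketch below compares the substring match with a whole-word match using a regular expression word boundary (`\b`). The `demo` rows are made up for illustration.

```python
import pandas as pd

# Made-up lyrics, including a missing value and a word that only contains "love"
demo = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "lyrics": ["All you need is LOVE", None, "Gloves off", "Hello, goodbye"],
})

# Substring match: case-insensitive, missing lyrics count as False
substring_mask = demo["lyrics"].str.contains("love", case=False, na=False)

# Whole-word match: \b marks a word boundary on each side of "love"
word_mask = demo["lyrics"].str.contains(r"\blove\b", case=False, na=False)

print(demo[substring_mask]["name"].tolist())
print(demo[word_mask]["name"].tolist())
```

Which variant is appropriate depends on whether "lover" or "loved" should count as talking about love; the whole-word pattern excludes them.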

❓ **[BONUS]** Plot the **distribution** of `duration_ms` of songs.

In [None]:
# Convert milliseconds to minutes before plotting the histogram
(tracks_df['duration_ms'] / (60*1000)).hist()
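A histogram is easier to read with labelled axes. Here is a small sketch using the five `duration_ms` values shown in the `head()` output earlier. The bin count and label texts are arbitrary choices, not part of the exercise.

```python
import matplotlib.pyplot as plt
import pandas as pd

# The five durations (in milliseconds) from the head() output above
durations_ms = pd.Series(
    [233713.0, 228827.0, 228200.0, 247160.0, 288600.0], name="duration_ms"
)

# 1 minute = 60 * 1000 milliseconds
durations_min = durations_ms / 60_000

# Series.hist returns the matplotlib Axes, which can then be labelled
ax = durations_min.hist(bins=5)
ax.set_xlabel("Duration (minutes)")
ax.set_ylabel("Number of songs")
ax.set_title("Distribution of song durations")
```

On the full dataset, replacing `durations_ms` with `tracks_df['duration_ms']` gives the same plot as the cell above, with labels added.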