### Scraping song lyrics using Genius API

Import necessary packages

In [29]:
import json
import requests
import pandas as pd
from scrapy import Selector
from pprint import pprint

Open JSON file containing credentials

In [30]:
credentials_file_path = "../credentials.json"

with open(credentials_file_path, "r") as f:
    credentials = json.load(f)
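
The credentials loaded above would typically hold a Genius API access token. As a rough sketch of how such a token could be used, the snippet below builds the parts of a request to the Genius search endpoint and reads the first song URL out of a response. The key name `genius_access_token`, the helper names `build_search_request` and `first_hit_url`, and the sample payload are all assumptions for illustration; the payload only mimics the general shape of a search response and should be checked against a real one.

```python
GENIUS_SEARCH_URL = 'https://api.genius.com/search'


def build_search_request(access_token, song_artist, song_title):
    '''Return the URL, headers and query parameters for a Genius search.'''
    # The Genius API expects the token as a bearer token.
    headers = {'Authorization': f'Bearer {access_token}'}
    params = {'q': f'{song_title} {song_artist}'}
    return GENIUS_SEARCH_URL, headers, params


def first_hit_url(payload):
    '''Return the page URL of the first search hit, or None if there is none.'''
    hits = payload.get('response', {}).get('hits', [])
    if not hits:
        return None
    return hits[0].get('result', {}).get('url')


# Hypothetical credentials layout; the real key name may differ.
sample_credentials = {'genius_access_token': 'PLACEHOLDER'}
url, headers, params = build_search_request(
    sample_credentials['genius_access_token'], 'Eminem', 'Lose Yourself'
)

# Hand-written payload that imitates the assumed response shape.
sample_payload = {
    'response': {
        'hits': [
            {'result': {'url': 'https://genius.com/eminem-lose-yourself-lyrics'}}
        ]
    }
}
print(first_hit_url(sample_payload))
```

With a `requests` session, the request itself could then be sent as `session.get(url, headers=headers, params=params).json()` and the result passed to `first_hit_url`.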

Initialise a new session

In [31]:
session = requests.Session()

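I created a custom function `generate_song_url` to build the Genius page URL for a song from its artist and title. It lowercases both strings and replaces spaces with hyphens, so names that contain punctuation or accented characters may not map to a valid URL.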

In [45]:
def generate_song_url(song_artist, song_title):
    '''
    Returns a string of the URL for the Genius page of the song

        Parameters:
            song_artist (str): The artist of the song
            song_title (str): The title of the song

        Returns:
            song_url (str): The URL for the Genius page of the song
    '''
    base_url = 'https://genius.com/'
    
    formatted_artist = song_artist.lower().replace(' ', '-')
    formatted_title = song_title.lower().replace(' ', '-')
    
    song_url = f'{base_url}{formatted_artist}-{formatted_title}-lyrics'

    return song_url
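
Titles and artist names often contain apostrophes, ampersands or accented characters, which a plain space-to-hyphen replacement leaves in the URL. The sketch below shows one way to normalise them first. The helpers `slugify` and `generate_song_url_normalised` are hypothetical, and the rules they apply (drop apostrophes, spell out `&`, strip accents) are assumptions about how Genius forms its URLs rather than documented behaviour, so the resulting URLs should be spot-checked.

```python
import re
import unicodedata


def slugify(text):
    '''Return a lowercase, hyphen-separated ASCII version of text.'''
    # Decompose accented characters and drop anything that is not ASCII.
    text = unicodedata.normalize('NFKD', text)
    text = text.encode('ascii', 'ignore').decode('ascii')
    text = text.lower().replace('&', ' and ')
    # Remove apostrophes so "don't" becomes "dont" rather than "don-t".
    text = text.replace("'", '')
    # Collapse every remaining run of non-alphanumerics into one hyphen.
    text = re.sub(r'[^a-z0-9]+', '-', text)
    return text.strip('-')


def generate_song_url_normalised(song_artist, song_title):
    '''Build a Genius-style URL from normalised artist and title slugs.'''
    return f'https://genius.com/{slugify(song_artist)}-{slugify(song_title)}-lyrics'


print(generate_song_url_normalised('Eminem', 'Lose Yourself'))
```

For inputs without special characters, such as the three test songs used later, this produces the same URLs as `generate_song_url`.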

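I created a custom function `scrape_lyrics` to scrape the lyrics from the Genius page of any given song. It reuses the `session` created above and selects the lyrics containers by a generated CSS class name, which may change whenever Genius updates its site.

Note that the returned lyrics are formatted so that each lyric line sits on its own line, similar to how they are displayed on the Genius page.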

In [46]:
def scrape_lyrics(song_url):
    '''
    Returns a string of song lyrics, with each line separated by a new line

        Parameters:
            song_url (str): The URL of the Genius page for the song

        Returns:
            lyrics (str): The lyrics of the song
    '''
    response = session.get(song_url)
    sel = Selector(text=response.text)
    lyrics = '\n'.join(sel.css('div.Lyrics__Container-sc-1ynbvzw-1.kUgSbL ::text').getall())

    return lyrics
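
The scraped string still contains section markers such as `[Verse 1]` and may contain blank or padded lines. If only the sung text is needed downstream, a small clean-up step along the following lines could be applied. The function name `clean_lyrics` and its `drop_section_headers` flag are hypothetical, and the sample input is placeholder text rather than real lyrics.

```python
import re


def clean_lyrics(lyrics, drop_section_headers=True):
    '''Strip whitespace, blank lines and optionally [Section] markers.'''
    cleaned_lines = []
    for line in lyrics.split('\n'):
        line = line.strip()
        if not line:
            # Skip empty lines left over from the page layout.
            continue
        if drop_section_headers and re.fullmatch(r'\[[^\]]*\]', line):
            # Skip lines that consist only of a bracketed marker.
            continue
        cleaned_lines.append(line)
    return '\n'.join(cleaned_lines)


raw_lyrics = '[Verse 1]\nfirst line\n\n  second line  \n[Chorus]\n'
print(clean_lyrics(raw_lyrics))
```

Passing `drop_section_headers=False` keeps the bracketed markers, which can be useful if the song structure matters for later analysis.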

At this point in the data collection, we will already have a pandas dataframe of songs that were selected and filtered using the YouTube API. Critically, the dataframe contains the title and artist of each song.

We now want to add the lyrics of each song to the dataframe.

In [40]:
# create placeholder dataframe for testing
songs_data = {
    'Title': ['Lose Yourself', 'Bones', 'Love Story'],
    'Artist': ['Eminem', 'Imagine Dragons', 'Taylor Swift']
}

songs_df = pd.DataFrame(songs_data)

In [42]:
# add Genius URL of each song to dataframe
songs_df['Genius_URL'] = songs_df.apply(lambda row: generate_song_url(row['Artist'], row['Title']), axis=1)

In [43]:
# add Genius lyrics of each song to dataframe
songs_df['Genius_lyrics'] = songs_df.apply(lambda row: scrape_lyrics(row['Genius_URL']), axis=1)
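
On a larger dataframe, some requests will probably fail or return a page where the selector matches nothing, and a single exception would abort the whole `apply`. The sketch below wraps any scraping function so that failures and empty results become `None`, with a pause between requests. The helper name `scrape_all` and the one-second default delay are assumptions, not part of the original workflow.

```python
import time


def scrape_all(urls, scrape, delay_seconds=1.0):
    '''Call scrape(url) for each URL, returning None where it fails.'''
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause between requests to avoid hammering the site.
            time.sleep(delay_seconds)
        try:
            lyrics = scrape(url)
        except Exception as error:
            print(f'Failed to scrape {url}: {error}')
            lyrics = None
        # Treat an empty string (no matching elements) as missing.
        results.append(lyrics or None)
    return results
```

With the functions defined earlier, this could be used as `songs_df['Genius_lyrics'] = scrape_all(songs_df['Genius_URL'], scrape_lyrics)`, after which missing lyrics can be found with `songs_df['Genius_lyrics'].isna()`.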

In [44]:
songs_df

| | Title | Artist | Genius_URL | Genius_lyrics |
|---|---|---|---|---|
| 0 | Lose Yourself | Eminem | https://genius.com/eminem-lose-yourself-lyrics | [Intro]\nLook, if you had one shot or one oppo... |
| 1 | Bones | Imagine Dragons | https://genius.com/imagine-dragons-bones-lyrics | [Verse 1]\nGimme, gimme, gimme some time to th... |
| 2 | Love Story | Taylor Swift | https://genius.com/taylor-swift-love-story-lyrics | [Verse 1]\nWe were both young when I first saw... |
