## Scraping song lyrics using the Genius API

**Set-up**

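Import the necessary packages, move up to the project root, and print the current working directory to confirm the move.

Note: Run this cell only once per kernel session. The directory change is relative to the current directory, so every re-run moves one more level up and breaks the package import and the `./data` paths. Restart the kernel before running this cell again.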

In [None]:
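import json
import os
from pprint import pprint

import pandas as pd
import requests
from tqdm import tqdm

# Move from the notebook folder up to the project root (relative move, so run once)
os.chdir("..")

from dees_package.genius_functions import *  # provides search_genius and scrape_lyrics

print("Current working directory:", os.getcwd())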
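The run-once restriction comes from the relative `os.chdir` call in the set-up cell. A minimal sketch of an idempotent alternative, assuming the project root is the first ancestor directory that contains the `dees_package` folder; the helper name `find_project_root` is hypothetical and not part of the project:

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "dees_package") -> Path:
    """Walk up from `start` until a directory containing `marker` is found.

    Hypothetical helper: assumes `marker` only exists at the project root.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No ancestor of {start} contains {marker!r}")


# In the notebook this is safe to re-run, unlike os.chdir(".."):
# os.chdir(find_project_root(Path.cwd()))
```

Because the target is computed from the directory contents rather than from the current position, running the cell twice lands in the same place.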

Load the API credentials stored in `credentials.json` in the project root.

In [None]:
credentials_file_path = './credentials.json'

with open(credentials_file_path, 'r') as f:
    credentials = json.load(f)
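The Genius API authenticates requests with a client access token sent as a bearer token in the `Authorization` header. A minimal sketch of turning the credentials file into that header, assuming the token is stored under an `access_token` key; both the key name and the helper `load_genius_headers` are hypothetical, so match them to the actual contents of `credentials.json`:

```python
import json


def load_genius_headers(path: str, token_key: str = "access_token") -> dict:
    """Read a JSON credentials file and build a Genius API auth header.

    Assumes the client access token is stored under `token_key`
    (a hypothetical key name).
    """
    with open(path, "r") as f:
        credentials = json.load(f)
    token = credentials.get(token_key)
    if not token:
        raise KeyError(f"{token_key!r} missing or empty in {path}")
    # Genius expects the token as a bearer token in the Authorization header
    return {"Authorization": f"Bearer {token}"}
```

The returned dict could be attached to the `requests.Session` created in the next cell with `my_session.headers.update(...)`, so every request through that session carries the token.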

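Initialise a `requests.Session`, which reuses the underlying connection across the many page requests made when scraping lyrics.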

In [None]:
my_session = requests.Session()

### **Scrape lyrics for the songs in the CSV**

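Load the YouTube data into a dataframe, drop the columns that are not needed, and strip parenthesised asides and trailing "ft." credits from the video titles so they work as search queries.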

In [None]:
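raw_df = pd.read_csv('./data/actual_youtube_data_50.csv')
cleaned_df = raw_df.drop(['Unnamed: 0', 'video_id', 'channel_title', 'channel_id', 'description'], axis=1)
cleaned_df = cleaned_df.rename(columns={'title': 'video_title'})
# Strip "(...)"/"[...]" asides (non-greedy, so text between two asides survives)
# and trailing "ft."/"feat." credits, ignoring case
cleaned_df['video_title'] = (cleaned_df['video_title']
    .str.replace(r'\s*[\(\[].*?[\)\]]|\s+(?:ft|feat)\..*', '', regex=True, case=False)
    .str.strip())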
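A standalone version of the title clean-up, runnable without pandas, makes it easy to check what the pattern removes. This sketch uses a non-greedy `.*?` so that, unlike a greedy `\(.*\)`, it does not delete everything between the first `(` and the last `)` in a title. It also assumes that `[...]` asides and `feat.` credits should be removed, which goes beyond a parentheses-and-`ft.` pattern; `TITLE_NOISE` and `clean_title` are hypothetical names:

```python
import re

# Parenthesised/bracketed asides, plus trailing "ft."/"feat." credits.
# The non-greedy .*? stops at the first closing bracket, so text between
# two separate asides is kept.
TITLE_NOISE = re.compile(r"\s*[\(\[].*?[\)\]]|\s+(?:ft|feat)\..*", re.IGNORECASE)


def clean_title(title: str) -> str:
    """Remove asides and featured-artist credits from a video title."""
    return TITLE_NOISE.sub("", title).strip()


print(clean_title("Song C (Remix) - Artist (Lyrics)"))  # Song C - Artist
```

The same function could be applied to the column with `cleaned_df['video_title'].map(clean_title)`.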

Search Genius for each cleaned video title and build a dataframe from the song title, artist and Genius URL that `search_genius` returns for it.

In [None]:
scraped_df = pd.DataFrame([search_genius(q) for q in tqdm(cleaned_df['video_title'])])
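`search_genius` comes from `dees_package.genius_functions`, whose implementation is not shown here. As a hedged illustration of what such a function might do, the sketch below builds a query URL for the Genius API's `/search` endpoint and reduces a JSON response to its first song hit. It assumes the response shape `response.hits[].result` with `title`, `url` and `primary_artist.name` fields, as described in the Genius API documentation (verify against the current docs); `build_search_url` and `parse_top_hit` are hypothetical names, not the package's code:

```python
from urllib.parse import urlencode

GENIUS_SEARCH_URL = "https://api.genius.com/search"


def build_search_url(query: str) -> str:
    """URL for the Genius search endpoint with the query string encoded."""
    return f"{GENIUS_SEARCH_URL}?{urlencode({'q': query})}"


def parse_top_hit(payload: dict) -> dict:
    """Reduce a /search JSON response to the first song hit.

    Returns empty strings when there is no song hit, so a later
    "is there a URL?" check treats the row as not found.
    """
    hits = payload.get("response", {}).get("hits", [])
    for hit in hits:
        if hit.get("type") == "song":
            song = hit.get("result", {})
            return {
                "title": song.get("title", ""),
                "artist": song.get("primary_artist", {}).get("name", ""),
                "URL": song.get("url", ""),
            }
    return {"title": "", "artist": "", "URL": ""}
```

In practice the payload would come from an authenticated request such as `my_session.get(build_search_url(q)).json()`, with the bearer-token header set on the session.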

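Scrape the lyrics from each Genius URL and add them as a new column, leaving an empty string where the search found no URL.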

In [None]:
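# Skip rows where the search found no URL (None, NaN or empty string);
# a bare truthiness check would let NaN through, since NaN is truthy
scraped_df['lyrics'] = scraped_df['URL'].apply(
    lambda url: scrape_lyrics(my_session, url) if pd.notna(url) and url else '')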
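The Genius API returns song metadata and page URLs but not the lyrics themselves, which is why a separate scraping step is needed. `scrape_lyrics` is defined in `dees_package.genius_functions` and its internals are not shown. A minimal standard-library sketch of the extraction part, assuming Genius song pages mark lyric blocks with `<div data-lyrics-container="true">` (page markup changes over time, so check this against a live page); `LyricsExtractor` and `extract_lyrics` are hypothetical names:

```python
from html.parser import HTMLParser


class LyricsExtractor(HTMLParser):
    """Collect text inside <div data-lyrics-container="true"> blocks.

    Assumes Genius marks lyric blocks with this attribute; <br> becomes a newline.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside a lyrics container (counts nested divs)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "br":
                self.chunks.append("\n")
            elif tag == "div":
                self.depth += 1
        elif tag == "div" and ("data-lyrics-container", "true") in attrs:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.chunks.append("\n")  # separate consecutive containers

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_lyrics(html: str) -> str:
    """Return the lyrics text found in a Genius song page's HTML."""
    parser = LyricsExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()
```

Used with the session, this would look like `extract_lyrics(my_session.get(url).text)`. Counting nested `div`s keeps the parser inside a container until its own closing tag, while inline tags such as links pass their text through unchanged.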

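Remove rows with missing or empty lyrics. `dropna` alone is not enough here, because rows without a URL hold an empty string rather than NaN.

In [None]:
# Treat NaN, None and empty or whitespace-only strings as missing lyrics
has_lyrics = scraped_df['lyrics'].fillna('').astype(str).str.strip() != ''
scraped_df = scraped_df[has_lyrics].reset_index(drop=True)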

Save the dataframe to CSV without the index, so reloading it does not produce another `Unnamed: 0` column.

In [None]:
scraped_df.to_csv('./data/data_with_lyrics.csv', index=False)