<h1>Web Scraping Lyrics</h1>

I used BeautifulSoup to parse web pages containing the lyrics of each of my artists; these lyrics became the training data for my n-gram model. I wrote the code from prior knowledge, with some debugging help from ChatGPT. The goal of this web scraping process was to produce a data frame of lyrics for each artist.

In [3]:
import pandas as pd
import numpy as np
from lexicon import Lexicon
import requests
from bs4 import BeautifulSoup
import os

In [2]:
url = "https://lyrics.az/joji/allsongs.html"


response = requests.get(url)


print(f"Status Code: {response.status_code}")
print(f"Content Type: {response.headers['Content-Type']}")

# Parse the HTML using the BeautifulSoup library (lxml first, html.parser as a fallback)
try:
    soup = BeautifulSoup(response.text, 'lxml') 
    print("Soup object created successfully with lxml")
except Exception as e:
    print(f"Error creating Soup object with lxml: {e}")
    print("Trying with html.parser")
    try:
        soup = BeautifulSoup(response.text, 'html.parser')
        print("Soup object created successfully with html.parser")
    except Exception as e:
        print(f"Error creating Soup object with html.parser: {e}")
        soup = None

if soup:
    try:
        rows = soup.find_all("a", class_="px-2 py-1 default-link list-group-item-action d-block")
        print(f"Found {len(rows)} links")
    except Exception as e:
        print(f"Error finding links: {e}")
        rows = []
else:
    rows = []

for row in rows:
    print(row.get_text())

Status Code: 200
Content Type: text/html; charset=UTF-8
Soup object created successfully with lxml
Found 65 links
18
ATTENTION*
Bitter f**
CAN'T GET OVER YOU
Cocaine
COME THRU
Demons
Demons (Lunice Remix)
Don't You Know Who / Wrong Caller ID
Emdbt
Erase
Experience/Ego d**h
FeelTheRage
​foie
From You
Generation Zzz
Head in the Clouds
I Don't Wanna Waste My Time
I Know
I'LL SEE YOU IN 40*
I'm gonna wait
IHATESIMPING
La Cienega
Lemme Know
Lov U
Make It Out Alive/I Can Tell
Medicine
Medicine//You
Midsummer Madness
MissU
NO FUN *
Nomadic
Old Yeller
OMG
Party Monster
Peach Jam
Pills
Plastic Taste
Presidential
R.I.P.*
Rain on me
She's So Nice (Remix)
Slow Dance
Slow Dancing In The Dark
Test Drive
The Foreskin
They Don't Understand
Thom
Unsaved Info
VISA
WANTED U
Weź już tą line w końcu.
Where Does the Time Go*
WHY AM I STILL IN LA
Will He
Window
World$star Money
WORLD$TAR MONEY
Worldstar Money
XNXX
XXX
Yeah Right
You s** Charlie
Yung Michael
지워 (Erase)
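
Passing the full class string to `class_` only matches anchors whose `class` attribute is exactly that string, so a reordered or added class on the site would silently return zero links. Below is a minimal sketch of a more tolerant selector. The helper name `extract_song_links`, the choice of two classes to match on, and the sample HTML are assumptions for illustration, not the site's documented markup.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_song_links(html, base_url):
    """Return (title, absolute_url) pairs for song links on an index page."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # CSS class selectors match regardless of class order or extra classes.
    for a in soup.select("a.default-link.list-group-item-action"):
        href = a.get("href")
        if not href:
            continue  # skip anchors without a target
        title = a.get_text(strip=True)
        # urljoin leaves absolute URLs alone and resolves relative ones.
        links.append((title, urljoin(base_url, href)))
    return links


# Hypothetical markup, modelled on the class names used in the cell above.
sample_html = """
<a class="d-block default-link px-2 list-group-item-action" href="/joji/-/18.html">18</a>
<a class="px-2 py-1 default-link list-group-item-action d-block"
   href="https://lyrics.az/joji/in-tongues/demons.html"> Demons </a>
<a class="default-link list-group-item-action">No href</a>
<a class="nav-link" href="/about.html">About</a>
"""

links = extract_song_links(sample_html, "https://lyrics.az/joji/allsongs.html")
for title, url in links:
    print(title, url)
```

Returning plain tuples instead of `Tag` objects also makes it easier to spot duplicate titles in the list before any song page is requested.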


In [3]:
song = []

for row in rows:
    # Extract and clean song title and hyperlink
    song_title = row.get_text().replace('Lyrics', '').replace('Joji - ', '')
    hyperlink = row.get("href")
    
    # Check if hyperlink is valid
    if hyperlink:
        print(f"Processing hyperlink: {hyperlink}")
    else:
        print("No hyperlink found, skipping...")
        continue

    try:
        response1 = requests.get(hyperlink)
        response1.raise_for_status()  # Check if the request was successful

        soup1 = BeautifulSoup(response1.text, 'html.parser')  # Ensure parser is valid
        lyrics_elements = soup1.find_all("p", class_="song-lyrics")
        
        # Extract lyrics and clean up text
        lyrics = []
        for lyric in lyrics_elements:
            lyrics.append(lyric.get_text().replace("\n", " <br> "))

        # Join the lyrics into a single string
        lyrics_text = " ".join(lyrics)

        row_info = {"Song Title": song_title, "Lyrics": lyrics_text}
        song.append(row_info)

    except requests.RequestException as e:
        print(f"Request failed for URL {hyperlink}: {e}")
    except Exception as e:
        print(f"An error occurred while processing the URL {hyperlink}: {e}")

# Print the collected song information
for s in song:
    print(s)


Processing hyperlink: https://lyrics.az/joji/-/18.html
Processing hyperlink: https://lyrics.az/joji/ballads-1/attention.html
Processing hyperlink: https://lyrics.az/joji/in-tongues/bitter-f**.html
Processing hyperlink: https://lyrics.az/joji/ballads-1/cant-get-over-you.html
Processing hyperlink: https://lyrics.az/joji/-/c**aine.html
Processing hyperlink: https://lyrics.az/joji/ballads-1/come-thru.html
Processing hyperlink: https://lyrics.az/joji/in-tongues/demons.html
Processing hyperlink: https://lyrics.az/joji/in-tongues/demons-lunice-remix.html
Processing hyperlink: https://lyrics.az/joji/ballads-1/dont-you-know-who-wrong-caller-id.html
Processing hyperlink: https://lyrics.az/joji/-/emdbt.html
Processing hyperlink: https://lyrics.az/joji/-/erase.html
Processing hyperlink: https://lyrics.az/joji/-/experience-ego-d**h.html
Processing hyperlink: https://lyrics.az/joji/-/feeltherage.html
Processing hyperlink: https://lyrics.az/joji/-/foie.html
Processing hyperlink: https://lyrics.az/joj
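
The lyric extraction step can be checked offline against a small HTML string before running it over dozens of pages. The sketch below isolates that step. `extract_lyrics` and the sample markup are hypothetical; only the `song-lyrics` class and the ` <br> ` line marker come from the loop above.

```python
from bs4 import BeautifulSoup


def extract_lyrics(html, line_marker=" <br> "):
    """Join the text of every <p class="song-lyrics"> into one string.

    Newlines inside each paragraph become line_marker, mirroring the loop
    above, so line breaks survive as tokens. Unlike the original loop, this
    sketch strips leading and trailing whitespace from each paragraph first.
    Returns an empty string when the page has no lyrics paragraph.
    """
    soup = BeautifulSoup(html, "html.parser")
    parts = [
        p.get_text().strip().replace("\n", line_marker)
        for p in soup.find_all("p", class_="song-lyrics")
    ]
    return " ".join(parts)


# Hypothetical song page used only to exercise the function.
sample_page = (
    '<html><body><p class="song-lyrics">\nline one\nline two\n</p>'
    '<p class="other">not lyrics</p></body></html>'
)
sample_lyrics = extract_lyrics(sample_page)
print(sample_lyrics)
```

An empty return value is a useful signal that a link pointed to a page without lyrics, which can then be filtered out before the data frame is built.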

In [4]:
# Create a DataFrame from the song list
df = pd.DataFrame(song)

# Remove any duplicates based on the 'Lyrics' column
df_songs = df.drop_duplicates(subset='Lyrics', keep='first')

# Define a writable file path
file_path = os.path.expanduser('~/jojisongs.csv')  # This writes to the home directory

try:
    # Save the DataFrame to a CSV file
    df_songs.to_csv(file_path, index=False)
    print(f"File saved successfully to {file_path}")

    # Reset the index after dropping duplicates (reset_index returns a new DataFrame)
    df_songs = df_songs.reset_index(drop=True)
except Exception as e:
    print(f"Failed to save the file: {e}")


File saved successfully to /Users/hannahshu/jojisongs.csv
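
Two details of the saving step are easy to miss: `reset_index()` returns a new DataFrame rather than modifying the one it is called on, and a page with no `song-lyrics` paragraph yields an empty `Lyrics` string, one of which survives `drop_duplicates`. Below is a minimal sketch of a cleaning helper that handles both. `clean_songs` and the toy rows are made up for illustration.

```python
import pandas as pd


def clean_songs(songs):
    """Build a de-duplicated DataFrame from a list of song dicts."""
    df = pd.DataFrame(songs, columns=["Song Title", "Lyrics"])
    # Drop rows whose lyrics are empty or whitespace only.
    df = df[df["Lyrics"].str.strip() != ""]
    # Keep the first row for each distinct lyrics string.
    df = df.drop_duplicates(subset="Lyrics", keep="first")
    # reset_index returns a new frame, so the result must be returned or assigned.
    return df.reset_index(drop=True)


# Toy data, not scraped output.
toy_songs = [
    {"Song Title": "A", "Lyrics": "la la <br> la"},
    {"Song Title": "A (Live)", "Lyrics": "la la <br> la"},
    {"Song Title": "Interview", "Lyrics": ""},
    {"Song Title": "B", "Lyrics": "da da"},
]
df_clean = clean_songs(toy_songs)
print(df_clean)
```

With a helper like this, the save step could become `clean_songs(song).to_csv(file_path, index=False)`.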


In [6]:
url = "https://lyrics.az/taylor-swift/allsongs.html"

# Get the response from the URL
response = requests.get(url)

# Check the response status and content type
print(f"Status Code: {response.status_code}")
print(f"Content Type: {response.headers['Content-Type']}")

# Parse the HTML using BeautifulSoup
try:
    soup = BeautifulSoup(response.text, 'lxml')  # Try using 'lxml' parser
    print("Soup object created successfully with lxml")
except Exception as e:
    print(f"Error creating Soup object with lxml: {e}")
    print("Trying with html.parser")
    try:
        soup = BeautifulSoup(response.text, 'html.parser')  # Fallback to 'html.parser'
        print("Soup object created successfully with html.parser")
    except Exception as e:
        print(f"Error creating Soup object with html.parser: {e}")
        soup = None

# Find all the relevant links
if soup:
    try:
        rows = soup.find_all("a", class_="px-2 py-1 default-link list-group-item-action d-block")
        print(f"Found {len(rows)} links")
    except Exception as e:
        print(f"Error finding links: {e}")
        rows = []
else:
    rows = []

# Display the links
for row in rows:
    print(row.get_text())

Status Code: 200
Content Type: text/html; charset=UTF-8
Soup object created successfully with lxml
Found 358 links
'tis the damn season
"Bad Blood" Video Breakdown
"Blank Space" Video Breakdown
1989
1989 Era Mashup
1989 Tour Setlist
22
22's
A Message From Taylor
Ain't Nothing 'Bout You
Album of the Year Acceptance Speech (Grammys 2016)
All Because Of Ellen
All Night Diner
All Too Well
All Too Well (Sad Girl Autumn Version)
All You Had To Do Was Stay
Am I Ready For Love
American Boy
American Girl
Angelina
Animal
Apologize
Ayu Brazil - Shake It Off (Cover Taylor Swift) Legendado
Baby
Baby, Don't You Break My Heart Slow
Back to December
Back to December / Apologize / You're Not Sorry
Bad Blood
Bad Blood (Remix)
Bad Blood (Traduction Française)
Beautiful Eyes
Begin Again
Begin Again (Taylor’s Version)
Being With My Baby
Bette Davis Eyes
Bette Davis Eyes (Live 2011)
Better Man
Better Off
Better Than Revenge
Birch
Blank Space
Blank Space (Traduction Française)
Blank Space (Voice Memo)
Blank 

In [9]:
song = []

for row in rows:
    # Extract and clean song title and hyperlink
    song_title = row.get_text().replace('Lyrics', '')
    hyperlink = row.get("href")
    
    # Check if hyperlink is valid
    if hyperlink:
        print(f"Processing hyperlink: {hyperlink}")
    else:
        print("No hyperlink found, skipping...")
        continue

    try:
        response1 = requests.get(hyperlink)
        response1.raise_for_status()  # Check if the request was successful

        soup1 = BeautifulSoup(response1.text, 'html.parser')  # Ensure parser is valid
        lyrics_elements = soup1.find_all("p", class_="song-lyrics")
        
        # Extract lyrics and clean up text
        lyrics = []
        for lyric in lyrics_elements:
            lyrics.append(lyric.get_text().replace("\n", " <br> "))

        # Join the lyrics into a single string
        lyrics_text = " ".join(lyrics)

        row_info = {"Song Title": song_title, "Lyrics": lyrics_text}
        song.append(row_info)

    except requests.RequestException as e:
        print(f"Request failed for URL {hyperlink}: {e}")
    except Exception as e:
        print(f"An error occurred while processing the URL {hyperlink}: {e}")



Processing hyperlink: https://lyrics.az/taylor-swift/evermore/tis-the-damn-season.html
Processing hyperlink: https://lyrics.az/taylor-swift/-/bad-blood-video-breakdown.html
Processing hyperlink: https://lyrics.az/taylor-swift/-/blank-space-video-breakdown.html
Processing hyperlink: https://lyrics.az/taylor-swift/1989/1989.html
Processing hyperlink: https://lyrics.az/taylor-swift/-/1989-era-mashup.html
Processing hyperlink: https://lyrics.az/taylor-swift/-/1989-tour-setlist.html
Processing hyperlink: https://lyrics.az/taylor-swift/red/22.html
Processing hyperlink: https://lyrics.az/taylor-swift/-/22s.html
Processing hyperlink: https://lyrics.az/taylor-swift/-/a-message-from-taylor.html
Processing hyperlink: https://lyrics.az/taylor-swift/unreleased-songs/aint-nothing-bout-you.html
Processing hyperlink: https://lyrics.az/taylor-swift/-/album-of-the-year-acceptance-speech-grammys-2016.html
Processing hyperlink: https://lyrics.az/taylor-swift/-/all-because-of-ellen.html
Processing hyperlin

In [10]:
# Create a DataFrame from the song list
df = pd.DataFrame(song)

# Remove any duplicates based on the 'Lyrics' column
df_songs = df.drop_duplicates(subset='Lyrics', keep='first')

# Define a writable file path
file_path = os.path.expanduser('~/taylorsongs.csv')  # This writes to the home directory

try:
    # Save the DataFrame to a CSV file
    df_songs.to_csv(file_path, index=False)
    print(f"File saved successfully to {file_path}")

    # Reset the index after dropping duplicates (reset_index returns a new DataFrame)
    df_songs = df_songs.reset_index(drop=True)
except Exception as e:
    print(f"Failed to save the file: {e}")

File saved successfully to /Users/hannahshu/taylorsongs.csv
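
The cells for each artist differ only in the URL slug and the output file name, so the scraping steps could be folded into one function. The sketch below is one possible shape, not the code used to produce the outputs in this notebook. `scrape_artist`, the `fetch` parameter, the `delay` parameter, and the `fake_site` pages are all hypothetical; the URL pattern, the `song-lyrics` class, and the ` <br> ` marker are taken from the cells above.

```python
import time

from bs4 import BeautifulSoup

BASE_URL = "https://lyrics.az"  # taken from the URLs used above


def scrape_artist(slug, fetch, delay=0.0):
    """Scrape every song for one artist slug, e.g. "taylor-swift".

    fetch is any callable that maps a URL to an HTML string; injecting it
    keeps this function testable without network access.
    """
    index_html = fetch(f"{BASE_URL}/{slug}/allsongs.html")
    index = BeautifulSoup(index_html, "html.parser")
    songs = []
    for a in index.select("a.default-link.list-group-item-action"):
        href = a.get("href")
        if not href:
            continue  # no target to follow
        page = BeautifulSoup(fetch(href), "html.parser")
        lyrics = " ".join(
            p.get_text().replace("\n", " <br> ")
            for p in page.find_all("p", class_="song-lyrics")
        )
        songs.append({"Song Title": a.get_text(strip=True), "Lyrics": lyrics})
        time.sleep(delay)  # optional pause between requests
    return songs


# Offline stand-in for the website: a dict of URL -> HTML (hypothetical pages).
fake_site = {
    f"{BASE_URL}/demo/allsongs.html": (
        '<a class="px-2 default-link list-group-item-action" '
        f'href="{BASE_URL}/demo/-/one.html">One</a>'
    ),
    f"{BASE_URL}/demo/-/one.html": '<p class="song-lyrics">first\nsecond</p>',
}
demo_songs = scrape_artist("demo", fake_site.__getitem__)
print(demo_songs)
```

For a real run, `fetch` could be a small wrapper that calls `requests.get(url, timeout=10)`, then `raise_for_status()`, and returns the response's `.text`. A loop over slugs such as `"joji"`, `"taylor-swift"`, and `"hozier"` would then replace the repeated cells.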


In [16]:
url = "https://lyrics.az/hozier/allsongs.html"

# Get the response from the URL
response = requests.get(url)

# Check the response status and content type
print(f"Status Code: {response.status_code}")
print(f"Content Type: {response.headers['Content-Type']}")

# Parse the HTML using BeautifulSoup
try:
    soup = BeautifulSoup(response.text, 'lxml')  # Try using 'lxml' parser
    print("Soup object created successfully with lxml")
except Exception as e:
    print(f"Error creating Soup object with lxml: {e}")
    print("Trying with html.parser")
    try:
        soup = BeautifulSoup(response.text, 'html.parser')  # Fallback to 'html.parser'
        print("Soup object created successfully with html.parser")
    except Exception as e:
        print(f"Error creating Soup object with html.parser: {e}")
        soup = None

# Find all the relevant links
if soup:
    try:
        rows = soup.find_all("a", class_="px-2 py-1 default-link list-group-item-action d-block")
        print(f"Found {len(rows)} links")
    except Exception as e:
        print(f"Error finding links: {e}")
        rows = []
else:
    rows = []

# Display the links
for row in rows:
    print(row.get_text())

Status Code: 200
Content Type: text/html; charset=UTF-8
Soup object created successfully with lxml
Found 57 links
Almost (Sweet Music)
Alright*
Angel Of Small d**h & The Codeine Scene
Arsonist's Lullaby
As It Was
Be
Better Love
Blood
Cherry Wine
Dinner & Diatribes
Fireworks*
Foreigner's God
From Eden
From Eden (Live In America)
Get Away*
Ghost*
How To Explain*
Hozier
If I Sleep*
In A Week
In The Woods Somewhere
It Will Come Back
Jackie And Wilson
Jackie and Wilson (Live In America)
Like Real People Do
Like Real People Do (Live In America)
Moment's Silence
Movement
My Love Will Never Die
Nfwmb
Nina Cried Power
No Plan
Nobody
Optimist*
Pagliacci's In Town*
Problem (Cover) *
Run
Sedated
Shrike
Shrike
Snowman Song*
Someone New
Someone New (Live In America)
Sunlight
Take Me To Church
Talk
Tell It to My Heart
To be Alone
To Noise Making (Sing)
Wasteland, Baby!
Who Are You In The Dark?*
Whole Lotta Love
Work Song
Work Song (Live In America)
Would That I
Young Americans
Yours Right Now*


In [17]:
import requests
from bs4 import BeautifulSoup

song = []

for row in rows:
    # Extract and clean song title and hyperlink
    song_title = row.get_text().replace('Lyrics', '')
    hyperlink = row.get("href")
    
    # Check if hyperlink is valid
    if hyperlink:
        print(f"Processing hyperlink: {hyperlink}")
    else:
        print("No hyperlink found, skipping...")
        continue

    try:
        response1 = requests.get(hyperlink)
        response1.raise_for_status()  # Check if the request was successful

        soup1 = BeautifulSoup(response1.text, 'html.parser')  # Ensure parser is valid
        lyrics_elements = soup1.find_all("p", class_="song-lyrics")
        
        # Extract lyrics and clean up text
        lyrics = []
        for lyric in lyrics_elements:
            lyrics.append(lyric.get_text().replace("\n", " <br> "))

        # Join the lyrics into a single string
        lyrics_text = " ".join(lyrics)

        row_info = {"Song Title": song_title, "Lyrics": lyrics_text}
        song.append(row_info)

    except requests.RequestException as e:
        print(f"Request failed for URL {hyperlink}: {e}")
    except Exception as e:
        print(f"An error occurred while processing the URL {hyperlink}: {e}")



Processing hyperlink: https://lyrics.az/hozier/wasteland-baby/almost-sweet-music.html
Processing hyperlink: https://lyrics.az/hozier/-/alright.html
Processing hyperlink: https://lyrics.az/hozier/hozier/angel-of-small-d**h-the-codeine-scene.html
Processing hyperlink: https://lyrics.az/hozier/hozier/arsonists-lullaby.html
Processing hyperlink: https://lyrics.az/hozier/wasteland-baby/as-it-was.html
Processing hyperlink: https://lyrics.az/hozier/wasteland-baby/be.html
Processing hyperlink: https://lyrics.az/hozier/-/better-love.html
Processing hyperlink: https://lyrics.az/hozier/-/blood.html
Processing hyperlink: https://lyrics.az/hozier/hozier/cherry-wine.html
Processing hyperlink: https://lyrics.az/hozier/wasteland-baby/dinner-diatribes.html
Processing hyperlink: https://lyrics.az/hozier/-/fireworks.html
Processing hyperlink: https://lyrics.az/hozier/hozier/foreigners-god.html
Processing hyperlink: https://lyrics.az/hozier/hozier/from-eden.html
Processing hyperlink: https://lyrics.az/hoz

In [18]:
import os
import pandas as pd

# Create a DataFrame from the song list
df = pd.DataFrame(song)

# Remove any duplicates based on the 'Lyrics' column
df_songs = df.drop_duplicates(subset='Lyrics', keep='first')

# Define a writable file path
file_path = os.path.expanduser('~/hoziersongs.csv')  # This writes to the home directory

try:
    # Save the DataFrame to a CSV file
    df_songs.to_csv(file_path, index=False)
    print(f"File saved successfully to {file_path}")

    # Reset the index after dropping duplicates (reset_index returns a new DataFrame)
    df_songs = df_songs.reset_index(drop=True)
except Exception as e:
    print(f"Failed to save the file: {e}")

File saved successfully to /Users/hannahshu/hoziersongs.csv


In [19]:
url = "https://lyrics.az/kendrick-lamar/allsongs.html"

# Get the response from the URL
response = requests.get(url)

# Check the response status and content type
print(f"Status Code: {response.status_code}")
print(f"Content Type: {response.headers['Content-Type']}")

# Parse the HTML using BeautifulSoup
try:
    soup = BeautifulSoup(response.text, 'lxml')  # Try using 'lxml' parser
    print("Soup object created successfully with lxml")
except Exception as e:
    print(f"Error creating Soup object with lxml: {e}")
    print("Trying with html.parser")
    try:
        soup = BeautifulSoup(response.text, 'html.parser')  # Fallback to 'html.parser'
        print("Soup object created successfully with html.parser")
    except Exception as e:
        print(f"Error creating Soup object with html.parser: {e}")
        soup = None

# Find all the relevant links
if soup:
    try:
        rows = soup.find_all("a", class_="px-2 py-1 default-link list-group-item-action d-block")
        print(f"Found {len(rows)} links")
    except Exception as e:
        print(f"Error finding links: {e}")
        rows = []
else:
    rows = []

# Display the links
for row in rows:
    print(row.get_text())

import requests
from bs4 import BeautifulSoup

song = []

for row in rows:
    # Extract and clean song title and hyperlink
    song_title = row.get_text().replace('Lyrics', '')
    hyperlink = row.get("href")
    
    # Check if hyperlink is valid
    if hyperlink:
        print(f"Processing hyperlink: {hyperlink}")
    else:
        print("No hyperlink found, skipping...")
        continue

    try:
        response1 = requests.get(hyperlink)
        response1.raise_for_status()  # Check if the request was successful

        soup1 = BeautifulSoup(response1.text, 'html.parser')  # Ensure parser is valid
        lyrics_elements = soup1.find_all("p", class_="song-lyrics")
        
        # Extract lyrics and clean up text
        lyrics = []
        for lyric in lyrics_elements:
            lyrics.append(lyric.get_text().replace("\n", " <br> "))

        # Join the lyrics into a single string
        lyrics_text = " ".join(lyrics)

        row_info = {"Song Title": song_title, "Lyrics": lyrics_text}
        song.append(row_info)

    except requests.RequestException as e:
        print(f"Request failed for URL {hyperlink}: {e}")
    except Exception as e:
        print(f"An error occurred while processing the URL {hyperlink}: {e}")
    
# Create a DataFrame from the song list
df = pd.DataFrame(song)

# Remove any duplicates based on the 'Lyrics' column
df_songs = df.drop_duplicates(subset='Lyrics', keep='first')

# Define a writable file path
file_path = os.path.expanduser('~/kendricksongs.csv')  # This writes to the home directory

try:
    # Save the DataFrame to a CSV file
    df_songs.to_csv(file_path, index=False)
    print(f"File saved successfully to {file_path}")

    # Reset the index after dropping duplicates (reset_index returns a new DataFrame)
    df_songs = df_songs.reset_index(drop=True)
except Exception as e:
    print(f"Failed to save the file: {e}")



Status Code: 200
Content Type: text/html; charset=UTF-8
Soup object created successfully with lxml
Found 722 links
"Alright" Video Breakdown
"i" (Live Performance on SNL)
"i" merchandise
"Sing About Me" Part 1 Video
07 | levitate
1 Train
100 Favors
2012 BET CYPHERS
2013 Yeezus Tour Dates
2DopeBoyz Freestyle
3 Minutes of Watts
5200
6'7
6'7 (Freestyle)
A Little Appalled
A Milli (Freestyle)
A Song For Buffy (Freestyle)
A Tale Of 2 Citiez Remix
A.D.H.D.
A1 Everything
Ab-Soul's Outro
Act Tuff
Ain't That Funkin Kinda Hard On You? (We Ain't never Gonna Stop Remix)
Ain't That The Truth
Alien Girl (Today w/ Her)
Alien Girl (Today With Her)
All Day
All Day (Demo)
All Day (Remix)
All My Life (Remix)
All The Stars
Alright
Alright (11)
Alright (BET Performance)
Alright (BET Version)
Alright (Music Video)
Alright (Remix)
American Dream
American Soul
Another n***a (To Pimp A Butterfly)
As We Proceed (To Give You What You Need)
Average Joe
b**h Don't k** My Vibe (Remix)
b**h Don't k** My Vibe (Traduçã

In [20]:
url = "https://lyrics.az/luke-combs/allsongs.html"

# Get the response from the URL
response = requests.get(url)

# Check the response status and content type
print(f"Status Code: {response.status_code}")
print(f"Content Type: {response.headers['Content-Type']}")

# Parse the HTML using BeautifulSoup
try:
    soup = BeautifulSoup(response.text, 'lxml')  # Try using 'lxml' parser
    print("Soup object created successfully with lxml")
except Exception as e:
    print(f"Error creating Soup object with lxml: {e}")
    print("Trying with html.parser")
    try:
        soup = BeautifulSoup(response.text, 'html.parser')  # Fallback to 'html.parser'
        print("Soup object created successfully with html.parser")
    except Exception as e:
        print(f"Error creating Soup object with html.parser: {e}")
        soup = None

# Find all the relevant links
if soup:
    try:
        rows = soup.find_all("a", class_="px-2 py-1 default-link list-group-item-action d-block")
        print(f"Found {len(rows)} links")
    except Exception as e:
        print(f"Error finding links: {e}")
        rows = []
else:
    rows = []

# Display the links
for row in rows:
    print(row.get_text())

import requests
from bs4 import BeautifulSoup

song = []

for row in rows:
    # Extract and clean song title and hyperlink
    song_title = row.get_text().replace('Lyrics', '')
    hyperlink = row.get("href")
    
    # Check if hyperlink is valid
    if hyperlink:
        print(f"Processing hyperlink: {hyperlink}")
    else:
        print("No hyperlink found, skipping...")
        continue

    try:
        response1 = requests.get(hyperlink)
        response1.raise_for_status()  # Check if the request was successful

        soup1 = BeautifulSoup(response1.text, 'html.parser')  # Ensure parser is valid
        lyrics_elements = soup1.find_all("p", class_="song-lyrics")
        
        # Extract lyrics and clean up text
        lyrics = []
        for lyric in lyrics_elements:
            lyrics.append(lyric.get_text().replace("\n", " <br> "))

        # Join the lyrics into a single string
        lyrics_text = " ".join(lyrics)

        row_info = {"Song Title": song_title, "Lyrics": lyrics_text}
        song.append(row_info)

    except requests.RequestException as e:
        print(f"Request failed for URL {hyperlink}: {e}")
    except Exception as e:
        print(f"An error occurred while processing the URL {hyperlink}: {e}")
    
import os
import pandas as pd

# Create a DataFrame from the song list
df = pd.DataFrame(song)

# Remove any duplicates based on the 'Lyrics' column
df_songs = df.drop_duplicates(subset='Lyrics', keep='first')

# Define a writable file path
file_path = os.path.expanduser('~/lukesongs.csv')  # This writes to the home directory

try:
    # Save the DataFrame to a CSV file
    df_songs.to_csv(file_path, index=False)
    print(f"File saved successfully to {file_path}")

    # Reset the index after dropping duplicates (reset_index returns a new DataFrame)
    df_songs = df_songs.reset_index(drop=True)
except Exception as e:
    print(f"Failed to save the file: {e}")



Status Code: 200
Content Type: text/html; charset=UTF-8
Soup object created successfully with lxml
Found 28 links
6 feet apart*
A Long Way
Be Careful What You Wish For
Beautiful Crazy
Beer Can
Beer Never Broke My Heart
Can I Get An Outlaw
Dive
Doin’ This
Don't Tempt Me
Honky Tonk Highway
Houston, We Got A Problem
Hurricane
I Got Away With You
I Know She Ain't Ready
Let the Moonshine
Lonely One
Memories Are Made Of
Must've Never Met You
One Number Away
Out There
She Got the Best of Me
South On Ya
The great divide
The Way She Rides
This One's for You
Used To You
When It Rains It Pours
Processing hyperlink: https://lyrics.az/luke-combs/-/6-feet-apart.html
Processing hyperlink: https://lyrics.az/luke-combs/this-ones-for-you/a-long-way.html
Processing hyperlink: https://lyrics.az/luke-combs/this-ones-for-you/be-careful-what-you-wish-for.html
Processing hyperlink: https://lyrics.az/luke-combs/this-ones-for-you/beautiful-crazy.html
Processing hyperlink: https://lyrics.az/luke-combs/this-ones-

In [21]:
url = "https://lyrics.az/olivia-rodrigo/allsongs.html"

# Get the response from the URL
response = requests.get(url)

# Check the response status and content type
print(f"Status Code: {response.status_code}")
print(f"Content Type: {response.headers['Content-Type']}")

# Parse the HTML using BeautifulSoup
try:
    soup = BeautifulSoup(response.text, 'lxml')  # Try using 'lxml' parser
    print("Soup object created successfully with lxml")
except Exception as e:
    print(f"Error creating Soup object with lxml: {e}")
    print("Trying with html.parser")
    try:
        soup = BeautifulSoup(response.text, 'html.parser')  # Fallback to 'html.parser'
        print("Soup object created successfully with html.parser")
    except Exception as e:
        print(f"Error creating Soup object with html.parser: {e}")
        soup = None

# Find all the relevant links
if soup:
    try:
        rows = soup.find_all("a", class_="px-2 py-1 default-link list-group-item-action d-block")
        print(f"Found {len(rows)} links")
    except Exception as e:
        print(f"Error finding links: {e}")
        rows = []
else:
    rows = []

# Display the links
for row in rows:
    print(row.get_text())

import requests
from bs4 import BeautifulSoup

song = []

for row in rows:
    # Extract and clean song title and hyperlink
    song_title = row.get_text().replace('Lyrics', '')
    hyperlink = row.get("href")
    
    # Check if hyperlink is valid
    if hyperlink:
        print(f"Processing hyperlink: {hyperlink}")
    else:
        print("No hyperlink found, skipping...")
        continue

    try:
        response1 = requests.get(hyperlink)
        response1.raise_for_status()  # Check if the request was successful

        soup1 = BeautifulSoup(response1.text, 'html.parser')  # Ensure parser is valid
        lyrics_elements = soup1.find_all("p", class_="song-lyrics")
        
        # Extract lyrics and clean up text
        lyrics = []
        for lyric in lyrics_elements:
            lyrics.append(lyric.get_text().replace("\n", " <br> "))

        # Join the lyrics into a single string
        lyrics_text = " ".join(lyrics)

        row_info = {"Song Title": song_title, "Lyrics": lyrics_text}
        song.append(row_info)

    except requests.RequestException as e:
        print(f"Request failed for URL {hyperlink}: {e}")
    except Exception as e:
        print(f"An error occurred while processing the URL {hyperlink}: {e}")
    
import os
import pandas as pd

# Create a DataFrame from the song list
df = pd.DataFrame(song)

# Remove any duplicates based on the 'Lyrics' column
df_songs = df.drop_duplicates(subset='Lyrics', keep='first')

# Define a writable file path
file_path = os.path.expanduser('~/oliviasongs.csv')  # This writes to the home directory

try:
    # Save the DataFrame to a CSV file
    df_songs.to_csv(file_path, index=False)
    print(f"File saved successfully to {file_path}")

    # Reset the index after dropping duplicates (reset_index returns a new DataFrame)
    df_songs = df_songs.reset_index(drop=True)
except Exception as e:
    print(f"Failed to save the file: {e}")



Status Code: 200
Content Type: text/html; charset=UTF-8
Soup object created successfully with lxml
Found 39 links
1 step forward, 3 steps back
All I Want
All I want
All I Want (Love That Lasts Mix)
Born to Be Brave
Breaking Free (Nini, Ricky & E.J. Version)
Brutal
Can't Take My Eyes Off You
Crazy
Deja vu
Drivers Licence,Arcade remix
Drivers license (SOUR Prom)
Drivers License Clean
enough for you
favorite crime
Good 4 u
Granted
happier
hope ur ok
hope ur ok
I Think I Kinda, You Know (Duet)
I Think I Kinda, You Know (Nini Version)
jealousy, jealousy
Just for a Moment
Love Is Blindness
Maniac
Out of the Old
Perfect
Recall the times**
River
Space Oddity
Start of Something New (Nini Version)
Tainted Love
The Best Part
The Rose Song
traitor
Trouble
Wondering
Wondering (Ashlyn & Nini Piano Version)
Processing hyperlink: https://lyrics.az/olivia-rodrigo/sour/1-step-forward-3-steps-back.html
Processing hyperlink: https://lyrics.az/olivia-rodrigo/best-of-high-school-musical-the-musical-the-seri

In [3]:
url = "https://lyrics.az/ed-sheeran/allsongs.html"

# Get the response from the URL
response = requests.get(url)

# Check the response status and content type
print(f"Status Code: {response.status_code}")
print(f"Content Type: {response.headers['Content-Type']}")

# Parse the HTML using BeautifulSoup
try:
    soup = BeautifulSoup(response.text, 'lxml')  # Try using 'lxml' parser
    print("Soup object created successfully with lxml")
except Exception as e:
    print(f"Error creating Soup object with lxml: {e}")
    print("Trying with html.parser")
    try:
        soup = BeautifulSoup(response.text, 'html.parser')  # Fallback to 'html.parser'
        print("Soup object created successfully with html.parser")
    except Exception as e:
        print(f"Error creating Soup object with html.parser: {e}")
        soup = None

# Find all the relevant links
if soup:
    try:
        rows = soup.find_all("a", class_="px-2 py-1 default-link list-group-item-action d-block")
        print(f"Found {len(rows)} links")
    except Exception as e:
        print(f"Error finding links: {e}")
        rows = []
else:
    rows = []

# Display the links
for row in rows:
    print(row.get_text())

import requests
from bs4 import BeautifulSoup

song = []

for row in rows:
    # Extract and clean song title and hyperlink
    song_title = row.get_text().replace('Lyrics', '')
    hyperlink = row.get("href")
    
    # Check if hyperlink is valid
    if hyperlink:
        print(f"Processing hyperlink: {hyperlink}")
    else:
        print("No hyperlink found, skipping...")
        continue

    try:
        response1 = requests.get(hyperlink)
        response1.raise_for_status()  # Check if the request was successful

        soup1 = BeautifulSoup(response1.text, 'html.parser')  # Ensure parser is valid
        lyrics_elements = soup1.find_all("p", class_="song-lyrics")
        
        # Extract lyrics and clean up text
        lyrics = []
        for lyric in lyrics_elements:
            lyrics.append(lyric.get_text().replace("\n", " <br> "))

        # Join the lyrics into a single string
        lyrics_text = " ".join(lyrics)

        row_info = {"Song Title": song_title, "Lyrics": lyrics_text}
        song.append(row_info)

    except requests.RequestException as e:
        print(f"Request failed for URL {hyperlink}: {e}")
    except Exception as e:
        print(f"An error occurred while processing the URL {hyperlink}: {e}")
    
import os
import pandas as pd

# Create a DataFrame from the song list
df = pd.DataFrame(song)

# Remove any duplicates based on the 'Lyrics' column
df_songs = df.drop_duplicates(subset='Lyrics', keep='first')

# Define a writable file path
file_path = os.path.expanduser('~/edsongs.csv')  # This writes to the home directory

try:
    # Save the DataFrame to a CSV file
    df_songs.to_csv(file_path, index=False)
    print(f"File saved successfully to {file_path}")

    # Reset the index after dropping duplicates (reset_index returns a new DataFrame)
    df_songs = df_songs.reset_index(drop=True)
except Exception as e:
    print(f"Failed to save the file: {e}")



Status Code: 200
Content Type: text/html; charset=UTF-8
Soup object created successfully with lxml
Found 391 links
(All Along The) Watchtower
#FoodRevolutionDay Song
1000 Nights
18
25 Tracks
2step
A Team
Addicted
Afire Love
Afire Love- AP Lit
Afterglow
Ag Smaoineamh ós Ard
All About It
All of the Stars
Amsterdam (Sweet Mary Jane)
Angels Can't Fly
Antisocial
Are You With Me?
Autumn Leaves
Baby One More Time
Back Someday
Bad Habits
Bad Habits (MEDUZA Remix)
Barcelona
Bartender
Be Like You
Be My Forever
Be My Husband
Be my husband - live from glastonbury
Be Right Now
Beautiful People
Będę raperem
Best Part Of Me
Beyond The Pale
Bibia Be Ye Ye
Billy Ruskin
Blind Faith Radio One Live Lounge
Bloodstream
Bloodstream - live from zugspitze
Blow
Boa Me
Bonus Track
Boulevard of Broken Dreams
Brand New Style
Can't See Straight
Candle In The Wind
Candle In The Wind (2018 Version)
Candle In The Wind (Elton John Cover)
Castle On The Hill
Castle on the Hill (Acoustic)
Ciao Adios
Cold Coffee
Cold Wate

In [5]:
url = "https://lyrics.az/mitski/allsongs.html"

# Get the response from the URL; the timeout stops a stalled connection from hanging the cell
response = requests.get(url, timeout=10)

# Check the response status and content type
print(f"Status Code: {response.status_code}")
print(f"Content Type: {response.headers['Content-Type']}")

# Parse the HTML using BeautifulSoup
try:
    soup = BeautifulSoup(response.text, 'lxml')  # Try using 'lxml' parser
    print("Soup object created successfully with lxml")
except Exception as e:
    print(f"Error creating Soup object with lxml: {e}")
    print("Trying with html.parser")
    try:
        soup = BeautifulSoup(response.text, 'html.parser')  # Fallback to 'html.parser'
        print("Soup object created successfully with html.parser")
    except Exception as e:
        print(f"Error creating Soup object with html.parser: {e}")
        soup = None

# Find all the relevant links
if soup:
    try:
        rows = soup.find_all("a", class_="px-2 py-1 default-link list-group-item-action d-block")
        print(f"Found {len(rows)} links")
    except Exception as e:
        print(f"Error finding links: {e}")
        rows = []
else:
    rows = []

# Display the links
for row in rows:
    print(row.get_text())

import requests
from bs4 import BeautifulSoup

song = []

for row in rows:
    # Extract and clean song title and hyperlink (assumes any artist prefix reads 'Mitski - ')
    song_title = row.get_text().replace('Lyrics', '').replace('Mitski - ', '').strip()
    hyperlink = row.get("href")
    
    # Check if hyperlink is valid
    if hyperlink:
        print(f"Processing hyperlink: {hyperlink}")
    else:
        print("No hyperlink found, skipping...")
        continue

    try:
        response1 = requests.get(hyperlink, timeout=10)
        response1.raise_for_status()  # Raise an exception on 4xx/5xx responses

        soup1 = BeautifulSoup(response1.text, 'html.parser')  # Built-in parser, no extra dependency
        lyrics_elements = soup1.find_all("p", class_="song-lyrics")
        
        # Extract lyrics and clean up text
        lyrics = []
        for lyric in lyrics_elements:
            lyrics.append(lyric.get_text().replace("\n", " <br> "))

        # Join the lyrics into a single string
        lyrics_text = " ".join(lyrics)

        row_info = {"Song Title": song_title, "Lyrics": lyrics_text}
        song.append(row_info)

    except requests.RequestException as e:
        print(f"Request failed for URL {hyperlink}: {e}")
    except Exception as e:
        print(f"An error occurred while processing the URL {hyperlink}: {e}")
    
import os
import pandas as pd

# Create a DataFrame from the song list
df = pd.DataFrame(song)

# Remove any duplicates based on the 'Lyrics' column
df_songs = df.drop_duplicates(subset='Lyrics', keep='first')

# Define a writable file path
file_path = os.path.expanduser('~/mitskisongs.csv')  # This writes to the home directory

try:
    # Save the DataFrame to a CSV file
    df_songs.to_csv(file_path, index=False)
    print(f"File saved successfully to {file_path}")

    # Reset the index so rows are numbered 0..n-1 after duplicates are dropped
    df_songs = df_songs.reset_index(drop=True)
except Exception as e:
    print(f"Failed to save the file: {e}")



Status Code: 200
Content Type: text/html; charset=UTF-8
Soup object created successfully with lxml
Found 57 links
A Burning Hill
A Loving Feeling
Abbey
Bag of Bones
Because Dreaming Costs Money, My Dear
Between The Breaths
Brand New City
Broken Necks
Carry Me Out
Circle
Cla** of 2013
Crack Baby
Dan The Dancer
Drunk Walk Home
Eric
Everyone
Fireworks
First Love / Late Spring
Folly
Francis Forever
Goodbye, My Danish Sweetheart
Happy
Heat Lightning
Humpty
I Bet On Losing Dogs
I Don't Smoke
I Guess
I Will
Jobless Monday
Last Words of a Shooting Star
Liquid Smooth
Love Me More
My Body's Made Of Crushed Little Stars
Once More To See You
Pearl Diver
Pocket Full of Posies
Puberty 2
Real Men
Shame
Should’ve Been Me
Sparrow
Square
Stay Soft
Strawberry Blond
Susie Save Your Love
Texas Reznikoff
That’s Our Lamp
The Baddy Man
The Only Heartbreaker
There’s Nothing Left Here for You
Thursday Girl
Townie
Valentine, Texas
Wife
Working for the Knife
Your Best American Girl
Your Best American Girls *
Proc
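
The loop above replaces every newline in the lyrics with ` <br> ` before joining, so each line break ends up as its own whitespace-delimited token. A minimal sketch of what that looks like downstream, assuming the n-gram model tokenizes by splitting on whitespace (the tokenizer is not shown in this part of the notebook). `mark_line_breaks`, `tokenize`, and `bigrams` are hypothetical names, and the sample text is made up.

```python
def mark_line_breaks(raw_lyrics: str) -> str:
    """Same replacement the scraping loop applies to each lyrics element."""
    return raw_lyrics.replace("\n", " <br> ")


def tokenize(text: str) -> list:
    """Whitespace tokenization; '<br>' stays a single token."""
    return text.split()


def bigrams(tokens: list) -> list:
    """Adjacent token pairs, the n = 2 case of an n-gram model."""
    return list(zip(tokens, tokens[1:]))


sample = "first line here\nsecond line"  # placeholder text, not real lyrics
tokens = tokenize(mark_line_breaks(sample))
print(tokens)
print(bigrams(tokens))
```

Because `<br>` is a token in its own right, pairs such as `("here", "<br>")` let a model learn which words tend to end or start a line.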

In [6]:
url = "https://lyrics.az/billie-eilish/allsongs.html"

# Get the response from the URL; the timeout stops a stalled connection from hanging the cell
response = requests.get(url, timeout=10)

# Check the response status and content type
print(f"Status Code: {response.status_code}")
print(f"Content Type: {response.headers['Content-Type']}")

# Parse the HTML using BeautifulSoup
try:
    soup = BeautifulSoup(response.text, 'lxml')  # Try using 'lxml' parser
    print("Soup object created successfully with lxml")
except Exception as e:
    print(f"Error creating Soup object with lxml: {e}")
    print("Trying with html.parser")
    try:
        soup = BeautifulSoup(response.text, 'html.parser')  # Fallback to 'html.parser'
        print("Soup object created successfully with html.parser")
    except Exception as e:
        print(f"Error creating Soup object with html.parser: {e}")
        soup = None

# Find all the relevant links
if soup:
    try:
        rows = soup.find_all("a", class_="px-2 py-1 default-link list-group-item-action d-block")
        print(f"Found {len(rows)} links")
    except Exception as e:
        print(f"Error finding links: {e}")
        rows = []
else:
    rows = []

# Display the links
for row in rows:
    print(row.get_text())

import requests
from bs4 import BeautifulSoup

song = []

for row in rows:
    # Extract and clean song title and hyperlink (assumes any artist prefix reads 'Billie Eilish - ')
    song_title = row.get_text().replace('Lyrics', '').replace('Billie Eilish - ', '').strip()
    hyperlink = row.get("href")
    
    # Check if hyperlink is valid
    if hyperlink:
        print(f"Processing hyperlink: {hyperlink}")
    else:
        print("No hyperlink found, skipping...")
        continue

    try:
        response1 = requests.get(hyperlink, timeout=10)
        response1.raise_for_status()  # Raise an exception on 4xx/5xx responses

        soup1 = BeautifulSoup(response1.text, 'html.parser')  # Built-in parser, no extra dependency
        lyrics_elements = soup1.find_all("p", class_="song-lyrics")
        
        # Extract lyrics and clean up text
        lyrics = []
        for lyric in lyrics_elements:
            lyrics.append(lyric.get_text().replace("\n", " <br> "))

        # Join the lyrics into a single string
        lyrics_text = " ".join(lyrics)

        row_info = {"Song Title": song_title, "Lyrics": lyrics_text}
        song.append(row_info)

    except requests.RequestException as e:
        print(f"Request failed for URL {hyperlink}: {e}")
    except Exception as e:
        print(f"An error occurred while processing the URL {hyperlink}: {e}")
    
import os
import pandas as pd

# Create a DataFrame from the song list
df = pd.DataFrame(song)

# Remove any duplicates based on the 'Lyrics' column
df_songs = df.drop_duplicates(subset='Lyrics', keep='first')

# Define a writable file path
file_path = os.path.expanduser('~/billiesongs.csv')  # This writes to the home directory

try:
    # Save the DataFrame to a CSV file
    df_songs.to_csv(file_path, index=False)
    print(f"File saved successfully to {file_path}")

    # Reset the index so rows are numbered 0..n-1 after duplicates are dropped
    df_songs = df_songs.reset_index(drop=True)
except Exception as e:
    print(f"Failed to save the file: {e}")



Status Code: 200
Content Type: text/html; charset=UTF-8
Soup object created successfully with lxml
Found 68 links
!!!!!!!
&burn
8
All The Good Girls Go To Hell
​another stupid song**
b**hes broken hearts
​bad company/so n so**
Bad Guy
​because i'm in love with you**
Bellyache
Billie Bossa Nova
Bored
Boy
Bury A Friend
Come Out And Play
Copycat
daddy**
Everybody Dies
Ex Files
Fingers Crossed
Getting Older
GOLDWING
Goodbye
Halley's Comet
Happier Than Ever
Hostage
I Didn't Change My Number
I Love You
I'm the beauty
Idontwannabeyouanymore
Ilomilo
​jupiter*
​let me know*
limbo
Listen
Listen Before I Go
Lo Vas A Olvidar
Lost Cause
Lovely
Male Fantasy
My boy
my future
My Strange Addiction
NDA
No Time To Die
Not My Responsibility
Ocean Eyes
Ocean Eyes (Astronomyy Edit)
Ocean Eyes (Blackbear Remix)
OverHeated
Oxytocin
​panic*
Party favor
​see-through
She's Broken
SIRENS | Z1RENZ
Six Feet Under
The End of the World
Therefore I Am
​true blue
Watch
When I Was Older
When The Party's Over
winner*
Wis
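
Several titles in the listing above (for example the ones printed as "jupiter*" and "true blue") appear to begin with an invisible zero-width space, U+200B. Left in place, that character would make otherwise equal titles compare as different and would affect sorting. A minimal sketch of removing it, using a hypothetical `clean_title` helper that is not part of the original notebook:

```python
# Invisible characters that can hide at the edges of or inside scraped text.
ZERO_WIDTH = "\u200b\u200c\u200d\ufeff"
_STRIP_TABLE = {ord(ch): None for ch in ZERO_WIDTH}


def clean_title(title: str) -> str:
    """Remove zero-width characters, then trim ordinary whitespace."""
    return title.translate(_STRIP_TABLE).strip()


print(repr(clean_title("\u200bjupiter*")))
print(repr(clean_title("  Bad Guy ")))
```

The helper deliberately leaves visible characters such as the trailing asterisks alone, since the notebook does not say what they mean.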

In [3]:
url = "https://lyrics.az/tyler-the-creator/allsongs.html"

# Get the response from the URL; the timeout stops a stalled connection from hanging the cell
response = requests.get(url, timeout=10)

# Check the response status and content type
print(f"Status Code: {response.status_code}")
print(f"Content Type: {response.headers['Content-Type']}")

# Parse the HTML using BeautifulSoup
try:
    soup = BeautifulSoup(response.text, 'lxml')  # Try using 'lxml' parser
    print("Soup object created successfully with lxml")
except Exception as e:
    print(f"Error creating Soup object with lxml: {e}")
    print("Trying with html.parser")
    try:
        soup = BeautifulSoup(response.text, 'html.parser')  # Fallback to 'html.parser'
        print("Soup object created successfully with html.parser")
    except Exception as e:
        print(f"Error creating Soup object with html.parser: {e}")
        soup = None

# Find all the relevant links
if soup:
    try:
        rows = soup.find_all("a", class_="px-2 py-1 default-link list-group-item-action d-block")
        print(f"Found {len(rows)} links")
    except Exception as e:
        print(f"Error finding links: {e}")
        rows = []
else:
    rows = []

# Display the links
for row in rows:
    print(row.get_text())

import requests
from bs4 import BeautifulSoup

song = []

for row in rows:
    # Extract and clean song title and hyperlink (assumes any artist prefix reads 'Tyler, The Creator - ')
    song_title = row.get_text().replace('Lyrics', '').replace('Tyler, The Creator - ', '').strip()
    hyperlink = row.get("href")
    
    # Check if hyperlink is valid
    if hyperlink:
        print(f"Processing hyperlink: {hyperlink}")
    else:
        print("No hyperlink found, skipping...")
        continue

    try:
        response1 = requests.get(hyperlink, timeout=10)
        response1.raise_for_status()  # Raise an exception on 4xx/5xx responses

        soup1 = BeautifulSoup(response1.text, 'html.parser')  # Built-in parser, no extra dependency
        lyrics_elements = soup1.find_all("p", class_="song-lyrics")
        
        # Extract lyrics and clean up text
        lyrics = []
        for lyric in lyrics_elements:
            lyrics.append(lyric.get_text().replace("\n", " <br> "))

        # Join the lyrics into a single string
        lyrics_text = " ".join(lyrics)

        row_info = {"Song Title": song_title, "Lyrics": lyrics_text}
        song.append(row_info)

    except requests.RequestException as e:
        print(f"Request failed for URL {hyperlink}: {e}")
    except Exception as e:
        print(f"An error occurred while processing the URL {hyperlink}: {e}")
    
import os
import pandas as pd

# Create a DataFrame from the song list
df = pd.DataFrame(song)

# Remove any duplicates based on the 'Lyrics' column
df_songs = df.drop_duplicates(subset='Lyrics', keep='first')

# Define a writable file path
file_path = os.path.expanduser('~/tylersongs.csv')  # This writes to the home directory

try:
    # Save the DataFrame to a CSV file
    df_songs.to_csv(file_path, index=False)
    print(f"File saved successfully to {file_path}")

    # Reset the index so rows are numbered 0..n-1 after duplicates are dropped
    df_songs = df_songs.reset_index(drop=True)
except Exception as e:
    print(f"Failed to save the file: {e}")



Status Code: 200
Content Type: text/html; charset=UTF-8
Soup object created successfully with lxml
Found 139 links
2SEATER
435
48
911 / Mr. Lonely
After The Storm
After The Storm (Pete Rock Remix)
Analog
Answer
Appraised*
AssMilk
AU79
Awkward
b**h s** Dick
Bastard
BIG PERSONA
Big Persona (Portugese version)
Biking
BLESSED
Blow
BLOW MY LOAD
Boppin' b**h
Boredom
Bring It Back (Remix)
Bronco*
BUFFALO
Burger
Castaway
CHERRY BOMB
Colossus
CORSO
Cowboy
Crust In Their Eyes
DEATHCAMP
Domo23
Dropping Seeds
Enjoy Right Now Today
FAWN
FIND YOUR WINGS
Fish
Foreward
Foreword
French!
FUCKING YOUNG / PERFECT
Garbage
Garden Shed
GELATO
Glitter
Goblin
Golden
Her
Here We Go... Again
HOT WIND BLOWS
I Ain't Got Time
I'M A RAPPER**
IFHY
Inglorious
Jack And The Beanstalk
Jamba
Jipata (จิปาถะ)
JUGGERNAUT
KEEP DA O'S
LEMONHEAD
Lights On
Lone
LUMBERJACK
MANIFESTO
MASSA
MOMMA TALK
NAGA
Nightmare
NOIZE
November
Odd Toddlers
OKAGA, CA
OKRA
OPEN A WINDOW
Open That Coca-Cola
Parade
Parking Lot
Parliament
PartyIsntO

In [4]:
url = "https://lyrics.az/the-beatles/allsongs.html"

# Get the response from the URL; the timeout stops a stalled connection from hanging the cell
response = requests.get(url, timeout=10)

# Check the response status and content type
print(f"Status Code: {response.status_code}")
print(f"Content Type: {response.headers['Content-Type']}")

# Parse the HTML using BeautifulSoup
try:
    soup = BeautifulSoup(response.text, 'lxml')  # Try using 'lxml' parser
    print("Soup object created successfully with lxml")
except Exception as e:
    print(f"Error creating Soup object with lxml: {e}")
    print("Trying with html.parser")
    try:
        soup = BeautifulSoup(response.text, 'html.parser')  # Fallback to 'html.parser'
        print("Soup object created successfully with html.parser")
    except Exception as e:
        print(f"Error creating Soup object with html.parser: {e}")
        soup = None

# Find all the relevant links
if soup:
    try:
        rows = soup.find_all("a", class_="px-2 py-1 default-link list-group-item-action d-block")
        print(f"Found {len(rows)} links")
    except Exception as e:
        print(f"Error finding links: {e}")
        rows = []
else:
    rows = []

# Display the links
for row in rows:
    print(row.get_text())

import requests
from bs4 import BeautifulSoup

song = []

for row in rows:
    # Extract and clean song title and hyperlink (assumes any artist prefix reads 'The Beatles - ')
    song_title = row.get_text().replace('Lyrics', '').replace('The Beatles - ', '').strip()
    hyperlink = row.get("href")
    
    # Check if hyperlink is valid
    if hyperlink:
        print(f"Processing hyperlink: {hyperlink}")
    else:
        print("No hyperlink found, skipping...")
        continue

    try:
        response1 = requests.get(hyperlink, timeout=10)
        response1.raise_for_status()  # Raise an exception on 4xx/5xx responses

        soup1 = BeautifulSoup(response1.text, 'html.parser')  # Built-in parser, no extra dependency
        lyrics_elements = soup1.find_all("p", class_="song-lyrics")
        
        # Extract lyrics and clean up text
        lyrics = []
        for lyric in lyrics_elements:
            lyrics.append(lyric.get_text().replace("\n", " <br> "))

        # Join the lyrics into a single string
        lyrics_text = " ".join(lyrics)

        row_info = {"Song Title": song_title, "Lyrics": lyrics_text}
        song.append(row_info)

    except requests.RequestException as e:
        print(f"Request failed for URL {hyperlink}: {e}")
    except Exception as e:
        print(f"An error occurred while processing the URL {hyperlink}: {e}")
    
import os
import pandas as pd

# Create a DataFrame from the song list
df = pd.DataFrame(song)

# Remove any duplicates based on the 'Lyrics' column
df_songs = df.drop_duplicates(subset='Lyrics', keep='first')

# Define a writable file path
file_path = os.path.expanduser('~/beatlessongs.csv')  # This writes to the home directory

try:
    # Save the DataFrame to a CSV file
    df_songs.to_csv(file_path, index=False)
    print(f"File saved successfully to {file_path}")

    # Reset the index so rows are numbered 0..n-1 after duplicates are dropped
    df_songs = df_songs.reset_index(drop=True)
except Exception as e:
    print(f"Failed to save the file: {e}")



Status Code: 200
Content Type: text/html; charset=UTF-8
Soup object created successfully with lxml
Found 782 links
1
1. Lucy in the Sky with Diamonds
12-bar Original
1822!
A Beginning
A Collection of Beatles Oldies (But Goldies!)
A Day in the Life
A Day In The Life(lennon/mccartney)
A Hard Day's Night
A Hard Day's Night (lennon/mccartney)
A Hard Day's Night (U.S. version)
A Hard Day's Night script
A Little Rhyme
A Picture Of You
A Shot Of Rhythm And Blues
A Taste of Honey
A Taste Of Honey (scott/marlow)
A Taste Of Honey(marlow/scott)
Abbey Road Medley
Across the Universe
Across The Universe (lennon/mccartney)
Across the Universe (Wildlife Version)
Act Naturally
Act Naturally (morrison/russell)
Ain't She Sweet
All I've Got to Do
All I've Got To Do (lennon/mccartney)
All My Loving
All My Loving (lennon/mccartney)
All Things Must Pa**
All Those Years Ago *
All Together Now
All Together Now (lennon/mccartney)
All Together on the Wireless Machine
All You Need Is Love
All You Need Is Love (l
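
With 782 links found for this artist, the loop above issues one request per song back to back. A minimal sketch of pacing those requests follows. The `fetch_all` helper, its `fetch` callable (for example `lambda u: requests.get(u, timeout=10)`), and the one-second default delay are all assumptions for illustration and are not part of the original notebook.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL in order, pausing between calls.

    Failures are reported and skipped so one bad page does not end the run.
    """
    results = {}
    for position, url in enumerate(urls):
        if position > 0:
            sleep(delay_seconds)  # pause before every request except the first
        try:
            results[url] = fetch(url)
        except Exception as exc:
            print(f"Request failed for URL {url}: {exc}")
    return results


# Offline demonstration with a stand-in fetch function and no delay.
pages = fetch_all(["a", "b"], fetch=str.upper, delay_seconds=0)
print(pages)
```

Passing `fetch` and `sleep` in as arguments keeps the pacing logic separate from the network call, which also makes it possible to try the loop without touching the site.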

In [4]:
url = "https://lyrics.az/seal/allsongs.html"

# Get the response from the URL; the timeout stops a stalled connection from hanging the cell
response = requests.get(url, timeout=10)

# Check the response status and content type
print(f"Status Code: {response.status_code}")
print(f"Content Type: {response.headers['Content-Type']}")

# Parse the HTML using BeautifulSoup
try:
    soup = BeautifulSoup(response.text, 'lxml')  # Try using 'lxml' parser
    print("Soup object created successfully with lxml")
except Exception as e:
    print(f"Error creating Soup object with lxml: {e}")
    print("Trying with html.parser")
    try:
        soup = BeautifulSoup(response.text, 'html.parser')  # Fallback to 'html.parser'
        print("Soup object created successfully with html.parser")
    except Exception as e:
        print(f"Error creating Soup object with html.parser: {e}")
        soup = None

# Find all the relevant links
if soup:
    try:
        rows = soup.find_all("a", class_="px-2 py-1 default-link list-group-item-action d-block")
        print(f"Found {len(rows)} links")
    except Exception as e:
        print(f"Error finding links: {e}")
        rows = []
else:
    rows = []

# Display the links
for row in rows:
    print(row.get_text())

import requests
from bs4 import BeautifulSoup

song = []

for row in rows:
    # Extract and clean song title and hyperlink (assumes any artist prefix reads 'Seal - ')
    song_title = row.get_text().replace('Lyrics', '').replace('Seal - ', '').strip()
    hyperlink = row.get("href")
    
    # Check if hyperlink is valid
    if hyperlink:
        print(f"Processing hyperlink: {hyperlink}")
    else:
        print("No hyperlink found, skipping...")
        continue

    try:
        response1 = requests.get(hyperlink, timeout=10)
        response1.raise_for_status()  # Raise an exception on 4xx/5xx responses

        soup1 = BeautifulSoup(response1.text, 'html.parser')  # Built-in parser, no extra dependency
        lyrics_elements = soup1.find_all("p", class_="song-lyrics")
        
        # Extract lyrics and clean up text
        lyrics = []
        for lyric in lyrics_elements:
            lyrics.append(lyric.get_text().replace("\n", " <br> "))

        # Join the lyrics into a single string
        lyrics_text = " ".join(lyrics)

        row_info = {"Song Title": song_title, "Lyrics": lyrics_text}
        song.append(row_info)

    except requests.RequestException as e:
        print(f"Request failed for URL {hyperlink}: {e}")
    except Exception as e:
        print(f"An error occurred while processing the URL {hyperlink}: {e}")
    
import os
import pandas as pd

# Create a DataFrame from the song list
df = pd.DataFrame(song)

# Remove any duplicates based on the 'Lyrics' column
df_songs = df.drop_duplicates(subset='Lyrics', keep='first')

# Define a writable file path
file_path = os.path.expanduser('~/sealsongs.csv')  # This writes to the home directory

try:
    # Save the DataFrame to a CSV file
    df_songs.to_csv(file_path, index=False)
    print(f"File saved successfully to {file_path}")

    # Reset the index so rows are numbered 0..n-1 after duplicates are dropped
    df_songs = df_songs.reset_index(drop=True)
except Exception as e:
    print(f"Failed to save the file: {e}")



Status Code: 200
Content Type: text/html; charset=UTF-8
Soup object created successfully with lxml
Found 232 links
A Change Is Gonna Come
A Father's Way
A Minor Groove
Ain't nothing but a house party - bonus track
Ain’t No Better Love
All for Love
Amazing
Amazing - kaskade dub
Amazing - kaskade remix
Amazing - thin white duke dub
Amazing - Thin White Duke Main
Amazing (Thin White Duke Edit)
Anyone Who Knows What Love Is
Ashley Wednesday
Autumn Leaves
Back Stabbers
Backstabbers
Beginning
Best of Me
Big Time
Bird Of Freedom
Blues In 'e'
Came See What Love Has Done
Can't Stop A River
Christmas Song (Chestnuts Roasting)
Color
Colour
Come See What Love Has Done
Come See What Love Has Done (Live Version)
Crazy
Crazy - live in paris
Crazy (Acoustic Version)
Crazy (Acoustic Version/Instrumental Version)
Crazy (Acoustic/Instrumental Version)
Crazy (Ananda Project Vocal Mix)
Crazy (Chick on My Tip Mix)
Crazy (Orange Factory Mix)
Cry To Me
Daylight Reprise
Daylight Saving
Daylight Savings
Deep Wa
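
The listing above contains near-duplicate titles such as "Back Stabbers" and "Backstabbers", or "Color" and "Colour". `drop_duplicates(subset='Lyrics')` only removes rows whose lyrics strings are exactly identical, so copies that differ in case or spacing would survive. A minimal sketch of a looser comparison, assuming such copies should count as duplicates (the notebook does not check whether these particular pairs share lyrics). `dedupe_key` and `drop_near_duplicates` are hypothetical helpers, and the rows are placeholders.

```python
def dedupe_key(lyrics: str) -> str:
    """Lowercase and collapse all runs of whitespace to single spaces."""
    return " ".join(lyrics.lower().split())


def drop_near_duplicates(rows: list) -> list:
    """Keep the first row for each normalized lyrics key, preserving order."""
    seen = set()
    kept = []
    for row in rows:
        key = dedupe_key(row["Lyrics"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


rows = [
    {"Song Title": "Colour", "Lyrics": "Some  words <br> here"},
    {"Song Title": "Color", "Lyrics": "some words <br> HERE "},
    {"Song Title": "Other", "Lyrics": "different words"},
]
print([row["Song Title"] for row in drop_near_duplicates(rows)])
```

The same key could be stored as an extra DataFrame column and passed to `drop_duplicates(subset=...)` instead of the raw `Lyrics` column.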