Issue on page /04-Data-Collection/08-Collect-Genius-Lyrics.html #36

adamlporter · 2023-02-24T21:40:39Z

When I tried to work through this page, I got an error when trying to execute

artist = LyricsGenius.search_artist("Missy Elliott", max_songs=6)

The error is

HTTPError: 403 Client Error: Forbidden for url: https://genius.com/api/search/multi?q=Missy+Elliott

Apparently, genius.com has changed one (or more) of their settings, so that LyricsGenius no longer works. See
https://stackoverflow.com/questions/72078610/getting-lyrics-from-genius-api-gives-error
johnwmillr/LyricsGenius#190
johnwmillr/LyricsGenius#220
The conclusion from these is (unhappily) not to use LyricsGenius.

adamlporter · 2023-03-01T00:07:52Z

The procedures clean_up() and get_all_songs_from_album() still work. I rewrote Melanie Walsh's download_album_lyrics() procedure so that it no longer uses LyricsGenius.

import re
from pathlib import Path

import requests
from bs4 import BeautifulSoup

# Relies on get_all_songs_from_album() from the textbook page.
def download_album_lyrics(artist, album_name):
    clean_songs = get_all_songs_from_album(artist, album_name)

    artist = artist.replace(" ", "-")
    album_name = album_name.replace(" ", "-")
    folder = Path(f"{artist}_{album_name}")  # one name for both mkdir and open

    for song in clean_songs:
        song_title = re.sub(r"[^\w\s]", "", song)  # get rid of punctuation
        song_title = song_title.replace(" ", "-")
        url = f"https://genius.com/{artist}-{song_title}-lyrics"
        try:
            response = requests.get(url)
            if response.status_code == 200:
                folder.mkdir(parents=True, exist_ok=True)
                document = BeautifulSoup(response.text, "html.parser")
                # Match a div whose class is exactly "lyrics" or contains "Lyrics__Root"
                div = document.find("div", class_=re.compile("^lyrics$|Lyrics__Root"))
                try:
                    lyrics = div.get_text("\n")
                    filen = folder / f"{song_title}.txt"
                    with open(filen, "w", encoding="utf-8") as file:
                        file.write(lyrics)
                    print(f"saving {filen}")
                except AttributeError:
                    # div is None when no matching element was found
                    print(f"No lyrics found for {song_title}")
            else:
                print(f"problem getting lyrics for {artist} - {song_title}")
                print(f"error code was {response.status_code}")
        except FileNotFoundError:
            print(f"{url} is not found")
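
To see which URLs the function above will request without touching the network, the slug logic can be pulled out into a small helper. This is only a sketch: genius_lyrics_url() is a hypothetical name, the second song title is made up for illustration, and nothing here guarantees that genius.com actually serves a page at the resulting address.

```python
import re


def genius_lyrics_url(artist, song):
    """Build the lyrics-page URL the same way download_album_lyrics() does."""
    artist_slug = artist.replace(" ", "-")
    # Drop punctuation from the song title, then turn spaces into hyphens
    song_slug = re.sub(r"[^\w\s]", "", song).replace(" ", "-")
    return f"https://genius.com/{artist_slug}-{song_slug}-lyrics"


print(genius_lyrics_url("Missy Elliott", "Get Ur Freak On"))
# Illustrative title with punctuation: the apostrophe and parentheses are removed
print(genius_lyrics_url("Missy Elliott", "Don't Stop (Live)"))
```

Note that only the song title is stripped of punctuation; an artist name containing punctuation is passed through unchanged, which may be one source of failed requests.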

I have tested download_album_lyrics() and it works, sort of. I was able to download the lyrics for three albums, then the requests.get(url) calls started throwing FileNotFoundErrors.

I suspect genius.com tracks IP addresses and starts blocking them once they make too many requests (either in total or within a specific period of time). Interestingly, even after download_album_lyrics() stops working, get_all_songs_from_album() continues to work.
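
If the blocking really is triggered by request volume (which is a guess, not something confirmed here), one common mitigation is to retry with an increasing pause between attempts. Below is a minimal sketch of that idea. fetch_politely() and its parameters are hypothetical names, and the delays are arbitrary; the fetch argument is any callable that returns an object with a .status_code attribute, such as requests.get.

```python
import time


def fetch_politely(fetch, url, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url) up to `retries` times, waiting longer before each retry.

    Returns the first response with status 200, or the last response received.
    """
    response = None
    for attempt in range(retries):
        if attempt > 0:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
        response = fetch(url)
        if response.status_code == 200:
            break
    return response


# Possible use inside download_album_lyrics(), assuming requests is imported:
#     response = fetch_politely(requests.get, url)
```

Passing the fetch function in as an argument keeps the helper testable without network access. Whether any delay is enough to avoid being blocked by genius.com is not known.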

adamlporter · 2023-03-01T00:08:05Z

It might be possible to replace genius.com with lyrics.com. The latter site has an easier HTML structure that makes it possible to extract lyrics text without using a regular expression. (This may be similar to what genius.com used when Melanie first wrote the textbook.)

import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.lyrics.com/lyric/8237688")
html = response.text
document = BeautifulSoup(html, "html.parser")
print(document.find('pre').text)
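
One caveat with the snippet above: find('pre') returns None when a page has no <pre> element, so calling .text on the result would raise an AttributeError. The sketch below shows a guarded version of the same "take the first <pre>" idea. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs; the class and function names are hypothetical, and the sample HTML is made up rather than copied from lyrics.com.

```python
from html.parser import HTMLParser


class FirstPreText(HTMLParser):
    """Collect the text inside the first <pre> element of a page."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside the first <pre>
        self.done = False   # True once the first <pre> has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre" and not self.done:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.depth:
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        # Text inside nested tags (e.g. links) is kept as well
        if self.depth:
            self.parts.append(data)


def first_pre_text(html):
    """Return the text of the first <pre>, or None if the page has none."""
    parser = FirstPreText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.done else None


# Made-up sample page, for illustration only
sample = "<html><body><h1>Title</h1><pre>line one\n<a href='#'>line</a> two</pre></body></html>"
print(first_pre_text(sample))
print(first_pre_text("<p>no lyrics here</p>"))
```

A caller can then check for None and skip the song instead of crashing partway through an album.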
