Timeouts at seemingly random moments #121

Closed
Arsanian opened this issue Nov 7, 2019 · 12 comments · Fixed by #162

@Arsanian

Arsanian commented Nov 7, 2019

I'm trying to download a huge number of lyrics for a university project. I have files, each representing a genre and containing 50 artists whose lyrics I want to download.

So I wrote a Python script that scans the folder and reads the lists one by one, trying to download the lyrics for every artist in these lists.

Sometimes the following happens:

Timeout raised and caught:
HTTPSConnectionPool(host='api.genius.com', port=443): Read timed out. (read timeout=5)
Traceback (most recent call last):
  File "lyricsapi.py", line 54, in <module>
    artist = api.search_artist(a.strip(), max_songs=max_songs, sort="title")
  File "/home/duke/anaconda3/envs/dynamusic/lib/python3.7/site-packages/lyricsgenius/api.py", line 356, in search_artist
    song = Song(info, lyrics)
  File "/home/duke/anaconda3/envs/dynamusic/lib/python3.7/site-packages/lyricsgenius/song.py", line 26, in __init__
    self._body = json_dict['song'] if 'song' in json_dict else json_dict
TypeError: argument of type 'NoneType' is not iterable

This error happens pretty randomly, sometimes after 50 texts, sometimes after 600. Earlier today it happened after downloading 113 texts by Eminem, but on the next try it managed to download all 490 of his songs, only to fail after a few songs from the next artist in line.

This also happened when I ran the script on my server, which has a separate internet connection.
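
A note on why the traceback above ends in a TypeError rather than a timeout: the log suggests the timeout was caught inside the library and the response came back as None, after which the `'song' in json_dict` check failed. The sketch below reproduces only that membership test. `extract_body` and `extract_body_safe` are hypothetical stand-ins written for illustration, not the library's code.

```python
def extract_body(json_dict):
    # Same expression as the line shown in the traceback (song.py, line 26).
    return json_dict['song'] if 'song' in json_dict else json_dict


def extract_body_safe(json_dict):
    # Hypothetical guard: treat a missing response as "no song" instead of crashing.
    if json_dict is None:
        return None
    return extract_body(json_dict)


try:
    extract_body(None)  # simulates a request that timed out and yielded None
except TypeError as err:
    print("TypeError:", err)
```

If this reading is right, catching only the timeout exception around `search_artist` would not help on this package version, because the exception that actually escapes is the TypeError.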

Version info

  • Package version 1.7.0
  • OS: Ubuntu 19.10 (also happened on a 18.04 machine)
@johnwmillr added the bug label Nov 7, 2019
@mxdillon

I'm facing the same issue

@GiorgioGhisotti

A workaround for this is to use a try...except block and place the request in a while loop

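artists = []
while True:
    try:
        artists.append(genius.search_artist(artist, max_songs=10000))
        break
    except Exception:
        # Catch Exception rather than using a bare except, so Ctrl+C
        # (KeyboardInterrupt) can still stop the loop.
        pass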

This will simply retry the call until it works. I successfully used this to scrape the full discography of 50 artists and I didn't run into any further problems.
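
Retrying forever can hang the script if the failure is permanent (for example, an invalid token). A minimal sketch of a bounded retry with exponential backoff follows. `call_with_retries` and its parameters are hypothetical names introduced for illustration and are not part of LyricsGenius.

```python
import time


def call_with_retries(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:  # not a bare except, so Ctrl+C still stops the script
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Under these assumptions, the loop above could become `artists.append(call_with_retries(lambda: genius.search_artist(artist, max_songs=10000)))`.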

@dmlunde

dmlunde commented Feb 27, 2020

@Arsanian how did you manage to narrow the number of Eminem songs down to 490?

@danielhorizon

I've tried the above and am still getting a timeout:

"HTTPSConnectionPool(host='api.genius.com', port=443): Read timed out. (read timeout=5)"

Any suggestions? I've also tried a 60-second timeout, along with the while loop and a try/except.

@ArinkB

ArinkB commented Oct 6, 2020

I am also having this same issue. My loop pulls lyrics based on the artist name and song title, then appends them to a list. I have a try and except, and the error still pops up. I also have time.sleep(15) just in case.
The code can run anywhere from 30 minutes to 5 hours before failing, so it requires a lot of monitoring.

@allerter
Collaborator

allerter commented Oct 6, 2020

@ArinkB, could you please provide the following info so we can re-create and debug your issue:

  • the version of LyricsGenius
  • your traceback
  • a minimal working script so that we can re-create the error.

@ArinkB

ArinkB commented Oct 6, 2020

Sure, the dataframe:
[image]

lyrics = []

def get_lyrics(): #no arguments needed
    while len(lyrics) != len(end_df): 
        genius = lyricsgenius.Genius("API KEY") # call to lyricsgenius
        for track in end_df.values: 
            song = genius.search_song(track[2], track[0])
            try:    
                lyrics.append(song.lyrics) 
            except:
                lyrics.append(np.nan)
        time.sleep(40)

The error:
D:\Anaconda\lib\site-packages\lyricsgenius\api\base.py in make_request(self, path, method, params, public_api, **kwargs)
58 except Timeout as e:
59 error = "Request timed out:\n{e}".format(e=e)
---> 60 raise Timeout(error)
61 except HTTPError as e:
62 error = str(e)

Timeout: Request timed out:
HTTPSConnectionPool(host='api.genius.com', port=443): Read timed out. (read timeout=5)

@allerter
Collaborator

allerter commented Oct 6, 2020

@ArinkB, thanks for providing the information. Although the timeout is probably a valid issue, I don't think it is your script's primary problem. I tested Spotify's Viral 50 songs using your script, and here are a couple of things that you could improve:

import lyricsgenius
import numpy as np
from requests.exceptions import Timeout

lyrics = []


def get_lyrics():
    # while len(lyrics) != len(end_df): #1
    genius = lyricsgenius.Genius(token)
    genius.timeout = 15
    genius.sleep_time = 40  # 2
    # or: Genius(token, timeout=15, sleep_time=40)
    for track in end_df.values:
        retries = 0
        while retries < 3:
            try:
                song = genius.search_song(track[2], track[0])
            except Timeout:
                retries += 1
                continue
            if song is not None:
                lyrics.append(song.lyrics)
            else:
                lyrics.append(np.nan)
            break
        else:
            # All three attempts timed out: append a placeholder so that
            # lyrics stays aligned with the rows of end_df.
            lyrics.append(np.nan)
  1. This will result in an infinite loop since some songs can't be found, and there's no need for it in the first place.
  2. With the genius.sleep_time attribute, there's no need for time.sleep(40) anymore. Also, I don't think there's a need for a 40-sec sleep from the API's end. When I tested your script, I removed the time.sleep(40) line and everything worked fine.

Now your script will search for the songs and in case of timeouts, your script will retry the search three times before moving on to the next song (this should probably be a feature, @johnwmillr).
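
One way to keep track of the songs that were skipped after three timeouts, so they can be retried later, is sketched below. `fetch_all`, `search`, and `retry_on` are hypothetical names introduced for illustration. The built-in `TimeoutError` stands in for `requests.exceptions.Timeout` so the sketch runs without network access; with the real library you would need to pass `retry_on=(Timeout,)` explicitly, since the requests exception is a different class.

```python
def fetch_all(tracks, search, max_retries=3, retry_on=(TimeoutError,)):
    """Return (results, failed); results has exactly one entry per track."""
    results, failed = [], []
    for track in tracks:
        for _ in range(max_retries):
            try:
                results.append(search(track))
                break
            except retry_on:
                continue  # try the same track again
        else:
            # Every attempt timed out: keep the lists aligned and remember the track.
            results.append(None)
            failed.append(track)
    return results, failed
```

The `failed` list can then be passed through `fetch_all` again once the run has finished, rather than restarting the whole job.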

@ArinkB

ArinkB commented Oct 6, 2020

@allerter Thank you! I appreciate your help and insight. It has been pulling for 3 hours now and no issues so far.

@NIkitabala

@ArinkB Hi, can you show me exactly how you use your script? I'm trying to use this solution, but I'm still getting an error.

@ArinkB

ArinkB commented Nov 2, 2020

@NIkitabala Sure, this is the notebook I used it in. I modified it slightly because my original project plan didn't work out at the time:
https://github.com/ArinkB/Predicting-Song-Skips/blob/master/1_Data%20Acquisition.ipynb

@allerter
Collaborator

allerter commented Nov 2, 2020

Based on this comment that I posted on #168, I think these random timeout errors will be solved by #162. We'll see.
