YTMusic responses are unreliable for get_library_songs and get_playlist #52

Closed
czifumasa opened this issue Aug 1, 2020 · 15 comments · Fixed by #53
Labels
yt-update A server-side change caused this issue

Comments

@czifumasa
Contributor

In my project I am using ytmusicapi to fetch the full content of the user's library and save it in CSV files. I can then use these CSV files to compare changes in my library, find songs removed from YouTube, etc.

Unfortunately, it is currently very unreliable.
For example, my library currently has 2040 songs. To get the library songs, I call the API with a high limit:

from ytmusicapi import YTMusic

api = YTMusic('headers_auth.json')
library_songs = api.get_library_songs(50000)

Every time I send that request, the number of returned songs is different; it varies between 1800 and 2035 songs.
I know that the problem is in YTM itself, because I observed the same problem in the web client, and it hasn't been fixed on their side for months. YTM should return library songs in chunks of 25 items, but very often a chunk contains fewer than 25.
In the end, on average, at least 10% of my library is missing, making my scripts kinda useless. The same problem occurs for the get_playlist method.
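
One way to quantify the behaviour described above is to fetch the library several times and merge the results. The sketch below is only an illustration and not part of ytmusicapi: `union_by_video_id` is a hypothetical helper, and it assumes each returned song is a dict with a `videoId` key. The fetches are passed in as plain lists so the helper can be tried without network access.

```python
from typing import Dict, Iterable, List


def union_by_video_id(fetches: Iterable[List[dict]]) -> Dict[str, dict]:
    """Merge several fetch results, keeping the first song seen per videoId.

    Hypothetical helper: assumes every song dict carries a 'videoId' key.
    """
    merged: Dict[str, dict] = {}
    for songs in fetches:
        for song in songs:
            video_id = song.get("videoId")
            # Skip entries without an id and ids that were already collected.
            if video_id is not None and video_id not in merged:
                merged[video_id] = song
    return merged


# Two simulated fetches, each missing a different song.
first = [{"videoId": "a", "title": "A"}, {"videoId": "b", "title": "B"}]
second = [{"videoId": "a", "title": "A"}, {"videoId": "c", "title": "C"}]

merged = union_by_video_id([first, second])
print(len(first), len(second), len(merged))
```

With a real client, the lists would come from repeated `api.get_library_songs(50000)` calls. A union that is larger than any single fetch would suggest that songs are being skipped at random rather than being permanently unavailable.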

@sigma67
Owner

sigma67 commented Aug 3, 2020

I've noticed this issue as well, since tests were failing randomly. I attempted to fix it in 90bc753. That might not be a fix if the API result skips songs randomly; in that case those songs would still be missing from the response. In your experience, do invalid responses return songs in the correct order without skips? Or are songs randomly missing in between?

To be honest I don't really like the option of implementing retry logic for a server-side issue, as it might become obsolete in the near future. I suggest we wait another month to see if the issue gets resolved by YouTube. If this is not the case, I'll go ahead and merge #53.

@czifumasa
Contributor Author

Yes, from my observation, the API skips songs randomly and the fix from 90bc753 is not enough.

> To be honest I don't really like the option of implementing retry logic for a server-side issue, as it might become obsolete in the near future. I suggest we wait another month to see if the issue gets resolved by YouTube. If this is not the case, I'll go ahead and merge #53.

That's completely fine with me. I know that the "retry" solution is not very elegant, but unfortunately the problem has existed since I moved from GPM, so it's been at least a few months already. I kinda lost my patience and decided to work around it with my PR. I agree with you, though, that the proper fix should be on YouTube's servers, so let's give them one more month.

@akraus53
Contributor

I think this is happening to getHistory() as well!

@xplorr

xplorr commented Aug 19, 2020

I use the ytmusic.get_library_upload_songs(50000) call and did not notice any problems so far. I have about 23000 songs in my library and they are all returned except 2. Have to figure out why 2 are missing.

sigma67 added the yt-update label (A server-side change caused this issue) on Aug 23, 2020
@sigma67
Owner

sigma67 commented Aug 25, 2020

It's been almost a month with no updates from YouTube's side. I suggest we merge this PR; however, I want to request two changes if possible.

  1. the PR needs to be rebased on top of current master
  2. I'd like to make the retry behavior optional

The reasoning for 2) is that the changes from this PR doubled the average execution time for me (based on test_get_library_songs - previously 3-4s, now 7-8s). I suggest we introduce an optional parameter validate_responses=False for get_library_songs. If False, the current faulty behavior should occur by calling get_continuations. If True, get_validated_continuations should be used.

The default should be False imo, since the objective of the API is to replicate the web client as closely as possible, which also exhibits this odd behavior. Therefore, it would be an optional feature of ytmusicapi, which validates responses for the user to ensure the response is correct. What do you think?
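
The opt-in flag proposed above could be wired up roughly as follows. This is a minimal sketch under stated assumptions, not the code from the PR: `get_songs`, `get_plain` and `get_validated` are hypothetical names, with the two callables standing in for `get_continuations` and `get_validated_continuations`, whose real signatures are not shown in this thread.

```python
from typing import Callable, List

Page = List[dict]


def get_songs(
    get_plain: Callable[[], Page],
    get_validated: Callable[[], Page],
    validate_responses: bool = False,
) -> Page:
    """Choose the continuation strategy from the opt-in flag.

    get_plain and get_validated are hypothetical stand-ins for
    get_continuations and get_validated_continuations.
    """
    # The default (False) mirrors the web client, short pages included.
    fetch = get_validated if validate_responses else get_plain
    return fetch()


def short_page() -> Page:
    """Simulated unvalidated strategy: returns a short page of 24 items."""
    return [{"videoId": str(i)} for i in range(24)]


def full_page() -> Page:
    """Simulated validated strategy: returns a complete page of 25 items."""
    return [{"videoId": str(i)} for i in range(25)]


print(len(get_songs(short_page, full_page)))
print(len(get_songs(short_page, full_page, validate_responses=True)))
```

Keeping the default at `False` means existing callers see no change in behaviour or run time; only callers who pass `validate_responses=True` pay for the extra requests.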

@sigma67
Owner

sigma67 commented Aug 25, 2020

In the original issue, you also noted that get_playlist has the same issue, but didn't end up including it in your PR. I just did some tests and it seems to behave consistently (i.e. no varying track counts). Am I correct in assuming that only get_library_songs is affected by this issue for now?

@czifumasa
Contributor Author

> Am I correct in assuming that only get_library_songs is affected by this issue for now?

Yes, indeed, it seems that get_playlist has been fixed. Today I ran some tests for both methods, and I wasn't able to reproduce the problem for get_playlist anymore. I didn't include it in my PR at first because I wasn't sure if you would approve the general concept, so I created a fix for only one method. Luckily it's no longer needed.

Unfortunately, the problem still exists for get_library_songs. I reproduced it in every test I ran.
Regarding your proposed changes, I agree, the retry behaviour should be optional. I'll update my PR next weekend, when I have a bit more time.

@czifumasa
Contributor Author

I updated my PR (#53) with the requested changes, please take a look.

@sigma67
Owner

sigma67 commented Aug 31, 2020

Thanks for updating the PR! I did some rather extensive testing with the changes and ran get_library_songs(300, validate_responses=True) a few times. I noticed that retries only rarely managed to produce the full 25 results. If they did, it was always after the first retry. Unless you have significantly different results, I propose reducing the max_retries to 1 to improve performance.

(edit: I did some more tests and found 1 or 2 continuations where it worked after 2 or 3 tries (after >15 function calls with 11 continuations each). I believe the performance penalty isn't worth the additional 2 retries).

Check this log:

25
retries: 0
24
24
24
24
retries: 3
24
24
23
24
retries: 3
25
retries: 0
25
retries: 0
25
retries: 0
22
22
22
22
retries: 3
25
retries: 0
19
25
retries: 1
24
24
24
24
retries: 3
22
25
retries: 1
25
retries: 0
24
retries: 0

Here are the debug changes in utils.py l.104:

# Debug output: size of the initial response
print(len(parsed_object['parsed']))
while not validate_func(parsed_object) and retry_counter < max_retries:
    response = request_func(request_additional_params)
    parsed_object = parse_func(response)
    # Debug output: size of each retried response
    print(len(parsed_object['parsed']))
    retry_counter += 1
# Debug output: retries used for this continuation
print("retries: " + str(retry_counter))
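
For readers who want to try the retry loop in isolation, here is a self-contained sketch. It is not the code from utils.py: `fetch_with_retries` is a hypothetical function, the page is simplified to a plain list, and returning the largest response seen is a design choice of this sketch so that a retry can never make the result worse.

```python
from typing import Callable, List


def fetch_with_retries(
    request_func: Callable[[], List[dict]],
    expected: int = 25,
    max_retries: int = 3,
) -> List[dict]:
    """Re-request a page until it has `expected` items or retries run out.

    Returns the largest response seen across all attempts.
    """
    best = request_func()
    retries = 0
    while len(best) < expected and retries < max_retries:
        candidate = request_func()
        # Keep the new response only if it contains more items.
        if len(candidate) > len(best):
            best = candidate
        retries += 1
    return best


# Simulated server: page sizes taken from one group in the log above.
sizes = iter([24, 24, 23, 24])
page = fetch_with_retries(lambda: [{}] * next(sizes))
print(len(page))
```

In this simulated run every attempt comes back short, so the loop stops after `max_retries` extra requests and returns a 24-item page instead of the 23-item one seen along the way.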

@sigma67
Owner

sigma67 commented Aug 31, 2020

I also found some isolated instances where the key contents is missing completely from the continuation response, causing an error.

We should catch that in both get_parsed_continuation_items and get_continuations. If you want you can add these changes as well, or I can do it.
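
A guard for the missing key could look roughly like the sketch below. It assumes a simplified response shape in which the items sit directly under a `contents` key; the real continuation response is nested more deeply, and `get_continuation_items` is a hypothetical helper rather than a ytmusicapi function.

```python
from typing import List


def get_continuation_items(continuation: dict) -> List[dict]:
    """Return the continuation's items, or an empty list if 'contents' is absent."""
    # .get avoids a KeyError; `or []` also covers an explicit None value.
    return continuation.get("contents") or []


print(len(get_continuation_items({"contents": [{"id": 1}, {"id": 2}]})))
print(len(get_continuation_items({})))
```

Returning an empty list rather than raising lets a length-based validation treat the response like any other short page, so it can simply be requested again.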

@sigma67
Owner

sigma67 commented Sep 1, 2020

After some more tests I decided to leave the retries at 3, as the number of retries to success seems to vary a lot depending on time of day and account used.

It also seems that the API "warms up" to your requests. For example, if you repeat the same call (get_library_songs(300)) multiple times, subsequent calls have significantly fewer missing items and take fewer retries. This effect subsides after a while, so I suspect that YouTube's API uses some form of caching here.

Will merge this PR shortly with the bugfix mentioned in the previous comment.

sigma67 pushed a commit that referenced this issue Sep 1, 2020
Always keep response with most results for retries
@czifumasa
Contributor Author

Regarding the error with the contents key, it never occurred for the accounts I tested. I am glad you found it and fixed it.

Regarding the max_retries param, I observed exactly the same behaviour that you described: the first call to get_library_songs is usually the worst and requires many retries to get correct results. Subsequent calls are much faster, but some continuations still require 1 or 2 retries.

I set max_retries to 3 because it returned the most consistent results in my tests. Decreasing it to 1 or 2 meant that the response sometimes still had missing songs. Increasing it to values higher than 3 never helped. If YTM still sends a response with fewer than 25 songs after more than 3 retries, it probably means that the missing songs are permanently unavailable for some reason.

@sigma67
Owner

sigma67 commented Sep 29, 2022

Hi, I'm curious. Are you still using this functionality? I feel like the API has gotten a lot more reliable and the code to achieve this is pretty messy. If it's not being used I'd rather remove it.

@xplorr

xplorr commented Oct 4, 2022

Still use this in my project

@sigma67
Owner

sigma67 commented Oct 4, 2022

Alright, good to know.


4 participants