plugins.youtube.py: 404 Intermittent and Repeatable (and related, but not duplicate of #3724 or other 404 issues) #3795

Closed
1 of 2 tasks
frisch1 opened this issue Jun 16, 2021 · 13 comments · Fixed by #3797
Labels
plugin issue A Plugin does not work correctly

Comments

@frisch1

frisch1 commented Jun 16, 2021

Plugin Issue

  • This is a plugin issue and I have read the contribution guidelines.
  • I am using the latest development version from the master branch. [I apologize: I'm on the latest stable release, but the plugin hasn't been updated in 28 days and we have that version installed.]

Description

There have been issues over the past couple of months with YouTube reporting 404s. It's been tricky because it's not always reproducible. We had no problems in the last two days, and today half of our captures have this problem.

The symptom is no different from #3724 and other previously reported issues, and the error is the same. This is with the simplest possible invocation:

ubuntu@ip-XXX-XXX-XXX-XXX:~$ streamlink https://www.youtube.com/watch?v=YertdWavZdk
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/watch?v=YertdWavZdk
[cli][error] Unable to open URL: https://youtube.com/get_video_info (404 Client Error: Not Found for url: https://www.youtube.com/get_video_info?video_id=YertdWavZdk&html5=1&el=detailpage)

However, what we are seeing is similar to what we couldn't narrow down when this occurred a month ago. If you wait 3-5 minutes, or run it from different machines, the same call can either work or report an error.

From the same machine:

ubuntu@ip-XXX-XXX-XXX-XXX:~$ streamlink https://www.youtube.com/watch?v=YertdWavZdk
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/watch?v=YertdWavZdk
[cli][error] Unable to open URL: https://youtube.com/get_video_info (404 Client Error: Not Found for url: https://www.youtube.com/get_video_info?video_id=YertdWavZdk&html5=1&el=detailpage)
[Note: Paused 10 minutes manually and tried again]
ubuntu@ip-XXX-XXX-XXX-XXX:~$ streamlink https://www.youtube.com/watch?v=YertdWavZdk
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/watch?v=YertdWavZdk
Available streams: audio_mp4a, audio_opus, 144p (worst), 240p, 360p, 480p (best)

If you can test from machines with different IP addresses, run the same plain command several times simultaneously, and you'll see about half succeed and half return the 404.

I'm not sure if I'm missing something regarding retries or other recommended fixes (e.g. cookies), but regardless of how complex or simple the query is, we see this intermittent rather than consistent behavior.
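Since the same call often succeeds a few minutes later, the obvious client-side mitigation is a delayed retry. A minimal sketch, assuming a hypothetical `fetch` callable that raises a hypothetical `NotFound` exception on a 404; neither is part of Streamlink's API, and a retry only papers over the problem rather than fixing it.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 raised by the fetching code."""


def retry_on_404(
    fetch: Callable[[], T],
    attempts: int = 3,
    delay: float = 180.0,  # seconds; the lower end of the 3-5 minute window reported above
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), waiting `delay` seconds and retrying whenever it raises NotFound.

    The last NotFound is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except NotFound:
            if attempt == attempts:
                raise
            sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting.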

Thinking out loud / speculation: a partial rollout of something new in the YouTube cluster, or a change being tested. I just plain don't know. It doesn't look like the plugin is at fault, since the same call works some of the time, so is this some new behavior (or bug) on YouTube's side?

Reproduction steps / Explicit stream URLs to test

  1. Run this command on multiple machines (or wait a few minutes between calls; either works): streamlink https://www.youtube.com/watch?v=YertdWavZdk. Note that you can use any YouTube URL that was originally a live stream, whether it's not yet live, currently live, or already finished. I have not tested with videos that were only ever pre-recorded.
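To put a number on the "about half fail" observation, the reproduction step can be scripted. A minimal sketch, assuming `streamlink` is on `PATH` and that a successful run prints the `Available streams:` line seen in the logs; running the attempts concurrently from one host only approximates testing from several IP addresses. It avoids `capture_output` so it also runs on the Python 3.6 used here.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

URL = "https://www.youtube.com/watch?v=YertdWavZdk"


def run_streamlink(url: str = URL) -> bool:
    """Run the plain CLI command once; True if it listed the available streams."""
    proc = subprocess.run(
        ["streamlink", url],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # Python 3.6 compatible spelling of text=True
    )
    return "Available streams:" in proc.stdout


def tally(attempt: Callable[[], bool], runs: int = 6) -> Dict[str, int]:
    """Run `attempt` `runs` times concurrently and count successes and failures."""
    with ThreadPoolExecutor(max_workers=runs) as pool:
        results = list(pool.map(lambda _: attempt(), range(runs)))
    return {"ok": results.count(True), "failed": results.count(False)}
```

Usage: `print(tally(run_streamlink))`. Passing the attempt as a callable keeps the counting logic separate from the subprocess call, so it can be checked with a fake.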

Log output

When the command works / doesn't 404:

ubuntu@ip-XXX-XXX-XXX-XXX:~$ streamlink --loglevel debug https://www.youtube.com/watch?v=YertdWavZdk
[cli][debug] OS:         Linux-5.4.0-1049-aws-x86_64-with-Ubuntu-18.04-bionic
[cli][debug] Python:     3.6.9
[cli][debug] Streamlink: 2.1.2
[cli][debug] Requests(2.22.0), Socks(1.6.5), Websocket(0.58.0)
[cli][debug] Arguments:
[cli][debug]  url=https://www.youtube.com/watch?v=YertdWavZdk
[cli][debug]  --loglevel=debug
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/watch?v=YertdWavZdk
[plugins.youtube][debug] Video ID from URL
[plugins.youtube][debug] Using video ID: YertdWavZdk
[plugins.youtube][debug] get_video_info - 1: Found data
[plugins.youtube][debug] MuxedStream: v 135 a 251 = 480p
[plugins.youtube][debug] MuxedStream: v 133 a 251 = 240p
[plugins.youtube][debug] MuxedStream: v 160 a 251 = 144p
Available streams: audio_mp4a, audio_opus, 144p (worst), 240p, 360p, 480p (best)

When the command 404s:

ubuntu@ip-XXX-XXX-XXX-XXX:~$ streamlink --loglevel debug https://www.youtube.com/watch?v=YertdWavZdk
[cli][debug] OS:         Linux-5.4.0-1049-aws-x86_64-with-Ubuntu-18.04-bionic
[cli][debug] Python:     3.6.9
[cli][debug] Streamlink: 2.1.2
[cli][debug] Requests(2.22.0), Socks(1.6.5), Websocket(0.58.0)
[cli][debug] Arguments:
[cli][debug]  url=https://www.youtube.com/watch?v=YertdWavZdk
[cli][debug]  --loglevel=debug
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/watch?v=YertdWavZdk
[plugins.youtube][debug] Video ID from URL
[plugins.youtube][debug] Using video ID: YertdWavZdk
error: Unable to open URL: https://youtube.com/get_video_info (404 Client Error: Not Found for url: https://www.youtube.com/get_video_info?video_id=YertdWavZdk&html5=1&el=detailpage)

Note the two calls above were run on the same machine ~3 minutes apart.

Additional comments, etc.

I've seen and tried the other suggestions: manually adding a cookie, playing with retries. As best I can tell, this is something on YouTube's end that has been appearing intermittently for at least 2 months and, in our daily use, has never affected 100% of our calls, which are all structured identically apart from the YouTube link. Either everything works fine, or half the YouTube calls fail when this appears. Never 100%. Going through our internal Jira, it has always been intermittent.


@frisch1 frisch1 added the plugin issue A Plugin does not work correctly label Jun 16, 2021
@frisch1
Author

frisch1 commented Jun 16, 2021

Apologies, one key piece: this is NOT a 429 or similar block. We've seen this on the first call to a URL. Also, we can make an identical call to the same URL via ffmpeg or youtube-dl within seconds, and neither reports a 429, block, or other countermeasure. It's not clear whether one or both of them compensate for this behind the scenes, but I can confirm that every time we have seen this intermittent 404, running the exact same YouTube URL on the same machine via youtube-dl or ffmpeg, or even manually scraping a stream payload, has not hit any block or restriction.

@bastimeyer
Member

@back-to Could the private /get_video_info API call be skipped, maybe? If I'm not mistaken, the necessary data should already be available in the ytInitialData JSON object, which is embedded on each video page (I think, not sure). Currently, the plugin only requests self.url and reads ytInitialData (after returning from the consent dialog) if it can't figure out the video ID from the input URL. That would need to change if we want to avoid the private API call.
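The first decision described here, whether the video ID can be taken straight from the input URL, might look roughly like the sketch below. It is a hypothetical helper, not the plugin's actual code. It assumes 11-character IDs and only covers `watch?v=`, `/embed/` and `youtu.be` URLs, returning `None` (meaning the page has to be fetched) for anything else, such as channel `/live` URLs.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed video ID format: 11 characters from A-Z, a-z, 0-9, "_" and "-".
_VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def video_id_from_url(url: str) -> Optional[str]:
    """Return the video ID if it can be read from the URL alone, else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/")
    elif parsed.path == "/watch":
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    elif parsed.path.startswith("/embed/"):
        candidate = parsed.path[len("/embed/"):]
    else:
        # e.g. /channel/<id>/live: the video ID is only known after loading the page
        return None
    return candidate if _VIDEO_ID_RE.fullmatch(candidate) else None
```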

@frisch1
Author

frisch1 commented Jun 17, 2021

Apologies, I should have checked the other repos first. 18 days ago, youtube-dl's YouTube extractor (youtube.py) was updated with the following commit message: "[youtube] Fix get_video_info request (closes #29086, closes #29165)".

This seems to be related to age gating: the extractor now checks the get_video_info response for the sign-in or age-gate message and falls back to extracting the data from https://youtube.googleapis.com/v/ (the function starts around line 1475; direct link: https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/youtube.py#L1475).

I'd fix and commit this myself if I could, but I'm afraid I'm a PHP person and barely competent at writing Python (reading I can fake, so the reference on line 1476 appears to be how they resolved it, but I could be way off...).

Does this look right?

@bastimeyer
Member

I've been working on a complete rewrite of the YouTube plugin over the past couple of hours and will submit a PR soon, maybe later today. I will have to split it into multiple commits for review, but I can share a branch on my fork with my current results, so you can test it.

As said, the private API calls seem to be unnecessary as it looks like the data is always embedded in the video page via ytInitialPlayerResponse (I was looking at the wrong variable name earlier).
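Reading the embedded object instead of calling `/get_video_info` amounts to locating the assignment in the page source and decoding the JSON that follows it. A minimal sketch, assuming the page contains a literal `var ytInitialPlayerResponse = {...};` assignment; that markup is undocumented and may change, and this is not the code from the rewrite branch. `json.JSONDecoder.raw_decode` is used for the object body instead of a regex because it stops exactly at the end of the first JSON value, so a `};` inside a string value cannot cut it short.

```python
import json
import re
from typing import Any, Dict, Optional

# Assumed shape of the embedded assignment; the JSON object starts right after "=".
_ASSIGNMENT_RE = re.compile(r"\bvar\s+ytInitialPlayerResponse\s*=\s*")


def extract_player_response(page_html: str) -> Optional[Dict[str, Any]]:
    """Return the decoded ytInitialPlayerResponse object, or None if absent/invalid."""
    match = _ASSIGNMENT_RE.search(page_html)
    if not match:
        return None
    try:
        # raw_decode parses one JSON value from the start of the string and
        # ignores the trailing ";</script>..." that follows it.
        data, _end = json.JSONDecoder().raw_decode(page_html[match.end():])
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    return data if isinstance(data, dict) else None
```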

To be honest, I haven't really touched the YouTube plugin since I started working on Streamlink, so I don't know what the edge cases are here, as I'm also unfamiliar with YouTube's API and their player data/code. But so far I haven't found any video/stream types that didn't work with my rewrite, apart from protected streams of course, which are currently not supported either, and I can't change that.

@bastimeyer
Member

You can check my rewrite here:
master...bastimeyer:plugins/youtube/rewrite

Only use this for testing, as it's the first iteration and will probably change again. I will submit a pull request once I've rebased my changes, but that will be later today. I've linked the changes via the branch name and not the commit ID, so I don't have to update my post when something changes.

Install via pip (install it in a virtualenv)
https://streamlink.github.io/latest/install.html#pypi-package-and-source-code

$ pip install -U git+https://github.com/bastimeyer/streamlink.git@plugins/youtube/rewrite

Or sideload from here (use Streamlink's latest development version from the master branch)
https://streamlink.github.io/latest/cli.html#sideloading-plugins

https://raw.githubusercontent.com/bastimeyer/streamlink/plugins/youtube/rewrite/src/streamlink/plugins/youtube.py

@back-to
Collaborator

back-to commented Jun 17, 2021

@bastimeyer

streamlink "https://www.youtube.com/channel/UCM39V4aT21lAebPlJToSN2Q/live"

status needs an update

https://github.com/bastimeyer/streamlink/blob/1d1fc45703fc30173d1ed62eef1a2415d386507c/src/streamlink/plugins/youtube.py#L120-L123

error: Unable to validate result: Unable to validate key 'playabilityStatus': Unable to validate key 'status': 'LIVE_STREAM_OFFLINE' does not equal 'OK' or Unable to validate key 'status': 'LIVE_STREAM_OFFLINE' does not equal 'ERROR' or Unable to validate key 'status': 'LIVE_STREAM_OFFLINE' does not equal ''

https://github.com/bastimeyer/streamlink/blob/1d1fc45703fc30173d1ed62eef1a2415d386507c/src/streamlink/plugins/youtube.py#L244

Also, `if status != "OK":` might be better.
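The validation error above comes from enumerating the accepted `status` strings, so any new value such as `LIVE_STREAM_OFFLINE` fails. The suggested `status != "OK"` check can be sketched in plain Python (not Streamlink's `validate` module) as below; the optional `reason` field is an assumption about the response, not something shown in this thread.

```python
from typing import Any, Dict, Optional


def playability_error(player_response: Dict[str, Any]) -> Optional[str]:
    """Return None if playable, else a short description of why not.

    Only "OK" counts as playable; every other status, including values not
    seen before, is reported instead of failing validation.
    """
    playability = player_response.get("playabilityStatus") or {}
    status = playability.get("status") or "UNKNOWN"
    if status == "OK":
        return None
    reason = playability.get("reason")  # assumed optional human-readable text
    return f"{status}: {reason}" if reason else status
```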


URLs like https://www.youtube.com/c/euronews are currently not supported; not sure if you want to drop them.

@bastimeyer
Member

bastimeyer commented Jun 17, 2021

status needs an update

Thanks, done.

Also improved the embedded URL handling and fixed the consent dialog data, which was HTML-escaped and could redirect to invalid URLs when more than one query parameter was set (&).
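The consent-dialog bug is a general pitfall: URLs scraped from HTML attributes carry entity-encoded separators (`&amp;`), which must be decoded before the URL is followed. A minimal sketch using only the standard library; the example URL is made up for illustration and is not YouTube's actual consent URL.

```python
import html
from typing import Dict, List
from urllib.parse import parse_qs, urlsplit


def url_from_markup(raw: str) -> str:
    """Decode HTML entities such as &amp; in a URL taken from page markup."""
    return html.unescape(raw)


def query_params(raw: str) -> Dict[str, List[str]]:
    """Parse the query string of a markup URL after unescaping it."""
    return parse_qs(urlsplit(url_from_markup(raw)).query)
```

For example, `query_params("https://www.youtube.com/watch?v=YertdWavZdk&amp;t=10")` yields `{"v": ["YertdWavZdk"], "t": ["10"]}`. Without the unescape step, a server splitting the query on `&` would see a parameter named `amp;t` instead of `t`.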

Diff here:
https://github.com/bastimeyer/streamlink/compare/1d1fc45..3d2848ae

https://www.youtube.com/c/euronews

Looks like this could be dropped. It's not a video URL and if a channel is live, their live stream can be accessed via /live at the end of the URL, eg. https://www.youtube.com/c/euronews/live.
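For users who relied on bare channel URLs, the suggested workaround is to append `/live` before handing the URL to Streamlink. A hypothetical helper, covering only the `/c/<name>` and `/channel/<id>` shapes that appear in this thread:

```python
import re
from typing import Optional

# Only the two channel URL shapes seen in this thread; other forms are not handled.
_CHANNEL_RE = re.compile(
    r"^https?://(?:www\.)?youtube\.com/(?P<kind>c|channel)/(?P<name>[^/?#]+)"
)


def channel_live_url(url: str) -> Optional[str]:
    """Map a channel URL to its /live URL, or return None for non-channel URLs."""
    match = _CHANNEL_RE.match(url)
    if not match:
        return None
    return "https://www.youtube.com/{kind}/{name}/live".format(**match.groupdict())
```

URLs that already end in `/live` map to themselves, so the helper is safe to apply unconditionally to channel URLs.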

I'll take a look at this later and then submit a PR.

@frisch1
Author

frisch1 commented Jun 17, 2021

Thank you, acknowledging receipt. We capture about 150 live events per day (government press conferences and hearings... very exciting!). We're firing up a parallel process running the new branch to test not-yet-live, currently live, and finished or pre-recorded streams. Thank you!

@bastimeyer
Member

btw, PR here: #3797

@sou7611

sou7611 commented Jun 18, 2021

Is this going into a release anytime soon? I'd rather not have to wrestle with ripping out an installed version and replacing it with a development one if I don't have to.

YT is completely dead for me with Streamlink. It won't parse a single URL.

The issue described here is exactly what I am experiencing.

@bastimeyer
Member

My pull request was only submitted 7 hours ago. Not sure what you're expecting here...
The changes first have to be reviewed before they can be merged into master, and after that there will be a 2.2.0 release. What I shared in this thread before submitting the PR was meant to get early feedback, as it's a complete rewrite of the plugin and not just a small bugfix. It has the potential to break certain video types that were working before, and since no official YT API is being used, it needs to be validated first.

@sou7611

sou7611 commented Jun 18, 2021 via email

@bastimeyer
Member

Already explained here:
#3795 (comment)
As well as in the docs:
https://streamlink.github.io/latest/install.html#pypi-package-and-source-code

4 participants