Fix #1060 #1067

Merged
merged 29 commits into from Aug 3, 2021

Zeecka
Contributor

@Zeecka Zeecka commented Jul 23, 2021

Here is a proposition for #1060 . The commit name says "#1342" because I was thinking about spotDL/spotify-downloader#1342 (which is the same issue) while fixing the bug.

@glubsy
Contributor

glubsy commented Jul 23, 2021

Why do you change "player_response" to something else? Is this really necessary? This will break downstream projects.

@Zeecka
Contributor Author

Zeecka commented Jul 23, 2021

I did so because the old endpoint doesn't work anymore, and the JSON output of the new endpoint uses a new key name: responseContext instead of player_response. The fix didn't work without renaming it (you can try it).
I think it would be better if someone with a better understanding of the codebase edited the fix as needed before merging. I did not feel confident putting the web request inside __main__.py. [edit] Code looks better now.

Quick check:

Old field player_response

curl "https://www.youtube.com/youtubei/v1/player" -H 'X-Goog-Api-Key: AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8' -H "Content-Type: application/json" -X POST -d '{"context":{"client":{"clientName":"ANDROID","clientVersion":"16.05"}},"videoId":"9bZkp7q19f0"}' | jq .player_response

New field responseContext

curl "https://www.youtube.com/youtubei/v1/player" -H 'X-Goog-Api-Key: AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8' -H "Content-Type: application/json" -X POST -d '{"context":{"client":{"clientName":"ANDROID","clientVersion":"16.05"}},"videoId":"9bZkp7q19f0"}' | jq .responseContext
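
For anyone who prefers to run the same check from Python, here is a rough standard-library equivalent of the two curl commands. It is only a sketch: the URL, API key, and client fields are copied from the commands above, while the function names build_player_request and top_level_keys are made up for illustration and are not part of pytube.

```python
import json
import urllib.request

# Values copied from the curl commands above.
PLAYER_URL = "https://www.youtube.com/youtubei/v1/player"
API_KEY = "AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8"


def build_player_request(video_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request used in the curl commands."""
    payload = {
        "context": {"client": {"clientName": "ANDROID", "clientVersion": "16.05"}},
        "videoId": video_id,
    }
    return urllib.request.Request(
        PLAYER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Goog-Api-Key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )


def top_level_keys(video_id: str) -> list:
    """Send the request and list the top-level keys of the JSON reply.

    This hits the network, so the result depends on what the endpoint
    returns at the time of the call.
    """
    with urllib.request.urlopen(build_player_request(video_id), timeout=10) as resp:
        return sorted(json.load(resp))
```

Calling top_level_keys("9bZkp7q19f0") should make it easy to see whether player_response or responseContext is present, which is what the jq filters above are checking.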

@Zeecka
Contributor Author

Zeecka commented Jul 23, 2021

It turns out that the local InnerTube.player() method already wraps this endpoint with predefined parameters, so I removed the custom code.

@Zeecka
Contributor Author

Zeecka commented Jul 23, 2021

@glubsy I got your point. I decided to alter the new endpoint's output to ensure backward compatibility.

@TeddyKahwaji TeddyKahwaji left a comment

Lgtm

@tfdahlin
Collaborator

I've been wanting to swap over to using innertube exclusively for months now (see: #1022, #994), however I keep running into the same issue which is why I haven't made the jump yet, and that's still applicable here.

Innertube can't seem to bypass any age-restrictions on videos. If you test this code against this video, you'll see the problem

@tfdahlin
Collaborator

I made one minor change to your code, and added some stuff to innertube. Doesn't seem to fix the problem for age-gated videos though.

@tfdahlin tfdahlin linked an issue Jul 24, 2021 that may be closed by this pull request
…cess_token as Bearer Authorization header. Need to investigate more.
@tfdahlin
Collaborator

I need to sleep -- the oauth flow almost works. It seems to correctly get an access_token and refresh_token, but the requests are failing with a 400 error when I try to use the token in the Authorization header. If somebody can get that working, I think this may be the best path forward.

I've been using video id DkffTDkSADI to test, with code something like this:

from pytube.innertube import InnerTube

video_id = 'DkffTDkSADI'
i = InnerTube(use_oauth=True, allow_cache=True)
i.player(video_id)

@tfdahlin tfdahlin linked an issue Jul 24, 2021 that may be closed by this pull request
@Zeecka
Contributor Author

Zeecka commented Jul 24, 2021

It looks like other projects have the same issue (see ytdl-org/youtube-dl#29086) and haven't found a workaround yet. Nice work on the OAuth. The 400 error says:

{
  "error": {
    "code": 400,
    "message": "The API Key and the authentication credential are from different projects.",
    "errors": [
      {
        "message": "The API Key and the authentication credential are from different projects.",
        "domain": "global",
        "reason": "badRequest"
      }
    ],
    "status": "INVALID_ARGUMENT"
  }
}
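
In case it helps with debugging, here is a small sketch for turning an error body like the one above into a one-line summary. The helper name describe_api_error is hypothetical; the field names (code, status, message, errors, reason) are taken from the JSON shown above.

```python
import json


def describe_api_error(body: str) -> str:
    """Summarise a JSON error body shaped like the 400 response above."""
    error = json.loads(body).get("error", {})
    # Collect the machine-readable reasons, if any were provided.
    reasons = [item.get("reason", "unknown") for item in error.get("errors", [])]
    summary = "{} {}: {}".format(
        error.get("code"), error.get("status"), error.get("message")
    )
    if reasons:
        summary += " (reason: {})".format(", ".join(reasons))
    return summary
```

With urllib, the body can be read from the exception itself, for example describe_api_error(err.read().decode("utf-8")) inside an except urllib.error.HTTPError as err block.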

Note that your OAuth implementation works with a non-restricted video. I don't think there is a "good" workaround to bypass that. We can either ask the user to provide an account with an API key, or register an account for pytube that might get banned.

Since the whole project is broken for now, maybe you should release the fix (without OAuth?) and open a new issue for age-restricted videos?

@tfdahlin
Collaborator

tfdahlin commented Jul 24, 2021

Ty for providing the output of that 400 error, it was getting too late for me to investigate further last night, but it made it much easier to debug this morning. I got OAuth working, just need to incorporate it into the main part of the library now.

@tfdahlin
Collaborator

I'm working on carving out some of the old code now that we're accessing very differently-formatted data; it's being finicky, though. Hopefully I will have an update this evening.

@tfdahlin
Collaborator

Running into a 403 error on age-restricted videos that still needs solving. Normal videos seem to work though currently.

@Zeecka
Contributor Author

Zeecka commented Jul 25, 2021

Your fix works on my computer. However, I encounter a 403 error for videos that have no age restriction: _Tr1lrd4co0, dX2497jB-Ag, ZZxOdjay8Kg, GopLL_S8wzI. These videos are all longer than 20 minutes, yet an 8-hour video such as RDfjXj5EGqI works. Here is my code:

from pytube import YouTube
yt = YouTube('http://youtube.com/watch?v=_Tr1lrd4co0')
yt.streams.filter(progressive=True, file_extension='mp4').first().download()

Error:

Traceback (most recent call last):
  File "/tmp/pytube/test.py", line 3, in <module>
    yt.streams.filter(progressive=True, file_extension='mp4').first().download()
  File "/tmp/pytube/pytube/streams.py", line 258, in download
    for chunk in request.stream(
  File "/tmp/pytube/pytube/request.py", line 157, in stream
    response = _execute_request(
  File "/tmp/pytube/pytube/request.py", line 37, in _execute_request
    return urlopen(request, timeout=timeout)  # nosec
  File "/usr/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/usr/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

@tfdahlin tfdahlin linked an issue Jul 28, 2021 that may be closed by this pull request
@tfdahlin tfdahlin linked an issue Jul 28, 2021 that may be closed by this pull request
@tfdahlin
Collaborator

tfdahlin commented Jul 28, 2021

I've set this branch to default to the android client, in spite of the fact that we do lose some stream types as a result. Worth testing to see if there are any further 403 errors, but in my very brief testing I couldn't produce any. I've also bundled some miscellaneous bugfixes for other issues and linked them into this.

W/r/t signature calculation, I'm seeing something a little unusual in how signatures are being calculated:

Before: AAOSAOq0QJ8wRQIhAMy92pITfVpcKrf_vKGx9_1zKSvSYY1Aq3ZRcevK05EFAiBS-n0vzH2JId4lARhvbaNFL_aa7I7iJEVp9hym1HpE-Q==
After:  V=Q-EpH1myh9pFEJi7I7aa_L=NabvhRAl4dIJ2Hzv0n-SBiAFE50KvecRZ3qA1YYSvSKz1_9xGKv_frKcpVfTIp29yMAhIQRw8JQ0qOASOAA

It looks like the calculation is incomplete, and I'm unsure why that is. When looking at dev tools in Chrome, the signature for the video typically resembles a b64-encoded string, with the == at the end as padding. Our calculations seem to have some equal signs in the middle, which shouldn't really happen. I'll have to look into what's contained in the data, maybe we can calculate this in a different way.
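
To make the "equal signs in the middle" observation easy to check, here is a minimal sketch of a shape test. It assumes the signature should look like URL-safe base64 (letters, digits, "-" and "_") followed by at most two "=" of padding; that is my reading of the Before value above, not something taken from pytube's cipher code, and the helper name has_only_trailing_padding is made up.

```python
import re

# URL-safe base64 alphabet, followed by at most two '=' used as padding.
_B64_SHAPE = re.compile(r"[A-Za-z0-9_-]+={0,2}")


def has_only_trailing_padding(signature: str) -> bool:
    """Return True if '=' appears only as trailing padding (or not at all)."""
    return _B64_SHAPE.fullmatch(signature) is not None
```

Run against the two values above, the Before string passes this check and the After string does not, because of the "=" characters near its start and in its middle.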

@MinePlayersPE

MinePlayersPE commented Jul 29, 2021

Hopefully not off-topic, since I saw that age-gating is a problem in this PR I would like to point to the new bypass method: yt-dlp/yt-dlp#574 (comment) which should have the same success rate as the old method

@Zeecka
Contributor Author

Zeecka commented Jul 29, 2021

Worth testing to see if there are any further 403 errors, but in my very brief testing I couldn't produce any.

Same for me, nice job 👍🏻 .

Do you have a sample video where the signature is not already provided?

@tfdahlin
Collaborator

Hopefully not off-topic, since I saw that age-gating is a problem in this PR I would like to point to the new bypass method: yt-dlp/yt-dlp#574 (comment) which should have the same success rate as the old method

Definitely not off-topic, thanks for the tip! I've incorporated that bypass into this patch. Pytube will default to the normal android client. If it fails to retrieve the stream data, it will attempt to fall back to using that bypass instead. The videos that I've been testing against seem to work correctly for me, but would appreciate some additional testing.

I'm going to note once again so it doesn't get lost:
This patch defaults to using the ANDROID client because the signatureCipher isn't being calculated correctly. Because it's using the ANDROID client, some stream types are not available that are available when using the WEB client. A notable stream type that I saw was missing on the video I tested against was itag 22, which is the 720p progressive stream (typically the highest quality progressive stream available on a video). Ideally, the cipher calculation should be fixed so we can default to the WEB client again, but I think that should be pushed into a different issue/PR.
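
For testers who want a mental model of the fallback described above, here is a simplified sketch of the idea, not the code in this patch. The fetcher callables are hypothetical stand-ins for "query the ANDROID client" and "query the bypass client", and using the streamingData and playabilityStatus keys to decide success is an assumption about the shape of the player response.

```python
from typing import Callable, Dict, Sequence

# A fetcher takes a video id and returns the parsed player response.
PlayerFetcher = Callable[[str], Dict]


def first_playable_response(video_id: str, fetchers: Sequence[PlayerFetcher]) -> Dict:
    """Try each client in order; return the first response with stream data."""
    last: Dict = {}
    for fetch in fetchers:
        last = fetch(video_id)
        # Assumption: a usable response carries a "streamingData" key.
        if "streamingData" in last:
            return last
    # Every client failed: report the last playability status that was seen.
    status = last.get("playabilityStatus", {}).get("status", "UNKNOWN")
    raise RuntimeError(f"no stream data for {video_id}: {status}")
```

The order of the fetchers encodes the preference, so putting the normal client first keeps the bypass as a last resort.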


# If we still can't access the video, raise an exception
# (tier 3 age restriction)
if playability_status == 'LOGIN_REQUIRED':

Note: Tier 3 videos will return an UNPLAYABLE status on embed clients instead (as they're likely just videos with embed disabled by creator)

Collaborator

Fixed.

@GPRicci

GPRicci commented Jul 31, 2021

I have been doing some silly tests today with the fixes in this branch, and I ran into an error when trying to retrieve the captions of a video:

Traceback (most recent call last):
  File "C:\Users\gaspa\pytube_tests\dl.py", line 14, in <module>
    c = yt.captions
  File "C:\Users\gaspa\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\__main__.py", line 299, in captions
    return pytube.CaptionQuery(self.caption_tracks)
  File "C:\Users\gaspa\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\__main__.py", line 291, in caption_tracks
    return [pytube.Caption(track) for track in raw_tracks]
  File "C:\Users\gaspa\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\__main__.py", line 291, in <listcomp>
    return [pytube.Caption(track) for track in raw_tracks]
  File "C:\Users\gaspa\AppData\Local\Programs\Python\Python39\lib\site-packages\pytube\captions.py", line 22, in __init__
    self.name = caption_track["name"]["simpleText"]
KeyError: 'simpleText'

I'm not sure if this error arose with this fix.

@tfdahlin tfdahlin linked an issue Jul 31, 2021 that may be closed by this pull request
@tfdahlin
Collaborator

tfdahlin commented Aug 1, 2021

Looking into a fix for this now. I think sometimes caption details return in two different formats, not sure what causes it.
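
A defensive lookup along these lines might cover both shapes. This is a sketch, not the committed fix: the simpleText key comes from the traceback above, while the alternative {"runs": [{"text": ...}]} shape is an assumption about the second format, and caption_track_name is a made-up helper name.

```python
def caption_track_name(caption_track: dict) -> str:
    """Read a caption track's display name from either assumed shape."""
    name = caption_track["name"]
    if "simpleText" in name:
        return name["simpleText"]
    # Assumed alternative shape: {"runs": [{"text": "..."}, ...]}
    return "".join(run.get("text", "") for run in name.get("runs", []))
```

Falling back to an empty string when neither key is present avoids the KeyError, at the cost of hiding a third, unknown format if one exists.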

@tfdahlin tfdahlin linked an issue Aug 1, 2021 that may be closed by this pull request
@GPRicci

GPRicci commented Aug 1, 2021

With your commit, I can now download XML captions; however, conversion to SRT fails. From exploring the XML files and reading the xml_caption_to_srt function, I think the problem is that YouTube changed the format.
Here is an example:

<?xml version="1.0" encoding="utf-8" ?><timedtext format="3">
<body>
<p t="2254" d="2872">FRIDAY</p>
<p t="33434" d="1193">IT IS TIME FOR THE BOYS</p>
</body>
</timedtext>

So now, the root element timedtext contains just one child named body. So the iteration should be done for the elements of body and not root. Also, the caption elements have different attribute names: t instead of start and d instead of dur for duration.

@tfdahlin tfdahlin linked an issue Aug 3, 2021 that may be closed by this pull request
@tfdahlin
Collaborator

tfdahlin commented Aug 3, 2021

It looks like this is inconsistent between auto-generated and user-provided captions. I'll work on a fix for captions as a separate PR, rather than trying to fix that in this one.

@tfdahlin
Collaborator

tfdahlin commented Aug 3, 2021

Now that tests have been fixed, I'm going to merge this and get pypi updated.

@tfdahlin tfdahlin merged commit fc9aec5 into pytube:master Aug 3, 2021
@tfdahlin
Collaborator

tfdahlin commented Aug 3, 2021

Pypi should be updated. python -m pip install --upgrade pytube

JNYH added a commit to JNYH/pytube that referenced this pull request May 16, 2022
7 participants