
Embed playback issue: when embedded playback is blocked, pytube detects the video as age-restricted #1621

Open: wants to merge 1 commit into base: master


piltom commented May 11, 2023

Hello. This morning, while running some test downloads, I ran into an issue with the following video: https://www.youtube.com/watch?v=e8RcQoGY4OE

Pytube detected it as age-restricted even though it is not. After some digging, I found this was because the video has embedded playback blocked.

The solution in this PR checks whether the unplayability reason is present in a newly introduced map and, if so, looks up which player to use to work around it.

This solution could be problematic in the future for two reasons:

  • Are the unplayability reasons always in English?
  • How often does YouTube change the text? If it changes often, we may need to interpret the reason by searching for keywords in it (like "embed" or "website").
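A minimal Python sketch of the keyword-based matching suggested in the second point. The category names and keyword lists are hypothetical (the keywords come from reasons quoted in this thread), and the approach still assumes the reasons arrive in English, so it does not address the first point.

```python
from typing import Dict, Optional, Tuple

# Hypothetical keyword table (not part of pytube). Keywords are matched as
# lowercase substrings, so minor rewording of a reason can still match.
REASON_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "embed_blocked": ("embed", "website"),
    "age_restricted": ("age-restricted", "confirm your age", "inappropriate"),
    "unavailable": ("not available", "unavailable"),
}


def classify_reason(reason: str) -> Optional[str]:
    """Return the first category whose keyword occurs in `reason`, else None."""
    text = reason.lower()
    # Dict order matters: more specific categories are checked first.
    for category, keywords in REASON_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None
```

Substring matching cannot disambiguate generic reasons, though: "Video unavailable" (what WEB_EMBED returns for the embed-blocked video) lands in the generic "unavailable" bucket.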

Here are some links for you to test:

  1. A video with embedded playback blocked, which should now download correctly (previously it would throw an age restriction exception): https://www.youtube.com/watch?v=e8RcQoGY4OE
  2. An age-restricted Eminem song, which should still correctly throw the exception: https://www.youtube.com/watch?v=mQvteoFiMlg
  3. Any other YouTube video, which should still work as before.

piltom (Author) commented May 11, 2023

Here is some minimal code to reproduce the issue before the patch:

from pytube import YouTube as PyYouTube

# Video with embedded playback blocked: before the patch this raises
# AgeRestrictedError even though the video is not age-restricted
yt = PyYouTube("https://www.youtube.com/watch?v=e8RcQoGY4OE")
yt.check_availability()
stream = yt.streams.first()
filename = stream.download("./")

# Normal video: should download as usual
yt = PyYouTube("https://www.youtube.com/watch?v=wMRbSKjEtrc")
yt.check_availability()
stream = yt.streams.first()
filename = stream.download("./")

# Age-restricted video: expected to raise AgeRestrictedError
yt = PyYouTube("https://www.youtube.com/watch?v=mQvteoFiMlg")
yt.check_availability()
stream = yt.streams.first()
filename = stream.download("./")

pbxforce commented May 13, 2023

I tried this commit. It worked for some videos but not all: I still get an age restriction error for videos that are not restricted. Check this URL: "https://www.youtube.com/shorts/DYVUwiB6fLM"
and this one: "https://www.youtube.com/watch?v=QBvzEcXSUjg"

Neither URL is age-restricted or embed-blocked, but both still raise AgeRestrictedError.

piltom (Author) commented May 13, 2023


Hey, there is also PR #1619, which seems to change how data is fetched at the start. I haven't tried it, but you can give it a shot.

My solution targeted this particular case of embed blocking, but it seems there are more cases where this happens, so it should be generalized (I'm not particularly happy with how I structured this solution).

pbxforce commented

I'll give #1619 a try. BTW, your commit does work for embed-blocked videos: I tried it on several more videos with embedding blocked, and it gets past that embedding gate.

piltom (Author) commented May 13, 2023

@pbxforce I have updated the code, and now both of your links work. It is also a lot faster than the other PR mentioned previously.

What I added is a mapping from each unplayability reason to the player to use to bypass it. This way, if you run into new problems, you can update the map (see the map in reasons.py). You can see the reason by adding a print statement for vid_info in the updated code.
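Instead of a print statement, the reason can also be read directly from the player response. A minimal sketch, assuming the response is a dict whose `playabilityStatus` object holds `status` and, for failures, `reason` (the shape the per-client comparison later in this thread reflects); `playability_of` is a hypothetical helper, not part of pytube.

```python
from typing import Any, Dict, Tuple


def playability_of(vid_info: Dict[str, Any]) -> Tuple[str, str]:
    """Return (status, reason) from an InnerTube player response.

    Missing keys fall back to empty strings, so OK responses (which may
    carry no reason at all) are handled without a KeyError.
    """
    status_block = vid_info.get("playabilityStatus", {})
    return status_block.get("status", ""), status_block.get("reason", "")
```

For example, `print(playability_of(yt.vid_info))`, assuming `yt.vid_info` returns the raw InnerTube player response.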

piltom (Author) commented May 13, 2023

The mapping for now looks like this:

PLAYER_FOR_REASON = {
  'Playback on other websites has been disabled by the video owner': 'ANDROID',
  'This video is not available': 'ANDROID',
  'This video is unavailable': 'ANDROID',
}

This makes it look redundant, since all the values are ANDROID. It could come in handy if there are other reasons that ANDROID does not bypass, but I don't have enough experience with YouTube to tell.

This could also eventually be updated to have a list of players for each key, so that the code can try all of them until one succeeds.
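A sketch of the list-of-players idea, assuming a caller-supplied `fetch(client)` function that returns an InnerTube player response. `PLAYERS_FOR_REASON` and `first_playable` are hypothetical names, and `IOS` appears only as an illustrative second candidate, not a tested fallback.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical list-valued variant of PLAYER_FOR_REASON; candidates are
# tried in order. "IOS" is illustrative only.
PLAYERS_FOR_REASON: Dict[str, List[str]] = {
    "Playback on other websites has been disabled by the video owner": ["ANDROID", "IOS"],
    "This video is not available": ["ANDROID"],
    "This video is unavailable": ["ANDROID"],
}


def first_playable(
    reason: str,
    fetch: Callable[[str], Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Try each candidate client for `reason`; return the first OK response.

    Returns None when the reason is unknown or no candidate yields 'OK'.
    """
    for client in PLAYERS_FOR_REASON.get(reason, []):
        response = fetch(client)
        if response.get("playabilityStatus", {}).get("status") == "OK":
            return response
    return None
```

Injecting `fetch` keeps the loop testable without network access; in pytube it could wrap something like `InnerTube(client=client).player(video_id)`.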

pbxforce commented

This is good. I had partially fixed it by changing client='ANDROID_EMBED' to client='ANDROID', but that did not fix everything. Your approach is much better, and exposing the playabilityStatus reason is useful for diagnosing future errors. Thank you!

pbxforce left a comment

This PR fixes the issue from #1620 for me: AgeRestrictedError being raised for videos that are not age-restricted, including videos with embedding disabled (which also raised AgeRestrictedError).

piltom (Author) commented May 15, 2023

@nficano / @glubsy / @tfdahlin / @RONNCC, could I get a review or feedback on this?

Shouldn't the age restriction reason be moved into the map too?

If ANDROID fixes every reason, we probably don't even need a map. It would also be nice to include the unplayability reason in the exception, wouldn't you agree?
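A minimal sketch of an exception that carries the unplayability reason. `UnplayableVideoError` is a hypothetical name; inside pytube it would presumably subclass one of the existing exceptions (such as `exceptions.VideoUnavailable`) rather than plain `Exception`.

```python
class UnplayableVideoError(Exception):
    """Hypothetical exception carrying YouTube's playability details."""

    def __init__(self, video_id: str, status: str, reason: str) -> None:
        self.video_id = video_id
        self.status = status  # e.g. 'UNPLAYABLE' or 'LOGIN_REQUIRED'
        self.reason = reason  # the text YouTube returned, verbatim
        super().__init__(f"{video_id} is unplayable ({status}): {reason}")
```

Callers could then report `err.reason` directly instead of inferring the cause from the exception type.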

felipeucelli (Contributor) commented

I tested the code, and it does retrieve the video when embedding is blocked. However, for some reason the ANDROID client limits the validity of the URL to about 30 seconds; after that, a 403 error is returned.

We could use the IOS client, which is not throttled, but it returns fewer stream options.

I think the best option would be to make the ANDROID client work normally so that it can be used as the default in innertube.py.

Comment on lines 257 to 271
def update_vid_info_with(self, client='ANDROID'):
    """Attempt to update the vid_info by using client passed as arg."""
    innertube = InnerTube(
        client=client,
        use_oauth=self.use_oauth,
        allow_cache=self.allow_oauth_cache
    )

    innertube_response = innertube.player(self.video_id)
    playability_status = innertube_response['playabilityStatus'].get('status', None)
    # If we still can't access the video, raise an exception
    if playability_status == 'UNPLAYABLE':
        raise exceptions.VideoUnavailable(self.video_id)
    self._vid_info = innertube_response

Contributor review comment:

At first glance, I would say this entire method should be part of YouTube.vid_info; is a separate detour like this really needed? vid_info could take a default client parameter, perhaps?

Comment on lines 3 to 7
PLAYER_FOR_REASON = {
    'Playback on other websites has been disabled by the video owner': 'ANDROID',
    'This video is not available': 'ANDROID',
    'This video is unavailable': 'ANDROID',
}
Contributor review comment:

This module's name and this variable name are not very clear. I don't think it needs its own separate module.
This could be better documented with nested key/value pairs, i.e.:

FALLBACK_CLIENTS = [
    {
        'error_message': 'Playback on other websites has been disabled by the video owner',
        'client': 'ANDROID'
    },
    ...
]
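With the list-of-dicts structure suggested above, looking up a client becomes a linear scan. A minimal sketch, with entries copied from the current PLAYER_FOR_REASON map; `fallback_client_for` is a hypothetical helper name.

```python
from typing import Dict, List, Optional

FALLBACK_CLIENTS: List[Dict[str, str]] = [
    {
        "error_message": "Playback on other websites has been disabled by the video owner",
        "client": "ANDROID",
    },
    {"error_message": "This video is not available", "client": "ANDROID"},
    {"error_message": "This video is unavailable", "client": "ANDROID"},
]


def fallback_client_for(error_message: str) -> Optional[str]:
    """Return the client of the first entry whose message matches exactly."""
    for entry in FALLBACK_CLIENTS:
        if entry["error_message"] == error_message:
            return entry["client"]
    return None
```

Unlike a dict keyed by message, the list keeps entries ordered and allows the same message to appear with several clients, which would fit the try-each-client idea raised earlier.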

Contributor review comment:

Then again I don't think this should be handled here.

glubsy (Contributor) left a comment

Not sure what to think of this workaround. It might work, but forcing whatever client just because the first one did not work sounds like a poor approach.
If anything, the "client" application should handle that, so perhaps the only thing worth doing here is to raise the appropriate exception instead of trying to work around whatever data you get in the "library" part of the code.

RONNCC (Collaborator) commented May 16, 2023

Aligned with glubsy... slightly worried this could break other things 🤔

RONNCC (Collaborator) commented May 16, 2023

Would you have any insight into a generic solution across clients, @piltom?

piltom (Author) commented May 16, 2023

> Not sure what to think of this workaround. It might work, but forcing whatever client just because the first one did not work sounds like a poor approach. If anything, the "client" application should handle that, so perhaps the only thing worth doing here is to raise the appropriate exception instead of trying to work around whatever data you get in the "library" part of the code.

Alright. This approach was just a generalization of the original bypass_age_gate method, which assumed the problem was age restriction no matter what, tried ANDROID_EMBED, and finally raised an age restriction exception (incredibly confusing when that is not the actual problem).

Would you agree, then, to remove the bypass_age_gate method and instead raise an appropriate exception that contains the reason returned by YouTube? The exception could also suggest using the new client parameter for vid_info that @glubsy suggested.

> I tested the code, and it does retrieve the video when embedding is blocked. However, for some reason the ANDROID client limits the validity of the URL to about 30 seconds; after that, a 403 error is returned.

I will try to reproduce that and see if there's something I can do.

piltom (Author) commented May 16, 2023

As a matter of fact, bypass_age_gate seems to always be called in the original code, so the InnerTube client used by default is effectively ANDROID_EMBED.

glubsy (Contributor) commented May 16, 2023

> Would you agree, then, to remove the bypass_age_gate method and instead raise an appropriate exception that contains the reason returned by YouTube? The exception could also suggest using the new client parameter for vid_info that @glubsy suggested.

That sounds much more reasonable to me, although the exception should not really care about workarounds; only the calling code should then suggest what to do in that case.

> I tested the code, and it does retrieve the video when embedding is blocked. However, for some reason the ANDROID client limits the validity of the URL to about 30 seconds; after that, a 403 error is returned.

As far as I know, this has been going on since early April 2023. This PR does not aim to solve that issue.

piltom (Author) commented May 21, 2023

I'll push a cleaner version later today. In the meantime, here are the playability status details for each client; note how many of them differ in both status and reason depending on the client. It also seems that the original age bypass no longer works:

Format is client: (status, reason)

Embedded blocked (https://www.youtube.com/watch?v=e8RcQoGY4OE)
{
 'WEB': ('OK', ''),
 'ANDROID': ('OK', ''),
 'IOS': ('OK', ''),
 'WEB_EMBED': ('UNPLAYABLE', 'Video unavailable'),
 'ANDROID_EMBED': ('UNPLAYABLE', 'Playback on other websites has been disabled by the video owner'),
 'IOS_EMBED': ('OK', ''),
 'WEB_MUSIC': ('UNPLAYABLE', 'This video is not available'),
 'ANDROID_MUSIC': ('UNPLAYABLE', 'This video is not available'),
 'IOS_MUSIC': ('UNPLAYABLE', 'This video is not available'),
 'WEB_CREATOR': ('OK', ''),
 'ANDROID_CREATOR': ('OK', ''),
 'IOS_CREATOR': ('OK', ''),
 'MWEB': ('OK', ''),
 'TV_EMBED': ('UNPLAYABLE', 'Playback on other websites has been disabled by the video owner')
}

Normal video (https://www.youtube.com/watch?v=wMRbSKjEtrc)
{
 'WEB': ('OK', ''),
 'ANDROID': ('OK', ''),
 'IOS': ('OK', ''),
 'WEB_EMBED': ('OK', ''),
 'ANDROID_EMBED': ('OK', ''),
 'IOS_EMBED': ('OK', ''),
 'WEB_MUSIC': ('UNPLAYABLE', 'This video is not available'),
 'ANDROID_MUSIC': ('UNPLAYABLE', 'This video is not available'),
 'IOS_MUSIC': ('UNPLAYABLE', 'This video is not available'),
 'WEB_CREATOR': ('OK', ''),
 'ANDROID_CREATOR': ('OK', ''),
 'IOS_CREATOR': ('OK', ''),
 'MWEB': ('OK', ''),
 'TV_EMBED': ('OK', '')
}

Age restricted (https://www.youtube.com/watch?v=mQvteoFiMlg)
{
 'WEB': ('LOGIN_REQUIRED', 'Sign in to confirm your age'),
 'ANDROID': ('LOGIN_REQUIRED', 'This video may be inappropriate for some users.'),
 'IOS': ('LOGIN_REQUIRED', 'This video may be inappropriate for some users.'),
 'WEB_EMBED': ('UNPLAYABLE', 'Sorry, this content is age-restricted'),
 'ANDROID_EMBED': ('UNPLAYABLE', 'Sorry, this content is age-restricted'),
 'IOS_EMBED': ('UNPLAYABLE', 'Sorry, this content is age-restricted'),
 'WEB_MUSIC': ('LOGIN_REQUIRED', 'Sign in to confirm your age'),
 'ANDROID_MUSIC': ('LOGIN_REQUIRED', 'Sign in to confirm your age'),
 'IOS_MUSIC': ('LOGIN_REQUIRED', 'Sign in to confirm your age'),
 'WEB_CREATOR': ('LOGIN_REQUIRED', 'Sign in to confirm your age'),
 'ANDROID_CREATOR': ('LOGIN_REQUIRED', 'Sign in to confirm your age'),
 'IOS_CREATOR': ('LOGIN_REQUIRED', 'Sign in to confirm your age'),
 'MWEB': ('LOGIN_REQUIRED', 'Sign in to confirm your age'),
 'TV_EMBED': ('UNPLAYABLE', 'This video is unavailable')
}

piltom (Author) commented May 21, 2023

@glubsy @RONNCC, what do you think of the current solution? I force-pushed a squashed commit :)

The pytube YouTube constructor takes the optional InnerTube client argument (vid_info is a property getter, so it cannot take arguments). The vid_info getter then checks availability on the InnerTube response. There is another availability check in the code, but it seems to be applied only to watch_html; should it be updated or generalized?
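A sketch of this constructor-parameter design under stated assumptions: `VideoInfo` is a hypothetical stand-in for pytube's `YouTube` class, the network call is injected as `fetch(video_id, client)`, the default client is ANDROID_EMBED (matching the observation earlier in the thread that this is effectively the current default), and a plain `RuntimeError` stands in for a proper pytube exception.

```python
from typing import Any, Callable, Dict, Optional


class VideoInfo:
    """Hypothetical stand-in for pytube's YouTube class.

    The client is chosen once in the constructor because a property getter
    cannot take arguments; the getter then validates playability.
    """

    def __init__(
        self,
        video_id: str,
        fetch: Callable[[str, str], Dict[str, Any]],
        client: str = "ANDROID_EMBED",
    ) -> None:
        self.video_id = video_id
        self.client = client
        self._fetch = fetch  # e.g. wraps InnerTube(client=c).player(video_id)
        self._vid_info: Optional[Dict[str, Any]] = None

    @property
    def vid_info(self) -> Dict[str, Any]:
        # Fetch lazily and cache only responses that passed the check.
        if self._vid_info is None:
            response = self._fetch(self.video_id, self.client)
            status = response.get("playabilityStatus", {})
            if status.get("status") != "OK":
                raise RuntimeError(
                    f"{self.video_id}: {status.get('status')} ({status.get('reason', '')})"
                )
            self._vid_info = response
        return self._vid_info
```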

Is the vid_info getter perhaps too early a place for the availability check? Where would you put it instead?

I could also add a troubleshooting feature that analyzes the responses from every client to tell the user what the problem is. As my previous comment in this PR shows, YouTube's reasons are sometimes vague and overlap with other problems. It would look something like:

def troubleshoot_url(self):
    # Go through all clients to find a good reason why the link does not work,
    # and maybe raise the corresponding exception.
    ...
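One possible shape for such a troubleshooter, assuming a `probe(client)` callable that returns the `(status, reason)` pair for one client. `troubleshoot` and its result labels are hypothetical, and the heuristics are assumptions drawn from the three comparison tables above (the MUSIC clients are ignored because they fail even for the normal video), not documented YouTube behavior.

```python
from typing import Callable, Dict, Iterable, Tuple

# Client names as listed in the per-client comparison in this thread.
CLIENTS = (
    "WEB", "ANDROID", "IOS", "WEB_EMBED", "ANDROID_EMBED", "IOS_EMBED",
    "WEB_MUSIC", "ANDROID_MUSIC", "IOS_MUSIC", "WEB_CREATOR",
    "ANDROID_CREATOR", "IOS_CREATOR", "MWEB", "TV_EMBED",
)


def troubleshoot(
    probe: Callable[[str], Tuple[str, str]],
    clients: Iterable[str] = CLIENTS,
) -> str:
    """Guess why a video fails, based on per-client (status, reason) pairs."""
    results: Dict[str, Tuple[str, str]] = {c: probe(c) for c in clients}
    ok = {c for c, (status, _) in results.items() if status == "OK"}
    # No client works and some ask for sign-in: looks like an age gate.
    if not ok and any(status == "LOGIN_REQUIRED" for status, _ in results.values()):
        return "age_restricted"
    if "WEB" in ok:
        # WEB works but at least one *_EMBED client does not: embed blocked.
        if any(c.endswith("_EMBED") and c not in ok for c in results):
            return "embed_blocked"
        return "playable"
    return "unknown"
```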
