[PBS] Fix AttributeError: 'NoneType' #16623
This is a fix for #15373
youtube_dl/extractor/pbs.py
Outdated
try:
    video_id = self._search_regex(
        r'<div\s+id="video_([0-9]+)"', player_page, 'video ID')
except:
Never use bare except. Use a list of regexes in _search_regex instead.
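For illustration, the "list of regexes" idea can be sketched outside the extractor. The helper name search_first and the sample strings are made up for this example; this is not youtube-dl's _search_regex, only a minimal stand-in that tries each pattern in order and stops at the first match, so no try/except is needed for the fallback.

```python
import re


def search_first(patterns, text, group=1):
    """Return the capture from the first pattern that matches, else None.

    Hypothetical stand-in for illustration; not youtube-dl's _search_regex.
    """
    if isinstance(patterns, str):
        patterns = [patterns]
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(group)
    return None


# The two patterns discussed in this review, tried in order.
VIDEO_ID_PATTERNS = [
    r'<div\s+id="video_([0-9]+)"',
    r'"id":\s*"([0-9]+)"',
]

# Made-up sample inputs, not taken from a real PBS page.
legacy_page = '<div id="video_12345" class="player">'
json_page = '{"id": "67890", "title": "sample"}'

legacy_id = search_first(VIDEO_ID_PATTERNS, legacy_page)
json_id = search_first(VIDEO_ID_PATTERNS, json_page)
missing_id = search_first(VIDEO_ID_PATTERNS, '<p>no player here</p>')
```

With a fallback list there is nothing to catch between attempts: a miss on the first pattern simply moves on to the next one.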
youtube_dl/extractor/pbs.py
Outdated
@@ -455,7 +455,9 @@ def _extract_webpage(self, url):
if not url:
    url = self._og_search_url(webpage)

if url.strip().startswith("//"):
Single quotes.
youtube_dl/extractor/pbs.py
Outdated
if url.strip().startswith("//"):
    url = "https:" + url.strip()
_proto_relative_url.
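As a rough sketch of what such a helper does: the standalone function below is an assumption about the general behavior (prefix a scheme when the URL starts with "//", otherwise leave it alone), not a copy of the extractor's _proto_relative_url method. The default scheme and the sample URLs are illustrative only.

```python
def proto_relative_url(url, scheme='https:'):
    """Prefix a scheme to a protocol-relative URL; return other values as-is.

    Hypothetical standalone version for illustration only.
    """
    if url is None:
        return url
    if url.startswith('//'):
        return scheme + url
    return url


# Made-up sample URLs.
fixed = proto_relative_url('//player.pbs.org/viralplayer/12345/')
unchanged = proto_relative_url('https://www.pbs.org/video/12345/')
```

Calling one helper in place of the hand-written startswith check keeps the scheme handling in a single place. Any whitespace stripping would still have to happen before the call.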
youtube_dl/extractor/pbs.py
Outdated
@@ -466,7 +467,7 @@ def _extract_webpage(self, url):
     url, display_id, note='Downloading player page',
     errnote='Could not download player page')
 video_id = self._search_regex(
-    r'<div\s+id="video_([0-9]+)"', player_page, 'video ID')
+    [r'<div\s+id="video_([0-9]+)"', r'"id":[\s]*"([0-9]+)"'], player_page, 'video ID')
[] is pointless. Must be more relaxed.
What? I need to use [] to make a list, as you asked.
I'm not talking about the list.
Excuse me, but I don't understand your comment.
What do I have to change?
[] is superfluous in your regex. What's not clear here?
Sorry, I thought you were talking about the [] around the two regexes.
Is there anything else to do before the merge?
Add a test.
youtube_dl/extractor/pbs.py
Outdated
@@ -466,7 +467,7 @@ def _extract_webpage(self, url):
     url, display_id, note='Downloading player page',
     errnote='Could not download player page')
 video_id = self._search_regex(
-    r'<div\s+id="video_([0-9]+)"', player_page, 'video ID')
+    [r'<div\s+id="video_([0-9]+)"', r'"id":\s*"([0-9]+)"'], player_page, 'video ID')
I've already pointed out: relax regex. It must match arbitrary quotes and whitespace.
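One possible reading of "arbitrary quotes and whitespace" is sketched below. The patterns are hypothetical relaxed variants of the two in the diff, not what was eventually merged: each accepts either quote style, uses a backreference so the closing quote must match the opening one, and tolerates whitespace around "=" and ":". The helper name find_video_id and the sample strings are made up.

```python
import re

# Hypothetical relaxed variants of the two patterns in the diff above.
RELAXED_PATTERNS = [
    r'''<div\s+id\s*=\s*(["'])video_(?P<id>[0-9]+)\1''',
    r'''(["'])id\1\s*:\s*(["'])(?P<id>[0-9]+)\2''',
]


def find_video_id(text):
    """Return the numeric id from the first relaxed pattern that matches."""
    for pattern in RELAXED_PATTERNS:
        match = re.search(pattern, text)
        if match:
            return match.group('id')
    return None


# Made-up inputs covering both quote styles and varying whitespace.
double_quoted_div = find_video_id('<div id="video_123">')
single_quoted_div = find_video_id("<div  id = 'video_456'>")
spaced_json = find_video_id('{"id" : "789"}')
single_quoted_json = find_video_id("{'id':'42'}")
unquoted_value = find_video_id('{"id": 42}')
```

Requiring matched quote pairs narrows what the relaxed patterns accept, but it does not by itself rule out other "id" keys on a real page.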
I'm sorry, but I can't relax the new regex; otherwise it will match other things that look very similar but are incorrect.
Actually, this approach is redundant since the id can be extracted right from webpage.
In rare cases it cannot be extracted directly from the webpage (for example, embedded videos).
That's not an excuse for not adding another id pattern for extraction from webpage that works fine with your test. Also add a test for the "rare" case.
Yes, it seems that it works because of this commit (d7be705), which was published after my pull request but merged before it. However, my test is valid without that commit.
I also maintain that my id extraction can be useful, but if you insist I can remove it.
Again: https://www.pbs.org/wgbh/masterpiece/episodes/victoria-s2-e1/ can be processed without downloading player_page by looking for the media id in webpage.
This pull request is also a fix for #16684