[PBS] Fix AttributeError: 'NoneType' #16623

Closed
wants to merge 8 commits into from

@Urgau Urgau commented Jun 3, 2018

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

This is a fix for #15373

try:
    video_id = self._search_regex(
        r'<div\s+id="video_([0-9]+)"', player_page, 'video ID')
except:
Collaborator

Never use a bare except. Use a list of regexes in _search_regex instead.
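
As a rough illustration of this suggestion: `_search_regex` accepts either a single pattern or a list of patterns and tries them in order, so a fallback pattern does not need a try/except. The standalone helper below sketches that behaviour using only the `re` module. `search_first` is a hypothetical name introduced for this example and is not part of youtube-dl; the two patterns are the ones that appear in this pull request, and the sample strings are invented.

```python
import re


def search_first(patterns, string, group=1, default=None):
    """Return the requested group of the first pattern that matches.

    Hypothetical stand-in for the list-of-patterns behaviour of
    youtube-dl's _search_regex; not the real implementation.
    """
    if isinstance(patterns, str):
        patterns = [patterns]
    for pattern in patterns:
        match = re.search(pattern, string)
        if match:
            return match.group(group)
    return default


# Patterns as they appear in this pull request; inputs are invented.
ID_PATTERNS = [
    r'<div\s+id="video_([0-9]+)"',
    r'"id":\s*"([0-9]+)"',
]

print(search_first(ID_PATTERNS, '<div id="video_123">'))
print(search_first(ID_PATTERNS, '{"id": "456"}'))
print(search_first(ID_PATTERNS, 'no id here'))
```

Because the loop returns on the first match, the order of the list decides which pattern takes priority, and only a genuine "nothing matched" case reaches the default.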

@@ -455,7 +455,9 @@ def _extract_webpage(self, url):

if not url:
    url = self._og_search_url(webpage)

if url.strip().startswith("//"):
Collaborator

Single quotes.



if url.strip().startswith("//"):
    url = "https:" + url.strip()
Collaborator

_proto_relative_url.
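
The helper named here, `_proto_relative_url`, is an existing `InfoExtractor` method that adds a scheme to URLs beginning with `//`. A minimal standalone sketch of the idea follows. The function below is an approximation written for this example, not the library's code, and the default `'https:'` scheme is an assumption (the real method may pick the scheme differently when none is given).

```python
def proto_relative_url(url, scheme='https:'):
    """Prefix a protocol-relative URL ('//host/path') with a scheme.

    Approximation of the behaviour of InfoExtractor._proto_relative_url;
    the default scheme here is an assumption made for illustration.
    """
    if url is None:
        return url
    if url.startswith('//'):
        return scheme + url
    # Absolute or otherwise non-protocol-relative URLs pass through unchanged.
    return url


print(proto_relative_url('//example.com/video'))
print(proto_relative_url('http://example.com/video'))
```

Using one shared helper keeps the `//` check and the scheme choice in a single place instead of repeating string concatenation in each extractor.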

@@ -466,7 +467,7 @@ def _extract_webpage(self, url):
         url, display_id, note='Downloading player page',
         errnote='Could not download player page')
     video_id = self._search_regex(
-        r'<div\s+id="video_([0-9]+)"', player_page, 'video ID')
+        [r'<div\s+id="video_([0-9]+)"', r'"id":[\s]*"([0-9]+)"'], player_page, 'video ID')
Collaborator

[] is pointless. Must be more relaxed.

Contributor Author

What? I need to use [] to make a list, as you asked.

Collaborator

I'm not talking about the list.

Contributor Author

Excuse me, but I don't understand your comment.
What do I have to change?

Collaborator

[] is superfluous in your regex. What's not clear here?

Contributor Author

Sorry, I thought you were talking about the [] around the two regexes.
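
To make the point of this exchange concrete: a character class that contains only `\s` matches exactly what `\s` matches, so `[\s]*` and `\s*` are interchangeable. The short check below compares the two forms of the pattern from the diff on a few invented sample strings; `first_group` is a hypothetical helper used only for this example.

```python
import re

WITH_CLASS = r'"id":[\s]*"([0-9]+)"'    # form used in the first revision
WITHOUT_CLASS = r'"id":\s*"([0-9]+)"'   # same pattern without the brackets


def first_group(pattern, text):
    """Return group 1 of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Invented inputs covering no space, a space, mixed whitespace, and a non-match.
SAMPLES = ['"id":"42"', '"id": "42"', '"id":\t\n"42"', '"id" : "42"']

for text in SAMPLES:
    # Both spellings must agree on every input, match or no match.
    assert first_group(WITH_CLASS, text) == first_group(WITHOUT_CLASS, text)
```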

Urgau commented Jun 4, 2018

Is there anything else to do before the merge?

dstftw commented Jun 4, 2018

Add a test.

@@ -466,7 +467,7 @@ def _extract_webpage(self, url):
         url, display_id, note='Downloading player page',
         errnote='Could not download player page')
     video_id = self._search_regex(
-        r'<div\s+id="video_([0-9]+)"', player_page, 'video ID')
+        [r'<div\s+id="video_([0-9]+)"', r'"id":\s*"([0-9]+)"'], player_page, 'video ID')
Collaborator

I've already pointed this out: relax the regex. It must match arbitrary quotes and whitespace.
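
One possible reading of "arbitrary quotes and whitespace" is sketched below. This is an assumption about what a relaxed pattern could look like, not the pattern that was eventually merged: it accepts single or double quotes around both the key and the value (each closing quote must equal its opening quote) and optional whitespace on either side of the colon. `RELAXED_ID_RE`, `extract_id`, and the sample inputs are hypothetical.

```python
import re

# Backreferences \1 and \2 force each closing quote to match its opener.
RELAXED_ID_RE = r'''(["'])id\1\s*:\s*(["'])(?P<id>[0-9]+)\2'''


def extract_id(text):
    """Return the numeric id found by the relaxed pattern, or None."""
    match = re.search(RELAXED_ID_RE, text)
    return match.group('id') if match else None


# Invented inputs: different quoting and spacing, plus a similar-looking key.
print(extract_id('{"id": "123"}'))
print(extract_id("{'id' : '123'}"))
print(extract_id('{"id":"123"}'))
print(extract_id('{"video_id": "123"}'))
```

Requiring a quote directly before `id` keeps keys such as `video_id` from matching, but any other object on the page with a quoted numeric `id` key would still match, so a relaxed pattern like this one would need to be checked against real pages.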

Contributor Author

I'm sorry, but I can't relax the new regex; otherwise it will match other things that look very similar but are incorrect.

Collaborator

Actually, this approach is redundant, since the id can be extracted directly from the webpage.

Contributor Author

In rare cases it cannot be extracted directly from the webpage (for example, embedded videos).

Collaborator

That's not an excuse for not adding another id pattern for extraction from the webpage that works fine with your test. Also add a test for the "rare" case.

Contributor Author

@Urgau Urgau Jun 7, 2018

Yes, it seems that it works because of this commit (d7be705), which was published after my pull request but merged before it. However, my test is valid without that commit.

I still think my id extraction is useful, but if you insist I can remove it.

Collaborator

@dstftw dstftw Jun 12, 2018

Again: https://www.pbs.org/wgbh/masterpiece/episodes/victoria-s2-e1/ can be processed without downloading player_page by looking for the media id in the webpage.

Urgau commented Jun 11, 2018

This pull request is also a fix for #16684

@dstftw dstftw closed this in 87f89da Jun 15, 2018