[PBS] Fix AttributeError: 'NoneType' #16623

Urgau · 2018-06-03T08:50:49Z

Before submitting a pull request make sure you have:

At least skimmed through adding new extractor tutorial and youtube-dl coding conventions sections
Searched the bugtracker for similar pull requests
Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Bug fix
Improvement
New extractor
New feature

Description of your pull request and other information

This is a fix for #15373

dstftw · 2018-06-03T09:00:54Z

youtube_dl/extractor/pbs.py

+            try:
+                video_id = self._search_regex(
+                    r'<div\s+id="video_([0-9]+)"', player_page, 'video ID')
+            except:


Never use bare except. Use list of regexes in _search_regex instead.
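
As a rough illustration of this suggestion: instead of wrapping the lookup in try/except, the candidate patterns can be tried in order until one matches. The `search_first` helper below is a hypothetical stand-in written with plain `re`, not youtube-dl's `_search_regex`. Both patterns appear in this pull request; the sample pages are made up.

```python
import re


def search_first(patterns, text, default=None):
    """Return group 1 of the first pattern that matches, else default.

    Hypothetical stand-in for passing a list of regexes to an
    extractor helper; this is not youtube-dl's implementation.
    """
    if isinstance(patterns, str):
        patterns = [patterns]
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    return default


# Patterns taken from this pull request, tried in order.
VIDEO_ID_PATTERNS = [
    r'<div\s+id="video_([0-9]+)"',
    r'"id":\s*"([0-9]+)"',
]

# Made-up page fragments for illustration only.
page_with_div = '<div id="video_123">'
page_with_json = '{"id": "456"}'

print(search_first(VIDEO_ID_PATTERNS, page_with_div))
print(search_first(VIDEO_ID_PATTERNS, page_with_json))
```

No exception handling is needed here: a page that matches neither pattern simply yields the default.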

dstftw · 2018-06-03T09:04:28Z

youtube_dl/extractor/pbs.py

@@ -455,7 +455,9 @@ def _extract_webpage(self, url):

            if not url:
                url = self._og_search_url(webpage)
-
+
+            if url.strip().startswith("//"):


Single quotes.

dstftw · 2018-06-03T09:06:52Z

youtube_dl/extractor/pbs.py

-
+
+            if url.strip().startswith("//"):
+                url = "https:" + url.strip()


_proto_relative_url.
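
The idea behind the helper named here can be sketched as a standalone function. This is an approximation for illustration, assuming a default scheme of `https:`; the function name, the default, and the sample URLs are assumptions, and the real helper is a method on the extractor class.

```python
def proto_relative_url(url, scheme='https:'):
    """Prepend a scheme to a protocol-relative URL such as '//host/path'.

    Hypothetical standalone sketch: None and URLs that already carry
    a scheme are returned unchanged.
    """
    if url is None:
        return url
    if url.startswith('//'):
        return scheme + url
    return url


# Made-up URLs for illustration only.
print(proto_relative_url('//example.com/player/1'))
print(proto_relative_url('http://example.com/player/1'))
```

Handling this in one helper avoids repeating the `startswith`/concatenation logic at each call site.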

dstftw · 2018-06-03T09:29:20Z

youtube_dl/extractor/pbs.py

@@ -466,7 +467,7 @@ def _extract_webpage(self, url):
                url, display_id, note='Downloading player page',
                errnote='Could not download player page')
            video_id = self._search_regex(
-                r'<div\s+id="video_([0-9]+)"', player_page, 'video ID')
+                [r'<div\s+id="video_([0-9]+)"', r'"id":[\s]*"([0-9]+)"'], player_page, 'video ID')


[] is pointless. Must be more relaxed.

Urgau · 2018-06-03

What? I need to use [] to make a list, like you asked.

dstftw · 2018-06-03

I'm not talking about the list.

Urgau · 2018-06-03

Excuse me, but I don't understand your comment.
What do I have to change?

dstftw · 2018-06-03

[] is superfluous in your regex. What's not clear here?

Urgau · 2018-06-03

Sorry, I thought you were talking about the [] around the two regexes.
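
To make the point about the superfluous brackets concrete, here is a small check that a character class containing only `\s` behaves the same as a bare `\s`. The sample strings are made up for illustration.

```python
import re

# Same pattern, with and without the redundant character class.
with_class = re.compile(r'"id":[\s]*"([0-9]+)"')
without_class = re.compile(r'"id":\s*"([0-9]+)"')


def first_group(regex, text):
    """Return group 1 of the first match, or None when nothing matches."""
    match = regex.search(text)
    return match.group(1) if match else None


# Made-up inputs: no space, one space, tab plus newline, and a non-matching form.
samples = ['"id":"1"', '"id": "22"', '"id":\t\n"333"', '"id" : "4"']

results_with = [first_group(with_class, s) for s in samples]
results_without = [first_group(without_class, s) for s in samples]
print(results_with == results_without)
```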

Urgau · 2018-06-04T11:12:10Z

Is there anything else to do before the merge?

dstftw · 2018-06-04T17:16:31Z

Add a test.

dstftw · 2018-06-04T17:20:43Z

youtube_dl/extractor/pbs.py

@@ -466,7 +467,7 @@ def _extract_webpage(self, url):
                url, display_id, note='Downloading player page',
                errnote='Could not download player page')
            video_id = self._search_regex(
-                r'<div\s+id="video_([0-9]+)"', player_page, 'video ID')
+                [r'<div\s+id="video_([0-9]+)"', r'"id":\s*"([0-9]+)"'], player_page, 'video ID')


I've already pointed out: relax regex. It must match arbitrary quotes and whitespace.
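
One possible reading of "arbitrary quotes and whitespace" is sketched below. The pattern is an assumption made for illustration, not the regex that was eventually merged: it accepts single or double quotes around both key and value and optional whitespace on either side of the colon.

```python
import re

# Assumed relaxed form: backreferences require each opening quote
# to be closed by the same quote character.
RELAXED_ID_RE = re.compile(r'''(["'])id\1\s*:\s*(["'])(?P<id>[0-9]+)\2''')


def extract_id(text):
    """Return the numeric id as a string, or None when nothing matches."""
    match = RELAXED_ID_RE.search(text)
    return match.group('id') if match else None


# Made-up inputs for illustration only.
print(extract_id('"id": "123"'))
print(extract_id("'id':'456'"))
print(extract_id('"id"  :\t"789"'))
print(extract_id('"id": "abc"'))
```

The trade-off is that a looser pattern is also more likely to match unrelated `id` keys elsewhere in a page.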

Urgau · 2018-06-04

I'm sorry, but I can't change the new regex; otherwise it will match other things that are very similar but incorrect.

dstftw · 2018-06-04

Actually, this approach is redundant, since the id can be extracted right from the webpage.

Urgau · 2018-06-04

In rare cases it cannot be extracted directly from the webpage (for example, an embedded video).

dstftw · 2018-06-07

That's not an excuse for not adding another id pattern for extraction from the webpage, which works fine with your test. Also add a test for the "rare" case.

Urgau · 2018-06-07

Yes, it seems that it works thanks to this commit (d7be705), which was published after my pull request but merged before it. However, my test is valid without that commit.

I still maintain that my id extraction can be useful, but if you insist I can remove it.

dstftw · 2018-06-12

Again: https://www.pbs.org/wgbh/masterpiece/episodes/victoria-s2-e1/ can be processed without downloading player_page by looking for the media id in the webpage.

Urgau · 2018-06-11T13:15:37Z

This pull request is also a fix for #16684

[PBS] Fix AttributeError: 'NoneType'

f051742

This is a fix for #15373

dstftw requested changes Jun 3, 2018

dstftw added the pending-fixes label Jun 3, 2018

Improve code convention

5f2aac8

dstftw requested changes Jun 3, 2018

Remove unnecessary [] in regex

4acaa8f

dstftw requested changes Jun 4, 2018

Urgau added 4 commits June 4, 2018 21:53

Add test

fe42b2b

Merge branch 'master' into patch-1

dad5c0c

Oups fix merging.

3f4fda5

Add test for the second id extractor

dbe8b9f

Remove regex

7f6fd81

dstftw closed this in 87f89da Jun 15, 2018
