-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[vlive:playlist] Add new extractor #13613
Conversation
Looks like the builds are failing because multiple extractors are not allowed to match the same URL, is there any way around this without making the regular VLive video URL pattern restrictive? |
Override |
3c8fdeb
to
7acd0b1
Compare
Made the change, the builds don't seem to be failing because of this extractor anymore. |
youtube_dl/extractor/vlive.py
Outdated
def _real_extract(self, url): | ||
playlist_id = self._match_id(url) | ||
video_id = self._search_regex( | ||
self._VALID_URL, url, 'video id', group='video_id') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use bare re.match
.
youtube_dl/extractor/vlive.py
Outdated
webpage, 'playlist name', fatal=False) | ||
|
||
item_ids = self._search_regex( | ||
r'\bvar\s+playlistVideoSeqs\s*=\s*\[([^\]]+)\]', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to escape ]
is character set.
youtube_dl/extractor/vlive.py
Outdated
|
||
item_ids = self._search_regex( | ||
r'\bvar\s+playlistVideoSeqs\s*=\s*\[([^\]]+)\]', | ||
webpage, 'playlist item ids', default='') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default
is senseless here.
youtube_dl/extractor/vlive.py
Outdated
webpage, 'playlist item ids', default='') | ||
|
||
entries = [] | ||
for item_id in re.split(r'\s*,\s*', item_ids): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_parse_json
instead.
webpage = self._download_webpage( | ||
'http://www.vlive.tv/video/%s/playlist/%s' % (video_id, playlist_id), video_id) | ||
|
||
playlist_name = self._html_search_regex( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--no-playlist
is not respected.
youtube_dl/extractor/extractors.py
Outdated
@@ -1203,6 +1203,7 @@ | |||
VKWallPostIE, | |||
) | |||
from .vlive import ( | |||
VLivePlaylistIE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not alphabetic.
Thanks for your feedback, I've pushed another commit where I've tried to address your comments. Let me know if there's still something you would like me to improve. |
youtube_dl/extractor/vlive.py
Outdated
webpage, 'playlist item ids') | ||
|
||
entries = [] | ||
for item_id in self._parse_json('[%s]' % item_ids, playlist_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You already have []
in regex, capture it and parse there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, updated.
youtube_dl/extractor/vlive.py
Outdated
assert video_id_match | ||
video_id = compat_str(video_id_match.group('video_id')) | ||
|
||
video_url_format = 'http://www.vlive.tv/video/%s' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uppercase is used for constants. Also this is a template not format.
youtube_dl/extractor/vlive.py
Outdated
return self.url_result( | ||
video_url_format % video_id, | ||
ie=VLiveIE.ie_key(), video_id=video_id) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else
is superfluous.
Thanks, I've changed the variable name for the constant to uppercase and removed the unnecessary |
youtube_dl/extractor/vlive.py
Outdated
}, | ||
'playlist_mincount': 20 | ||
}, { | ||
'url': 'http://www.vlive.tv/video/22867/playlist/22912', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this test only_matching
and add a comment about why it even exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I make it only_matching
does that mean it will only test that the URL matches the regex? If so, that defeats the point of that test. The URL in the second test links to a video which appears to be linked to the playlist somewhat differently to the URL in the first test.
From my observations, if someone were to download the webpage using only the video id from the given URL, it would not link to the playlist, whereas it would with the URL in the first test. Therefore I think both tests should perform the full webpage download.
i.e:
http://www.vlive.tv/video/30824
the HTML for this page does contain the playlist
http://www.vlive.tv/video/22867
the HTML for this page does not
I can add a code comment to clarify this if my assumption about only_matching
is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both test the same extraction scenario. The rule is one test per extraction scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I think it'll be better to keep the second test and remove the first one entirely, I'll do that.
I've pushed the change to remove a test so there's only one. |
Please follow the guide below
x
into all the boxes [ ] relevant to your pull request (like that [x])Before submitting a pull request make sure you have:
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?
Description of your pull request and other information
A couple more examples of vlive playlists:
The reason why there are two tests is because the webpages for the two are slightly different. The webpage for the video in the first test contains playlist information even if you remove the 'playlist' and playlist id segments of the URL. Whereas the second video doesn't contain playlist information if you remove the 'playlist' and playlist id segments of the URL therefore the playlist extractor would not work.
I've put this extractor before the regular video extractor in the extractors file because the playlist URLs match the regular video pattern. This way they both work.