Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vlive:playlist] Add new extractor #13613

Merged
merged 6 commits into from
Jul 9, 2017

Conversation

coreynicholson
Copy link
Contributor

@coreynicholson coreynicholson commented Jul 9, 2017

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

A couple more examples of vlive playlists:

http://www.vlive.tv/video/24371/playlist/24384
http://www.vlive.tv/video/20928/playlist/20933
http://www.vlive.tv/video/16181/playlist/17079

The reason why there are two tests is because the webpages for the two are slightly different. The webpage for the video in the first test contains playlist information even if you remove the 'playlist' and playlist id segments of the URL. Whereas the second video doesn't contain playlist information if you remove the 'playlist' and playlist id segments of the URL therefore the playlist extractor would not work.

I've put this extractor before the regular video extractor in the extractors file because the playlist URLs match the regular video pattern. This way they both work.

@coreynicholson
Copy link
Contributor Author

Looks like the builds are failing because multiple extractors are not allowed to match the same URL, is there any way around this without making the regular VLive video URL pattern restrictive?

@dstftw
Copy link
Collaborator

dstftw commented Jul 9, 2017

Override suitable.

@coreynicholson
Copy link
Contributor Author

Made the change, the builds don't seem to be failing because of this extractor anymore.

def _real_extract(self, url):
playlist_id = self._match_id(url)
video_id = self._search_regex(
self._VALID_URL, url, 'video id', group='video_id')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use bare re.match.

webpage, 'playlist name', fatal=False)

item_ids = self._search_regex(
r'\bvar\s+playlistVideoSeqs\s*=\s*\[([^\]]+)\]',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to escape ] is character set.


item_ids = self._search_regex(
r'\bvar\s+playlistVideoSeqs\s*=\s*\[([^\]]+)\]',
webpage, 'playlist item ids', default='')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default is senseless here.

webpage, 'playlist item ids', default='')

entries = []
for item_id in re.split(r'\s*,\s*', item_ids):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_parse_json instead.

webpage = self._download_webpage(
'http://www.vlive.tv/video/%s/playlist/%s' % (video_id, playlist_id), video_id)

playlist_name = self._html_search_regex(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--no-playlist is not respected.

@@ -1203,6 +1203,7 @@
VKWallPostIE,
)
from .vlive import (
VLivePlaylistIE,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not alphabetic.

@coreynicholson
Copy link
Contributor Author

Thanks for your feedback, I've pushed another commit where I've tried to address your comments. Let me know if there's still something you would like me to improve.

webpage, 'playlist item ids')

entries = []
for item_id in self._parse_json('[%s]' % item_ids, playlist_id):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already have [] in regex, capture it and parse there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, updated.

assert video_id_match
video_id = compat_str(video_id_match.group('video_id'))

video_url_format = 'http://www.vlive.tv/video/%s'
Copy link
Collaborator

@dstftw dstftw Jul 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uppercase is used for constants. Also this is a template not format.

return self.url_result(
video_url_format % video_id,
ie=VLiveIE.ie_key(), video_id=video_id)
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else is superfluous.

@coreynicholson
Copy link
Contributor Author

Thanks, I've changed the variable name for the constant to uppercase and removed the unnecessary else.

},
'playlist_mincount': 20
}, {
'url': 'http://www.vlive.tv/video/22867/playlist/22912',
Copy link
Collaborator

@dstftw dstftw Jul 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this test only_matching and add a comment about why it even exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I make it only_matching does that mean it will only test that the URL matches the regex? If so, that defeats the point of that test. The URL in the second test links to a video which appears to be linked to the playlist somewhat differently to the URL in the first test.

From my observations, if someone were to download the webpage using only the video id from the given URL, it would not link to the playlist, whereas it would with the URL in the first test. Therefore I think both tests should perform the full webpage download.

i.e:
http://www.vlive.tv/video/30824 the HTML for this page does contain the playlist
http://www.vlive.tv/video/22867 the HTML for this page does not

I can add a code comment to clarify this if my assumption about only_matching is correct.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both test the same extraction scenario. The rule is one test per extraction scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I think it'll be better to keep the second test and remove the first one entirely, I'll do that.

@coreynicholson
Copy link
Contributor Author

I've pushed the change to remove a test so there's only one.

@dstftw dstftw merged commit b71c18b into ytdl-org:master Jul 9, 2017
dstftw added a commit that referenced this pull request Sep 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants