Adding support to identify multiple HTML5 elements in a webpage #2191

Closed
wants to merge 6 commits into from

2 participants

@renuyarday

Example
--verbose -g http://flowplayer.org/

results in only the first asset being shown.

With these changes, it shows all nine.
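
To illustrate the difference, here is a minimal sketch that applies the regular expression from the diff to a hypothetical page with two `<video>` elements. The sample markup and URLs are invented for illustration, and `html.unescape` stands in for youtube-dl's `unescapeHTML`.

```python
import re
from html import unescape  # stand-in for youtube-dl's unescapeHTML (assumption)

# Regular expression copied from the diff in generic.py.
VIDEO_SRC_RE = r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"'

# Hypothetical page: two <video> elements, each with a <source> child.
webpage = (
    '<video controls><source src="http://example.com/a.mp4?x=1&amp;y=2"></video>\n'
    '<video controls><source src="http://example.com/b.mp4"></video>\n'
)

# re.search stops at the first match, so only one asset is found.
first = re.search(VIDEO_SRC_RE, webpage, flags=re.DOTALL)

# re.findall returns the captured group of every non-overlapping match.
all_urls = [unescape(u)
            for u in re.findall(VIDEO_SRC_RE, webpage, flags=re.DOTALL)]

print(first.group(1))
print(all_urls)
```

Note that this pattern is sensitive to page layout: because of `re.DOTALL`, the optional `<source>` group can span from one `<video>` element into the next on pages that mix `<video src=...>` with `<source>` children, so the number of matches is not guaranteed to equal the number of videos.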

@phihag phihag commented on the diff
youtube_dl/extractor/common.py
@@ -312,6 +312,17 @@ def url_result(url, ie=None, video_id=None):
if video_id is not None:
video_info['id'] = video_id
return video_info
+
+ @staticmethod
+ def video_result(video_url=None, video_id=None, uploader=None, video_title=None):
@phihag Collaborator
phihag added a note

This helper function is unnecessary; it's easier to just specify the dictionary. In general, I want to remove the other helper functions as well.
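
A rough sketch of what specifying the dictionary directly could look like, using the same keys as the proposed `video_result` helper. The values are hypothetical placeholders for whatever the extractor has already parsed.

```python
# Hypothetical values standing in for data the extractor already has.
video_url = 'http://example.com/media/clip.mp4'
video_id = 'clip'
video_title = 'Example clip'
video_uploader = None

# The info dictionary written out in place, with no video_result() helper.
entry = {
    '_type': 'video',
    'url': video_url,
    'id': video_id,
    'uploader': video_uploader,
    'title': video_title,
}
```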

@phihag phihag commented on the diff
youtube_dl/extractor/common.py
@@ -312,6 +312,17 @@ def url_result(url, ie=None, video_id=None):
if video_id is not None:
video_info['id'] = video_id
return video_info
+
+ @staticmethod
+ def video_result(video_url=None, video_id=None, uploader=None, video_title=None):
+ """Returns a url that points to a page that should be processed"""
@phihag Collaborator
phihag added a note

This docstring is misleading; what it describes would be _type url.
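
The distinction can be sketched with two standalone functions. This is an approximation, not the project's exact code: the `ie_key` field of `url_result` is recalled from youtube-dl's source and does not appear in this diff, and the docstring given to `video_result` is only one possible wording.

```python
def url_result(url, ie=None, video_id=None):
    """Returns a url that points to a page that should be processed"""
    # '_type': 'url' means "hand this URL to another extractor".
    # The 'ie_key' field is an assumption; it is not shown in this diff.
    video_info = {'_type': 'url', 'url': url, 'ie_key': ie}
    if video_id is not None:
        video_info['id'] = video_id
    return video_info


def video_result(video_url=None, video_id=None, uploader=None, video_title=None):
    """Returns an info dictionary for a video that can be downloaded directly"""
    # '_type': 'video' means the 'url' field is the media file itself.
    return {
        '_type': 'video',
        'url': video_url,
        'id': video_id,
        'uploader': uploader,
        'title': video_title,
    }
```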

@phihag phihag commented on the diff
youtube_dl/extractor/generic.py
@@ -348,7 +348,13 @@ def _real_extract(self, url):
mobj = re.search(r'<meta.*?property="og:video".*?content="(.*?)"', webpage)
if mobj is None:
# HTML5 video
- mobj = re.search(r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage, flags=re.DOTALL)
+ matches = re.findall(r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage, flags=re.DOTALL)
+ if matches:
+ urlrs = [self.video_result(unescapeHTML(tuppl), video_id, video_uploader,video_title)
@phihag Collaborator
phihag added a note

video_id and all other properties will be the same for all results. That's most likely not a good idea. Instead, construct it as below.
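
The suggested construction is not preserved on this page. As one possible illustration of giving each entry its own properties, the sketch below derives a per-entry id from the file name in each URL and numbers the titles. The helper name `entries_from_urls` and the id/title scheme are assumptions made for this example, not what was eventually implemented.

```python
import posixpath
from urllib.parse import urlparse


def entries_from_urls(video_urls, page_title):
    """Build one info dictionary per URL, each with its own id and title."""
    entries = []
    for index, video_url in enumerate(video_urls, start=1):
        # Use the file name without its extension as the id; fall back to
        # a positional id when the URL path has no file name.
        filename = posixpath.basename(urlparse(video_url).path)
        video_id = posixpath.splitext(filename)[0] or 'video-%d' % index
        entries.append({
            '_type': 'video',
            'url': video_url,
            'id': video_id,
            'title': '%s (%d)' % (page_title, index),
        })
    return entries


entries = entries_from_urls(
    ['http://example.com/v/a.mp4', 'http://example.com/v/b.webm?x=1'],
    'Demo page')
```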

@phihag
Collaborator

Well, the problem is that youtube-dl will nevertheless only download three or so, because we can't get video titles. With the above changes, it may work for some webpages (although not flowplayer). We could also implement format selection for HTML5 videos, btw.
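
As a very rough sketch of what format selection for HTML5 videos might build on, the snippet below groups the `<source>` children of each `<video>` element into a list of candidate formats using the standard-library `html.parser`. The class name, the sample markup, and the choice of the MIME type as `format_id` are all assumptions for illustration; this is not how youtube-dl implemented it.

```python
from html.parser import HTMLParser


class VideoSourceParser(HTMLParser):
    """Collects, per <video> element, the URLs offered for it."""

    def __init__(self):
        super().__init__()
        self.videos = []        # one list of format dicts per <video>
        self._in_video = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'video':
            self._in_video = True
            self.videos.append([])
            if attrs.get('src'):
                # <video src="..."> with no <source> children
                self.videos[-1].append({'url': attrs['src']})
        elif tag == 'source' and self._in_video and attrs.get('src'):
            fmt = {'url': attrs['src']}
            if attrs.get('type'):
                # Using the MIME type as format_id is an assumption.
                fmt['format_id'] = attrs['type']
            self.videos[-1].append(fmt)

    def handle_endtag(self, tag):
        if tag == 'video':
            self._in_video = False


# Hypothetical markup: one video with two alternatives, one with a bare src.
parser = VideoSourceParser()
parser.feed(
    '<video controls>'
    '<source src="a.webm" type="video/webm">'
    '<source src="a.mp4" type="video/mp4">'
    '</video>'
    '<video src="b.mp4"></video>'
)
videos = parser.videos
```

Each inner list could then be treated as the alternatives for a single video, rather than as separate playlist entries.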

@phihag
Collaborator

As mentioned above, this has now been implemented. If it still fails with a current version of youtube-dl, please open a new issue with the complete output of youtube-dl when called with the -v option. Thanks!

@phihag phihag closed this
Commits on Nov 18, 2013
  1. @renuyarday
  2. @renuyarday
Commits on Jan 22, 2014
  1. @renuyarday

    upstream

    renuyarday committed
  2. @renuyarday
  3. @renuyarday

    new line

    renuyarday committed
  4. @renuyarday

    only required changes

    renuyarday committed
Showing with 18 additions and 1 deletion.
  1. +11 −0 youtube_dl/extractor/common.py
  2. +7 −1 youtube_dl/extractor/generic.py
11 youtube_dl/extractor/common.py
@@ -312,6 +312,17 @@ def url_result(url, ie=None, video_id=None):
if video_id is not None:
video_info['id'] = video_id
return video_info
+
+ @staticmethod
+ def video_result(video_url=None, video_id=None, uploader=None, video_title=None):
+ """Returns a url that points to a page that should be processed"""
+ video_info = {'_type': 'video',
+ 'url': video_url,
+ 'id': video_id,
+ 'uploader': uploader,
+ 'title': video_title}
+ return video_info
+
@staticmethod
def playlist_result(entries, playlist_id=None, playlist_title=None):
"""Returns a playlist"""
8 youtube_dl/extractor/generic.py
@@ -348,7 +348,13 @@ def _real_extract(self, url):
mobj = re.search(r'<meta.*?property="og:video".*?content="(.*?)"', webpage)
if mobj is None:
# HTML5 video
- mobj = re.search(r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage, flags=re.DOTALL)
+ matches = re.findall(r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage, flags=re.DOTALL)
+ if matches:
+ urlrs = [self.video_result(unescapeHTML(tuppl), video_id, video_uploader,video_title)
+ for tuppl in matches]
+ return self.playlist_result(
+ urlrs, playlist_id=video_id, playlist_title=video_title)
+
if mobj is None:
raise ExtractorError('Unsupported URL: %s' % url)