-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Vidbit] Add new extractor (Closes #9688) #9759
Conversation
return { | ||
'id': video_id, | ||
'title': self._html_search_regex(r'<h1>(.+)</h1>', webpage, 'title'), | ||
'url': self._BASE_URL % self._html_search_regex(r'file:\s*["\'](.+)["\']', webpage, 'video URL'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
url
should be used as base URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
["\'](.+)["\']
will capture everything including "
and '
till trailing "
or '
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the site both uses single and double quotes. Changing it to e.g. ([^"\'])
wouldn't extract the correct title if it contains one type of quotes. Do you know of a good regex for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capture opening quote and match it with closing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
url
should be used as base URL.
@dsftw I'm not sure what you mean. I've seen the same setup in other extractors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean exactly that: base part of url
should be used as base URL instead of this hardcode.
This is a custom JWPlayer. Should use |
Current |
|
So should |
It's OK to just use the current approach. Rewriting jwplayer-related codes is not a top priority. |
I've added a quick fix for the base URL, but it's not very pretty, so any suggestions for a better solution are appreciated :) |
Use |
@dfstw Does changes seem OK? |
'id': video_id, | ||
'title': self._html_search_regex(r'<h1>(.+)</h1>', webpage, 'title'), | ||
'url': compat_urlparse.urljoin(url, self._html_search_regex(r'file:\s*(["\'])((?:(?!\1).)+)\1', webpage, 'video URL', group=2)), | ||
'thumbnail': compat_urlparse.urljoin(url, self._html_search_regex(r'image:\s*(["\'])((?:(?!\1).)+)\1', webpage, 'thumbnail', None, group=2)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fail no thumbnails extracted. You claim thumbnail to be optional. Is there any example URL of such video?
Also og:image
seems like easier way to extract thumbnail.
Also carry long lines and squash commits. |
@dstftw Does this seem good? |
No description provided.