[Vidbit] Add new extractor (Closes #9688) #9759

Closed
TRox1972 wants to merge 1 commit

TRox1972
Contributor

No description provided.

return {
'id': video_id,
'title': self._html_search_regex(r'<h1>(.+)</h1>', webpage, 'title'),
'url': self._BASE_URL % self._html_search_regex(r'file:\s*["\'](.+)["\']', webpage, 'video URL'),
Collaborator

url should be used as base URL.

Collaborator

["\'](.+)["\'] will capture everything including " and ' till trailing " or '.
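A minimal sketch of the problem described above, using Python's standard `re` module. The `sample` string is a hypothetical player config line invented for illustration; it is not taken from the actual site.

```python
import re

# Hypothetical config line with two quoted values (an assumption for illustration).
sample = """file: "/videos/abc.mp4", image: '/thumbs/abc.jpg'"""

# The pattern from the diff: a greedy (.+) between two loosely matched quotes.
greedy_match = re.search(r'file:\s*["\'](.+)["\']', sample)

# The greedy group runs past the closing double quote and only stops
# at the last quote character on the line.
captured = greedy_match.group(1)
print(captured)
```

Here the captured text includes the closing `"` of the first value and everything up to the final `'`, which is not a usable URL.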

Contributor Author

It seems the site uses both single and double quotes. Changing it to e.g. ([^"\']) wouldn't extract the correct title if it contains one type of quote. Do you know of a good regex for this?

Collaborator

Capture opening quote and match it with closing.
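One way to read this suggestion, sketched with the standard `re` module: capture the opening quote in a group, then use a backreference so that only the same quote character can close the value. The helper name `extract_file` and the sample inputs are hypothetical.

```python
import re

# Group 1 captures the opening quote; (?!\1) stops the value at the first
# occurrence of that same quote; the trailing \1 matches the closing quote.
QUOTED_FILE = re.compile(r'file:\s*(["\'])(?P<value>(?:(?!\1).)+)\1')


def extract_file(text):
    """Return the quoted value after 'file:', or None if nothing matches."""
    match = QUOTED_FILE.search(text)
    return match.group('value') if match else None


# A value wrapped in double quotes may itself contain a single quote.
print(extract_file("file: \"/videos/it's.mp4\", image: '/t.jpg'"))
print(extract_file("file: '/videos/a.mp4'"))
```

Because the closing quote must equal the opening one, a value that contains the other quote type is still extracted whole.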

Contributor Author

@TRox1972 TRox1972 Jun 18, 2016

> url should be used as base URL.

@dstftw I'm not sure what you mean. I've seen the same setup in other extractors.

Collaborator

I mean exactly that: the base part of url should be used as the base URL instead of this hardcoded value.

@yan12125
Collaborator

This is a custom JWPlayer. Should use JWPlatformBaseIE._parse_jwplayer_data() instead.

@dstftw
Collaborator

dstftw commented Jun 12, 2016

Current _parse_jwplayer_data won't work with this input since the path is relative.

@yan12125
Collaborator

_parse_jwplayer_data can be extended to support relative URLs.

@TRox1972
Contributor Author

So should _parse_jwplayer_data be modified to handle relative URLs, or should I use the current approach?

@yan12125
Collaborator

It's OK to just use the current approach. Rewriting jwplayer-related code is not a top priority.

@TRox1972
Contributor Author

I've added a quick fix for the base URL, but it's not very pretty, so any suggestions for a better solution are appreciated :)

@dstftw
Collaborator

dstftw commented Jun 18, 2016

Use urljoin?
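For illustration, a small sketch of how `urljoin` resolves a relative path against the page URL. It uses `urllib.parse.urljoin` from the Python 3 standard library rather than the project's compat wrapper, and the URLs are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical page URL (an assumption; not the real site).
page_url = 'http://example.com/watch?v=abc123'

# A root-relative path replaces the path and query of the page URL.
root_relative = urljoin(page_url, '/videos/abc123.mp4')

# A path without a leading slash resolves against the page's directory.
dir_relative = urljoin(page_url, 'videos/abc123.mp4')

# An already absolute URL is returned unchanged.
absolute = urljoin(page_url, 'http://cdn.example.com/abc123.mp4')

print(root_relative)
print(dir_relative)
print(absolute)
```

This avoids hardcoding the scheme and host in the extractor, since they come from the URL that was actually requested.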

@TRox1972
Contributor Author

@dstftw Do the changes seem OK?

'id': video_id,
'title': self._html_search_regex(r'<h1>(.+)</h1>', webpage, 'title'),
'url': compat_urlparse.urljoin(url, self._html_search_regex(r'file:\s*(["\'])((?:(?!\1).)+)\1', webpage, 'video URL', group=2)),
'thumbnail': compat_urlparse.urljoin(url, self._html_search_regex(r'image:\s*(["\'])((?:(?!\1).)+)\1', webpage, 'thumbnail', None, group=2)),
Collaborator

@dstftw dstftw Jun 22, 2016

This will fail if no thumbnail is extracted. You claim the thumbnail is optional. Is there an example URL of such a video?
Also, og:image seems like an easier way to extract the thumbnail.
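A rough, standalone sketch of both points, using only the standard library: look up the `og:image` meta tag and skip the URL join when nothing is found. The helper `extract_thumbnail`, the pattern, and the sample markup are assumptions for illustration; the pattern also assumes `property` appears before `content` in the tag. Inside youtube-dl, the existing `_og_search_thumbnail` helper on `InfoExtractor` would likely be the more natural choice.

```python
import re
from urllib.parse import urljoin

# Simplified og:image pattern (assumes property comes before content).
OG_IMAGE = re.compile(
    r'<meta[^>]+property=(["\'])og:image\1[^>]+'
    r'content=(["\'])(?P<url>(?:(?!\2).)+)\2')


def extract_thumbnail(page_url, webpage):
    """Return an absolute thumbnail URL, or None when the page has none."""
    match = OG_IMAGE.search(webpage)
    if not match:
        # Treat the thumbnail as optional: never hand a missing value to urljoin.
        return None
    return urljoin(page_url, match.group('url'))


# Hypothetical inputs for illustration.
page = '<meta property="og:image" content="/thumbs/a.jpg">'
print(extract_thumbnail('http://example.com/watch?v=1', page))
print(extract_thumbnail('http://example.com/watch?v=1', '<html></html>'))
```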

@dstftw
Collaborator

dstftw commented Jun 22, 2016

Also wrap long lines and squash commits.

@TRox1972
Contributor Author

@dstftw Does this seem good?

@dstftw dstftw closed this in f484c5f Jun 26, 2016
@TRox1972 TRox1972 deleted the vidbit branch June 26, 2016 10:48
3 participants