[watchindianporn] Fix parser #13415
@@ -41,34 +37,26 @@ def _real_extract(self, url):
webpage = self._download_webpage(url, display_id)

video_url = self._html_search_regex(
r"url: escape\('([^']+)'\)", webpage, 'url')
r'<source[^<]+type=[\'"]video/mp4[\'"\s]*src=[\'"]([^\'"]+)', webpage, 'url')
`_parse_html5_media_entries`.
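A minimal sketch of why a structured parse of the `<source>` tags tends to be sturdier than the hand-written regex in the diff. It does not call youtube-dl's `_parse_html5_media_entries`; it uses the standard library's `html.parser` only to illustrate the idea, and the markup, URLs, and the `SourceCollector` / `collect_sources` names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Regex from the diff: it only matches when `type` comes before `src`.
SOURCE_RE = r'<source[^<]+type=[\'"]video/mp4[\'"\s]*src=[\'"]([^\'"]+)'


class SourceCollector(HTMLParser):
    """Collect (src, type) pairs from <source> tags, whatever the attribute order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == 'source':
            attr_map = dict(attrs)
            if attr_map.get('src'):
                self.sources.append((attr_map['src'], attr_map.get('type')))


def collect_sources(html):
    """Hypothetical helper: return every (src, type) pair found in the markup."""
    parser = SourceCollector()
    parser.feed(html)
    return parser.sources


# Hypothetical markup: the same tag with its attributes in two different orders.
type_first = '<video><source type="video/mp4" src="http://example.com/a.mp4"></video>'
src_first = '<video><source src="http://example.com/a.mp4" type="video/mp4"></video>'

regex_on_type_first = re.search(SOURCE_RE, type_first)
regex_on_src_first = re.search(SOURCE_RE, src_first)  # the regex misses this one
parsed_src_first = collect_sources(src_first)         # the parser still finds it
```

The regex depends on attribute order, while the parser-based approach does not. That is presumably the kind of brittleness the shared helper is meant to avoid.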
webpage, 'title')
title = self._html_search_regex((
r'<title>(.*?)- Indian Porn</title>',
r'<h4>(.*?)</h4>'
Do not capture empty strings.
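To illustrate the point, here is a small sketch with plain `re`. The page snippet and the `first_match` helper are hypothetical; the helper only imitates trying several patterns in order and is not youtube-dl code.

```python
import re


def first_match(patterns, text):
    """Hypothetical helper: group 1 of the first pattern that matches, else None."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1).strip()
    return None


# Hypothetical page whose <title> carries no video name at all.
page = '<title>- Indian Porn</title><h4>Some clip</h4>'

# `.*?` happily matches zero characters, so the first pattern "succeeds"
# with an empty title and the <h4> fallback is never tried.
with_star = first_match(
    (r'<title>(.*?)- Indian Porn</title>', r'<h4>(.*?)</h4>'), page)

# `.+?` requires at least one character, so the first pattern fails
# and the fallback pattern gets its chance.
with_plus = first_match(
    (r'<title>(.+?)- Indian Porn</title>', r'<h4>(.+?)</h4>'), page)
```

With `.+?` an empty capture is impossible, so a missing title shows up as "no match" rather than as an empty string.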
r'<td>Comments:\s*</td>\s*<td align="right"><span>\s*(\d+)\s*</span>',
webpage, 'comment count', fatal=False))
r'Time:\s*<strong>\s*.+?\s*<\/strong>.*?<strong>\s*(\d+)\s*</strong>',
webpage, 'view count', flags=re.DOTALL, fatal=False))
Move flags into regex.
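A short sketch of what this means in practice, using plain `re` and a hypothetical HTML fragment: the inline `(?s)` prefix has the same effect as passing `flags=re.DOTALL`, so the flag can travel with the pattern itself.

```python
import re

# Hypothetical fragment with newlines between the two <strong> elements.
html = 'Time: <strong>\n 05:30 </strong>\n<span>Views</span>\n<strong> 1234 </strong>'

PATTERN = r'Time:\s*<strong>\s*.+?\s*</strong>.*?<strong>\s*(\d+)\s*</strong>'

# Flag passed as an argument.
with_argument = re.search(PATTERN, html, flags=re.DOTALL)

# Flag embedded at the very start of the pattern.
with_inline_flag = re.search(r'(?s)' + PATTERN, html)

# Without DOTALL, `.` cannot cross the newlines and nothing matches.
without_flag = re.search(PATTERN, html)
```

Both flagged variants capture the same view count, and the inline form keeps the call site free of an extra keyword argument.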

categories = re.findall(
r'<a href="[^"]+/search/video/desi"><span>([^<]+)</span></a>',
r'<a>[^<]+?class=[\'"]categories[\'">]*([^<]+)</a>',
Incorrect.
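The comment does not say what exactly is wrong. One difference that is visible in the later revision of this pattern is `<a>` versus `<a`: a literal `<a>` can never be followed by attributes. The sketch below checks both variants against hypothetical markup; the real page structure is an assumption.

```python
import re

# Pattern as first submitted: requires a bare `<a>` with no attributes.
BARE_TAG = r'<a>[^<]+?class=[\'"]categories[\'">]*([^<]+)</a>'

# Pattern as later revised in this thread: `<a` followed by attributes.
OPEN_TAG = r'<a[^<]+?class=[\'"]categories[\'">]*([^<]+)</a>'

# Hypothetical markup for a category link.
html = '<a href="/search/video/desi" class="categories">Desi</a>'

bare_result = re.findall(BARE_TAG, html)  # nothing: the tag is never a bare `<a>`
open_result = re.findall(OPEN_TAG, html)  # finds the category name
```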
Updated
upload_date = unified_strdate(self._html_search_regex(
r'Added: <strong>(.+?)</strong>', webpage, 'upload date', fatal=False))
title = self._html_search_regex((
r'<title>(.+?)-[\s]+Indian[\s]+Porn</title>',
`[]` superfluous.
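A quick check with plain `re` that the brackets add nothing: a character class containing only `\s` matches exactly what `\s` matches. The sample titles and the `extract` helper are hypothetical.

```python
import re

BRACKETED = r'<title>(.+?)-[\s]+Indian[\s]+Porn</title>'
PLAIN = r'<title>(.+?)-\s+Indian\s+Porn</title>'


def extract(pattern, text):
    """Hypothetical helper: return group 1 of the match, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Hypothetical titles: single space, mixed whitespace, and a non-matching one.
samples = [
    '<title>Some clip - Indian Porn</title>',
    '<title>Some clip -  Indian\nPorn</title>',
    '<title>Unrelated page</title>',
]

bracketed_results = [extract(BRACKETED, s) for s in samples]
plain_results = [extract(PLAIN, s) for s in samples]
```

Both lists come out identical, so `\s+` is the simpler spelling of the same pattern.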
webpage, 'duration', fatal=False))

view_count = int_or_none(self._search_regex(
r'<td>Views:\s*</td>\s*<td align="right"><span>\s*(\d+)\s*</span>',
r'(?s)Time:\s*<strong>\s*.+?\s*<\/strong>.*?<strong>\s*(\d+)\s*</strong>',
No escape for `/`.
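In Python's `re` syntax the forward slash is not a delimiter or a special character, so `\/` and `/` match the same thing. A small check on a hypothetical fragment:

```python
import re

ESCAPED = r'<\/strong>'
PLAIN = r'</strong>'

# Hypothetical fragment containing one closing tag.
text = '<strong> 1234 </strong>'

escaped_span = re.search(ESCAPED, text).span()
plain_span = re.search(PLAIN, text).span()
```

The two spans are equal, so the backslash can simply be dropped.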

categories = re.findall(
r'<a href="[^"]+/search/video/desi"><span>([^<]+)</span></a>',
r'<a[^<]+?class=[\'"]categories[\'">]*([^<]+)</a>',
Still incorrect.
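The thread does not state what remains wrong, so the following is only one weakness that can be checked independently, not necessarily the reviewer's reason. The `[\'">]*` part lets the capture group swallow any attributes that follow `class`. The markup and the `STRICTER` alternative are assumptions for illustration.

```python
import re

# Pattern from the diff.
REVISED = r'<a[^<]+?class=[\'"]categories[\'">]*([^<]+)</a>'

# Hypothetical stricter variant: close the attribute value, skip the rest
# of the tag up to `>`, and only then capture the link text.
STRICTER = r'<a[^>]+class=[\'"]categories[\'"][^>]*>([^<]+)</a>'

# Hypothetical markup with `class` after and before `href`.
class_last = '<a href="/search/video/desi" class="categories">Desi</a>'
class_first = '<a class="categories" href="/search/video/desi">Desi</a>'

revised_class_last = re.findall(REVISED, class_last)
revised_class_first = re.findall(REVISED, class_first)  # captures attribute text too
stricter_class_last = re.findall(STRICTER, class_last)
stricter_class_first = re.findall(STRICTER, class_first)
```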
upload_date = unified_strdate(self._html_search_regex(
r'Added: <strong>(.+?)</strong>', webpage, 'upload date', fatal=False))
title = self._html_search_regex((
r'<title>(.+?)-[\s]+Indian[\s]+Porn</title>',
`[]` superfluous.
I'm not sure if the last regex is correct. Looking forward to your answer, @dstftw.
Not correct.
I hope I understood.
No, you don't.
Oh, I swear, I didn't see it. I thought the mistake was at the end. If the rest is right, I'll fix it and squash the commits.
Fixed, @dstftw.
Description of your pull request and other information
Fix #13411. Note that the uploader, upload_info and comment_count fields are no longer present on the service.