[watchindianporn] Fix parser #13415

gfabiano · 2017-06-17T23:33:29Z

Before submitting a pull request make sure you have:

At least skimmed through adding new extractor tutorial and youtube-dl coding conventions sections
Searched the bugtracker for similar pull requests

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Bug fix
Improvement
New extractor
New feature

Description of your pull request and other information

Fixes #13411. Note that the uploader, upload_date and comment_count fields are no longer present on the site.

dstftw · 2017-06-18T01:11:31Z

youtube_dl/extractor/watchindianporn.py

@@ -41,34 +37,26 @@ def _real_extract(self, url):
        webpage = self._download_webpage(url, display_id)

        video_url = self._html_search_regex(
-            r"url: escape\('([^']+)'\)", webpage, 'url')
+            r'<source[^<]+type=[\'"]video/mp4[\'"\s]*src=[\'"]([^\'"]+)', webpage, 'url')


Use `_parse_html5_media_entries`.
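The reviewer's point is that youtube-dl's `InfoExtractor._parse_html5_media_entries` helper (in `youtube_dl/extractor/common.py`) already understands `<video>`/`<source>` markup, so a hand-written regex that assumes `type` comes before `src` is unnecessary. The sketch below is a simplified, standard-library stand-in written only to illustrate the idea; it is not the helper's real implementation. The function name `collect_html5_sources` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class _SourceCollector(HTMLParser):
    """Collect src/type pairs from <source> tags nested in <video> elements."""

    def __init__(self):
        super().__init__()
        self._in_video = False
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'video':
            self._in_video = True
            # <video src="..."> without <source> children is also valid HTML5.
            if attrs.get('src'):
                self.sources.append({'url': attrs['src'], 'type': attrs.get('type')})
        elif tag == 'source' and self._in_video and attrs.get('src'):
            self.sources.append({'url': attrs['src'], 'type': attrs.get('type')})

    def handle_endtag(self, tag):
        if tag == 'video':
            self._in_video = False


def collect_html5_sources(webpage):
    """Return a list of {'url', 'type'} dicts; attribute order does not matter."""
    parser = _SourceCollector()
    parser.feed(webpage)
    parser.close()
    return parser.sources


sample = '''
<video controls>
  <source src="https://example.com/v/123.mp4" type="video/mp4">
</video>
'''
print(collect_html5_sources(sample))
# [{'url': 'https://example.com/v/123.mp4', 'type': 'video/mp4'}]
```

Because an HTML parser reads attributes by name, swapping `src` and `type` (which would break the regex in the diff) makes no difference here.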

dstftw · 2017-06-18T01:11:58Z

youtube_dl/extractor/watchindianporn.py

-            webpage, 'title')
+        title = self._html_search_regex((
+            r'<title>(.*?)- Indian Porn</title>',
+            r'<h4>(.*?)</h4>'


Do not capture empty strings.
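When `_html_search_regex` is given a tuple of patterns, it tries them in order and uses the first one that matches. A lazy `(.*?)` group can therefore "succeed" with an empty capture and stop the fallback to `<h4>`, yielding an empty title. The sketch mimics that first-match behaviour with a hypothetical `first_match` helper (not a youtube-dl function) on a hypothetical page whose `<title>` holds only the site suffix.

```python
import re


def first_match(patterns, text):
    """Return group 1 of the first pattern that matches, or None (hypothetical helper)."""
    for pattern in patterns:
        mobj = re.search(pattern, text)
        if mobj:
            return mobj.group(1)
    return None


# Hypothetical page: the <title> is empty apart from the suffix; the real title is in <h4>.
page = '<title>- Indian Porn</title><h4>Sample clip</h4>'

lazy_star = (r'<title>(.*?)- Indian Porn</title>', r'<h4>(.*?)</h4>')
lazy_plus = (r'<title>(.+?)- Indian Porn</title>', r'<h4>(.+?)</h4>')

print(repr(first_match(lazy_star, page)))  # '' (empty capture, <h4> never tried)
print(repr(first_match(lazy_plus, page)))  # 'Sample clip' (falls through to <h4>)
```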

dstftw · 2017-06-18T01:12:25Z

youtube_dl/extractor/watchindianporn.py

-            r'<td>Comments:\s*</td>\s*<td align="right"><span>\s*(\d+)\s*</span>',
-            webpage, 'comment count', fatal=False))
+            r'Time:\s*<strong>\s*.+?\s*<\/strong>.*?<strong>\s*(\d+)\s*</strong>',
+            webpage, 'view count', flags=re.DOTALL, fatal=False))


Move flags into regex.
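A leading `(?s)` is Python's inline spelling of `re.DOTALL`, so the flag travels with the pattern string instead of being passed separately as `flags=re.DOTALL`. The check below, on a hypothetical HTML fragment where the view count sits on a different line from `Time:`, shows that both spellings extract the same value and that omitting the flag makes the match fail.

```python
import re

# Hypothetical markup: the view count is on the line after "Time:".
html = '''Time: <strong>12:34</strong>
<span>Views</span> <strong> 5678 </strong>'''

PATTERN = r'Time:\s*<strong>\s*.+?\s*</strong>.*?<strong>\s*(\d+)\s*</strong>'

with_flag = re.search(PATTERN, html, flags=re.DOTALL)
inline_flag = re.search(r'(?s)' + PATTERN, html)
without_flag = re.search(PATTERN, html)

print(with_flag.group(1), inline_flag.group(1))  # 5678 5678
print(without_flag)  # None: without DOTALL, .*? cannot cross the newline
```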

dstftw · 2017-06-18T01:12:29Z

youtube_dl/extractor/watchindianporn.py


        categories = re.findall(
-            r'<a href="[^"]+/search/video/desi"><span>([^<]+)</span></a>',
+            r'<a>[^<]+?class=[\'"]categories[\'">]*([^<]+)</a>',


gfabiano · 2017-06-18T11:40:16Z

Updated

dstftw · 2017-06-19T16:50:19Z

youtube_dl/extractor/watchindianporn.py

-        upload_date = unified_strdate(self._html_search_regex(
-            r'Added: <strong>(.+?)</strong>', webpage, 'upload date', fatal=False))
+        title = self._html_search_regex((
+            r'<title>(.+?)-[\s]+Indian[\s]+Porn</title>',


[] superfluous.
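A character class that holds a single escape, such as `[\s]`, matches exactly what the bare `\s` matches, so the brackets add noise without changing behaviour. The sketch compares the reviewed title pattern with a bracket-free equivalent on a few hypothetical `<title>` strings.

```python
import re

bracketed = r'<title>(.+?)-[\s]+Indian[\s]+Porn</title>'
plain = r'<title>(.+?)-\s+Indian\s+Porn</title>'

samples = [
    '<title>Sample clip - Indian Porn</title>',
    '<title>Sample clip -\tIndian  Porn</title>',
    '<title>Sample clip-IndianPorn</title>',  # no whitespace: neither pattern matches
]


def extract(pattern, s):
    m = re.search(pattern, s)
    return m.group(1) if m else None


# The two patterns agree on every sample, including the non-matching one.
for s in samples:
    assert extract(bracketed, s) == extract(plain, s)

print([extract(plain, s) for s in samples])
# ['Sample clip ', 'Sample clip ', None]
```

The trailing space in the captures comes from the pattern having no `\s*` before `-`; in the extractor, `_html_search_regex` strips the result.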

dstftw · 2017-06-19T16:51:35Z

youtube_dl/extractor/watchindianporn.py

            webpage, 'duration', fatal=False))

        view_count = int_or_none(self._search_regex(
-            r'<td>Views:\s*</td>\s*<td align="right"><span>\s*(\d+)\s*</span>',
+            r'(?s)Time:\s*<strong>\s*.+?\s*<\/strong>.*?<strong>\s*(\d+)\s*</strong>',


No need to escape `/`.
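In Python's `re` syntax `/` is not a metacharacter (the escaping habit comes from languages such as JavaScript, where `/` delimits regex literals), so `<\/strong>` and `</strong>` match the same text. A quick check on a hypothetical fragment:

```python
import re

fragment = 'Time: <strong>12:34</strong>'

escaped = re.search(r'<strong>\s*(.+?)\s*<\/strong>', fragment)
unescaped = re.search(r'<strong>\s*(.+?)\s*</strong>', fragment)

# The escaped slash is legal but redundant: both patterns capture the same text.
print(escaped.group(1), unescaped.group(1))  # 12:34 12:34
```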

dstftw · 2017-06-19T16:53:05Z

youtube_dl/extractor/watchindianporn.py


        categories = re.findall(
-            r'<a href="[^"]+/search/video/desi"><span>([^<]+)</span></a>',
+            r'<a[^<]+?class=[\'"]categories[\'">]*([^<]+)</a>',


Still incorrect.

dstftw · 2017-06-19T16:53:27Z

youtube_dl/extractor/watchindianporn.py

-        upload_date = unified_strdate(self._html_search_regex(
-            r'Added: <strong>(.+?)</strong>', webpage, 'upload date', fatal=False))
+        title = self._html_search_regex((
+            r'<title>(.+?)-[\s]+Indian[\s]+Porn</title>',


[] superfluous.

gfabiano · 2017-06-19T17:24:15Z

I'm not sure if the last regex is correct. Look forward to your answer @dstftw

dstftw · 2017-06-19T17:26:06Z

Not correct.

gfabiano · 2017-06-19T17:38:11Z

I hope I understood

dstftw · 2017-06-19T17:50:47Z

No, you don't. In `<a[^<]` you should match anything but the closing angle bracket, not the opening one.
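Following that advice, the opening tag is scanned with `[^>]+` and `[^>]*`, i.e. any run of characters up to the closing `>`, so the `class` attribute may appear anywhere inside `<a ...>`. The pattern below is a sketch of that idea under the assumption that category links carry `class="categories"`; it is not necessarily the regex that was finally merged, and the markup is hypothetical.

```python
import re

# "Anything but '>'" keeps the scan inside the opening <a ...> tag.
CATEGORY_RE = r'<a[^>]+class=[\'"]categories[\'"][^>]*>\s*([^<]+?)\s*</a>'

html = '''
<a href="/category/one" class="categories">One</a>
<a class="categories" href="/category/two"> Two </a>
<a href="/other" class="tags">Not a category</a>
'''

print(re.findall(CATEGORY_RE, html))  # ['One', 'Two']
```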

gfabiano · 2017-06-19T17:56:38Z

Oh, I swear I didn't see it; I thought the mistake was at the end. If the rest is right, I'll fix it and squash the commits.

gfabiano · 2017-06-19T20:27:11Z

Fixed @dstftw

dstftw requested changes Jun 18, 2017

dstftw added the pending-fixes label Jun 18, 2017

dstftw requested changes Jun 19, 2017

[watchindianporn] Fix parser (4eed7f1)

Update watchindianporn.py (31aedad)

dstftw merged commit 048b558 into ytdl-org:master Jun 19, 2017

gfabiano deleted the watchindianporn branch June 20, 2017 20:09

dstftw added a commit that referenced this pull request on Jul 6, 2017: "Credit @gfabiano for #13382, #13385, #13415" (ddeff4b)