Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[peertube] Added extractor #16329

Closed
wants to merge 12 commits into from

Conversation

parth-verma
Copy link

@parth-verma parth-verma commented Apr 29, 2018

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Adds the ability to download videos from peertube.touhoppai.moe thus resolving #16301

_BASE_THUMBNAIL_URL = 'https://peertube.touhoppai.moe/static/previews/%s.jpg'
IE_DESC = 'Peertube Videos'
IE_NAME = 'Peertube'
_VALID_URL = r'https?:\/\/peertube\.touhoppai\.moe\/videos\/watch\/(?P<id>[0-9|\-|a-z]+)'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which line in particular are you referring to as invalid?

Copy link
Collaborator

@dstftw dstftw Apr 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exact line, namely the capture group.



class PeertubeIE(InfoExtractor):
_BASE_VIDEO_URL = 'https://peertube.touhoppai.moe/static/webseed/%s-1080.mp4'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No hardcodes. Use API.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which api are you talking about?

Copy link
Collaborator

@dstftw dstftw Apr 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Website's API.

'ext': 'mp4',
'title': 'David Revoy Live Stream: Speedpainting',
'description': 'md5:5c09a6e3fdb5f56edce289d69fbe7567',
'thumbnail': 'https://peertube.touhoppai.moe/static/previews/7f3421ae-6161-4a4a-ae38-d167aec51683.jpg',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relax.

@parth-verma
Copy link
Author

@dstftw please review

class PeertubeIE(InfoExtractor):
IE_DESC = 'Peertube Videos'
IE_NAME = 'Peertube'
_VALID_URL = r'https?:\/\/peertube\.touhoppai\.moe\/videos\/watch\/(?P<id>[0-9|\-|a-z]+)'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing changed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is exactly wrong with the capture group as it seems correct to me and runs without any errors?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code in bracket is a set of matching characters. oring won't work.

@@ -1332,7 +1333,7 @@
WebOfStoriesPlaylistIE,
)
from .weibo import (
WeiboIE,
WeiboIE,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove all unrelated changes.

def _real_extract(self, url):
video_id = self._match_id(url)
url_data = compat_urlparse.urlparse(url)
api_url = "%s://%s/api/v1/videos/%s" % (url_data.scheme, url_data.hostname, video_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

urljoin.

return {
'id': video_id,
'title': details['name'],
'description': details['description'],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not break if missing.

'id': video_id,
'title': details['name'],
'description': details['description'],
'url': details['files'][-1]['fileUrl'],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All formats must be extracted.

'title': details['name'],
'description': details['description'],
'url': details['files'][-1]['fileUrl'],
'thumbnail': url_data.scheme + '://' + url_data.hostname + details['thumbnailPath']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

@parth-verma
Copy link
Author

@dstftw Made the required changes. Please review.

class PeertubeIE(InfoExtractor):
IE_DESC = 'Peertube Videos'
IE_NAME = 'Peertube'
_VALID_URL = r'https?:\/\/peertube\.touhoppai\.moe\/videos\/watch\/(?P<id>[0-9|a-z]{8}-[0-9|a-z]{4}-[0-9|a-z]{4}-[0-9|a-z]{4}-[0-9|a-z]{12})'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you read my messages at all?

video_id = self._match_id(url)
url_data = compat_urlparse.urlparse(url)
base_url = "%s://%s" % (url_data.scheme, url_data.hostname)
api_url = urljoin(urljoin(base_url, "/api/v1/videos/"), video_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What the hell are you doing? urljoin(url, '/api/v1/videos/%s' % video_id). All.

details = self._download_json(api_url, video_id)
return {
'id': video_id,
'title': details.get('name'),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read: coding conventions, mandatory fields.

'title': details.get('name'),
'description': details.get('description'),
'formats': [{'url': file_data['fileUrl'], 'filesize': file_data.get('size')} for file_data in sorted(details['files'], key=lambda x: x['size'])],
'thumbnail': urljoin(base_url, details['thumbnailPath'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read: coding conventions, optional fields.

'id': video_id,
'title': details.get('name'),
'description': details.get('description'),
'formats': [{'url': file_data['fileUrl'], 'filesize': file_data.get('size')} for file_data in sorted(details['files'], key=lambda x: x['size'])],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. _sort_formats.
  2. Must not break if any of these keys is missing.

@parth-verma
Copy link
Author

@dstftw Made the changes. Please verify.

video_id = self._match_id(url)
url_data = compat_urlparse.urlparse(url)
base_url = "%s://%s" % (url_data.scheme, url_data.hostname)
api_url = urljoin(base_url, "/api/v1/videos/%s" % video_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you trolling or what?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AGAIN Can you please be more descriptive!!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've provided clear working piece of code that you must just copy paste. Instead you introduced mess with base URL.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did just copy that😐. The change with the base url is incase the url is received with no scheme defined

'duration': details.get('duration'),
'view_count': details.get('views'),
'like_count': details.get('likes'),
'dislike_count': details.get('dislikes'),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int_or_none.

'thumbnail': urljoin(base_url, details['thumbnailPath']) if 'thumbnailPath' in details else None,
'uploader': details.get('account', {}).get('name'),
'uploader_id': details.get('account', {}).get('id'),
'uploder_url': details.get('account', {}).get('url'),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try_get.

class PeertubeIE(InfoExtractor):
IE_DESC = 'Peertube Videos'
IE_NAME = 'Peertube'
_VALID_URL = r'(?:https?:)//peertube\.touhoppai\.moe\/videos\/watch\/(?P<id>[0-9|a-z]{8}-[0-9|a-z]{4}-[0-9|a-z]{4}-[0-9|a-z]{4}-[0-9|a-z]{12})'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please be more descriptive

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read my previous comments. I'm not going to repeat myself.

Copy link
Author

@parth-verma parth-verma May 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex is appropriate as the id is always in uuid format which follows the exact same format and the values are hex code so though a-z should be replaced by a-f i believe the rest of the format will remain the same.

Link

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You: [0-9|a-f]. Stackoverflow: [0-9a-f]. Difference? You have to learn to distinguish (...) and [...].

@parth-verma
Copy link
Author

@h-h-h-h made the changes, check now.

@dstftw dstftw closed this in c561b75 May 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants