[universcience] Add new extractor #10405

Open
wants to merge 6 commits into
base: master

flatgreen
Contributor

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

What is the purpose of your pull request?

  • Bug fix
  • New extractor
  • New feature

Description of your pull request and other information

This is an extractor for a French science video site: universcience.tv

example

format_url = xpath_text(media_source, 'source', fatal=True)
media_label = xpath_attr(media_source, './streaming_type', 'label')
media_width = self._search_regex(
    r'.* (\d*) x \d*', media_label, 'width', default='None', fatal=False)
Collaborator

.* at border does not make any sense. With \d* you allow empty match. Using default implies non fatal. What is the point of 'None' string for default?
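To make the points above concrete, here is a minimal standalone sketch using only the standard `re` module rather than the extractor's `_search_regex` helper. The function name `extract_width` and the sample label `'MP4 640 x 360'` are assumptions for illustration; the real label format on the site may differ.

```python
import re


def extract_width(media_label):
    """Return the width from a label such as 'MP4 640 x 360', or None.

    Hypothetical helper: the label format is an assumption, not taken
    from the site.
    """
    if not media_label:
        return None
    # No leading '.*': re.search already scans the whole string.
    # '\d+' requires at least one digit, so an empty width cannot match.
    match = re.search(r'(\d+)\s*x\s*\d+', media_label)
    # Return a real None (not the string 'None') when nothing matches.
    return int(match.group(1)) if match else None


print(extract_width('MP4 640 x 360'))
print(extract_width('audio only'))
```

Returning `None` itself, instead of the string `'None'`, lets the caller test for a missing width with a plain `is None` check.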

@flatgreen
Contributor Author

Here is the new version. I followed your advice (thank you). Regexes are always a work in progress (for me!).
Tell me if it suits you.

@@ -116,6 +116,7 @@
from .camwithher import CamWithHerIE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .canalu import CanalUIE
Collaborator

Irrelevant to this PR.

@flatgreen
Contributor Author

OK, the "import" is irrelevant.
I don't understand the 're.sub'. I believe the ytdl function is more efficient (?).

@TRox1972
Contributor

@flatgreen The line @dstftw mentions removes anything matching \d* x \d* from media_label, so instead just use re.sub like this: re.sub(r' \d* x \d*', '', media_label).
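As a rough illustration of the `re.sub` suggestion above, the sketch below strips the resolution fragment from a label. The helper name `strip_resolution` and the sample labels are assumptions, and the pattern uses `\d+` and `\s*` instead of the literal pattern quoted in the comment, so that empty digit runs do not match.

```python
import re


def strip_resolution(media_label):
    """Remove a ' <width> x <height>' fragment from a label.

    Hypothetical helper: the label format is an assumption.
    """
    # Raw string so the backslashes reach the regex engine unchanged.
    return re.sub(r'\s*\d+\s*x\s*\d+', '', media_label)


print(strip_resolution('MP4 640 x 360'))
print(strip_resolution('audio only'))
```

This is string manipulation (deleting a matched fragment), whereas `re.search` with a capture group is extraction (pulling a value out); the two needs call for different functions.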

@flatgreen
Contributor Author

String manipulation vs. extraction. @TRox1972, @dstftw: thank you.

Here is the new version.

3 participants