URL builder utils #1675
Codecov Report
@@            Coverage Diff             @@
##           master    #1675      +/-   ##
==========================================
+ Coverage   51.26%   51.36%   +0.09%
==========================================
  Files         242      243       +1
  Lines       14358    14383      +25
==========================================
+ Hits         7361     7388      +27
+ Misses       6997     6995       -2
This fell through and I didn't ever merge it, sorry @beardypig. Do you want to rebase and we can get this in?
Well, I tried to rebase it.
I cannot work out why it's failing on 3.7 right now... When I have some time I will look at it, unless @back-to is able to work it out :)
I know why it fails:
https://docs.python.org/3/library/re.html#re.finditer
https://docs.python.org/3/whatsnew/3.7.html#changes-in-the-python-api
Here is something similar that may have something to do with it: https://bugs.python.org/issue33585

import re
test_html = """
<title>Title</title>
<meta property="og:type" content= "website" />
<meta property="og:url" content="http://test.se/"/>
<meta property="og:site_name" content="Test" />
<script src="https://test.se/test.js"></script>
<link rel="stylesheet" type="text/css" href="https://test.se/test.css">
<script>Tester.ready(function () {
alert("Hello, world!"); });</script>
<a
href="http://test.se/foo">bar</a>
"""
tag_re = re.compile(r'''(?=<(?P<tag>[a-zA-Z]+)(?P<attr>.*?)(?P<end>/)?>(?:(?P<inner>.*?)</\s*(?P=tag)\s*>)?)''', re.MULTILINE | re.DOTALL)
print([m.groups() for m in tag_re.finditer(test_html)])

Python 3.7:
[('title', '', None, 'Title'),
('meta', ' property="og:type" content= "website" ', '/', 'Title'),
('meta', ' property="og:url" content="http://test.se/"', '/', None),
('meta', ' property="og:site_name" content="Test" ', '/', None),
('script', ' src="https://test.se/test.js"', '/', ''),
('link', ' rel="stylesheet" type="text/css" href="https://test.se/test.css"', None, ''),
('script', '', None, 'Tester.ready(function () {\nalert("Hello, world!"); });'),
('a', '\nhref="http://test.se/foo"', None, 'bar')]

Python 3.6:
[('title', '', None, 'Title'),
('meta', ' property="og:type" content= "website" ', '/', None),
('meta', ' property="og:url" content="http://test.se/"', '/', None),
('meta', ' property="og:site_name" content="Test" ', '/', None),
('script', ' src="https://test.se/test.js"', None, ''),
('link', ' rel="stylesheet" type="text/css" href="https://test.se/test.css"', None, None),
('script', '', None, 'Tester.ready(function () {\nalert("Hello, world!"); });'),
('a', '\nhref="http://test.se/foo"', None, 'bar')]
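Every match of this lookahead-only pattern is zero-width and has to begin with "<", so one possible way to keep finditer()'s scanning out of the picture is to try the pattern once at each "<" with Pattern.match(string, pos), so each attempt starts with fresh groups. This is only a sketch of a workaround under that assumption, not what the PR adopted; iter_tag_matches is a hypothetical name and the sample is the first two tags of test_html.

```python
import re

# Lookahead-only pattern from the comment above: every match is zero-width.
TAG_RE = re.compile(
    r'''(?=<(?P<tag>[a-zA-Z]+)(?P<attr>.*?)(?P<end>/)?>(?:(?P<inner>.*?)</\s*(?P=tag)\s*>)?)''',
    re.MULTILINE | re.DOTALL,
)


def iter_tag_matches(pattern, html):
    """Hypothetical helper: one anchored attempt per '<' instead of a finditer() scan."""
    pos = html.find("<")
    while pos != -1:
        match = pattern.match(html, pos)  # match() only tries this exact position
        if match:
            yield match
        pos = html.find("<", pos + 1)


sample = '<title>Title</title>\n<meta property="og:type" content= "website" />'
print([m.groups() for m in iter_tag_matches(TAG_RE, sample)])
# [('title', '', None, 'Title'), ('meta', ' property="og:type" content= "website" ', '/', None)]
```

For these two tags the result is the same as the Python 3.6 output above, including inner = None for the self-closing meta tag.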
I must be missing something - I cannot see why the
With the positive lookahead removed:

diff --git a/src/streamlink/plugin/api/utils.py b/src/streamlink/plugin/api/utils.py
index f6f3f12..00445e2 100644
--- a/src/streamlink/plugin/api/utils.py
+++ b/src/streamlink/plugin/api/utils.py
@@ -7,9 +7,9 @@ from ...utils import parse_qsd as parse_query, parse_json, parse_xml
__all__ = ["parse_json", "parse_xml", "parse_query"]
-tag_re = re.compile('''(?=<(?P<tag>[a-zA-Z]+)(?P<attr>.*?)(?P<end>/)?>(?:(?P<inner>.*?)</\s*(?P=tag)\s*>)?)''',
+tag_re = re.compile(r'''<(?P<tag>[a-zA-Z]+)(?P<attr>.*?)(?P<end>/)?>(?:(?P<inner>.*?)</\s*(?P=tag)\s*>)?''',
re.MULTILINE | re.DOTALL)
-attr_re = re.compile('''\s*(?P<key>[\w-]+)\s*(?:=\s*(?P<quote>["']?)(?P<value>.*?)(?P=quote)\s*)?''')
+attr_re = re.compile(r'''\s*(?P<key>[\w-]+)\s*(?:=\s*(?P<quote>["']?)(?P<value>.*?)(?P=quote)\s*)?''')
Tag = namedtuple("Tag", "tag attributes text")
@@ -26,4 +26,3 @@ def itertags(html, tag):
if match.group("tag") == tag:
attrs = dict((a.group("key").lower(), a.group("value")) for a in attr_re.finditer(match.group("attr")))
yield Tag(match.group("tag"), attrs, match.group("inner"))
-
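The r prefixes in this diff do not change what the patterns match: an unknown escape such as \s inside a normal string literal is kept as a backslash followed by "s". What changes is that CPython warns about such escapes (DeprecationWarning since 3.6, SyntaxWarning since 3.12). A small check, compiling a non-raw literal at runtime so the warning can be captured:

```python
import warnings

# Source text of a *non-raw* literal containing the unknown escape \s.
source = "'\\s+'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record the warning even if it was seen before
    code = compile(source, "<example>", "eval")

value = eval(code)
print(value == r"\s+")  # True: the backslash is kept either way
print([w.category.__name__ for w in caught])  # categories of the captured warnings
```

So the behavioural change in the diff is the removed lookahead; the raw strings only silence the warning.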
Not sure that the positive lookahead is even required; it might have been left over from a previous
@@ -4,26 +4,27 @@

from streamlink import NoPluginError
from streamlink.plugin import Plugin
from streamlink.plugin.api import http
http can be removed; it got added with a rebase.
True that, I forgot to clean up after I rebased. I saw it (using PyCharm to do the rebase), and I was going to tidy it up but I forgot :)
It can be tested with the
Here is a possible fix: back-to@ab7d6bf
@back-to I updated the test HTML with your changes, and added a test for inner/outer tags to maintain the existing behaviour. I think this might be a bug in Python. I have created a bug report, so we can see what they say.
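The inner/outer tag test protects behaviour that comes from the lookahead: a zero-width match consumes nothing, so finditer() goes on to try the positions inside an outer element and also reports nested tags, while the consuming form of the same pattern (the one in the diff above) swallows the whole outer element. A small comparison sketch; the sample HTML and the tag_names helper are made up for illustration.

```python
import re

FLAGS = re.MULTILINE | re.DOTALL
# Zero-width form: each match consumes nothing.
LOOKAHEAD = re.compile(
    r'''(?=<(?P<tag>[a-zA-Z]+)(?P<attr>.*?)(?P<end>/)?>(?:(?P<inner>.*?)</\s*(?P=tag)\s*>)?)''',
    FLAGS,
)
# Consuming form, as in the diff: a match swallows the whole element.
CONSUMING = re.compile(
    r'''<(?P<tag>[a-zA-Z]+)(?P<attr>.*?)(?P<end>/)?>(?:(?P<inner>.*?)</\s*(?P=tag)\s*>)?''',
    FLAGS,
)


def tag_names(pattern, html):
    """Return the tag names that finditer() reports, in order."""
    return [m.group("tag") for m in pattern.finditer(html)]


html = '<div><a href="/x">x</a></div>'
print(tag_names(LOOKAHEAD, html))  # ['div', 'a']
print(tag_names(CONSUMING, html))  # ['div']
```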
Well, this might take a while. Maybe the tests should be ignored on py37 and py38 for now, as the issue was merged in #1693 and it is not really related to this PR. Also
🤦‍♂️ The rebase for this one got complex :) I'll add a
due to a possible issue with re.finditer and positive lookaheads
I updated the tests so that those which were failing are allowed to fail on 3.7+, with a note to monitor bpo-34294.
@gravyboat, I think this is OK to merge, with the caveat that it might be wonky on Python 3.7.
@beardypig Sounds good, changes look good as well. We can always open another issue depending on what occurs.
@gravyboat I opened an issue to remind us :)
streamlink 1.3.1 (2020-01-27)
A small patch release that addresses the removal of MPV's legacy option syntax, along with fixes for several plugins, the addition of the --twitch-disable-reruns parameter, and dropped support for Python 3.4.

streamlink 1.3.0 (2019-11-22)
A new release with plugin updates and fixes, including Twitch.tv (see #2680), which had to be delayed due to back-and-forth API changes. The Twitch.tv workarounds mentioned in #2680 no longer have to be applied, but authenticating via --twitch-oauth-token has been disabled, regardless of the origin of the OAuth token (via --twitch-oauth-authenticate or the Twitch website). In order not to introduce breaking changes, both parameters have been kept in this release and the user name will still be logged when using an OAuth token, but receiving item drops or accessing restricted streams is no longer possible.
Plugins for the following sites have also been added:
* albavision
* news.now.com
* twitcasting.tv
* viu.tv
* vlive.tv
* willax.tv

streamlink 1.2.0 (2019-08-18)
Here are the changes for this month's release:
* Multiple plugin fixes
* Fixed single hyphen params at the beginning of --player-args (#2333)
* --http-proxy will set the default value of --https-proxy to the same value as --http-proxy (#2536)
* DASH streams will handle headers correctly (#2545)
* The timestamp for FFMPEGMuxer streams will start with zero (#2559)

streamlink 1.1.1 (2019-04-02)
This is just a small patch release which fixes a build/deploy issue with the new special wheels for Windows on PyPI. (#2392)

streamlink 1.0.0 (2019-01-30)
The celebratory release of Streamlink 1.0.0!
A lot of hard work has gone into getting Streamlink to where it is. Not only is Streamlink used across multiple applications and platforms, but by companies as well.
Streamlink started from the inaugural fork of Livestreamer on September 17th, 2016.
Since then, we've hit multiple milestones:
* Over 886 PRs
* Hit 3,000 commits in Streamlink
* Obtaining our first sponsors as well as backers of the project
* The creation of our own logo (streamlink/streamlink#1123)
Thanks to everyone who has contributed to Streamlink (and our backers)! Without you, we wouldn't be where we are today.
Without further ado, here are the changes in release 1.0.0:
* We have a new icon / logo for Streamlink! (streamlink/streamlink#2165)
* Updated dependencies (streamlink/streamlink#2230)
* A ton of plugin updates. Have a look at this search query for all the recent updates.
* You can now provide a custom key URI to override HLS streams (streamlink/streamlink#2139). For example: --hls-segment-key-uri <URI>
* User agents for API communication have been updated (streamlink/streamlink#2194)
* Special synonyms have been added to sort "best" and "worst" streams (streamlink/streamlink#2127). For example: streamlink --stream-sorting-excludes '>=480p' URL best,best-unfiltered
* Process output will no longer show if tty is unavailable (streamlink/streamlink#2090)
* We've removed BountySource in favour of our OpenCollective page. If you have any features you'd like to request, please open up an issue with the request and possibly consider backing us!
* Improved terminal progress display for wide characters (streamlink/streamlink#2032)
* Fixed a bug with dynamic playlists on playback (streamlink/streamlink#2096)
* Fixed makeinstaller.sh (streamlink/streamlink#2098)
* Old Livestreamer deprecations and API references were removed (streamlink/streamlink#1987)
* Dependencies have been updated for Python (streamlink/streamlink#1975)
* Newer and more common User-Agents are now used (streamlink/streamlink#1974)
* DASH stream bitrates now round up to the nearest 10, 100, 1000, etc. (streamlink/streamlink#1995)
* Updated documentation on issue templates (streamlink/streamlink#1996)
* URL utilities and a helper for better processing of HTML tags have been added (streamlink/streamlink#1675)
* Fixed sort and prog issue (streamlink/streamlink#1964)
* Reformatted issue templates (streamlink/streamlink#1966)
* Fixed crashing bug with player-continuous-http option (streamlink/streamlink#2234)
* Make sure all dev dependencies (streamlink/streamlink#2235)
* The -r parameter has been replaced by --rtmp-rtmpdump (streamlink/streamlink#2152)
Breaking changes:
* A large number of unmaintained or NSFW plugins have been removed. You can find the PR that implemented that change here: streamlink/streamlink#2003. See our CONTRIBUTING.md documentation for plugin policy.
* utils: add some URL manipulation methods
* utils: method to find html tags using regex
* plugins.gardenersworld: use new itertags method to find iframes
* plugins.tf1: use update_qsd method to update hls url
* add an extra test for inner tags that should generate separate matches
* fix rebase issues
* allow tests to fail with Python 3.7+ due to a possible issue with re.finditer and positive lookaheads
* use OrderedDict in update_qsd to maintain stable query argument ordering
As part of the solution for the issues raised in #1519 by @amurzeau, I have started adding some URL utils that can be used in plugins to better manipulate URLs.
I also added a useful method for iterating through HTML tags, using regex in a brute-force kind of way, like most of the plugins do.
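A self-contained sketch of that regex-based approach, assembled from the tag_re, attr_re, Tag and itertags fragments in the diff in this thread. The for loop over tag_re.finditer(html) is an assumption (the diff only shows the loop body), and the sample page is made up; it mirrors the gardenersworld use case of finding iframes.

```python
import re
from collections import namedtuple

# Patterns as they appear in the diff in this thread (raw-string form, lookahead kept).
tag_re = re.compile(
    r'''(?=<(?P<tag>[a-zA-Z]+)(?P<attr>.*?)(?P<end>/)?>(?:(?P<inner>.*?)</\s*(?P=tag)\s*>)?)''',
    re.MULTILINE | re.DOTALL,
)
attr_re = re.compile(r'''\s*(?P<key>[\w-]+)\s*(?:=\s*(?P<quote>["']?)(?P<value>.*?)(?P=quote)\s*)?''')

Tag = namedtuple("Tag", "tag attributes text")


def itertags(html, tag):
    """Yield a Tag for every <tag ...> in html, including nested ones."""
    for match in tag_re.finditer(html):
        if match.group("tag") == tag:
            # Attribute names are lower-cased; values keep their original case.
            attrs = dict(
                (a.group("key").lower(), a.group("value"))
                for a in attr_re.finditer(match.group("attr"))
            )
            yield Tag(match.group("tag"), attrs, match.group("inner"))


page = '<div class="player"><iframe src="https://example.com/embed/1"></iframe></div>'
for iframe in itertags(page, "iframe"):
    print(iframe.attributes["src"])  # https://example.com/embed/1
```

Because it is regex-based rather than a real HTML parser, it is only suitable for simple, well-formed snippets like the ones plugins typically scrape.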
update_qsd and url_concat are the two URL manipulation methods. url_concat will join together URL parts to make a URL, and update_qsd can be used to add, remove or update query string parameters in a URL.

I updated a couple of plugins as examples.
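A minimal sketch of what the two helpers could look like using only urllib.parse. The signatures here (url_concat(base, *parts) and update_qsd(url, qsd=None, remove=None)) are assumptions for illustration, not necessarily the ones in this PR; the commit list mentions OrderedDict in update_qsd, while this sketch relies on dicts keeping insertion order (Python 3.7+).

```python
from urllib.parse import parse_qsl, urlencode, urljoin, urlparse


def url_concat(base, *parts):
    """Hypothetical sketch: join path parts onto base with exactly one '/' between them."""
    for part in parts:
        base = urljoin(base.rstrip("/") + "/", part.strip("/"))
    return base


def update_qsd(url, qsd=None, remove=None):
    """Hypothetical sketch: add/update keys from qsd, then drop keys listed in remove."""
    parsed = urlparse(url)
    # Existing parameters keep their position; repeated keys collapse to the last value.
    params = dict(parse_qsl(parsed.query, keep_blank_values=True))
    for key in remove or ():
        params.pop(key, None)
    params.update(qsd or {})
    return parsed._replace(query=urlencode(params)).geturl()


print(url_concat("https://example.com/api/", "/v1/", "streams"))
# https://example.com/api/v1/streams
print(update_qsd("https://example.com/live.m3u8?token=abc&debug=1",
                 {"token": "xyz", "quality": "best"}, remove=["debug"]))
# https://example.com/live.m3u8?token=xyz&quality=best
```

Stripping slashes from each part before urljoin avoids both doubled slashes and the case where a leading "/" in a part would reset the path to the host root.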