Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-work URL joining to follow RFC 3986 #316

Merged
merged 2 commits into from
May 10, 2023

Conversation

bbayles
Copy link
Contributor

@bbayles bbayles commented May 3, 2023

This PR updates how the absolute_uri method for objects that use the BasePathMixin.

I think the changes related to issue #245 and PR #263 introduced some unusual and surprising behavior. The issue related to handling relative URLs that start with //, like //segment.ts.

Joining URLs is covered by RFC 3986, which specifically describes the // case. From section 5.4.1:
image

Python's built-in urllib.parse.urljoin produces the expected result, but m3u8.parse.urljoin does not:

>>> from urllib.parse import urljoin as python_urljoin
>>> from m3u8.parser import urljoin as m3u8_urljoin

>>> m3u8_urljoin('http://a/b/c/d;p?q', '//g')
'http://a//g'

>>> python_urljoin('http://a/b/c/d;p?q', '//g')
'http://g'

This PR updates m3u8 to prefer the standards-based approach.

@@ -15,7 +15,6 @@
http://stackoverflow.com/questions/2785755/how-to-split-but-ignore-separators-in-quoted-strings-in-python
'''
ATTRIBUTELISTPATTERN = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')
URI_PREFIXES = ('https://', 'http://', 's3://', 's3a://', 's3n://')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change allows playlists to use additional URL schemes without having to update this list. RFC 8216 does mention at least the data:// scheme.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! You remove a line while adding a new feature 👏🏼

@@ -42,7 +42,7 @@ def test_load_should_create_object_from_file_with_relative_segments():
obj = m3u8.load(playlists.RELATIVE_PLAYLIST_FILENAME)
expected_key_abspath = '%s/key.bin' % os.path.dirname(base_uri)
expected_key_path = '../key.bin'
expected_ts1_abspath = '%s/entire1.ts' % base_uri
expected_ts1_abspath = '/entire1.ts'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think files with a leading / were being mis-handled as well.

On a file system, joining /tmp/path/to/somewhere with /entire.ts should produce /entire.ts.

>>> from os.path import join as python_join
>>> python_join('/path/to/somewhere', '/entire1.ts')
'/entire1.ts'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

@mauricioabreu mauricioabreu merged commit 52243d7 into globocom:master May 10, 2023
5 checks passed
@SeaHOH
Copy link
Contributor

SeaHOH commented May 11, 2023

Greatly surprised. So, // should not be in a path string? But I have saw someone did this.

@mauricioabreu
Copy link
Member

@SeaHOH maybe Windows paths?

@bbayles
Copy link
Contributor Author

bbayles commented May 11, 2023

I think it's fine to use //, it just needs to mean something standard.

@SeaHOH
Copy link
Contributor

SeaHOH commented May 12, 2023

A HTTP URL, just like this: https://some.domain//path/to/segmentx.ts, the // is in head of the relative path entires, not a host name.

@bbayles
Copy link
Contributor Author

bbayles commented May 12, 2023

I think that particular absolute URL is fine.

The question is, what do you do when you need to have the starting URL http://a/b/c/d;p?q and then are given the relative URL //g?

The answer, going back to 1995 with RFC 1808, has been consistently the same:

@SeaHOH
Copy link
Contributor

SeaHOH commented May 12, 2023

Yes, I just told what I have saw, and be curious about how does it work with web player.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants