Recognize hierarchical URIs with authority #6

Closed
stain opened this issue Jan 19, 2018 · 2 comments

stain commented Jan 19, 2018

URI does not generally recognize the URI authority, that is, scheme://authority/path, and somehow mangles such a URI into a non-hierarchical URN.

Examples of mishandling:

>>> a = uri.URI("ldap://[2001:db8::7]/c=GB?objectClass?one")
>>> a
URI('ldap://[2001:db8::7]/c=GB?objectClass?one')
>>> a.resolve("/c=NO")
URI('/c=NO')  # expected: ldap://[2001:db8::7]/c=NO
>>> a = uri.URI("sftp://example.com/etc/passwd")
>>> a
URI('sftp:example.com/etc/passwd')  # expected // to be preserved
>>> a.resolve("/root")
URI('sftp:/root')  # expected sftp://example.com/root
>>> a = uri.URI("app://01e36b38-39a4-48b2-88d0-6e82717ee87f/nested/example")
>>> a
URI('app:01e36b38-39a4-48b2-88d0-6e82717ee87f/nested/example')  # expected // to be preserved
>>> a.resolve("/folder/")
URI('app:/folder')  # expected app://01e36b38-39a4-48b2-88d0-6e82717ee87f/folder/

Instead there seems to be special handling of a few selected schemes. Any URI should be treated according to RFC 3986, respecting // in the incoming URI no matter which scheme is used.
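For reference, the scheme-independent behaviour requested above can be sketched in plain Python. This is a minimal illustration of RFC 3986 (the Appendix B component regex, reference resolution from section 5.2 and recomposition from section 5.3), not the code of the uri package; the function names `split`, `merge`, `remove_dot_segments`, `recompose` and `resolve` are made up for this sketch.

```python
import re

# Component regex from RFC 3986, Appendix B, with non-capturing outer groups.
_URI = re.compile(
    r"(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?",
    re.DOTALL,
)


def split(text):
    """Return (scheme, authority, path, query, fragment); None means undefined."""
    return _URI.fullmatch(text).groups()


def remove_dot_segments(path):
    """Remove "." and ".." segments as described in RFC 3986, section 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


def merge(base_authority, base_path, ref_path):
    """Merge a relative-path reference with the base path (section 5.2.3)."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def recompose(scheme, authority, path, query, fragment):
    """Rebuild the URI string (section 5.3); "//" is kept whenever an authority is defined."""
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result


def resolve(base, reference):
    """Resolve a reference against a base URI (section 5.2.2, strict mode)."""
    b_scheme, b_auth, b_path, b_query, _ = split(base)
    r_scheme, r_auth, r_path, r_query, r_frag = split(reference)
    if r_scheme is not None:
        scheme, auth, path, query = r_scheme, r_auth, remove_dot_segments(r_path), r_query
    elif r_auth is not None:
        scheme, auth, path, query = b_scheme, r_auth, remove_dot_segments(r_path), r_query
    elif r_path == "":
        scheme, auth, path = b_scheme, b_auth, b_path
        query = r_query if r_query is not None else b_query
    else:
        if r_path.startswith("/"):
            path = remove_dot_segments(r_path)
        else:
            path = remove_dot_segments(merge(b_auth, b_path, r_path))
        scheme, auth, query = b_scheme, b_auth, r_query
    return recompose(scheme, auth, path, query, r_frag)


print(resolve("ldap://[2001:db8::7]/c=GB?objectClass?one", "/c=NO"))
# ldap://[2001:db8::7]/c=NO
print(resolve("app://01e36b38-39a4-48b2-88d0-6e82717ee87f/nested/example", "/folder/"))
# app://01e36b38-39a4-48b2-88d0-6e82717ee87f/folder/
```

Nothing in this sketch looks at the scheme name: the authority is detected purely from the leading //, which is the behaviour the examples above expect.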

amcgregor (Member) commented Oct 29, 2018

Curious! Indeed, the .resolve() method isn't applying those fragments properly.

>>> a = uri.URI("ldap://[2001:db8::7]/c=GB?objectClass?one"); a
URI('ldap://[2001:db8::7]/c=GB?objectClass?one')
>>> a / "/c=NO"
URI('ldap://[2001:db8::7]/c=NO')

While the resolve method does not correctly handle this case, the division operator does. The resolve method ought to handle it correctly as well. Thanks for reporting! The second case isn't quite as simple.

>>> a = uri.URI('sftp:example.com/etc/passwd'); a
URI('sftp:example.com/etc/passwd')  # looks right, but this is an illusion
>>> a.authority  # no host component
''

In this particular case, a bare : after the scheme is insufficient; the // is required. The sftp scheme defines the following syntax:

sftp://[<user>[;fingerprint=<host-key fingerprint>]@]<host>[:<port>]/<path>/<file>

While adding the // does allow it to parse correctly, the repr does not round-trip: it drops the // that was present in the input. (This is also an issue.)

>>> a = uri.URI('sftp://example.com/etc/passwd'); a
URI('sftp:example.com/etc/passwd')
>>> a.authority
'example.com'
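To illustrate what an identity-preserving representation could look like, here is a small sketch based on the generic regex from RFC 3986, Appendix B, and the recomposition rule from section 5.3. It is independent of the uri package; `Parts`, `split` and `unsplit` are hypothetical names used only here. The key point is that an undefined authority (None) is kept distinct from an empty one (''), so // is emitted exactly when the input contained it.

```python
import re
from typing import NamedTuple, Optional

# Component regex from RFC 3986, Appendix B, with non-capturing outer groups.
_URI = re.compile(
    r"(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?",
    re.DOTALL,
)


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]  # None: no "//" in the input; "": "//" with an empty authority
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split(text: str) -> Parts:
    """Split a URI reference into its five generic components."""
    return Parts(*_URI.fullmatch(text).groups())


def unsplit(parts: Parts) -> str:
    """Recompose the components; "//" appears only if an authority is defined."""
    result = ""
    if parts.scheme is not None:
        result += parts.scheme + ":"
    if parts.authority is not None:
        result += "//" + parts.authority
    result += parts.path
    if parts.query is not None:
        result += "?" + parts.query
    if parts.fragment is not None:
        result += "#" + parts.fragment
    return result


with_authority = split("sftp://example.com/etc/passwd")
without_authority = split("sftp:example.com/etc/passwd")

print(with_authority.authority)     # example.com
print(without_authority.authority)  # None
print(unsplit(with_authority))      # sftp://example.com/etc/passwd
print(unsplit(without_authority))   # sftp:example.com/etc/passwd
```

Under this generic reading, sftp:example.com/etc/passwd has no authority at all, and a URN such as urn:isbn:0451450523 is simply a scheme followed by an opaque path.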

I'll need to add sftp as a recognized URL-like protocol by registering it in setup.py: https://github.com/marrow/uri/blob/develop/setup.py#L98

The last case is explained by non-URL URIs treating everything after the authority part as the path. For most URI schemes (e.g. URN, ISBN) the part after the protocol is protocol-defined and arbitrary.

amcgregor (Member) commented Oct 29, 2018

After adding the registration sftp = uri.scheme:URLScheme to setup.py and running setup.py develop to link it, SFTP URIs now behave correctly:

>>> a = uri.URI("sftp://example.com/etc/passwd"); a
URI('sftp://example.com/etc/passwd')
>>> a.resolve("/root")
URI('sftp://example.com/root')
>>> uri.URI("sftp:example.com/etc/passwd")
URI('sftp://example.com/etc/passwd')

This will be bundled in the next release, due this week; the fix is available now in commit a8718ac. (If you are impatient, you can register this entry point in your own application; just remember to remove your own registration once you update to a version of uri that includes it.)
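A rough sketch of what such a stop-gap registration might look like in an application's own setup.py follows. The entry point string sftp = uri.scheme:URLScheme is taken from the comment above; the entry point group name "uri.scheme" is an assumption and should be checked against the setup.py of the uri package before use.

```python
# Entry point table for a hypothetical application's setup.py.
# ASSUMPTION: "uri.scheme" is the entry point group that the uri package scans;
# verify the group name in marrow/uri's own setup.py before relying on it.
ENTRY_POINTS = {
    "uri.scheme": [
        # Registration string quoted from the comment above.
        "sftp = uri.scheme:URLScheme",
    ],
}

# In setup.py this table would be passed as: setup(..., entry_points=ENTRY_POINTS)
```

Once the installed version of uri ships its own sftp registration, this entry should be removed again, as noted above.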

@amcgregor amcgregor added this to the 2.0.1 milestone Oct 29, 2018