Copy url params in format_parsed_parts #101

rahuliyer95 · 2023-05-12T17:53:26Z

Similar to #95 I noticed that when creating sub-paths using / or __truediv__ we are not copying the url params.

Example,

>>> from upath import UPath
>>> base = UPath("s3://bucket/dir?param1=value1")
>>> base._url
SplitResult(scheme='s3', netloc='bucket', path='/dir', query='param1=value1', fragment='')
>>> file = base / "file1.txt"
>>> file._url
SplitResult(scheme='s3', netloc='bucket', path='/dir/file1.txt', query='', fragment='')

This PR attempts to resolve this by copying the query parameters when creating sub-paths using /.

Results with the fix,

>>> from upath import UPath
>>> base = UPath("s3://bucket/dir?param1=value1")
>>> base._url
SplitResult(scheme='s3', netloc='bucket', path='/dir', query='param1=value1', fragment='')
>>> file = base / "file1.txt"
>>> file._url
SplitResult(scheme='s3', netloc='bucket', path='/dir/file1.txt', query='param1=value1', fragment='')

rahuliyer95 · 2023-05-12T18:56:42Z

Requesting Review: @ap-- @normanrz

ap--

Hi @rahuliyer95

Thank you for opening the PR! ❤️

I think we'll have to check how this fits together with: #88 and #92

As it stands now, I believe the test assumption is wrong, because query parameters have to be provided on a per "file" basis and should not carry over.

I understand though, that this might be a common use case, so maybe we could provide some other way to globally set query parameters...

Cheers,
Andreas 😃

ap-- · 2023-05-13T10:27:14Z

upath/tests/cases.py

+        path1 = self.path / f"test_joinpath_dir1?{query}"
+        assert path1._url.query == query
+        path2 = path1 / "file1.txt"
+        assert path2._url.query == query


This assumption is probably incorrect. See:

>>> import urllib.parse >>> urllib.parse.urljoin("http://www.example.com/dir/?p1=v1&p2=v2", "file.txt") 'http://www.example.com/dir/file.txt'

rahuliyer95 · 2023-05-15T18:38:59Z

Hi @rahuliyer95

Thank you for opening the PR! ❤️

I think we'll have to check how this fits together with: #88 and #92

As it stands now, I believe the test assumption is wrong, because query parameters have to be provided on a per "file" basis and should not carry over.

I understand though, that this might be a common use case, so maybe we could provide some other way to globally set query parameters...

Cheers, Andreas 😃

Thanks for getting back on this, we are using upath to read data from different namespaces in our cloud backend and the credentials for these namespaces are different. We are trying to add an identifier for these namespaces via query params, example

from upath import UPath

dir1 = UPath("scheme://bucket/dir1?namespace=ns1")
dir2 = UPath("scheme://bucket/dir2?namespace=ns2")

now dir1 and dir2 will use proper credentials for ns1 and ns2 respectively because that's how we have written our custom fsspec.Filesystem implementation which uses the _get_kwargs_from_urls to parse the namespace parameter and set the appropriate credentials. This helps us a lot abstract away from some this internals of the filesystem and just pass around paths in our code which works seamlessly.

ap-- · 2023-05-18T03:25:45Z

Thank you for the additional context!

Is there a reason why you chose to implement it via query parameters?

Will there ever be multiple namespaces for the same bucket/dir combination?
Or is a mapping from (bucket, dir) to namespace unique?

As I understand right now: you were basically looking for a way to encode which credentials to use into the string representation of the url.

rahuliyer95 · 2023-05-18T16:27:46Z

Is there a reason why you chose to implement it via query parameters?

For our use-case this path parameter can be provided by the user and the API contract we set is that the user only needs to provide the right path with a namespace so that the right credentials can be automatically used and the user of the API need not worry about additional configuration.

Will there ever be multiple namespaces for the same bucket/dir combination?
Or is a mapping from (bucket, dir) to namespace unique?

There could be multiple namespace for the same bucket/dir combination. e.g: bucket/dir can exist in both ns1 and ns2 but at a given time the user would only accessing bucket/dir from either ns1 and ns2 and we would request the user to create two paths one for each namespace so that proper credentials are used for accessing them.

As I understand right now: you were basically looking for a way to encode which credentials to use into the string representation of the url.

Yes, without having to expose actual credentials in the URL itself.

drernie · 2023-07-12T03:02:43Z

FYI, I had a very similar situation. We ultimately ended up requiring the user to specify the credentials via an environment variable, because there were too many places the URI could be logged, which would create a security vulnerability.

rahuliyer95 · 2023-07-12T17:26:03Z

To clarify I am not proposing to share the actual credentials in the URL params, just an id or a reference to where the credentials are present; e.g: name of the environment variable to read for the credentials.

rahuliyer95 · 2023-11-01T17:44:18Z

Managed to solve this using kwargs of the UPath by subclassing the S3Path

class MyS3PathWithNamespace(S3Path):
    @classmethod
    def _from_parts(
        cls,
        args: List[Union[str, os.PathLike]],
        url: Optional[SplitResult] = None,
        **kwargs: Any,
    ):
        if url:
            query = parse_qs(url.query)
            if namespace := next(iter(query.get("namespace", [])), ""):
                kwargs["namespace"] = namespace
        return super()._from_parts(args, url=url, **kwargs)

Copy url params in format_parsed_parts

ded54d6

rahuliyer95 force-pushed the issue-95/update-url-in-joinpath branch from 53eac9a to ded54d6 Compare May 12, 2023 18:21

ap-- reviewed May 13, 2023

View reviewed changes

rahuliyer95 closed this Nov 1, 2023

rahuliyer95 deleted the issue-95/update-url-in-joinpath branch November 1, 2023 17:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Copy url params in format_parsed_parts #101

Copy url params in format_parsed_parts #101

rahuliyer95 commented May 12, 2023 •

edited

Loading

rahuliyer95 commented May 12, 2023 •

edited

Loading

ap-- left a comment

ap-- May 13, 2023

rahuliyer95 commented May 15, 2023

ap-- commented May 18, 2023 •

edited

Loading

rahuliyer95 commented May 18, 2023

drernie commented Jul 12, 2023

rahuliyer95 commented Jul 12, 2023

rahuliyer95 commented Nov 1, 2023

Copy url params in format_parsed_parts #101

Copy url params in format_parsed_parts #101

Conversation

rahuliyer95 commented May 12, 2023 • edited Loading

rahuliyer95 commented May 12, 2023 • edited Loading

ap-- left a comment

Choose a reason for hiding this comment

ap-- May 13, 2023

Choose a reason for hiding this comment

rahuliyer95 commented May 15, 2023

ap-- commented May 18, 2023 • edited Loading

rahuliyer95 commented May 18, 2023

drernie commented Jul 12, 2023

rahuliyer95 commented Jul 12, 2023

rahuliyer95 commented Nov 1, 2023

rahuliyer95 commented May 12, 2023 •

edited

Loading

rahuliyer95 commented May 12, 2023 •

edited

Loading

ap-- commented May 18, 2023 •

edited

Loading