Make the URL.include_query_params() to support multiple query string parameters with the same name #1738

Closed
wants to merge 3 commits into from

Conversation

lexabug

@lexabug lexabug commented Jul 6, 2022

The URL.include_query_params() overwrites query string parameters with the same name. For example:

from starlette.datastructures import URL, MultiDict

url = URL('my_test_url_example')
query_params = MultiDict([('my_id', '143155'), ('language', 'en'), ('list_p[]', 'item1'), ('list_p[]', 'original_name'), ('list_p[]', 'item_3')])
new_url = url.include_query_params(**query_params)

So the new_url will be:

URL('my_test_url_example?my_id=143155&language=en&list_p%5B%5D=item_3')

This is incorrect. The URL specification allows multiple query string parameters with the same name, so a backend that processes such a query string should treat those parameters as arrays/lists.
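The loss of the repeated `list_p[]` values can be reproduced without Starlette: keyword expansion with `**` needs unique names, so a mapping can contribute only one value per key. The following is a minimal standard-library sketch of that difference using `urllib.parse.urlencode`; the variable names are illustrative and it does not show Starlette's internals.

```python
from urllib.parse import urlencode

pairs = [
    ("my_id", "143155"),
    ("language", "en"),
    ("list_p[]", "item1"),
    ("list_p[]", "original_name"),
    ("list_p[]", "item_3"),
]

# A plain mapping keeps one value per key: the last one wins.
collapsed = urlencode(dict(pairs))

# A sequence of (name, value) pairs keeps every occurrence, in order.
preserved = urlencode(pairs)

print(collapsed)
print(preserved)
```

The first line printed has a single `list_p%5B%5D` entry, the second has all three.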

This change adds a new optional argument to URL.include_query_params() that is handled as a container of query string parameters; those parameters are appended to the original query string.

@lexabug lexabug force-pushed the fix-the-include-query-params branch from b650034 to a3e4dba Compare July 6, 2022 08:32
@@ -133,8 +133,17 @@ def replace(self, **kwargs: typing.Any) -> "URL":
         components = self.components._replace(**kwargs)
         return self.__class__(components.geturl())
 
-    def include_query_params(self, **kwargs: typing.Any) -> "URL":
+    def include_query_params(
Member

And how do you include a query param called "items"? 👀

Member

I can think of about 3 solutions to this issue:

a. use positional-only parameters: def include_query_params(self, items, /, **kwargs): - however, that syntax only works in Python 3.8+, and we still want to support Python 3.7;
b. use an *args parameter (similar to ImmutableMultiDict __init__): def include_query_params(self, *args, **kwargs): - that looks like a fine solution, even if the signature becomes a bit harder to read;
c. define a new method, e.g. def append_query_params(self, items): - that solution also looks fine to me but makes the API a bit heavier;

An intermediary solution would be to define the API as in (b), but raise an error if more than one unnamed argument is passed, so that a migration to (a) becomes possible once support for Python 3.7 can be dropped;
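As a rough illustration of the intermediary option, here is a sketch written as a free function over URL strings rather than as a method on Starlette's `URL`. The function body, the error message, and the choice to let keyword arguments replace existing parameters of the same name are all assumptions made for this example, not code from the pull request.

```python
from typing import Any, Iterable, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def include_query_params(
    url: str, *args: Iterable[Tuple[str, Any]], **kwargs: Any
) -> str:
    # Accept at most one unnamed argument, so the signature could later
    # become `(url, items, /, **kwargs)` without breaking callers.
    if len(args) > 1:
        raise TypeError("expected at most one positional argument")

    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)

    # Keyword arguments replace existing parameters with the same name.
    pairs = [(key, value) for key, value in pairs if key not in kwargs]
    pairs.extend((key, str(value)) for key, value in kwargs.items())

    # The optional positional container is appended, duplicates included.
    if args:
        pairs.extend((key, str(value)) for key, value in args[0])

    return urlunsplit(parts._replace(query=urlencode(pairs)))


print(include_query_params(
    "http://example.com/?a=1", [("items", "x"), ("items", "y")], a=2
))
print(include_query_params("/search", items="z"))
```

Because the container is taken from `*args`, a query parameter that is literally named `items` can still be passed as a keyword, which addresses the question raised in the first review comment.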

Member

@adriangb adriangb Jul 6, 2022

a. use positional-only parameters: def include_query_params(self, items, /, **kwargs): - however, that syntax only works in Python 3.8+, and we still want to support Python 3.7;

We could do:

def include_query_params(self, __items, **kwargs):

This makes __items positional-only as far as type checkers are concerned (see https://peps.python.org/pep-0484/#positional-only-arguments), and at runtime you just can't have a query parameter with the name __items, which I hope no one is using. Then, when Python 3.8 becomes the minimum supported version, we can enforce it at runtime with your version.

Member

If the target API is to use a positional parameter, I would rather switch to the def append_query_items(self, *args, **kwargs): syntax and check inside the method body that len(args) < 2. That is, I prefer my breaking changes to be impossible rather than extremely unlikely. (E.g. there is also potential for someone who does not use a type checker to simply call the function with __items=[], which would break once __items was replaced with the / syntax.)

However, that is a question of API design, and I do not know what policy (if any) Starlette has for such API changes.

@lexabug
Author

lexabug commented Jul 7, 2022

I've applied @adriangb's suggestions. Please take a look.

@aminalaee
Member

aminalaee commented Jul 8, 2022

I think it would make sense to include explode flag for serialization, so with explode=False we would get ?id=1,2,3 instead of ?id=1&id=2&id=3.

@lexabug
Author

lexabug commented Jul 12, 2022

I think it would make sense to include explode flag for serialization, so with explode=False we would get ?id=1,2,3 instead of ?id=1&id=2&id=3.

This is also achievable with joining items outside the include_query_params() call.
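A small sketch of what joining outside the call could look like, using only `urllib.parse.urlencode` so that it does not depend on any particular Starlette signature. The variable names are illustrative. Note that a comma is percent-encoded by default, so producing a literal `id=1,2,3` requires marking it as safe.

```python
from urllib.parse import urlencode

ids = ["1", "2", "3"]

# "Exploded" form: one name=value pair per item.
exploded = urlencode([("id", value) for value in ids])

# Joined form: the caller builds the comma-separated value first.
joined_default = urlencode({"id": ",".join(ids)})        # comma is escaped
joined_literal = urlencode({"id": ",".join(ids)}, safe=",")  # comma is kept

print(exploded)
print(joined_default)
print(joined_literal)
```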

@adriangb
Member

@lexabug I would like to take a step back and try to clarify the original use case. It sounds like you are using URL as a standalone structure to parse/manipulate URLs. Is that correct?

@lexabug
Author

lexabug commented Jul 12, 2022

@lexabug I would like to take a step back and try to clarify the original use case. It sounds like you are using URL as a standalone structure to parse/manipulate URLs. Is that correct?

The original use case that led me to this suggestion is this: I have a web service that works like an API gateway. It receives requests, processes them with different middlewares, and then sends each request to a target service. Sometimes the original URLs contain query string parameters representing arrays/lists, like user_ids[]=111&user_ids[]=222&user_ids[]=333. In such cases my API gateway service failed to proxy the request correctly, because include_query_params() couldn't accept multiple args with the same name.
I hope that is clear.

@adriangb
Member

Could you use an external (i.e. non-Starlette) library to do the URL parsing/building, and then hand it off to Starlette? URL parsing is complicated and full of dragons. I would be a bit concerned that this sort of change would set a precedent for Starlette providing this functionality (currently it only really provides the minimum required for other functionality in Starlette to work).

@alex-oleshkevich
Member

alex-oleshkevich commented Aug 16, 2022

The proposed solution has some drawbacks:

  1. adding a new argument breaks current API contract (people are not used to *args but know **kwargs and the latter one is very common across various frameworks)
  2. what would be the result if both __items and **kwargs passed?
  3. it becomes cumbersome to use include_query_params and replace_query_params in jinja templates because we don't have the flexibility to build a MultiDict in the template (you can do that, but the template code will quickly become unreadable). Instead, users would have to write custom jinja plugins or prepare the value somewhere else.

I would like to propose exploring an alternative, where include_query_params and replace_query_params treat iterables (sets, lists, tuples) in kwargs as multi params. This is less invasive and should not break anything:

url.include_query_params(page=1, search='my query', tags=['tag1', 'tag2', 'tag3'])
# ?page=1&search=my%20query&tags=tag1&tags=tag2&tags=tag3

Also, keys in kwargs cannot contain special characters like brackets, which makes tags[] impossible to use. This leads to another idea: use a dict.

url.include_query_params({
    'page': 1,
    'search': 'my query',
    'tags[]': ['tag1', 'tag2', 'tag3'],
})
# ?page=1&search=my%20query&tags[]=tag1&tags[]=tag2&tags[]=tag3

url.replace_query_params({
    'page': 2,
})
# ?page=2&search=my%20query&tags[]=tag1&tags[]=tag2&tags[]=tag3

This is the most flexible solution of all I know, but it is the most complicated and definitely a breaking change. It may make sense to introduce a third method, update_query_params, to do the same thing.
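The dict-with-lists idea maps closely onto `urllib.parse.urlencode(..., doseq=True)`, which expands sequence values into repeated parameters. Below is a minimal sketch of such an `update_query_params` as a free function over URL strings; the function body and its replace-by-key behaviour are assumptions for illustration, not an existing Starlette API. The standard library encodes spaces as `+` and brackets as `%5B%5D`, so the output differs cosmetically from the comments in the proposal.

```python
from typing import Any, Dict, Mapping
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def update_query_params(url: str, params: Mapping[str, Any]) -> str:
    parts = urlsplit(url)
    # parse_qs groups repeated names into lists: {"tags[]": ["tag1", ...]}
    merged: Dict[str, Any] = dict(parse_qs(parts.query, keep_blank_values=True))
    # A key in `params` replaces every existing value for that name.
    merged.update(params)
    # doseq=True turns each list value into repeated name=value pairs.
    return urlunsplit(parts._replace(query=urlencode(merged, doseq=True)))


first = update_query_params(
    "/items",
    {"page": 1, "search": "my query", "tags[]": ["tag1", "tag2", "tag3"]},
)
second = update_query_params(first, {"page": 2})

print(first)
print(second)
```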


Another point worth mentioning is that there is no common naming convention for multi params. Some frameworks expect tags[]=tag1&tags[]=tag2, some like tags=tag1&tags=tag2, others do tags=tag1,tag2,tag3. But Starlette would need to choose one.

@alex-oleshkevich
Member

@adriangb while it is achievable with extra coding, I am sure that URL manipulation is one of the basic features of a web framework and should be in Starlette's core. We already have a URL class, which is incomplete in this sense.

@adriangb
Member

Do Django and/or Flask have in-depth URL manipulation utilities?

@alex-oleshkevich
Member

I don't know. What I wanted to say is that if Starlette provides a tool to manipulate URLs it should be complete. Lists in query parameters are a pretty common thing.

@jhominal
Member

jhominal commented Aug 31, 2022

Given all the discussions and back and forths about include_query_params, I would suggest that we leave include_query_params() alone, and add an append_query_params method, taking (e.g.) an Iterable[Tuple[str, str]] as an argument. (I am pondering whether such a method should use *args or not)
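For concreteness, a sketch of what an `append_query_params` taking an `Iterable[Tuple[str, str]]` might do, again as a free function over URL strings and built only on `urllib.parse`. The implementation is hypothetical; no such method is shown anywhere in this thread.

```python
from typing import Iterable, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_query_params(url: str, items: Iterable[Tuple[str, str]]) -> str:
    parts = urlsplit(url)
    # Keep existing parameters as ordered pairs so duplicates survive.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Append without replacing anything, even when names repeat.
    pairs.extend(items)
    return urlunsplit(parts._replace(query=urlencode(pairs)))


proxied = append_query_params(
    "/users?user_ids%5B%5D=111",
    [("user_ids[]", "222"), ("user_ids[]", "333")],
)
print(proxied)
```

Unlike `include_query_params`, nothing is overwritten here, which matches the API gateway use case described earlier in the thread.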

@Kludex Kludex mentioned this pull request Oct 3, 2022
11 tasks
@alex-oleshkevich
Member

After thinking a lot, I agree that append_query_params is a nice add-on that fits alongside the rest of the methods of the URL class.

@Kludex Kludex mentioned this pull request Feb 14, 2023
8 tasks
@Kludex
Member

Kludex commented Mar 9, 2023

How did you folks overcome this? Is there still a need for it? 👀

@alex-oleshkevich
Member

I haven't found anything better than this yet.
https://github.com/alex-oleshkevich/ohmyadmin/blob/master/ohmyadmin/ordering.py#L78

@Kludex
Member

Kludex commented Mar 9, 2023

I haven't found anything better than this yet. alex-oleshkevich/ohmyadmin@master/ohmyadmin/ordering.py#L78

2 lines... Problem solved? 👀

@alex-oleshkevich
Member

I haven't found anything better than this yet. alex-oleshkevich/ohmyadmin@master/ohmyadmin/ordering.py#L78

2 lines... Problem solved? 👀

In templates, it is very inconvenient to do it like this.

@Kludex
Member

Kludex commented Mar 9, 2023

Do you still think the append method is the best solution here?

@alex-oleshkevich
Member

Do you still think the append method is the best solution here?

Yes, it solves the problem.

@Kludex
Member

Kludex commented Jun 20, 2023

Do you still think the append method is the best solution here?

Yes, it solves the problem.

PR welcome for append_query_params.

Thanks for the discussion everybody, and the PR @lexabug . 🙏

@Kludex Kludex closed this Jun 20, 2023