-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change nested query parsing behavior of "foo[]" from {"foo" => [nil]} to {"foo" => []} #1696
Comments
Can you get a browser to submit that via a form submission? As far as I'm aware, browsers will always include an equal sign, even if the right hand side would be empty. Looking at RFC 1866, it appears that a query string of |
Hi @jeremyevans 👋 Not sure you can get a form to submit like this, I haven't tried though. I looked at RFC3986 and it's... interesting. So looking at the ABNF for URIs it seems valid that key can go without value:
edit: Jeremy corrected me that of course they are escaped and hence valid The RFC also says it's frequently used as key value, but not always:
Seems to me that this way |
From a URL perspective, the query portion is opened ended. That's because different URL schemas could have different query formats. However, for HTTP, RFC 1866 applies, as the query for HTTP should be in Rack's specific handling of |
Note that |
The problem with escaping So the only question is what is the use case and how did you create such a query string? |
I'm okay with this change generally, but can you make a PR so we can review? |
@ioquatix can you describe briefly why are in favor of this change, considering that |
Regarding this particular behaviour, if it's a bug in faraday, I'm less inclined to change it. But I don't think it's an unreasonable request. So, I want to see an actual PR so we can see what specs need to be changed and the potential impact. This is on the OP to deliver. Regarding consistency, PHP behaves similar as Rack as implemented:
The difference is PHP sets the variable to an empty string, so a lack of @PragTob can you survey other implementations to see what is the normal behaviour? Can you also comment on whether this is a bug with Faraday? |
I'm not totally opposed to As far as I can tell from the faraday issue, they chose this path because it appeared to work with Rails. I'm not sure whether the Rails behavior is intentional or just an implementation detail in Rails. There's a big problem with accepting invalid input. Invalid input becomes cancerous. If a server accepts invalid input, clients will send invalid input considering it valid, and then try to use the invalid input with other servers, and complain to other servers that don't accept it. That's basically what is happening here. See https://tools.ietf.org/id/draft-thomson-postel-was-wrong-03.html and https://news.ycombinator.com/item?id=9824638. |
I agree that we shouldn't accept invalid input. But the RFC is not clear that this is invalid input? If we can confirm it's invalid according to the RFC, I would follow that. Does the query parser throw exceptions for any other kind of input? |
@ioquatix https://tools.ietf.org/html/rfc1866#section-8.2 describes the Yes, the parser already raises exceptions in some cases:
|
It does appear the Rails behavior is by design: https://github.com/rails/rails/blob/master/actionpack/test/dispatch/request/query_string_parsing_test.rb#L96-L103 Fun fact, the stripping of nils from parameter arrays was introduced in Rails as a security fix for a bug in ActiveRecord: rails/rails@060c91c This was later changed to just remove nils from arrays, leaving the array empty, and only because ActiveRecord had been fixed to handle things correctly: rails/rails@8f8ccb9 |
👋 Hey thanks for all the input and thanks Jeremy for the corrections to my erroneous view of the RFC 💚 I can look at some implementations in other languages and see what they do sure :) I can provide a PR, I already had/have the code - it only changes 2 lines and has everything passing so it's easy. As for Faraday, it was a bugfix for lostisland/faraday#800 which itself was basically a bug fix for a json_api_client issue: JsonApiClient/json_api_client#278 - so it was definitely an intended bugfix. I'll check the JSON API spec to see if I can quickly find if they consider that valid or if it was just a wrong interpretation. as an aside, I'd like it if there was a way to represent an empty array vs. array with an empty string. Standard conformity of course trumps how I feel though 😅 Also, thanks for the quick feedback everyone! 🚀 |
PR sits at #1697 - I lied it's 3 lines that need changing one assertion was in the same test after the one I thought was the only test needed changing. |
Prasing/DeserializationJavaScript - qs > var qs = require('qs');
undefined
> qs.parse('foo[]');
{ foo: [ '' ] } JavaScript = query-string > const queryString = require('query-string');
undefined
> queryString.parse('foo[]', {arrayFormat: 'bracket'})
[Object: null prototype] { foo: [ null ] }
> queryString.parse('foo[]=', {arrayFormat: 'bracket'})
[Object: null prototype] { foo: [ '' ] } Elixir - stdlib iex(1)> URI.decode_query("foo[]")
%{"foo[]" => nil}
iex(2)> URI.decode_query("foo[]=")
%{"foo[]" => ""} Elixir - plug rack equivalent iex(2)> Plug.Conn.Query.decode("foo[]")
%{"foo" => []}
iex(3)> Plug.Conn.Query.decode("foo[]=")
%{"foo" => [""]} Python - stdlib Python 3.8.4 (default, Jul 21 2020, 09:25:23)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import urlparse
>>> from urllib.parse import parse_qs
>>> parse_qs("foo[]")
{}
>>> parse_qs("foo[]=")
{}
>>> parse_qs("foo[]=1")
{'foo[]': ['1']} Python - querystring-parser >>> from querystring_parser import parser
>>> parser.parse("foo[]")
Traceback (most recent call last):
File "/home/tobi/.asdf/installs/python/3.8.4/lib/python3.8/site-packages/querystring_parser/parser.py", line 137, in parse
(var, val) = element.split("=")
ValueError: not enough values to unpack (expected 2, got 1)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/tobi/.asdf/installs/python/3.8.4/lib/python3.8/site-packages/querystring_parser/parser.py", line 146, in parse
raise MalformedQueryStringError
querystring_parser.parser.MalformedQueryStringError
>>> parser.parse("foo[]=")
{'foo': {'': ''}}
>>> parser.parse("foo[]=1")
{'foo': {'': 1}} From the surveyed implementations, only plug from elixir parses it as this issue suggests. Behavior is generally all over the place. SerializationIt seemed fun/interesting to check what some implementations think how it should be serialized (aka is faraday doing the right thing) Elixir - Plug iex(1)> Plug.Conn.Query.encode(%{foo: []})
""
iex(3)> Plug.Conn.Query.encode(%{foo: [""]})
"foo[]=" JavaScript - jQuery $.param({foo: []})
""
$.param({foo: [""]})
"foo%5B%5D=" JavaScript - qs > var qs = require('qs');
undefined
> qs.stringify({foo: []})
''
> qs.stringify({foo: ['']})
'foo%5B0%5D=' The surveyed serializers all seem to serialize it as empty string suggesting that the old faraday behavior was correct and the new behavior might be a bug. Any big library we should check? I didn't find a good Java answer but my Java package days are also way behind me. I'd be interested to now what Apache thinks/does but I'm not sure how to get to that if Apache itself even cares about that :) |
Thanks everyone and @PragTob in particular for the discussion around this.
My understanding is that this particular case is not strictly defined in the RFC spec (but I'll join your search for references to it) and existing libraries seem to converge towards the Without that change, parsing That is simply my view of course, but I'd really hate is to rollback to an illogical implementation even though there's no official RFC asking for it, and simply because other libraries have adopted it over the years. |
Thanks @iMacTia for chiming in 👋 I skimmed the latest JSON API spec (as the original request to faraday came from a JSON API client bug) and couldn't find any defined usage of the |
@PragTob I didn't have much time to go through it, but I found this one: https://jsonapi.org/format/#crud-updating-to-many-relationships (check 2nd example after "And the following request clears every tag for an article"). Now, I know this is a If we considered |
@iMacTia yeah but as you say that should be in the body and it's also JSON so I don't think it's too important here. I had hoped that their filtering section or something had something about how index end points should react/what the format should be there (as those would commonly be query string parameters) but it's intentionally left open and without rules. |
Alright, I might have found something! I started at RFC1866 but that's obsolete, as it's for HTML 2 I went to the newest HTML standard I found which is 5.2. The form submission section eventually links to the URL standard application/x-www-form-urlencoded First, I want everyone to read the introduction, because well... 🤣 :
With that out of the way, it specifies an algorithm for parsing, which simplified is:
Which imo speaks for the correctly parsed value being Maybe there is such a thing somewhere 🤷 |
That introduction definitely made my day 😂 |
@PragTob Thank you for your research on this. With the updated standard, I agree with you that |
I think the current behaviour makes the most sense to me, as an extrapolation of general rules.
Ergo, each value-less instance of That being said, although it requires an explicit extra rule, the proposed change here (which is kinda upstreaming the Rails |
The URL spec section 5.1.3.3 specifies that if = is not present in the byte sequence, it should be treated as if the byte sequence is the name of the tuple and the value is the empty string. This affects all parameters without =, not just arrays: ```ruby Rack::Utils.parse_nested_query("foo[bar]&baz[]&quux") {"foo"=>{"bar"=>nil}, "baz"=>[nil], "quux"=>nil} # Before {"foo"=>{"bar"=>""}, "baz"=>[""], "quux"=>""} # After ``` Fixes rack#1696
@matthewd I like that you are thinking not just about what aligns with the RFC, but also what makes the most sense for downstream users. Consider the following:
The presence of > Rack::Utils.build_query({x: nil})
=> "x"
> Rack::Utils.build_query({x: ""})
=> "x=" When parsing data from a form, it feels like there should be some way to specify nil in an array. My question would then be, what data structures are we expecting to be able to round trip? Should we expect empty strings to become nil, or vice versa? Whatever we do here, we should try to be consistent with (1) Published RFCs, (2) For example from the OP, > Rack::Utils.parse_nested_query("foo[]")
=> {"foo"=>[nil]}
> Rack::Utils.build_query({"foo"=>[nil]})
=> "foo" At least this eventually reaches a fixed point. Based on my interpretation of all the above, One way to support this behaviour without breaking existing systems and without forcing changes would be to make it configurable. i.e. add an option to |
@ioquatix I was gonna write something similar but with a different conclusion 😅 I agree that ideally As in, is it really more important to have to be able to specify an array including nil vs. an empty array? The way I see the options for
|
Now that the updated spec considers the lack of It is definitely possible to make this configurable, and I can see good arguments for that. If can prepare a PR for a configurable approach if we want to go that route. Assuming we do go that route, we still need to decide on a default, which I think should be correct or compatible with Rack 2. |
I have to say, despite all the great conversation that's happened here, I really find it frustrating when there is no clear correct path.
Can we do both? Okay let's add an option to control it and move all the complexity to user code. It's also bad (for users). My gut tells me we should follow the standard. That means breaking compatibility. But... it's Rack 3.0 - we are allowed. Please give your opinion on this matter:
|
Setting the array behaviour aside for a moment, and considering the base IMO strict conformance is uninteresting here; it's a technicality that's hard to encounter (because it's not something browsers can generate), Returning focus to this issue, though, I grow an additional objection: I don't see how it would help address the reported issue. AFAICS, changing to So, in conclusion: I think I'm extremely wary of any configurable option inside a library like this, because all clients that don't control that configuration option (i.e., downstream libraries, as opposed to end-user applications) are then all obliged to handle both possible config scenarios. We've previously been burned on that in Rails, and now fight hard against that inclination especially when it affects return values -- it's the PHP magic quotes problem. |
I agree with @matthewd points, I just want to elaborate a bit more on it, take a different view of the same problem and try to justify my preference (and what we did in Faraday). We should really stop trying to solve this entirely with standards here because neither the URI RFC nor the HTML standards have a clear set of rules for modern web frameworks "nested" query parameters like arrays and hashes. So why did we introduce the "nested" encoding? Well I'd say that was to create a "convention" (much different, but still as important as a standard) to better serve users and developers. Now, if we agree that we're trying to make life easier for users, let's try putting ourselves in their shoes and pretend someone asked the following:
and let's imagine for a moment that our convention didn't cover that case at all, so we're free to choose whatever interpretation we want for it 😃 Here are the ones we've mentioned already:
So the last option is the one that makes the most sense AND is not already covered by other representations, thus sounds like the most logical one. Finally, I'd also advise against giving options unless strictly necessary as that will cause chaos very quickly and increase maintenance complexity in future. |
No, we don't all agree on that. It should be
Note that if you use
This format is backwards compatible. I don't think we should break backwards compatibility unless we are going to increase compatibility with the spec.
I don't see how this "ACTUALLY" makes more sense. It's inconsistent with
Let's say I want to pass
|
Let's say I want to send an empty array, neither the existing solution nor the proposed "standard-friendly" one solves this. |
https://developer.mozilla.org/en-US/docs/Web/API/URLSearchParams shows that my approach (empty string for omitted var searchParams = new URLSearchParams("q=&topic=api");
searchParams.get("q")
# => ""
var searchParams = new URLSearchParams("q&topic=api");
searchParams.get("q")
# => "" https://developer.mozilla.org/en-US/docs/Web/API/FormData/FormData converts input to string: var f = new FormData()
f.append('q', '');
var searchParams = new URLSearchParams(f);
searchParams.get("q")
# => ""
var f = new FormData()
f.append('q', null);
var searchParams = new URLSearchParams(f);
searchParams.get("q")
# => "null" Considering that |
I wouldn't say that any of those have much merit right now in deciding what's valid here as what we're doing is undefined/unspecified behaviour since none of these deal with arrays or any sort or form of nesting. There is no standard. For all that I know that parsing behaviour was invented in PHP and then adopted in Ruby. I don't know, but at this point it seems as good a guess as any. Well besides asking the WHATWG if they wanna further refine a standard they called a monstrosity... Anyhow, let's look at PHP, function seems to be
So, PHP also parses it to empty string (much to my surprise to be honest :) ). It's noteworthy though that the top voted comment is someone complaining that PHP does this weirdly and how everyone else does it differently (it's also 13 years old though):
Anyone wants to check how CGI parses this query? 😅 Anyhow, this leaves us with:
Which I'm still not sure how to resolve :) |
I have no idea what accidental keyboard shortcuts I hit here, I'm sorry for close and empty now deleted comment. |
CGI parses everything as an array, does not treat CGI.parse('q=')
# => {"q"=>[""]}
CGI.parse('q')
# => {"q"=>[]} So to get the value of a parameter, you generally would reference the first value: CGI.parse('q=')['q'][0]
# => ""
CGI.parse('q')['q'][0]
# => nil
Note that if we change the behavior of just My thoughts on this haven't changed. Either we keep the behavior the same for backwards compatibility, or we change the behavior to be correct. Changing the behavior to a different backwards-incompatible approach, even if it makes a common case easier, is a poor choice IMO. |
Looking at this issue again, I think that we should:
While I really strongly feel for the original logic and rational, implementing a bespoke parser which doesn't follow the documented standards (no matter how stupid it may be) feels wrong. I know it's hard to reconcile this position given that no one here is "wrong"... but we should fix the standard if that's the problem. |
The URL spec section 5.1.3.3 specifies that if = is not present in the byte sequence, it should be treated as if the byte sequence is the name of the tuple and the value is the empty string. This affects all parameters without =, not just arrays: ```ruby Rack::Utils.parse_nested_query("foo[bar]&baz[]&quux") {"foo"=>{"bar"=>nil}, "baz"=>[nil], "quux"=>nil} # Before {"foo"=>{"bar"=>""}, "baz"=>[""], "quux"=>""} # After ``` Fixes rack#1696
The URL spec section 5.1.3.3 specifies that if = is not present in the byte sequence, it should be treated as if the byte sequence is the name of the tuple and the value is the empty string. This affects all parameters without =, not just arrays: ```ruby Rack::Utils.parse_nested_query("foo[bar]&baz[]&quux") {"foo"=>{"bar"=>nil}, "baz"=>[nil], "quux"=>nil} # Before {"foo"=>{"bar"=>""}, "baz"=>[""], "quux"=>""} # After ``` Fixes #1696
We're discussing how this change breaks backwards compatibility in Grape in ruby-grape/grape#2298, please check it out and comment if you have time! |
Hello lovely folks,
first and foremost thanks for your excellent work on this project practically the entire ruby webdev eco system relies on 💚 🎉 🚀
Problem
I've come here today to propose a change in the parsing behavior of the query string
param[]
.There is a spec asserting the current behavior so I assume it's intentional
rack/test/spec_utils.rb
Lines 164 to 165 in 649c72b
I believe/hope the more correct test would be:
I base this on three things:
nil
doesn't seem particularly useful (or intuitive) vs. an empty arrayrack/lib/rack/query_parser.rb
Lines 67 to 72 in 649c72b
her
and many projects rely upon) changed it's way of serializing empty arrays to the query string above #800 correct handling parameter value empty array lostisland/faraday#801Especially due to the last point this creates a sort of dissonance in the Ruby eco system where I send an empty array from the one side and it arrives as
[nil]
on the server side. Of course, the fix might also be on the faraday side.I have not yet checked the HTTP specs or anything about this to see what's "officially correct"
To have complete traceability, I came here via ruby-grape/grape#2079 :)
Implementation
I'm not sure if it's the complete implementation but it seems to be easily fixable in
rack/lib/rack/query_parser.rb
Lines 102 to 105 in 649c72b
all it needs is
unless v.nil?
:All tests (with exception of the one mentioned above) pass with this change.
(I have the code, I found the old spec while implementing this/my spec was elsewhere)
Happy to make a PR should it be wanted!
Thanks for all your work! 🎉 🚀
The text was updated successfully, but these errors were encountered: