a special normalize method? #19

Closed
godfat opened this issue Oct 23, 2010 · 5 comments

@godfat

godfat commented Oct 23, 2010

I have run into an issue in em-http-request[0]: the client calls Addressable::URI#normalize! before actually making the request, and that changes the semantics of the URI.

Addressable::URI.parse('http://example.com/?q=%2B%26b%3Da').normalize.to_s == "http://example.com/?q=+&b=a"

Probably the client should not call Addressable::URI#normalize! before requesting the server, but according to Ilya, the author of em-http-request[1], this is a must to deal with some edge cases. I am no expert in either URI or HTTP. What do you think?

Thanks a lot!

[0] http://github.com/igrigorik/em-http-request

[1] http://github.com/igrigorik/em-http-request/issues/57

@sporkmonger
Owner

The client should not be calling normalize, but this is a mistake in the client, not the URI parser. However, if it's been like this for a while, resolving it will likely cause breaking changes in any projects that have em-http-request as a dependency and that have been relying on this functionality, i.e., the 'edge cases' Ilya was referring to.

If Ilya needs convincing, point him at OAuth and ask him if he's ever tried using em-http-request in conjunction with an auth mechanism that signs parts of the URI. If you pre-normalize like this, the signatures won't match and the request will fail in ways that are nearly impossible to debug.

However, the particular case you've given is not an example of a semantic change. All URI-aware software should treat those two as equivalent. The main problem here is simply that if I give an HTTP client a URI, I expect it to make a request against exactly the byte-for-byte data I give it. Pre-normalizing is the kind of magic Ruby is known and sometimes reviled for, and we shouldn't be making a habit of that.

@igrigorik
Contributor

Bob, the normalize! call in em-http is a fairly recent addition. Perhaps I misunderstood the utility / semantics of it? I assumed the same behavior as the built-in URI lib...

ruby-1.9.2-p0 > require 'uri'
ruby-1.9.2-p0 > u = URI.parse('http://example.com/path?a=%28%2B%29')
ruby-1.9.2-p0 > u.normalize
#<URI::HTTP:0x00000101959fa0 URL:http://example.com/path?a=%28%2B%29>
ruby-1.9.2-p0 > u.normalize.to_s
"http://example.com/path?a=%28%2B%29"
ruby-1.9.2-p0 > u.query
"a=%28%2B%29"
ruby-1.9.2-p0 > require 'addressable/uri'
ruby-1.9.2-p0 > a = Addressable::URI.parse('http://example.com/path?a=%28%2B%29')
ruby-1.9.2-p0 > a.normalize!
#<Addressable::URI:0x80e588c0 URI:http://example.com/path?a=(+)>
ruby-1.9.2-p0 > a.query
"a=%28%2B%29"

I'm guessing you're following the normalization spec? [1] If that's the case, this is a tricky one. In theory, the URIs should be the same; in practice (due to server implementations) they are not. At the same time, the last thing I want to do is reimplement parts of Addressable in em-http.

It seems like saying "client shouldn't call normalize" defeats the purpose of the lib? Having said that, it's a catch-22 because that's what the spec says you should do. Ugh!

Any suggestions for how to deal with this?

[1] http://labs.apache.org/webarch/uri/rfc/rfc3986.html#normalize-encoding

@sporkmonger
Owner

Addressable performs encoding normalization as per the spec, yes. It also performs all the other normalization steps given, like path segment normalization and so on. The problem is not that Addressable's normalization is non-conformant. The problem is that an HTTP client must not perform normalization prior to sending the request. Nowhere does any spec require a generalized HTTP client perform normalization prior to sending the request. That's always something that should be done manually.
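To make "encoding normalization" concrete, here is a minimal sketch of the percent-encoding and case rules from RFC 3986 (sections 6.2.2.1 and 6.2.2.2), written against the Ruby standard library only. It is not Addressable's implementation; the method name `normalize_percent_encoding` and the constant `UNRESERVED` are made up for illustration.

```ruby
# Unreserved characters per RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = /\A[A-Za-z0-9\-._~]\z/

# Hypothetical helper: decodes percent-escapes of unreserved characters
# and upper-cases the hex digits of every escape that must stay encoded.
def normalize_percent_encoding(component)
  component.gsub(/%[0-9A-Fa-f]{2}/) do |escape|
    code = escape[1, 2].hex
    if code < 128 && code.chr.match?(UNRESERVED)
      code.chr        # e.g. "%7E" becomes "~"
    else
      escape.upcase   # e.g. "%2f" becomes "%2F"
    end
  end
end

puts normalize_percent_encoding('a=%7Euser%2fdocs') # prints "a=~user%2Fdocs"
```

Even this narrow rule changes the bytes of the string, which is why applying it silently inside an HTTP client matters.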

Normalization can and often does result in a new identifier. It's a process that attempts to produce a new URI that points to the same resource as the original URI. From the spec: "Implementations may use logic based on the definitions provided by this specification to reduce the probability of false negatives." In other words, any time you perform normalization, you run the risk of a false negative; i.e., a new URI that points to the wrong resource.

In the case of an HTTP client, it's critical that the client makes a request against precisely the same URI it was given. Because of the way HTTP splits the URI in half and only passes the request URI section to the server, it's OK to normalize the scheme and authority piece. But a client should not attempt to normalize the path or query components unless explicitly requested to do so.
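A rough sketch of what "normalize the scheme and authority, leave the rest alone" could look like, using plain string handling so that the path and query bytes are never re-serialized. The method name `normalize_scheme_and_authority` is hypothetical, and the regular expression is a simplification that assumes a hierarchical `scheme://authority...` URI.

```ruby
# Hypothetical helper: lower-cases the scheme and the host[:port] part,
# and passes the path, query, and fragment through byte-for-byte.
def normalize_scheme_and_authority(raw)
  match = raw.match(%r{\A([A-Za-z][A-Za-z0-9+.\-]*)://([^/?#]*)(.*)\z}m)
  return raw unless match # not a hierarchical URI; leave it untouched

  scheme, authority, rest = match[1], match[2], match[3]
  # Userinfo is case-sensitive, so only the part after the last "@" is lower-cased.
  userinfo, at, hostport = authority.rpartition('@')
  "#{scheme.downcase}://#{userinfo}#{at}#{hostport.downcase}#{rest}"
end

puts normalize_scheme_and_authority('HTTP://Example.COM/Path?a=%28%2B%29')
# prints "http://example.com/Path?a=%28%2B%29"
```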

And as I pointed out to the other guy, OAuth 1.0 is a perfect example of why this is so important. If you were to sign a request prior to passing it through to the HTTP client, and then the client performed normalization, the signatures would no longer match. The problem would be nearly impossible to debug on top of it, because it would work for almost all requests, and the signature would fail only when the URI contained something that normalization rewrites, such as a percent-encoded character whose canonical form is unencoded.
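The failure mode can be sketched with a deliberately simplified signing scheme: an HMAC over the whole URI string. Real OAuth 1.0 signs a more elaborate signature base string (typically with HMAC-SHA1), so the `sign` helper and the `SECRET` value below are illustrative assumptions only. The two URI strings are the ones from the transcript earlier in this thread.

```ruby
require 'openssl'

# Hypothetical shared secret, for illustration only.
SECRET = 'not-a-real-secret'

# Simplified stand-in for request signing: HMAC-SHA256 over the full URI string.
def sign(uri_string, secret = SECRET)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), secret, uri_string)
end

signed_uri     = 'http://example.com/path?a=%28%2B%29' # what the caller signed
normalized_uri = 'http://example.com/path?a=(+)'       # what a normalizing client sends

caller_signature = sign(signed_uri)
server_signature = sign(normalized_uri) # server recomputes over what it received

puts caller_signature == server_signature # prints "false"
```

A URI that normalization leaves unchanged signs identically on both sides, which is why the mismatch only shows up for some requests.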

@sporkmonger
Owner

Also, in the particular example you gave, both implementations are quite possibly wrong. The most correct normalization may actually be http://example.com/path?a=(%2B), depending on the context.

However, to be clear, this one could probably be argued two different ways according to two different specifications (RFC 3986 vs HTML 4.01). Which pretty much should make this the perfect example of why you don't want to normalize here.
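One way to see the two readings side by side is to decode the candidate query strings as HTML form data with the standard library's `URI.decode_www_form`, where `+` means a space. This is only a sketch of the form-encoding interpretation, not a statement about what any particular server does; `form_values` is a hypothetical helper name.

```ruby
require 'uri'

# Hypothetical helper: decodes a query string as
# application/x-www-form-urlencoded data and returns a Hash.
def form_values(query)
  URI.decode_www_form(query).to_h
end

p form_values('a=%28%2B%29') # prints {"a"=>"(+)"}  (original query)
p form_values('a=(%2B)')     # prints {"a"=>"(+)"}  (parentheses decoded, plus kept encoded)
p form_values('a=(+)')       # prints {"a"=>"( )"}  (plus decoded, now read as a space)
```

Under this interpretation, decoding `%2B` changes the value a form-processing server sees, while decoding the parentheses does not.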

@igrigorik
Contributor

Bob, that makes sense. Let me take a pass over the code in em-http. I should be able to remove the normalize! call without too much trouble, since it's localized to a single location. I just have to make sure that the requests are dispatched correctly in a few edge cases.

This issue was closed.