Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve header casing as specified #576

Merged
merged 1 commit into from
Dec 19, 2019

Conversation

joshuaflanagan
Copy link
Contributor

The original behavior was to normalize all header names so that they
were broken up into words, delimited by - or _, capitalize each word,
and then join the words together with a -.

This made it impossible to make a request with an underscore (_) in the header
name, or with a different casing (ex: all caps).

However, the normalized name made it possible to access (or delete) headers,
without having to know the exact casing.

The new behavior is based on the following rules (as specified in #524 (comment))

  1. Fail if a header name is not specified as a String or Symbol
  2. If the header name is specified as a Symbol, normalize it when writing it in a request.
    If the header name is specified as a String, preserve it as-is when writing it in a request.
  3. Allow lookup of any header using the normalized form of the name

I implemented this behavior by storing three elements for each header value:

  1. normalized header name
  2. header name as it will be written in a request
  3. header value

Element 2 is the new addition. I considered just storing the header value
as it would be written, and only doing normalization during lookup, but
it seemed wasteful to potentially normalize the same value over and over
when searching through the list for various lookups. This way we only
normalize each name once, and can continue to use that value for lookups.
However, whenever asked for the contents (ex: via each or keys) we
return the new, non-normalized name.

Fixes: #524

@joshuaflanagan joshuaflanagan force-pushed the unconverted_headers branch 2 times, most recently from 697e29b to 73bf831 Compare December 15, 2019 01:18
Copy link
Member

@tarcieri tarcieri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach looks good to me. Curious what @ixti thinks.

Copy link
Member

@ixti ixti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change #add api to minimize breaking changes. We can also make it fully backward compatible by adding normalize keyword with true as default.

@@ -49,16 +53,30 @@ def delete(name)
# @param [Array<#to_s>, #to_s] value header value(s) to be appended
# @return [void]
def add(name, value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to introduce optional keyword:

def add(name, value, normalize: nil)

Naturally, when normalize is false, no normalization should be done for the wire. When true - force normalization, and when nil - type dependent normalization.

when Symbol
lookup_name
else
raise HTTP::HeaderError, "HTTP header must be a String or Symbol: #{name.inspect}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should raise only if key is no ot responding to to_s and not a Symbol. Something like:

if name.is_a? Symbol
  # ...
elsif name.respond_to? :to_s
  # ...
else
  raise
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how that is helpful. Almost everything in ruby responds to to_s (unless it derives from BasicObject to explicitly opt out of default behavior).

If you want to allow more than just String (not sure what the use case is), we could support anything that responds to to_str, which is used to support implicit conversion to strings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I agree. Then we need to update api doc to reflect this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to want to keep my current strict implementation (only allow String or Symbol), or do you want to use the more relaxed "anything that supports an implicit conversion to string (ie. responds to #to_str)"?

I think being strict makes sense here, especially since we have different behavior based on the data type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with your way. Strict Symbol or String.

@ixti
Copy link
Member

ixti commented Dec 16, 2019

Huge thanks for the PR. I'm not against breaking api, but would like add signature to allow predictable behaviour when normalize dwarf passed

@joshuaflanagan
Copy link
Contributor Author

Thanks for the feedback. I don’t completely understand what you want to enable. Can you write me a test that would fail with the existing code, that you want to pass?

I think the most “predictable” behavior is to not do any manipulation of the strings provided by the user. I do understand that is a breaking change if someone is currently specifying a “content_type” and expecting it to go out on the wire as “Content-Type”. But for the vast number of new users and new code being written, I think changing the name out from under them is more surprising.

@joshuaflanagan
Copy link
Contributor Author

joshuaflanagan commented Dec 16, 2019

I understand the normalize keyword, but not exactly why it is helpful. In one place you suggest the default is nil (current type based switching), but then another place you suggest it should be true. That suggests the default behavior would be to continue changing a specified name into something else on the wire. That seems like more surprises for future users.

Copy link
Contributor Author

@joshuaflanagan joshuaflanagan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to help clarify my reluctance toward the normalize keyword:

If the default is going to be nil, then the only place a user would set it explicitly to true is when they want to maintain the backwards compatible behavior. But in those cases, since they are touching the callsites anyway, wouldn't it make sense for them to change the name string they are specifying so it is in the normalized form they want?

If the default is going to be true, you are continuing the existing behavior of ignoring the casing (and more importantly, _ over -) of the name as specified by the user. You would be forcing everyone writing new code to add an additional keyword just to say "don't mess with my strings! You see that header name I gave you? I really meant it - keep it that way". That feels odd to ask the user to do extra work, to stop the library from getting in their way.

I understand the pain of introducing a breaking change. One option, that I don't love, would be to allow a global configuration option that the user could opt-in to preserve the existing auto-convert behavior.

HTTP.always_normalize_header_names = true` # default false

I don't like the global state aspect of that solution. Or the fact that it perpetuates the behavior forever. It might be better if it reported a noisy deprecation warning (to stderr) whenever it does a conversion of a String. That would call attention to the callsites, so that they can be fixed. With the idea that the next major version would remove the configuration option and the old behavior.

@ixti
Copy link
Member

ixti commented Dec 16, 2019

@joshuaflanagan honestly I'm not sure what I want. :D I want the API to be as predictable as possible. Please give me a bit of time to think. Would like to play around with your PR first.

@joshuaflanagan
Copy link
Contributor Author

joshuaflanagan commented Dec 19, 2019

Any further thoughts on this? Is the primary concern that it includes a breaking change?

I think the only breaking change is to existing code that adds a request header with a name, specified as a String, that includes an underscore. With the new code, that will go out on the wire as an underscore (_), whereas previous code would have (incorrectly) converted it to a dash (-).

Technically, it could also be considered a breaking change that the casing is no longer changed from what the user specified. Header names are not supposed to be case-sensitive, so servers shouldn't care that header names are no longer being converted to "capitalized" case. However, we know that not all servers are RFC compliant and some do care about case (which is, in fact, the motivation behind this PR). So it is possible that this change in casing will break some existing code, if the server someone is communicating with expects a specific casing and the user is not currently passing the correct casing.

I do appreciate trying to avoid breaking changes, but in this case, I think any changes forced on the user will be toward making their code more correct. If this is released as part of a major version upgrade, these types of breaking changes can be expected.

@ixti
Copy link
Member

ixti commented Dec 19, 2019

After giving it more thoughts I guess I'm fine to merge it. But please, fix the API doc of Headers#add.

The original behavior was to normalize all header names so that they
were broken up into words, delimited by `-` or `_`, capitalize each word,
and then join the words together with a `-`.

This made it impossible to make a request with an underscore in the header
name, or with a different casing (ex: all caps).

However, the normalized name made it possible to access (or delete) headers,
without having to know the exact casing.

The new behavior is based on the following rules (as specified in httprb#524 (comment))

1) Fail if a header name is not specified as a String or Symbol
2) If the header name is specified as a Symbol, normalize it when writing it in a request.
If the header name is specified as a String, preserve it as-is when writing it in a request.
3) Allow lookup of any header using the normalized form of the name

I implemented this behavior by storing three elements for each header value:
1) normalized header name
2) header name as it will be written in a request
3) header value

Element 2 is the new addition. I considered just storing the header value
as it would be written, and only doing normalization during lookup, but
it seemed wasteful to potentially normalize the same value over and over
when searching through the list for various lookups. This way we only
normalize each name once, and can continue to use that value for lookups.
However, whenever asked for the contents (ex: via `each` or `keys`) we
return the new, non-normalized name.

Fixes: httprb#524
@joshuaflanagan
Copy link
Contributor Author

I've updated the API doc to reflect the current strict implementation that only accepts a string or symbol as the header name.

Copy link
Member

@ixti ixti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes things better than they were, thus OK to merge it.

@uberllama
Copy link

Hey folks, we ran into this today with a partner using non-standard api headers. I noticed it still hasn't been released. Is there an ETA?

@tarcieri
Copy link
Member

See the release tracking milestone here: https://github.com/httprb/http/milestone/11

There are some showstopper bugs which would be good to resolve before cutting another release. We would appreciate help from anyone interested in getting it over the line.

@Linuus
Copy link
Contributor

Linuus commented Nov 10, 2021

For anyone reading this later on. This is a breaking change. See: #700

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Non capitalized HTTP headers
5 participants