Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Unescaping URI doesn't recreate unicode characters #86

Closed
hesselink opened this Issue · 6 comments

3 participants

Erik Hesselink Johan Tibell Jeremy Shaw
Erik Hesselink

Since 23c1007 (see #51), unicode characters in strings are escaped the way browsers do: encode as UTF8, and use the bytes from that to percent-encode. However, during unescaping no UTF8-decoding is done. This means that (decode . encode) is not the identity, and results in malformed strings. For example, if I take the string from test case testEscapeURIString06 and unescape it, I get:

> putStrLn $ unEscapeString (escapeURIString isUnescapedInURIComponent "helloø©日本")
helloø©日本
Johan Tibell
Owner

@singpolyma Could you please take a look?

Stephen Paul Weber singpolyma referenced this issue
Merged

TDD'd #86 #87

Johan Tibell tibbe closed this in f2168b1
Jeremy Shaw

Next time you completely change the behavior of a function like unEscapeString completely breaking any code that requires the behavior that has been present for a decade, it would be nice if you didn't give the package the tiniest version number bump possible.

Johan Tibell
Owner

My apologies. Do we need a new point release to reinstate the old behavior (and make a new release the introduces the new behavior under a major version bump).

Jeremy Shaw

I'm not sure. Releasing a new point release might be nice. People who have not yet upgraded to network 2.4.1.{1|2} would then get a working version like 2.4.1.3 if they got updated to the lastest 2.4.* network. People who have the 'broken' versions wouldn't see the benefit until they get upgraded of course.

For my particular needs.. I am just going to update happstack-server and web-routes and release new versions. I will probably just stop using the unEscapeString function so that I can allow a wider range of versions for network and not have to worry about the changing unEscapeString behavior. Once I do that, it won't really make any difference to me whether you release a 2.4.1.3 or not.

Looking at the Haskell community as a whole.. there are two types of programs out there.

  1. programs that worked correctly because they understood the old behavior and dealt with it.

  2. programs that worked incorrectly before, but are now fixed by this.

Anything in #1 is probably now broken.

I am also a bit wishy-washy about whether the new behavior is entirely correct. I guess the old behavior was just 'percentDecode' and the new behavior is 'percentDecodeAndUtf8Decode'. For URIs you pretty much always want the latter. But, in theory, you might have been using 'unEscapeString' to handle something like this:

Content-Type: application/x-www-form-urlencoded; charset=binary

Though, that is unlike to come up in practice I think. I'm guessing most browsers ignore the charset=binary in a form tag. I think I've experimented with that before.

Ultimately, you have the problem of notifying developers that something significant changed in unEscapeString. I think Gregory Collins and I (or perhaps it was mightybyte) independently came to the conclusion that the only safe way to do it is to create a new function with a new name for the new behavior, and then mark the old version as deprecated. This will at least force developers to think for a minute and maybe read some documentation about what they need to do to change their code for the new function. As it stands now, even if you had waited until network 2.5 to make the changes, most people would have just checked that their code compiled and assumed everything was A-OK. The downside is that 'unEscapeString' is a pretty good name for the new function. You don't really want to have to call it something like 'unEscapeAndDecodeString'.

Another option might be to add a WARNING pragma for a few releases after you update network. It is not as noticeable as the function disappearing, but is more likely to be seen than some changelog entry on github?

Anyway. In theory, releasing a network 2.4.1.3 with the old behavior and a 2.5 with the new behavior would minimize the damages, provided that other people are actually following the PVP and that they know they need to update their code when they switch to 2.5. If they don't know they need to update their code.. then releasing 2.4.1.3/2.5 would just delay the problem slightly.

So, it's up to you I guess. I'll have new updated packages out soon enough that it won't matter to me anymore. A network 2.5 with unEscapeText would be nifty though ;) But, at the same time, adding the text dependency is not thrilling.

Erik Hesselink

While it probably would have been good to use a major version bump for changes like this, I'd prefer it to leave things as they are now. I've already had to update our packages twice. Reverting and doing a new major release requires extra code, tricky cabal version constraints and more work again for us. I'd also like to note that when I upgraded to network 2.4, I overlooked the initial change and broke our application (for a very short while). So even a major bump doesn't always help. I did mean that we discovered the cause quickly, since there was an explicit action on our part that caused it: relaxing the dependency on network.

For the future, I'd prefer:

  • A major release for changes like these. They are API changing, even if the types don't tell you that.
  • A changelog listing the changes. I often look at the diffs between version, but there is so much noise and unfamiliarity with the code that it is hard to find out what has changed.
Johan Tibell
Owner

I will leave things as they are for now and try to make sure we bump the major version number next time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.