-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unescaping URI doesn't recreate unicode characters #86
Comments
@singpolyma Could you please take a look? |
Next time you completely change the behavior of a function like unEscapeString completely breaking any code that requires the behavior that has been present for a decade, it would be nice if you didn't give the package the tiniest version number bump possible. |
My apologies. Do we need a new point release to reinstate the old behavior (and make a new release the introduces the new behavior under a major version bump). |
I'm not sure. Releasing a new point release might be nice. People who have not yet upgraded to network 2.4.1.{1|2} would then get a working version like 2.4.1.3 if they got updated to the lastest 2.4.* network. People who have the 'broken' versions wouldn't see the benefit until they get upgraded of course. For my particular needs.. I am just going to update happstack-server and web-routes and release new versions. I will probably just stop using the unEscapeString function so that I can allow a wider range of versions for network and not have to worry about the changing unEscapeString behavior. Once I do that, it won't really make any difference to me whether you release a 2.4.1.3 or not. Looking at the Haskell community as a whole.. there are two types of programs out there.
Anything in #1 is probably now broken. I am also a bit wishy-washy about whether the new behavior is entirely correct. I guess the old behavior was just 'percentDecode' and the new behavior is 'percentDecodeAndUtf8Decode'. For URIs you pretty much always want the latter. But, in theory, you might have been using 'unEscapeString' to handle something like this:
Though, that is unlike to come up in practice I think. I'm guessing most browsers ignore the charset=binary in a form tag. I think I've experimented with that before. Ultimately, you have the problem of notifying developers that something significant changed in unEscapeString. I think Gregory Collins and I (or perhaps it was mightybyte) independently came to the conclusion that the only safe way to do it is to create a new function with a new name for the new behavior, and then mark the old version as deprecated. This will at least force developers to think for a minute and maybe read some documentation about what they need to do to change their code for the new function. As it stands now, even if you had waited until network 2.5 to make the changes, most people would have just checked that their code compiled and assumed everything was A-OK. The downside is that 'unEscapeString' is a pretty good name for the new function. You don't really want to have to call it something like 'unEscapeAndDecodeString'. Another option might be to add a WARNING pragma for a few releases after you update network. It is not as noticeable as the function disappearing, but is more likely to be seen than some changelog entry on github? Anyway. In theory, releasing a network 2.4.1.3 with the old behavior and a 2.5 with the new behavior would minimize the damages, provided that other people are actually following the PVP and that they know they need to update their code when they switch to 2.5. If they don't know they need to update their code.. then releasing 2.4.1.3/2.5 would just delay the problem slightly. So, it's up to you I guess. I'll have new updated packages out soon enough that it won't matter to me anymore. A network 2.5 with unEscapeText would be nifty though ;) But, at the same time, adding the text dependency is not thrilling. |
While it probably would have been good to use a major version bump for changes like this, I'd prefer it to leave things as they are now. I've already had to update our packages twice. Reverting and doing a new major release requires extra code, tricky cabal version constraints and more work again for us. I'd also like to note that when I upgraded to network 2.4, I overlooked the initial change and broke our application (for a very short while). So even a major bump doesn't always help. I did mean that we discovered the cause quickly, since there was an explicit action on our part that caused it: relaxing the dependency on network. For the future, I'd prefer:
|
I will leave things as they are for now and try to make sure we bump the major version number next time. |
Since 23c1007 (see #51), unicode characters in strings are escaped the way browsers do: encode as UTF8, and use the bytes from that to percent-encode. However, during unescaping no UTF8-decoding is done. This means that (decode . encode) is not the identity, and results in malformed strings. For example, if I take the string from test case
testEscapeURIString06
and unescape it, I get:The text was updated successfully, but these errors were encountered: