Safari breaks UTF-16 compressed strings when sending data through XMLHttpRequest #60

Closed
carminexx opened this issue Jul 24, 2015 · 10 comments


@carminexx

I noticed strange behavior when compressing with LZString.compressToUTF16 (in JavaScript), sending the compressed string via XMLHttpRequest to a web server written in Java, and then calling decompressFromUTF16 there (I tried both of the suggested Java ports of LZString).

With Chrome or Firefox, everything works. With Safari, however, the UTF-16 string generated by LZString is sent, but decoding it in the Java web service yields a null string.
After a bit of debugging, I noticed that:

  1. The decompression algorithm fails at a specific character.
  2. The UTF-16 string sent by Safari and the UTF-16 string received by the Java web service differ in some characters.

Is there an issue with UTF-16 encoding in JavaScript on Safari, or is this an LZString limitation?

@rquadling
Collaborator

As with many transports and storage mechanisms, some particular character may be getting escaped along the way. This then breaks the binary nature of the compressed data.

If you want to transmit the binary (though it is UTF-16), you will need to escape it appropriately for your requirement and then unescape it from within your service.
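
One way to read this advice, sketched under assumptions: instead of handing the compressed string to the transport as text, pack its 16-bit code units into bytes and send those as opaque binary, so that no charset conversion can touch them. The helper names `codeUnitsToBytes` and `bytesToCodeUnits` are hypothetical and not part of lz-string, and the big-endian byte order is an arbitrary choice that the receiving service would have to match.

```javascript
// Hypothetical helpers (not part of lz-string): pack a string's UTF-16
// code units into bytes, big-endian, and unpack them again.
function codeUnitsToBytes(str) {
  const bytes = new Uint8Array(str.length * 2);
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);   // 0..65535
    bytes[2 * i] = unit >> 8;         // high byte first
    bytes[2 * i + 1] = unit & 0xff;   // low byte second
  }
  return bytes;
}

function bytesToCodeUnits(bytes) {
  const units = [];
  // Read two bytes at a time; a trailing odd byte is ignored.
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    units.push(String.fromCharCode((bytes[i] << 8) | bytes[i + 1]));
  }
  return units.join("");
}
```

The resulting `Uint8Array` could be passed to `xhr.send()` with a binary content type such as `application/octet-stream`. On the Java side, the received bytes would then be decoded as UTF-16BE before calling decompressFromUTF16.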

@carminexx
Author

I suspected that the problem might be some sort of character escaping. Is there a list of "known" characters that Safari (and perhaps other browsers) escapes when using UTF-16?

@pieroxy
Owner

pieroxy commented Jul 24, 2015

The compressToUTF16 is a hack because old JS engines didn't know any other data type apart from number and booleans. It was never meant to get out of the browser. Plus, depending on your encoding when sending the string out of the browser, you may be wasting lots of space.

Can you post the code you use to send the data? Specifically, which content-type and/or encoding are you setting for your XHR?

I specifically created compressToEncodedURIComponent, which comes close to maximizing bandwidth efficiency over an application/x-www-form-urlencoded transmission. It will be faster, and it is not a hack: the output is pure ASCII.

@carminexx
Author

Thank you for the reply,
I'm using a plain old XMLHttpRequest to send the data, forcing the text/plain;charset=UTF-16 encoding (although I noticed that almost every browser rewrites it as text/plain;charset=UTF-8).

Something like:

    function _sendData(address, payload) {
        var xhr = new XMLHttpRequest();
        xhr.open("POST", address, true);
        xhr.overrideMimeType("text/plain;charset=UTF-16");
        xhr.send(payload);
    }

Regarding the compressToEncodedURIComponent function, I ran several benchmarks and noticed that the resulting string takes almost the same space as the output of compressToBase64. Is that correct?

For example, here's a simple benchmark:

Original: 281718 bytes (281.718 kB)
UTF-16 compressed: 48835 bytes (48.835 kB)
Binary encoding: 91564 bytes (91.564 kB)
Invalid UTF-16 encoding: 45782 bytes (45.782 kB)
Base 64 encoding: 122088 bytes (122.088 kB)
URI-encoded: 122085 bytes (122.085 kB)

(I measured each size by reading the resultString.length property.)
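
A note on the `_sendData` snippet above, offered as a sketch rather than a diagnosis: `overrideMimeType` changes how the response is interpreted, not how the request is labelled; the request's type is set with `setRequestHeader`. The XMLHttpRequest specification also has string bodies encoded as UTF-8, which would explain the charset being rewritten. The variant below therefore declares UTF-8 explicitly, so that the server decodes what was actually sent. The function name `sendText` and the injectable `XhrCtor` parameter (present only so the function can be exercised outside a browser) are assumptions for illustration.

```javascript
// Hypothetical variant of _sendData: label the request body as UTF-8.
// XhrCtor is injectable only so the sketch can run outside a browser.
function sendText(address, payload, XhrCtor) {
  const Ctor = XhrCtor || XMLHttpRequest;
  const xhr = new Ctor();
  xhr.open("POST", address, true);
  // Sets the request header; overrideMimeType would affect the response side.
  xhr.setRequestHeader("Content-Type", "text/plain;charset=UTF-8");
  xhr.send(payload);
  return xhr;
}
```

Whether this alone would avoid the Safari problem is not established in this thread; it only removes the mismatch between the declared and the actual request encoding.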

@pieroxy
Owner

pieroxy commented Jul 27, 2015

You are correct: compressToEncodedURIComponent is a variation on compressToBase64. But if you send it as UTF-8 (or plain ASCII, ISO-LATIN-1, ...), each character takes 8 bits. If you send the output of compressToUTF16, each character uses 16 bits, and if for some reason the browser defaults to the UTF-8 charset, each character will take much more than 16 bits on average and you will waste a lot of bandwidth.
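
To make the bandwidth point concrete, here is a small sketch that counts the bytes a string occupies in UTF-8 versus UTF-16. It relies on the standard `TextEncoder` API; the function names are hypothetical. In UTF-8, code points below U+0080 take one byte, those below U+0800 take two, and the rest of the Basic Multilingual Plane takes three, so a string made mostly of high code units costs up to 24 bits per character instead of 16.

```javascript
// Bytes needed to send `str` encoded as UTF-8.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

// Bytes needed if every code unit is sent as two bytes (UTF-16).
function utf16ByteLength(str) {
  return str.length * 2;
}

const sample = "\u4e2d\u6587\u5b57"; // three characters, all above U+0800
console.log(utf8ByteLength(sample));  // 9
console.log(utf16ByteLength(sample)); // 6
```

By the same arithmetic, the 48835-character compressToUTF16 result from the benchmark earlier in the thread could occupy up to 146505 bytes as UTF-8, which is more than the 122088 characters of the Base64 form.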

@carminexx
Author

I noticed that on some browsers, when the content is sent as UTF-8 instead of UTF-16, the space saved thanks to UTF-16 is gone.
Regarding LZString, wouldn't it be better to have a compressToUTF8 function, in order to save a bit more space in comparison to Base64/URIencoded strings?

@pieroxy
Owner

pieroxy commented Jul 28, 2015

base64 is using 6 bits out of 8 and the best I could do with UTF-8 would be to use 7 bits out of 8. Maybe there's something to be gained here. The downside is that none of the existing ports of lz-string in other languages support this encoding.

In the meantime, I suggest you use the compressToEncodedURIComponent variety. It's not that bad. It'll consume 14% more space.
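
The 6-bits-out-of-8 figure can be checked against the benchmark earlier in the thread. The sketch below uses the standard padded Base64 size formula (4 output characters per 3 input bytes); `base64Length` is a hypothetical helper, and lz-string's own encoder may differ in detail.

```javascript
// Length of standard padded Base64 output for `byteCount` input bytes:
// every group of 3 bytes (24 bits) becomes 4 characters of 6 bits each.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

console.log(base64Length(91564)); // 122088
```

The result equals the "Base 64 encoding" figure reported for the 91564-byte binary output. Each Base64 character carries 6 bits but occupies 8 on the wire, hence the 4/3 expansion over raw binary.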

@pieroxy
Owner

pieroxy commented Jul 28, 2015

Actually, I read the UTF-8 spec wrong. I cannot use 7 out of 8 bits, but just 6. So we're back to base64.

@pieroxy
Owner

pieroxy commented Aug 5, 2015

Have you been able to figure something out on this issue?

@patrickberkeley

We ran into this as well, i.e. LZString.compressToUTF16 over XHR does not work in Safari. Oddly, it does work over WebSocket.

Our solution was the same as @ktsaou: switch to https://github.com/nodeca/pako.
