[CSL-2435] Use 'TextEncoder' to measure correct length of UTF-8 strings #840
I can confirm that all our acceptance tests are passing!
My manual testing has also proven that the fix works! 🎉
@DominikGuzei @darko-mijic we need to get this PR merged as soon as possible so that we can resume the release process...
@nikolaglumac you forgot to push the acceptance tests implementation?
No @DominikGuzei - I have added them in the first commit: 521d30a
@nikolaglumac ah sorry … i just saw the feature file and thought there are some implementation steps missing - but you just reused existing functionality 👍
@nikolaglumac i will merge this now as it works as intended. Please feel free to create an issue on YouTrack with the proposal by @cleverca22
Yes that single test captures the issue (in the code without the fix) |
JavaScript's `length` property (even with ES6) actually counts the UTF-16 code units of the given string. Apart from mishandling surrogate pairs (counting, for instance, 💩 as two characters because it is encoded as a pair of code units), it also leads to unexpected results when a character's UTF-8 and UTF-16 encodings differ in size.
Since requests are made using UTF-8 encoding, the Content-Length of these requests should match the actual byte length of the payload once it is UTF-8 encoded.
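To illustrate the mismatch, here is a minimal sketch comparing `String.prototype.length` with the byte count produced by `TextEncoder`. The helper name `utf8ByteLength` is hypothetical and not taken from this PR's diff; it only assumes a runtime where `TextEncoder` is available globally (modern browsers, recent Node.js).

```javascript
// Hypothetical helper (not from the PR): number of bytes the string
// occupies once encoded as UTF-8, which is what Content-Length must report.
function utf8ByteLength(text) {
  // TextEncoder always encodes to UTF-8 and returns a Uint8Array.
  return new TextEncoder().encode(text).length;
}

// Escapes are used so the exact code points are unambiguous.
const samples = [
  'abc',           // ASCII: 1 code unit and 1 byte per character
  '\u00e9',        // é: 1 UTF-16 code unit, 2 UTF-8 bytes
  '\u20ac',        // €: 1 UTF-16 code unit, 3 UTF-8 bytes
  '\u{1F4A9}',     // 💩: 2 UTF-16 code units (surrogate pair), 4 UTF-8 bytes
];

for (const s of samples) {
  console.log(JSON.stringify(s), 'length =', s.length, 'UTF-8 bytes =', utf8ByteLength(s));
}
```

For any payload containing non-ASCII characters, `length` undercounts the bytes actually sent, so a Content-Length derived from it would be too small.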
This PR is a port of #839
Acceptance test results:
Before the fix:
After the fix: