Support Unicode in JSON input #5504

moneromooo-monero · 2019-04-30T21:09:49Z

No description provided.

vtnerd · 2019-05-01T00:25:38Z

contrib/epee/include/storages/parserse_base_utils.h

```diff
@@ -162,6 +185,57 @@ namespace misc_utils
              val.push_back('\\');break;
            case '/':  //Slash character
              val.push_back('/');break;
+            case 'u':  //Unicode code point
+              if (it + 1 == buf_end || it + 2 == buf_end || it + 3 == buf_end || it + 4 == buf_end)
```

```cpp
if (buf_end - it < 4)
```
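A minimal sketch of the length check, assuming (as in the `case 'u':` branch above) that `it` points at the `u`, so the four hex digits sit at `it + 1` through `it + 4`. Counting the `u`, at least five characters must remain, which is the same condition the diff expresses with its four `== buf_end` comparisons. The helper name `has_room_for_u_escape` is hypothetical, not part of epee.

```cpp
#include <cassert>

// Hypothetical helper: true if four hex digits can follow `it`, where `it`
// is assumed to point at the 'u' of a "\uXXXX" escape.
template <class Iterator>
bool has_room_for_u_escape(Iterator it, Iterator buf_end)
{
  assert(it != buf_end);  // `it` must point at the 'u' itself
  // Digits live at it + 1 .. it + 4, so it + 4 must still be before buf_end,
  // i.e. at least 5 characters remain counting the 'u'.
  return buf_end - it >= 5;
}
```

With a `const char*` buffer holding `u00e9"` the check passes; with only `u00e` left it fails, so the loop never dereferences `buf_end`.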

vtnerd · 2019-05-01T00:33:44Z

contrib/epee/include/storages/parserse_base_utils.h

```diff
+              }
+              else
+              {
+                uint32_t dst = 0;
```


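This portion could be written as:

```cpp
uint16_t dst = 0;
for (unsigned count = 0; count < 4; ++count)
{
  dst <<= 4;
  // isx is expected to map a hex digit to its value and anything else to 0xff;
  // the cast keeps a signed char from producing a negative index
  const unsigned char tmp = isx[static_cast<unsigned char>(*++it)];
  CHECK_AND_ASSERT_THROW_MES(tmp != 0xff, "Bad unicode encoding");
  dst |= tmp;
}
```

Also note the uint16_t: four hex digits encode at most 0xFFFF, so that is the entire range that can be extracted here.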
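A self-contained sketch of the same loop without epee's `isx` table or `CHECK_AND_ASSERT_THROW_MES` macro; it reports a bad digit through a `bool` instead of throwing. The names `hex_value` and `parse_hex4` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Value of one hex digit, or -1 if `c` is not [0-9a-fA-F].
inline int hex_value(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Parses exactly four hex digits starting at `p` into `out`.
// Four digits cover 0x0000..0xFFFF, so uint16_t holds every possible result.
inline bool parse_hex4(const char* p, std::uint16_t& out)
{
  assert(p != nullptr);
  std::uint16_t dst = 0;
  for (unsigned count = 0; count < 4; ++count)
  {
    const int v = hex_value(p[count]);
    if (v < 0)
      return false;  // reject a non-hex character instead of throwing
    dst = static_cast<std::uint16_t>((dst << 4) | v);
  }
  out = dst;
  return true;
}
```

Using a range check in `hex_value` rather than a 256-entry table trades a little speed for not having to index with a possibly signed `char`.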

vtnerd · 2019-05-01T00:43:35Z

contrib/epee/include/storages/parserse_base_utils.h

```diff
+                  val.push_back(0x80 | ((dst >> 6) & 0x3f));
+                  val.push_back(0x80 | (dst & 0x3f));
+                }
+                else if (dst <= 0x10ffff)
```


This value range is not possible with 4 hex characters. Anything in this range is provided as a UTF-16 surrogate pair, which requires parsing two 16-bit values. If the first 16-bit value is between 0xD800–0xDBFF, then another 16-bit value between 0xDC00–0xDFFF must follow, and the two together represent the code point.
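The pair combination is the standard UTF-16 decoding rule: each surrogate carries 10 bits, giving `cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)`. A minimal sketch (function names are illustrative, not epee's):

```cpp
#include <cassert>
#include <cstdint>

inline bool is_high_surrogate(std::uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_low_surrogate(std::uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Combines a UTF-16 surrogate pair into one code point in 0x10000..0x10FFFF.
// The high surrogate supplies the upper 10 bits, the low surrogate the lower 10.
inline std::uint32_t combine_surrogates(std::uint16_t hi, std::uint16_t lo)
{
  assert(is_high_surrogate(hi) && is_low_surrogate(lo));
  return 0x10000u + ((static_cast<std::uint32_t>(hi) - 0xD800u) << 10)
                  + (static_cast<std::uint32_t>(lo) - 0xDC00u);
}
```

For example, the JSON escape `\ud83d\ude00` combines to U+1F600, and the extreme pairs map to 0x10000 and 0x10FFFF, which is why a single `\uXXXX` can never reach the `dst <= 0x10ffff` branch on its own.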

moneromooo-monero · May 1, 2019

Do you know how to encode code points > 0xffff? All the examples I've found use \uxxxx with 4 digits.

moneromooo-monero · May 1, 2019

Oh, you mean this encoding is actually UTF-16, not raw code points?

vtnerd · May 2, 2019

Yes, the RFC says the value is UTF-16, and a peek at RapidJSON shows that at least one major implementation has done it this way.
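Taking the UTF-16 interpretation at face value, decoding `\uXXXX` escapes means pairing surrogates first and only then encoding the resulting code point as UTF-8. The sketch below assumes the escapes have already been parsed into 16-bit units; `append_utf8` and `utf16_units_to_utf8` are hypothetical names, and rejecting an unpaired surrogate is one policy choice (a decoder could instead substitute U+FFFD).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends code point `cp` (a Unicode scalar value) to `out` as 1 to 4 UTF-8 bytes.
inline void append_utf8(std::string& out, std::uint32_t cp)
{
  assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
  if (cp < 0x80)
    out.push_back(static_cast<char>(cp));
  else if (cp < 0x800)
  {
    out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
  else if (cp < 0x10000)
  {
    out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
  else
  {
    out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
}

// Decodes UTF-16 code units (values of consecutive \uXXXX escapes) into UTF-8.
// Returns false on an unpaired surrogate.
inline bool utf16_units_to_utf8(const std::uint16_t* units, std::size_t count, std::string& out)
{
  for (std::size_t i = 0; i < count; ++i)
  {
    const std::uint32_t u = units[i];
    if (u >= 0xD800 && u <= 0xDBFF)  // high surrogate: a low surrogate must follow
    {
      if (i + 1 == count || units[i + 1] < 0xDC00 || units[i + 1] > 0xDFFF)
        return false;
      const std::uint32_t lo = units[++i];
      append_utf8(out, 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
    }
    else if (u >= 0xDC00 && u <= 0xDFFF)  // low surrogate without a preceding high one
      return false;
    else
      append_utf8(out, u);
  }
  return true;
}
```

With this split, `\u00e9` becomes the two bytes C3 A9 and `\ud83d\ude00` becomes the four bytes F0 9F 98 80.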

eeca5ca epee: support unicode in parsed strings (moneromooo-monero)
3e11bb5 functional_tests: test creating wallets with local language names (moneromooo-monero)

vtnerd · 2019-08-18T23:22:58Z

Sorry for not re-reviewing this PR earlier: UTF-16 surrogate pairs are still not handled properly. Anything outside the Basic Multilingual Plane will decode incorrectly (though such characters are at least less common).
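As an illustration of what decoding incorrectly can look like (a hypothetical failure mode, not a claim about what the merged code does): if each `\uXXXX` is encoded to UTF-8 on its own, the two surrogates of U+1F600 become six bytes in CESU-8 style, which strict UTF-8 decoders reject, instead of the four bytes F0 9F 98 80. The function name `naive_utf16_to_utf8` is illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Deliberately flawed decoder: treats every 16-bit value as a code point and
// encodes it separately (1 to 3 UTF-8 bytes). Surrogates are never paired,
// so characters outside the BMP come out as two invalid 3-byte sequences.
inline std::string naive_utf16_to_utf8(const std::uint16_t* units, std::size_t count)
{
  assert(units != nullptr || count == 0);
  std::string out;
  for (std::size_t i = 0; i < count; ++i)
  {
    const std::uint16_t u = units[i];
    if (u < 0x80)
      out.push_back(static_cast<char>(u));
    else if (u < 0x800)
    {
      out.push_back(static_cast<char>(0xC0 | (u >> 6)));
      out.push_back(static_cast<char>(0x80 | (u & 0x3F)));
    }
    else
    {
      out.push_back(static_cast<char>(0xE0 | (u >> 12)));
      out.push_back(static_cast<char>(0x80 | ((u >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (u & 0x3F)));
    }
  }
  return out;
}
```

A test that feeds `\ud83d\ude00` and expects F0 9F 98 80 would catch this class of bug, while BMP-only inputs such as `\u00e9` pass either way.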

vtnerd · 2019-08-18T23:24:57Z

I will try to update it with tests later this week.

vtnerd reviewed May 1, 2019


moneromooo-monero force-pushed the euni branch 2 times, most recently from d012094 to 4144dbf, May 1, 2019 18:58

xiphon approved these changes Jul 31, 2019


moneromooo-monero added 2 commits August 16, 2019 17:06

epee: support unicode in parsed strings (eeca5ca)

functional_tests: test creating wallets with local language names (3e11bb5)

moneromooo-monero force-pushed the euni branch from 4144dbf to 3e11bb5, August 16, 2019 21:12

luigi1111 added a commit that referenced this pull request Aug 17, 2019

Merge pull request #5504 (14602ba)


luigi1111 merged commit 3e11bb5 into monero-project:master Aug 17, 2019
