Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider removing napi_get_value_string_length #226

Closed
jasongin opened this issue Apr 12, 2017 · 5 comments
Closed

Consider removing napi_get_value_string_length #226

jasongin opened this issue Apr 12, 2017 · 5 comments

Comments

@jasongin
Copy link
Member

I regret adding this API now, because it doesn't serve much purpose, and is only likely to cause confusion and bugs. The intention was that this would return the number of "characters" in a string independent of encoding, but that's not generally useful. In almost all cases, one of the encoding-specific napi_get_value_string_* APIs is more correct. (Pass a null buffer if only the encoded length is desired.)

Anyway the current V8 implementation of napi_get_value_string_length() is technically wrong: it returns the number of 2-byte code points of the UTF-16 encoding, but there are actually characters that are encoded as two UTF-16 code points.

@kkoopa
Copy link

kkoopa commented Apr 12, 2017

You are probably right. It is better to err on the side of caution. Character encoding should be added to the list of 2 hard problems in CS, which would now be the 3 hard problems:

  1. Cache invalidation
  2. Naming
  3. Character encoding
  4. Off-by-one errors

@jasongin
Copy link
Member Author

And in case anyone is wondering, the JavaScript String.prototype.length property returns the number of UTF-16 code units, which may be different from the number of characters. So, getting the true character count is not common with JavaScript, and is probably best left to specialized internationalization libraries.

@trevnorris
Copy link

@jasongin Personally I think the byte length would be the most useful b/c that's what's needed if you want to copy out the string. For that you could just copy the method used by Buffer.byteLength(), and in the same way could extend the API to accept an encoding.

@jasongin
Copy link
Member Author

@trevnorris There are other APIs to get byte length, that is why this API should just be removed.

napi_get_value_string_utf8() gets the length of the UTF-8 encoding of a string, in bytes.
napi_get_value_string_utf16() gets the length of the UTF-16 encoding of a string, in 2-byte code units.
In both of those, if you just want the length without copying the string, you can pass in a null string buffer.

@mhdawson
Copy link
Member

I'm +1 for removing. Its safer to start with a smaller API and add back than to be stuck with methods that cause confusion and are not really needed.

jasongin added a commit to jasongin/nodejs that referenced this issue Apr 18, 2017
This API doesn't serve much purpose, and is only likely to cause
confusion and bugs. The intention was that this would return the
number of characters in a string independent of encoding, but
that's not generally useful. In almost all cases, one of the
encoding-specific napi_get_value_string_* APIs is more correct.
(Pass a null buffer if only the encoded length is desired.)

Anyway the current implementation of napi_get_value_string_length()
is technically wrong: it returns the number of 2-byte code units of
the UTF-16 encoding, but there are actually some characters that
are encoded as two UTF-16 code units.

Note the JavaScript String.prototype.length property returns the
number of UTF-16 code units, which may be different from the number
of characters. So, getting the true character count is not common
with JavaScript, and is probably best left to specialized
internationalization libraries.

Fixes: nodejs/abi-stable-node#226
gabrielschulhof pushed a commit to gabrielschulhof/node that referenced this issue Apr 10, 2018
This API doesn't serve much purpose, and is only likely to cause
confusion and bugs. The intention was that this would return the
number of characters in a string independent of encoding, but
that's not generally useful. In almost all cases, one of the
encoding-specific napi_get_value_string_* APIs is more correct.
(Pass a null buffer if only the encoded length is desired.)

Anyway the current implementation of napi_get_value_string_length()
is technically wrong: it returns the number of 2-byte code units of
the UTF-16 encoding, but there are actually some characters that
are encoded as two UTF-16 code units.

Note the JavaScript String.prototype.length property returns the
number of UTF-16 code units, which may be different from the number
of characters. So, getting the true character count is not common
with JavaScript, and is probably best left to specialized
internationalization libraries.

PR-URL: nodejs#12496
Fixes: nodejs/abi-stable-node#226
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
MylesBorins pushed a commit to nodejs/node that referenced this issue Apr 16, 2018
This API doesn't serve much purpose, and is only likely to cause
confusion and bugs. The intention was that this would return the
number of characters in a string independent of encoding, but
that's not generally useful. In almost all cases, one of the
encoding-specific napi_get_value_string_* APIs is more correct.
(Pass a null buffer if only the encoded length is desired.)

Anyway the current implementation of napi_get_value_string_length()
is technically wrong: it returns the number of 2-byte code units of
the UTF-16 encoding, but there are actually some characters that
are encoded as two UTF-16 code units.

Note the JavaScript String.prototype.length property returns the
number of UTF-16 code units, which may be different from the number
of characters. So, getting the true character count is not common
with JavaScript, and is probably best left to specialized
internationalization libraries.

Backport-PR-URL: #19447
PR-URL: #12496
Fixes: nodejs/abi-stable-node#226
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants