
buffer: add buffer.transcode #9038

Closed
wants to merge 1 commit into from

Conversation

@jasnell
Member

commented Oct 11, 2016

Checklist
  • make -j8 test (UNIX), or vcbuild test nosign (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

buffer

Description of change

Add buffer.transcode(source, from, to) method. Primarily uses ICU to transcode a buffer's content from one of Node.js' supported encodings to another.
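For readers skimming the thread, here is a minimal usage sketch of the method as described above. It assumes a Node.js build with ICU (Intl) support; the `utf8` to `ucs2` pairing and the sample character are chosen purely for illustration.

```javascript
const buffer = require('buffer');

// U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
const source = Buffer.from('\u00e9', 'utf8');

// Re-encode the same text as UCS-2 / UTF-16LE. A new Buffer is returned
// and the source Buffer is left untouched.
const transcoded = buffer.transcode(source, 'utf8', 'ucs2');

console.log(source);      // <Buffer c3 a9>
console.log(transcoded);  // <Buffer e9 00>
```

The result holds the same character in a different byte representation, so decoding it with `toString('ucs2')` gives back the original text.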

Originally part of a proposal to add a new unicode module. Decided to refactor the approach towards individual PRs without a new module.

Refs: #8075
/cc @trevnorris @addaleax

@jasnell

Member Author

commented Oct 11, 2016

lib/internal/buffer.js Outdated
const Buffer = require('buffer').Buffer;
const normalizeEncoding = require('internal/util').normalizeEncoding;

if (process.binding('config').hasIntl) {

@trevnorris

trevnorris Oct 11, 2016

Contributor

what about possibly placing this in lib/internal/buffer-transcode.js and conditionally require()'ing it. purely cosmetic though, to prevent an extra level of indent. or you could just return early. :)

if (!process.binding('config').hasIntl)
  return;
lib/internal/buffer.js Outdated
// Buffer instance.
exports.transcode = function transcode(source, from_enc, to_enc) {
  if (!source || !(source.buffer instanceof ArrayBuffer))
    throw new TypeError('"source" argument must be a Buffer');

@trevnorris

trevnorris Oct 11, 2016

Contributor

Are we going to get complaints about this not supporting SharedArrayBuffer?

@jasnell

jasnell Oct 11, 2016

Author Member

Eventually, perhaps. Not too worried about that for now.
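To make the concern concrete, here is a small sketch of how the `instanceof ArrayBuffer` check quoted above behaves for a Buffer backed by a SharedArrayBuffer. The use of `util.types.isAnyArrayBuffer()` is only one possible alternative and an assumption of this sketch; it is not what the PR does.

```javascript
const { types } = require('util');

// A Buffer can be a view over a SharedArrayBuffer...
const shared = Buffer.from(new SharedArrayBuffer(4));

// ...in which case the instanceof check from the snippet above rejects it.
const passesInstanceof = shared.buffer instanceof ArrayBuffer;

// util.types.isAnyArrayBuffer() accepts both kinds of backing store.
const passesAnyCheck = types.isAnyArrayBuffer(shared.buffer);

console.log(passesInstanceof, passesAnyCheck);  // false true
```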

@jasnell

Member Author

commented Oct 12, 2016

Updated

doc/api/buffer.md Outdated
Returns a new `Buffer` instance.

Throws if transcoding is not possible or if one of the specified encodings is
invalid or unknown.

@addaleax

addaleax Oct 12, 2016

Member

Maybe give an example of when transcoding is not possible?
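One case that could serve as such an example, sketched on the basis of the doc text above, which says the method throws for an invalid or unknown encoding. The label `'not-a-real-encoding'` is made up for the illustration, and the exact error message is deliberately not relied on.

```javascript
const buffer = require('buffer');

let threw = false;
try {
  // 'not-a-real-encoding' is a deliberately invalid encoding label.
  buffer.transcode(Buffer.from('abc'), 'not-a-real-encoding', 'utf8');
} catch (err) {
  threw = true;
  console.log(`transcode failed: ${err.message}`);
}
```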

src/node_i18n.cc Outdated
return e;
}

#define THROW_ICU_ERROR(env, status, msg) \

@addaleax

addaleax Oct 12, 2016

Member

Hm – this could be a ThrowICUError function, right?

src/node_i18n.cc Outdated

MaybeLocal<Object> AsBuffer(Isolate* isolate,
                            MaybeStackBuffer<char>* buf,
                            size_t len) {

@addaleax

addaleax Oct 12, 2016

Member

This is fine but at some point this might become a member of the MaybeStackBuffer class? I realize that would conflict a bit with the MaybeStackBuffer<UChar> overload, maybe leave a TODO here?

lib/internal/buffer.js Outdated
const icu = process.binding('icu');

// Maps the supported transcoding conversions. The top key is the from_enc,
// the child key is the to_enc. The value is the transcoding function to.

@addaleax

addaleax Oct 12, 2016

Member

Is that final "to" a residue from editing?

@jasnell

jasnell Oct 12, 2016

Author Member

Yeah, slight brain malfunction there I think ;-)

lib/internal/buffer.js Outdated
      return source.toString('base64');
    },
    'hex': (source) => {
      return source.toString('hex');

@addaleax

addaleax Oct 12, 2016

Member

Mhhh this returns a string or a Buffer depending on the target encoding? I don’t think binary-to-text encodings should be allowed here, .toString() is the right method for them.

@jasnell

jasnell Oct 12, 2016

Author Member

Yeah, you're right. I'll pull these back out.
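The distinction raised here can be seen directly: binary-to-text encodings such as `hex` and `base64` turn bytes into a string, so `.toString()` is the natural place for them. A short sketch with arbitrary sample bytes:

```javascript
const bytes = Buffer.from([0xde, 0xad, 0xbe, 0xef]);

// Binary-to-text encodings produce strings, not Buffers.
const hex = bytes.toString('hex');
const base64 = bytes.toString('base64');

console.log(typeof hex, hex);        // string deadbeef
console.log(typeof base64, base64);  // string 3q2+7w==
```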

@jasnell jasnell force-pushed the jasnell:buffer-transcode branch Oct 12, 2016

@jasnell

Member Author

commented Oct 12, 2016

@addaleax ... updated! PTAL

@addaleax
Member

left a comment

LGTM, nice!

src/node_i18n.cc Outdated
  const uint8_t hi = static_cast<uint8_t>(ts_obj_data[n + 0]);
  const uint8_t lo = static_cast<uint8_t>(ts_obj_data[n + 1]);
  swapspace[i] = (hi << 8) | lo;
}

@addaleax

addaleax Oct 12, 2016

Member

Could this use SwapBytes16?

src/node_i18n.cc Outdated
  const uint8_t hi = static_cast<uint8_t>(ts_obj_data[n + 0]);
  const uint8_t lo = static_cast<uint8_t>(ts_obj_data[n + 1]);
  swapspace[i] = (hi << 8) | lo;
}

@addaleax

addaleax Oct 12, 2016

Member

(ditto)
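For illustration only, the byte-pair loop quoted above can be mirrored in JavaScript and compared with a library helper that does the same reordering. This is a sketch, not the C++ `SwapBytes16` helper mentioned in the review; `codeUnitsFromBigEndian` is a hypothetical name, and `Buffer.prototype.swap16()` stands in as the "existing helper".

```javascript
// Assemble 16-bit code units from big-endian byte pairs, mirroring the
// (hi << 8) | lo loop quoted above.
function codeUnitsFromBigEndian(bytes) {
  const units = new Uint16Array(bytes.length >> 1);
  for (let n = 0, i = 0; i < units.length; n += 2, i += 1) {
    const hi = bytes[n];
    const lo = bytes[n + 1];
    units[i] = (hi << 8) | lo;
  }
  return units;
}

const bigEndian = Buffer.from([0x00, 0x68, 0x00, 0x69]);  // "hi" in UTF-16BE
const units = codeUnitsFromBigEndian(bigEndian);
console.log(String.fromCharCode(...units));  // hi

// The same reordering with a library helper: swap16() flips each byte pair
// in place (here on a copy), after which the data reads as UTF-16LE.
const swapped = Buffer.from(bigEndian).swap16();
console.log(swapped.toString('utf16le'));  // hi
```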

src/node_i18n.cc Outdated
MaybeStackBuffer<char> buf;
int32_t len;

u_strToUTF8(*buf, 1024, &len, source, length, &status);

@addaleax

addaleax Oct 12, 2016

Member

The 1024 seems kind of magic here, although I realize that is largely my fault. 😄 (Not sure if there’s anything to do about that)

@jasnell

jasnell Oct 13, 2016

Author Member

Should be fixed now!

lib/internal/buffer.js Outdated

// Transcodes the Buffer from one encoding to another, returning a new
// Buffer instance.
exports.transcode = function transcode(source, from_enc, to_enc) {

@bnoordhuis

bnoordhuis Oct 13, 2016

Member

Style: s/from_enc/fromEncoding/ and s/to_enc/toEncoding/. Ditto for cnv_from and cnv_to.

src/node_i18n.cc Outdated
msg = "Unspecified ICU Exception";
Local<String> cons =
String::Concat(estring, FIXED_ONE_BYTE_STRING(env->isolate(), ", "));
cons = String::Concat(cons, OneByteString(env->isolate(), msg));

@bnoordhuis

bnoordhuis Oct 13, 2016

Member

I realize you adapted this code from elsewhere but using snprintf() to format the error message will be much more efficient.

src/node_i18n.cc Outdated
String::Concat(estring, FIXED_ONE_BYTE_STRING(env->isolate(), ", "));
cons = String::Concat(cons, OneByteString(env->isolate(), msg));

Local<Value> e = Exception::Error(cons);

@bnoordhuis

bnoordhuis Oct 13, 2016

Member

Needs an if (e.IsEmpty()) return Local<Value>();.

src/node_i18n.cc Outdated
size_t len) {
  if (buf->IsAllocated()) {
    MaybeLocal<Object> ret = Buffer::New(isolate, buf->out(), len);
    buf->Release();

@bnoordhuis

bnoordhuis Oct 13, 2016

Member

I think this should be if (!ret.IsEmpty()) buf->Release(); - it's leaking memory now when the buffer can't be created.

src/node_i18n.cc Outdated
  if (buf->IsAllocated()) {
    MaybeLocal<Object> ret =
        Buffer::New(isolate, reinterpret_cast<char*>(buf->out()), len);
    buf->Release();

@bnoordhuis

bnoordhuis Oct 13, 2016

Member

Ditto.

src/node_i18n.cc Outdated
  UChar* source = nullptr;
  MaybeStackBuffer<UChar> swapspace;
  if (IsLittleEndian()) {
    source = reinterpret_cast<UChar*>(ts_obj_data);

@bnoordhuis

bnoordhuis Oct 13, 2016

Member

Same as above: strict aliasing violation and prone to crashing.

src/node_i18n.cc Outdated
} else {
ThrowICUError(env, status, "Unable to transcode buffer");
}
}

@bnoordhuis

bnoordhuis Oct 13, 2016

Member

It seems like there is ample opportunity to share code between Ucs2FromUtf8 and Utf8FromUcs2, they are 80% identical.

@jasnell

jasnell Oct 13, 2016

Author Member

Perhaps. For now I'm more inclined to keep these separate as it makes finding and tweaking bugs a bit easier. I'll take another pass in a separate PR to condense things down.

}

void Release() {
  buf_ = buf_st_;

@bnoordhuis

bnoordhuis Oct 13, 2016

Member

Shouldn't this also reset length_?

src/util.h Outdated
return env->ThrowTypeError("argument should be a Buffer"); \
} while (0)

#define SPREAD_ARG(val, name) \

@bnoordhuis

bnoordhuis Oct 13, 2016

Member

If you move this into a common header, you might want to give it a slightly less generic name; e.g. SPREAD_BUFFER_ARG.

tools/icu/icu-generic.gyp Outdated
@@ -21,7 +21,7 @@
       'toolsets': [ 'target' ],
       'direct_dependent_settings': {
         'defines': [
-          'UCONFIG_NO_CONVERSION=1',
+          #'UCONFIG_NO_CONVERSION=1',

@bnoordhuis

bnoordhuis Oct 13, 2016

Member

Just remove the 'defines' block instead of commenting it out.

@jasnell jasnell force-pushed the jasnell:buffer-transcode branch 3 times, most recently Oct 13, 2016

@jasnell

Member Author

commented Oct 13, 2016

@jasnell

Member Author

commented Oct 13, 2016

Another CI run after cleanups: https://ci.nodejs.org/job/node-test-pull-request/4507/ ... that last run was less than successful....
Trying another: https://ci.nodejs.org/job/node-test-pull-request/4508/

@jasnell jasnell force-pushed the jasnell:buffer-transcode branch Oct 13, 2016

@jasnell

Member Author

commented Oct 13, 2016

CI looks good. @bnoordhuis PTAL... LGTY?

@jasnell

Member Author

commented Oct 17, 2016

src/node_i18n.cc Outdated
buf, v8::NewStringType::kNormal,
len).ToLocalChecked());
if (e.IsEmpty())
return Local<Value>();

@trevnorris

trevnorris Oct 17, 2016

Contributor

if you're returning empty handles as an indicator it'd probably be more "V8-ish" to have the return signature be a MaybeLocal<Value> instead. been trying to do that in other locations myself.

src/node_i18n.cc Outdated
char buf[kStorageSize];
int len = snprintf(buf, sizeof(buf), "%s [%s]", msg, u_errorName(status));

Local<Value> e =

@trevnorris

trevnorris Oct 17, 2016

Contributor

creating Local<Value>'s but there's no HandleScope. if this callback is expected to always be called within an existing HandleScope (like MakeCallback), mind putting a comment at the top saying so, as is done for MakeCallback (see src/node.h)?

src/node_i18n.cc Outdated
len).ToLocalChecked());
if (e.IsEmpty())
return Local<Value>();
Local<Object> obj = e->ToObject(env->isolate());

@trevnorris

trevnorris Oct 17, 2016

Contributor

if we know this is a v8::Object then can use e.As<Object>(). that's also more explicit that no extra handle is being created.

src/node_i18n.cc Outdated
obj->Set(env->code_string(),
String::NewFromUtf8(env->isolate(),
u_errorName(status), v8::NewStringType::kNormal)
.ToLocalChecked());

@trevnorris

trevnorris Oct 17, 2016

Contributor

The new v8::Maybe<T> API for v8::Object::Set() is annoying and ugly, but if we're going to use some of the new API might as well use all of it.
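Seen from the JavaScript side, the error object these C++ snippets assemble has roughly the shape sketched below: a message in the "%s [%s]" format from the snprintf() call, plus a `code` property carrying the ICU error name. `makeIcuStyleError` is a hypothetical helper written for this illustration only; it is not part of the PR or of Node.js.

```javascript
// Hypothetical helper: builds "<msg> [<code>]" and mirrors the code on the
// error object, as the C++ above does via env->code_string().
function makeIcuStyleError(msg, icuErrorName) {
  const err = new Error(`${msg} [${icuErrorName}]`);
  err.code = icuErrorName;
  return err;
}

const err = makeIcuStyleError('Unable to transcode buffer',
                              'U_ILLEGAL_ARGUMENT_ERROR');
console.log(err.message);
console.log(err.code);
```

Callers can then branch on `err.code` instead of parsing the message text.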

src/node_i18n.cc Outdated
    if (!ret.IsEmpty()) buf->Release();
    return ret;
  }
  return Buffer::Copy(isolate, dst, len);

@trevnorris

trevnorris Oct 17, 2016

Contributor

if you're manipulating the original memory, why bother taking a copy?

src/node_i18n.cc Outdated
if (U_SUCCESS(status)) {
  len = target - *buf;
  args.GetReturnValue().Set(
      AsBuffer(env->isolate(), &buf, len).ToLocalChecked());

@trevnorris

trevnorris Oct 17, 2016

Contributor

if this operation fails, do we want to abort or throw?

src/node_i18n.cc Outdated
for (size_t n = 0, i = 0; i < length; n += 2, i += 1) {
  const uint8_t hi = static_cast<uint8_t>(ts_obj_data[n + 0]);
  const uint8_t lo = static_cast<uint8_t>(ts_obj_data[n + 1]);
  swapspace[i] = (lo << 8) | hi;

@trevnorris

trevnorris Oct 17, 2016

Contributor

for future performance enhancement, detect the alignment of the pointer and perform as many swaps as can be done in a single go.

lib/internal/buffer.js Outdated
const conversions = {
  'ascii': {
    'latin1': (source) => {
      return Buffer.from(source);

@trevnorris

trevnorris Oct 17, 2016

Contributor

i'm a little confused by this whole object, but right here if we're converting ascii to latin1 shouldn't we be passing 'latin1' as the encoding argument to Buffer.from()?
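As a reading aid for the object under discussion, here is a stripped-down sketch of a two-level lookup table keyed first by source encoding and then by target encoding. The entries and the `lookupAndTranscode` wrapper are hypothetical and written for this illustration; the PR's actual table and conversion functions may differ.

```javascript
// Hypothetical two-level table: conversions[from][to] -> function.
const conversions = {
  ascii: {
    // ASCII is a subset of Latin-1, so a plain byte copy is enough.
    latin1: (source) => Buffer.from(source),
  },
  latin1: {
    // Decode as Latin-1, then re-encode the resulting string as UTF-8.
    utf8: (source) => Buffer.from(source.toString('latin1'), 'utf8'),
  },
};

function lookupAndTranscode(source, fromEncoding, toEncoding) {
  const fn = conversions[fromEncoding] && conversions[fromEncoding][toEncoding];
  if (typeof fn !== 'function')
    throw new Error(`Unsupported conversion: ${fromEncoding} to ${toEncoding}`);
  return fn(source);
}

const latin1Bytes = Buffer.from([0xe9]);  // U+00E9 in Latin-1
console.log(lookupAndTranscode(latin1Bytes, 'latin1', 'utf8'));  // <Buffer c3 a9>
```

On the question about the encoding argument: when the input to `Buffer.from()` is already a Buffer, the bytes are copied as they are, and an encoding argument only comes into play for string input.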

@jasnell jasnell force-pushed the jasnell:buffer-transcode branch 3 times, most recently Oct 18, 2016

@jasnell

Member Author

commented Oct 18, 2016

@bnoordhuis ... ok, reworked the implementation with an eye towards simplification and reducing duplication. PTAL
@trevnorris and @addaleax ... if I could trouble each of you to take another look also, I'd appreciate it.

@jasnell

Member Author

commented Oct 21, 2016

@jasnell

Member Author

commented Oct 24, 2016

Thank you for the follow up review @addaleax. Will get this landed tomorrow if there are no further objections.

buffer: add buffer.transcode
Add buffer.transcode(source, from, to) method. Primarily uses ICU
to transcode a buffer's content from one of Node.js' supported
encodings to another.

Originally part of a proposal to add a new unicode module. Decided
to refactor the approach towards individual PRs without a new module.

Refs: #8075

@jasnell jasnell force-pushed the jasnell:buffer-transcode branch to 4d7472b Oct 25, 2016

@jasnell

Member Author

commented Oct 25, 2016

@jasnell

Member Author

commented Oct 25, 2016

green except for unrelated failures. landing

PR updated after view

jasnell added a commit that referenced this pull request Oct 25, 2016
buffer: add buffer.transcode
Add buffer.transcode(source, from, to) method. Primarily uses ICU
to transcode a buffer's content from one of Node.js' supported
encodings to another.

Originally part of a proposal to add a new unicode module. Decided
to refactor the approach towards individual PRs without a new module.

Refs: #8075
PR-URL: #9038
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@jasnell

Member Author

commented Oct 25, 2016

Landed in e8eaaa7

@jasnell jasnell closed this Oct 25, 2016

@srl295 srl295 referenced this pull request Oct 25, 2016
4 of 4 tasks complete
evanlucas added a commit that referenced this pull request Nov 3, 2016
buffer: add buffer.transcode
Add buffer.transcode(source, from, to) method. Primarily uses ICU
to transcode a buffer's content from one of Node.js' supported
encodings to another.

Originally part of a proposal to add a new unicode module. Decided
to refactor the approach towards individual PRs without a new module.

Refs: #8075
PR-URL: #9038
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@addaleax

Member

commented Dec 21, 2016

If this is backported to any of the other release lines, it needs to come with #9838

@addaleax addaleax referenced this pull request Dec 21, 2016
3 of 3 tasks complete
@MylesBorins MylesBorins referenced this pull request Jan 6, 2017
9 of 9 tasks complete