src: simplify Javascript code embedding #27095

refack · 2019-04-04T23:20:38Z

Based on #25518 so only 9a5b688 is new code
This simplifies how we embed the JavaScript into the binary:

* All files are stored as uint16_t[] (no need for UnionBytes anymore)
* Raw arrays are wrapped in UInt16Span, which is independent of V8 (this reduces the number of definitions that need to be included)
* Turned ToStringChecked into a static class method that is the only place that creates UInt16SpanResource
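
As a rough illustration of the first point, here is a minimal sketch of how a generator script could turn a JavaScript source string into a `uint16_t[]` literal. The helper names `to_utf16_code_units` and `to_cpp_array` and the array name `example_raw` are hypothetical; this is not the code in js2c.py or in 9a5b688.

```python
import struct


def to_utf16_code_units(source):
    """Return the UTF-16 code units of `source` as a list of ints.

    Characters outside the Basic Multilingual Plane become surrogate pairs.
    """
    raw = source.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def to_cpp_array(name, source):
    """Render `source` as a C++ uint16_t array definition (hypothetical layout)."""
    body = ", ".join(str(unit) for unit in to_utf16_code_units(source))
    return "static const uint16_t %s[] = { %s };" % (name, body)


print(to_cpp_array("example_raw", "a+b"))
```

With this layout every character costs two bytes in the binary, including plain ASCII.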

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
documentation is changed or added
commit message follows commit guidelines

* simplify js2c semantics

nodejs-github-bot · 2019-04-04T23:20:40Z

Lite-CI: https://ci.nodejs.org/job/node-test-pull-request-lite-pipeline/3174

refack · 2019-04-04T23:22:03Z

/CC @nodejs/c++

nodejs-github-bot · 2019-04-04T23:25:44Z

CI: https://ci.nodejs.org/job/node-test-pull-request/22208/

joyeecheung · 2019-04-04T23:32:11Z

> All files are stored as uint16_t[] (no need for UnionBytes anymore)

Can you elaborate on why? Encoding files without multibyte characters with one byte strings gives us some memory savings.
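
To make the size trade-off concrete, here is a small sketch comparing the two storage strategies. It assumes that a file counts as one-byte only when it is pure ASCII; the exact threshold used by the build tooling is an assumption here, and both function names are hypothetical.

```python
def mixed_encoding_bytes(source):
    """Bytes needed if ASCII-only files are stored as one byte per character
    and everything else as UTF-16 (assumed threshold: pure ASCII)."""
    if source.isascii():
        return len(source)
    return len(source.encode("utf-16-le"))


def always_two_byte_bytes(source):
    """Bytes needed if every file is stored as UTF-16 code units."""
    return len(source.encode("utf-16-le"))


ascii_only = "module.exports = 1;"
print(mixed_encoding_bytes(ascii_only), always_two_byte_bytes(ascii_only))
```

For an ASCII-only file the second number is exactly twice the first, which is the saving being asked about.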

refack · 2019-04-04T23:37:08Z

> Can you elaborate on why? Encoding files without multibyte characters with one byte strings gives us some memory savings.

* This is not "real memory", since it comes from the data segment (i.e. it is file mapped), not the stack or the heap.
* IIUC it gives us a speed improvement, since V8 always converts code to TwoByte Strings.
* It reduces the amount of code and makes it a little bit simpler.

Hopefully we can follow V8 and eliminate this completely in favor of a code snapshot.

joyeecheung · 2019-04-05T00:02:26Z

> This is not "real memory" since it comes from the data segment, not the stack or the heap.

I believe there's still a difference when the code is parsed and stored in V8?

> IIUC it gives us a speed improvement, since V8 always converts code to TwoByte Strings.

Um, I don't think so? At least they are not stored that way, and AstRawString is usually one-byte in the parser (because --print-ast etc. do not handle multibyte characters well).

diff --git a/src/objects.cc b/src/objects.cc
index f27f067624..e6e20de9f3 100644
--- a/src/objects.cc
+++ b/src/objects.cc
@@ -5278,6 +5278,7 @@ Handle<Object> SharedFunctionInfo::GetSourceCode(
   if (!shared->HasSourceCode()) return isolate->factory()->undefined_value();
   Handle<String> source(String::cast(Script::cast(shared->script())->source()),
                         isolate);
+  DCHECK(!String::IsOneByteRepresentationUnderneath(*source));
   return isolate->factory()->NewSubString(source, shared->StartPosition(),
                                           shared->EndPosition());
 }

$ out.gn/x64.debug/d8 --allow-natives-syntax -e "print(%FunctionGetSourceCode((a, b) => { a + b; }));"


#
# Fatal error in ../../src/objects.cc, line 5281
# Debug check failed: !String::IsOneByteRepresentationUnderneath(*source).

$ out.gn/x64.debug/d8 --allow-natives-syntax -e "print(%FunctionGetSourceCode((a, b) => { 测试; }));"
(a, b) => { 测试; }

> Reduces amount of code, and makes it a little bit simpler.

While I generally agree with this, we usually trade simplicity for size savings and optimization, not the other way around, so someone could just reverse this patch in the name of saving size.

> Hopefully we can follow V8 and eliminate this completely in favor of a code snapshot.

I don't think we can eliminate this completely? For one, most builtins are not loaded during bootstrap and are not very likely to be used in normal use cases (e.g. scripts in deps). And a lot of builtins are not side-effect-free modules but scripts that can, e.g., exit the process when executed, so we cannot snapshot them.

joyeecheung · 2019-04-05T00:11:08Z

Maybe we should have benchmarks for binary sizes and RSS/heap sizes to determine the actual impact in cases like this.

refack · 2019-04-05T00:38:20Z

> Maybe we should have benchmarks for binary sizes and RSS/heap sizes to determine the actual impact in cases like this.

I'll do that.

As for code scanning, I was going by this:

https://v8.dev/blog/scanner

If I got it the wrong way around, we could (1) save everything as UTF-8 encoded uint8_t, or (2) eliminate code points that are > 255 from our code base (I believe there are ~10)
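
For option (2), a scan along the following lines could list the offending characters. This is only a sketch: `find_wide_code_points` is a hypothetical helper, and the default threshold of 255 is taken from the comment above rather than from any build script.

```python
def find_wide_code_points(text, limit=255):
    """Return a list of (line, column, char) for every character in `text`
    whose code point is greater than `limit`. Lines and columns are 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > limit:
                hits.append((line_no, col_no, ch))
    return hits


sample = "const a = 1;\n// \u2554\u2550\u2557 box drawing in a comment\n"
for line_no, col_no, ch in find_wide_code_points(sample):
    print("line %d, column %d: U+%04X" % (line_no, col_no, ord(ch)))
```

Running it over each file under `lib/` would give the list of places that need attention before everything could be stored as one-byte strings.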

As for size vs. speed, AFAICT we tend to prefer speed over size (within reason, of course):

node/common.gypi, line 202 in 2f1ed5c:

'FavorSizeOrSpeed': 1, # /Ot, favor speed over size

refack · 2019-04-05T00:39:11Z

Ping @bmeurer @hashseed ?

refack · 2019-04-05T01:08:55Z

BTW, 657c979 covers all the Unicode code points in our codebase.

joyeecheung · 2019-04-05T01:21:07Z

I remember we explicitly leave multibyte characters in our code base to make sure our code compiles that way? (but I have no context anymore, maybe @Fishrock123 @bnoordhuis would know)

joyeecheung · 2019-04-05T01:30:08Z

@refack I believe in our case, the implementation that the article talks about translates to https://cs.chromium.org/chromium/src/v8/src/parsing/scanner-character-streams.cc?type=cs&q=BufferedCharacterStream&sq=package:chromium&g=0&l=250 ? ~~It's a utf16 view of a uint8_t buffer, the data is not actually converted~~ EDIT: looks like they are copied in chunks. If there is speed up, you should be able to see from the startup benchmark though.

joyeecheung · 2019-04-05T01:32:08Z

Let's see what comes out of the startup benchmark: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/325/

joyeecheung · 2019-04-05T01:50:55Z

                                                                                  confidence improvement accuracy (*)   (**)  (***)
misc/startup.js mode='process' script='benchmark/fixtures/require-cachable' dur=1                 0.85 %       ±0.88% ±1.17% ±1.53%
misc/startup.js mode='process' script='test/fixtures/semicolon' dur=1                             0.68 %       ±0.85% ±1.13% ±1.47%
misc/startup.js mode='worker' script='benchmark/fixtures/require-cachable' dur=1           *      1.43 %       ±1.31% ±1.75% ±2.30%
misc/startup.js mode='worker' script='test/fixtures/semicolon' dur=1                             -0.33 %       ±1.43% ±1.92% ±2.52%

From the benchmark

refack · 2019-04-05T02:11:33Z

File size for Windows has increased by roughly 8%, so I'd say it's probably not worth it:

D:\refael\Downloads>dir /s node.exe
 Directory of D:\refael\Downloads\node-both-x64
2019-04-04  21:38        26,054,656 node.exe
 Directory of D:\refael\Downloads\node-utf16-x64
2019-04-04  21:28        27,846,144 node.exe

 Directory of D:\refael\Downloads\node-both.exe
2019-04-04  21:23        21,798,912 node.exe
 Directory of D:\refael\Downloads\node-utf16.exe
2019-04-04  21:25        23,717,888 node.exe
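
The relative growth can be checked directly from the listing above. The sketch below uses only the four sizes shown; the labels in the dictionary are taken from the directory names, and `percent_increase` is a hypothetical helper.

```python
def percent_increase(before, after):
    """Relative size increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (mixed one-byte/two-byte build, all-UTF-16 build), sizes in bytes
builds = {
    "node-both-x64 vs node-utf16-x64": (26054656, 27846144),
    "node-both.exe vs node-utf16.exe": (21798912, 23717888),
}

for label, (both, utf16) in builds.items():
    print("%s: +%.1f%%" % (label, percent_increase(both, utf16)))
```

This works out to about 6.9% for the x64 pair and about 8.8% for the other pair.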

devsnek · 2019-04-05T02:22:35Z

i'm not as knowledgeable as joyee is about the compile pipeline but moving to u16 without any significant performance improvement seems like a 👎

nodejs-github-bot · 2019-04-05T02:52:27Z

CI: https://ci.nodejs.org/job/node-test-pull-request/22211/

* use `string_view` instead of `UnionBytes`

nodejs-github-bot · 2019-04-05T04:02:48Z

CI: https://ci.nodejs.org/job/node-test-pull-request/22216/

refack · 2019-04-05T04:03:43Z

I've taken this in a different direction. Replaced UnionBytes with std::string_view to see what it takes to get this compiling on all of our CI cluster.

So this is blocked until we get gcc-6 level compilers on all platforms.

bnoordhuis · 2019-04-05T13:53:36Z

> I remember we explicitly leave multibyte characters in our code base to make sure our code compiles that way? (but I have no context anymore, maybe @Fishrock123 @bnoordhuis would know)

That's correct, see #10673 for background.

Fishrock123 · 2019-04-05T19:31:23Z

Yes, some code comments in timers use multi-byte characters (the ascii diagram).

Those characters could be safely discarded in a build if necessary, I think.

devsnek · 2019-04-05T19:53:53Z

> Those characters could be safely discarded in a build if necessary, I think.

as long as the discard method preserves the proper string length (because of stack traces) this should be fine.
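
A length-preserving discard could look roughly like the sketch below. It assumes that stack-trace positions are counted in UTF-16 code units, so a character outside the Basic Multilingual Plane is replaced by two placeholders. `blank_non_ascii` and `utf16_length` are hypothetical helpers, not part of any existing build step.

```python
def utf16_length(text):
    """Number of UTF-16 code units needed to represent `text`."""
    return len(text.encode("utf-16-le")) // 2


def blank_non_ascii(source, placeholder="?"):
    """Replace every non-ASCII character with `placeholder`, keeping the
    UTF-16 length (and therefore line and column positions) unchanged."""
    out = []
    for ch in source:
        if ord(ch) < 128:
            out.append(ch)  # ASCII, including newlines, is kept as-is
        else:
            units = 2 if ord(ch) > 0xFFFF else 1
            out.append(placeholder * units)
    return "".join(out)


original = "// \u2554\u2550\u2557 diagram\nfoo();\n"
stripped = blank_non_ascii(original)
print(stripped)
```

Because newlines are untouched and every replaced character keeps its width in code units, a line and column reported against the stripped source still point at the same place in the original.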

jasnell · 2020-06-25T14:16:11Z

This stalled and hasn't moved forward in over a year. Closing, but it can be reopened if it is picked back up again.

tools: refactor js2c.py for maximal Python3 compatibility

14912d8

* simplify js2c semantics

nodejs-github-bot added the c++ and lib / src labels Apr 4, 2019

refack requested a review from joyeecheung April 4, 2019 23:20

refack self-assigned this Apr 4, 2019

refack requested a review from devsnek April 4, 2019 23:22

refack force-pushed the tweak-js2c-3 branch from 5199407 to 1e308ac Compare April 5, 2019 03:18

refack added 2 commits April 4, 2019 23:58

[temp]encode unicode code-points in code

38a3362

src: simplify Javascript code embedding

d5fd4e7

* use `string_view` instead of `UnionBytes`

refack force-pushed the tweak-js2c-3 branch from bea21f3 to d5fd4e7 Compare April 5, 2019 04:00

refack added the blocked label Apr 5, 2019

refack mentioned this pull request May 30, 2019

tools: refactor js2c.py for maximal python3 compat #25518

Merged

3 tasks

Trott force-pushed the master branch from 1ecc406 to 49cf67e Compare September 17, 2019 16:51

devnexen force-pushed the master branch from e8a4568 to 5289f80 Compare December 26, 2019 19:46

BridgeAR force-pushed the master branch 2 times, most recently from 8ae28ff to 2935f72 Compare May 31, 2020 12:19

jasnell added the stalled label Jun 25, 2020

jasnell closed this Jun 25, 2020
