Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string_decoder: reimplement in C++ #18537

Closed
wants to merge 4 commits into from

Conversation

@addaleax
Copy link
Member

addaleax commented Feb 2, 2018

Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

Benchmark results
$ ./node benchmark/compare.js --new ./node --old ./node-before --set n=25e4 string_decoder | Rscript benchmark/compare.R
[03:07:14|% 100| 2/2 files | 60/60 runs | 80/80 configs]: Done
                                                                                             confidence improvement accuracy (*)    (**)   (***)
 string_decoder/string-decoder-create.js n=250000 encoding="ascii"                                 ***    -77.32 %       ±4.32%  ±5.80%  ±7.68%
 string_decoder/string-decoder-create.js n=250000 encoding="AscII"                                 ***    -74.15 %       ±2.55%  ±3.43%  ±4.53%
 string_decoder/string-decoder-create.js n=250000 encoding="base64"                                ***    -56.88 %       ±4.75%  ±6.36%  ±8.37%
 string_decoder/string-decoder-create.js n=250000 encoding="ucs2"                                  ***    -64.00 %       ±3.53%  ±4.72%  ±6.21%
 string_decoder/string-decoder-create.js n=250000 encoding="UTF-16LE"                              ***    -54.64 %       ±3.82%  ±5.12%  ±6.74%
 string_decoder/string-decoder-create.js n=250000 encoding="utf-8"                                 ***    -66.98 %       ±4.92%  ±6.62%  ±8.75%
 string_decoder/string-decoder-create.js n=250000 encoding="UTF-8"                                 ***    -59.94 %       ±3.45%  ±4.61%  ±6.03%
 string_decoder/string-decoder-create.js n=250000 encoding="utf8"                                  ***    -66.55 %       ±4.65%  ±6.26%  ±8.28%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="ascii"                       12.89 %      ±14.67% ±19.54% ±25.46%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="base64-ascii"         **     12.83 %       ±9.20% ±12.24% ±15.93%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="base64-utf8"          **     12.19 %       ±8.97% ±11.93% ±15.53%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="utf16le"                      7.69 %      ±11.02% ±14.67% ±19.12%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="utf8"                         0.55 %       ±8.17% ±10.88% ±14.17%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="ascii"                  *     23.55 %      ±17.71% ±23.57% ±30.69%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="base64-ascii"         ***     45.47 %      ±13.60% ±18.13% ±23.66%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="base64-utf8"          ***     39.64 %      ±14.04% ±18.70% ±24.37%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="utf16le"                *     13.59 %      ±13.43% ±17.86% ±23.26%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="utf8"                         11.52 %      ±12.00% ±15.98% ±20.81%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="ascii"                   *     14.10 %      ±13.51% ±18.01% ±23.50%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="base64-ascii"          ***     67.77 %      ±17.03% ±22.72% ±29.69%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="base64-utf8"           ***     54.15 %      ±16.54% ±22.08% ±28.87%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="utf16le"                       15.89 %      ±16.84% ±22.44% ±29.26%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="utf8"                           9.28 %      ±12.42% ±16.52% ±21.51%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="ascii"                       10.09 %      ±10.59% ±14.09% ±18.34%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="base64-ascii"                 6.93 %       ±8.86% ±11.79% ±15.36%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="base64-utf8"           *      9.24 %       ±7.98% ±10.62% ±13.83%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="utf16le"                     -0.67 %       ±9.40% ±12.51% ±16.31%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="utf8"                         2.27 %       ±8.52% ±11.34% ±14.76%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="ascii"                          8.53 %      ±10.65% ±14.18% ±18.48%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="base64-ascii"          ***     54.82 %      ±10.50% ±14.00% ±18.29%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="base64-utf8"           ***     51.75 %      ±10.83% ±14.43% ±18.85%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="utf16le"                       -5.01 %       ±9.15% ±12.18% ±15.85%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="utf8"                           4.55 %       ±8.20% ±10.92% ±14.21%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="ascii"                    *     13.88 %      ±12.62% ±16.81% ±21.91%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="base64-ascii"           ***     52.79 %      ±10.56% ±14.08% ±18.39%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="base64-utf8"            ***     52.56 %      ±11.39% ±15.21% ±19.90%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="utf16le"                         0.43 %      ±10.06% ±13.38% ±17.42%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="utf8"                            3.33 %      ±10.17% ±13.53% ±17.63%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="ascii"                   ***     31.81 %      ±14.71% ±19.58% ±25.52%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="base64-ascii"            ***     51.38 %      ±13.23% ±17.65% ±23.09%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="base64-utf8"             ***     48.89 %      ±14.06% ±18.82% ±24.71%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="utf16le"                          8.70 %      ±12.70% ±16.90% ±22.01%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="utf8"                      *     17.94 %      ±14.58% ±19.40% ±25.26%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="ascii"                  **     13.42 %       ±9.78% ±13.03% ±17.00%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="base64-ascii"          ***     49.02 %      ±10.60% ±14.13% ±18.45%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="base64-utf8"           ***     52.44 %      ±11.03% ±14.72% ±19.23%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="utf16le"                       -5.55 %       ±9.18% ±12.22% ±15.92%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="utf8"                           6.32 %       ±8.12% ±10.80% ±14.07%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="ascii"                         8.21 %      ±12.47% ±16.61% ±21.65%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="base64-ascii"         ***     28.39 %       ±9.32% ±12.42% ±16.19%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="base64-utf8"          ***     21.83 %       ±9.94% ±13.24% ±17.26%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="utf16le"                       5.92 %      ±10.21% ±13.60% ±17.71%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="utf8"                          2.83 %       ±8.47% ±11.27% ±14.67%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="ascii"                   *     16.92 %      ±16.02% ±21.34% ±27.81%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="base64-ascii"          ***     36.24 %      ±14.01% ±18.68% ±24.37%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="base64-utf8"           ***     24.74 %      ±13.22% ±17.65% ±23.10%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="utf16le"                        6.26 %      ±12.01% ±15.98% ±20.81%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="utf8"                           1.51 %      ±10.89% ±14.50% ±18.88%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="ascii"                   **     25.58 %      ±19.19% ±25.54% ±33.28%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="base64-ascii"           ***     67.36 %      ±15.88% ±21.21% ±27.76%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="base64-utf8"            ***     55.56 %      ±17.89% ±23.92% ±31.36%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="utf16le"                 **     22.91 %      ±15.06% ±20.03% ±26.07%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="utf8"                           11.45 %      ±12.80% ±17.03% ±22.18%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="ascii"                  *     10.71 %      ±10.00% ±13.32% ±17.35%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="base64-ascii"         ***     21.56 %       ±9.11% ±12.12% ±15.77%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="base64-utf8"          ***     25.72 %       ±9.04% ±12.03% ±15.68%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="utf16le"                       0.64 %       ±9.49% ±12.64% ±16.48%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="utf8"                          1.64 %       ±8.54% ±11.37% ±14.80%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="ascii"                         11.38 %      ±11.58% ±15.44% ±20.15%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="base64-ascii"          ***     44.38 %      ±10.72% ±14.31% ±18.70%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="base64-utf8"           ***     40.75 %      ±10.92% ±14.57% ±19.03%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="utf16le"                        0.01 %       ±9.09% ±12.09% ±15.73%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="utf8"                           4.25 %       ±8.23% ±10.95% ±14.26%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="ascii"                    *     21.15 %      ±16.30% ±21.71% ±28.31%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="base64-ascii"           ***     50.31 %      ±11.71% ±15.64% ±20.46%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="base64-utf8"            ***     42.86 %      ±11.50% ±15.34% ±20.07%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="utf16le"                         3.08 %      ±14.43% ±19.21% ±25.01%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="utf8"                            5.19 %      ±10.15% ±13.51% ±17.59%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="ascii"                           12.04 %      ±16.01% ±21.32% ±27.79%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="base64-ascii"            ***     52.38 %      ±15.90% ±21.27% ±27.92%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="base64-utf8"             ***     38.86 %      ±16.88% ±22.57% ±29.60%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="utf16le"                  **     19.55 %      ±13.93% ±18.54% ±24.15%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="utf8"                     **     19.12 %      ±12.42% ±16.52% ±21.51%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="ascii"                  **     14.49 %       ±9.70% ±12.91% ±16.81%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="base64-ascii"          ***     37.39 %      ±10.67% ±14.21% ±18.52%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="base64-utf8"           ***     46.33 %      ±10.02% ±13.34% ±17.40%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="utf16le"                       -1.83 %       ±9.42% ±12.54% ±16.32%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="utf8"                           2.81 %       ±8.96% ±11.93% ±15.52%
Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

string_decoder

CI: https://ci.nodejs.org/job/node-test-pull-request/12913
CI: https://ci.nodejs.org/job/node-test-commit/15905/

Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

    $ ./node benchmark/compare.js --new ./node --old ./node-before --set n=25e4 string_decoder | Rscript benchmark/compare.R
    [03:07:14|% 100| 2/2 files | 60/60 runs | 80/80 configs]: Done
                                                                                                 confidence improvement accuracy (*)    (**)   (***)
     string_decoder/string-decoder-create.js n=250000 encoding="ascii"                                 ***    -77.32 %       ±4.32%  ±5.80%  ±7.68%
     string_decoder/string-decoder-create.js n=250000 encoding="AscII"                                 ***    -74.15 %       ±2.55%  ±3.43%  ±4.53%
     string_decoder/string-decoder-create.js n=250000 encoding="base64"                                ***    -56.88 %       ±4.75%  ±6.36%  ±8.37%
     string_decoder/string-decoder-create.js n=250000 encoding="ucs2"                                  ***    -64.00 %       ±3.53%  ±4.72%  ±6.21%
     string_decoder/string-decoder-create.js n=250000 encoding="UTF-16LE"                              ***    -54.64 %       ±3.82%  ±5.12%  ±6.74%
     string_decoder/string-decoder-create.js n=250000 encoding="utf-8"                                 ***    -66.98 %       ±4.92%  ±6.62%  ±8.75%
     string_decoder/string-decoder-create.js n=250000 encoding="UTF-8"                                 ***    -59.94 %       ±3.45%  ±4.61%  ±6.03%
     string_decoder/string-decoder-create.js n=250000 encoding="utf8"                                  ***    -66.55 %       ±4.65%  ±6.26%  ±8.28%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="ascii"                       12.89 %      ±14.67% ±19.54% ±25.46%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="base64-ascii"         **     12.83 %       ±9.20% ±12.24% ±15.93%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="base64-utf8"          **     12.19 %       ±8.97% ±11.93% ±15.53%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="utf16le"                      7.69 %      ±11.02% ±14.67% ±19.12%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="utf8"                         0.55 %       ±8.17% ±10.88% ±14.17%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="ascii"                  *     23.55 %      ±17.71% ±23.57% ±30.69%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="base64-ascii"         ***     45.47 %      ±13.60% ±18.13% ±23.66%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="base64-utf8"          ***     39.64 %      ±14.04% ±18.70% ±24.37%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="utf16le"                *     13.59 %      ±13.43% ±17.86% ±23.26%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="utf8"                         11.52 %      ±12.00% ±15.98% ±20.81%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="ascii"                   *     14.10 %      ±13.51% ±18.01% ±23.50%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="base64-ascii"          ***     67.77 %      ±17.03% ±22.72% ±29.69%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="base64-utf8"           ***     54.15 %      ±16.54% ±22.08% ±28.87%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="utf16le"                       15.89 %      ±16.84% ±22.44% ±29.26%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="utf8"                           9.28 %      ±12.42% ±16.52% ±21.51%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="ascii"                       10.09 %      ±10.59% ±14.09% ±18.34%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="base64-ascii"                 6.93 %       ±8.86% ±11.79% ±15.36%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="base64-utf8"           *      9.24 %       ±7.98% ±10.62% ±13.83%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="utf16le"                     -0.67 %       ±9.40% ±12.51% ±16.31%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="utf8"                         2.27 %       ±8.52% ±11.34% ±14.76%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="ascii"                          8.53 %      ±10.65% ±14.18% ±18.48%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="base64-ascii"          ***     54.82 %      ±10.50% ±14.00% ±18.29%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="base64-utf8"           ***     51.75 %      ±10.83% ±14.43% ±18.85%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="utf16le"                       -5.01 %       ±9.15% ±12.18% ±15.85%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="utf8"                           4.55 %       ±8.20% ±10.92% ±14.21%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="ascii"                    *     13.88 %      ±12.62% ±16.81% ±21.91%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="base64-ascii"           ***     52.79 %      ±10.56% ±14.08% ±18.39%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="base64-utf8"            ***     52.56 %      ±11.39% ±15.21% ±19.90%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="utf16le"                         0.43 %      ±10.06% ±13.38% ±17.42%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="utf8"                            3.33 %      ±10.17% ±13.53% ±17.63%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="ascii"                   ***     31.81 %      ±14.71% ±19.58% ±25.52%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="base64-ascii"            ***     51.38 %      ±13.23% ±17.65% ±23.09%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="base64-utf8"             ***     48.89 %      ±14.06% ±18.82% ±24.71%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="utf16le"                          8.70 %      ±12.70% ±16.90% ±22.01%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="utf8"                      *     17.94 %      ±14.58% ±19.40% ±25.26%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="ascii"                  **     13.42 %       ±9.78% ±13.03% ±17.00%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="base64-ascii"          ***     49.02 %      ±10.60% ±14.13% ±18.45%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="base64-utf8"           ***     52.44 %      ±11.03% ±14.72% ±19.23%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="utf16le"                       -5.55 %       ±9.18% ±12.22% ±15.92%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="utf8"                           6.32 %       ±8.12% ±10.80% ±14.07%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="ascii"                         8.21 %      ±12.47% ±16.61% ±21.65%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="base64-ascii"         ***     28.39 %       ±9.32% ±12.42% ±16.19%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="base64-utf8"          ***     21.83 %       ±9.94% ±13.24% ±17.26%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="utf16le"                       5.92 %      ±10.21% ±13.60% ±17.71%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="utf8"                          2.83 %       ±8.47% ±11.27% ±14.67%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="ascii"                   *     16.92 %      ±16.02% ±21.34% ±27.81%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="base64-ascii"          ***     36.24 %      ±14.01% ±18.68% ±24.37%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="base64-utf8"           ***     24.74 %      ±13.22% ±17.65% ±23.10%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="utf16le"                        6.26 %      ±12.01% ±15.98% ±20.81%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="utf8"                           1.51 %      ±10.89% ±14.50% ±18.88%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="ascii"                   **     25.58 %      ±19.19% ±25.54% ±33.28%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="base64-ascii"           ***     67.36 %      ±15.88% ±21.21% ±27.76%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="base64-utf8"            ***     55.56 %      ±17.89% ±23.92% ±31.36%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="utf16le"                 **     22.91 %      ±15.06% ±20.03% ±26.07%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="utf8"                           11.45 %      ±12.80% ±17.03% ±22.18%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="ascii"                  *     10.71 %      ±10.00% ±13.32% ±17.35%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="base64-ascii"         ***     21.56 %       ±9.11% ±12.12% ±15.77%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="base64-utf8"          ***     25.72 %       ±9.04% ±12.03% ±15.68%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="utf16le"                       0.64 %       ±9.49% ±12.64% ±16.48%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="utf8"                          1.64 %       ±8.54% ±11.37% ±14.80%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="ascii"                         11.38 %      ±11.58% ±15.44% ±20.15%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="base64-ascii"          ***     44.38 %      ±10.72% ±14.31% ±18.70%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="base64-utf8"           ***     40.75 %      ±10.92% ±14.57% ±19.03%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="utf16le"                        0.01 %       ±9.09% ±12.09% ±15.73%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="utf8"                           4.25 %       ±8.23% ±10.95% ±14.26%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="ascii"                    *     21.15 %      ±16.30% ±21.71% ±28.31%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="base64-ascii"           ***     50.31 %      ±11.71% ±15.64% ±20.46%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="base64-utf8"            ***     42.86 %      ±11.50% ±15.34% ±20.07%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="utf16le"                         3.08 %      ±14.43% ±19.21% ±25.01%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="utf8"                            5.19 %      ±10.15% ±13.51% ±17.59%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="ascii"                           12.04 %      ±16.01% ±21.32% ±27.79%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="base64-ascii"            ***     52.38 %      ±15.90% ±21.27% ±27.92%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="base64-utf8"             ***     38.86 %      ±16.88% ±22.57% ±29.60%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="utf16le"                  **     19.55 %      ±13.93% ±18.54% ±24.15%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="utf8"                     **     19.12 %      ±12.42% ±16.52% ±21.51%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="ascii"                  **     14.49 %       ±9.70% ±12.91% ±16.81%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="base64-ascii"          ***     37.39 %      ±10.67% ±14.21% ±18.52%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="base64-utf8"           ***     46.33 %      ±10.02% ±13.34% ±17.40%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="utf16le"                       -1.83 %       ±9.42% ±12.54% ±16.32%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="utf8"                           2.81 %       ±8.96% ±11.93% ±15.52%
@addaleax addaleax force-pushed the addaleax:stream-decoder-native branch from 5cea9aa to 7eae61d Feb 2, 2018
…fered
@jasnell
jasnell approved these changes Feb 5, 2018
ssize_t* nread_ptr) {
Local<String> prepend, body;

size_t nread = *nread_ptr;

This comment has been minimized.

Copy link
@bnoordhuis

bnoordhuis Feb 8, 2018

Member

nread_ptr should probably be a size_t* (see comment further down) but otherwise do a CHECK_GE(*nread_ptr, 0) sanity check first.

#endif
state_[kBufferedBytes]++;
if ((data[i] & 0xC0) == 0x80) {
// This byte does not start a character (a "trailing" bytes).

This comment has been minimized.

Copy link
@bnoordhuis

bnoordhuis Feb 8, 2018

Member

Typo: s/bytes/byte/

}
}

if (!prepend.IsEmpty()) {

This comment has been minimized.

Copy link
@bnoordhuis

bnoordhuis Feb 8, 2018

Member

Arguably reads a bit more natural if you invert (or un-invert really) the logic:

if (prepend.IsEmpty()) return body;
return String::Concat(prepend, body);
CHECK_EQ(BufferedBytes(), 0);
}

if (Encoding() == UCS2 && (BufferedBytes() % 2) == 1) {

This comment has been minimized.

Copy link
@bnoordhuis

bnoordhuis Feb 8, 2018

Member

Superfluous parens.

StringDecoder* decoder =
reinterpret_cast<StringDecoder*>(Buffer::Data(args[0]));
CHECK_NE(decoder, nullptr);
ssize_t nread = Buffer::Length(args[1]);

This comment has been minimized.

Copy link
@bnoordhuis

bnoordhuis Feb 8, 2018

Member

Buffer::Length() returns a size_t.

class StringDecoder {
public:
StringDecoder() { state_[kEncodingField] = BUFFER; }
void SetEncoding(enum encoding encoding);

This comment has been minimized.

Copy link
@bnoordhuis

bnoordhuis Feb 8, 2018

Member

The methods that are implemented in string_decoder-inl.h should be marked inline here.

This comment has been minimized.

Copy link
@addaleax

addaleax Feb 9, 2018

Author Member

@bnoordhuis I’m happy to do that, but is there a specific reason for that style?

It should be transparent to consumers of the header file whether a function is provided inline or not, right? And e.g. our HTTP2 code is a wonderful example of how these would get out of sync anyway when code is being moved around…

This comment has been minimized.

Copy link
@bnoordhuis

bnoordhuis Feb 9, 2018

Member

It's not a style issue, at least not exactly; it's to give the methods inline linkage right off the bat.

Right now, if you include string_decoder.h (but not string_decoder-inl.h) and call SetEncoding(), it will compile just fine but generate a link-time error.

If instead you mark it inline in the class definition, it's a compile-time error, or at least a warning.

@addaleax addaleax removed the author ready label Feb 9, 2018
@addaleax

This comment has been minimized.

Copy link
Member Author

addaleax commented Feb 9, 2018

@bnoordhuis Okay, should have applied everything now.

CI: https://ci.nodejs.org/job/node-test-commit/16099/

Copy link
Member

bnoordhuis left a comment

Only did a quick scan this time around but LGTM.

@BridgeAR

This comment has been minimized.

Copy link
Member

BridgeAR commented Feb 10, 2018

Landed in 180af17 🎉

@BridgeAR BridgeAR closed this Feb 10, 2018
BridgeAR added a commit to BridgeAR/node that referenced this pull request Feb 10, 2018
Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

PR-URL: nodejs#18537
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@TimothyGu

This comment has been minimized.

Copy link
Member

TimothyGu commented Feb 10, 2018

Hmm, it would be a fun experiment to try implementing this in WebAssembly…

@addaleax addaleax deleted the addaleax:stream-decoder-native branch Feb 10, 2018
@addaleax

This comment has been minimized.

Copy link
Member Author

addaleax commented Feb 10, 2018

@TimothyGu As long as you have no way to decode/encode strings as a JS or WASM primitive (which doesn’t call into native code), it isn’t really going to be faster… I think the main speedup here is from the reduced # of JS/C++ boundary calls?

@apapirovski

This comment has been minimized.

Copy link
Member

apapirovski commented Feb 12, 2018

@addaleax It appears this broke https://github.com/expressjs/body-parser in CitGM. I confirmed locally by reverting this commit and running it successfully against the test suite. Would you have time to look into it?

@apapirovski apapirovski mentioned this pull request Feb 12, 2018
3 of 3 tasks complete
apapirovski added a commit that referenced this pull request Feb 13, 2018
There are libraries which invoke StringDecoder using .call and
.inherits, which directly conflicts with making StringDecoder
be a class which can only be invoked with the new keyword.
Revert to declaring it as a function.

StringDecoder#lastNeed was not defined, redefine it using
the new interface and fix StringDecoder#lastTotal.

PR-URL: #18723
Refs: #18537
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Evan Lucas <evanlucas@me.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
@MylesBorins

This comment has been minimized.

Copy link
Member

MylesBorins commented Feb 21, 2018

Should this be backported to v9.x-staging? If yes please follow the guide and raise a backport PR, if not let me know or add the dont-land-on label.

@addaleax addaleax added this to To do in StreamBase refactor via automation Apr 1, 2018
@addaleax addaleax moved this from To do to Done in StreamBase refactor Apr 1, 2018
BridgeAR added a commit to BridgeAR/node that referenced this pull request May 1, 2018
Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

PR-URL: nodejs#18537
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
BridgeAR added a commit to BridgeAR/node that referenced this pull request May 1, 2018
There are libraries which invoke StringDecoder using .call and
.inherits, which directly conflicts with making StringDecoder
be a class which can only be invoked with the new keyword.
Revert to declaring it as a function.

StringDecoder#lastNeed was not defined, redefine it using
the new interface and fix StringDecoder#lastTotal.

PR-URL: nodejs#18723
Refs: nodejs#18537
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Evan Lucas <evanlucas@me.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
MayaLekova added a commit to MayaLekova/node that referenced this pull request May 8, 2018
Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

PR-URL: nodejs#18537
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
MayaLekova added a commit to MayaLekova/node that referenced this pull request May 8, 2018
There are libraries which invoke StringDecoder using .call and
.inherits, which directly conflicts with making StringDecoder
be a class which can only be invoked with the new keyword.
Revert to declaring it as a function.

StringDecoder#lastNeed was not defined, redefine it using
the new interface and fix StringDecoder#lastTotal.

PR-URL: nodejs#18723
Refs: nodejs#18537
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Evan Lucas <evanlucas@me.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
8 participants
You can’t perform that action at this time.