Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string_decoder: reimplement in C++ #18537

Closed
wants to merge 4 commits into from

Conversation

addaleax
Copy link
Member

@addaleax addaleax commented Feb 2, 2018

Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

Benchmark results
$ ./node benchmark/compare.js --new ./node --old ./node-before --set n=25e4 string_decoder | Rscript benchmark/compare.R
[03:07:14|% 100| 2/2 files | 60/60 runs | 80/80 configs]: Done
                                                                                             confidence improvement accuracy (*)    (**)   (***)
 string_decoder/string-decoder-create.js n=250000 encoding="ascii"                                 ***    -77.32 %       ±4.32%  ±5.80%  ±7.68%
 string_decoder/string-decoder-create.js n=250000 encoding="AscII"                                 ***    -74.15 %       ±2.55%  ±3.43%  ±4.53%
 string_decoder/string-decoder-create.js n=250000 encoding="base64"                                ***    -56.88 %       ±4.75%  ±6.36%  ±8.37%
 string_decoder/string-decoder-create.js n=250000 encoding="ucs2"                                  ***    -64.00 %       ±3.53%  ±4.72%  ±6.21%
 string_decoder/string-decoder-create.js n=250000 encoding="UTF-16LE"                              ***    -54.64 %       ±3.82%  ±5.12%  ±6.74%
 string_decoder/string-decoder-create.js n=250000 encoding="utf-8"                                 ***    -66.98 %       ±4.92%  ±6.62%  ±8.75%
 string_decoder/string-decoder-create.js n=250000 encoding="UTF-8"                                 ***    -59.94 %       ±3.45%  ±4.61%  ±6.03%
 string_decoder/string-decoder-create.js n=250000 encoding="utf8"                                  ***    -66.55 %       ±4.65%  ±6.26%  ±8.28%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="ascii"                       12.89 %      ±14.67% ±19.54% ±25.46%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="base64-ascii"         **     12.83 %       ±9.20% ±12.24% ±15.93%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="base64-utf8"          **     12.19 %       ±8.97% ±11.93% ±15.53%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="utf16le"                      7.69 %      ±11.02% ±14.67% ±19.12%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="utf8"                         0.55 %       ±8.17% ±10.88% ±14.17%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="ascii"                  *     23.55 %      ±17.71% ±23.57% ±30.69%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="base64-ascii"         ***     45.47 %      ±13.60% ±18.13% ±23.66%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="base64-utf8"          ***     39.64 %      ±14.04% ±18.70% ±24.37%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="utf16le"                *     13.59 %      ±13.43% ±17.86% ±23.26%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="utf8"                         11.52 %      ±12.00% ±15.98% ±20.81%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="ascii"                   *     14.10 %      ±13.51% ±18.01% ±23.50%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="base64-ascii"          ***     67.77 %      ±17.03% ±22.72% ±29.69%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="base64-utf8"           ***     54.15 %      ±16.54% ±22.08% ±28.87%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="utf16le"                       15.89 %      ±16.84% ±22.44% ±29.26%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="utf8"                           9.28 %      ±12.42% ±16.52% ±21.51%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="ascii"                       10.09 %      ±10.59% ±14.09% ±18.34%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="base64-ascii"                 6.93 %       ±8.86% ±11.79% ±15.36%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="base64-utf8"           *      9.24 %       ±7.98% ±10.62% ±13.83%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="utf16le"                     -0.67 %       ±9.40% ±12.51% ±16.31%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="utf8"                         2.27 %       ±8.52% ±11.34% ±14.76%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="ascii"                          8.53 %      ±10.65% ±14.18% ±18.48%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="base64-ascii"          ***     54.82 %      ±10.50% ±14.00% ±18.29%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="base64-utf8"           ***     51.75 %      ±10.83% ±14.43% ±18.85%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="utf16le"                       -5.01 %       ±9.15% ±12.18% ±15.85%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="utf8"                           4.55 %       ±8.20% ±10.92% ±14.21%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="ascii"                    *     13.88 %      ±12.62% ±16.81% ±21.91%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="base64-ascii"           ***     52.79 %      ±10.56% ±14.08% ±18.39%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="base64-utf8"            ***     52.56 %      ±11.39% ±15.21% ±19.90%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="utf16le"                         0.43 %      ±10.06% ±13.38% ±17.42%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="utf8"                            3.33 %      ±10.17% ±13.53% ±17.63%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="ascii"                   ***     31.81 %      ±14.71% ±19.58% ±25.52%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="base64-ascii"            ***     51.38 %      ±13.23% ±17.65% ±23.09%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="base64-utf8"             ***     48.89 %      ±14.06% ±18.82% ±24.71%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="utf16le"                          8.70 %      ±12.70% ±16.90% ±22.01%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="utf8"                      *     17.94 %      ±14.58% ±19.40% ±25.26%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="ascii"                  **     13.42 %       ±9.78% ±13.03% ±17.00%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="base64-ascii"          ***     49.02 %      ±10.60% ±14.13% ±18.45%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="base64-utf8"           ***     52.44 %      ±11.03% ±14.72% ±19.23%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="utf16le"                       -5.55 %       ±9.18% ±12.22% ±15.92%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="utf8"                           6.32 %       ±8.12% ±10.80% ±14.07%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="ascii"                         8.21 %      ±12.47% ±16.61% ±21.65%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="base64-ascii"         ***     28.39 %       ±9.32% ±12.42% ±16.19%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="base64-utf8"          ***     21.83 %       ±9.94% ±13.24% ±17.26%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="utf16le"                       5.92 %      ±10.21% ±13.60% ±17.71%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="utf8"                          2.83 %       ±8.47% ±11.27% ±14.67%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="ascii"                   *     16.92 %      ±16.02% ±21.34% ±27.81%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="base64-ascii"          ***     36.24 %      ±14.01% ±18.68% ±24.37%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="base64-utf8"           ***     24.74 %      ±13.22% ±17.65% ±23.10%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="utf16le"                        6.26 %      ±12.01% ±15.98% ±20.81%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="utf8"                           1.51 %      ±10.89% ±14.50% ±18.88%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="ascii"                   **     25.58 %      ±19.19% ±25.54% ±33.28%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="base64-ascii"           ***     67.36 %      ±15.88% ±21.21% ±27.76%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="base64-utf8"            ***     55.56 %      ±17.89% ±23.92% ±31.36%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="utf16le"                 **     22.91 %      ±15.06% ±20.03% ±26.07%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="utf8"                           11.45 %      ±12.80% ±17.03% ±22.18%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="ascii"                  *     10.71 %      ±10.00% ±13.32% ±17.35%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="base64-ascii"         ***     21.56 %       ±9.11% ±12.12% ±15.77%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="base64-utf8"          ***     25.72 %       ±9.04% ±12.03% ±15.68%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="utf16le"                       0.64 %       ±9.49% ±12.64% ±16.48%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="utf8"                          1.64 %       ±8.54% ±11.37% ±14.80%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="ascii"                         11.38 %      ±11.58% ±15.44% ±20.15%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="base64-ascii"          ***     44.38 %      ±10.72% ±14.31% ±18.70%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="base64-utf8"           ***     40.75 %      ±10.92% ±14.57% ±19.03%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="utf16le"                        0.01 %       ±9.09% ±12.09% ±15.73%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="utf8"                           4.25 %       ±8.23% ±10.95% ±14.26%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="ascii"                    *     21.15 %      ±16.30% ±21.71% ±28.31%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="base64-ascii"           ***     50.31 %      ±11.71% ±15.64% ±20.46%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="base64-utf8"            ***     42.86 %      ±11.50% ±15.34% ±20.07%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="utf16le"                         3.08 %      ±14.43% ±19.21% ±25.01%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="utf8"                            5.19 %      ±10.15% ±13.51% ±17.59%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="ascii"                           12.04 %      ±16.01% ±21.32% ±27.79%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="base64-ascii"            ***     52.38 %      ±15.90% ±21.27% ±27.92%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="base64-utf8"             ***     38.86 %      ±16.88% ±22.57% ±29.60%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="utf16le"                  **     19.55 %      ±13.93% ±18.54% ±24.15%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="utf8"                     **     19.12 %      ±12.42% ±16.52% ±21.51%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="ascii"                  **     14.49 %       ±9.70% ±12.91% ±16.81%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="base64-ascii"          ***     37.39 %      ±10.67% ±14.21% ±18.52%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="base64-utf8"           ***     46.33 %      ±10.02% ±13.34% ±17.40%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="utf16le"                       -1.83 %       ±9.42% ±12.54% ±16.32%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="utf8"                           2.81 %       ±8.96% ±11.93% ±15.52%
Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

string_decoder

CI: https://ci.nodejs.org/job/node-test-pull-request/12913
CI: https://ci.nodejs.org/job/node-test-commit/15905/

@addaleax addaleax added the string_decoder label Feb 2, 2018
@nodejs-github-bot nodejs-github-bot added c++ lib / src labels Feb 2, 2018
Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

    $ ./node benchmark/compare.js --new ./node --old ./node-before --set n=25e4 string_decoder | Rscript benchmark/compare.R
    [03:07:14|% 100| 2/2 files | 60/60 runs | 80/80 configs]: Done
                                                                                                 confidence improvement accuracy (*)    (**)   (***)
     string_decoder/string-decoder-create.js n=250000 encoding="ascii"                                 ***    -77.32 %       ±4.32%  ±5.80%  ±7.68%
     string_decoder/string-decoder-create.js n=250000 encoding="AscII"                                 ***    -74.15 %       ±2.55%  ±3.43%  ±4.53%
     string_decoder/string-decoder-create.js n=250000 encoding="base64"                                ***    -56.88 %       ±4.75%  ±6.36%  ±8.37%
     string_decoder/string-decoder-create.js n=250000 encoding="ucs2"                                  ***    -64.00 %       ±3.53%  ±4.72%  ±6.21%
     string_decoder/string-decoder-create.js n=250000 encoding="UTF-16LE"                              ***    -54.64 %       ±3.82%  ±5.12%  ±6.74%
     string_decoder/string-decoder-create.js n=250000 encoding="utf-8"                                 ***    -66.98 %       ±4.92%  ±6.62%  ±8.75%
     string_decoder/string-decoder-create.js n=250000 encoding="UTF-8"                                 ***    -59.94 %       ±3.45%  ±4.61%  ±6.03%
     string_decoder/string-decoder-create.js n=250000 encoding="utf8"                                  ***    -66.55 %       ±4.65%  ±6.26%  ±8.28%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="ascii"                       12.89 %      ±14.67% ±19.54% ±25.46%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="base64-ascii"         **     12.83 %       ±9.20% ±12.24% ±15.93%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="base64-utf8"          **     12.19 %       ±8.97% ±11.93% ±15.53%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="utf16le"                      7.69 %      ±11.02% ±14.67% ±19.12%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="utf8"                         0.55 %       ±8.17% ±10.88% ±14.17%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="ascii"                  *     23.55 %      ±17.71% ±23.57% ±30.69%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="base64-ascii"         ***     45.47 %      ±13.60% ±18.13% ±23.66%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="base64-utf8"          ***     39.64 %      ±14.04% ±18.70% ±24.37%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="utf16le"                *     13.59 %      ±13.43% ±17.86% ±23.26%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="utf8"                         11.52 %      ±12.00% ±15.98% ±20.81%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="ascii"                   *     14.10 %      ±13.51% ±18.01% ±23.50%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="base64-ascii"          ***     67.77 %      ±17.03% ±22.72% ±29.69%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="base64-utf8"           ***     54.15 %      ±16.54% ±22.08% ±28.87%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="utf16le"                       15.89 %      ±16.84% ±22.44% ±29.26%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="utf8"                           9.28 %      ±12.42% ±16.52% ±21.51%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="ascii"                       10.09 %      ±10.59% ±14.09% ±18.34%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="base64-ascii"                 6.93 %       ±8.86% ±11.79% ±15.36%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="base64-utf8"           *      9.24 %       ±7.98% ±10.62% ±13.83%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="utf16le"                     -0.67 %       ±9.40% ±12.51% ±16.31%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="utf8"                         2.27 %       ±8.52% ±11.34% ±14.76%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="ascii"                          8.53 %      ±10.65% ±14.18% ±18.48%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="base64-ascii"          ***     54.82 %      ±10.50% ±14.00% ±18.29%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="base64-utf8"           ***     51.75 %      ±10.83% ±14.43% ±18.85%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="utf16le"                       -5.01 %       ±9.15% ±12.18% ±15.85%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="utf8"                           4.55 %       ±8.20% ±10.92% ±14.21%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="ascii"                    *     13.88 %      ±12.62% ±16.81% ±21.91%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="base64-ascii"           ***     52.79 %      ±10.56% ±14.08% ±18.39%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="base64-utf8"            ***     52.56 %      ±11.39% ±15.21% ±19.90%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="utf16le"                         0.43 %      ±10.06% ±13.38% ±17.42%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="utf8"                            3.33 %      ±10.17% ±13.53% ±17.63%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="ascii"                   ***     31.81 %      ±14.71% ±19.58% ±25.52%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="base64-ascii"            ***     51.38 %      ±13.23% ±17.65% ±23.09%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="base64-utf8"             ***     48.89 %      ±14.06% ±18.82% ±24.71%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="utf16le"                          8.70 %      ±12.70% ±16.90% ±22.01%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="utf8"                      *     17.94 %      ±14.58% ±19.40% ±25.26%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="ascii"                  **     13.42 %       ±9.78% ±13.03% ±17.00%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="base64-ascii"          ***     49.02 %      ±10.60% ±14.13% ±18.45%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="base64-utf8"           ***     52.44 %      ±11.03% ±14.72% ±19.23%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="utf16le"                       -5.55 %       ±9.18% ±12.22% ±15.92%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="utf8"                           6.32 %       ±8.12% ±10.80% ±14.07%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="ascii"                         8.21 %      ±12.47% ±16.61% ±21.65%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="base64-ascii"         ***     28.39 %       ±9.32% ±12.42% ±16.19%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="base64-utf8"          ***     21.83 %       ±9.94% ±13.24% ±17.26%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="utf16le"                       5.92 %      ±10.21% ±13.60% ±17.71%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="utf8"                          2.83 %       ±8.47% ±11.27% ±14.67%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="ascii"                   *     16.92 %      ±16.02% ±21.34% ±27.81%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="base64-ascii"          ***     36.24 %      ±14.01% ±18.68% ±24.37%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="base64-utf8"           ***     24.74 %      ±13.22% ±17.65% ±23.10%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="utf16le"                        6.26 %      ±12.01% ±15.98% ±20.81%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="utf8"                           1.51 %      ±10.89% ±14.50% ±18.88%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="ascii"                   **     25.58 %      ±19.19% ±25.54% ±33.28%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="base64-ascii"           ***     67.36 %      ±15.88% ±21.21% ±27.76%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="base64-utf8"            ***     55.56 %      ±17.89% ±23.92% ±31.36%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="utf16le"                 **     22.91 %      ±15.06% ±20.03% ±26.07%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="utf8"                           11.45 %      ±12.80% ±17.03% ±22.18%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="ascii"                  *     10.71 %      ±10.00% ±13.32% ±17.35%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="base64-ascii"         ***     21.56 %       ±9.11% ±12.12% ±15.77%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="base64-utf8"          ***     25.72 %       ±9.04% ±12.03% ±15.68%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="utf16le"                       0.64 %       ±9.49% ±12.64% ±16.48%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="utf8"                          1.64 %       ±8.54% ±11.37% ±14.80%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="ascii"                         11.38 %      ±11.58% ±15.44% ±20.15%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="base64-ascii"          ***     44.38 %      ±10.72% ±14.31% ±18.70%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="base64-utf8"           ***     40.75 %      ±10.92% ±14.57% ±19.03%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="utf16le"                        0.01 %       ±9.09% ±12.09% ±15.73%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="utf8"                           4.25 %       ±8.23% ±10.95% ±14.26%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="ascii"                    *     21.15 %      ±16.30% ±21.71% ±28.31%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="base64-ascii"           ***     50.31 %      ±11.71% ±15.64% ±20.46%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="base64-utf8"            ***     42.86 %      ±11.50% ±15.34% ±20.07%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="utf16le"                         3.08 %      ±14.43% ±19.21% ±25.01%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="utf8"                            5.19 %      ±10.15% ±13.51% ±17.59%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="ascii"                           12.04 %      ±16.01% ±21.32% ±27.79%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="base64-ascii"            ***     52.38 %      ±15.90% ±21.27% ±27.92%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="base64-utf8"             ***     38.86 %      ±16.88% ±22.57% ±29.60%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="utf16le"                  **     19.55 %      ±13.93% ±18.54% ±24.15%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="utf8"                     **     19.12 %      ±12.42% ±16.52% ±21.51%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="ascii"                  **     14.49 %       ±9.70% ±12.91% ±16.81%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="base64-ascii"          ***     37.39 %      ±10.67% ±14.21% ±18.52%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="base64-utf8"           ***     46.33 %      ±10.02% ±13.34% ±17.40%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="utf16le"                       -1.83 %       ±9.42% ±12.54% ±16.32%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="utf8"                           2.81 %       ±8.96% ±11.93% ±15.52%
@addaleax addaleax force-pushed the stream-decoder-native branch from 5cea9aa to 7eae61d Compare Feb 2, 2018
jasnell
jasnell approved these changes Feb 5, 2018
@addaleax addaleax added the author ready label Feb 6, 2018
ssize_t* nread_ptr) {
Local<String> prepend, body;

size_t nread = *nread_ptr;
Copy link
Member

@bnoordhuis bnoordhuis Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nread_ptr should probably be a size_t* (see comment further down) but otherwise do a CHECK_GE(*nread_ptr, 0) sanity check first.

#endif
state_[kBufferedBytes]++;
if ((data[i] & 0xC0) == 0x80) {
// This byte does not start a character (a "trailing" bytes).
Copy link
Member

@bnoordhuis bnoordhuis Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: s/bytes/byte/

}
}

if (!prepend.IsEmpty()) {
Copy link
Member

@bnoordhuis bnoordhuis Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably reads a bit more natural if you invert (or un-invert really) the logic:

if (prepend.IsEmpty()) return body;
return String::Concat(prepend, body);

CHECK_EQ(BufferedBytes(), 0);
}

if (Encoding() == UCS2 && (BufferedBytes() % 2) == 1) {
Copy link
Member

@bnoordhuis bnoordhuis Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superfluous parens.

StringDecoder* decoder =
reinterpret_cast<StringDecoder*>(Buffer::Data(args[0]));
CHECK_NE(decoder, nullptr);
ssize_t nread = Buffer::Length(args[1]);
Copy link
Member

@bnoordhuis bnoordhuis Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buffer::Length() returns a size_t.

class StringDecoder {
public:
StringDecoder() { state_[kEncodingField] = BUFFER; }
void SetEncoding(enum encoding encoding);
Copy link
Member

@bnoordhuis bnoordhuis Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The methods that are implemented in string_decoder-inl.h should be marked inline here.

Copy link
Member Author

@addaleax addaleax Feb 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnoordhuis I’m happy to do that, but is there a specific reason for that style?

It should be transparent to consumers of the header file whether a function is provided inline or not, right? And e.g. our HTTP2 code is a wonderful example of how these would get out of sync anyway when code is being moved around…

Copy link
Member

@bnoordhuis bnoordhuis Feb 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a style issue, at least not exactly; it's to give the methods inline linkage right off the bat.

Right now, if you include string_decoder.h (but not string_decoder-inl.h) and call SetEncoding(), it will compile just fine but generate a link-time error.

If instead you mark it inline in the class definition, it's a compile-time error, or at least a warning.

@addaleax addaleax removed the author ready label Feb 9, 2018
@addaleax addaleax added the author ready label Feb 9, 2018
@addaleax
Copy link
Member Author

@addaleax addaleax commented Feb 9, 2018

@bnoordhuis Okay, should have applied everything now.

CI: https://ci.nodejs.org/job/node-test-commit/16099/

Copy link
Member

@bnoordhuis bnoordhuis left a comment

Only did a quick scan this time around but LGTM.

@BridgeAR
Copy link
Member

@BridgeAR BridgeAR commented Feb 10, 2018

Landed in 180af17 🎉

@BridgeAR BridgeAR closed this Feb 10, 2018
BridgeAR pushed a commit to BridgeAR/node that referenced this issue Feb 10, 2018
Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

PR-URL: nodejs#18537
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@TimothyGu
Copy link
Member

@TimothyGu TimothyGu commented Feb 10, 2018

Hmm, it would be a fun experiment to try implementing this in WebAssembly…

@addaleax addaleax deleted the stream-decoder-native branch Feb 10, 2018
@addaleax
Copy link
Member Author

@addaleax addaleax commented Feb 10, 2018

@TimothyGu As long as you have no way to decode/encode strings as a JS or WASM primitive (which doesn’t call into native code), it isn’t really going to be faster… I think the main speedup here is from the reduced # of JS/C++ boundary calls?

@apapirovski
Copy link
Member

@apapirovski apapirovski commented Feb 12, 2018

@addaleax It appears this broke https://github.com/expressjs/body-parser in CitGM. I confirmed locally by reverting this commit and running it successfully against the test suite. Would you have time to look into it?

apapirovski added a commit that referenced this issue Feb 13, 2018
There are libraries which invoke StringDecoder using .call and
.inherits, which directly conflicts with making StringDecoder
be a class which can only be invoked with the new keyword.
Revert to declaring it as a function.

StringDecoder#lastNeed was not defined, redefine it using
the new interface and fix StringDecoder#lastTotal.

PR-URL: #18723
Refs: #18537
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Evan Lucas <evanlucas@me.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
@MylesBorins
Copy link
Member

@MylesBorins MylesBorins commented Feb 21, 2018

Should this be backported to v9.x-staging? If yes please follow the guide and raise a backport PR, if not let me know or add the dont-land-on label.

@addaleax addaleax added backport-requested-v9.x and removed backport-requested-v9.x author ready labels Feb 21, 2018
@addaleax addaleax added this to To do in StreamBase refactor via automation Apr 1, 2018
@addaleax addaleax moved this from To do to Done in StreamBase refactor Apr 1, 2018
BridgeAR pushed a commit to BridgeAR/node that referenced this issue May 1, 2018
Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

PR-URL: nodejs#18537
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
BridgeAR pushed a commit to BridgeAR/node that referenced this issue May 1, 2018
There are libraries which invoke StringDecoder using .call and
.inherits, which directly conflicts with making StringDecoder
be a class which can only be invoked with the new keyword.
Revert to declaring it as a function.

StringDecoder#lastNeed was not defined, redefine it using
the new interface and fix StringDecoder#lastTotal.

PR-URL: nodejs#18723
Refs: nodejs#18537
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Evan Lucas <evanlucas@me.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
MayaLekova pushed a commit to MayaLekova/node that referenced this issue May 8, 2018
Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

PR-URL: nodejs#18537
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
MayaLekova pushed a commit to MayaLekova/node that referenced this issue May 8, 2018
There are libraries which invoke StringDecoder using .call and
.inherits, which directly conflicts with making StringDecoder
be a class which can only be invoked with the new keyword.
Revert to declaring it as a function.

StringDecoder#lastNeed was not defined, redefine it using
the new interface and fix StringDecoder#lastTotal.

PR-URL: nodejs#18723
Refs: nodejs#18537
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Evan Lucas <evanlucas@me.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ lib / src string_decoder
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

8 participants