review String -> RubyString UTF (8) encoding #5239

Merged
merged 10 commits into from Aug 21, 2018

Commits on Jul 7, 2018

  1. [fix] encode UTF-16 without unwrapping Java String internals

    ... this likely isn't used much, since it could have failed
    in cases where the String's char[] is shared (int offset > 0);
    it would also need special care on Java 10, where it's a byte[]
    kares committed Jul 7, 2018
    1a09fe3
  2. [refactor] avoid a byte[] copy on encode when possible

    ... for non-direct ByteBuffer we can extract bytes directly
    kares committed Jul 7, 2018
    335dc2e
  3. handle String/CharSequence decoding slightly differently

    calling toString() makes no difference in micro-benchmarks,
    so we might as well skip the char[] copy, especially since one doesn't know
    what kind of CharSequence objects might come along ...
    kares committed Jul 7, 2018
    c253b45
  4. c766b25
  5. [refactor] use separate encode String/CS paths

    and no longer fill in a null encoding - we always pass it down

    interestingly, micro-benchmarks seem to run better with this change:
    passing a StringBuilder down gets a very noticeable speed-up,
    while the String cases stay at around the same performance
    kares committed Jul 7, 2018
    c0ec7c8
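
As a rough illustration of the first two commits above (not the PR's actual code), the sketch below encodes a `CharSequence` through the public `Charset` API instead of reaching into `String` internals, and reuses the backing array of the resulting `ByteBuffer` when it is heap-backed. The `EncodeSketch` and `Bytes` names are hypothetical; `Bytes` only stands in for whatever byte container the real code fills.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodeSketch {

    /** Hypothetical holder: a byte[] plus the valid region inside it. */
    static final class Bytes {
        final byte[] data;
        final int offset;
        final int length;

        Bytes(byte[] data, int offset, int length) {
            this.data = data;
            this.offset = offset;
            this.length = length;
        }

        /** Copies out just the valid region (used here only for inspection). */
        byte[] toArray() {
            return Arrays.copyOfRange(data, offset, offset + length);
        }
    }

    /** Encodes chars using only public APIs, with no access to String's internal array. */
    static Bytes encode(CharSequence chars, Charset charset) {
        ByteBuffer buffer = charset.encode(CharBuffer.wrap(chars));
        if (buffer.hasArray()) {
            // Heap-backed (non-direct, writable) buffer: reuse its array, no extra copy.
            return new Bytes(buffer.array(),
                             buffer.arrayOffset() + buffer.position(),
                             buffer.remaining());
        }
        // Direct or read-only buffer: the bytes have to be copied out.
        byte[] copy = new byte[buffer.remaining()];
        buffer.get(copy);
        return new Bytes(copy, 0, copy.length);
    }

    public static void main(String[] args) {
        Bytes utf16 = encode("hi", StandardCharsets.UTF_16BE);
        System.out.println(utf16.length); // 2 chars * 2 bytes each = 4
        Bytes utf8 = encode(new StringBuilder("hi"), StandardCharsets.UTF_8);
        System.out.println(utf8.length);  // 2
    }
}
```

The backing array may be longer than the encoded content, which is why the sketch carries an offset and a length alongside it rather than trimming the array.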

Commits on Jul 16, 2018

  1. e9ac48e
  2. 5717f94
  3. c63ef49
  4. [bench] add an artificial yet useful UTF-8 encoding benchmark

    BEFORE:
    ```
    Benchmark                                                   Mode  Cnt      Score     Error   Units
    EncodingBenchmark.benchLongRubyStringNew                   thrpt    5   7104.064 ± 252.231  ops/ms
    EncodingBenchmark.benchLongRubyStringNewCharSequence       thrpt    5   6882.044 ± 133.946  ops/ms
    EncodingBenchmark.benchLongRubyStringNewCharSequence2      thrpt    5   7059.163 ± 208.203  ops/ms
    EncodingBenchmark.benchLongRubyStringNewCharSequence3      thrpt    5   7177.851 ± 188.033  ops/ms
    EncodingBenchmark.benchShortRubyStringNew                  thrpt    5  15108.288 ± 282.496  ops/ms
    EncodingBenchmark.benchShortRubyStringNewCharSequence      thrpt    5  14342.470 ± 101.090  ops/ms
    EncodingBenchmark.benchVeryLongRubyStringNew               thrpt    5   1173.092 ±   8.716  ops/ms
    EncodingBenchmark.benchVeryLongRubyStringNewCharSequence   thrpt    5   1017.636 ±  58.843  ops/ms
    EncodingBenchmark.benchVeryLongRubyStringNewCharSequence2  thrpt    5   1065.907 ±  26.763  ops/ms
    ```
    
    AFTER:
    ```
    Benchmark                                                   Mode  Cnt      Score      Error   Units
    EncodingBenchmark.benchLongRubyStringNew                   thrpt    5   7205.086 ±  474.930  ops/ms
    EncodingBenchmark.benchLongRubyStringNewCharSequence       thrpt    5   9239.360 ±  338.284  ops/ms
    EncodingBenchmark.benchLongRubyStringNewCharSequence2      thrpt    5   4425.827 ±  246.294  ops/ms
    EncodingBenchmark.benchLongRubyStringNewCharSequence3      thrpt    5   7661.631 ±  418.873  ops/ms
    EncodingBenchmark.benchShortRubyStringNew                  thrpt    5  15875.130 ±  926.360  ops/ms
    EncodingBenchmark.benchShortRubyStringNewCharSequence      thrpt    5  16137.382 ± 1024.177  ops/ms
    EncodingBenchmark.benchVeryLongRubyStringNew               thrpt    5   1149.699 ±   27.375  ops/ms
    EncodingBenchmark.benchVeryLongRubyStringNewCharSequence   thrpt    5   1982.773 ±  133.350  ops/ms
    EncodingBenchmark.benchVeryLongRubyStringNewCharSequence2  thrpt    5    634.528 ±  224.842  ops/ms
    ```
    kares committed Jul 16, 2018
    e8f5eac
  5. when CharBuffer is to be encoded do not wrap it again

    ... makes no difference in micro-benchmarks, but at least we won't copy
    char[] buffers around - it's clearly the user's intent to encode the buffer
    kares committed Jul 16, 2018
    5105163
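
The idea in the last commit above can be sketched roughly as follows, assuming a hypothetical helper that receives any `CharSequence`: when the argument is already a `CharBuffer` it is handed to the encoder as-is instead of being wrapped a second time. The class and method names are illustrative only, and the use of `duplicate()` (so that the caller's buffer position is left untouched) is a choice made for this sketch, not something the commit message states.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class CharBufferEncodeSketch {

    /** Returns a CharBuffer view of chars without wrapping an existing CharBuffer again. */
    static CharBuffer toCharBuffer(CharSequence chars) {
        if (chars instanceof CharBuffer) {
            // duplicate() shares the content but has its own position and limit,
            // so encoding does not consume the caller's buffer.
            return ((CharBuffer) chars).duplicate();
        }
        return CharBuffer.wrap(chars);
    }

    /** Encodes the remaining characters as UTF-8. */
    static byte[] encodeUtf8(CharSequence chars) {
        ByteBuffer encoded = StandardCharsets.UTF_8.encode(toCharBuffer(chars));
        byte[] bytes = new byte[encoded.remaining()];
        encoded.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        CharBuffer source = CharBuffer.wrap("h\u00e9llo");
        System.out.println(encodeUtf8(source).length); // 6: the accented char takes 2 bytes
        System.out.println(source.remaining());        // 5: the source buffer is not consumed
    }
}
```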