Unefficient use of io.netty.util.ByteProcessor$IndexOfProcessor #9499

comtel2000 · 2019-08-22T14:24:25Z

Expected behavior

ByteBufUtil.indexOf(..), or in my case ByteBuf$bytesBefore, should be as efficient for any ByteBuf
as a simple for loop, without allocating a ByteProcessor$IndexOfProcessor.

Actual behavior

Calling ByteBuf$indexOf or ByteBuf$bytesBefore stresses the garbage collector by allocating a ByteProcessor$IndexOfProcessor on each call.

Replacing all indexOf and bytesBefore calls with simple for loops reduced the garbage in my app by 50%.
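
To make the allocation pattern concrete, here is a minimal sketch that does not depend on Netty. The names AllocationPatternSketch, ByteVisitor and IndexOfVisitor are hypothetical stand-ins for the roles of ByteBuf.forEachByte(..), ByteProcessor and ByteProcessor$IndexOfProcessor, and a plain byte[] stands in for the buffer. It only illustrates why each search creates one short-lived object; it is not Netty's code.

```java
// Illustrative stand-ins only; none of these types are part of Netty.
final class AllocationPatternSketch {
    private AllocationPatternSketch() { }

    /** Hypothetical callback: return true to keep iterating, false to stop. */
    interface ByteVisitor {
        boolean process(byte value);
    }

    /** Hypothetical visitor that stops at the first byte equal to byteToFind. */
    static final class IndexOfVisitor implements ByteVisitor {
        private final byte byteToFind;

        IndexOfVisitor(byte byteToFind) {
            this.byteToFind = byteToFind;
        }

        @Override
        public boolean process(byte value) {
            return value != byteToFind;
        }
    }

    /** Visits data[index .. index + length) and returns the index where the visitor stopped, or -1. */
    static int forEachByte(byte[] data, int index, int length, ByteVisitor visitor) {
        for (int i = index; i < index + length; i++) {
            if (!visitor.process(data[i])) {
                return i;
            }
        }
        return -1;
    }

    /** Visitor-based search: a new IndexOfVisitor is created on every call. */
    static int bytesBefore(byte[] data, int readerIndex, int writerIndex, byte value) {
        int i = forEachByte(data, readerIndex, writerIndex - readerIndex, new IndexOfVisitor(value));
        return i < 0 ? -1 : i - readerIndex;
    }
}
```

Whether the JIT can remove such an allocation depends on inlining and escape analysis, so the effect is workload dependent; in the case reported here it showed up as garbage.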

Steps to reproduce

Compare object allocation / memory footprint for:

AbstractByteBuf$bytesBefore(byte b)

and a simple:

    protected static int bytesBefore(ByteBuf in, byte b) {
      for (int i = in.readerIndex(); i < in.writerIndex(); i++) {
        if (in.getByte(i) == b) {
          return i - in.readerIndex();
        }
      }
      return -1;
    }

Minimal yet complete reproducer code (or URL to code)

Netty version

4.1.39.Final

JVM version (e.g. `java -version`)

1.8.212 x64

OS version (e.g. `uname -a`)

Windows 10

normanmaurer · 2019-08-22T15:09:49Z

@comtel2000 that's a good one... we can definitely add some optimisation here.. Will open a pr shortly

Motivation:
AbstractByteBuf.indexOf(...) currently delegates to ByteBufUtil.indexOf(...), which creates a new ByteProcessor on each call. This is done to reduce the overhead of bounds checks. Unfortunately, while this reduces bounds checks, it produces a lot of garbage. We can implement our own version in AbstractByteBuf that uses _getByte(...), so it also avoids bounds checks but does not create any garbage.

Modifications:
Write an optimized implementation of indexOf(...) for AbstractByteBuf.

Result:
Fixes #9499.
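
The idea in the description above can be sketched without Netty as follows. This is a simplified illustration, not the code from #9502: the class name IndexOfSketch is hypothetical, a plain byte[] stands in for the buffer's storage, and the array read plays the role that _getByte(...) plays inside AbstractByteBuf. The range is validated once, then a plain loop scans it, so no per-call helper object is needed.

```java
// Illustrative sketch only; this is not Netty's AbstractByteBuf implementation.
final class IndexOfSketch {
    private IndexOfSketch() { }

    /**
     * Returns the absolute index of the first occurrence of value in
     * data[fromIndex .. toIndex), or -1 if it is not present.
     */
    static int indexOf(byte[] data, int fromIndex, int toIndex, byte value) {
        // Validate the range once instead of checking every access explicitly.
        if (fromIndex < 0 || toIndex > data.length) {
            throw new IndexOutOfBoundsException(
                    "fromIndex: " + fromIndex + ", toIndex: " + toIndex + ", length: " + data.length);
        }
        // Plain loop: no visitor or processor object is created.
        // (The JVM still bounds-checks array reads unless the JIT removes the checks.)
        for (int i = fromIndex; i < toIndex; i++) {
            if (data[i] == value) {
                return i;
            }
        }
        return -1;
    }

    /** Number of bytes between readerIndex and the first occurrence of value, or -1. */
    static int bytesBefore(byte[] data, int readerIndex, int writerIndex, byte value) {
        int i = indexOf(data, readerIndex, writerIndex, value);
        return i < 0 ? -1 : i - readerIndex;
    }
}
```

For example, with the ASCII bytes of "abc\nd", bytesBefore(data, 1, data.length, (byte) '\n') returns 2, because the newline sits two bytes after index 1.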

normanmaurer · 2019-08-22T17:56:48Z

@comtel2000 PTAL #9502

normanmaurer mentioned this issue Aug 22, 2019

Reduce GC produced by AbstractByteBuf.indexOf(..) implementation #9502

Merged

normanmaurer added this to the 4.1.40.Final milestone Aug 22, 2019

normanmaurer closed this as completed in #9502 Aug 24, 2019

normanmaurer modified the milestones: 4.1.40.Final, 4.1.41.Final Sep 12, 2019
