Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unefficient use of io.netty.util.ByteProcessor$IndexOfProcessor #9499

Closed
comtel2000 opened this issue Aug 22, 2019 · 2 comments · Fixed by #9502
Closed

Unefficient use of io.netty.util.ByteProcessor$IndexOfProcessor #9499

comtel2000 opened this issue Aug 22, 2019 · 2 comments · Fixed by #9502
Milestone

Comments

@comtel2000
Copy link

Expected behavior

ByteBufUtils.indexOf(..) or in my case ByteBuf$bytesBefore for any ByteBuf
should work as efficient as a simple for loop without object allocation for ByteProcessor$IndexOfProcessor

Actual behavior

Usage of ByteBuf$indexOf or ByteBuf$bytesBefore stress the garbage collector by allocate ByteProcessor$IndexOfProcessor for each call

By replace all indexOf and bytesBefore with simple for loops reduce the garabge in my App by 50%

Steps to reproduce

compare object allocation / memory footprint for:

AbstractByteBuf$bytesBefore(byte b)

and a simple:

    protected static int bytesBefore(ByteBuf in, byte b) {
      for (int i = in.readerIndex(); i < in.writerIndex(); i++) {
        if (in.getByte(i) == b) {
          return i - in.readerIndex();
        }
      }
      return -1;
    }

Minimal yet complete reproducer code (or URL to code)

Netty version

4.1.39-Final

JVM version (e.g. java -version)

1.8.212 x64

OS version (e.g. uname -a)

Windows 10

@normanmaurer
Copy link
Member

@comtel2000 that's a good one... we can definitely add some optimisation here.. Will open a pr shortly

normanmaurer added a commit that referenced this issue Aug 22, 2019
Motivation:

AbstractByteBuf.indexOf(...) currently delegates to ByteBufUtils.indexOf(...) which will create a new ByteBufProcessor on each call. This is done to reduce overhead of bounds-checks. Unfortunally while this reduces bounds checks it produces a lot of GC. We can just implement our own version in AbstractByteBuf which makes use of _getByte(...) and so does no bound checks as well but also not need to create any garbage.

Modifications:

Write optimized implementation of indexOf(...) for AbstractByteBuf

Result:

Fixes #9499.
@normanmaurer
Copy link
Member

@comtel2000 PTAL #9502

@normanmaurer normanmaurer added this to the 4.1.40.Final milestone Aug 22, 2019
normanmaurer added a commit that referenced this issue Aug 24, 2019
Motivation:

AbstractByteBuf.indexOf(...) currently delegates to ByteBufUtils.indexOf(...) which will create a new ByteBufProcessor on each call. This is done to reduce overhead of bounds-checks. Unfortunally while this reduces bounds checks it produces a lot of GC. We can just implement our own version in AbstractByteBuf which makes use of _getByte(...) and so does no bound checks as well but also not need to create any garbage.

Modifications:

Write optimized implementation of indexOf(...) for AbstractByteBuf

Result:

Fixes #9499.
normanmaurer added a commit that referenced this issue Aug 24, 2019
Motivation:

AbstractByteBuf.indexOf(...) currently delegates to ByteBufUtils.indexOf(...) which will create a new ByteBufProcessor on each call. This is done to reduce overhead of bounds-checks. Unfortunally while this reduces bounds checks it produces a lot of GC. We can just implement our own version in AbstractByteBuf which makes use of _getByte(...) and so does no bound checks as well but also not need to create any garbage.

Modifications:

Write optimized implementation of indexOf(...) for AbstractByteBuf

Result:

Fixes #9499.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants