New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A Java no-op Filter decreases performance by ~ 83% (!!!) #8010

Open
mkurz opened this Issue Nov 13, 2017 · 11 comments

Comments

Projects
None yet
6 participants
@mkurz
Member

mkurz commented Nov 13, 2017

This is really bad.

play.mvc.Filter has a serious performance issue.

While working on TechEmpower/FrameworkBenchmarks#3061 I discovered a massive performance decrease as soon as you add a simple no-op filter to a play-java application.
When I was running the plaintext benchmark with all filters disabled (play.filters.enabled = [ ]) that were the results I was getting:

1

I now added following simple no-op filter via play.filters.enabled = [ filters.HeaderFilter ]:

public class HeaderFilter extends Filter {

    @Inject
    public HeaderFilter(final Materializer mat) {
        super(mat);
    }

    @Override
    public CompletionStage<Result> apply(final Function<Http.RequestHeader, CompletionStage<Result>> nextFilter, final Http.RequestHeader requestHeader) {
        return nextFilter.apply(requestHeader);
    }
}

And that were the results I was getting:
3

That is a performance decrease of around 83 percent - constant, for each test case.

I tried to track down the problem, so I went ahead and instead of extending play.mvc.Filter I now directly extended its superclass play.mvc.EssentialFilter instead:

public class EssentialHeaderFilter extends EssentialFilter {

    private final Executor executor;

    @Inject
    public EssentialHeaderFilter(final Executor executor) {
        super();
        this.executor = executor;
    }

    @Override
    public EssentialAction apply(final EssentialAction next) {
        return EssentialAction.of(request -> {
            final Accumulator<ByteString, Result> accumulator = next.apply(request);
            return accumulator.map(result -> result, executor);
        });
    }

}

Now the results look much better:
4

There is still a couple percent performance decrease, but acceptable.
Also this is very similar to the results I was getting when annotating the action method with a simple no-op @Headers annotation:

public class Headers extends Action.Simple {

    @Override
    public CompletionStage<Result> call(final Http.Context context) {
        return delegate.call(context);
    }
}

2

BTW: I can reproduce all of this, I even restarted my machine and re-run the benchmarks, very similar results again...

@mkurz

This comment has been minimized.

Show comment
Hide comment
@mkurz

mkurz Nov 13, 2017

Member

If you want to try yourself you can git check-out my slowFilter branch here containing the above examples: mkurz/FrameworkBenchmarks@ffb49e1

Member

mkurz commented Nov 13, 2017

If you want to try yourself you can git check-out my slowFilter branch here containing the above examples: mkurz/FrameworkBenchmarks@ffb49e1

@wsargent

This comment has been minimized.

Show comment
Hide comment
@wsargent

wsargent Nov 13, 2017

Member

Can you see if those results are repeatable with EssentialFilter, rather than Filter? I think the Materializer is at fault here.

EDIT: okay, you already did that. Yes, materialization is expensive. There are some passthroughs you can do, but the upfront cost is a doozy.

Member

wsargent commented Nov 13, 2017

Can you see if those results are repeatable with EssentialFilter, rather than Filter? I think the Materializer is at fault here.

EDIT: okay, you already did that. Yes, materialization is expensive. There are some passthroughs you can do, but the upfront cost is a doozy.

@gmethvin

This comment has been minimized.

Show comment
Hide comment
@gmethvin

gmethvin Nov 13, 2017

Member

For reference, here are some test results with wrk on my machine:

with filter:

wrk -t12 -c400 -d30s http://localhost:9000/json
Running 30s test @ http://localhost:9000/json
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.48ms   42.57ms   1.12s    99.09%
    Req/Sec     5.43k     1.30k   21.99k    95.39%
  1922227 requests in 30.09s, 274.98MB read
Requests/sec:  63875.01
Transfer/sec:      9.14MB

without filter:

wrk -t12 -c400 -d30s http://localhost:9000/json
Running 30s test @ http://localhost:9000/json
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.31ms   70.36ms   1.66s    98.58%
    Req/Sec    10.47k     1.59k   36.13k    94.56%
  3678966 requests in 30.10s, 526.28MB read
Requests/sec: 122225.32
Transfer/sec:     17.48MB

So it's about a 50% drop here. I suspect it's very dependent on system resources as well.

We knew there was a performance drop associated with Filter because it has to materialize the stream and introduces some extra context switches, but that drop seems larger than we'd noticed before. The overhead of converting the Java filter to Scala may also be a factor.

Member

gmethvin commented Nov 13, 2017

For reference, here are some test results with wrk on my machine:

with filter:

wrk -t12 -c400 -d30s http://localhost:9000/json
Running 30s test @ http://localhost:9000/json
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.48ms   42.57ms   1.12s    99.09%
    Req/Sec     5.43k     1.30k   21.99k    95.39%
  1922227 requests in 30.09s, 274.98MB read
Requests/sec:  63875.01
Transfer/sec:      9.14MB

without filter:

wrk -t12 -c400 -d30s http://localhost:9000/json
Running 30s test @ http://localhost:9000/json
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.31ms   70.36ms   1.66s    98.58%
    Req/Sec    10.47k     1.59k   36.13k    94.56%
  3678966 requests in 30.10s, 526.28MB read
Requests/sec: 122225.32
Transfer/sec:     17.48MB

So it's about a 50% drop here. I suspect it's very dependent on system resources as well.

We knew there was a performance drop associated with Filter because it has to materialize the stream and introduces some extra context switches, but that drop seems larger than we'd noticed before. The overhead of converting the Java filter to Scala may also be a factor.

@gmethvin

This comment has been minimized.

Show comment
Hide comment
@gmethvin

gmethvin Nov 14, 2017

Member

At the very least we should note in the documentation that there is a performance penalty associated with using Filter and that EssentialFilter is preferred if that fits your use case.

Member

gmethvin commented Nov 14, 2017

At the very least we should note in the documentation that there is a performance penalty associated with using Filter and that EssentialFilter is preferred if that fits your use case.

@richdougherty

This comment has been minimized.

Show comment
Hide comment
@richdougherty

richdougherty Nov 16, 2017

Member

It would be great to add a test for this into the microbenchmark test suite so we can easily debug and track this issue.

I see this line inside the Filter code:

implicit val ec = mat.executionContext

Can you try changing that to use a trampoline instead?

import play.core.Execution.Implicits.trampoline
Member

richdougherty commented Nov 16, 2017

It would be great to add a test for this into the microbenchmark test suite so we can easily debug and track this issue.

I see this line inside the Filter code:

implicit val ec = mat.executionContext

Can you try changing that to use a trampoline instead?

import play.core.Execution.Implicits.trampoline
@mkurz

This comment has been minimized.

Show comment
Hide comment
@mkurz

mkurz Nov 16, 2017

Member

@richdougherty I will try asap.

Member

mkurz commented Nov 16, 2017

@richdougherty I will try asap.

@mkurz

This comment has been minimized.

Show comment
Hide comment
@mkurz

mkurz Nov 18, 2017

Member

@richdougherty I tried your suggestion now but it didn't help at all. The performance is still ~ 83 % worse on my machine... 😢 That's what I changed: mkurz@8cddcab

Any other idea(s)?

Member

mkurz commented Nov 18, 2017

@richdougherty I tried your suggestion now but it didn't help at all. The performance is still ~ 83 % worse on my machine... 😢 That's what I changed: mkurz@8cddcab

Any other idea(s)?

@wsargent

This comment has been minimized.

Show comment
Hide comment
@wsargent

wsargent Nov 18, 2017

Member

Last time I touched this was pass through materialization akka/akka#22587

Member

wsargent commented Nov 18, 2017

Last time I touched this was pass through materialization akka/akka#22587

@richdougherty

This comment has been minimized.

Show comment
Hide comment
@richdougherty

richdougherty Nov 19, 2017

Member

@wsargent, so basically there's an 'async boundary' inside the Akka streams code?

Member

richdougherty commented Nov 19, 2017

@wsargent, so basically there's an 'async boundary' inside the Akka streams code?

@schmitch

This comment has been minimized.

Show comment
Hide comment
@schmitch

schmitch Nov 22, 2017

Member

Actually this might solve it (jropers, Restart Sinks gave me the idea):
Quick and dirty https://github.com/playframework/playframework/compare/master...schmitch:issue-8010?expand=1 (untested) (looks like it still has a bug)

Edit: no matter how I change it, it will still be worse than the actual FlattenAccumulator..
Hm with the latest commit I might be a little bit faster than the worst, case... combining the new Sink + Trampoline EC would be cool if somebody else can test it. (it's hard to get good results, even after some tries + warm vm)

Member

schmitch commented Nov 22, 2017

Actually this might solve it (jropers, Restart Sinks gave me the idea):
Quick and dirty https://github.com/playframework/playframework/compare/master...schmitch:issue-8010?expand=1 (untested) (looks like it still has a bug)

Edit: no matter how I change it, it will still be worse than the actual FlattenAccumulator..
Hm with the latest commit I might be a little bit faster than the worst, case... combining the new Sink + Trampoline EC would be cool if somebody else can test it. (it's hard to get good results, even after some tries + warm vm)

@patrickliangli

This comment has been minimized.

Show comment
Hide comment
@patrickliangli

patrickliangli Jun 4, 2018

Any updates on this issue? While it is possible to change our own code to use EssentialFilter, several libraries/plugins of Play make use of Filter, and not as easy to change.

patrickliangli commented Jun 4, 2018

Any updates on this issue? While it is possible to change our own code to use EssentialFilter, several libraries/plugins of Play make use of Filter, and not as easy to change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment