HSEARCH-2764 Improve orchestration of Elasticsearch works #1500
Relevant JIRA ticket: https://hibernate.atlassian.net/browse/HSEARCH-2764
So, that's a big one... Sorry about that, but once again most of the changes are linked. I tried to organize the commits in such a way that they are easier to understand.
First, if you're dreaming about dramatic performance improvements, forget about it. Currently, Hibernate Search spends most of its time waiting for Elasticsearch to process works. And I mean this literally: threads are usually parked or blocked on a monitor, waiting for an Elasticsearch response, more than 90% of the time.
The last one, in particular, is made much easier by embracing the reactive nature of the underlying HTTP client: by submitting a response handler along with each request, instead of submitting a request and waiting for it to finish, we can achieve a high level of parallelism with a small number of threads. That's what the commits introducing CompletableFuture everywhere are about: making Elasticsearch work processing reactive.
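To illustrate the idea (this is a rough sketch with hypothetical names, not the actual Hibernate Search or REST client API): instead of blocking until the response arrives, we hand the client a handler and get back a `CompletableFuture`, so a single thread can drive many in-flight requests.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical sketch of the reactive submission pattern:
// submit a response handler along with each request, and let the
// (here simulated) HTTP client complete the future when the
// Elasticsearch response arrives, instead of blocking a thread.
class ReactiveClientSketch {
    CompletableFuture<String> send(String request) {
        CompletableFuture<String> result = new CompletableFuture<>();
        // A real async client would invoke this handler later, from
        // its own I/O thread; we invoke it inline for the example.
        onResponse(request, result::complete);
        return result;
    }

    private void onResponse(String request, Consumer<String> handler) {
        handler.accept("ack:" + request);
    }

    public static void main(String[] args) {
        ReactiveClientSketch client = new ReactiveClientSketch();
        // Further processing is chained without ever parking a thread:
        client.send("index-doc-1")
              .thenApply(String::toUpperCase)
              .thenAccept(System.out::println); // prints ACK:INDEX-DOC-1
    }
}
```

The point is that the calling thread returns immediately after `send`; parallelism comes from how many futures are in flight, not from how many threads are blocked waiting.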
Along the way, I had to tackle some technical issues, mainly because if I hadn't, the performance tests would have been skewed, or would simply have crashed. Both of the changes are about
A side effect of this PR is that synchronously-executed works from different threads that relate to the same documents can no longer be executed out of order. That's because we now submit synchronous works for each index to a single queue, which is then processed serially.
Obviously it's not something that will improve performance per se, since we previously executed these works in parallel.
See commit "HSEARCH-2764 Make Elasticsearch non-stream orchestrators index-specific".
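The per-index queue idea can be sketched as follows (hypothetical names, not the actual orchestrator implementation): every index gets its own single-threaded queue, so two works touching the same index, and therefore potentially the same documents, are always applied in submission order, even when they come from different threads.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of an index-specific orchestrator:
// one serial queue per index guarantees ordering of works
// submitted from multiple threads against the same index.
class PerIndexOrchestratorSketch {
    private final ConcurrentMap<String, ExecutorService> queues =
            new ConcurrentHashMap<>();

    CompletableFuture<Void> submit(String indexName, Runnable work) {
        // Lazily create a single-threaded executor per index;
        // a single thread processes its queue strictly in order.
        ExecutorService queue = queues.computeIfAbsent(
                indexName, name -> Executors.newSingleThreadExecutor());
        return CompletableFuture.runAsync(work, queue);
    }

    void shutdown() {
        queues.values().forEach(ExecutorService::shutdown);
    }
}
```

Works for *different* indexes still run in parallel; only works for the *same* index are serialized, which is the ordering guarantee mentioned above.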
The overall architecture of Elasticsearch work processing is arguably more complex after this patch. I think that's counterbalanced by more thorough testing and more clearly separated responsibilities, but I'll let you be the judge of that.
Performance tests results
By running the performance tests we merged previously (with a few improvements, see #1492) before and after this patch, I got some numbers: https://docs.google.com/spreadsheets/d/14pLNWIewrCNr_pj_rzMyq6UTMf7lf9VBHMjY8b_gapU/edit#gid=2063776958
The interesting sheet is "nonbuggy-2017-07-28-2". Here is how to read it:
Note that due to what's probably a bug in JMH, the auxiliary counters are skewed. So please don't use the
Analysis of performance tests results
"query" method in concurrent tests
For concurrent benchmarks, the "query" threads and the "write" threads are competing for one resource: the Elasticsearch server in "default" mode, and the CPU in "black hole" mode.
Therefore, looking at the performance of the "query" benchmark is not very conclusive, because this performance will automatically degrade when write performance improves, and will automatically improve when write performance decreases.
In blackhole mode, we either have seemingly huge improvements or inconclusive results. I'd argue that the huge improvements are not conclusive either, because they mostly result from the last commit in this PR, where we remove the artificial 100ms delay between two async processing batches.
I think the only conclusion to draw from blackhole tests is that, be it before or after this PR, the internal processing in Hibernate Search is faster than the actual indexing by Elasticsearch by several orders of magnitude. Thus we shouldn't bother ourselves too much about raw code performance for now, at least not if our goal is to increase throughput.
Actual Elasticsearch server
In real-world conditions (with an actual Elasticsearch server), the numbers are more conclusive, and quite encouraging.
My comments below only look at the "worst case" improvements, i.e. the difference between (score before + error) and (score after - error); thus the improvements are probably better than what I mention here.
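Concretely, the "worst case" comparison above amounts to assuming the "before" run performed at the top of its error bar and the "after" run at the bottom of its own. With made-up numbers (not taken from the spreadsheet):

```java
// Illustrative computation of the "worst case" improvement;
// the numbers are invented, not taken from the spreadsheet.
class WorstCaseImprovement {
    static double worstCase(double scoreBefore, double errorBefore,
                            double scoreAfter, double errorAfter) {
        // Pessimistic comparison: credit "before" with its full error,
        // and penalize "after" by its full error.
        return (scoreAfter - errorAfter) - (scoreBefore + errorBefore);
    }

    public static void main(String[] args) {
        // e.g. 100 ± 5 ops/s before the patch, 130 ± 8 ops/s after:
        System.out.println(worstCase(100.0, 5.0, 130.0, 8.0)); // prints 17.0
    }
}
```

Any positive result under this pessimistic reading is a genuine improvement, which is why the real gains are probably better than the ones quoted here.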
Performance improvements seem to be here, at least in the important cases (stream, and non-stream with
I see you might be a bit disappointed by the figures you got, but this work might actually provide some dramatic improvements for people who have a higher-end Elasticsearch server configuration, or just a better-tuned local network link.
See, that's great and exactly what people hope to get as "free lunch" when using a library like this.
The figures we get here are just not representative of what someone else will get, unless they use the same cheap AWS configuration. Still, that doesn't make our figures irrelevant: they are an essential part of this whole exercise, helping us understand and acquire the knowledge to figure out what being "smarter about what we ask Elasticsearch to do" requires... today and in the future.