Reduce memory pressure when sending large terms queries. #21776

Merged
merged 2 commits into from Nov 30, 2016

@jpountz (Contributor) commented Nov 24, 2016

When users send large `terms` queries to Elasticsearch, every value is stored in
its own object. This change does not reduce the number of objects created, but it
makes sure these objects die young by optimizing the list storage when all values
are either non-null instances of Long or BytesRef, which seems to help the JVM
significantly.
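The following minimal, self-contained Java sketch shows what "optimizing the list storage" can look like for the all-`Long` case. It is not the code from this PR. The class names `LongValues` and `LongArrayList` are hypothetical, and `convert` only borrows its name from the method discussed in the review. The values are copied into a single `long[]`. The temporary `Long` objects built while reading the request then become unreachable as soon as the conversion returns, and the list keeps only one array alive.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LongValues {

    /**
     * List view over a long[] (hypothetical class). Only the array is retained;
     * a Long is boxed on each call to get() and can be collected as soon as the
     * caller drops it.
     */
    static final class LongArrayList extends AbstractList<Object> {
        private final long[] values;

        LongArrayList(long[] values) {
            this.values = values;
        }

        @Override
        public Object get(int index) {
            return values[index]; // autoboxed to Long
        }

        @Override
        public int size() {
            return values.length;
        }
    }

    /**
     * Returns a long[]-backed list if every value is a non-null Long, otherwise
     * a plain ArrayList copy of the input. The temporary copy and its Long
     * objects become garbage once this method returns the packed list.
     */
    static List<Object> convert(Iterable<?> values) {
        List<Object> copy = new ArrayList<>();
        for (Object o : values) {
            copy.add(o);
        }
        for (Object o : copy) {
            if (!(o instanceof Long)) { // also false for null
                return copy; // mixed or non-Long values: keep generic storage
            }
        }
        long[] packed = new long[copy.size()];
        for (int i = 0; i < packed.length; i++) {
            packed[i] = (Long) copy.get(i);
        }
        return new LongArrayList(packed);
    }

    public static void main(String[] args) {
        List<Object> longs = convert(Arrays.asList(1L, 2L, 3L));
        List<Object> mixed = convert(Arrays.asList(1L, 2.5, "three"));
        System.out.println(longs.getClass().getSimpleName() + " " + longs); // LongArrayList [1, 2, 3]
        System.out.println(mixed.getClass().getSimpleName() + " " + mixed); // ArrayList [1, 2.5, three]
    }
}
```

A `Long` is boxed again on every `get`. Such short-lived objects are typically cheap for a generational collector. What the sketch avoids is tens of thousands of objects staying reachable for the whole lifetime of the request.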

@jpountz (Contributor, Author) commented Nov 24, 2016

Here are two charts showing GC activity over an 8-minute period before and after the change, when running a query that includes many parts, in particular a `terms` query on ~32k longs. Both charts were created under similar conditions. In the first case (master), major GCs are more frequent and minor GCs often take about 100ms, while in the second case (this PR), most minor GCs run in less than 20ms.

[Chart: GC activity, baseline (master)]
[Chart: GC activity, patch (this PR)]

jpountz force-pushed the jpountz:less_garbage branch 3 times, most recently on Nov 29, 2016

Reduce memory pressure when sending large terms queries to Elasticsearch.

When users send large `terms` queries to Elasticsearch, every value is stored in
its own object. This change does not reduce the number of objects created, but it
makes sure these objects die young by optimizing the list storage when all values
are either non-null instances of Long or BytesRef, which seems to help the JVM
significantly.

jpountz force-pushed the jpountz:less_garbage branch to 985ec64 on Nov 29, 2016

@jpountz (Contributor, Author) commented Nov 29, 2016

Following @s1monw's advice, I tried another approach that does the same thing on top of the StreamInput/StreamOutput layer, and the results look even better under a similar load:
[Screenshot: GC activity with the new approach]
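The comment does not show how the serialization-layer approach works, so the following is only a hypothetical illustration of the general idea. It uses `java.io.DataOutputStream` and `DataInputStream` as stand-ins for Elasticsearch's `StreamOutput` and `StreamInput`. The class and method names (`LongsWire`, `encode`, `decode`) are invented for this example. A list of longs is written as a count followed by raw 8-byte values, and the reader decodes them straight into a `long[]` without allocating a `Long` per element.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

final class LongsWire {

    /** Writes a 4-byte length prefix followed by the raw 8-byte values. */
    static byte[] encode(List<Long> values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.size());
            for (long v : values) { // unboxes each (non-null) element once
                out.writeLong(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the values straight into a long[]: no Long is allocated per element. */
    static long[] decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long[] values = new long[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readLong();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = encode(Arrays.asList(10L, 20L, 30L));
        System.out.println(wire.length);                   // 28 (4-byte count + 3 x 8 bytes)
        System.out.println(Arrays.toString(decode(wire))); // [10, 20, 30]
    }
}
```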

@jpountz (Contributor, Author) commented Nov 29, 2016

@s1monw could you have a look?

@s1monw (Contributor) left a comment

I left some suggestions, but this looks way more contained. I like the change since it also applied to the values read from XContent on the coordinating node, not just to the ones written via node-to-node communication. I think that might explain the improvement we see?

core/src/main/java/org/elasticsearch/common/util/UnboxedArrayList.java (outdated)
}

@Override
public Object remove(int i) {

@s1monw (Contributor) commented Nov 29, 2016

I don't think it should be mutable?
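One way to act on this suggestion is to extend `java.util.AbstractList` and override only `get` and `size`. This sketch assumes the list only needs to be read after construction. `AbstractList` already throws `UnsupportedOperationException` from `add`, `set` and `remove` unless a subclass overrides them, so the list is read-only without any extra code. `ReadOnlyLongList` is a hypothetical name, not a class from the PR.

```java
import java.util.AbstractList;

final class ReadOnlyLongList extends AbstractList<Long> {
    private final long[] values;

    ReadOnlyLongList(long[] values) {
        this.values = values.clone(); // defensive copy: the caller's array cannot change the list
    }

    @Override
    public Long get(int index) {
        return values[index];
    }

    @Override
    public int size() {
        return values.length;
    }

    // add/set/remove are deliberately not overridden: AbstractList throws
    // UnsupportedOperationException for them.

    public static void main(String[] args) {
        ReadOnlyLongList list = new ReadOnlyLongList(new long[] {1, 2, 3});
        try {
            list.remove(0);
        } catch (UnsupportedOperationException e) {
            System.out.println("remove rejected, size is still " + list.size()); // size is still 3
        }
    }
}
```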

if (o instanceof BytesRef) {
b = (BytesRef) o;
} else {
builder.copyChars((String) o);

@s1monw (Contributor) commented Nov 29, 2016

Can we call o.toString() instead of the cast, just for safety?

@jpountz (Author, Contributor) commented Nov 30, 2016

My reasoning was that it is better to get an exception than to generate weird terms if something other than a String or a BytesRef ends up here, but I don't mind going with toString().
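The trade-off in this thread can be seen in a small sketch. A plain `byte[]` stands in for Lucene's `BytesRef` to keep the example free of dependencies, and both method names (`withCast`, `withToString`) are hypothetical. With the cast, an unexpected value such as an `Integer` fails fast with a `ClassCastException`. With `toString()`, it is silently turned into the UTF-8 bytes of its string form.

```java
import java.nio.charset.StandardCharsets;

final class TermBytes {

    /** Cast variant: anything other than byte[] or String fails fast. */
    static byte[] withCast(Object o) {
        if (o instanceof byte[]) {
            return (byte[]) o;
        }
        return ((String) o).getBytes(StandardCharsets.UTF_8); // ClassCastException for e.g. Integer
    }

    /** toString() variant: any non-null object is accepted via its string form. */
    static byte[] withToString(Object o) {
        if (o instanceof byte[]) {
            return (byte[]) o;
        }
        return o.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(new String(withToString(42), StandardCharsets.UTF_8)); // 42
        try {
            withCast(42);
        } catch (ClassCastException e) {
            System.out.println("cast variant rejected an Integer");
        }
    }
}
```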

@@ -185,43 +192,108 @@ public String fieldName() {
}

public List<Object> values() {

@s1monw (Contributor) commented Nov 29, 2016

We might be able to make this package-private; it's only used for testing.

@jpountz (Author, Contributor) commented Nov 30, 2016

I think it needs to remain public since it is part of the public API of this class?

@s1monw (Contributor) commented Nov 30, 2016

Fair enough, though I'm not sure anybody needs to access this list :)

private static final Set<Class<?>> STRING_TYPES = new HashSet<>(
Arrays.asList(BytesRef.class, String.class));

private static List<?> convert(Iterable<?> values) {

@s1monw (Contributor) commented Nov 29, 2016

Can we get some Javadocs on this, just to make sure we don't lose the information about why we did all this?
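The following sketch shows the kind of Javadoc being asked for, attached to a hypothetical `pack` method for the string case. The packing scheme concatenates all terms into one `byte[]` plus an `int[]` of end offsets. It is one plausible layout, assumed here rather than taken from the PR, and `byte[]`/`String` stand in for Lucene's `BytesRef`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

final class PackedTerms {

    /**
     * Packs the UTF-8 bytes of all terms into a single byte[] plus an int[] of
     * end offsets. Large terms queries can carry tens of thousands of values, and
     * keeping one object per value reachable for the lifetime of the request puts
     * pressure on the garbage collector. Two arrays are cheap to retain, and the
     * per-term objects created while parsing become unreachable right after this
     * call, so they can be reclaimed while still young.
     */
    static List<String> pack(List<String> terms) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int[] endOffsets = new int[terms.size()];
        for (int i = 0; i < endOffsets.length; i++) {
            byte[] term = terms.get(i).getBytes(StandardCharsets.UTF_8);
            buffer.write(term, 0, term.length);
            endOffsets[i] = buffer.size(); // exclusive end of term i
        }
        byte[] data = buffer.toByteArray();
        return new AbstractList<String>() {
            @Override
            public String get(int index) {
                int start = index == 0 ? 0 : endOffsets[index - 1];
                // Decoded on demand: the returned String is short-lived garbage.
                return new String(data, start, endOffsets[index] - start, StandardCharsets.UTF_8);
            }

            @Override
            public int size() {
                return endOffsets.length;
            }
        };
    }

    public static void main(String[] args) {
        List<String> packed = pack(Arrays.asList("foo", "bar", "baz"));
        System.out.println(packed); // [foo, bar, baz]
    }
}
```

Whatever the number of terms, the packed list keeps two arrays reachable instead of one object per term, and each `get` decodes a fresh, short-lived `String`.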

@@ -159,7 +166,7 @@ public TermsQueryBuilder(String fieldName, Iterable<?> values) {
throw new IllegalArgumentException("No value specified for terms query");
}
this.fieldName = fieldName;
-this.values = convertToBytesRefListIfStringList(values);
+this.values = convert(values);

@s1monw (Contributor) commented Nov 29, 2016

Can we add some dedicated tests to TermsQueryBuilderTest that stress this entire conversion a bit? Also with mixed value lists, e.g. floats and longs mixed together.

@jpountz (Contributor, Author) left a comment

Thanks for having a look. I pushed a new commit.

@jpountz (Contributor, Author) commented Nov 30, 2016

@s1monw wrote: "I like the change since it also applied to the values read from XContent on the coordinating node, not just to the ones written via node-to-node communication."

Actually, the previous change also applied to XContent parsing, so I am not totally sure how to explain this improvement. Perhaps it is because all the Long objects now become unreachable at once rather than one by one.

@s1monw (Contributor) approved these changes Nov 30, 2016

LGTM

jpountz merged commit a3ef674 into elastic:master on Nov 30, 2016

2 checks passed:
CLA: Commit author is a member of Elasticsearch
elasticsearch-ci: Build finished.

jpountz deleted the jpountz:less_garbage branch on Nov 30, 2016

jpountz added a commit that referenced this pull request Nov 30, 2016

Reduce memory pressure when sending large terms queries. (#21776)

jpountz added a commit that referenced this pull request Dec 2, 2016

Reduce memory pressure when sending large terms queries. (#21776)

jpountz added the v5.1.1 label on Dec 2, 2016
