Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize query with types filter in the URL (t/t/_search) #20979

Merged
merged 6 commits into from Oct 20, 2016

Conversation

Projects
None yet
2 participants
@jimczi
Copy link
Member

commented Oct 17, 2016

The TypeQuery has an optimization when only one type is present in an index.
This optimization is used when a type query is explicitly used in the query (_type:t) but it is not leveraged when the type is in the URL (GET t/t/_search).
This change uses the TypeQuery when the type filter is built from the URL (GET t/t/_search).
Since the number of types is unbounded in a query this change also preserves the current behavior which uses a TermsQuery when the number of types to query is greater than a threshold (16 by default).

@jpountz
Copy link
Contributor

left a comment

Left one minor suggestion. LGTM

core/src/main/java/org/elasticsearch/search/DefaultSearchContext.java Outdated
for (int i = 0; i < typesBytes.length; i++) {
typesBytes[i] = new BytesRef(types[i]);
}
typesFilter = new TermsQuery(TypeFieldMapper.NAME, typesBytes);

This comment has been minimized.

Copy link
@jpountz

jpountz Oct 17, 2016

Contributor

for simplicity, maybe we should create a TermsQuery all the time and rely on query rewriting?

This comment has been minimized.

Copy link
@jpountz

jpountz Oct 17, 2016

Contributor

I suspect it's not easy but it would be nice to let the mappings create the query rather than doing it manually.

This comment has been minimized.

Copy link
@jimczi

jimczi Oct 18, 2016

Author Member

The rewrite of the TermsQuery ? IMO it should be possible to have a specialization of the TermsQuery which checks the frequency of the inner terms and rewrite automatically to a match_all or a match_no_docs query automatically. Is it what you propose here ? Another solution could be to do the same rewriting but directly in the TermQuery ? Though it might not be possible/optimized since we don't retrieve the doc frequency of each TermQuery (if scoring is not required for instance or if global frequency has been set beforehand).

This comment has been minimized.

Copy link
@jpountz

jpountz Oct 18, 2016

Contributor

Oh sorry I had not understood the code correctly, my comment does not make sense.

a specialization of the TermsQuery which checks the frequency of the inner terms and rewrite automatically to a match_all or a match_no_docs query automatically

+1 to changing TypeQuery into a TypesQuery that works this way

@jimczi

This comment has been minimized.

Copy link
Member Author

commented Oct 19, 2016

@jpountz I've added the TypesQuery, can you take a look ?

@jpountz
Copy link
Contributor

left a comment

It looks good as-is, but I think we can do even better in the case that multiple types are provided. Feel free to either change it in this PR or work on it in a follow-up PR as you prefer.

core/src/main/java/org/elasticsearch/index/mapper/TypeFieldMapper.java Outdated
private final BytesRef[] types;

public TypesQuery(BytesRef... types) {
this.types = Objects.requireNonNull(types);

This comment has been minimized.

Copy link
@jpountz

jpountz Oct 19, 2016

Contributor

maybe also check that there is at least one type?

core/src/main/java/org/elasticsearch/index/mapper/TypeFieldMapper.java Outdated
}
if (tq instanceof ConstantScoreQuery) {
bq.add(((ConstantScoreQuery) tq).getQuery(), BooleanClause.Occur.SHOULD);
}

This comment has been minimized.

Copy link
@jpountz

jpountz Oct 19, 2016

Contributor

Actually maybe we can do better in a TypesQuery? For instance imagine there are two types foo and bar. If both types are in the list then we could rewrite as a match_all query?

I guess we could either check that the types here are a superset of the types returned by MapperService.types() or compute the sum of the doc freqs of each term and rewrite to a match_all query if the sum is equal to maxDoc?

Then if TypesQuery is able to do this, we do not need TypeQuery anymore and can use TypesQuery all the time?

@jimczi

This comment has been minimized.

Copy link
Member Author

commented Oct 19, 2016

Good idea @jpountz
I've added the heuristic based on the document frequency inside the reader since this reflects the status of the index rather than the status of the mapping (we can have multiple types defined in the mapping but no documents of these types in the index)

@jpountz
Copy link
Contributor

left a comment

LGTM

core/src/main/java/org/elasticsearch/index/mapper/TypeFieldMapper.java Outdated
return new MatchAllDocsQuery();
}
totalDocFreq += context.docFreq();
// strict equality should be enough ?

This comment has been minimized.

Copy link
@jpountz

jpountz Oct 19, 2016

Contributor

maybe call assert totalDocFreq == reader.maxDoc() under the if statement?

This comment has been minimized.

Copy link
@jimczi

jimczi Oct 19, 2016

Author Member

Done

core/src/main/java/org/elasticsearch/index/mapper/TypeFieldMapper.java Outdated
// Using a match_all query will help Lucene perform some optimizations
// For instance, match_all queries as filter clauses are automatically removed
return new MatchAllDocsQuery();
}

This comment has been minimized.

Copy link
@jpountz

jpountz Oct 19, 2016

Contributor

maybe we can skip this one since we already have the check on the sum below?

This comment has been minimized.

Copy link
@jimczi

jimczi Oct 19, 2016

Author Member

yes I left it to make it more obvious that there are multiple cases but this is redundant. I'll skip it.

@jimczi

This comment has been minimized.

Copy link
Member Author

commented Oct 19, 2016

Thanks @jpountz
I'll merge soon

@jimczi jimczi added the v5.0.1 label Oct 20, 2016

jimczi added some commits Oct 17, 2016

Optimize query with types filter in the URL (t/t/_search)
The TypeQuery has an optimization when only one type is present in an index.
This optimization is used when a type query is explicitly used in the query (_type:t) but it is not leveraged when the type is in the URL (GET t/t/_search).
This change uses the TypeQuery when the type filter is built from the URL (GET t/t/_search).
Since the number of types is unbounded in a query this change also preserves the current behavior which uses a TermsQuery when the number of types to query is greater than a threshold (16 by default).
Remove TypeQuery and add an heuristic in TypesQuery that rewrites to …
…a MatchAllDoc query when all the types of an index are present in the query

@jimczi jimczi merged commit d0bbe89 into elastic:master Oct 20, 2016

1 of 2 checks passed

elasticsearch-ci Build started sha1 is merged.
Details
CLA Commit author is a member of Elasticsearch
Details

@jimczi jimczi deleted the jimczi:type_filter branch Oct 20, 2016

jimczi added a commit that referenced this pull request Oct 20, 2016

Optimize query with types filter in the URL (t/t/_search) (#20979)
This change adds a TypesQuery that checks if the disjunction of types should be rewritten to a MatchAllDocs query. The check is done only if the number of terms is below a threshold (16 by default and configurable via max_boolean_clause).
@jpountz

This comment has been minimized.

Copy link
Contributor

commented Oct 20, 2016

@jimferenczi I am wondering whether this change could mistakenly rewrite to a match_all query if a type is present multiple times in the types arrar? (its doc freq would be counted multiple times too?)

@jimczi

This comment has been minimized.

Copy link
Member Author

commented Oct 20, 2016

Good catch @jpountz
I though that the types are resolved against the mapping but since wildcards are not allowed there is no resolution at all and the types are left as is.

I pushed a fix in master and 5.x:
e04ee40

@jpountz

This comment has been minimized.

Copy link
Contributor

commented Oct 20, 2016

The fix looks good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.