Lucene search performance can be greatly improved #493

Closed
mirkosertic opened this Issue Sep 7, 2013 · 8 comments

@mirkosertic

There seems to be a problem in AbstractLuceneQuery with QueryDSL 3.2.2 when using Lucene 4.4.0. Queries that need no sorting are executed without a Sort instance, and this is quite slow. If I replace the following line

scoreDocs = searcher.search(createQuery(), filter, sumOfLimitAndOffset).scoreDocs;

with

scoreDocs = searcher.search(createQuery(), filter, sumOfLimitAndOffset, new Sort()).scoreDocs;

the query is executed more than 8 times faster. It seems to be Lucene related, and I have already filed an issue with the user group. Until the cause is clear, I am using the optimized version; the query results are the same.

timowest Sep 8, 2013

Member

This looks like a bug on the Lucene side. Let me know when you find the reason.

mirkosertic Sep 8, 2013

I created the following thread:

http://www.gossamer-threads.com/lists/lucene/java-user/206728?do=post_view_threaded

From my point of view, QueryDSL does not need Lucene scoring; it just needs sorting in some cases. So the first method probably involves sorting by score, and the latter sorting by field. Providing an empty Sort() instance then seems to disable sorting by score, which greatly improves search speed.

I will track this...
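
To make the suspected difference concrete, here is a minimal plain-Java sketch. It is not Lucene code: `TopNSketch`, `Hit`, `topNByScore` and `firstNInIndexOrder` are hypothetical names made up for this illustration, and Lucene's real collectors work differently in detail. It only shows that ranking by relevance needs a score for every hit plus a bounded heap, while index order needs neither.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java model (NOT Lucene code) of relevance-ordered vs index-ordered top-N.
final class TopNSketch {

    /** Hypothetical stand-in for a matching document: index position plus relevance score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Relevance order: every hit must be scored and pushed through a bounded min-heap. */
    static List<Integer> topNByScore(List<Hit> hits, int n) {
        List<Integer> docs = new ArrayList<>();
        if (n <= 0) {
            return docs;
        }
        // The head of the heap is the worst kept hit: lowest score, then highest doc id.
        Comparator<Hit> worstFirst = (a, b) -> {
            int byScore = Float.compare(a.score, b.score);
            return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
        };
        PriorityQueue<Hit> heap = new PriorityQueue<>(n, worstFirst);
        for (Hit hit : hits) {
            heap.offer(hit);
            if (heap.size() > n) {
                heap.poll(); // evict the current worst hit
            }
        }
        while (!heap.isEmpty()) {
            docs.add(heap.poll().doc); // worst first
        }
        Collections.reverse(docs); // best first
        return docs;
    }

    /** Index order: assumes hits arrive in index order, so the first n are the answer. */
    static List<Integer> firstNInIndexOrder(List<Hit> hits, int n) {
        List<Integer> docs = new ArrayList<>();
        for (Hit hit : hits) {
            if (docs.size() >= n) {
                break;
            }
            docs.add(hit.doc); // the score is never read
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(0, 0.2f), new Hit(1, 0.9f), new Hit(2, 0.5f), new Hit(3, 0.9f));
        System.out.println(topNByScore(hits, 2));        // prints [1, 3]
        System.out.println(firstNInIndexOrder(hits, 2)); // prints [0, 1]
    }
}
```

Note that the two orderings return different documents once the limit is smaller than the number of hits, so results only stay the same when all hits are fetched or the caller does not care which ones come first.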

mirkosertic Sep 8, 2013

It seems the default Lucene sort order is by relevance, which involves intensive computation...

mirkosertic Sep 9, 2013

Ok, according to Mike McCandless the solution should be in AbstractLuceneQuery:

            if (sort != null) {
                scoreDocs = searcher.search(createQuery(), filter, sumOfLimitAndOffset, sort, false, false).scoreDocs;
            } else {
                scoreDocs = searcher.search(createQuery(), filter, sumOfLimitAndOffset, Sort.INDEXORDER, false, false).scoreDocs;
            }

mirkosertic Sep 9, 2013

If no sorting is required, we could also use a custom Collector; this should be the fastest solution of all.
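
As a rough illustration of the collector idea, here is a plain-Java sketch. It deliberately does not use the Lucene Collector API: `CollectorSketch`, `DocIdCollector` and `search` are hypothetical names, and the search loop is only a stand-in for whatever pushes matching documents to the collector. The point is that such a collector only records doc ids, so no score is computed and nothing is sorted.

```java
import java.util.BitSet;
import java.util.function.IntConsumer;

// Plain-Java model of the collector idea (NOT the Lucene Collector API).
final class CollectorSketch {

    /** Hypothetical collector: records matching doc ids and never asks for a score. */
    static final class DocIdCollector implements IntConsumer {
        private final BitSet docs = new BitSet();
        private int count;

        @Override
        public void accept(int doc) {
            docs.set(doc);
            count++;
        }

        int count() {
            return count;
        }

        BitSet docs() {
            return docs;
        }
    }

    /** Hypothetical stand-in for a search loop: pushes each matching doc id to the collector. */
    static void search(int[] matchingDocs, IntConsumer collector) {
        for (int doc : matchingDocs) {
            collector.accept(doc);
        }
    }

    public static void main(String[] args) {
        DocIdCollector collector = new DocIdCollector();
        search(new int[] {4, 7, 9}, collector);
        System.out.println(collector.count() + " hits: " + collector.docs()); // prints "3 hits: {4, 7, 9}"
    }
}
```

A real implementation would have to follow the Collector contract of the Lucene version in use and would still need to apply limit and offset itself.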

timowest Sep 10, 2013

Member

What about the other search usage in AbstractLuceneQuery?

mirkosertic Sep 11, 2013

Ok, I only see the oneResult() method, and I think the INDEXORDER sort order should be used there as well, so replacing

final ScoreDoc[] scoreDocs = searcher.search(createQuery(), filter, maxDoc).scoreDocs;

with

final ScoreDoc[] scoreDocs = searcher.search(createQuery(), filter, sumOfLimitAndOffset, Sort.INDEXORDER, false, false).scoreDocs;

We are not using oneResult() or singleResult() in our use cases, but I think the current implementation will lead to performance problems under heavy load as well.

timowest added a commit that referenced this issue Sep 11, 2013

timowest Oct 20, 2013

Member

Released in 3.2.4

@timowest timowest closed this Oct 20, 2013

@timowest timowest added this to the 3.2.4 milestone Apr 13, 2014
