Currently, when a query is submitted, we retrieve the postings lists corresponding to each query word and store them all in a single shared memory allocation. Once all of the postings lists have been retrieved (25.63ms):
All of these sorting and allocating actions (a total of 82.41ms) are done for the whole accumulated
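A minimal sketch of this up-front approach (all names and types here are hypothetical illustrations, not the engine's actual ones): every postings list is appended into one shared buffer, and the whole accumulated buffer is then sorted in a single pass.

```rust
// Hypothetical match entry: a word occurrence inside a document.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Match {
    doc_id: u32,
    word_index: u16,
}

/// Append every postings list into one shared Vec, then sort the whole
/// accumulated buffer by document id. This models the "do everything up
/// front" strategy: the allocation and the sort cover every candidate,
/// whether or not later criteria will ever look at it.
fn accumulate_and_sort(postings_lists: &[Vec<Match>]) -> Vec<Match> {
    let total: usize = postings_lists.iter().map(Vec::len).sum();
    let mut matches = Vec::with_capacity(total); // one allocation for everything
    for list in postings_lists {
        matches.extend_from_slice(list);
    }
    matches.sort_unstable(); // sorts by (doc_id, word_index)
    matches
}
```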
To give some context on the criteria currently running in the engine and the data setup each one needs:
For information, 83470 documents were nominated as candidates for the bucket sort, or 81.8% of the total number of indexed documents. The number written in parentheses after each of the criteria above indicates the number of documents passed on to the next criterion, as in a filtering pipeline. In other words, the first criterion filtered out 91.6% of the candidate documents, keeping 8.3%.
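The bucketing behaviour described above can be sketched roughly as follows (a simplified, hypothetical model rather than the engine's real criteria): each criterion ranks the remaining documents, and only groups of documents that tie on that rank are forwarded to the next criterion, so later criteria see ever-smaller sets.

```rust
/// Recursive bucket sort over a chain of criteria. A criterion is
/// modelled as a function assigning each document id a rank; documents
/// sharing the same rank form a bucket that the next criterion refines.
fn bucket_sort(docs: &mut [u32], criteria: &[fn(u32) -> u32]) {
    fn sort_range(docs: &mut [u32], criteria: &[fn(u32) -> u32]) {
        let (criterion, rest) = match criteria.split_first() {
            Some(pair) => pair,
            None => return, // no criteria left: order is final
        };
        docs.sort_by_key(|&d| criterion(d)); // stable sort by this rank
        // Recurse only inside groups of equal rank (the "buckets").
        let mut start = 0;
        while start < docs.len() {
            let rank = criterion(docs[start]);
            let len = docs[start..]
                .iter()
                .take_while(|&&d| criterion(d) == rank)
                .count();
            sort_range(&mut docs[start..start + len], rest);
            start += len;
        }
    }
    sort_range(docs, criteria);
}
```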
I think that all of the preparation work before the bucket sort could be done only when necessary, avoiding the multi-word rewrite, because only the
So the general idea behind this issue is to introduce a
I would like to see at least a 70% query-time reduction, moving from 103.24ms to something like 30.9ms for this example.
I have made some progress on this issue: I achieved something between a 3.5x and a 27x search query time reduction.
It is related to the Proximity criterion, which needs to prepare the documents' query matches. In the first query it takes 47.02ms to prepare and 31.53ms to evaluate 26998 documents. In the second there are only 5021 documents to evaluate, so the preparation and evaluation only take 11.22ms and 2.71ms respectively.
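One possible shape for avoiding that cost, sketched with hypothetical types (this is an illustration of lazy preparation in general, not the engine's actual implementation): prepare a document's query matches the first time a criterion actually asks for them, so documents filtered out by earlier criteria never pay the preparation price.

```rust
use std::collections::HashMap;

/// Lazily prepared query matches. `raw` holds unprocessed word positions
/// per document; `prepared` caches sorted positions, built on demand.
struct LazyMatches<'a> {
    raw: &'a HashMap<u32, Vec<u16>>,
    prepared: HashMap<u32, Vec<u16>>,
}

impl<'a> LazyMatches<'a> {
    fn new(raw: &'a HashMap<u32, Vec<u16>>) -> Self {
        Self { raw, prepared: HashMap::new() }
    }

    /// Sort the positions of a single document the first time it is
    /// needed; subsequent calls hit the cache. Documents that are never
    /// requested are never prepared at all.
    fn prepared_positions(&mut self, doc_id: u32) -> &[u16] {
        let raw = self.raw;
        self.prepared.entry(doc_id).or_insert_with(|| {
            let mut positions = raw.get(&doc_id).cloned().unwrap_or_default();
            positions.sort_unstable();
            positions
        })
    }
}
```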
I need to work on that. I don't really have any ideas currently, but I will probably try to reuse the
This PR renames the ranking rules in the following way: