optimize has_child query when matching parent count is low #3190
Currently, the has_child query loops over every document of the parent type, looking for the parents matched by the child query. When the child query matches only a few parents, this loop is expensive.
I think this loop can be eliminated, or short-circuited early, since we already know the matching parent ids (the keys of the uidToScore map).
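A minimal sketch of the early short-circuit idea, written in Python rather than the actual Java/Lucene code. It models the parent docs as a plain list of uids and the child query result as a dict standing in for uidToScore; the function names are hypothetical. It assumes each parent uid occurs at most once among the parent docs, which is what makes stopping early safe. Eliminating the loop entirely (looking parents up by id) is the other option mentioned above and is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[int, float]  # (parent doc id, score)


def collect_full_scan(parent_uids: Sequence[str],
                      uid_to_score: Dict[str, float]) -> Tuple[List[Hit], int]:
    """Visit every parent doc and look its uid up in the map."""
    hits: List[Hit] = []
    visited = 0
    for doc_id, uid in enumerate(parent_uids):
        visited += 1
        score = uid_to_score.get(uid)
        if score is not None:
            hits.append((doc_id, score))
    return hits, visited


def collect_short_circuit(parent_uids: Sequence[str],
                          uid_to_score: Dict[str, float]) -> Tuple[List[Hit], int]:
    """Same hits, but stop once every uid in the map has been found.

    Assumes (hypothetically) that parent uids are unique, so each map key
    can match at most one parent doc.
    """
    hits: List[Hit] = []
    visited = 0
    remaining = len(uid_to_score)
    if remaining == 0:
        return hits, visited  # child query matched no parents at all
    for doc_id, uid in enumerate(parent_uids):
        visited += 1
        score = uid_to_score.get(uid)
        if score is not None:
            hits.append((doc_id, score))
            remaining -= 1
            if remaining == 0:
                break  # all known parents found; skip the rest of the scan
    return hits, visited


parents = [f"p{i}" for i in range(10)]
scores = {"p1": 1.0, "p3": 2.0}
print(collect_full_scan(parents, scores))      # ([(1, 1.0), (3, 2.0)], 10)
print(collect_short_circuit(parents, scores))  # ([(1, 1.0), (3, 2.0)], 4)
```

Both versions return the same hits; the short-circuit version visits only 4 of the 10 parent docs here, and the saving grows as the number of parents grows relative to the number of matches.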
I am currently testing a few approaches and will submit a PR when ready.
Referenced this issue on Jun 20, 2013.
I added a benchmark (ChildSearchShortCircutBenchmark) to test the short-circuit mechanisms @mattweber came up with, and the results look promising. The benchmark inserts 10M parent docs and a number of child docs. Child docs are inserted in groups ranging from 1 doc to 1M docs, each group containing twice as many child docs as the previous one. This shows the performance difference between running with and without the short circuits in place.
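The doubling group sizes described above can be sketched as follows. This is not code from ChildSearchShortCircutBenchmark; the helper name is hypothetical, and it assumes "1M" means 2^20 (1,048,576) so that doubling from 1 lands on it exactly.

```python
from typing import List


def child_group_sizes(max_group_size: int) -> List[int]:
    """Group sizes starting at 1 and doubling while they stay <= max_group_size."""
    sizes: List[int] = []
    size = 1
    while size <= max_group_size:
        sizes.append(size)
        size *= 2  # each group holds twice as many child docs as the previous one
    return sizes


sizes = child_group_sizes(2 ** 20)  # assumption: "1M" taken as 2^20
print(len(sizes), sizes[-1], sum(sizes))  # 21 1048576 2097151
```

Under this assumption there are 21 groups and about 2.1M child docs in total, so the smallest groups exercise the few-matching-parents case the short circuit targets, while the largest ones show whether it costs anything when many parents match.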
Test results without the short circuits:
Test results with the short circuits: