feat: Make queries utilise secondary indexes #1925

islamaliev · 2023-10-04T07:23:24Z

Relevant issue(s)

Resolves #1555

Description

With this change, secondary indexes are utilised when querying data.

A dedicated Indexer fetcher is implemented to perform fetching of values of indexed fields.

Now there is a separate filter package that houses a lot of methods for working with filters.

A new metric indexesFetched is introduced into @explain to provide information about how many indexes have been fetched.
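To make the idea concrete, here is a minimal sketch of why an index lookup touches far fewer entries than a full scan. It is a toy in-memory model, not DefraDB's `Indexer` fetcher: the `Doc`, `Store`, `ScanByAge` and `LookupByAge` names are hypothetical, and I am assuming a counter in the spirit of `indexesFetched` that simply counts the index entries read.

```go
package main

import "fmt"

// Doc is a hypothetical document with one indexed field (Age).
type Doc struct {
	ID  string
	Age int
}

// Store is a toy collection: primary storage plus one secondary index on Age.
// It only illustrates the concept and does not mirror DefraDB's storage layout.
type Store struct {
	docs  map[string]Doc   // primary storage keyed by document ID
	byAge map[int][]string // secondary index: Age value -> document IDs
}

func NewStore() *Store {
	return &Store{docs: map[string]Doc{}, byAge: map[int][]string{}}
}

// Add stores the document and updates the secondary index (the write-side cost).
func (s *Store) Add(d Doc) {
	s.docs[d.ID] = d
	s.byAge[d.Age] = append(s.byAge[d.Age], d.ID)
}

// ScanByAge examines every document and reports how many it had to look at.
func (s *Store) ScanByAge(age int) (matches []Doc, docsExamined int) {
	for _, d := range s.docs {
		docsExamined++
		if d.Age == age {
			matches = append(matches, d)
		}
	}
	return matches, docsExamined
}

// LookupByAge reads only the index entries for the requested value and
// reports how many entries it fetched.
func (s *Store) LookupByAge(age int) (matches []Doc, indexesFetched int) {
	for _, id := range s.byAge[age] {
		indexesFetched++
		matches = append(matches, s.docs[id])
	}
	return matches, indexesFetched
}

func main() {
	s := NewStore()
	for i := 0; i < 1000; i++ {
		s.Add(Doc{ID: fmt.Sprintf("doc-%d", i), Age: i % 50})
	}
	scanned, examined := s.ScanByAge(7)
	indexed, fetched := s.LookupByAge(7)
	fmt.Printf("scan:  %d matches, %d docs examined\n", len(scanned), examined)
	fmt.Printf("index: %d matches, %d index entries fetched\n", len(indexed), fetched)
}
```

Both paths return the same 20 matches, but the scan examines all 1000 documents while the index path reads only 20 entries.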

It also includes an update to the testing framework to allow adding custom asserters.
The new ExplainResultsAsserter is used with this new feature.
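As a rough illustration of what a custom asserter could look like, here is a sketch under stated assumptions. The `ResultAsserter` interface and `IndexesFetchedAsserter` type are hypothetical and are not the API of the actual ExplainResultsAsserter; I am also assuming the explain result arrives as a map with an `indexesFetched` entry of type `uint64`.

```go
package main

import "fmt"

// ResultAsserter is a hypothetical hook a test framework could call with a
// query result instead of doing a plain equality check.
type ResultAsserter interface {
	Assert(result map[string]any) error
}

// IndexesFetchedAsserter checks only the indexesFetched metric and ignores
// the rest of the explain output.
type IndexesFetchedAsserter struct {
	Want uint64
}

func (a IndexesFetchedAsserter) Assert(result map[string]any) error {
	got, ok := result["indexesFetched"].(uint64)
	if !ok {
		return fmt.Errorf("indexesFetched is missing or not a uint64")
	}
	if got != a.Want {
		return fmt.Errorf("indexesFetched: want %d, got %d", a.Want, got)
	}
	return nil
}

func main() {
	var asserter ResultAsserter = IndexesFetchedAsserter{Want: 2}

	// Hypothetical explain output; the real structure may differ.
	result := map[string]any{"indexesFetched": uint64(2), "docFetches": uint64(2)}
	if err := asserter.Assert(result); err != nil {
		fmt.Println("assertion failed:", err)
		return
	}
	fmt.Println("assertion passed")
}
```

The point of the design is that a test can target a single metric without pinning the entire explain output, which tends to change as the planner evolves.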

Benchmark

I ran some query benchmarks with regular fetching and indexed fetching on 10k docs and here is the summary:

Time per Operation (ns/op)
Indexed: Took approximately 2,196,481 nanoseconds (~2.2 ms) per operation
Original: Took approximately 74,037,628 nanoseconds (~74 ms) per operation
Boost Factor: Indexed is approximately 33.7 times faster than the original.

Memory Allocation (B/op)
Indexed: Allocated approximately 1,053,403 bytes (~1 MB) per operation
Original: Allocated approximately 45,218,327 bytes (~45 MB) per operation
Boost Factor: Indexed uses approximately 42.9 times less memory than the original.

Number of Allocations (allocs/op)
Indexed: Made 13,331 memory allocations per operation
Original: Made 494,494 memory allocations per operation
Boost Factor: Indexed has approximately 37.1 times fewer memory allocations than the original.

Summary
The indexed approach is 33.7 times faster, uses 42.9 times less memory, and has 37.1 times fewer memory allocations compared to the original approach.

Tasks

I made sure the code is well commented, particularly hard-to-understand areas.
I made sure the repository-held documentation is changed accordingly.
I made sure the pull request title adheres to the conventional commit style (the subset used in the project can be found in tools/configs/chglog/config.yml).
I made sure to discuss its limitations such as threats to validity, vulnerability to mistake and misuse, robustness to invalidation of assumptions, resource requirements, ...

How has this been tested?

Integration tests

Specify the platform(s) on which this was tested:

MacOS

errors/defraError.go

AndrewSisley

todo: The purpose of indexes is performance gains on read, at the cost of performance on write (and storage space). I really think performance tests should be included in this PR.

I would like to see performance tests for non-indexed vs indexed queries, on both read and write. Without these we are not really testing that this code change is doing anything useful, and could be seen as an explain-feature or a simple refactor.

praise: What I have seen so far of the production code looks good, but I have left a few comments dotted about. Will finish my review a bit later once some of them have been addressed.

db/fetcher/indexer.go

planner/select.go

planner/scan.go

planner/planner.go

tests/integration/index/query_with_index_only_filter_test.go

islamaliev · 2023-10-04T17:04:34Z

> I would like to see performance tests for non-indexed vs indexed queries, on both read and write. Without these we are not really testing that this code change is doing anything useful, and could be seen as an explain-feature or a simple refactor.

Do you mean adding tests to tests/bench?

AndrewSisley · 2023-10-04T18:24:21Z

> Do you mean adding tests to tests/bench?

Adding to tests/bench would probably be the simplest for now, we could also assert on one being faster than the other (with appropriate protections against variation).

At the moment there is nothing that really tests the behaviour change that the users care about (query speed).
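One possible shape for such an assertion, sketched under assumptions: compare the medians of several timing samples and require a minimum speedup factor, so a single noisy run or a marginal difference does not flip the result. The `median` and `IsFasterBy` helpers are hypothetical and are not part of tests/bench, and the sample values are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the middle value of the samples (0 for an empty slice).
// It sorts a copy so the caller's slice is left untouched.
func median(samples []float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	n := len(s)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// IsFasterBy reports whether the median of fast is at least minSpeedup times
// smaller than the median of slow. Using medians dampens outliers, and the
// minimum factor guards against passing on a difference that is just noise.
func IsFasterBy(fast, slow []float64, minSpeedup float64) bool {
	mf, ms := median(fast), median(slow)
	if mf <= 0 || ms <= 0 {
		return false
	}
	return ms/mf >= minSpeedup
}

func main() {
	// Made-up timings in arbitrary units; note the outlier (50) in the first set.
	indexed := []float64{10, 11, 50, 10, 12}
	original := []float64{100, 98, 103, 101, 99}
	fmt.Println(IsFasterBy(indexed, original, 5)) // true: medians are 11 and 100
}
```

The required factor should sit well below the speedup actually observed, otherwise the test becomes flaky on slower or busier machines.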

codecov · 2023-10-09T13:46:22Z

Codecov Report

Attention: 121 lines in your changes are missing coverage. Please review.

Comparison is base (bc4c704) 74.67% compared to head (869f867) 74.98%.

@@             Coverage Diff             @@
##           develop    #1925      +/-   ##
===========================================
+ Coverage    74.67%   74.98%   +0.30%     
===========================================
  Files          234      241       +7     
  Lines        23044    23616     +572     
===========================================
+ Hits         17208    17707     +499     
- Misses        4661     4709      +48     
- Partials      1175     1200      +25

| Flag | Coverage Δ | |
|---|---|---|
| all-tests | `74.98% <86.34%> (+0.30%)` | ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files | Coverage Δ | |
|---|---|---|
| client/index.go | `100.00% <100.00%> (ø)` | |
| datastore/errors.go | `100.00% <100.00%> (ø)` | |
| db/collection_index.go | `97.24% <100.00%> (+0.60%)` | ⬆️ |
| db/fetcher/fetcher.go | `76.54% <100.00%> (+0.23%)` | ⬆️ |
| db/index.go | `97.25% <100.00%> (-0.35%)` | ⬇️ |
| errors/defraError.go | `100.00% <100.00%> (ø)` | |
| planner/datasource.go | `83.33% <100.00%> (-8.33%)` | ⬇️ |
| planner/filter/complex.go | `100.00% <ø> (ø)` | |
| planner/filter/copy_field.go | `96.36% <100.00%> (+1.01%)` | ⬆️ |
| planner/filter/extract_properties.go | `100.00% <100.00%> (ø)` | |
... and 14 more

... and 7 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bc4c704...869f867. Read the comment docs.

AndrewSisley

Submitting a few comments now, as they are piling up a bit and I need a breather.

Overall it looks good, but the tests in particular need some work I think.

client/index.go

db/fetcher/indexer_iterators.go

tests/bench/fixtures/fixtures.go

tests/integration/utils2.go

tests/integration/index/query_performance_test.go

tests/integration/index/query_with_relation_filter_test.go

tests/integration/utils2.go

datastore/util.go

db/fetcher/encoded_doc.go

planner/filter/unwrap_relation.go

tests/bench/collection/simple_create_many_test.go

fredcarle

LGTM. Nice work Islam. Just one minor thought on the extra param in the bench tests.

fredcarle

LGTM. Nice work Islam. Just one minor thought on the extra param in the bench tests.

Note: Make sure all conversations with Andy are resolved before merging.

db/fetcher/indexer_iterators.go

AndrewSisley

LGTM, please resolve the remaining conversations before merge though.

I have a request for future PRs too - when resolving a conversation, can you please comment in the conversation/thread what you have done to resolve it. It makes it easier to review, as I don't have to then go and re-find the location (which can sometimes be tricky due to github/other changes), and then figure out what has been done to resolve it (also sometimes tricky and occasionally requiring guesswork).

commit c8bde64
Author: Islam Aliev <aliev.islam@gmail.com>
Date: Sat Oct 14 00:26:22 2023 +0200

feat: Make queries utilise secondary indexes (sourcenetwork#1925)

## Relevant issue(s)

Resolves sourcenetwork#1555

## Description

With this change, secondary indexes are utilised when querying data.

A dedicated `Indexer` fetcher is implemented to perform fetching of values of indexed fields.

Now there is a separate `filter` package that houses a lot of methods for working with filters.

A new metric `indexesFetched` is introduced into `@explain` to provide information about how many indexes have been fetched.

It also includes an update to the testing framework to allow adding custom asserters. The new ExplainResultsAsserter is used with this new feature.

islamaliev requested a review from AndrewSisley October 4, 2023 11:22

islamaliev self-assigned this Oct 4, 2023

islamaliev added the feature, area/query, and perf labels Oct 4, 2023

islamaliev added this to the DefraDB v0.8 milestone Oct 4, 2023

islamaliev commented Oct 4, 2023

errors/defraError.go (resolved)

AndrewSisley requested changes Oct 4, 2023

db/fetcher/indexer.go (resolved)

planner/select.go (resolved)

planner/scan.go (outdated, resolved)

planner/planner.go (outdated, resolved)

tests/integration/index/query_with_index_only_filter_test.go (resolved)

islamaliev force-pushed the feat/islam/query-with-secondary-indexes branch from 8b682c0 to 189f586 Compare October 5, 2023 15:46

islamaliev force-pushed the feat/islam/query-with-secondary-indexes branch from 4ad4751 to 4323b69 Compare October 10, 2023 14:14

AndrewSisley requested changes Oct 10, 2023

AndrewSisley reviewed Oct 10, 2023

tests/integration/utils2.go (outdated, resolved)

fredcarle reviewed Oct 10, 2023

datastore/util.go (outdated, resolved)

fredcarle reviewed Oct 10, 2023

datastore/util.go (outdated, resolved)

fredcarle reviewed Oct 10, 2023

datastore/util.go (outdated, resolved)

fredcarle reviewed Oct 10, 2023

db/fetcher/encoded_doc.go (outdated, resolved)

islamaliev added 11 commits October 11, 2023 12:18

Enable custom asserters for integration tests

38c5572

Add simple integration test for fetching with index

4d0cccf

Extract common operations

3c8c353

Make collection read only by db

c07d3d5

Add first primitive indexer version

2bc0730

fetch indexed value one by one

f623666

Create FetcherSwitcher

7e9857e

Refactor tests

739ed11

Add test with _qt filter

20a34b1

Create index iterator

e9264b8

Implement gt index search

b9d6927

islamaliev added 11 commits October 11, 2023 17:43

Make index filter return error

d7c2784

Pass schema to CollectIndexedFields

3b31a0a

Add test for _like "pref%suf"

504c5d3

Add test for _like with double %

35ce118

Add doc

1b9b2a5

Polish switch statement

4571107

Refactor bench options

ebc5e0c

Polish

cc860d3

Add comment

9c6b242

Rename

0f87637

Polish

45ed665

fredcarle reviewed Oct 12, 2023

planner/filter/unwrap_relation.go (resolved)

islamaliev added 2 commits October 13, 2023 08:11

Polish

dd114e2

Make comparison bench tests local to framework

c812f3e

islamaliev requested a review from AndrewSisley October 13, 2023 13:04

fredcarle reviewed Oct 13, 2023

tests/bench/collection/simple_create_many_test.go (outdated, resolved)

fredcarle approved these changes Oct 13, 2023

AndrewSisley reviewed Oct 13, 2023

db/fetcher/indexer_iterators.go (resolved)

AndrewSisley approved these changes Oct 13, 2023

islamaliev added 6 commits October 13, 2023 23:38

Add an error

5cc6f29

Remove optional nil

17297b3

Assert errors for bench action

0ac0a75

Add doc

82934ea

Fix lint

db8f075

Adjust bench

869f867

islamaliev merged commit c8bde64 into develop Oct 13, 2023
30 checks passed

islamaliev deleted the feat/islam/query-with-secondary-indexes branch October 13, 2023 22:26
