New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize filter removal #18

Merged
merged 16 commits into from Sep 25, 2018

Conversation

Projects
None yet
5 participants
@scottinet
Member

scottinet commented Sep 25, 2018

⚠️ You should wait until #17 is merged to review that one

Description

Filter removal has always been an issue with Koncorde, because of how I used TypedArrays to index subfilters and conditions. These arrays were necessary as Koncorde was written with Node.js 4, meaning that, at the time, Map/Set were either unavailable, or had subpar performances (I don't remember which).
The downside of typedarrays is that I had to write a complex reindexation mechanism. It was costly, too.

So, the aim of that PR is to simplify A LOT how filters, subfilters, conditions and keyword storages are indexed.
Test tables are removed, and Map/Set are now heavily used. Also, regular arrays are also replaced with either maps or sets, so we don't have to splice these anymore, which is costly and slow.

The benchmark tool has also been updated: it now cleans filter up after each keyword test. This hasn't been done before because the removal of large quantities of filters posed problems to Koncorde: it takes 12 to 20 seconds just to remove 10k filters, which is far too long.
With that PR, it takes 100ms to remove 10k filters, on the most expensive keyword (geopolygon)

How should this be manually tested?

Use that PR' benchmark tool. You can also use that version to test previous branches of Koncorde, the difference is striking.

Boyscout

Add LGTM badges to the README file

@scottinet

This comment has been minimized.

Show comment
Hide comment
@scottinet

scottinet Sep 25, 2018

Member

Just for fun, I ran the benchmark with 1 000 000 filters per keywords:

» node --max_old_space_size=8192 benchmark.js
Filter count per tested keyword: 1000000

> Benchmarking keyword: equals
	Indexation: time = 27.632s, mem = +2212MB
	Matching x 2,063,035 ops/sec ±1.16% (83 runs sampled)
	Filters removal: time = 2.162s

> Benchmarking keyword: exists
	Indexation: time = 51.874s, mem = +-7MB
	Matching x 1,273,604 ops/sec ±1.08% (90 runs sampled)
	Filters removal: time = 4.128s

> Benchmarking keyword: geoBoundingBox
	Indexation: time = 83.23s, mem = +621MB
	Matching x 222,023 ops/sec ±0.50% (90 runs sampled)
	Filters removal: time = 16.28s

> Benchmarking keyword: geoDistance
	Indexation: time = 135.244s, mem = +113MB
	Matching x 397,197 ops/sec ±0.96% (95 runs sampled)
	Filters removal: time = 17.932s

> Benchmarking keyword: geoDistanceRange
	Indexation: time = 197.662s, mem = +1431MB
	Matching x 508,669 ops/sec ±0.69% (85 runs sampled)
	Filters removal: time = 17.156s

> Benchmarking keyword: geoPolygon (10 vertices)
	Indexation: time = 152.354s, mem = +-150MB
	Matching x 174 ops/sec ±1.74% (80 runs sampled)
	Filters removal: time = 21.025s

> Benchmarking keyword: in (5 random values)
	Indexation: time = 316.394s, mem = +3609MB
	Matching x 416,351 ops/sec ±0.49% (66 runs sampled)
	Filters removal: time = 11.08s

> Benchmarking keyword: range (random bounds)
	Indexation: time = 35.759s, mem = +33MB
	Matching x 4,745 ops/sec ±3.68% (75 runs sampled)
	Filters removal: time = 0.224s
Member

scottinet commented Sep 25, 2018

Just for fun, I ran the benchmark with 1 000 000 filters per keywords:

» node --max_old_space_size=8192 benchmark.js
Filter count per tested keyword: 1000000

> Benchmarking keyword: equals
	Indexation: time = 27.632s, mem = +2212MB
	Matching x 2,063,035 ops/sec ±1.16% (83 runs sampled)
	Filters removal: time = 2.162s

> Benchmarking keyword: exists
	Indexation: time = 51.874s, mem = +-7MB
	Matching x 1,273,604 ops/sec ±1.08% (90 runs sampled)
	Filters removal: time = 4.128s

> Benchmarking keyword: geoBoundingBox
	Indexation: time = 83.23s, mem = +621MB
	Matching x 222,023 ops/sec ±0.50% (90 runs sampled)
	Filters removal: time = 16.28s

> Benchmarking keyword: geoDistance
	Indexation: time = 135.244s, mem = +113MB
	Matching x 397,197 ops/sec ±0.96% (95 runs sampled)
	Filters removal: time = 17.932s

> Benchmarking keyword: geoDistanceRange
	Indexation: time = 197.662s, mem = +1431MB
	Matching x 508,669 ops/sec ±0.69% (85 runs sampled)
	Filters removal: time = 17.156s

> Benchmarking keyword: geoPolygon (10 vertices)
	Indexation: time = 152.354s, mem = +-150MB
	Matching x 174 ops/sec ±1.74% (80 runs sampled)
	Filters removal: time = 21.025s

> Benchmarking keyword: in (5 random values)
	Indexation: time = 316.394s, mem = +3609MB
	Matching x 416,351 ops/sec ±0.49% (66 runs sampled)
	Filters removal: time = 11.08s

> Benchmarking keyword: range (random bounds)
	Indexation: time = 35.759s, mem = +33MB
	Matching x 4,745 ops/sec ±3.68% (75 runs sampled)
	Filters removal: time = 0.224s
@codecov-io

This comment has been minimized.

Show comment
Hide comment
@codecov-io

codecov-io Sep 25, 2018

Codecov Report

Merging #18 into 1-dev will increase coverage by 1.43%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##            1-dev      #18      +/-   ##
==========================================
+ Coverage   95.18%   96.62%   +1.43%     
==========================================
  Files          33       33              
  Lines        1081      978     -103     
==========================================
- Hits         1029      945      -84     
+ Misses         52       33      -19
Impacted Files Coverage Δ
lib/match/matchGeospatial.js 100% <100%> (ø) ⬆️
lib/storage/removeOperands.js 99.01% <100%> (+3.4%) ⬆️
lib/match/index.js 100% <100%> (ø) ⬆️
lib/storage/objects/subfilter.js 100% <100%> (ø) ⬆️
lib/match/testTables.js 100% <100%> (ø) ⬆️
lib/storage/objects/rangeCondition.js 100% <100%> (ø)
lib/storage/storeOperands.js 98.78% <100%> (-0.32%) ⬇️
lib/match/matchRange.js 100% <100%> (ø) ⬆️
lib/storage/objects/condition.js 100% <100%> (ø) ⬆️
lib/storage/index.js 98.18% <100%> (+8.3%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 65b1cae...cdc648b. Read the comment docs.

codecov-io commented Sep 25, 2018

Codecov Report

Merging #18 into 1-dev will increase coverage by 1.43%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##            1-dev      #18      +/-   ##
==========================================
+ Coverage   95.18%   96.62%   +1.43%     
==========================================
  Files          33       33              
  Lines        1081      978     -103     
==========================================
- Hits         1029      945      -84     
+ Misses         52       33      -19
Impacted Files Coverage Δ
lib/match/matchGeospatial.js 100% <100%> (ø) ⬆️
lib/storage/removeOperands.js 99.01% <100%> (+3.4%) ⬆️
lib/match/index.js 100% <100%> (ø) ⬆️
lib/storage/objects/subfilter.js 100% <100%> (ø) ⬆️
lib/match/testTables.js 100% <100%> (ø) ⬆️
lib/storage/objects/rangeCondition.js 100% <100%> (ø)
lib/storage/storeOperands.js 98.78% <100%> (-0.32%) ⬇️
lib/match/matchRange.js 100% <100%> (ø) ⬆️
lib/storage/objects/condition.js 100% <100%> (ø) ⬆️
lib/storage/index.js 98.18% <100%> (+8.3%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 65b1cae...cdc648b. Read the comment docs.

@Aschen

This comment has been minimized.

Show comment
Hide comment
@Aschen

Aschen Sep 25, 2018

Contributor

Insane !

Contributor

Aschen commented Sep 25, 2018

Insane !

scottinet added some commits Sep 25, 2018

@scottinet scottinet merged commit 7e168da into 1-dev Sep 25, 2018

3 checks passed

codecov/project 96.62% (+1.43%) compared to 65b1cae
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@scottinet scottinet deleted the optimize-filter-removal branch Sep 25, 2018

@scottinet scottinet referenced this pull request Sep 25, 2018

Merged

Release 1.2.0 #19

scottinet added a commit that referenced this pull request Sep 25, 2018

Release 1.2.0 (#19)
* dependencies update (#15)

* fix #13

* The keyword "exists" can now search for array values (#17)

* document array value syntax

* unit tests update + bugfixes

* fix broken test

* forgot to add the new file to git -_-

* use a Set to store field=>operand keys

* fix array value description

* add missing tests (oops)

* even better performances

* apply @amaret comment

* Optimize filter removal (#18)

* document array value syntax

* wip

* unit tests update + bugfixes

* fix broken test

* forgot to add the new file to git -_-

* use a Set to store field=>operand keys

* fix array value description

* add missing tests (oops)

* even better performances

* (wip)

* further simplification + unit tests updates

* modernize, simplify, optimize

* leftovers

* add LGTM badges

* Release 1.2.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment