
Optimize filter removal #18

Merged
merged 16 commits into from Sep 25, 2018
42 changes: 26 additions & 16 deletions README.md

````diff
@@ -1,5 +1,7 @@
 [![Build Status](https://travis-ci.org/kuzzleio/koncorde.svg?branch=master)](https://travis-ci.org/kuzzleio/koncorde)
 [![Codecov](http://codecov.io/github/kuzzleio/koncorde/coverage.svg?branch=master)](http://codecov.io/github/kuzzleio/koncorde?branch=master)
+[![Code Quality: Javascript](https://img.shields.io/lgtm/grade/javascript/g/kuzzleio/koncorde.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/kuzzleio/koncorde/context:javascript)
+[![Total Alerts](https://img.shields.io/lgtm/alerts/g/kuzzleio/koncorde.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/kuzzleio/koncorde/alerts)
 
 # Koncorde
 
@@ -1379,36 +1381,44 @@ The following results are obtained running `node benchmark.js` at the root of th
 Filter count per tested keyword: 10000
 
 > Benchmarking keyword: equals
-Registration: time = 0.453s, mem = +40MB
-Matching x 3,444,291 ops/sec ±0.83% (94 runs sampled)
+Indexation: time = 0.435s, mem = +41MB
+Matching x 4,006,895 ops/sec ±0.35% (97 runs sampled)
+Filters removal: time = 0.02s
 
 > Benchmarking keyword: exists
-Registration: time = 0.518s, mem = +12MB
-Matching x 1,953,425 ops/sec ±0.68% (94 runs sampled)
+Indexation: time = 0.487s, mem = +-2MB
+Matching x 2,449,897 ops/sec ±0.95% (97 runs sampled)
+Filters removal: time = 0.023s
 
 > Benchmarking keyword: geoBoundingBox
-Registration: time = 0.936s, mem = +17MB
-Matching x 1,234,466 ops/sec ±0.50% (94 runs sampled)
+Indexation: time = 0.751s, mem = +14MB
+Matching x 1,339,779 ops/sec ±0.21% (95 runs sampled)
+Filters removal: time = 0.096s
 
 > Benchmarking keyword: geoDistance
-Registration: time = 1.25s, mem = +16MB
-Matching x 1,255,571 ops/sec ±0.84% (97 runs sampled)
+Indexation: time = 1.254s, mem = +6MB
+Matching x 1,226,643 ops/sec ±0.73% (92 runs sampled)
+Filters removal: time = 0.093s
 
 > Benchmarking keyword: geoDistanceRange
-Registration: time = 1.857s, mem = +12MB
-Matching x 1,338,788 ops/sec ±0.77% (93 runs sampled)
+Indexation: time = 1.762s, mem = +-10MB
+Matching x 1,199,081 ops/sec ±0.26% (96 runs sampled)
+Filters removal: time = 0.088s
 
 > Benchmarking keyword: geoPolygon (10 vertices)
-Registration: time = 1.148s, mem = +21MB
-Matching x 52,636 ops/sec ±0.16% (97 runs sampled)
+Indexation: time = 1.184s, mem = +1MB
+Matching x 53,395 ops/sec ±0.95% (96 runs sampled)
+Filters removal: time = 0.103s
 
 > Benchmarking keyword: in (5 random values)
-Registration: time = 1.554s, mem = +61MB
-Matching x 1,782,624 ops/sec ±0.25% (96 runs sampled)
+Indexation: time = 1.417s, mem = +40MB
+Matching x 2,086,572 ops/sec ±2.02% (92 runs sampled)
+Filters removal: time = 0.058s
 
 > Benchmarking keyword: range (random bounds)
-Registration: time = 0.41s, mem = +17MB
-Matching x 31,933 ops/sec ±13.76% (92 runs sampled)
+Indexation: time = 0.407s, mem = +-140MB
+Matching x 38,611 ops/sec ±0.32% (95 runs sampled)
+Filters removal: time = 0.064s
 ```
 
 _(results obtained with node v10.2.1)_
````
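
For each keyword, the results above report a previous and an updated `Matching` throughput. The sketch below hard-codes those ops/sec figures (copied from the output, not re-measured) and computes the relative change per keyword, rounded to one decimal place:

```javascript
// Matching throughput (ops/sec) copied from the benchmark output:
// [previous, updated] for each tested keyword.
const matchingOpsPerSec = {
  equals: [3444291, 4006895],
  exists: [1953425, 2449897],
  geoBoundingBox: [1234466, 1339779],
  geoDistance: [1255571, 1226643],
  geoDistanceRange: [1338788, 1199081],
  geoPolygon: [52636, 53395],
  in: [1782624, 2086572],
  range: [31933, 38611],
};

// Relative change in percent, rounded to one decimal place.
function relativeChange(before, after) {
  return Math.round(((after - before) / before) * 1000) / 10;
}

for (const [keyword, [before, after]] of Object.entries(matchingOpsPerSec)) {
  const change = relativeChange(before, after);
  // First line printed: "equals: +16.3%"
  console.log(`${keyword}: ${change > 0 ? '+' : ''}${change}%`);
}
```

Each version is represented by a single benchmark run, so small differences, such as the -2.3% for `geoDistance`, may reflect run-to-run variation rather than the change itself.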
17 changes: 15 additions & 2 deletions benchmark.js

```diff
@@ -18,6 +18,7 @@ const
     int: Random.integer(-10000, 10000)
   };
 
+let filters = [];
 const koncorde = new Koncorde();
 
 const matching = (name, document) => {
@@ -29,10 +30,22 @@ const matching = (name, document) => {
     })
     .on('cycle', event => {
       console.log(String(event.target));
+      removeFilters();
     })
     .run({async: false});
 };
 
+function removeFilters() {
+  const removalStart = Date.now();
+
+  for (const filter of filters) {
+    koncorde.remove(filter);
+  }
+
+  filters = [];
+  console.log(`\tFilters removal: time = ${(Date.now() - removalStart)/1000}s`);
+}
+
 const test = Bluebird.coroutine(function *_register(name, generator, document) {
   let i,
     filterStartTime,
@@ -46,11 +59,11 @@ const test = Bluebird.coroutine(function *_register(name, generator, document) {
   for (i = 0;i < max; i++) {
     // Using the filter name as a collection to isolate
     // benchmark calculation per keyword
-    yield koncorde.register('i', name, generator());
+    filters.push((yield koncorde.register('i', name, generator())).id);
   }
 
   filterEndTime = (Date.now() - filterStartTime) / 1000;
-  console.log(`\tRegistration: time = ${filterEndTime}s, mem = +${Math.round((v8.getHeapStatistics().total_heap_size - baseHeap) / 1024 / 1024)}MB`);
+  console.log(`\tIndexation: time = ${filterEndTime}s, mem = +${Math.round((v8.getHeapStatistics().total_heap_size - baseHeap) / 1024 / 1024)}MB`);
 
   matching(name, document);
 });
```
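
The updated benchmark times filter removal with `Date.now()`, which only has millisecond resolution; with removals taking a few tens of milliseconds, that leaves little precision. As a possible refinement, the sketch below times the same kind of loop with Node's monotonic `process.hrtime.bigint()` clock (available since Node.js 10.7). The `registry` Map and the `timeRemoval` helper are hypothetical stand-ins, so the snippet runs without Koncorde.

```javascript
// Hypothetical stand-in for the engine: filter ID -> filter body.
const registry = new Map();
for (let i = 0; i < 10000; i++) {
  registry.set(`filter-${i}`, { equals: { field: i } });
}

// Calls removeFn on every ID and returns the elapsed time in seconds,
// measured with Node's monotonic high-resolution clock.
function timeRemoval(ids, removeFn) {
  const start = process.hrtime.bigint();
  for (const id of ids) {
    removeFn(id);
  }
  const elapsedNs = process.hrtime.bigint() - start;
  return Number(elapsedNs) / 1e9;
}

// Copy the keys first so the Map is not modified while it is being iterated.
const ids = Array.from(registry.keys());
const seconds = timeRemoval(ids, id => registry.delete(id));
console.log(`\tFilters removal: time = ${seconds.toFixed(3)}s`);
```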
7 changes: 2 additions & 5 deletions lib/README.md

```diff
@@ -24,10 +24,8 @@ The canonicalized filter is split and its parts are stored in different structur
 - `storage.subfilters` provides a bidirectional link between a subfilter, its associated filters, and its associated conditions
 - `storage.conditions` provides a link between a condition and its associated subfilters. It also contains the condition's value
 
-Once stored, filters are indexed:
-
-- `storage.foPairs` regroups all conditions associated to a field-operand pair. It means that, for instance, all "equals" condition on a field "field" are regrouped and stored together. The way these values are stored closely depends on the corresponding operand (for instance, "range" operands use a specific augmented AVL tree, while geospatial operands use a R\* tree)
-- `storage.testTables` is the index allowing to efficiently track how many conditions a given subfilter has validated. This structure is the most important part of the matching mechanism (performance-wise) as it allows to very quickly check if a subfilter is completely matched and what filters should be returned for a given document.
+Once stored, filters are indexed in the `storage.foPairs` structure, regrouping all conditions associated to a field-operand pair.
+It means that, for instance, all "equals" condition on a field "field" are regrouped and stored together. The way these values are stored closely depends on the corresponding operand (for instance, "range" operands use a specific augmented AVL tree, while geospatial operands use a R\* tree)
 
 ## Matching
 
@@ -41,4 +39,3 @@ The way each field-operand pair performs its match depends closely on the keywor
 ## Deleting a filter
 
 When a filter gets deleted, the filters, subfilters, conditions and field-operand structures are cleaned up.
-The indexes are left alone, unless more than 10% of the referenced subfilters have been deleted. If so, an index rebuild is triggered. This allow mutualizing the cost of rebuilding the indexes.
```
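
To illustrate what grouping conditions by field-operand pair buys, the sketch below builds a hypothetical, simplified index for the `equals` operand only, nested as operand, then field, then value, then a set of subfilter IDs. It is not Koncorde's actual `foPairs` layout, whose structure depends on the operand (augmented AVL tree for ranges, R\* tree for geospatial shapes); it only shows that matching a document costs one lookup per indexed field rather than one test per registered condition.

```javascript
// Hypothetical, simplified field-operand index for the `equals` operand:
// operand -> field -> value -> Set of subfilter IDs.
const foPairs = new Map();

function addCondition(operand, field, value, subfilterId) {
  if (!foPairs.has(operand)) foPairs.set(operand, new Map());
  const fields = foPairs.get(operand);
  if (!fields.has(field)) fields.set(field, new Map());
  const values = fields.get(field);
  if (!values.has(value)) values.set(value, new Set());
  values.get(value).add(subfilterId);
}

// Returns the IDs of subfilters having at least one `equals` condition
// satisfied by the document: one Map lookup per indexed field.
function matchEquals(document) {
  const matched = new Set();
  const fields = foPairs.get('equals');
  if (!fields) return matched;
  for (const [field, values] of fields) {
    const subfilters = values.get(document[field]);
    if (subfilters) {
      for (const id of subfilters) matched.add(id);
    }
  }
  return matched;
}

addCondition('equals', 'field', 'foo', 'sf1');
addCondition('equals', 'field', 'bar', 'sf2');
addCondition('equals', 'other', 42, 'sf3');

console.log([...matchEquals({ field: 'foo', other: 42 })]); // [ 'sf1', 'sf3' ]
```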
11 changes: 5 additions & 6 deletions lib/index.js

```diff
@@ -81,10 +81,10 @@ class Koncorde {
 
   /**
    * Returns an optimized version of the provided filter, with
-   * its associated filter unique ID.
+   * its associated filter unique ID.
    * Does not store anything in the DSL structures
    * The returned object can either be used with store(), or discarded.
-   *
+   *
    * @param {string} index index
    * @param {[type]} collection collection
    * @param {[type]} filter filter
@@ -95,15 +95,15 @@ class Koncorde {
       .then(normalized => ({
         index,
         collection,
-        normalized,
+        normalized,
         id: this.storage.getFilterId(index, collection, normalized)
       }));
   }
 
   /**
    * Stores a normalized filter into this DSL structures.
    * A normalized filter is obtained using a call to normalize()
-   *
+   *
    * @param {Object} normalized Obtained with a call to normalize()
    * @return {{diff: Object, id: String}}
    */
@@ -130,7 +130,7 @@ class Koncorde {
    * @returns {Array} Array of matching filter IDs
    */
   getFilterIds(index, collection) {
-    return this.exists(index, collection) ? this.storage.filtersIndex[index][collection] : [];
+    return this.exists(index, collection) ? Array.from(this.storage.filtersIndex[index][collection]) : [];
   }
 
   /**
@@ -155,7 +155,6 @@ class Koncorde {
    * Removes all references to a given filter from the real-time engine
    *
    * @param {string} filterId - ID of the filter to remove
-   * @returns {Promise}
    */
   remove(filterId) {
     return this.storage.remove(filterId);
```
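
`getFilterIds` now wraps `filtersIndex[index][collection]` in `Array.from()`, which suggests that this index is no longer stored as an array (a `Set` would fit). Converting it keeps the documented `Array` return type and hands callers a copy, so modifying the result cannot alter the engine's internal state. A minimal sketch of that pattern, with a hypothetical `filtersIndex` object and standalone `exists`/`getFilterIds` functions:

```javascript
// Hypothetical internal index: index -> collection -> Set of filter IDs.
const filtersIndex = {
  myIndex: {
    myCollection: new Set(['filter-a', 'filter-b']),
  },
};

function exists(index, collection) {
  return Boolean(filtersIndex[index] && filtersIndex[index][collection]);
}

// Returns a fresh array: callers may modify it without touching the internal Set.
function getFilterIds(index, collection) {
  return exists(index, collection)
    ? Array.from(filtersIndex[index][collection])
    : [];
}

const ids = getFilterIds('myIndex', 'myCollection');
ids.push('not-registered');
console.log(ids.length, filtersIndex.myIndex.myCollection.size); // 3 2
```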
7 changes: 4 additions & 3 deletions lib/match/index.js

```diff
@@ -57,11 +57,12 @@ class Matcher {
    * @return {Array}
    */
   match(index, collection, data) {
-    const testTables = new TestTables(this.store.testTables, index, collection);
+    const testTables = new TestTables();
 
     for (const matcher of this.matchers) {
-      if (this.store.foPairs[index][collection][matcher[0]]) {
-        matcher[1](this.store.foPairs[index][collection][matcher[0]], testTables, data);
+      const matcherStorage = this.store.foPairs[index][collection].get(matcher[0]);
+      if (matcherStorage) {
+        matcher[1](matcherStorage, testTables, data);
       }
     }
 
```
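
In the new `match` loop, the per-collection field-operand storage is read with `.get()`, so it is now a `Map` keyed by operand name, and the lookup result is kept in `matcherStorage` instead of being computed twice. The sketch below reproduces that dispatch pattern with hypothetical storage contents and matcher functions (and a plain array in place of `TestTables`), so it illustrates the control flow only:

```javascript
// Hypothetical per-collection storage: operand name -> operand-specific structure.
const collectionStorage = new Map([
  ['equals', new Map([['color', new Map([['red', ['sf1']]])]])],
  ['exists', new Map([['size', ['sf2']]])],
]);

// Hypothetical matchers, as [operand name, match function] pairs.
const matchers = [
  ['equals', (storage, matched, data) => {
    for (const [field, values] of storage) {
      for (const id of values.get(data[field]) || []) matched.push(id);
    }
  }],
  ['exists', (storage, matched, data) => {
    for (const [field, ids] of storage) {
      if (data[field] !== undefined) matched.push(...ids);
    }
  }],
  // No 'range' conditions are stored, so this matcher is never called.
  ['range', () => { throw new Error('range storage is empty'); }],
];

function match(data) {
  const matched = [];
  for (const [operand, matchFn] of matchers) {
    // One Map lookup per operand; operands without stored conditions are skipped.
    const matcherStorage = collectionStorage.get(operand);
    if (matcherStorage) {
      matchFn(matcherStorage, matched, data);
    }
  }
  return matched;
}

console.log(match({ color: 'red', size: 3 })); // [ 'sf1', 'sf2' ]
```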
2 changes: 1 addition & 1 deletion lib/match/matchGeospatial.js

```diff
@@ -50,7 +50,7 @@ function MatchGeospatial (storage, testTables, document) {
       const result = storage.custom.index.queryPoint(point.lat, point.lon);
 
       for(j = 0; j < result.length; j++) {
-        testTables.addMatch(storage.fields[key][result[j]]);
+        testTables.addMatch(storage.fields[key].get(result[j]));
       }
     }
   }
```
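
The geospatial matcher asks a spatial index which stored shapes contain the document's point, then maps each returned identifier to its subfilters; with this change, that mapping is read with `.get()`, so it is now a `Map`. Koncorde answers the spatial query with an R\* tree; the sketch below swaps it for a linear scan over hypothetical bounding boxes (`boxes`, `fieldConditions`, `queryPoint` and `matchPoint` are illustrative names only), which is enough to show the identifier-to-subfilters step but says nothing about indexing performance.

```javascript
// Hypothetical bounding boxes keyed by condition ID (linear scan instead of an R* tree).
const boxes = new Map([
  ['c1', { top: 50, left: 0, bottom: 40, right: 10 }],
  ['c2', { top: 60, left: -5, bottom: 45, right: 2 }],
]);

// Hypothetical field storage: condition ID -> subfilter IDs.
const fieldConditions = new Map([
  ['c1', ['sf1']],
  ['c2', ['sf2', 'sf3']],
]);

// Returns the IDs of the boxes containing the point.
function queryPoint(lat, lon) {
  const result = [];
  for (const [id, b] of boxes) {
    if (lat <= b.top && lat >= b.bottom && lon >= b.left && lon <= b.right) {
      result.push(id);
    }
  }
  return result;
}

// Maps each matching condition ID to its subfilters through Map#get.
function matchPoint(point) {
  const matched = [];
  for (const conditionId of queryPoint(point.lat, point.lon)) {
    matched.push(...fieldConditions.get(conditionId));
  }
  return matched;
}

console.log(matchPoint({ lat: 47, lon: 1 })); // [ 'sf1', 'sf2', 'sf3' ]
console.log(matchPoint({ lat: 42, lon: 8 })); // [ 'sf1' ]
```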
11 changes: 9 additions & 2 deletions lib/match/matchRange.js

```diff
@@ -33,10 +33,17 @@
  */
 function MatchRange (storage, testTables, document, not = false) {
   for (const key of storage.keys) {
+    let rangeConditions;
     if (typeof document[key] === 'number') {
-      testTables.addMatch(storage.fields[key].tree.search(document[key], document[key]));
+      rangeConditions = storage.fields[key].tree.search(document[key], document[key]);
     } else if (not) {
-      testTables.addMatch(storage.fields[key].tree.search(-Infinity, Infinity));
+      rangeConditions = storage.fields[key].conditions.values();
     }
+
+    if (rangeConditions !== undefined) {
+      for (const cond of rangeConditions) {
+        testTables.addMatch(cond.subfilters);
+      }
+    }
   }
 }
```
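
The reworked `MatchRange` first collects the matching range conditions, then registers each condition's `subfilters`. For a numeric field, only conditions whose interval contains the value are kept; for a negated range (`not`) on a field that is missing or not a number, all stored conditions are kept. Koncorde finds the intervals with an augmented AVL tree; the sketch below replaces it with a linear scan over hypothetical conditions and uses a single condition set for both cases, so it shows the selection logic only, not how negated ranges are actually stored.

```javascript
// Hypothetical range conditions for one field: bounds plus the subfilters using them.
const conditions = new Map([
  ['r1', { low: 0, high: 10, subfilters: new Set(['sf1']) }],
  ['r2', { low: 5, high: 20, subfilters: new Set(['sf2']) }],
]);

// Linear-scan stand-in for the interval tree's search(value, value).
function searchPoint(value) {
  return [...conditions.values()].filter(c => c.low <= value && value <= c.high);
}

function matchRange(document, key, not = false) {
  let rangeConditions;
  if (typeof document[key] === 'number') {
    rangeConditions = searchPoint(document[key]);
  } else if (not) {
    // Non-numeric value under a negated range: every stored condition applies.
    rangeConditions = conditions.values();
  }

  const matched = new Set();
  if (rangeConditions !== undefined) {
    for (const cond of rangeConditions) {
      for (const sf of cond.subfilters) matched.add(sf);
    }
  }
  return matched;
}

console.log([...matchRange({ age: 7 }, 'age')]);            // [ 'sf1', 'sf2' ]
console.log([...matchRange({ age: 15 }, 'age')]);           // [ 'sf2' ]
console.log([...matchRange({ age: 'n/a' }, 'age', true)]);  // [ 'sf1', 'sf2' ]
```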
32 changes: 9 additions & 23 deletions lib/match/testTables.js

```diff
@@ -42,42 +42,28 @@
  * @param collection
  */
 class TestTables {
-  constructor(testTablesRef, index, collection) {
-    this.conditions = testTablesRef[index][collection].conditions;
+  constructor() {
     this.matchedConditions = {};
     this.matched = {};
   }
 
   /**
    * Registers a matching subfilters in the test tables
    *
-   * @param {Array} subfilters - array of matching subfilters
+   * @param {Set} subfilters - matching subfilters
    */
   addMatch(subfilters) {
-    // Declaring "i" inside the "for" statement downgrades
-    // performances by a factor of 3 to 4
-    // Should be fixed in later V8 versions
-    // (tested on Node 6.9.x)
-    let i; // NOSONAR
-
-    for (i = 0; i < subfilters.length; i++) {
-      const sf = subfilters[i];
-      const matched = this.matchedConditions[sf.cidx] || this.conditions[sf.cidx];
+    subfilters.forEach(sf => {
+      const matched = this.matchedConditions[sf.id] || sf.conditions.size;
 
       if (matched > 1) {
-        this.matchedConditions[sf.cidx] = matched - 1;
+        this.matchedConditions[sf.id] = matched - 1;
       } else {
-        // Declaring "j" inside the "for" statement downgrades
-        // performances by a factor of 3 to 4
-        // Should be fixed in later V8 versions
-        // (tested on Node 6.9.x)
-        let j; // NOSONAR
-
-        for (j = 0; j < sf.filters.length; j++) {
-          this.matched[sf.filters[j].id] = 1;
-        }
+        sf.filters.forEach(filter => {
+          this.matched[filter.id] = 1;
+        });
       }
-    }
+    });
   }
 }
 
```
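
`TestTables` now builds its counters on each `match()` call rather than reading shared precomputed tables: for each subfilter it tracks how many conditions remain unmatched, starting from `sf.conditions.size`, and once the last one matches, every filter attached to that subfilter is recorded as matched. The standalone sketch below mirrors that countdown with a hypothetical `CountdownTables` class and subfilter objects shaped like those the diff reads (`id`, `conditions`, `filters`). It assumes each condition of a subfilter is reported at most once per document, which the countdown relies on.

```javascript
// Hypothetical re-implementation of the countdown used by TestTables.addMatch.
class CountdownTables {
  constructor() {
    this.remaining = {}; // subfilter ID -> number of conditions still unmatched
    this.matched = {};   // filter ID -> 1 once one of its subfilters fully matches
  }

  // `subfilters`: the subfilters sharing one matched condition (Set or Array).
  addMatch(subfilters) {
    for (const sf of subfilters) {
      const left = this.remaining[sf.id] || sf.conditions.size;
      if (left > 1) {
        this.remaining[sf.id] = left - 1;
      } else {
        // Last condition matched: every filter using this subfilter matches.
        for (const filter of sf.filters) {
          this.matched[filter.id] = 1;
        }
      }
    }
  }
}

// Hypothetical subfilters: sfA needs two conditions and is shared by f1 and f2.
const sfA = { id: 'sfA', conditions: new Set(['c1', 'c2']), filters: [{ id: 'f1' }, { id: 'f2' }] };
const sfB = { id: 'sfB', conditions: new Set(['c3']), filters: [{ id: 'f3' }] };

const tables = new CountdownTables();
tables.addMatch(new Set([sfA])); // c1 matched: sfA still needs one condition
tables.addMatch([sfB]);          // c3 matched: sfB is complete, f3 matches
tables.addMatch([sfA]);          // c2 matched: sfA is complete, f1 and f2 match
console.log(Object.keys(tables.matched)); // [ 'f3', 'f1', 'f2' ]
```

`for...of` is used so that `addMatch` accepts both a `Set`, as in the new JSDoc, and an array.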