Correctly support Regex and default lookup #32

mcdonnelldean · 2016-06-18T21:52:41Z

@mcollina @davidmarkclements Looks like Regex isn't actually supported. See failure.

EDIT

Default lookup and Regex are now supported but there is a perf hit at 500+ entries. This is because in order to check regexs we need to always check all buckets.

I think we need two changes to undo the perf penalty,

Mark a bucket when it contains a regex, regex buckets are always returned on matching
If an input to lookup contains a regex and returns no matches, test all buckets.

This means adding regex's will slow your instance down but this should be assumed since it is essentially subverting the point of bucketing in the first place.

Some points to note.

If regexs are used in lookup only that lookup suffers a perf hit, all others do not
If a regex is used when adding a pattern, all lookups suffer a perf hit.

The perf hit on adding regexs is because each lookup will always check N number of buckets where N is equal to it's matches plus any buckets with regex's.

coveralls · 2016-06-18T21:53:49Z

Coverage increased (+0.3%) to 98.972% when pulling 12e6965 on test-fix into 56ed0aa on master.

coveralls · 2016-06-18T22:57:30Z

Coverage increased (+0.5%) to 99.231% when pulling e24e475 on test-fix into 56ed0aa on master.

mcdonnelldean · 2016-06-18T23:02:25Z

@mcollina @davidmarkclements Regex now supported in both add() and list(). The only issue is the benchmark suffers, CF,

threeEntries*100000: 236ms
fiveHundredEntries*100000: 3956ms
fiveHundredEntriesAndProperties*100000: 6237ms
fiveHundredEntriesAndKnownProperties*100000: 4021ms
patrunThreeEntries*100000: 4182ms
patrunFiveHundredEntriesAndProperties*100000: 20024ms
threeEntries*100000: 219ms
fiveHundredEntries*100000: 3909ms
fiveHundredEntriesAndProperties*100000: 6266ms
fiveHundredEntriesAndKnownProperties*100000: 4027ms
patrunThreeEntries*100000: 4206ms
patrunFiveHundredEntriesAndProperties*100000: 18810ms

-----------------------------------------------------------------

threeEntries*100000: 235ms
fiveHundredEntries*100000: 3934ms
fiveHundredEntriesAndProperties*100000: 6226ms
fiveHundredEntriesAndKnownProperties*100000: 4021ms
patrunThreeEntries*100000: 4105ms
patrunFiveHundredEntriesAndProperties*100000: 18959ms
threeEntries*100000: 244ms
fiveHundredEntries*100000: 4030ms
fiveHundredEntriesAndProperties*100000: 6408ms
fiveHundredEntriesAndKnownProperties*100000: 4102ms
patrunThreeEntries*100000: 4098ms
patrunFiveHundredEntriesAndProperties*100000: 18917ms

The reason being usually when a pattern doesn't match you get no buckets and a fast answer, but to support regex we need to always check all buckets if none are found. Opinions to improve the speed welcome.

coveralls · 2016-06-18T23:19:01Z

Coverage increased (+0.6%) to 99.25% when pulling 4d381fb on test-fix into 56ed0aa on master.

mcdonnelldean · 2016-06-18T23:19:05Z

Default pattern instance.add({}, 'foo') now supported.

mcollina · 2016-06-20T06:37:19Z

Regarding the regexp, it is possible to have no penalty in most cases. Let's divide into two case:

Patterns where the regexp is not the only specified component { ns: 'matteo', cmd: /save.*/ }
Patterns where the regexp is the only specified component { cmd: /save.*/ }

The original implementation only supported case 1, but it should have probably thrown on add() on case 2 (not a silent fail). Anyway, we need to give fast answers to case 1. We can do this just by ignoring the regexp when matching the buckets, and selecting only the buckets that are matching.

To support a quick 2 implementation, I think we should treat them as "default routes with condition", meaning the _defaultResult will be a default bucket we match against (and we always do).

mcollina · 2016-06-20T06:43:34Z

bloomrun.js

@@ -16,6 +16,7 @@ function BloomRun (opts) {
  this._isDeep = opts && opts.indexing === 'depth'
  this._buckets = []
  this._properties = new Set()
+  this._defaultResult = null


This needs to be an array, we can specify multiple catchall, as that is what the API says.
That's part of the reason I'm against having a catchall, as I think it's a feature of someone including bloomrun.

I would rather have the distinction between match all and catch all. Meaning regexs on their own should get a their own array and not the defaultResult which I would only see as a single object

In bloomrun you can add multiple (identical) patterns. This should apply also to the catchall.
The way patterns with only regexps can be implemented is to skip them in the indexing: for all intent and purposes are catchalls.

Not true, a regex is only applicable when it matches right? If it never matches it is not applicable. There is a distinction between catch all and match all.

By it's very nature you can only have one catch all. The idea being when Nothing matches then I use my catch all

catchall

IMHO either it is special, and you implement it outside of bloomrun, or it is "business as usual", and it follows the same
rules of everything else from an API point of view. Having something special that you add via normal APIs might trigger unsuspected bugs later on.

Neither this PR nor the current master follows this rule, so we need to fix.

matchall

For me this is: { cmd: /.*/ }. Is this what you mean?

From a code point of view, this pattern cannot be indexed by the bloom filters. This means we have {} to be indexed (and this is why these were not working before). Given any object, if the bloom filters are not matching, then you are in the catchall space. Before applying a catchall, we need to check the regexps.

davidmarkclements · 2016-06-20T08:02:37Z

Just for reference (re @mcollina saying maybe this should be outside bloomrun):

https://github.com/apparatus/mu/blob/master/lib/router.js#L18

Catch all is super easy, outside bloomrun

mcollina · 2016-06-20T08:08:45Z

@davidmarkclements that's what I mean. I still think we should add something here, but it should not be "special". If you want a special behavior, add it outside of bloomrun.

mcdonnelldean · 2016-06-20T08:14:24Z

@mcollina @davidmarkclements Can we do it without being special though? I like the idea of the feature but I am in agreement that because it is 'special' it is akin to having a sort of icky DSL. I would personally prefer to have this as something like bloomrun.default(some_thing) and then that is returned on no match. Same functionality but far more explicit.

mcollina · 2016-06-20T08:17:28Z

To recap:

throw on empty patterns (!)
put all regexp-only patterns in their own array, and go through them when no other thing matches
add a default() thing

mcdonnelldean · 2016-06-20T08:18:25Z

@mcollina 👍

mcdonnelldean · 2016-06-21T16:26:41Z

@davidmarkclements @mcollina First draft based on above, needs some perf work but all the tests pass. Please review

mcollina · 2016-06-21T16:42:36Z

bloomrun.js

@@ -55,7 +58,16 @@ function removeProperty (key) {
  this.delete(key)
 }

+BloomRun.prototype.default = function (payload) {
+  this._defaultResult = payload || payload


mcollina · 2016-06-21T16:45:42Z

Good work! Some minor thing that should sort out the perf.

mcdonnelldean · 2016-06-28T13:13:49Z

@mcollina Can you take a quick look over before I release?

mcollina · 2016-06-28T13:37:39Z

Is perf ok?

mcdonnelldean · 2016-06-28T13:51:54Z

@mcollina Results looking like,

threeEntries*100000: 277ms
fiveHundredEntries*100000: 2570ms
fiveHundredEntriesAndProperties*100000: 2764ms
fiveHundredEntriesAndKnownProperties*100000: 2569ms
patrunThreeEntries*100000: 4929ms
patrunFiveHundredEntriesAndProperties*100000: 22033ms

threeEntries*100000: 378ms
fiveHundredEntries*100000: 3108ms
fiveHundredEntriesAndProperties*100000: 2979ms
fiveHundredEntriesAndKnownProperties*100000: 3019ms
patrunThreeEntries*100000: 5998ms
patrunFiveHundredEntriesAndProperties*100000: 29071ms

threeEntries*100000: 282ms
fiveHundredEntries*100000: 2514ms
fiveHundredEntriesAndProperties*100000: 2460ms
fiveHundredEntriesAndKnownProperties*100000: 2521ms
patrunThreeEntries*100000: 5004ms
patrunFiveHundredEntriesAndProperties*100000: 19479ms

mcollina · 2016-06-28T14:10:47Z

Ok to me.

LGTM.

mcdonnelldean · 2016-06-28T15:23:46Z

Good stuff, I have time first thing in the morning to merge and push a version. (10am)

mcdonnelldean mentioned this pull request Jun 18, 2016

Regex isn't supported as tests claim #33

Closed

mcdonnelldean mentioned this pull request Jun 18, 2016

catch all/wild card matching #28

Closed

mcdonnelldean changed the title ~~Regex isn't actually supported in either add or lookup~~ Correctly support Regex and default lookup Jun 18, 2016

mcollina reviewed Jun 20, 2016
View reviewed changes

mcdonnelldean added 3 commits June 21, 2016 15:53

Show regex isn't supported

c043707

Regex now supported

5aa79ea

Default pattern now supported

d120c59

mcdonnelldean force-pushed the test-fix branch from 4d381fb to d120c59 Compare June 21, 2016 14:54

Reset tests, corrected logic

e8fa016

mcollina reviewed Jun 21, 2016
View reviewed changes

perf improvement for onlyRegex

88d2f54

mcdonnelldean merged commit 03a8aed into master Jul 13, 2016

mcdonnelldean deleted the test-fix branch July 13, 2016 14:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Correctly support Regex and default lookup #32

Correctly support Regex and default lookup #32

mcdonnelldean commented Jun 18, 2016 •

edited

coveralls commented Jun 18, 2016 •

edited

coveralls commented Jun 18, 2016 •

edited

mcdonnelldean commented Jun 18, 2016

coveralls commented Jun 18, 2016 •

edited

mcdonnelldean commented Jun 18, 2016

mcollina commented Jun 20, 2016

mcollina Jun 20, 2016

mcdonnelldean Jun 20, 2016

mcollina Jun 20, 2016 •

edited

mcdonnelldean Jun 20, 2016

mcollina Jun 20, 2016

davidmarkclements commented Jun 20, 2016 •

edited

mcollina commented Jun 20, 2016

mcdonnelldean commented Jun 20, 2016

mcollina commented Jun 20, 2016

mcdonnelldean commented Jun 20, 2016

mcdonnelldean commented Jun 21, 2016

mcollina Jun 21, 2016

mcollina commented Jun 21, 2016

mcdonnelldean commented Jun 28, 2016

mcollina commented Jun 28, 2016

mcdonnelldean commented Jun 28, 2016 •

edited

mcollina commented Jun 28, 2016

mcdonnelldean commented Jun 28, 2016

Correctly support Regex and default lookup #32

Correctly support Regex and default lookup #32

Conversation

mcdonnelldean commented Jun 18, 2016 • edited

coveralls commented Jun 18, 2016 • edited

coveralls commented Jun 18, 2016 • edited

mcdonnelldean commented Jun 18, 2016

coveralls commented Jun 18, 2016 • edited

mcdonnelldean commented Jun 18, 2016

mcollina commented Jun 20, 2016

mcollina Jun 20, 2016

Choose a reason for hiding this comment

mcdonnelldean Jun 20, 2016

Choose a reason for hiding this comment

mcollina Jun 20, 2016 • edited

Choose a reason for hiding this comment

mcdonnelldean Jun 20, 2016

Choose a reason for hiding this comment

mcollina Jun 20, 2016

Choose a reason for hiding this comment

catchall

matchall

davidmarkclements commented Jun 20, 2016 • edited

mcollina commented Jun 20, 2016

mcdonnelldean commented Jun 20, 2016

mcollina commented Jun 20, 2016

mcdonnelldean commented Jun 20, 2016

mcdonnelldean commented Jun 21, 2016

mcollina Jun 21, 2016

Choose a reason for hiding this comment

mcollina commented Jun 21, 2016

mcdonnelldean commented Jun 28, 2016

mcollina commented Jun 28, 2016

mcdonnelldean commented Jun 28, 2016 • edited

mcollina commented Jun 28, 2016

mcdonnelldean commented Jun 28, 2016

mcdonnelldean commented Jun 18, 2016 •

edited

coveralls commented Jun 18, 2016 •

edited

coveralls commented Jun 18, 2016 •

edited

coveralls commented Jun 18, 2016 •

edited

mcollina Jun 20, 2016 •

edited

davidmarkclements commented Jun 20, 2016 •

edited

mcdonnelldean commented Jun 28, 2016 •

edited