Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly support Regex and default lookup #32

Merged
merged 5 commits into from
Jul 13, 2016
Merged

Conversation

mcdonnelldean
Copy link
Contributor

@mcdonnelldean mcdonnelldean commented Jun 18, 2016

@mcollina @davidmarkclements Looks like Regex isn't actually supported. See failure.

EDIT

Default lookup and Regex are now supported but there is a perf hit at 500+ entries. This is because in order to check regexs we need to always check all buckets.

I think we need two changes to undo the perf penalty,

  1. Mark a bucket when it contains a regex, regex buckets are always returned on matching
  2. If an input to lookup contains a regex and returns no matches, test all buckets.

This means adding regex's will slow your instance down but this should be assumed since it is essentially subverting the point of bucketing in the first place.

Some points to note.

  • If regexs are used in lookup only that lookup suffers a perf hit, all others do not
  • If a regex is used when adding a pattern, all lookups suffer a perf hit.

The perf hit on adding regexs is because each lookup will always check N number of buckets where N is equal to it's matches plus any buckets with regex's.

@coveralls
Copy link

coveralls commented Jun 18, 2016

Coverage Status

Coverage increased (+0.3%) to 98.972% when pulling 12e6965 on test-fix into 56ed0aa on master.

@coveralls
Copy link

coveralls commented Jun 18, 2016

Coverage Status

Coverage increased (+0.5%) to 99.231% when pulling e24e475 on test-fix into 56ed0aa on master.

@mcdonnelldean
Copy link
Contributor Author

@mcollina @davidmarkclements Regex now supported in both add() and list(). The only issue is the benchmark suffers, CF,

threeEntries*100000: 236ms
fiveHundredEntries*100000: 3956ms
fiveHundredEntriesAndProperties*100000: 6237ms
fiveHundredEntriesAndKnownProperties*100000: 4021ms
patrunThreeEntries*100000: 4182ms
patrunFiveHundredEntriesAndProperties*100000: 20024ms
threeEntries*100000: 219ms
fiveHundredEntries*100000: 3909ms
fiveHundredEntriesAndProperties*100000: 6266ms
fiveHundredEntriesAndKnownProperties*100000: 4027ms
patrunThreeEntries*100000: 4206ms
patrunFiveHundredEntriesAndProperties*100000: 18810ms

-----------------------------------------------------------------

threeEntries*100000: 235ms
fiveHundredEntries*100000: 3934ms
fiveHundredEntriesAndProperties*100000: 6226ms
fiveHundredEntriesAndKnownProperties*100000: 4021ms
patrunThreeEntries*100000: 4105ms
patrunFiveHundredEntriesAndProperties*100000: 18959ms
threeEntries*100000: 244ms
fiveHundredEntries*100000: 4030ms
fiveHundredEntriesAndProperties*100000: 6408ms
fiveHundredEntriesAndKnownProperties*100000: 4102ms
patrunThreeEntries*100000: 4098ms
patrunFiveHundredEntriesAndProperties*100000: 18917ms

The reason being usually when a pattern doesn't match you get no buckets and a fast answer, but to support regex we need to always check all buckets if none are found. Opinions to improve the speed welcome.

@coveralls
Copy link

coveralls commented Jun 18, 2016

Coverage Status

Coverage increased (+0.6%) to 99.25% when pulling 4d381fb on test-fix into 56ed0aa on master.

@mcdonnelldean
Copy link
Contributor Author

Default pattern instance.add({}, 'foo') now supported.

@mcdonnelldean mcdonnelldean changed the title Regex isn't actually supported in either add or lookup Correctly support Regex and default lookup Jun 18, 2016
@mcollina
Copy link
Owner

Regarding the regexp, it is possible to have no penalty in most cases. Let's divide into two case:

  1. Patterns where the regexp is not the only specified component { ns: 'matteo', cmd: /save.*/ }
  2. Patterns where the regexp is the only specified component { cmd: /save.*/ }

The original implementation only supported case 1, but it should have probably thrown on add() on case 2 (not a silent fail). Anyway, we need to give fast answers to case 1. We can do this just by ignoring the regexp when matching the buckets, and selecting only the buckets that are matching.

To support a quick 2 implementation, I think we should treat them as "default routes with condition", meaning the _defaultResult will be a default bucket we match against (and we always do).

@@ -16,6 +16,7 @@ function BloomRun (opts) {
this._isDeep = opts && opts.indexing === 'depth'
this._buckets = []
this._properties = new Set()
this._defaultResult = null
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be an array, we can specify multiple catchall, as that is what the API says.
That's part of the reason I'm against having a catchall, as I think it's a feature of someone including bloomrun.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather have the distinction between match all and catch all. Meaning regexs on their own should get a their own array and not the defaultResult which I would only see as a single object

Copy link
Owner

@mcollina mcollina Jun 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In bloomrun you can add multiple (identical) patterns. This should apply also to the catchall.
The way patterns with only regexps can be implemented is to skip them in the indexing: for all intent and purposes are catchalls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not true, a regex is only applicable when it matches right? If it never matches it is not applicable. There is a distinction between catch all and match all.

By it's very nature you can only have one catch all. The idea being when Nothing matches then I use my catch all

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

catchall

IMHO either it is special, and you implement it outside of bloomrun, or it is "business as usual", and it follows the same
rules of everything else from an API point of view. Having something special that you add via normal APIs might trigger unsuspected bugs later on.

Neither this PR nor the current master follows this rule, so we need to fix.

matchall

For me this is: { cmd: /.*/ }. Is this what you mean?

From a code point of view, this pattern cannot be indexed by the bloom filters. This means we have {} to be indexed (and this is why these were not working before). Given any object, if the bloom filters are not matching, then you are in the catchall space. Before applying a catchall, we need to check the regexps.

@davidmarkclements
Copy link
Contributor

davidmarkclements commented Jun 20, 2016

Just for reference (re @mcollina saying maybe this should be outside bloomrun):

https://github.com/apparatus/mu/blob/master/lib/router.js#L18

Catch all is super easy, outside bloomrun

@mcollina
Copy link
Owner

@davidmarkclements that's what I mean. I still think we should add something here, but it should not be "special". If you want a special behavior, add it outside of bloomrun.

@mcdonnelldean
Copy link
Contributor Author

@mcollina @davidmarkclements Can we do it without being special though? I like the idea of the feature but I am in agreement that because it is 'special' it is akin to having a sort of icky DSL. I would personally prefer to have this as something like bloomrun.default(some_thing) and then that is returned on no match. Same functionality but far more explicit.

@mcollina
Copy link
Owner

To recap:

  1. throw on empty patterns (!)
  2. put all regexp-only patterns in their own array, and go through them when no other thing matches
  3. add a default() thing

@mcdonnelldean
Copy link
Contributor Author

@mcollina 👍

@mcdonnelldean
Copy link
Contributor Author

@davidmarkclements @mcollina First draft based on above, needs some perf work but all the tests pass. Please review

@@ -55,7 +58,16 @@ function removeProperty (key) {
this.delete(key)
}

BloomRun.prototype.default = function (payload) {
this._defaultResult = payload || payload
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why ||?

@mcollina
Copy link
Owner

Good work! Some minor thing that should sort out the perf.

@mcdonnelldean
Copy link
Contributor Author

@mcollina Can you take a quick look over before I release?

@mcollina
Copy link
Owner

Is perf ok?

@mcdonnelldean
Copy link
Contributor Author

mcdonnelldean commented Jun 28, 2016

@mcollina Results looking like,

threeEntries*100000: 277ms
fiveHundredEntries*100000: 2570ms
fiveHundredEntriesAndProperties*100000: 2764ms
fiveHundredEntriesAndKnownProperties*100000: 2569ms
patrunThreeEntries*100000: 4929ms
patrunFiveHundredEntriesAndProperties*100000: 22033ms

threeEntries*100000: 378ms
fiveHundredEntries*100000: 3108ms
fiveHundredEntriesAndProperties*100000: 2979ms
fiveHundredEntriesAndKnownProperties*100000: 3019ms
patrunThreeEntries*100000: 5998ms
patrunFiveHundredEntriesAndProperties*100000: 29071ms

threeEntries*100000: 282ms
fiveHundredEntries*100000: 2514ms
fiveHundredEntriesAndProperties*100000: 2460ms
fiveHundredEntriesAndKnownProperties*100000: 2521ms
patrunThreeEntries*100000: 5004ms
patrunFiveHundredEntriesAndProperties*100000: 19479ms

@mcollina
Copy link
Owner

Ok to me.

LGTM.

@mcdonnelldean
Copy link
Contributor Author

Good stuff, I have time first thing in the morning to merge and push a version. (10am)

@mcdonnelldean mcdonnelldean merged commit 03a8aed into master Jul 13, 2016
@mcdonnelldean mcdonnelldean deleted the test-fix branch July 13, 2016 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants