Correctly support Regex and default lookup #32
@mcollina @davidmarkclements Regex now supported in both
The reason is that usually, when a pattern doesn't match, you get no buckets and a fast answer, but to support regexes we always need to check all buckets if none are found. Suggestions for improving the speed are welcome.
Default pattern |
Regarding the regexp, it is possible to have no penalty in most cases. Let's divide this into two cases:
The original implementation only supported case 1, but it should have probably thrown on To support a quick 2 implementation, I think we should treat them as "default routes with condition", meaning the |
@@ -16,6 +16,7 @@ function BloomRun (opts) {
this._isDeep = opts && opts.indexing === 'depth'
this._buckets = []
this._properties = new Set()
this._defaultResult = null
This needs to be an array: we can specify multiple catchalls, as that is what the API says.
That's part of the reason I'm against having a catchall, as I think it's a feature of someone including bloomrun.
I would rather have the distinction between match all and catch all, meaning regexes on their own should get their own array and not the defaultResult, which I would only see as a single object.
In bloomrun you can add multiple (identical) patterns. This should also apply to the catchall.
The way patterns with only regexps can be implemented is to skip them in the indexing: for all intents and purposes they are catchalls.
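A minimal sketch of the "skip them in the indexing" idea, not bloomrun's actual code. The names `isRegexOnly`, `createStore`, `add`, `indexed`, and `unindexed` are hypothetical, and plain arrays stand in for the real buckets.

```javascript
// Hypothetical helper: true when every value in the pattern is a RegExp,
// so no key/value pair could be hashed into a bucket.
function isRegexOnly (pattern) {
  const keys = Object.keys(pattern)
  return keys.length > 0 && keys.every((key) => pattern[key] instanceof RegExp)
}

// Toy store: `indexed` stands in for the bucketed patterns,
// `unindexed` holds regex-only patterns in insertion order.
function createStore () {
  return { indexed: [], unindexed: [] }
}

function add (store, pattern, payload) {
  const entry = { pattern, payload }
  if (isRegexOnly(pattern)) {
    store.unindexed.push(entry) // kept out of the index
  } else {
    store.indexed.push(entry)
  }
  return store
}

const store = createStore()
add(store, { cmd: 'save' }, 'save-handler')
add(store, { cmd: /.*/ }, 'regex-handler-1')
add(store, { cmd: /.*/ }, 'regex-handler-2') // identical patterns are allowed
```

Keeping the regex-only entries in an array preserves the "multiple identical patterns" behaviour mentioned above.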
Not true: a regex is only applicable when it matches, right? If it never matches, it is not applicable. There is a distinction between catch all and match all.
By its very nature you can only have one catch all. The idea being that when nothing matches, I use my catch all.
catchall
IMHO either it is special, and you implement it outside of bloomrun, or it is "business as usual", and it follows the same rules as everything else from an API point of view. Having something special that you add via normal APIs might trigger unexpected bugs later on.
Neither this PR nor the current master follows this rule, so we need to fix that.
matchall
For me this is: { cmd: /.*/ }. Is this what you mean?
From a code point of view, this pattern cannot be indexed by the bloom filters. This means we have {} to be indexed (and this is why these were not working before). Given any object, if the bloom filters are not matching, then you are in the catchall space. Before applying a catchall, we need to check the regexps.
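The lookup order described here (index first, then regexps, then catchall) could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: `matches`, `lookup`, and the `state` shape are hypothetical, and a linear scan replaces the bloom-filter buckets.

```javascript
// Toy matcher: every key in the pattern must match the object,
// either by strict equality or by RegExp#test for regexp values.
function matches (pattern, obj) {
  return Object.keys(pattern).every((key) => {
    const expected = pattern[key]
    if (expected instanceof RegExp) {
      return typeof obj[key] === 'string' && expected.test(obj[key])
    }
    return obj[key] === expected
  })
}

// Hypothetical lookup order: indexed patterns, then regex-only patterns,
// then the catchall payloads (an array, since several may be registered).
function lookup (state, obj) {
  for (const entry of state.indexed) {
    if (matches(entry.pattern, obj)) return entry.payload
  }
  for (const entry of state.regexOnly) {
    if (matches(entry.pattern, obj)) return entry.payload
  }
  return state.catchalls.length > 0 ? state.catchalls[0] : null
}

const state = {
  indexed: [{ pattern: { cmd: 'save' }, payload: 'save' }],
  regexOnly: [{ pattern: { cmd: /^get/ }, payload: 'getter' }],
  catchalls: ['fallback']
}
```

With this ordering a regexp that never matches does not shadow the catchall, which keeps the match-all and catch-all cases distinct.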
Just for reference (re @mcollina saying maybe this should be outside bloomrun): https://github.com/apparatus/mu/blob/master/lib/router.js#L18 Catch all is super easy, outside bloomrun
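As a rough illustration of a catch all living outside the matcher (this is not the code at the linked router.js), a thin wrapper is enough. It assumes the wrapped object exposes `lookup(obj)` and returns null when nothing matches; `withCatchAll` and `stubRouter` are hypothetical names.

```javascript
// Wraps any object exposing lookup(obj) and substitutes a fallback
// when the inner lookup reports no match (null or undefined).
function withCatchAll (router, fallback) {
  return {
    lookup (obj) {
      const result = router.lookup(obj)
      return result === null || result === undefined ? fallback : result
    }
  }
}

// Stand-in router for demonstration only.
const stubRouter = {
  lookup (obj) {
    return obj.cmd === 'save' ? 'save-handler' : null
  }
}

const routed = withCatchAll(stubRouter, 'default-handler')
```

Here `routed.lookup({ cmd: 'save' })` yields the matched payload and any other input yields the fallback, with no special case inside the matcher itself.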
@davidmarkclements that's what I mean. I still think we should add something here, but it should not be "special". If you want a special behavior, add it outside of bloomrun.
@mcollina @davidmarkclements Can we do it without being special though? I like the idea of the feature but I am in agreement that because it is 'special' it is akin to having a sort of icky DSL. I would personally prefer to have this as something like
To recap:
@davidmarkclements @mcollina First draft based on the above; it needs some perf work, but all the tests pass. Please review.
@@ -55,7 +58,16 @@ function removeProperty (key) {
this.delete(key)
}

BloomRun.prototype.default = function (payload) {
this._defaultResult = payload || payload |
Why ||?
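For reference, an expression of the form `a || a` evaluates to `a` for every value, so the operator in the line above has no effect. A small check, using the hypothetical helper `orSelf`:

```javascript
// Mirrors the shape of the reviewed assignment: payload || payload.
function orSelf (payload) {
  return payload || payload
}

// Falsy and truthy samples alike come back unchanged.
const samples = [0, '', null, undefined, false, 'handler', { a: 1 }]
const allSame = samples.every((value) => Object.is(orSelf(value), value))
```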
Good work! Some minor things that should sort out the perf.
@mcollina Can you take a quick look over before I release?
Is perf ok?
@mcollina Results looking like,
Ok to me. LGTM.
Good stuff, I have time first thing in the morning to merge and push a version. (10am)
@mcollina @davidmarkclements Looks like Regex isn't actually supported. See failure.
EDIT
Default lookup and Regex are now supported, but there is a perf hit at 500+ entries. This is because, in order to check regexes, we always need to check all buckets.
I think we need two changes to undo the perf penalty,
This means adding regexes will slow your instance down, but this should be expected since it essentially subverts the point of bucketing in the first place.
Some points to note.
The perf hit on adding regexes is because each lookup will always check N buckets, where N is equal to its matches plus any buckets with regexes.
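The cost described above can be illustrated with a toy model. This is a sketch under assumptions, not bloomrun's internals: `bucketsToCheck` is a hypothetical function, each bucket is reduced to a Set of "key:value" tokens (standing in for its bloom filter), and a `hasRegex` flag marks buckets holding regexp patterns.

```javascript
// Returns the buckets a lookup would have to scan: those whose tokens
// match the object, plus every bucket that holds a regexp pattern.
function bucketsToCheck (buckets, obj) {
  const tokens = Object.keys(obj).map((key) => key + ':' + obj[key])
  return buckets.filter((bucket) =>
    bucket.hasRegex || tokens.some((token) => bucket.tokens.has(token))
  )
}

const buckets = [
  { tokens: new Set(['cmd:save']), hasRegex: false },
  { tokens: new Set(['cmd:load']), hasRegex: false },
  { tokens: new Set(), hasRegex: true },
  { tokens: new Set(), hasRegex: true }
]
```

In this model a lookup for `{ cmd: 'save' }` scans one matching bucket plus the two regex buckets, and a lookup that matches nothing still scans the two regex buckets instead of returning immediately.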