[FIXED] Use of `*` and `>` in subjects as literals
#561
The issue was that a subject such as `foo.bar,*,>` would be inserted into the cache as is, but when trying to remove it from the cache, calling matchLiteral() with the above subject in the cache against the same subject would return false. This is because matchLiteral() would treat those characters as wildcard tokens. Note that the sublist itself splits subjects on the `.` separator and does not seem bothered by such a subject (it would have `foo` and `bar,*,>` tokens). Also, note that IsValidSubject() and IsValidLiteralSubject() properly check that the characters `*` and `>` are treated as wildcards only if they are tokens on their own. Resolves #558
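To make the intended behavior concrete, here is a minimal, unoptimized sketch in Go. The function name `matchLiteralSketch` is hypothetical and this is not the server's actual matchLiteral() code; it only illustrates the rule that `*` and `>` act as wildcards when they are a whole token, and as plain characters otherwise. It assumes `>` only appears as the last token of the subject.

```go
package main

import (
	"fmt"
	"strings"
)

// matchLiteralSketch is a hypothetical stand-in for matchLiteral().
// It reports whether literal matches subject, where subject may contain
// wildcards. "*" and ">" are wildcards only when they form a whole token.
func matchLiteralSketch(literal, subject string) bool {
	lts := strings.Split(literal, ".")
	sts := strings.Split(subject, ".")
	for i, st := range sts {
		if i >= len(lts) {
			// The subject has more tokens than the literal.
			return false
		}
		switch st {
		case ">":
			// Full wildcard: matches the remaining tokens of the literal.
			// Assumption: ">" is the last token of the subject.
			return true
		case "*":
			// Partial wildcard: matches exactly one token.
			continue
		default:
			// Anything else, including "bar,*,>", is compared literally.
			if st != lts[i] {
				return false
			}
		}
	}
	return len(lts) == len(sts)
}

func main() {
	fmt.Println(matchLiteralSketch("foo.bar,*,>", "foo.bar,*,>")) // true
	fmt.Println(matchLiteralSketch("foo.bar", "foo.*"))           // true
	fmt.Println(matchLiteralSketch("foo.bar", "foo.baz"))         // false
}
```

Splitting with strings.Split allocates, so a version like this would be too slow for a hot path; it is only meant to pin down the expected results.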
Does this have a noticeable impact on that function's performance? Should we track separators and token length to make the checks more straightforward?
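Will look into making it faster. But yes, there is an impact as it is now. But fundamentally, do you agree that matchLiteral("foo,*,>", "foo,*,>") should return true because both strings are literals? (The current behavior is that the second param of the function will interpret the `*` and `>` as wildcards and therefore return false.)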
Yes I agree, but want to be sensitive to performance impacts.
…On Wed, Aug 16, 2017 at 2:16 PM, Ivan Kozlovic ***@***.***> wrote:
Will look into make it faster. But yes, there is an impact as it is now.
But fundamentally, do you agree that matchLiteral("foo,*,>", "foo,*,>")
should return true because both strings are literals? (current behavior
is that second param of the function will be interpreting the * and >
as wildcards and therefore return false).
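Agreed. Working on optimizations, but it is difficult to be as fast as the old code since it was not checking that `*` or `>` were single tokens. Still making some improvements and adding a benchmark test. Will update the PR later.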
I think the trick may be in tracking "." separators and keeping a separate
count of the token length; when you see the next ".", or the end of the
subject, etc., then and only then do you check for pwc or fwc.
…On Wed, Aug 16, 2017 at 5:14 PM, Ivan Kozlovic ***@***.***> wrote:
Agreed.. working on optimizations, but difficult to be as fast as with old
code since it was not checking that * or > were single tokens. Still
making some improvements and adding a benchmark test. Will update the PR
later.
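The separator-tracking idea suggested in this exchange can be sketched as a single pass over the subject that allocates nothing. This is only an illustration, not the code that was merged: the function `hasWildcardToken` is hypothetical, and the constant name `tsep` is an assumption, while `pwc` and `fwc` follow the names used in the comment.

```go
package main

import "fmt"

const (
	pwc  = '*' // partial wildcard
	fwc  = '>' // full wildcard
	tsep = '.' // token separator (name assumed for this sketch)
)

// hasWildcardToken is a hypothetical helper. It reports whether subject
// contains "*" or ">" as a token on its own. It keeps a running token
// length and inspects the character only at a separator or at the end of
// the subject, and only when the token that just ended has length one.
func hasWildcardToken(subject string) bool {
	tokenLen := 0
	for i := 0; i <= len(subject); i++ {
		if i == len(subject) || subject[i] == tsep {
			if tokenLen == 1 {
				if c := subject[i-1]; c == pwc || c == fwc {
					return true
				}
			}
			tokenLen = 0
			continue
		}
		tokenLen++
	}
	return false
}

func main() {
	fmt.Println(hasWildcardToken("foo.*"))       // true
	fmt.Println(hasWildcardToken("foo.>"))       // true
	fmt.Println(hasWildcardToken("foo.bar,*,>")) // false
}
```

Whether this is faster in practice than checking the neighbors of each `*` or `>` would have to be measured with a benchmark.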
I think I tried that but it was slow, will try again. Anytime we check ahead (even if we know that it is in bounds), the assembly code shows that there is a bounds check with a possible invocation of panicindex().
I am going to push a change that includes a small optimization, simplification of 'if' statements and a benchmark test. Will continue trying to optimize. Let's not merge for now.
Changes Unknown when pulling 0cc49ec on fix_issue_558 into master.
Impact on the normal bench is not visible. This is a comparison between master and the branch from a single run only (note that a negative value means the branch (new) is faster):
I ran the test another time and the |
LGTM
LGTM.
This is similar to #561, where `*` and `>` characters appear in tokens as literals, not wildcards. Both Insert() and Remove() were checking whether the first character was `*` or `>` and, if so, considered it a wildcard node. This is wrong. Any token that is more than 1 character long must be treated as a literal. Only for a token of size one should we check whether the character is `*` or `>`. Added a test case for Insert and Remove with subjects like `foo.*-` or `foo.>-`.
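The token rule described above can be illustrated with a small sketch. The `nodeKind` type and the `kindOf` function are hypothetical names invented for this example and are not taken from the sublist code; the point is only that the token length is checked before its first character.

```go
package main

import "fmt"

// nodeKind is a hypothetical classification of a subject token.
type nodeKind int

const (
	literalNode nodeKind = iota
	pwcNode              // token is exactly "*"
	fwcNode              // token is exactly ">"
)

// kindOf classifies a single token. A token longer than one character is
// always a literal, even if it starts with '*' or '>'.
func kindOf(token string) nodeKind {
	if len(token) == 1 {
		switch token[0] {
		case '*':
			return pwcNode
		case '>':
			return fwcNode
		}
	}
	return literalNode
}

func main() {
	for _, t := range []string{"*", ">", "*-", ">-", "bar"} {
		fmt.Printf("%q wildcard=%v\n", t, kindOf(t) != literalNode)
	}
}
```

Checking only `token[0]`, as the earlier behavior is described, would classify `*-` and `>-` as wildcard nodes.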
/cc @derekcollison if you could have a look to see if this makes sense, thanks!