-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: network filter ending with right hat recognized as regex #3871
Conversation
// Remove leading '*' if the filter is not hostname anchored. | ||
if ( | ||
getBit(mask, NETWORK_FILTER_MASK.isHostnameAnchor) === false && | ||
filterIndexEnd - filterIndexStart > 0 && | ||
line.charCodeAt(filterIndexStart) === 42 /* '*' */ | ||
) { | ||
mask = clearBit(mask, NETWORK_FILTER_MASK.isLeftAnchor); | ||
filterIndexStart += 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this related to the fix of filters ending with ^
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a kind of optimisation. There's a test relevant to this: *bar^
. Also see https://ghostery.slack.com/archives/C06JX8NJVMF/p1712045816656759
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there might be issues with this change. For example the following three cases return false
but I would expect all of them to return true
instead:
const { Request, f } = require("@cliqz/adblocker");
console.log(f`foo^`.match(Request.fromRawDetails({
url: "http://foo.com/bar",
type: "script"
})));
console.log(f`foo^`.match(Request.fromRawDetails({
url: "http://example.com/foo/bar",
type: "script"
})));
console.log(f`foo^`.match(Request.fromRawDetails({
url: "http://foo.com/foo/bar",
type: "script"
})));
The caret symbol is used to match the start and end of the given pattern. The leading asterisk symbol is effective without explicitly having it. Therefore, removing the leading asterisk is an effective optimisation.
The transformation to regexp filter in the parsing logic is intended, Closing. /**
* Compiles a filter pattern to a regex. This is only performed *lazily* for
* filters containing at least a * or ^ symbol. Because Regexes are expansive,
* we try to convert some patterns to plain filters.
*/
function compileRegex(
filter: string,
isLeftAnchor: boolean,
isRightAnchor: boolean,
isFullRegex: boolean,
): RegExp In the future, especially for optimisation, I think an update to matching mechanism should be a better path. |
fixes #3694
Summary
This fixes a network filter ending with a hat sign (
hostname^
) not to be resolved as a regexp filter.Internals
The
NETWORK_FILTER_MASK.isRegex
is being set at the following point. Thefilter
has a legit length thencheckIsRegex
does false positives.packages/adblocker/src/filters/network.ts
checkIsRegex
checks if thefilter
string has a hat sign or asterisk sign. However, this logic also triggershostname^
pattern to be checked.Suggested changes
I added a proper hostname checker to validate if the
filter
is a valid hostname in thecheckIsRegex
function. The suggested changes check if thefilter
string has a trailing hat because if a hat sign in other locations does present, it does mean it's often used as a regex symbol.Also some cases including glob pattern need to be judged. For example,
hostname*^
pattern should be handled as regex because it includes glob pattern. Therefore, checking the trailing hat is preferred in this case.