-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement $replace
modifier, improve option parsing
#3897
base: master
Are you sure you want to change the base?
Conversation
$replace
modifier
Note: |
|
const [, rawRegexp, replacement, modifiers] = splitUnescaped(optionValue, '/'); | ||
const regexp = new RegExp(rawRegexp, modifiers); | ||
|
||
return [regexp, replacement]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the filter doesn't need to deal with multiple html filtering filter modifiers, we may build regex and replacement at the parse time instead of runtime. (Then throw an error if multiple html filtering modifiers were found)
@seia-soto Could you share more information on the
Thanks, I think it will help review the implementation. |
@@ -2444,4 +2444,12 @@ describe('scriptlets arguments parsing', () => { | |||
); | |||
} | |||
}); | |||
|
|||
it('parses replace modifier', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to add more tests since anything related to HTML filtering is critical to get right. If possible, let's add one test case for each existing filter in the lists used so far (I assume there are not so many). Let's particularly try to find corner cases of the behavior of this new $replace
option.
We also need more end-to-end tests on the whole HTML filtering.
return nextIndex; | ||
} | ||
|
||
function splitUnescaped(text: string, character: string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure we add enough test cases to cover this (and above) functions.
return null; | ||
} | ||
|
||
public isHtmlFilteringRule(): boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: I wonder if it could make sense to also use the NetworkFilter abstraction to represent the ^script-text html filters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking a similar system to that. However, I don't know if it's possible to make a proper regexp filtering system with streaming data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation already supports regexps from what I can tell.
For example this filter (already supported):
##^script:has-text(/innerHTML.*appendChild/)
Would be equivalent to (if my understanding is correct):
$replace=/innerHTML.*appendChild//
So the current mechanism needs to be extended but it seems doable without changing to a different framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation does buffer the all of the script tag until they're finished as I know. This means we know the start and end of the part. However, $replace
is performed in full content which means we might need to know full text.
@remusao In aspect of adblocker library, the importance of |
All uBO filters
|
Just a note: better filter selection should be done from |
Current matching logic in Ghostery 10 for Firefox: will have to changed from:
to:
|
2a03145
to
da5f5b9
Compare
Note: This PR requires additional updates coming from: seia-soto#3 |
2d34e34
to
18bd35b
Compare
$replace
modifier$replace
modifier, improve option parsing
This also requires a performance improvement.
Ratio offset from initialisation time:
*sha:dc9d1b6a0711c22464c6e3006338165cc26d4ba2 |
it('removes matching modifiers', () => { | ||
const modifiers: HTMLModifier[] = [[new RegExp('"trackingParam":"(\\w+)"'), '"$1":""']]; | ||
|
||
expect(filter(`{"trackingParam":"a"}`, [], modifiers)).to.be.eql(`{"a":""}`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have more tests with fragments that look like real HTML.
@@ -212,6 +217,12 @@ describe('html-filtering', () => { | |||
).to.equal(` | |||
<!DOCTYPE html><html lang="en"><head><div id="2x-container"><<script src="https://www.redditstatic.com/desktop2x/js/ads.js"></script><script id="data">window.___r = {"accountManagerModalData":{}},;</script><script defer="" src="https://www.example.com/foo.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/RedesignContentFonts.509eef5d33306bd3b0d5.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/RedesignOldContentFonts.e450653685d17337cac6.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/vendors~Chat~Governance~Reddit.503ee0c2d353daa60d6e.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/vendors~Governance~Reddit.7e2adb288af56de67f65.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/vendors~Poll~Reddit.5f77a82de48fbb3beb21.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/vendors~EconHelperActions~Reddit.ae3c9f7d5b30b3be7151.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/vendors~Reddit.bb2ade21a865dbd52f3f.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/Chat~Governance~Reddit.19024d94a81678cf79e8.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/Governance~Reddit.98e55a3111b273b2f5dd.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/AdminCommunityTopics~Reddit.f091b12b417d6343dc18.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/Reddit.dc10f78afef6b219b26f.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/vendors~CollectionCommentsPage~CommentsPage~Explore~Frontpage~GovernanceReleaseNotesModal~ModListing~afc2720f.c6d86939d4bd0e144927.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/vendors~Chat~ChatMessageInput~CollectionCommentsPage~CommentsPage~Frontpage~PostCreation~RedesignCha~0aefb917.e6923ac4e90b854a1995.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/ChatMessageInput~ChatPost~CollectionCommentsPage~CommentsPage~Explore~Frontpage~GovernanceReleaseNot~3a34166c.dab3a37bed364deddf0e.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/ChatPost~CollectionCommentsPage~CommentsPage~Explore~Frontpage~GovernanceReleaseNotesModal~ModListin~44a849ee.85ccf598d319cc749b92.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/CollectionCommentsPage~CommentsPage~Explore~Frontpage~ModListing~ModQueuePages~ModerationPages~Multi~33b955cc.9c1942fb8eb4378e467c.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/CollectionCommentsPage~CommentsPage~Explore~Frontpage~GovernanceReleaseNotesModal~ModListing~ModQueu~900871b8.509ec80000f4968bb546.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/ChatPost~CollectionCommentsPage~CommentsPage~Frontpage~ModListing~ModQueuePages~ModerationPages~Mult~8849df7b.067fda741fb3f181c83f.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/CollectionCommentsPage~CommentsPage~Explore~Frontpage~ModListing~ModQueuePages~ModerationPages~Multi~5f2f5c2a.9e1690590f39e0d92f45.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/ChatPost~CollectionCommentsPage~CommentsPage~Frontpage~ModListing~ModQueuePages~Multireddit~Original~029c3338.33a759111dafa848481d.js"></script><script defer="" src="https://www.redditstatic.com/desktop2x/CommentsPage.dce215cbd6a2ed7e9969.js"></script></body></html>`); | |||
}); | |||
|
|||
it('removes matching modifiers', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few things to improve here:
- HTMLSelector tests to include HTMLModifier's to ensure no breakage when both are applied
- HTMLModifier tests to cover both examples of a complete buffer but also when it is being streamed
- HTMLModifier tests to include full document tests, can be run on top of the
doc
present in the tests already
eadf7fb
to
be245e8
Compare
https://adguard.com/kb/general/ad-filtering/create-own-filters/#replace-modifier feat: shared option value feat: html modifiers test: replace modifier test: escaped string utils chore: replace modifierOptionValue to optionValue chore: remove unintended logger feat: validate replace option value on parse feat: html filter index for network filter bucket chore: check html modifiers on filtering req html fix: splitUnescaped test: add replace filters fix: add export of HTMLModifier feat: proper option parser for replace modifier (#3) * fix: properly find the filter options index * test: validate the function with real world example * feat: use lastIndexOf position instead of text slicing * chore: check lastIndex out of bound * feat: support `$replace` modifier https://adguard.com/kb/general/ad-filtering/create-own-filters/#replace-modifier * feat: shared option value * feat: html modifiers * test: replace modifier * test: escaped string utils * chore: replace modifierOptionValue to optionValue * chore: remove unintended logger * feat: validate replace option value on parse * feat: html filter index for network filter bucket * chore: check html modifiers on filtering req html * fix: splitUnescaped * test: add replace filters * feat: proper option parser for replace modifier * chore: move string methods to utils.ts * chore: add comments to the functions * chore: remove unused imports fix: invalid syntax declaration chore: optimise code flow chore: reduce loops and footprint chore: remove unnecessary condition It's already decided before this step chore: force line end to stop parsing additional options fix(test): irregular access of replace modifier array test: parsing regexp test fix(test): invalid cases and evals chore: keep isRedirectRule flag chore: obsolete splitUnescaped chore: clean up - remove duplicate function defs - scope each filter testing of replace modifier - clean up imports fix: chance of dropping unhandled buffer test: more html modifier scenarios
d9521bc
to
d2f4c60
Compare
! uBlockOrigin/uAssets#22620 alliptvlinks.com##+js(no-xhr-if, /doubleclick|googlesyndication/, length:10) ||alliptvlinks.com/tktk-content/plugins/$script,1p,replace=/\bconst now.+?, 100/clearInterval(timer);resolve();}, 100/gms
Quick look on the CLI introduced in
|
fixes #3886 built top on #3887
https://adguard.com/kb/general/ad-filtering/create-own-filters/#replace-modifier
also includes:
Why
rawOptions
?$replace
is not the only option we should deal with network filters' html filtering capability. Most of network filtering options involve pattern definition in its option. This prevents writing additional fields in future.