A case that a custom RegExp rule doesn't work #3208

Closed
danny0838 opened this issue Nov 8, 2017 · 8 comments

danny0838 commented Nov 8, 2017

Steps for anyone to reproduce the issue

  1. Add a new Chrome user and switch to it.
  2. Install UBO.
  3. Add a single custom RegExp rule /^https?://(?:\w+[.])?youtube[.]com/user/julica138(?=[/?#]|$)/
  4. Load this page: https://www.youtube.com/user/julica138
  5. The page is not blocked by the RegExp rule.

Your settings

  • OS/version: Windows 7 SP1
  • Browser/version: Google Chrome x64 62.0.3202.89
  • uBlock Origin version: uBlock Origin v1.14.16
@danny0838 danny0838 changed the title from "Custom RegExp rule not work" to "A case that a custom RegExp rule doesn't work" Nov 8, 2017
@gorhill gorhill added the invalid label Nov 8, 2017
Owner

gorhill commented Nov 8, 2017

> The page is not blocked by the RegExp rule

No blocker will block a whole page with such a filter. Use the document option to block a whole page.

@gorhill gorhill closed this as completed Nov 8, 2017
Author

danny0838 commented Nov 8, 2017

I am confused. According to the documentation, strict blocking is the default. For example, the rules example.com and http://example.com cause the page http://example.com to be blocked, without the document option. The documentation for RegExp rules doesn't mention that their default behavior differs from that of normal rules.

Additionally, using the same test method, the rule /^https?://(?:\w+[.])?youtube[.]com/channel/UCwsAJAMyGad-kFJePW2vtZw(?=[/?#]|$)/ causes the page https://youtube.com/channel/UCwsAJAMyGad-kFJePW2vtZw to be blocked. If a RegExp rule isn't supposed to block the whole page, then is this rule working incorrectly?

Owner

gorhill commented Nov 8, 2017

> and http://example.com cause the page http://example.com to be blocked, without the document option

Yes, there is a heuristic in place so that filters like those appearing in malware lists block the whole page, while filters which match beyond the hostname do not, to avoid tiresome false positives. The current heuristic is the result of many discussions about the original implementation, which was to block as soon as there was a match. The document option just ignores the heuristic and blocks the whole page, so there is no danger of a false positive when whole-document blocking is explicit.
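
The heuristic described above can be sketched roughly as follows. This is only an illustration of the stated behavior, not uBlock Origin's actual code: the function name `wouldBlockWholePage`, its parameters, and the "match ends at or before the end of the hostname" test are all assumptions made for the example.

```javascript
// Hypothetical sketch of the strict-blocking heuristic as described in this
// thread. None of these names come from uBlock Origin's source.
function wouldBlockWholePage(pageUrl, pattern, hasDocumentOption) {
  const match = pattern.exec(pageUrl);
  if (match === null) {
    return false; // the filter does not match the page URL at all
  }
  if (hasDocumentOption) {
    return true; // an explicit document option bypasses the heuristic
  }
  // Locate where the hostname ends inside the URL string.
  const { protocol, host } = new URL(pageUrl);
  const hostStart = pageUrl.toLowerCase().indexOf(host, protocol.length);
  const hostnameEnd = hostStart + host.length;
  // Assumption: "matches beyond the hostname" means the matched text
  // extends past the end of the hostname.
  const matchEnd = match.index + match[0].length;
  return matchEnd <= hostnameEnd;
}

// The filter from the original report, written as a JavaScript literal.
const userFilter =
  /^https?:\/\/(?:\w+[.])?youtube[.]com\/user\/julica138(?=[\/?#]|$)/;
const pageUrl = 'https://www.youtube.com/user/julica138';

console.log(wouldBlockWholePage('http://example.com', /example[.]com/, false));
console.log(wouldBlockWholePage(pageUrl, userFilter, false));
console.log(wouldBlockWholePage(pageUrl, userFilter, true));
```

Under these assumptions, a hostname-only filter blocks the whole page, the path-specific filter from the report does not, and the same filter with the document option does.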

Author

danny0838 commented Nov 8, 2017

You mean that a rule blocking a domain defaults to strict blocking, while rules that block a specific path, and all RegExp rules, don't? Is there more comprehensive documentation about this?

And the issue from my earlier comment, that the RegExp rule /^https?://(?:\w+[.])?youtube[.]com/channel/UCwsAJAMyGad-kFJePW2vtZw(?=[/?#]|$)/ causes the page https://youtube.com/channel/UCwsAJAMyGad-kFJePW2vtZw to be blocked, still seems to have no explanation...

Owner

gorhill commented Nov 8, 2017

> still seems to have no explanation

That is unexpected, I have to investigate.

@gorhill gorhill reopened this Nov 8, 2017
@gorhill gorhill removed the invalid label Nov 8, 2017
Owner

gorhill commented Nov 8, 2017

Ok, the explanation for the second case is two-fold:

  1. There is an old regression still left in the code: false should be returned at that line.

  2. The regex is found not to match in 1) because the URL is converted to lowercase before being matched against the regex; however, the regex is not made case-insensitive, though it should be.

So these two things need to be fixed.
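
The second point can be reproduced in isolation with plain JavaScript. This is a minimal sketch using the filter and URL from this issue; it does not reproduce uBlock Origin's matching code, and the variable names are chosen only for the example.

```javascript
// The channel filter from this issue, as a RegExp source string.
const source =
  '^https?://(?:\\w+[.])?youtube[.]com/channel/UCwsAJAMyGad-kFJePW2vtZw(?=[/?#]|$)';
const url = 'https://youtube.com/channel/UCwsAJAMyGad-kFJePW2vtZw';

// The URL is lowercased before matching, as described in point 2.
const lowered = url.toLowerCase();

const caseSensitive = new RegExp(source);        // regex compiled as written
const caseInsensitive = new RegExp(source, 'i'); // regex with the "i" flag

const results = {
  originalUrl: caseSensitive.test(url),               // true
  loweredSensitive: caseSensitive.test(lowered),      // false: the mismatch
  loweredInsensitive: caseInsensitive.test(lowered),  // true: with "i" flag
};
console.log(results);
```

The all-lowercase filter from the first report (julica138) is unaffected by the lowercasing, which is consistent with the two rules behaving differently.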

@gorhill gorhill closed this as completed in 3dcfc30 Nov 8, 2017
gorhill added a commit that referenced this issue Nov 9, 2017

lewisje commented Nov 15, 2017

Why should the whole URL be converted to lowercase, rather than just the scheme and domain name? Most parts of a URL are case-sensitive.

Owner

gorhill commented Nov 15, 2017

Because by default ABP's filter syntax is case-insensitive -- one needs to use match-case for case sensitivity.
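
One way to picture that default is a compile step that adds the "i" flag unless match-case is requested. The helper below is hypothetical: `compileRegexFilter` and its `matchCase` parameter are names invented for this sketch and are not taken from ABP or uBlock Origin.

```javascript
// Hypothetical helper: case-insensitive by default, case-sensitive only
// when the caller opts in (mirroring the role of the match-case option).
function compileRegexFilter(source, { matchCase = false } = {}) {
  return new RegExp(source, matchCase ? '' : 'i');
}

const channelSource = 'youtube[.]com/channel/UCwsAJAMyGad';
const lowercasedUrl = 'https://youtube.com/channel/ucwsajamygad';

const byDefault = compileRegexFilter(channelSource);
const withMatchCase = compileRegexFilter(channelSource, { matchCase: true });

const defaultMatches = byDefault.test(lowercasedUrl);       // true
const matchCaseMatches = withMatchCase.test(lowercasedUrl); // false
console.log({ defaultMatches, matchCaseMatches });
```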
