This repository has been archived by the owner on May 11, 2021. It is now read-only.

Can hostname regex pattern in a rule become more flexible (powerful)? #13

Closed
stewie opened this issue Oct 22, 2014 · 10 comments

Comments

stewie commented Oct 22, 2014

Currently, how would I block (for example) all US hostnames?
/*.us/ is not an acceptable input
/[.]us$/ is not acceptable

Perhaps leading wildcard patterns are impossible to hash, so should not be supported.
But let's please consider another example:

How would I, via a single rule, block all webtrendslive.DOMAIN.TLD ?
/^webtrendslive.*/ is not an acceptable hostname input

As of this patch (ed3e522), we are able to supply a single IP address.
The ability to whitelist (for instance) 192.168.* would be quite useful,
as would the ability to block an entire malware-prevalent subnet (invented example: 69.228.*).

Regardless of which wildcards end up being supported, I'd like a rule that blocks (or raises an infobar notice) when an href or XMLHttpRequest targets a numeric IP address rather than a DNS hostname (a practice common among hit-and-run malware distributors).
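To make the "numeric IP address vs. DNS hostname" distinction concrete, here is a minimal sketch of such a check. The function name isNumericIPv4Host is hypothetical and not part of Policeman; it only covers dotted-decimal IPv4 and assumes the hostname has already been split out of the request URI.

```typescript
// Hypothetical helper (not Policeman code): reports whether a hostname is a
// dotted-decimal IPv4 literal rather than a DNS name.
function isNumericIPv4Host(host: string): boolean {
  const parts = host.split(".");
  // An IPv4 literal has exactly four dot-separated components.
  if (parts.length !== 4) return false;
  // Each component must be 1-3 decimal digits with a value of at most 255.
  return parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
}
```

A hostname such as 192.168.example.com also has four components, but it is treated as a DNS name because not every component is numeric.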

Author

stewie commented Oct 22, 2014

Trying to be helpful, I checked whether the NoScript extension handles partial IP address (subnet) patterns:

http://noscript.net/features
"
Subnet matching - an address with a partial numeric IPv4 IP will match all the subnet. You must specify at least the 2 leftmost bytes, e.g. 192.168 or 10.0.0.
"

Also, on the same page, regarding handling of patterns containing asterisks:
"
Since protocol specification is mandatory [in noscript patterns], regular subdomain matching with rightmost components comparison couldn't work for multiple subdomain. You can specify subdomain matching patterns using an asterisk in place of the leftmost domain component: for instance, you need to match all the subdomains of acme.org for all ports with the HTTPS protocol, you can whitelist https://*.acme.org:0. This is the ONLY situation where asterisk is considered a wildcard.
"

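For illustration, the subnet matching described in the NoScript quote could be sketched roughly as follows. This is not NoScript's actual code; matchesSubnet is a hypothetical name, and tolerating a trailing dot in the pattern is my own assumption.

```typescript
// Sketch of prefix-based subnet matching as described in the quote above.
// matchesSubnet is a hypothetical name; this is not NoScript's implementation.
function matchesSubnet(pattern: string, host: string): boolean {
  const isOctet = (p: string) => /^\d{1,3}$/.test(p) && Number(p) <= 255;
  // Assumption: a trailing dot is tolerated, so "10.0.0." behaves like "10.0.0".
  const patternParts = pattern.replace(/\.$/, "").split(".");
  const hostParts = host.split(".");
  // The quote requires at least the 2 leftmost bytes to be specified.
  if (patternParts.length < 2 || patternParts.length > 4) return false;
  if (!patternParts.every(isOctet)) return false;
  // Only full dotted-decimal IPv4 hosts can belong to a subnet.
  if (hostParts.length !== 4 || !hostParts.every(isOctet)) return false;
  // Compare whole octets, so "192.16" never matches "192.168.0.1".
  return patternParts.every((p, i) => Number(p) === Number(hostParts[i]));
}
```

Comparing whole octets rather than raw string prefixes avoids accidental matches such as "192.16" against 192.168.x.x.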
Author

stewie commented Oct 22, 2014

To be clear, I'm not suggesting the validation needs to consider whether a pattern "is a valid netrange (subnet)". The request URI has already been split, and here the code is just examining the "hostname" portion of it.

I'm just hoping the parsing can be robust enough to accommodate
patterns containing beginning-of-line anchors
/^88./
or even just
/^88/
and patterns containing end-of-line anchors, like
/.us$/
or
/us$/
or
/[.]us$/

"or", as in, you tell me ~~ I'll gladly type whatever escape chars are necessary

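As a side note on the escaping question, here is a small sketch of how the anchored patterns above behave as plain JavaScript regular expressions. It assumes the pattern is tested against the bare hostname; the variable names are only for illustration.

```typescript
// Anchored hostname patterns, tested against a bare hostname string.
const startsWith88 = /^88\./; // literal "88." at the start (dot escaped)
const loose88 = /^88/;        // also matches hosts like "880.example.com"
const endsWithUs = /\.us$/;   // literal ".us" at the end; same as /[.]us$/
const looseUs = /.us$/;       // "." means any character, so "example.plus" matches

// Thin wrapper so each check reads the same way.
function matches(pattern: RegExp, host: string): boolean {
  return pattern.test(host);
}
```

The unescaped forms (/^88./ and /.us$/) are valid but broader than they look, because an unescaped dot matches any character; /\.us$/ and /[.]us$/ are the forms that require a literal dot.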
Owner

futpib commented Oct 22, 2014

There is a ruleset language for complex things like regex (which I didn't document well, my bad). The commit you referenced is merely an input validation fix for what the code calls 'user' rulesets (user_persistent and user_temporary). Those are kept simplistic so that the UI can handle them more easily.

Here is an example ruleset for you. One could save it in a text file 'file.ruleset' and install it on preferences.xul#rulesets-manager, but while trying to do this just now, I discovered a bug, so this should work once that is fixed. Here is the example anyway:

magic: policeman_ruleset
version: 0.1
id: "my_nasty_filtering"

l10n:
  en-US:
    name: "Nasty"
    description: ""

rules:
  # The syntax is indentation-based, 2-spaces, it is very strict about this
  # The predicates are tried top-bottom left-right,
  # first one to make it to REJECT or ACCEPT wins.

  # this specifies type of uri scheme
  # currently there are 'inline', 'internal', 'web' and 'file' types
  # (see lib/request-info.coffee for exact meaning)
  # * and empty string match anything, bare string 'web' matches only web schemes
  * -> web:
    * -> /\.us$/: REJECT
    * -> * ".us": REJECT # Exactly same effect, but executed as str.endsWith()
  web -> web:
    "192.168." * -> *: ACCEPT # this will also match 192.168.example.com
    /^192\.168\.[0-9]+\.[0-9]+$/ -> *: ACCEPT # more strict

    # You can specify uri component of interest (overriding default of [host])
    # https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIURI#Components_of_a_URI
    [prePath] "https://" * ".acme.org:0" ->: ACCEPT

Check out /src/defaults/rulesets/ for more examples.
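To illustrate the "first one to make it to REJECT or ACCEPT wins" idea and the difference between a string prefix and a strict regex, here is a simplified sketch. It is not how Policeman is implemented: the Rule type, the helper names, and the single-host matching (ignoring the origin -> destination split) are all assumptions made for the example.

```typescript
type Decision = "ACCEPT" | "REJECT";

// Hypothetical, simplified rule: one host predicate plus a decision.
interface Rule {
  matches: (host: string) => boolean;
  decision: Decision;
}

// Predicate builders; prefix/suffix behave like startsWith/endsWith.
const prefix = (s: string) => (h: string) => h.lastIndexOf(s, 0) === 0;
const suffix = (s: string) => (h: string) =>
  h.length >= s.length && h.lastIndexOf(s) === h.length - s.length;
const regex = (r: RegExp) => (h: string) => r.test(h);

// Rules are tried in order; the first one that matches decides.
function evaluate(rules: Rule[], host: string): Decision | null {
  for (const rule of rules) {
    if (rule.matches(host)) return rule.decision;
  }
  return null; // nothing matched; a real engine would fall through elsewhere
}

// Loose variant: a plain string prefix for the private range.
const looseRules: Rule[] = [
  { matches: suffix(".us"), decision: "REJECT" },
  { matches: prefix("192.168."), decision: "ACCEPT" },
];

// Strict variant: an anchored regex that only accepts numeric addresses.
const strictRules: Rule[] = [
  { matches: suffix(".us"), decision: "REJECT" },
  { matches: regex(/^192\.168\.[0-9]+\.[0-9]+$/), decision: "ACCEPT" },
];
```

With looseRules, 192.168.example.com is accepted because it merely starts with "192.168."; with strictRules it matches nothing. In both variants 192.168.example.us is rejected, since the ".us" rule comes first.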

Owner

futpib commented Oct 22, 2014

Fixed installing rulesets in c066480

Author

stewie commented Oct 22, 2014

The functionality you've achieved in policeman is wonderful! Thanks for explaining.

futpib closed this as completed Oct 24, 2014

somini commented Nov 5, 2014

What's wrong with this ruleset? I can't install it.
https://gist.github.com/somini/0a9e0406f03d5f14363a/8550ae5dad956e31eafddcef70d0aad39770d6c6

Owner

futpib commented Nov 5, 2014

@somini Looks like something is wrong with the parser. #48 #49


dxdragon commented Nov 4, 2015

@futpib Given a rule like the following:
web -> web:
  *.example.com -> /\d+.\d+.\d+.\d+/: ACCEPT
how can I limit it by content type, e.g. objects? I tried, but it doesn't work.

Owner

futpib commented Nov 4, 2015

@dxdragon

rules:
  web -> web:
    *.example.com -> /\d+.\d+.\d+.\d+/:
      [contentType] OBJECT: ACCEPT

Another example


dxdragon commented Nov 4, 2015

@futpib Yes, I wrote my rule based on that example earlier. The ruleset format is correct, but it doesn't work, and neither does the code you provided above. I tested other content types (e.g. image, script, and stylesheet) and they work well; only the object type fails. I think it's a bug. I then tested with the 'allow_objects_anywhere' ruleset (moved to the top position), and it still doesn't work:

rules:
  web -> web:
    [contentType] OBJECT: ACCEPT
