Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Placeholder issue for discussion of issues in ABP/AdGuard issue tracker -- and possible solutions #1930

Closed
gorhill opened this issue Aug 23, 2016 · 99 comments

Comments

@gorhill
Copy link
Owner

@gorhill gorhill commented Aug 23, 2016

[Intentionally empty]

@gorhill

This comment has been minimized.

Copy link
Owner Author

@gorhill gorhill commented Aug 23, 2016

Regarding issue https://issues.adblockplus.org/ticket/2278:

@kzar, @ameshkov

Being able to have a token for regex-based filters would definitely help performance. However trying to programmatically extract a token from a regex-based filter sounds scary to me, too much risk of extracting erroneous tokens.

Suggestion: create a new filter option, token=[...], which filter creators can use to assign a predefined token to the filter. The creator of a filter is best placed to figure if and what token will work to store the filter internally.

For example, this filter in EasyList:

/\.filenuke\.com/.*[a-zA-Z0-9]{4}/$script

Could simply have been written by a filter creator:

/\.filenuke\.com/.*[a-zA-Z0-9]{4}/$script,token=filenuke
@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 23, 2016

Hey guys! I was thinking about solving this issue a while ago. Even tried to implement a simple token-extracting algorithm. I will post my ideas a bit later though.

Meanwhile, here is a list of known regexp rules:

/^(?![a-z]+\:\/+([^\/\:]+\.(il|com|net)|[\.0-9]+|([^\/\:\.]+\.)*(spot\.im|vine\.co|periscope\.tv|vid\.me|mako\.tools|minidom\.org|jquerymin\.org|logidea\.info|zoomanalytics\.co|firstimpression\.io))\.?([\/\:]|$))^[^\/\:\.]+\:\/+[^\/\:\.]/$third-party,domain=mako.co.ilEasyList Hebrewhttps://github.com/AdBlockPlusIsrael/EasyListHebrew
/^(?![a-z]+\:\/+([^\/\:\.]+\.)*(google|icdn|auto|sport5|smartair|mysupermarket|blms|linicom)\.co\.il\.?([\/\:]|$))^[a-z]+\:\/+[^\/\:]+\.il\.?([\/\:]|$)/$third-party,domain=mako.co.ilEasyList Hebrewhttps://github.com/AdBlockPlusIsrael/EasyListHebrew
/^[a-z]+\:\/+[\.0-9]+([\/\:]|$)/$image,media,object,script,stylesheet,subdocument,third-party,domain=mako.co.ilEasyList Hebrewhttps://github.com/AdBlockPlusIsrael/EasyListHebrew
/^(?![a-z]+\:\/+([^\/\:\.]+\.)*(fbcdn|cloudfront|facebook|akamaihd|ctedgecdn|2mdn|uploaditnow|edgesuite|doubleclick|dmcdn|slideshare|advsnx)\.net\.?([\/\:]|$))^[a-z]+\:\/+[^\/\:]+\.net\.?([\/\:]|$)/$third-party,domain=mako.co.ilEasyList Hebrewhttps://github.com/AdBlockPlusIsrael/EasyListHebrew
/^(?![a-z]+\:\/+([^\/\:\.]+\.)*(google|facebook|twitter|instagram|youtube|jquery|googleapis|vicomi|twimg|cdninstagram|pinterest|pinimg|giphy|playbuzz|outbrain|ytimg|amazonaws|cloudflare|gstatic|sniperm|dinovich|shortaudition|linkedin|opinionstage|vimeo|vimeocdn|dailymotion|flickr|staticflickr|tumblr|soundcloud|scribd|syteapi|addthis|addthisedge|reddit|disqus|disquscdn|apester|qmerce|taboola|taboolasyndication|google-analytics|googletagservices|googletagmanager|googleadservices|googlesyndication|h-cdn|scorecardresearch|serving-sys|bootstrapcdn|tiviclick|ruchlis|hotjar|flx1|mxpnl|themarker|adnxs|conduit|fourtips|makojs)\.com\.?([\/\:]|$))^[a-z]+\:\/+[^\/\:]+\.com\.?([\/\:]|$)/$third-party,domain=mako.co.ilEasyList Hebrewhttps://github.com/AdBlockPlusIsrael/EasyListHebrew
/quang%20cao/ABPVN Listhttp://abpvn.com/
/YanAds/ABPVN Listhttp://abpvn.com/
/www/images/ABPVN Listhttp://abpvn.com/
/ads-pic/Adblock-Persian listhttp://ideone.com/K452p
/eshop-eca/Adblock-Persian listhttp://ideone.com/K452p
/eshop98/Adblock-Persian listhttp://ideone.com/K452p
/402x192/Adblock-Persian listhttp://ideone.com/K452p
/^http://m\.autohome\.com\.cn\/[a-z0-9]{32}\//$domain=m.autohome.com.cnChinaList+EasyListhttp://www.adtchrome.com/extension/adt-chinalist-easylist.html
/^http://www\.tt1069\.com\/(?!bbs)/$script,domain=tt1069.comChinaList+EasyListhttp://www.adtchrome.com/extension/adt-chinalist-easylist.html
/^http://www\.iqiyi\.com\/common\/flashplayer\/[0-9]{8}/[0-9a-z]{32}.swf/$domain=iqiyi.comChinaList+EasyListhttp://www.adtchrome.com/extension/adt-chinalist-easylist.html
/^http://www\.dnvod\.eu.*?\/[a-z0-9]{9,}\.swf/$domain=dnvod.euChinaList+EasyListhttp://www.adtchrome.com/extension/adt-chinalist-easylist.html
/NetInsight/text/$domain=~ads.pandora.tv|~opt.mgoon.comKorean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/omniture/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/NetInsight/html/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/cgi-bin/conad.fcgi/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/acecounter/$domain=~acecounter.comKorean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/adNdsoft/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/wisenut/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/ad-pay/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/wp-content/plugins/google-analyticator/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/realclick/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/max-banner-ads-pro/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/RealMedia/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/bannerManager/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/autoPage/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/overture/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/wiseAd/euckr/inc/$subdocumentKorean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/NetInsight/js/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/scrap_logs/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/banner_event/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/images/adpresso/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/AdBanner/Korean Adblock Listhttps://github.com/gfmaster/adblock-korea-contrib
/cdsbData_gal/bannerFile/$image,domain=mybogo.net|zipbogo.netList-KRhttps://list-kr.github.io/
/nad/media/List-KRhttps://list-kr.github.io/
/ajrotator/Filtros Nauscopicoshttp://nauscopio.nireblog.com/cat/filtrado
/:\/\/(?!biuropodrozy)(?!liveblog)(?!relacje)(?!opinie)(?!zalacznik)(?!magazyn)(?!newsletter)(?!rodzinnawycieczka)(?!doladowania)(?!fantasyliga)(?!funduszeue)(?!imperiumstylu)(?!kodyrabatowe)(?!ogloszenia)(?!orangekinoletnie)(?!rekrutacja)(?!rycerzeiksiezniczki)(?!speedwaymanager)(?!sportowefakty)(?!sportowybar)(?!talesofmagic)(?!ubezpieczenia)(?!warofdragons)(?!wiadomosci)[a-zA-Z0-9]{10,}\.wp.pl\//Adblock polskie regułyhttp://certyficate.it/polski-filtr-adblock/
/:\/\/(?!biuropodrozy)(?!liveblog)(?!relacje)(?!opinie)(?!zalacznik)(?!magazyn)(?!newsletter)(?!facet)(?!wyleczto)(?!kuchnia)(?!film)(?!moto)(?!gwiazdy)(?!teleshow)(?!finanse)(?!kobieta)(?!dom)(?!pogoda)(?!tech)(?!historia)(?!czat)(?!ksiazki)(?!gryonline)(?!hotele)(?!narty)(?!samoloty)(?!wycieczki)(?!hosting)(?!irlandia)(?!multikurs)(?!casino)(?!foto)(?!tech)(?!www)(?!stg)(?!doladowania)(?!fantasyliga)(?!funduszeue)(?!imperiumstylu)(?!kodyrabatowe)(?!alefolwark)(?!angielski)(?!arenamody)(?!beniamin)(?!bon)(?!bsg)(?!casino)(?!diety)(?!dlaprasy)(?!dlugi)(?!doladowania)(?!dom)(?!dysk)(?!ebiznes)(?!ebooki)(?!empire)(?!fantasyliga)(?!film)(?!fundusze)(?!ogloszenia)(?!orangekinoletnie)(?!rekrutacja)(?!rycerzeiksiezniczki)(?!speedwaymanager)(?!sportowefakty)(?!sportowybar)(?!talesofmagic)(?!ubezpieczenia)(?!warofdragons)(?!wiadomosci)(?!gazetki)(?!gry)(?!horoskop)(?!kalendarz)(?!katalog)(?!khanwars)(?!komiks)(?!konflikty)(?!kontakty)(?!korsarze)(?!kultura)(?!mini)(?!mmho)(?!mobilna)(?!morizon)(?!moto)(?!muzyka)(?!narty)(?!naryby)(?!onas)(?!orangekinoletnie)(?!piraci)(?!poczta)(?!pomoc)(?!praca)(?!profil)(?!programtv)(?!pytamy)(?!rekrutacja)(?!rss)(?!rtvagd)(?!rycerzeiksiezniczki)(?!smeet)(?!speedwaymanager)(?!szkola)(?!szukaj)(?!tech)(?!teleshow)(?!triviador)(?!turystyka)(?!twojeip)(?!ulubiency)(?!warodfragons)(?!wycieczki)(?!zdrowie)(?!zoomumba)(?!topnews)(?!erotyka)(?!dzieci)(?!fitness)(?!gielda)(?!finansomat)(?!biznes)(?!sport)[a-zA-Z0-9]{4,9}\.wp.pl\//Adblock polskie regułyhttp://certyficate.it/polski-filtr-adblock/
/commoncfm/images/microsoftxboxone/$domain=buffed.de|gamesaktuell.de|gamezone.de|pcgames.de|videogameszone.deGerman filterhttp://adguard.com/filters.html#german
/[a-z0-9]{32,}/$third-party,domain=picshare.ruRussian filterhttp://adguard.com/filters.html#russian
/[a-zA-Z0-9]{35,}/$script,third-party,domain=bigtorrent.org|bigtorrents.ru|cashtube.ru|cmexota.ru|dreamprogs.net|dsvload.net|ecsebo.ru|enotbox.com|faspiic.ru|imagefile.org|imgpay.ru|kordonivkakino.net|mcdownloads.ru|mega-pic.org|odnopolchane.net|payforpic.ru|pic4cash.ru|pic4you.ru|picclick.ru|picforall.ru|pics-money.ru|pirat-pic.ru|planeta51.com|pronpic.org|prons.org|q32.ru|rustorrents.net|santikov.net|sharezones.biz|torrent-pirat.com|unionpeer.org|uraltrack.net|viewy.ru|xhamster-pic.comRussian filterhttp://adguard.com/filters.html#russian
/http:\/\/rustorka.com\/[a-z]+\.js/$domain=rustorka.comRussian filterhttp://adguard.com/filters.html#russian
/http:\/\/rustorka.com\/[a-z0-9]+\.(jpg|gif)/$image,domain=rustorka.comRussian filterhttp://adguard.com/filters.html#russian
/[a-zA-Z0-9]{35,}/$domain=anime-free.net|cyberpirate.me|imgbum.net|online-porno-hd.ru|tecnomectrani.comRussian filterhttp://adguard.com/filters.html#russian
/[a-z0-9]{30,}/$script,third-party,domain=free-torrent.org|free-torrents.orgRussian filterhttp://adguard.com/filters.html#russian
/^http://[a-z0-9_]{15,}\.[a-z0-9-]+\.[a-z]{2,}\/.*[a-zA-Z0-9]{100,}/$object-subrequest,domain=wat.tvListe FRhttp://adblock-listefr.com/
/^http://[a-z0-9_-]{10,}\.[a-z0-9-]+\.[a-z]{2,}\/.*?\w{30,}/$~xmlhttprequest,domain=gentside.com|maxisciences.com|ohmymag.comListe FRhttp://adblock-listefr.com/
/content/stargate/$domain=hlamer.ru|kadu.ru|krasview.ruRU AdListhttps://code.google.com/p/ruadlist/
/output/index/$third-party,scriptRU AdListhttps://code.google.com/p/ruadlist/
/https?://(?!(mc\.yandex\.ru|www\.google-analytics\.com)/)/$third-party,script,subdocument,domain=massivmebel.byRU AdListhttps://code.google.com/p/ruadlist/
/^https?://goodgame\.ru/[a-z0-9]+$/$subdocument,domain=goodgame.ruRU AdListhttps://code.google.com/p/ruadlist/
/wp-content/plugins/popup-maker/$domain=info-life.in.ua|intermarium.com.ua|paragraf.net.ua|unn24.com.ua|varota.com.uaRU AdListhttps://code.google.com/p/ruadlist/
/^https?://(?!static\.)([^.]+\.)+?fastpic\.ru[:/]/$script,domain=fastpic.ruRU AdListhttps://code.google.com/p/ruadlist/
/images/brandings/$image,domain=sc2tv.ruRU AdListhttps://code.google.com/p/ruadlist/
/default/vbanners/$domain=noi.mdRU AdListhttps://code.google.com/p/ruadlist/
/branding/$subdocument,domain=fanserials.tv|kino-filmi.netRU AdListhttps://code.google.com/p/ruadlist/
/serial_adv_files/$image,domain=xn--80aacbuczbw9a6a.xn--p1ai|куражбамбей.рфRU AdListhttps://code.google.com/p/ruadlist/
/^https?://(?!www\.)([^.]+\.)+?(kordonivkakino\.net|m(ac-torrent-download\.net|oviki\.ru))[:/]/$scriptRU AdListhttps://code.google.com/p/ruadlist/
/popupclick/$popupRU AdListhttps://code.google.com/p/ruadlist/
/http://[a-zA-Z0-9]+\.[a-z]+\/.*(?:[!"#$%&'()*+,:;<=>?@/\^_`{|}~-]).*[a-zA-Z0-9]+/$script,third-party,domain=keezmovies.com|redtube.com|tube8.com|tube8.es|tube8.fr|www.pornhub.com|youporn.comEasyListhttps://easylist.github.io/
/\/[0-9].*\-.*\-[a-z0-9]{4}/$script,xmlhttprequest,domain=gaytube.com|keezmovies.com|spankwire.com|tube8.com|tube8.es|tube8.frEasyListhttps://easylist.github.io/
/\.sharesix\.com/.*[a-zA-Z0-9]{4}/$scriptEasyListhttps://easylist.github.io/
/\.filenuke\.com/.*[a-zA-Z0-9]{4}/$scriptEasyListhttps://easylist.github.io/
/^http://m\.autohome\.com\.cn\/[a-z0-9]{32}\//$domain=m.autohome.com.cnEasyList Chinahttp://abpchina.org/forum/
/^http://www\.iqiyi\.com\/common\/flashplayer\/[0-9]{8}/[0-9a-z]{32}.swf/$domain=iqiyi.comEasyList Chinahttp://abpchina.org/forum/
/^http://www\.dnvod\.eu.*?\/[a-z0-9]{9,}\.swf/$domain=dnvod.euEasyList Chinahttp://abpchina.org/forum/
/^http://www\.tt1069\.com\/(?!bbs)/$script,domain=tt1069.comEasyList Chinahttp://abpchina.org/forum/
/ulightbox/$domain=hdkinomax.com|tvfru.netRU AdList: BitBlockhttps://code.google.com/p/ruadlist/
/http://cdn[0-9]\.spiegel\.de/images/image-([^-]+)-[^-]+-[^-]+-(?!\1)[^-]+\.jpg/$image,domain=spiegel.deEasyList Germanyhttps://easylist.github.io/
@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 23, 2016

Please note the number of rules which are mistakenly made regexp-type.

@kzar

This comment has been minimized.

Copy link

@kzar kzar commented Aug 23, 2016

@gorhill I've not been involved in that issue so far, so just done a quick bit of reading. I might get some things wrong.

While I agree that grabbing a keyword from the regexp seems scary, I'm not sure how the suggested token option would help. Take your filenuke example, there the automatic keyword would have been "filenuke" anyway.

Now if you think of a more advanced example which matches one of two possible domains, what would you put for the token option? If you chose to use parts of one of the domain as a keyword you'd end up not matching the other domain. Instead you'd have to omit the token option, which would end up as the same result as the automatic approach. (Since they mention that those kind of strings should be ignored.)

@kzar

This comment has been minimized.

Copy link

@kzar kzar commented Aug 23, 2016

(I wonder if we could copy the content blocking approach of compiling all these regular expressions into a finite state machine? That could be a way to make matching regular expression filters faster without worrying about keywords.)

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 23, 2016

(I wonder if we could copy the content blocking approach of compiling all these regular expressions into a finite state machine? That could be a way to make matching regular expression filters faster without worrying about keywords.)

  1. This would be an overkill
  2. In order to do it they have restricted regular expressions support to a very limited subset.
@gorhill

This comment has been minimized.

Copy link
Owner Author

@gorhill gorhill commented Aug 23, 2016

Take your filenuke example

Yes, bad example. Here is another one found in EasyList:

/\/[0-9].*\-.*\-[a-z0-9]{4}/$script,xmlhttprequest,domain=gaytube.com|keezmovies.com|spankwire.com|tube8.com|tube8.es|tube8.fr

Not sure if a token was available for this one -- whoever created the filter knows, but mainly my point is that token= option, would be an easy low-tech way available immediately (easy implementation) to deal with this, with no need for a regex parser (which would fail anyway with the filter here). If no token is present for untokenizable filter, then we just end up with the current behavior.

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 23, 2016

Let's first think about what issue we are trying to solve.

First of all, domain-restricted filters are not a problem as there is no influence on the overall performance.

I suppose, that what we really need is to reduce the negative impact of the mistakes made by filters authors. For instance, the filters like /ajrotator/ and such. There is no problems with extracting a token from a rule like this.

Here is just a dirty example of a token extracting function:

var extractToken = function(ruleText) {

    // Get the regexp text
    var reText = ruleText.match(/\/(.*)\/(\$.*)?/)[1];

    var specialCharacter = "...";

    if (reText.indexOf('(?') >= 0 || reText.indexOf('(!?') >= 0) {
        // Do not mess with complex expressions which use lookahead
        return null;
    }

    // (Dirty) prepend specialCharacter for the following replace calls to work properly
    reText = specialCharacter + reText;

    // Strip all types of brackets
    reText = reText.replace(/[^\\]\(.*[^\\]\)/, specialCharacter);
    reText = reText.replace(/[^\\]\[.*[^\\]\]/, specialCharacter);
    reText = reText.replace(/[^\\]\{.*[^\\]\}/, specialCharacter); 

    // Strip some special characters
    reText = reText.replace(/[^\\]\\[a-zA-Z]/, specialCharacter); 

    // Split by special characters
    var parts = reText.split(/[\\^$*+?.()|[\]{}]/);
    var token = "";
    var iParts = parts.length;
    while (iParts--) {
        var part = parts[iParts];
        if (part.length > token.length) {
            token = part;
        }
    }

    return token;
};

I've tried this function with the rules above and here is the result:
https://ameshkov.github.io/web/regex-tokens.html?1

What for the token proposition, here are the downsides I see:

  1. It does not solve the issue with regex filters created by mistake.
  2. Complex rules which cannot be tokenized are rare. There are only 2 such filters in EasyList and both are domain-restricted.
  3. No backward compatibility, filters with unknown options will be ignored by old versions. Also, for instance, getadblock guys aren't invited to our party so it could be a surprise for them.
@gorhill

This comment has been minimized.

Copy link
Owner Author

@gorhill gorhill commented Aug 23, 2016

getadblock guys aren't invited to our party

They are using ABP's filtering engine since AdBlock v3.0. See https://github.com/kzar/watchadblock/releases/tag/3.0.

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 23, 2016

The other points still stand though:)

@gorhill

This comment has been minimized.

Copy link
Owner Author

@gorhill gorhill commented Aug 24, 2016

I wasn't aware of the many erroneous regex filters, looks like this can be easily addressed with a trivial code for these cases.

Mainly it was just to throw an idea out there, since these untokenizable filters have always bothered me[1], and I knew there was an issue like this opened on ABP issue tracker -- so I just threw the idea out there to have an easy fix, worth only if actually used by filter list maintainers.

Anyway, I will just use this issue here to throw ideas once in a while which I think might be good for all blockers[2], especially when it comes to make the life of filter list maintainers easier.

[1] I was looking to even skip testing for domain hit -- but this is an implementation-dependent detail I suppose
[2] I understand that when a filter syntax is not supported by ABP, EasyList et al. maintainers won't use it.

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 24, 2016

[2] I understand that when a filter syntax is not supported by ABP, EasyList et al. maintainers won't use it.

By the way, I'd like to raise a question about the non-standard syntax.

You have recently added a couple of pseudo-classes extending element hiding rules syntax. I am talking about :has(), :xpath(), :matches-css [1] and such.

The idea is really great and we will support some of these extended selectors as well (:has() and :contains() are currently in the beta testing stage, :matches-css() is coming).

However, there is one issue that bothers me. The syntax you use (pseudo-classes syntax) is not backward-compatible and it will break good old stylesheet-based ad blockers like Adguard and ABP.

/* browser will ignore the whole style due to the second selector */
#banner, #banner:has(.test) { display: none; }

I suggest introducing a backward-compatible syntax along with the modern pseudo-classes-based one.

Backward compatible synonym for :has(...) will be [-ext-has="..."]
Backward compatible synonym for :matches-css(...) will be [-ext-matches-css="..."]
Backward compatible synonym for :xpath(...) will be [-ext-xpath="..."]

[1] As I understand, there is a backward compatible :matches-css() option already: https://issues.adblockplus.org/ticket/2390

@kzar

This comment has been minimized.

Copy link

@kzar kzar commented Aug 24, 2016

You have recently added a couple of pseudo-classes extending element hiding rules syntax. I am talking about :has() ...

FWIW We are working towards adding the :has selectors too https://issues.adblockplus.org/ticket/3143

Anyway, I will just use this issue here to throw ideas once in a while which I think might be good for all blockers[2], especially when it comes to make the life of filter list maintainers easier.

👍 Please do, I think collaboration benefits us all.

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 24, 2016

@kzar so, what do you think about the backward compatible syntax proposition?

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 24, 2016

@kzar regarding Lain's comment:

I think it's worth mentioning that :has() selector must work in combination with -abp-properties. So, filter like site.name##.block:has([-abp-properties="background: yellow"])

Using proposed syntax it could look like this:
##.block[-ext-has="*:matches-css(background: yellow)"]

@kzar

This comment has been minimized.

Copy link

@kzar kzar commented Aug 24, 2016

@ameshkov Well I think the idea is that when browsers eventually support :has selectors those filters will be again using standard CSS selectors anyway. We only need to implement special logic for those filters in the mean time as a stop-gap. I guess it's true (and unfortunate) that the syntax will break filters for ad blockers which haven't added support for now, but I guess that's not too bad since uBlock, AdGuard and Adblock Plus all plan to support them. (Also because they are only planned to be something used as a last resort.)

As for the general point of using backward compatible syntax like you've suggested, I think it's a good idea. (We already do something like that for CSS property filters using the -abp-properties attribute.)

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 24, 2016

Well I think the idea is that when browsers eventually support :has selectors those filters will be again using standard CSS selectors anyway.

True. However, here is one more argument for that type of syntax. We all support a lot of different browsers (including mobile and such) and trying to use pseudo-classes syntax requires us to do it simultaneously for all the platforms. While backward-compatible syntax allows us to roll this feature out gradually.

As for the general point of using backward compatible syntax like you've suggested, I think it's a good idea. (We already do something like that for CSS property filters using the -abp-properties attribute.)

Yeah, I know, that's why I was surprised by the implementation proposed in the issue 3143.

@gorhill

This comment has been minimized.

Copy link
Owner Author

@gorhill gorhill commented Aug 24, 2016

I suggest introducing a backward-compatible syntax along with the modern pseudo-classes-based one.

I will support the backward-compatible syntax where possible, but personally, internally I prefer using the :() syntax. I see these new operators as nodes in a processing graph, and thus being able to easily and freely combine them I see this as a requirement for the future. Example[1]:

div.red:has(div.blue:matches-css(position: fixed;):contains(allo)):contains(publicité)

It does feel to me like a backward-compatible syntax would complicate writing such filters (especially the use of quotes):

div.red[-ext-has="div.blue[-ext-matches-css=\"position: fixed;\"][-ext-contains=\"allo\"]"][-ext-contains="publicité"]

Aren't you validating element hiding filters at load time (or else using invalid CSS selector would break element hiding) so isn't true that old versions will discard filters with this new syntax? (Element:matches('div:has(span)') would throw).

[1] Ok, the example is contrived, but it's just to illustrate easily combining such filters.

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 24, 2016

It does feel to me like a backward-compatible syntax would complicate writing such filters (especially the use of quotes):

Yeah, frankly, when I check something, I prefer to use the newer syntax as well.

However, it's not that bad, there's no need to support it inside of a composite filter.

Here, look at this example:

div.red[-ext-has="div.blue:matches-css(position: fixed):contains(allo):contains(publicité)"]
@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 24, 2016

Aren't you validating element hiding filters at load time (or else using invalid CSS selector would break element hiding) so isn't true that old versions will discard filters with this new syntax? (Element:matches('div:has(span)') would throw).

Nope, in fact it was all of a sudden for us:) Also there's no way we could do it in desktop and mobile versions.

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 24, 2016

@gorhill one more thing regarding the :matches-css(). I propose using a bit different syntax for it.

Could you please read this issue description and tell me what you think about it?
AdguardTeam/ExtendedCss#7

@gorhill

This comment has been minimized.

Copy link
Owner Author

@gorhill gorhill commented Aug 24, 2016

Q: Why additional pseudo-classes for matching before and after

I already support selector:after:style-properties(pattern), I just extract the :after before using the selector at setup time. But I would not mind selector:style-properties-before(pattern) -- it would just make the setup code a bit simpler.

Q: Why pattern-matching?

I agree with (optional) pattern matching. Pattern-matching is not something I implemented, but I don't see a problem supporting this. For the implementation side of such filter however, I would just want to be sure its semantic does not force a very specific implementation.[1]

I suppose that using this approach we could also cover existing abp-properties rules

Note that ABP's -abp-properties has been implemented with a very different semantic in mind than something like :matches-css: to reverse lookup CSS rules. Such filters shouldn't be used directly on a set of nodes for filtering purpose. The purpose of all the filters I have been adding lately are to reduce a set of nodes (starting with one as small as possible), so the suffix part is key, to start with the smallest set of nodes possible is key for performance.

For example, a filter such as wetter.com##[-abp-properties='margin-left: 24px'], given that it has no suffix selector, would have to be tested for all elements on a page, which would just kill performance.


[1] I see using cssText as a potentially high overhead approach, so I went with the dictionary approach, to test only for the enumerated properties. a) I suspect the cssText string is generated on the fly by the browser when "getted"; b) using cssText forces the use of a regex which will apply to a potentially large string.

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 24, 2016

I already support selector:after:style-properties(pattern)

It may look pretty good, but it bothers me that :after in fact can't be part of a valid selector as pseudo-element cannot be selected. I suppose it could mislead a filter author.

[1] I see using cssText as a potentially high overhead approach, so I went with the dictionary approach, to test only for the enumerated properties. a) I suspect the cssText string is generated on the fly by the browser when "getted"; b) using cssText forces the use of a regex which will apply to a potentially large string.

Yep, I've run into a number of issues while implementing it. For now I've used a cross-browser function for extracting the cssText string:
https://github.com/AdguardTeam/ExtendedCss/blob/feature/issues/7/lib/style-property-matcher.js#L96

Also I agree with you on the enumerated properties approach. There's no need in building the cssText field, I will change the current implementation.

For example, a filter such as wetter.com##[-abp-properties='margin-left: 24px'],

Yeah, you're right. Also now when I know how this type of rules work, I find it a bit misleading. At least I think Lain_13 does not understand how it works.

@kzar what do you think about implementing something more "straightforward"?

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 24, 2016

I guess if we use the properties approach and agree on *-before/after postfix, there is no need for me to use another name for that pseudo class. matches-css, matches-css-before and matches-css-after sounds good and describes the filter behaviour very well.

@gorhill

This comment has been minimized.

Copy link
Owner Author

@gorhill gorhill commented Aug 24, 2016

matches-css, matches-css-before and matches-css-after sounds good and describes the filter behaviour very well.

I agreed with this. This new selector, combinable with :has() is going make filter list maintainers' life easier.

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 24, 2016

I've updated the syntax description:
AdguardTeam/ExtendedCss#7

@gorhill gorhill changed the title Placeholder issue for discussion of issues in ABP's issue tracker -- and possible solutions Placeholder issue for discussion of issues in ABP/AdGuard issue tracker -- and possible solutions Aug 26, 2016
@gorhill

This comment has been minimized.

Copy link
Owner Author

@gorhill gorhill commented Aug 26, 2016

Looking into this specific case this morning: uBlockOrigin/uAssets#110.

This would be solvable without exception filters if it was possible to outright remove the targeted nodes from the DOM:

finanzen.net###bodyCenter > div[id]:has(:scope > #Ads_BA_Sky):remove()

The current implicit action to take on targeted nodes is to hide them. However, being to re-style has make the job of working against anti-blocker mechanisms much easier (AdGuard support this).

Additionally, being able to remove nodes from the DOM is something I have found would take care of many other cases as well (I do believe AdGuard support this in some ways, not sure). From my point of view, being forced to whitelist network requests from 3rd-party advertisers/trackers is always the worst option, and we should extend the capabilities of cosmetic filtering (element hiding) to avoid such whitelisting.

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Aug 26, 2016

Oh, you have finally faced these german wunderwaffe-anti-adblock-solutions:)
I was impressed when I saw this particular script for the first time.

Currently the easiest way to circumvent it is to inject a script like this:

Object.defineProperty(window, `UABPtracked`, { get: function() { return true; }, set: function() {} })
@gorhill

This comment has been minimized.

Copy link
Owner Author

@gorhill gorhill commented Jun 8, 2017

@mapx-, @kzar

About https://issues.adblockplus.org/ticket/5291, what I don't see being done in the comment is:

  • Open the dev tools
  • Go to Memory tab
  • Click the trash icon (I do 2-3 times, with a 2-second delay in between).

Yes, I have seen ABP grow to past 340 MB, and even leaving the browser on idle did not cause the memory to be garbage-collected. However, after accomplishing the above steps, there was memory being garbage-collected, and the memory snapped back to expected levels (keeping in mind fragmentation, js engine internals, etc.).

Any report of high memory usage should always be done after the steps above, including the equivalent ones on Firefox.

@kzar

This comment has been minimized.

Copy link

@kzar kzar commented Jun 8, 2017

Thanks that's a good tip, I will give it a try. It's just weird (to me at least) that garbage collection is not happening automatically. We can hardly expect users to trigger it manually :(

@hemantgoyal

This comment has been minimized.

Copy link

@hemantgoyal hemantgoyal commented Jun 21, 2017

Why does the chrome store say it is corrupted now ? :(

@bershan2

This comment has been minimized.

Copy link
Contributor

@bershan2 bershan2 commented Jun 21, 2017

@hemantgoyal That is a Chrome hash function bug. See https://github.com/gorhill/uBlock/issues/2720 (already fixed with a bit of padding).

@gorhill

This comment has been minimized.

@mapx-

This comment has been minimized.

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Nov 8, 2017

Hey guys, I've recently stumbled upon an interesting adblocking circumvention technique used by Yandex.

The thing is that they use shadow DOM to circumvent element hiding rules:
https://uploads.adguard.com/up04_5pkb0_Yandeks.png

The open root is used in the screenshot so we can get inside with a /deep/, and when a closed root is used, /deep/ cannot help us. Anyway, all the shadow piercing selectors are deprecated and will be eventually removed so we have a problem here.

Possible solution (rather ugly though):

  1. Support ::shadow pseudo-selector to pierce inside open roots.
  2. Override attachShadow and force all shadow roots to be open.

Have you already faced anything like that? Thoughts?

@mapx-

This comment has been minimized.

Copy link

@mapx- mapx- commented Nov 8, 2017

@ameshkov
I added you here: https://issues.adblockplus.org/ticket/5318
and
https://issues.adblockplus.org/ticket/5302

perhaps it's about the same yandex stuff

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Nov 8, 2017

@mapx- thank you! Yeah, it's been a while since they began their crusade and both issues are relevant.

What bothers me is that the "closed shadow root" approach seems to be a universal way to avoid elements hiding and even user stylesheets won't help us defeat it once Chrome stops supporting /deep/ and ::shadow.

@gorhill

This comment has been minimized.

Copy link
Owner Author

@gorhill gorhill commented Nov 8, 2017

The /deep/ issue has been raised before. It's being deprecated as a valid CSS selector component in a CSS rule, but will still be valid as a CSS selector in querySelector[All] call. My understanding.

So currently, not an issue with Firefox I presume (does not support shadow stuff yet). An issue with Chrome, but can be worked around by manually hiding through querySelectorAll.

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Nov 8, 2017

@gorhill

So currently, not an issue with Firefox I presume (does not support shadow stuff yet). An issue with Chrome, but can be worked around by manually hiding through querySelectorAll.

That's basically what I meant -- support either shadow or /deep/ "polyfills" just like we do with :has.

Good news is that /deep/ is able to pierce inside closed shadow roots so my the second point in my comment is redundant.

@gorhill

This comment has been minimized.

Copy link
Owner Author

@gorhill gorhill commented Nov 9, 2017

just like we do with :has.

I just realized we can probably already just use :has for filter with /deep/?

example.com##div.container:has(/deep/ .aq)

Would this work now?

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Nov 9, 2017

I just realized we can probably already just use :has for filter with /deep/?

We don't yet support it (but we definitely will), but this is a partial solution anyway.

For instance, in Yandex case they shadow contains legit elements as well so we need something like example.org##div /deep/ span:has(.banner)

@gorhill

This comment has been minimized.

Copy link
Owner Author

@gorhill gorhill commented Nov 13, 2017

@ameshkov

The "fastest" way to perform a dynamic script injection is to use the onCommitted listener

I'm experimenting with this and this works fine so far on both Chromium and Firefox. I see 10-20ms gain in how earlier the scriptlets are injected (using tabs.executeScript), though when I measure with the number of scripts already handled (document.scripts.length), I can't see gain so far for the little I have tested.

Anyway, I want to ask why did you chose to go through messaging to inject the scriptlets rather than injecting directly using tabs.executeScript?

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Nov 15, 2017

@gorhill

Anyway, I want to ask why did you chose to go through messaging to inject the scriptlets rather than injecting directly using tabs.executeScript?

As I recall, we compared both and didn't see any serious difference. Actually, in future updates, we'll migrate to tabs.insertCSS in light of the coming user-agent styles priority improvement.

okiehsch referenced this issue in uBlockOrigin/uAssets Dec 6, 2017
@mjethani

This comment has been minimized.

Copy link

@mjethani mjethani commented Feb 3, 2018

@ameshkov regarding Shadow DOM, we have user style sheets in Chromium now (works on Canary) along with the cssOrigin option to tabs.insertCSS. I'm also working on allowing extensions to access closed shadow roots. We should be in a good place soon.

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Feb 4, 2018

Hi @mjethani! I'm actually keeping an eye on all the pull requests you're pushing to Chromium, and you're doing a great job, thank you!

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Mar 4, 2018

Hey guys, coming at you with a new modifier idea:
AdguardTeam/AdguardBrowserExtension#961

I suppose it can benefit all the privacy-oriented subscriptions so we're planning to implement it in one of the future updates.

@gorhill

This comment has been minimized.

Copy link
Owner Author

@gorhill gorhill commented Mar 5, 2018

@ameshkov I don't see that much privacy value in dealing with cookies alone given that data can be stored in other local storage such as localStorage, indexedDB, etc.Blocking 3rd-party cookies in browser settings should be the first step for any privacy conscious person -- this also takes care of all local storages.

@ameshkov

This comment has been minimized.

Copy link

@ameshkov ameshkov commented Mar 5, 2018

Extending this modifier to handle localStorage sounds useful indeed, and it can be done without changing the modifier syntax.

However, I've never seen indexedDB used for tracking purpose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.