This repository has been archived by the owner on Nov 15, 2017. It is now read-only.

Ad blocking differs from Adblock with the same rules #260

Closed
ghost opened this issue May 10, 2014 · 45 comments

Comments

@ghost

ghost commented May 10, 2014

Today I stumbled upon this site:

http://www.heise.de/newsticker/meldung/Firefox-beerdigt-Plaene-fuer-Werbung-auf-Tab-Seite-2186990.html

In HTTPSB, the Adblock complex rules are enabled. If I load the above site with the Adblock extension disabled, I see several ads ("Anzeige") in the right column.

Some of the links of those ads are:

http://pubads.g.doubleclick.net/gampad/clk?id=28695830&iu=/6514/www.heise.de/clicktracking/usAd
http://pubads.g.doubleclick.net/gampad/clk?id=28685510&iu=/6514/www.heise.de/clicktracking/textlink
http://pubads.g.doubleclick.net/gampad/clk?id=28706750&iu=/6514/www.heise.de/clicktracking/textlink

I don't see those ads if Adblock is enabled. The funny thing is that both HTTPSB and Adblock seem to apply the same blocking rules.

Those are the ones used by Adblock:

Type            Filter                      URL
script          /ad-emea.                   http://ad-emea.doubleclick.net/N6514/adj/ix/ix-inhalt;sz=120x600,120x800,160x600,160x8[...]
script          /ad-emea.                   http://ad-emea.doubleclick.net/N6514/adj/ix/ix-inhalt;sz=300x200;kw=Browser,Firefox,Mo[...]
script          /ad-emea.                   http://ad-emea.doubleclick.net/N6514/adj/ix/ix-inhalt;sz=300x201;kw=Browser,Firefox,Mo[...]
script          /ad-emea.                   http://ad-emea.doubleclick.net/N6514/adj/ix/ix-inhalt;sz=300x250,336x280;kw=Browser,Fi[...]
script          /ad-emea.                   http://ad-emea.doubleclick.net/N6514/adj/ix/ix-inhalt;sz=336x200;kw=Browser,Firefox,Mo[...]
script          /ad-emea.                   http://ad-emea.doubleclick.net/N6514/adj/ix/ix-inhalt;sz=500x500;kw=Browser,Firefox,Mo[...]
script          /ad-emea.                   http://ad-emea.doubleclick.net/N6514/adj/ix/ix-inhalt;sz=728x90,468x60;kw=Browser,Fire[...]
image           /cgi-bin/ivw/*              http://heise.ivwbox.de/cgi-bin/ivw/CP/ix_news;%2Fix%2Fnews%2F2186990?[...]
xmlhttprequest  /socialshareprivacy/*       http://www.heise.de/js/plugins/socialshareprivacy/lang/de.lang
css             /socialshareprivacy/*       http://www.heise.de/js/plugins/socialshareprivacy/socialshareprivacy.css
script          ||ivwbox.de^$third-party    http://heise.ivwbox.de/2004/01/survey.js
image           ||met.vgwort.de^            http://heise.met.vgwort.de/na/4dedaabe653c4fbdbdca1dbd6fb004ec

And those are the requests blocked by HTTPSB (according to its statistics tab):

script  <a>     http://ad-emea.doubleclick.net/N6514/adj/ix/ix-inhalt;sz=728x90,468x60;kw=Browser,Firefox,Mozilla,Werbung;tile=6;ord=1790385129?
script  <a>     http://ad-emea.doubleclick.net/N6514/adj/ix/ix-inhalt;sz=120x600,120x800,160x600,160x800;kw=Browser,Firefox,Mozilla,Werbung;tile=7;ord=1790385129?
script  <a>     http://ad-emea.doubleclick.net/N6514/adj/ix/ix-inhalt;sz=300x200;kw=Browser,Firefox,Mozilla,Werbung;tile=4;ord=1790385129?
script  <a>     http://heise.ivwbox.de/2004/01/survey.js
script  <a>  http://www.heise.de/js/ho/webtrekk-v3-bundle-heise-2013-01-21.js
script  <a>     http://ad-emea.doubleclick.net/N6514/adj/ix/ix-inhalt;sz=300x201;kw=Browser,Firefox,Mozilla,Werbung;tile=5;ord=1790385129?
script  <a>     http://ad-emea.doubleclick.net/N6514/adj/ix/ix-inhalt;sz=300x250,336x280;kw=Browser,Firefox,Mozilla,Werbung;tile=2;ord=1790385129?
script  <a>     http://ad-emea.doubleclick.net/N6514/adj/ix/ix-inhalt;sz=336x200;kw=Browser,Firefox,Mozilla,Werbung;tile=3;ord=1790385129?
script  <a>     http://ad-emea.doubleclick.net/N6514/adj/ix/ix-inhalt;sz=500x500;kw=Browser,Firefox,Mozilla,Werbung;tile=1;ord=1790385129?
xhr <a>  http://www.heise.de/js/plugins/socialshareprivacy/lang/de.lang
css <a>  http://www.heise.de/js/plugins/socialshareprivacy/socialshareprivacy.css
image   <a>     http://heise.met.vgwort.de/na/4dedaabe653c4fbdbdca1dbd6fb004ec
image   <a>     http://heise.ivwbox.de/cgi-bin/ivw/CP/ix_news;%2Fix%2Fnews%2F2186990?r=http%3A%2F%2Fwww.heise.de%2Fnewsticker%2F&d=54397.86098431796

Shouldn't the site look identical if both extensions block the same requests? I don't see Element Hiding Rules applied by Adblock.

@gorhill
Owner

gorhill commented May 10, 2014

Never mind, I see that it did block some ads, just not all. Investigating. Which version of HTTPSB do you have? I fixed a bug last night about ABP filters not being enforced. The fix is in 0.9.1.2.

@gorhill
Owner

gorhill commented May 10, 2014

Never mind, see findings below. Ok, to reproduce I need the rules in your matrix for this site. Can you paste the recipe here (using the new export recipe feature in the popup)? Also, I am guessing you have the Germany ABP filters enabled?

@gorhill
Owner

gorhill commented May 10, 2014

Ok, regarding http://pubads.g.doubleclick.net/gampad/clk?id=28695830&iu=/6514/www.heise.de/clicktracking/usAd, this is not a downloaded resource, this is a link (the href attribute of an <a> tag).

The resource of the ad itself is http://2.f.ix.de/imgs/02/1/2/1/6/1/1/6/usAd_heise-8cba3919baa5fe77.jpg. Now I have to investigate which ABP filter is used to block this one. I suspect this must be a filter which has unsupported options.

@gorhill
Owner

gorhill commented May 10, 2014

These are hidden elements. Using the real ABP, I see the image being downloaded, but not being shown. So since HTTPSB doesn't support element hiding, this is the expected behavior.

@gorhill
Owner

gorhill commented May 10, 2014

This is what my tool tells me for ABP vs HTTPSB.

ABP:

host           scr img
www.heise.de    12  13  176,093
script.ioam.de       1    5,494
1.f.ix.de            1    1,455
2.f.ix.de            5   68,133
3.f.ix.de            5   49,508

HTTPSB:

host           scr img
www.heise.de    11  18  172,314
1.f.ix.de            1    1,337
2.f.ix.de            5   67,543
3.f.ix.de            5   48,918

So the images you see on the right come from *.f.ix.de; ABP hides these.

@gorhill
Owner

gorhill commented May 10, 2014

I believe that would be the ABP filter that hides the images you see in HTTPSB:

##a[href^="http://pubads.g.doubleclick.net/"]

In EasyList.

@gorhill
Owner

gorhill commented May 10, 2014

Try this list in ABP, https://easylist-downloads.adblockplus.org/easylist_noelemhide.txt, which is the one really used by HTTPSB, and the ad will not be hidden by ABP.

@ghost
Author

ghost commented May 11, 2014

You are right. There were Element Hiding Rules in Easylist-Germany. So it's rather a bug in Adblock that it didn't show that such rules were triggered.

@gorhill gorhill reopened this May 18, 2014
@gorhill
Owner

gorhill commented May 18, 2014

Given that I changed my mind yet again, this one needs reopening to track the progress of the currently unsupported link-based cosmetic filters. (I went with the most common filters first; the link-based ones are next.)

@gorhill gorhill removed the not a bug label May 19, 2014
@gorhill
Owner

gorhill commented May 19, 2014

Related: #276

@gorhill
Owner

gorhill commented May 19, 2014

Actually for this one, a better solution is to use a custom filter like:

heise.de##.us_ad

Maybe it could be proposed for inclusion on the ABP forum?

@gorhill
Owner

gorhill commented May 21, 2014

As far as I can tell, 93725cb fixes the issue.

@gorhill gorhill closed this as completed May 21, 2014
@ghost
Author

ghost commented May 23, 2014

Raymond, I've found one example where adblocking still differs between Adblock and HTTPSB. If I do a search in

https://startpage.com

I can see ads at the top of the result page with HTTPSB but none if Adblock is enabled. In both extensions the only filter reported is

?adtype=

So I guess that an Element Hiding Filter in Adblock is triggered which is not reported (as in the example above). If so - shouldn't it also work in HTTPSB?

@gorhill
Owner

gorhill commented May 23, 2014

I don't see the ad. I disabled cosmetic filtering, and it appeared. Enabled cosmetic filtering, and it disappeared. Looking up the ad, I see #sponsored. Is it the same for you?

To help you find it, the ad comes not long after this HTML comment: <!-- Bookmark - Div : End -->

@gorhill
Owner

gorhill commented May 23, 2014

Ok found the filter in EasyList, it is:

7search.com,espn.go.com,filenewz.com,general-files.com,independent.ie,internetretailer.com,ixquick.co.uk,ixquick.com,nickjr.com,rewind949.com,slickdeals.net,startpage.com,webhostingtalk.com,yahoo.com###sponsored
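
A filter of this shape puts a comma-separated list of hostnames in front of the `##` separator and a CSS selector after it, so the line above reads "hide `#sponsored` on these sites". The following is a minimal sketch of that split, assuming only the plain `domains##selector` form; the helper name `parseCosmeticFilter` is hypothetical and is not taken from HTTPSB's code.

```javascript
// Hypothetical helper (not HTTPSB's parser): split an ABP-style element
// hiding filter into its hostname list and its CSS selector.
// Assumption: only the plain "##" separator is handled.
function parseCosmeticFilter(line) {
    const pos = line.indexOf('##');
    if (pos === -1) {
        return null; // no "##": not an element hiding filter
    }
    const prefix = line.slice(0, pos);
    const selector = line.slice(pos + 2);
    if (selector === '') {
        return null;
    }
    return {
        // An empty prefix means the filter is generic (applies on any site).
        domains: prefix === '' ? [] : prefix.split(','),
        selector: selector
    };
}

// Shortened version of the EasyList line quoted above.
const specific = parseCosmeticFilter('ixquick.com,startpage.com###sponsored');
console.log(specific.domains.length, specific.selector);
```

A generic filter such as `##a[href^="http://pubads.g.doubleclick.net/"]` would come back with an empty `domains` list under the same assumptions.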

@gorhill
Owner

gorhill commented May 23, 2014

I need to reproduce this, but I don't know where to start. This is what I get on my side; notice the display: none applied to the #sponsored element by HTTPSB:

[Screenshot: to-forum]

@ghost
Author

ghost commented May 23, 2014

> I don't see the ad. I disabled cosmetic filtering, and it appeared. Enabled cosmetic filtering, and it disappeared. Looking up the ad, I see #sponsored. Is it the same for you?

I also searched for "tree" and here's how it looks for me ("Anzeige" means ad):

[Screenshot: startpage]

Cosmetic filters are enabled.

BTW, I've noticed that whether ads are displayed seems to depend on your search terms. For example, I searched for

arch reflector options

and ads were shown. I changed the search terms to

arch linux reflector options

and no ads were shown. So you might need to try different search terms in order to reproduce.

@gorhill
Owner

gorhill commented May 23, 2014

I see the <div id="sponsored"> element in your screenshot. Can you select it and check which rule is in effect for its CSS display property?

@ghost
Author

ghost commented May 23, 2014

I'm not sure what exactly you need. Here are 2 screenshots:

[Screenshot: startpage1]
[Screenshot: startpage2]

@gorhill
Owner

gorhill commented May 23, 2014

First screenshot has the information I need. So apparently the rules injected by HTTPSB (if they were) aren't seen by the element. Now the key for me is to reproduce and then from this point I can investigate why the injected CSS rules are not seen.

@gorhill gorhill reopened this May 23, 2014
@my-password-is-password
Contributor

@tlu1024 Do you only have easylistgermany.txt checked? The German easylist has nothing for startpage.com. It's in the regular easylist.txt. Try either checking easylist.txt or block

startpage.com###inlinetable
startpage.com###sponsored

in "Your block rules". Those are 2 filters in easylist.txt for startpage.com.

@ghost
Author

ghost commented May 23, 2014

> Do you only have easylistgermany.txt checked?

No, I have both lists checked. Nevertheless I added both rules to the block rules - but no improvement ...

@my-password-is-password
Contributor

Weird, I see the ad for half a second then it disappears every time.

@gorhill
Owner

gorhill commented May 23, 2014

There is a mistake in my injected rules:

var hideStyleText = '{{hideSelectors}} {display:none; !important}'
    .replace('{{hideSelectors}}', selectors.hide.join(','));

The !important keyword needs to appear before the ;. Still, it does appear to work on my side. But this means the !important keyword is not taken into account, which may be a problem if a rule more specific than the ones injected by HTTPSB exists.
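
For illustration, here is a minimal sketch of the same template with `!important` moved inside the declaration, before the `;`. The wrapper function `buildHideStyleText` is a hypothetical name introduced for this example, and the actual fix in the repository may look different.

```javascript
// Sketch only: build the text of a hiding rule for a list of selectors.
// The "!important" flag belongs inside the declaration, before the ";".
// buildHideStyleText is a hypothetical name, not HTTPSB's API.
function buildHideStyleText(selectors) {
    return '{{hideSelectors}} {display:none !important;}'
        // A function replacement keeps "$" sequences that may occur in
        // selectors (e.g. [href$='x']) from being interpreted by replace().
        .replace('{{hideSelectors}}', () => selectors.join(','));
}

console.log(buildHideStyleText(['#sponsored', '#inlinetable']));
// prints: #sponsored,#inlinetable {display:none !important;}
```

With the flag in place, the injected declaration takes precedence over normal (non-important) page declarations for `display`, regardless of their selector specificity.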

There are various ways I can apply the rules. Actually, I want to try not injecting a style at all, but rather changing the style of each element directly through the DOM, so there is no need to create an expensive (I believe) style element. This approach is possible given that only the CSS selectors which matter to a page are used.

@my-password-is-password
Contributor

Now I get ads when I check assets/thirdparties/hosts-file.net/hosts.txt with easylist.

@gorhill
Owner

gorhill commented May 23, 2014

I see, the cosmetic filter parser chokes on that file: there is a line ######### in there which the parser doesn't seem to handle well.

@tlu1024 you have this list enabled?

@ghost
Author

ghost commented May 23, 2014

Yes, I have! I just disabled it and searched again for "tree" - and got the same result as my-password-is-password: there's an ad that disappears after a second or so.

@gorhill
Owner

gorhill commented May 23, 2014

Ok, so it's the parser. Still, I will see what can be done at the same time about the second-long delay. It can be annoying, but ultimately everything is asynchronous, so it's impossible to guarantee 100% that the ads will never appear, even for a fraction of a second. I can experiment to try to minimize it though.

@gorhill
Owner

gorhill commented May 23, 2014

By the way, thanks to @my-password-is-password; otherwise I would still be spending time trying to figure out what is wrong, for who knows how long.

@my-password-is-password
Contributor

No problem.

About the second-long delay: if you observe the DOM at document_start, can you remove the element right away as it's being inserted, so that it doesn't show at all? Or is observing at document_start slow?

@ghost
Author

ghost commented May 23, 2014

> Ok, so it's the parser. Still, I will see what can be done at the same time about the second-long delay. It can be annoying, but ultimately everything is asynchronous, so it's impossible to guarantee 100% that the ads will never appear, even for a fraction of a second. I can experiment to try to minimize it though.

I can live with that - don't devote too much time to it ;-)

Thanks to @my-password-is-password also from my side!

@gorhill
Owner

gorhill commented May 23, 2014

@my-password-is-password

In the case of "document_start", the files are injected after any files from css, but before any other DOM is constructed or any other script is run.

So at document_start, there is nothing in the DOM, so I would have to wait (because HTTPSB's implementation relies on querying the content of the page to minimize CSS rules). Hence I start looking at document_end; this way I don't have to test whether the DOM is there or not.

@ghost
Author

ghost commented May 23, 2014

> // Any sequence of # longer than two means the line is not a valid
> // cosmetic filter.

Raymond, are you sure that's correct? EasyList has many rules with three # characters.

@my-password-is-password
Contributor

@gorhill

I uploaded an old version of HTTPSB in which I messed around with a mutation observer a long time ago. I removed everything from contentscript.js, leaving just the observer. To test it, go to startpage.com and search for anything. It hides the div with id 'sponsored' at document_start. But the page seems to load a little slowly this way.

https://github.com/my-password-is-password/stuff/blob/master/httpswitchboard_0.7.9.2_MutationObserver_document_start.zip

@gorhill
Owner

gorhill commented May 23, 2014

In your version there is no message detailing the content of the page sent to the background page, and no waiting for the answer from the background page. This is the way cosmetic filters are implemented in HTTPSB; the alternative without messaging is to inject everything indiscriminately (the ABP way). I can't do it this way, it's abusive to the browser.

Now, I instrumented the number of mutations, and it's just not an option to send a message for each mutation event. It was like a firehose of events; now imagine that each of these results in a message being sent and the corresponding handling of the received answer. The overhead is going to be over the top. And that is for a simple page.

A real proof of concept needs to take reality into consideration, which is to support over 20K filters in EasyList. It gets worse quickly with Fanboy annoyance, etc. Your version is hardcoded to find a div whose id is sponsored. It just is not a solution to the real problem.

I have something else in mind that I want to experiment with, which I will detail in a TODO issue.

@gorhill
Owner

gorhill commented May 24, 2014

@my-password-is-password

Thinking more about this, there could be a mixed approach: I could query the selectors matching a specific domain name at document_start, as the domain of the page is available before the DOM loads, and the generic selectors would be taken care of at document_end. In the case of startpage.com that might help, given that the ads are hidden by selectors specific to the startpage.com domain. There is still the roundtrip between the content script and the extension, but I expect there would be an improvement.

@gorhill
Owner

gorhill commented May 24, 2014

Fetching and applying domain-based selectors after document_start works way better than I thought -- I thought the roundtrip would take longer. I don't see the ad at all on startpage. The issue will still be with generic selectors, which still apply after document_end, but that is a definite improvement. As a bonus, this makes the overall code more efficient: before, the domain-based selectors were always returned with each query for generic selectors; now they are returned only once.
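
The domain-based half of this mixed approach only needs the page hostname, which is known before any DOM exists. A minimal sketch of such a lookup follows. The `Map` layout and the name `selectorsForHostname` are assumptions made for illustration (not HTTPSB's data structures), and the sample entries are the filters quoted elsewhere in this thread.

```javascript
// Hypothetical store of domain-specific selectors, keyed by hostname.
// The entries are the filters mentioned in this thread.
const domainSelectors = new Map([
    ['startpage.com', ['#sponsored', '#inlinetable']],
    ['heise.de', ['.us_ad']]
]);

// Collect the selectors declared for a hostname or any of its parent
// domains, so "www.heise.de" also picks up filters for "heise.de".
function selectorsForHostname(hostname) {
    const found = [];
    let current = hostname;
    for (;;) {
        const entry = domainSelectors.get(current);
        if (entry !== undefined) {
            found.push(...entry);
        }
        const pos = current.indexOf('.');
        if (pos === -1) {
            break;
        }
        current = current.slice(pos + 1); // drop the leftmost label
    }
    return found;
}

console.log(selectorsForHostname('www.heise.de').join(', '));
```

Because the lookup needs nothing from the page itself, it can run as soon as the hostname is known, while the generic selectors still have to wait until the page content can be inspected.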

@my-password-is-password
Contributor

Startpage has been the only site where I can tell that a cosmetic filter is being used, so your improvement is probably all you need.

@noblehng
Contributor

Would you make an option to only use domain-based selectors? From a security and privacy point of view, generic element hiding rules are not as useful as generic filter rules, so someone like me might only want element hiding rules on some regularly visited sites.

Or even better, make the parsed domain-based selector list scoped when there is a corresponding site in the scoped rules of HTTPSB, so that it can be encoded in a recipe alongside the scoped rules. This way we can have a small "static" element hiding stylesheet for those sites (it only needs a domain/site name lookup, which must be done for scoped rules anyway). For generic sites, one could still choose to use generic selectors after the scope lookup.

For those generic selectors that might be useful for sites with scoped filters, one could let the user choose whether to add those selectors to the scoped rules when loading a page from those sites. Maybe add a selector list in the "Scoped rules" tag for easy management.

@ghost
Author

ghost commented May 24, 2014

Just another site where this happens. So startpage.com is not the only example.

It's http://www.cboe.com/data/mktstat.aspx . With assets/thirdparties/hosts-file.net/hosts.txt enabled you'll see that window in the picture at the bottom of the page. With assets/thirdparties/hosts-file.net/hosts.txt disabled it's not visible.

[Screenshot: cboe]

@gorhill
Owner

gorhill commented May 24, 2014

> Just another site where this happens

I can confirm the fix works well on my side (I see you are using Fanboy).

I hope to upload a new revision today (currently code reviewing). I am becoming more fearful of updating, the user base has kind of grown a lot last week (because of this), and my worst fear is to break important functionality in the extension because I overlooked a bug.

@ghost
Author

ghost commented May 24, 2014

> the user base has kind of grown a lot last week

Congratulations - great news!

Regarding bugs - may I repeat my question asked above? There are many filters in Easylist like

###AdAuth4
###AdBanner
###AdBannerSmallContainer
###AdBanner_F1
###AdBanner_S
###AdBar
###AdBar1
###AdBigBox

So, shouldn't the patch rather say:
// Any sequence of # longer than three means the line is not a valid
// cosmetic filter.

Perhaps I'm misunderstanding something ..

@gorhill
Owner

gorhill commented May 24, 2014

> // Any sequence of # longer than three means the line is not a valid

Sorry I hadn't noticed your comment. Yes my comment in the code is wrong, it should be "any sequence longer than one", since at this point I am testing the selector only.
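
To make the "longer than one" rule concrete, here is a minimal sketch of a check applied after the `##` separator has been split off. It is an illustration under that reading, not the code from the actual patch, and `selectorFromLine` is a hypothetical name.

```javascript
// Sketch only (not the actual patch): extract the selector from a
// candidate cosmetic filter line, rejecting lines such as "#########".
// After the "##" separator, one leading "#" is a valid id selector;
// a run of two or more "#" means the line is not a cosmetic filter.
function selectorFromLine(line) {
    const pos = line.indexOf('##');
    if (pos === -1) {
        return null; // no separator at all
    }
    const selector = line.slice(pos + 2);
    if (selector === '' || selector.startsWith('##')) {
        return null;
    }
    return selector;
}

console.log(selectorFromLine('###AdBanner'));  // prints: #AdBanner
console.log(selectorFromLine('#########'));    // prints: null
```

Under this reading, EasyList entries like `###AdBanner` stay valid, since the third `#` is part of the id selector, while a divider line made only of `#` characters is dropped.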

@gorhill
Owner

gorhill commented May 24, 2014

Any new issue similar to this one will have to go into a new bug report since, just like the last one, it will likely be a different code bug.

@gorhill
Owner

gorhill commented May 29, 2014

@noblehng Sorry, I saw your comment but forgot to come back to it.

> Would you make an option to only use domain-based selectors

I think this is something which ABP should do, so all users of these files would benefit from the finer granularity. I am not too fond of adding yet another checkbox, which I would then need to explain, etc.

However I could definitely add EasyList without element hiding filters as a choice, so that if you check "Parse and enforce element hiding filters" you would not suffer the memory footprint of the thousands of cosmetic filters in the main EasyList file, just those in whatever regional list you are using.

> Or even better, make the parsed domain-based selectors list be scoped when there is a corresponding site in scoped rules of HTTPSB

Not sure I understand this part. When visiting a site, only the cosmetic filters which are meaningful to the site are used anyway (this is true for generic filters too, as opposed to ABP).

So on www.example.com, only the domain-specific cosmetic filters which apply to example.com are going to be used on the page. For generic filters, HTTPSB analyzes the page ("analyzes" as in "querySelectorAll"...) so as to inject only the filters which will have an effect on the page. That's the significant improvement over ABP which they refuse to acknowledge.
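
As a rough illustration of keeping only the generic selectors that have an effect, the sketch below filters a candidate list by asking the page whether each selector matches anything. The function name `selectorsInUse` and the direct per-selector query are assumptions for this example. HTTPSB's real implementation exchanges messages with the background page, as described earlier in the thread.

```javascript
// Sketch only: keep the generic selectors that match at least one element.
// `root` is anything exposing querySelectorAll(), e.g. `document` in a
// content script. selectorsInUse is a hypothetical name, not HTTPSB's API.
function selectorsInUse(root, candidates) {
    return candidates.filter(function(selector) {
        try {
            return root.querySelectorAll(selector).length > 0;
        } catch (e) {
            // An invalid selector throws; skip it instead of failing.
            return false;
        }
    });
}

// Possible use in a content script (requires a DOM, so left commented out):
// const toHide = selectorsInUse(document, ['#AdBanner', '#AdBar', '#AdBigBox']);
```

Only the selectors returned here would need to go into the injected style, which keeps that style small compared with injecting every generic filter on every page.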
