adblock2privoxy erros out on several recods #6

Closed
wmyrda opened this issue May 15, 2018 · 10 comments

Comments

@wmyrda

wmyrda commented May 15, 2018

Using https://easylist-downloads.adblockplus.org/easylistpolish.txt, among other lists, there are quite a few records on which adblock2privoxy errors out. Would it be possible to fix them?

ERROR: gadzetomania.pl#?#DIV:-abp-contains(REKLAMA) + DIV > A > DIV > IMG (easylistpolish.txt: 2774) - Record type detection failed
ERROR: gadzetomania.pl#?#DIV:-abp-contains(REKLAMA) + DIV > DIV > IFRAME (easylistpolish.txt: 2775) - Record type detection failed
ERROR: wp.pl#?#DIV + DIV + div:-abp-contains(REKLAMA) + DIV (easylistpolish.txt: 2877) - Record type detection failed
ERROR: wp.pl#?#aside:-abp-contains(Reklama) + DIV (easylistpolish.txt: 2878) - Record type detection failed
ERROR: wp.pl#?#aside:-abp-contains(Reklama) + IFRAME (easylistpolish.txt: 2879) - Record type detection failed
ERROR: wp.pl#?#div:-abp-contains(REKLAMA) + DIV > A (easylistpolish.txt: 2880) - Record type detection failed
ERROR: wp.pl#?#div:-abp-contains(REKLAMA) + DIV > DIV > A (easylistpolish.txt: 2881) - Record type detection failed
@essandess
Owner

It's possible, but requires digging into the specific errors and isolating whether the issue is with these specific rules, or the adblock2privoxy rule parser.

@wmyrda
Author

wmyrda commented May 15, 2018

Great. I would very much appreciate it if you looked into it, as it would bring privoxy closer to parity with adblock.

@wmyrda
Author

wmyrda commented May 16, 2018

I am not a programmer, but I took a quick look at the issue, and the patterns seem fine when one uses adblock alone, without privoxy. Looking at the code a bit, I also found that the word "div" does not exist anywhere in it, so it is logical that adblock2privoxy errors out on them.

Looking at the basic pattern, I found a number of cases where that HTML element is used:

# cat easylist.txt |grep -i "##div"|wc
   1062    2899   72306
# cat easylist.txt |grep -i " div "|wc
     38     260    2106

Among them is, for example, fancystreems.com##body > div > a. The domain does show up in other syntaxes, but I have failed to find this rule in the resulting files:

cat /etc/privoxy/ab2p.* |grep fancystreems.com
# ||fancystreems.com/300x2503.php (easylist.txt: 47100)
.fancystreems.com/300x2503\.php
# ||fancystreems.com/300x2503.php (easylist.txt: 47100)
.fancystreems.com/300x2503\.php
# ||fancystreems.com/300x2503.php (easylist.txt: 47100)
.fancystreems.com/300x2503\.php

but these seem to have nothing to do with the rule that uses div. It therefore looks like new features would have to be implemented in PatternConverter.hs. I am hoping it would not be too much work...

@essandess
Owner

essandess commented May 16, 2018

  • Please confirm: adblock2privoxy reports errors when processing these specific rules, but successfully completes on all other rules. I.e. good functionality except for some of the rules. Is this what you see?

  • I don’t have the cycles now to dig into refactoring a Haskell parser, but I can point you to some clues that might help isolate the issue.

  • The rules all involve CSS element hiding. Here are links to the Adblock and basic selector syntax:

I don’t see anything wrong with the basic syntax of those selectors. Also I didn’t write the parser, and don’t know its limitations, e.g. tree depth and the like.

The first clue might be that the rules use constructs like DIV:-abp-contains(REKLAMA). Where is this defined? Should adblock2privoxy be able to parse this?

@wmyrda
Author

wmyrda commented May 16, 2018

I did not know a page like fancystreems.com existed, but to use it as an example from https://easylist-downloads.adblockplus.org/easylist.txt: it turns out that adblock2privoxy uses only the first record and stays quiet about the rest, with no errors in the ab2p.task log, likely because it knew it would not be able to parse them.

||fancystreems.com/300x2503.php
fancystreems.com###bannerfloat2
fancystreems.com###floatLayer1
fancystreems.com###y
fancystreems.com###yst1
fancystreems.com##img[width="300"][height="150"]
fancystreems.com##body > div > a
fancystreems.com##img[width="300"][height="250"]

For the sake of easier reading, I removed all the other comma-separated websites referenced in these examples.

So there are no errors for these, but records such as those for wp.pl and gadzetomania.pl, from the list linked in the first post, do give errors. That means adblock2privoxy actually tried to parse them with one of the syntaxes it knows and simply failed. I'll try to see which one, but I have a feeling that finding out would not change anything: that syntax was likely meant for something else, and a new parser would be required to import the rules that errored out.

Going back to fancystreems, at least the last 2 rules are legitimate block records that the web page's code still matches. Adding a parser for such cases would be a welcome addition:

<a href="http://www.fancystreems.com/tvcat/newstv.php"><img src="https://i.imgur.com/SnQS4Gt.jpg" width="300" height="250"></a>
<a href="http://www.fancystreems.com"><img src="http://www.fancystreems.com/images/dot.gif" alt="fancystreems logo" width="152" height="95" border="0" id="logo_icon" title="fancystreems logo" class="logo"></a>

About the cycles: please take your time. If it gets done by the end of summer I'll be happy :)

@wmyrda
Author

wmyrda commented May 17, 2018

Allow me to correct myself. All those fancystreems.com rules do get created!
Since those elements rely on CSS functionality, the appropriate file gets created in the CSS directory; in this case it is /com/fancystreems/ab2p.css
Like all the other CSS files, it inherits all the element hiding rules, which may not be all that optimized and in a few cases leads to hiding too much. Nevertheless it works, and the rules that error out are the only ones left to worry about.
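
For anyone else looking for the generated file of a given site, here is a minimal Python sketch of the lookup. It assumes the directory layout is simply the domain labels in reverse order, which is inferred only from the single /com/fancystreems/ab2p.css example above; the function name `css_path` is hypothetical and not part of adblock2privoxy.

```python
from pathlib import PurePosixPath


def css_path(domain, filename="ab2p.css"):
    """Guess the per-domain CSS path, relative to the CSS directory.

    Assumption: the layout is the domain labels reversed, as suggested by
    the one observed example (fancystreems.com -> com/fancystreems).
    """
    labels = [label for label in domain.lower().split(".") if label]
    return str(PurePosixPath(*reversed(labels), filename))


print(css_path("fancystreems.com"))  # com/fancystreems/ab2p.css
```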

@wmyrda
Author

wmyrda commented May 17, 2018

Reading https://adblockplus.org/filters#elemhide-emulation, it turns out that the -abp-* selectors are adblock-specific features and would therefore have to be translated into privoxy's language. I tried converting them manually into a number of different schemes to check whether that would work, but none of them did. I included a few more that I derived from the web page content, but with no luck either.
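
Until a translation exists, one possible workaround is to separate the emulation rules from the rest of a list before conversion, so the remaining rules convert without errors. The Python sketch below is not part of adblock2privoxy; the names `classify` and `split_rules` are hypothetical, and it assumes that the `#?#` separator seen in the failing records is enough to identify these rules.

```python
# Element hiding separators in Adblock Plus filter syntax.
_SEPARATORS = (
    ("#?#", "elemhide_emulation"),  # the form used by the failing records
    ("#@#", "elemhide_exception"),
    ("##", "elemhide"),
)


def classify(line):
    """Return a rough category for one filter list line (assumed heuristic)."""
    rule = line.strip()
    # "!" starts a comment; "[" starts the list header, e.g. [Adblock Plus 2.0]
    if not rule or rule.startswith("!") or rule.startswith("["):
        return "comment"
    for separator, kind in _SEPARATORS:
        if separator in rule:
            return kind
    return "blocking"


def split_rules(lines):
    """Split lines into (everything else, emulation rules)."""
    supported, emulation = [], []
    for line in lines:
        target = emulation if classify(line) == "elemhide_emulation" else supported
        target.append(line)
    return supported, emulation


rules = [
    "||fancystreems.com/300x2503.php",
    "fancystreems.com###bannerfloat2",
    "wp.pl#?#aside:-abp-contains(Reklama) + DIV",
]
supported, emulation = split_rules(rules)
print(len(supported), len(emulation))  # 2 1
```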

So what does -abp-contains translate into? One has to know that before trying to write a correct parser. After reading https://www.w3schools.com/cssref/css_selectors.asp I tried many different variants; for DIV + DIV + div:-abp-contains(REKLAMA) + DIV I used, among others:

div + div + div[title~="REKLAMA"] + div,
div + div + div[target="REKLAMA"] + div,
div + div + div[href*="REKLAMA"] + div,
div + div + div[id^="REKLAMA"] + div,
div + div + div[id$="REKLAMA"] + div,
div + div + div[id*="REKLAMA"] + div,

but none of them seem to work. Any ideas?
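
A likely reason is that every selector above tests an attribute (title, target, href, id), while -abp-contains looks at the text inside the element. The Python sketch below illustrates the difference with the standard library HTML parser. The markup fed to it is a made-up example, and the class name `DivInspector` is hypothetical.

```python
from html.parser import HTMLParser


class DivInspector(HTMLParser):
    """Collect the attributes and the text of every <div> in a document."""

    def __init__(self):
        super().__init__()
        self.divs = []   # one dict per <div>: {"attrs": {...}, "text": "..."}
        self._open = []  # stack of <div> records that are currently open

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            record = {"attrs": dict(attrs), "text": ""}
            self.divs.append(record)
            self._open.append(record)

    def handle_endtag(self, tag):
        if tag == "div" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text counts for every <div> that is open at this point.
        for record in self._open:
            record["text"] += data


inspector = DivInspector()
# Hypothetical markup: a label div followed by an ad container.
inspector.feed('<div>REKLAMA</div><div id="ad-slot"><a href="#">x</a></div>')
label = inspector.divs[0]
print(label["attrs"])  # {}
print(label["text"])   # REKLAMA
```

The label has no attributes at all, so selectors such as div[id*="REKLAMA"] have nothing to match against; the word exists only as text content.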

@wmyrda
Author

wmyrda commented May 18, 2018

Checking the website's code with the Inspector in Firefox, the markup in the web page is not all that complicated; it simply looks like <div>REKLAMA</div> https://i.imgur.com/hWXfqmt.png
However, searching the web for privoxy config examples that could take care of this does not yield any results. Suggestions for a translation into privoxy format would be very welcome.

@wmyrda
Author

wmyrda commented Jun 19, 2018

Bad news. According to filter writers, a CSS file is not enough for all those contains() rules, as contains() is no longer used/allowed by the CSS specification.

Good news. Some code may be borrowed from the Adblock Plus source to create a .js script to hide those elements.
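
To make clear what such a script would have to do, here is the matching logic sketched in Python rather than JavaScript. It is not taken from the Adblock Plus source. It only emulates the simple pattern tag:-abp-contains(text) + sibling on a well-formed document; the function name and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def siblings_after_text_match(root, tag, needle, sibling_tag):
    """Emulate `tag:-abp-contains(needle) + sibling_tag` on a parsed tree.

    Returns every `sibling_tag` element that directly follows a `tag`
    element whose text (including descendants) contains `needle`.
    """
    matches = []
    for parent in root.iter():
        children = list(parent)
        # Walk adjacent pairs of element siblings, like the CSS "+" combinator.
        for current, following in zip(children, children[1:]):
            text = "".join(current.itertext())
            if current.tag == tag and needle in text and following.tag == sibling_tag:
                matches.append(following)
    return matches


# Hypothetical page: a "REKLAMA" label followed by the banner it announces.
page = ET.fromstring(
    "<body>"
    "<div>REKLAMA</div><div id='banner'><a href='#'>ad</a></div>"
    "<div>Article text</div><div id='content'>story</div>"
    "</body>"
)
hidden = siblings_after_text_match(page, "div", "REKLAMA", "div")
print([element.get("id") for element in hidden])  # ['banner']
```

A real implementation would run in the browser against the live DOM and would also have to handle the other selector forms in the list, so this is only an outline of the text-matching step that plain CSS cannot express.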

@essandess
Owner

I do not see a path to incorporate these abp-specific element hiding rules.
