Add a DOM blockers entropy source
Squashed commit of the following:

commit b4b03d5
Author: Surgie Finesse <finesserus@gmail.com>
Date:   Thu Apr 1 14:03:08 2021 +1000

    Actualize the code

commit d07d07b
Author: Surgie Finesse <finesserus@gmail.com>
Date:   Tue Dec 29 09:47:48 2020 +1000

    Add a note about component instability

commit bdadff6
Author: Surgie Finesse <finesserus@gmail.com>
Date:   Sat Dec 26 13:04:08 2020 +1000

    Remove the code duplication

commit 4437bbc
Author: Surgie Finesse <finesserus@gmail.com>
Date:   Sat Dec 26 10:46:55 2020 +1000

    Add tests for the entropy source

commit fbf3f50
Author: Surgie Finesse <finesserus@gmail.com>
Date:   Fri Dec 25 21:57:10 2020 +1000

    Enable the entropy source on Android too

commit ce185b3
Author: Surgie Finesse <finesserus@gmail.com>
Date:   Fri Dec 25 15:01:41 2020 +1000

    Make a guide for maintaining the list of DOM blocking filters

commit 0f2a0aa
Author: Surgie Finesse <finesserus@gmail.com>
Date:   Thu Dec 24 20:32:23 2020 +1000

    Fix the scripts, make a production list of filters to detect

commit 0206871
Author: Surgie Finesse <finesserus@gmail.com>
Date:   Wed Dec 23 21:47:21 2020 +1000

    Make a script to get unique selectors from lists of really blocked selectors;

    Make a filters list for the entropy source basing on several filters (more will be added later);
    Rename the source to `domBlockers` because other blockers (e.g. HTTP requests blockers) will be separate sources;

commit c99dce6
Author: Surgie Finesse <finesserus@gmail.com>
Date:   Wed Dec 23 17:27:39 2020 +1000

    Make a script to get blocked selectors from browser

commit 729724c
Author: Surgie Finesse <finesserus@gmail.com>
Date:   Tue Dec 22 21:13:40 2020 +1000

    Modify the script to leave only suitable selectors

commit 8b70628
Author: Surgie Finesse <finesserus@gmail.com>
Date:   Tue Dec 22 14:56:57 2020 +1000

    Make helper scripts to find unique blocked selectors

commit bc87857
Author: Surgie Finesse <finesserus@gmail.com>
Date:   Mon Dec 21 21:41:12 2020 +1000

    Detect rule lists instead of individual rules

    Also add tests to check the blockers list correctness

commit 335f569
Author: Surgie Finesse <finesserus@gmail.com>
Date:   Mon Dec 21 19:04:18 2020 +1000

    Add a PoC of content blockers component
Finesse committed Apr 1, 2021
1 parent d7b018d commit 387eb83
Showing 20 changed files with 1,040 additions and 5 deletions.
8 changes: 8 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,7 +1,15 @@
# User input
/resources/content_blocking/blocked_selectors/
!/resources/content_blocking/blocked_selectors/.gitkeep
/resources/content_blocking/filters/
!/resources/content_blocking/filters/.gitkeep

# Project artifacts
/dist/
/node_modules/
/playground/dist/
/resources/content_blocking/selectors_tester.html
/resources/content_blocking/unique_filter_selectors.json

# Unwanted BrowserStack logs: https://github.com/karma-runner/karma-browserstack-launcher/issues/181
/browserstack.err
164 changes: 164 additions & 0 deletions docs/content_blockers.md
@@ -0,0 +1,164 @@
# Content blockers

This page explains how to maintain the content blocker entropy sources.

The entropy sources work only in Safari and in Android browsers,
because other browsers disable extensions in incognito mode, which would make the sources unstable there.

## List of filters

A filter is a list of rules that tell the browser what to block.
Filters are written using a common standard: [Adblock Plus syntax](https://help.eyeo.com/en/adblockplus/how-to-write-filters).
Most ad blockers use this syntax, so the filters are universal.
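
As a rough illustration of the syntax, the sketch below pulls generic element hiding selectors (rules that start with `##`) out of a few sample rules. It mirrors the `/^##(.+)$/` check used by `make_selectors_tester.ts`; the helper name and the sample rules are made up for this example.

```typescript
// Hypothetical helper: returns the CSS selector of a generic element hiding rule,
// or undefined for any other kind of rule (comments, request blocking rules,
// domain-specific hiding rules)
function extractGenericHidingSelector(rule: string): string | undefined {
  const match = /^##(.+)$/.exec(rule)
  return match ? match[1] : undefined
}

// Made-up sample rules, not taken from a real filter
const sampleRules = [
  '! A comment line',
  '||ads.example.com^',
  '##.ad-banner',
  'example.com##.sidebar-ad',
]

const genericSelectors = sampleRules
  .map(extractGenericHidingSelector)
  .filter((selector): selector is string => selector !== undefined)
```

Only the third rule yields a selector here, because the last rule is limited to a domain and doesn't start with `##`.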

We consider the following filters (the most popular ones):

- [AdGuard](https://kb.adguard.com/en/general/adguard-ad-filters#adguard-filters)
  - AdGuard Base filter: https://raw.githubusercontent.com/AdguardTeam/FiltersRegistry/master/filters/filter_2_English/filter.txt
  - AdGuard Mobile Ads filter: https://raw.githubusercontent.com/AdguardTeam/FiltersRegistry/master/filters/filter_11_Mobile/filter.txt
  - AdGuard Tracking Protection filter: https://raw.githubusercontent.com/AdguardTeam/FiltersRegistry/master/filters/filter_3_Spyware/filter.txt
  - AdGuard Social Media filter: https://raw.githubusercontent.com/AdguardTeam/FiltersRegistry/master/filters/filter_4_Social/filter.txt
  - AdGuard Annoyances filter: https://raw.githubusercontent.com/AdguardTeam/FiltersRegistry/master/filters/filter_14_Annoyances/filter.txt
  - AdGuard Russian filter: https://raw.githubusercontent.com/AdguardTeam/FiltersRegistry/master/filters/filter_1_Russian/filter.txt
  - AdGuard Chinese filter: https://raw.githubusercontent.com/AdguardTeam/FiltersRegistry/master/filters/filter_224_Chinese/filter.txt
  - AdGuard German filter (included in EasyList Germany): https://raw.githubusercontent.com/AdguardTeam/FiltersRegistry/master/filters/filter_6_German/filter.txt
  - AdGuard Dutch filter (same selectors as in EasyList Dutch): https://raw.githubusercontent.com/AdguardTeam/FiltersRegistry/master/filters/filter_8_Dutch/filter.txt
  - AdGuard French filter: https://raw.githubusercontent.com/AdguardTeam/FiltersRegistry/master/filters/filter_16_French/filter.txt
  - AdGuard Japanese filter: https://raw.githubusercontent.com/AdguardTeam/FiltersRegistry/master/filters/filter_7_Japanese/filter.txt
  - AdGuard Spanish/Portuguese filter: https://raw.githubusercontent.com/AdguardTeam/FiltersRegistry/master/filters/filter_9_Spanish/filter.txt
  - AdGuard Turkish filter: https://raw.githubusercontent.com/AdguardTeam/FiltersRegistry/master/filters/filter_13_Turkish/filter.txt
- [EasyList](https://easylist.to)
  - EasyList: https://easylist.to/easylist/easylist.txt
  - EasyPrivacy (no blocked selectors): https://easylist.to/easylist/easyprivacy.txt
  - EasyList Cookie List: https://secure.fanboy.co.nz/fanboy-cookiemonster.txt
  - EasyList Germany: https://easylist.to/easylistgermany/easylistgermany.txt
  - EasyList Italy: https://easylist-downloads.adblockplus.org/easylistitaly.txt
  - EasyList Dutch: https://easylist-downloads.adblockplus.org/easylistdutch.txt
  - Liste FR (included in AdGuard French filter): https://easylist-downloads.adblockplus.org/liste_fr.txt
  - EasyList China (included in AdGuard Chinese filter): https://easylist-downloads.adblockplus.org/easylistchina.txt
  - Bulgarian List: https://stanev.org/abp/adblock_bg.txt
  - ABPindo: https://raw.githubusercontent.com/heradhis/indonesianadblockrules/master/subscriptions/abpindo.txt
  - Liste AR: https://easylist-downloads.adblockplus.org/Liste_AR.txt
  - EasyList Czech and Slovak: https://raw.githubusercontent.com/tomasko126/easylistczechandslovak/master/filters.txt
  - Latvian List: https://notabug.org/latvian-list/adblock-latvian/raw/master/lists/latvian-list.txt
  - EasyList Hebrew: https://raw.githubusercontent.com/easylist/EasyListHebrew/master/EasyListHebrew.txt
  - EasyList Lithuania: https://raw.githubusercontent.com/EasyList-Lithuania/easylist_lithuania/master/easylistlithuania.txt
  - AdBlock Warning Removal List: https://easylist-downloads.adblockplus.org/antiadblockfilters.txt
- [Fanboy](https://www.fanboy.co.nz)
  - Fanboy Enhanced Trackers List: https://secure.fanboy.co.nz/enhancedstats.txt
  - Fanboy Anti-Facebook Filters (included in Social List): https://www.fanboy.co.nz/fanboy-antifacebook.txt
  - Fanboy Thirdparty Fonts Filters (no blocked selectors): https://www.fanboy.co.nz/fanboy-antifonts.txt
  - Fanboy Social List (included in Annoyances): https://easylist.to/easylist/fanboy-social.txt
  - Fanboy Annoyances: https://secure.fanboy.co.nz/fanboy-annoyance.txt
  - Fanboy Anti-Cookie Filters (same as EasyList Cookie List, included in Annoyances)
- Other
  - Peter Lowe's Blocklist (no blocked selectors): https://pgl.yoyo.org/adservers/serverlist.php?hostformat=adblockplus&showintro=0&mimetype=plaintext
  - Web Annoyances Ultralist: everything from https://github.com/yourduskquibbles/webannoyances/tree/master/filters
  - I don't care about cookies: https://www.i-dont-care-about-cookies.eu/abp/
  - ROList: https://zoso.ro/pages/rolist2.txt
  - RU AdList (it doesn't work in AdGuard for some reason): https://easylist-downloads.adblockplus.org/advblock.txt
  - Icelandic ABP List: https://adblock.gardar.net/is.abp.txt
  - Greek AdBlock Filter: https://raw.githubusercontent.com/kargig/greek-adblockplus-filter/master/void-gr-filters.txt
  - Thai Ads Filters: https://adblock-thai.github.io/thai-ads-filter/subscription.txt
  - Hungarian filter: https://raw.githubusercontent.com/hufilter/hufilter/master/hufilter.txt
  - ABPVN List (Vietnamese): https://abpvn.com/filter/abpvn-h9kF1c.txt
  - Official Polish filters for AdBlock, uBlock Origin & AdGuard: https://raw.githubusercontent.com/MajkiIT/polish-ads-filter/master/polish-adblock-filters/adblock.txt
  - Estonian List: https://adblock.ee/list.php
  - Adblock-Persian list: https://ideone.com/plain/K452p
  - List-KR: https://raw.githubusercontent.com/List-KR/List-KR/master/filter.txt
  - Adblock List for Finland: https://raw.githubusercontent.com/finnish-easylist-addition/finnish-easylist-addition/master/Finland_adb.txt
  - Frellwit's Swedish Filter: https://raw.githubusercontent.com/lassekongo83/Frellwits-filter-lists/master/Frellwits-Swedish-Filter.txt

## DOM blockers

This entropy source checks which DOM elements (CSS selectors) are blocked by browsers.
The source code is at `src/sources/dom_blockers.ts`.
It contains a list of filters and CSS selectors to detect.
This list should be updated periodically.

### How to make the list of filters

#### 1. Download the filters

Download all the filters from the list above.
The downloaded file names mustn't start with `.` and must have the `.txt` extension.
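
The naming rule can be checked before running the scripts. The sketch below reuses the `/^[^.].*\.txt$/` pattern from `make_selectors_tester.ts`; the function name and the sample file names are hypothetical.

```typescript
// The same pattern that make_selectors_tester.ts applies to directory items
const filterFileNamePattern = /^[^.].*\.txt$/

// Hypothetical helper: tells whether the scripts would pick up a file with this name
function isAcceptedFilterFileName(fileName: string): boolean {
  return filterFileNamePattern.test(fileName)
}

// Made-up file names for illustration
const fileNameChecks = {
  'easylist.txt': isAcceptedFilterFileName('easylist.txt'), // accepted
  '.hidden.txt': isAcceptedFilterFileName('.hidden.txt'), // skipped: starts with a dot
  'easylist.json': isAcceptedFilterFileName('easylist.json'), // skipped: wrong extension
}
```

Files that fail the check are silently skipped by the scripts, so a misnamed download shows up only as missing selectors.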

#### 2. Make a selectors tester

Put the downloaded files into the `resources/content_blocking/filters` directory.
Then open a terminal, go to the repository root, and run:

```bash
yarn install
./node_modules/.bin/ts-node --compiler-options '{"module": "CommonJS"}' ./resources/content_blocking/make_selectors_tester.ts
```

An HTML file will be created at `resources/content_blocking/selectors_tester.html`.

#### 3. Get selectors blocked by each filter

Install an ad blocker that lets you choose which individual filters to use.
We strongly recommend using AdGuard on iOS or macOS because it allows choosing individual filters,
includes all the filters above, and iOS is the primary target of the entropy source
(the macOS version works the same way but allows custom filters in the free version).

Open the HTML file created above in the browser.
AdGuard in Safari works well if you just open the local file directly.
You can use [ngrok](https://stackoverflow.com/a/58547760/1118709) to open the file on another device.

For each filter in the list above, except for the filters noted as having no blocked selectors or duplicating other filters, do the following steps:

1. Go to the ad blocker settings, turn on only this filter, and make sure the new filter set is applied (click the refresh button in the ad blocker settings and wait a couple of seconds).
2. Return to the browser and refresh the page (make sure the field content has changed). It will show which CSS selectors are blocked by the current filter.
3. Save the content of the field to a `.txt` file in the `resources/content_blocking/blocked_selectors` directory.
   The file names become the filter names in the entropy source; see its source code for the correct names.

After that, you will have a set of files that matches the current list of filters in the entropy source code.

#### 4. Get unique selectors for each filter

Open a terminal, go to the repository root and run:

```bash
./node_modules/.bin/ts-node --compiler-options '{"module": "CommonJS"}' ./resources/content_blocking/get_unique_filter_selectors.ts
```

A JSON file will be created at `resources/content_blocking/unique_filter_selectors.json`.
This file contains unique blocked selectors for each of the filters.

Take 5 random selectors for each filter from the file and copy them to `src/sources/dom_blockers.ts`.
I prefer selectors that reflect distinctive features of the filter (e.g. foreign words or domains in the case of regional filters)
because such selectors are more stable.
Avoid selectors with `iframe` if possible because they put excess load on the browser.
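
The selection can also be scripted. The following is only a sketch under assumptions: the function names are hypothetical, it picks randomly rather than judging how distinctive a selector is, and it merely pushes `iframe` selectors to the end so that they are used only when nothing else is left.

```typescript
// Fisher-Yates shuffle that returns a new array and leaves the input untouched
function shuffleSelectors(items: readonly string[], random: () => number): string[] {
  const result = items.slice()
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1))
    const swapped = result[i]
    result[i] = result[j]
    result[j] = swapped
  }
  return result
}

// Hypothetical helper: picks up to `count` random selectors,
// taking selectors that mention `iframe` only if the others run out
function pickSelectors(candidates: readonly string[], count = 5, random: () => number = Math.random): string[] {
  const withoutIframe = candidates.filter((selector) => !/iframe/i.test(selector))
  const withIframe = candidates.filter((selector) => /iframe/i.test(selector))
  return shuffleSelectors(withoutIframe, random).concat(shuffleSelectors(withIframe, random)).slice(0, count)
}

// Made-up candidates for illustration
const pickedSelectors = pickSelectors([
  '.reklama-top',
  '#werbung-box',
  'iframe[src*="ads"]',
  '.pub-droite',
  '.annonce-bas',
  '#mainos-laatikko',
  '.sponsor-lenke',
  'iframe#ad-frame',
])
```

A manual pass over the result is still worthwhile, since the script can't tell which selectors are characteristic of a regional filter.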

#### 5. Handle empty filters

If a filter has no unique selectors, it means that the filter is included in another filter (see the notes in the filter list above).
In this case, temporarily move the files of the filters that include that filter out of the `resources/content_blocking/blocked_selectors` directory,
run `get_unique_filter_selectors.ts` again, take the selectors for the filter from the new version of the `unique_filter_selectors.json` file,
and move the files back (to restore the initial state).
This way you'll get selectors that identify both the included and the including filters.

Repeat the steps for all filters with no unique selectors.
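
To see why moving the files away helps, here is a minimal in-memory sketch of the uniqueness calculation with an exclusion list. It approximates what `get_unique_filter_selectors.ts` does with files; the function name, the `excludedFilters` parameter, and the sample data are assumptions made for this illustration.

```typescript
// Hypothetical in-memory variant: `blockedByFilter` maps a filter name to the selectors it blocks,
// `excludedFilters` plays the role of the files temporarily moved out of the directory
function getUniqueSelectors(
  blockedByFilter: Record<string, readonly string[]>,
  excludedFilters: readonly string[] = [],
): Record<string, string[]> {
  // selector → names of the filters that block it
  const selectorOwners: Record<string, string[]> = Object.create(null)
  const result: Record<string, string[]> = {}

  for (const filterName of Object.keys(blockedByFilter)) {
    if (excludedFilters.indexOf(filterName) !== -1) {
      continue
    }
    result[filterName] = []
    for (const selector of blockedByFilter[filterName]) {
      const owners = selectorOwners[selector] || (selectorOwners[selector] = [])
      if (owners.indexOf(filterName) === -1) {
        owners.push(filterName)
      }
    }
  }

  // Keep only the selectors blocked by exactly one filter
  for (const selector of Object.keys(selectorOwners)) {
    if (selectorOwners[selector].length === 1) {
      result[selectorOwners[selector][0]].push(selector)
    }
  }
  return result
}

// Made-up data: every selector of `social` is also blocked by `annoyances`
const sampleBlocked = {
  social: ['.fb-like', '.share-bar'],
  annoyances: ['.fb-like', '.share-bar', '.cookie-popup'],
}
const uniqueWithAll = getUniqueSelectors(sampleBlocked)
const uniqueWithoutAnnoyances = getUniqueSelectors(sampleBlocked, ['annoyances'])
```

With both filters present, `social` has no unique selectors; once `annoyances` is excluded, the selectors of `social` become unique and can be copied to the entropy source.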

#### 6. EasyList Android case

AdGuard on Android blocks slightly different selectors than AdGuard on iOS.
This sometimes leads to a false positive EasyList detection when the AdGuard Base filter is used.

To solve this, you need an Android device or emulator with AdGuard installed.
Do step 4, but instead of 5 selectors, copy all the EasyList selectors.
On the device, enable AdGuard, open its settings, and enable a few filters including AdGuard Base but excluding EasyList.
Start the playground, connect the Chrome (or any other browser) debugger to the device, open the device browser console,
check which of the selectors pass, i.e. aren't blocked (see the Debug section below), and copy 5 of them to the entropy source code.

If you can't use an Android device or emulator,
just use the debugging to check that the current EasyList selectors don't give a false positive.

### Debug

If you run the agent in debug mode (e.g. on the playground), the entropy source will print which CSS selectors are blocked and which aren't.
The selectors are grouped by filter (according to the entropy source code).
➡️ next to a selector means that it isn't blocked, 🚫 means that it's blocked.
You can adjust your ad blocker settings and see what changes.
Ideally, each filter should block all of its selectors and none of the other filters' selectors.
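
The "ideal" condition can be expressed as a small check. This is a sketch, not the actual debug code of the entropy source: the types, the function name, and the sample data are hypothetical, and only the ➡️/🚫 notation is taken from the description above.

```typescript
// Hypothetical shape: filter name → the selectors chosen to detect that filter
type FilterSelectors = Record<string, readonly string[]>

interface FilterSummary {
  blockedCount: number
  totalCount: number
  lines: string[]
}

// Groups the blocking results by filter, similar in spirit to the debug output
function summarizeBlockedSelectors(filters: FilterSelectors, blocked: Record<string, boolean>): Record<string, FilterSummary> {
  const summary: Record<string, FilterSummary> = {}
  for (const filterName of Object.keys(filters)) {
    const lines: string[] = []
    let blockedCount = 0
    for (const selector of filters[filterName]) {
      const isBlocked = blocked[selector] === true
      if (isBlocked) {
        blockedCount++
      }
      lines.push(`${isBlocked ? '🚫' : '➡️'} ${selector}`)
    }
    summary[filterName] = { blockedCount, totalCount: filters[filterName].length, lines }
  }
  return summary
}

// Made-up data: only filterA is enabled in the ad blocker
const exampleSummary = summarizeBlockedSelectors(
  { filterA: ['.a-one', '.a-two'], filterB: ['.b-one', '.b-two'] },
  { '.a-one': true, '.a-two': true, '.b-one': false, '.b-two': false },
)
```

In the ideal case, the enabled filter has `blockedCount === totalCount` and every other filter has `blockedCount === 0`; anything in between suggests that the selectors overlap with another filter or are outdated.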
1 change: 1 addition & 0 deletions package.json
@@ -77,6 +77,7 @@
"rollup-plugin-terser": "^7.0.2",
"terser-webpack-plugin": "^4.2.3",
"ts-loader": "^8.0.7",
"ts-node": "^9.1.1",
"typescript": "^4.0.3",
"ua-parser-js": "^0.7.22",
"webpack": "^4.44.2",
2 changes: 2 additions & 0 deletions resources/content_blocking/blocked_selectors/.gitignore
@@ -0,0 +1,2 @@
*
!/.gitignore
2 changes: 2 additions & 0 deletions resources/content_blocking/filters/.gitignore
@@ -0,0 +1,2 @@
*
!/.gitignore
67 changes: 67 additions & 0 deletions resources/content_blocking/get_unique_filter_selectors.ts
@@ -0,0 +1,67 @@
/*
 * See docs/content_blockers.md
 */

import * as path from 'path'
import { promises as fsAsync } from 'fs'
import { eachLineInFile } from './utils'

const inputDirectory = path.join(__dirname, 'blocked_selectors')
const outputFile = path.join(__dirname, 'unique_filter_selectors.json')

async function run() {
  const filterSelectors = await getUniqueFilterSelectors(inputDirectory)
  await fsAsync.writeFile(outputFile, stringifyResult(filterSelectors))
}

async function getUniqueFilterSelectors(directoryPath: string) {
  const directoryItems = await fsAsync.readdir(directoryPath, { withFileTypes: true })
  const selectors = new Map<string, string[]>()
  const filterSelectors: Record<string, string[]> = {}

  for (const directoryItem of directoryItems) {
    if (!directoryItem.isFile()) {
      continue
    }

    const nameMatch = /^([^.].*)\.txt$/.exec(directoryItem.name)
    if (!nameMatch) {
      continue
    }

    const filterName = nameMatch[1]
    filterSelectors[filterName] = []

    await eachLineInFile(path.join(inputDirectory, directoryItem.name), (line) => {
      const selector = line.trim()
      if (selector) {
        let selectorFilters = selectors.get(selector)
        if (!selectorFilters) {
          selectorFilters = []
          selectors.set(selector, selectorFilters)
        }
        selectorFilters.push(filterName)
      }
    })
  }

  selectors.forEach((selectorFilters, selector) => {
    if (selectorFilters.length === 1) {
      for (const filterName of selectorFilters) {
        filterSelectors[filterName].push(selector)
      }
    }
  })

  return filterSelectors
}

function stringifyResult(filterSelectors: Record<string, string[]>) {
  return JSON.stringify(filterSelectors, null, 2)
}

run().catch((error) => {
  // eslint-disable-next-line no-console
  console.error(error)
  process.exitCode = 1
})
95 changes: 95 additions & 0 deletions resources/content_blocking/make_selectors_tester.ts
@@ -0,0 +1,95 @@
/*
 * See docs/content_blockers.md
 */

import * as path from 'path'
import { promises as fsAsync } from 'fs'
import * as rollup from 'rollup'
import { eachLineInFile } from './utils'

const inputDirectory = path.join(__dirname, 'filters')
const inputScript = path.join(__dirname, 'selectors_tester.ts')
const outputFile = path.join(__dirname, 'selectors_tester.html')

async function run() {
  const uniqueSelectors = await getUniqueSelectorsFromDirectory(inputDirectory)
  const testerHtml = await makeTesterHtml(uniqueSelectors)
  await fsAsync.writeFile(outputFile, testerHtml)
}

async function getUniqueSelectorsFromDirectory(directoryPath: string) {
  const directoryItems = await fsAsync.readdir(directoryPath, { withFileTypes: true })
  const uniqueSelectors = new Set<string>()

  for (const directoryItem of directoryItems) {
    if (!directoryItem.isFile()) {
      continue
    }
    if (!/^[^.].*\.txt$/.test(directoryItem.name)) {
      continue
    }
    await eachSelectorInFile(path.join(inputDirectory, directoryItem.name), (selector) => {
      uniqueSelectors.add(selector)
    })
  }

  return uniqueSelectors
}

async function eachSelectorInFile(filePath: string, callback: (selector: string) => void | Promise<void>) {
  await eachLineInFile(filePath, async (rule) => {
    const selectorMatch = /^##(.+)$/.exec(rule)
    if (!selectorMatch) {
      return
    }
    const selector = selectorMatch[1]
    // Leaves only selectors suitable for `parseSimpleCssSelector` and `offsetParent` usage
    if (/(^embed([^\w-]|$)|\\|\[src.*=|\[style\W?=[^[]*\bposition:\s*fixed\b|\[[^\]]*\[)/.test(selector)) {
      return
    }
    const selectorWithoutAttributes = selector.trim().replace(/\[.*?\]/g, '[]')
    if (/[\s:]/.test(selectorWithoutAttributes)) {
      return
    }
    await callback(selector)
  })
}

async function makeTesterHtml(selectors: { forEach: (callback: (selector: string) => void) => void }) {
  const selectorsList: string[] = []
  selectors.forEach((selector) => selectorsList.push(selector))
  const jsCode = await getJsToDetectBlockedSelectors(selectorsList)
  return `<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=no" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<title>Selector blockers tester</title>
</head>
<body>
<script>
${jsCode}
</script>
</body>
</html>`
}

async function getJsToDetectBlockedSelectors(selectors: readonly string[]) {
  // The first configuration from rollup.config.js is supposed to make a JS file with dependencies included
  const bundle = await rollup.rollup({
    input: inputScript,
    // eslint-disable-next-line @typescript-eslint/no-var-requires
    plugins: require('../../rollup.config')[0].plugins,
  })
  const { output } = await bundle.generate({
    format: 'iife',
  })
  return output[0].code.replace(/\[\s*\/\*\s*selectors\s*\*\/\s*]/g, JSON.stringify(selectors))
}

run().catch((error) => {
  // eslint-disable-next-line no-console
  console.error(error)
  process.exitCode = 1
})
39 changes: 39 additions & 0 deletions resources/content_blocking/selectors_tester.ts
@@ -0,0 +1,39 @@
import { getBlockedSelectors } from '../../src/sources/dom_blockers'

const display = document.createElement('textarea')
display.readOnly = true
display.style.boxSizing = 'border-box'
display.style.width = '100%'
display.style.height = '75vh'
display.value = 'Please wait...'

const copyButton = document.createElement('button')
copyButton.textContent = 'Copy'
copyButton.addEventListener('click', (event) => {
  event.preventDefault()
  display.focus()
  display.select()
  document.execCommand('copy')
})

document.body.appendChild(display)
document.body.appendChild(copyButton)

// Wait a bit to draw the initial UI
setTimeout(async () => {
  try {
    const selectors: string[] = [
      /* selectors */
    ]
    const blockedSelectors = await getBlockedSelectors(selectors)

    display.value = ''
    for (const selector of Object.keys(blockedSelectors)) {
      if (blockedSelectors[selector]) {
        display.value += `${selector}\n`
      }
    }
  } catch (error) {
    display.value = `${error}\n${error.stack}`
  }
}, 10)
