Customising Sources for Ad Lists

Mcat12 edited this page Nov 14, 2016 · 2 revisions

Pi-hole's Default Block Lists

By default, when pihole -g pulls in lists of domains to block, we combine several lists, which are defined in /etc/pihole/adlists.default:

Note: Several lists are commented out. To enable them, follow the instructions at the top of the file. After making any changes, run pihole -g so that they take effect.

If you add any lists, they can go anywhere in the file, as long as they are not commented out (prefixed with #).

## Pi-hole ad-list default sources. Updated 29/10/2016 #########################
#                                                                              #
#  To make changes to this file:                                               #
#    1. run `cp /etc/pihole/adlists.default /etc/pihole/adlists.list`          #
#    2. run `nano /etc/pihole/adlists.list`                                    #
#    3. Uncomment or comment any of the below lists                            #
#                                                                              #
#  Know of any other lists? Feel free to let us know about them, or add them   #
#  to this file!                                                               #
################################################################################

# The below list amalgamates several lists we used previously.
# See `https://github.com/StevenBlack/hosts` for details
https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts

# Other lists we consider safe:
http://mirror1.malwaredomains.com/files/justdomains
http://sysctl.org/cameleon/hosts
https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist
https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt

# hosts-file.net list. Updated frequently, but has been known to block legitimate sites.
https://hosts-file.net/ad_servers.txt

# Mahakala list. Has been known to block legitimate domains including the entire .com range.
# Warning: Due to the sheer size of this list, the web admin console will be unresponsive.
#http://adblock.mahakala.is/

# ADZHOSTS list. Has been known to block legitimate domains
#http://optimate.dl.sourceforge.net/project/adzhosts/HOSTS.txt

# Windows 10 telemetry list
#https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/win10/spy.txt

# Securemecca.com list - Also blocks "adult" sites (pornography/gambling etc)
#http://securemecca.com/Downloads/hosts.txt

# Quidsup's tracker list
https://raw.githubusercontent.com/quidsup/notrack/master/trackers.txt

# Block the BBC News website Breaking News banner
#https://raw.githubusercontent.com/BreakingTheNews/BreakingTheNews.github.io/master/hosts

# Untested Lists:
#https://raw.githubusercontent.com/reek/anti-adblock-killer/master/anti-adblock-killer-filters.txt
#https://raw.githubusercontent.com/Dawsey21/Lists/master/main-blacklist.txt
#http://malwaredomains.lehigh.edu/files/domains.txt
# Following two lists should be used simultaneously: (readme https://github.com/notracking/hosts-blocklists/)
#https://raw.github.com/notracking/hosts-blocklists/master/hostnames.txt
#https://raw.github.com/notracking/hosts-blocklists/master/domains.txt
# Combination of several hosts files on the internet (warning: some facebook domains are also blocked, but you can still go to facebook.com). See https://github.com/mat1th/Dns-add-block for more information.
#https://raw.githubusercontent.com/mat1th/Dns-add-block/master/hosts

Block More Than Advertisements

By using alternate lists, you have the ability to block tracking sites, malware domains, known spam servers, and more. We've included many of these lists in adlists.default, but they are commented out. In order to use them, copy adlists.default to adlists.list and uncomment them.
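As a minimal sketch of that copy-and-uncomment step, the shell function below uncomments one list URL in a file. The name enable_list is a hypothetical helper for illustration, not part of Pi-hole, and the sketch assumes the commented-out line is exactly # followed by the URL, as in the file shown above.

```shell
# enable_list FILE URL
# Uncomment the line "#URL" in FILE. Hypothetical helper, not part of Pi-hole.
enable_list() {
    file=$1
    url=$2
    # Compare whole lines as plain strings so the URL needs no regex escaping.
    awk -v url="$url" '$0 == "#" url { print url; next } { print }' "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Usage (run as a user that can write to /etc/pihole):
#   cp /etc/pihole/adlists.default /etc/pihole/adlists.list
#   enable_list /etc/pihole/adlists.list \
#       'https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/win10/spy.txt'
#   pihole -g
```

Editing the file by hand with nano, as the file header suggests, does the same thing; a helper like this is only convenient when the change has to be scripted.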

These Lists Will Need Additional Parsing Logic

The lists below are not in standard hosts format. Since Pi-hole blocks ads at the DNS level, only the domain name needs to be extracted from each entry. To do this, you will likely need sed and awk to strip each line down to just the domain name.

  • https://github.com/lewisje/jansal/blob/master/adblock/hosts
  • http://www.sa-blacklist.stearns.org/sa-blacklist/sa-blacklist.current
  • https://easylist-downloads.adblockplus.org/malwaredomains_full.txt
  • https://easylist-downloads.adblockplus.org/easyprivacy.txt
  • https://easylist-downloads.adblockplus.org/easylist.txt
  • https://easylist-downloads.adblockplus.org/fanboy-annoyance.txt
  • http://www.fanboy.co.nz/adblock/opera/urlfilter.ini
  • http://www.fanboy.co.nz/adblock/fanboy-tracking.txt
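Several of these are Adblock Plus filter lists, which mix comments, URL patterns, and rules with options alongside plain domain rules of the form ||domain^. As a rough sketch, and assuming only those plain domain rules are wanted, the function below keeps them and discards every other line. The name abp_domain_rules is a hypothetical helper, and the character class used for domain names is an assumption, not a complete hostname validator.

```shell
# abp_domain_rules: read an Adblock Plus style list on stdin and print the
# domain from each plain "||domain^" rule. Hypothetical helper name.
abp_domain_rules() {
    awk '/^\|\|[A-Za-z0-9._-]+\^$/ {
        # Strip the two leading pipes and the trailing caret.
        print substr($0, 3, length($0) - 3)
    }'
}

# Usage with one of the lists above:
#   curl -s https://easylist-downloads.adblockplus.org/easyprivacy.txt | abp_domain_rules
```

Rules that carry options or path patterns are dropped on purpose, because they cannot be expressed as a DNS-level block.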

How To Parse A List To Get Just The Domain

Imagine you found a list you want to use, but it is formatted with a bunch of extra characters:

||unlimited-hacks.net^
||pakcircles.com^
||cracksplay.com^
||fbgamecheatz.info^
||linkz.it^

You can use sed and/or awk (or other commands) to remove the extra characters to get just the domain name. It helps to be familiar with scripting, but if you wanted to parse down the list above, you could do something like this:

curl -s http://some.list | sed 's/^||//'

This would remove the two pipes at the beginning of the lines, so your list would then look like this:

unlimited-hacks.net^
pakcircles.com^
cracksplay.com^
fbgamecheatz.info^
linkz.it^

Then, you could use sed again, or even something like cut. Since a domain name never contains a caret, you can use the caret as a delimiter with the cut command to display only the domain name.

curl -s http://some.list | sed 's/^||//' | cut -d'^' -f1

Which leaves you with just the domain names:

unlimited-hacks.net
pakcircles.com
cracksplay.com
fbgamecheatz.info
linkz.it

There is more than one way to parse a list, and no single right way, but some methods are faster. If you can combine most of your parsing into a single awk command, it can process a large list much faster. Each | you add to the command starts another process that the data has to pass through, which slows the processing down.
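As an illustration of that advice, the sed and cut stages from the example above can be folded into one awk call. This is a sketch rather than the only way to do it, and extract_domains is a hypothetical function name chosen for the example.

```shell
# extract_domains: read "||domain^" lines on stdin and print bare domains.
# Hypothetical helper name; does the work of the sed and cut stages in one awk call.
extract_domains() {
    awk '{
        sub(/^\|\|/, "")   # drop the two leading pipes
        sub(/\^.*/, "")    # drop the first caret and everything after it
        print
    }'
}

# Usage, with the same placeholder URL as above:
#   curl -s http://some.list | extract_domains
```

Only one pipe remains, between curl and awk, so the list passes through a single parsing process instead of two.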