A Docker image with Chrome configured for web crawling, including ad blocking and cookie banner bypass. I originally created it so my RSS reader could parse article content from my subscriptions, but since it exposes the Chrome DevTools Protocol (CDP) port, it can be used for any kind of browser automation.
- Chrome with DevTools Protocol enabled
- Pre-installed extensions:
  - uBlock Origin Lite (ad blocking)
  - I Still Don't Care About Cookies (cookie banner bypass)
- VNC access for debugging (port 7900)
- Proxy support configured
- Ability to install any other extension, given a zip archive of that extension
Create a docker-compose.yml file:

    services:
      crawl-browser:
        image: nuhotetotniksvoboden/crawl-browser:latest
        ports:
          # Optional: only needed if you want to expose the ports to your host.
          - "9222:9222" # Chrome DevTools Protocol
          - "7900:7900" # noVNC (optional, for debugging)
        # Either cap_add or --no-sandbox has to be specified in Docker.
        # If you're using rootless Podman, this is not needed.
        cap_add:
          - sys_admin
        command:
          - --no-sandbox

Then run:

    docker-compose up

The CHROME_EXTENSIONS variable allows you to install additional Chrome extensions at runtime. The format is:

    CHROME_EXTENSIONS="alias1|download_url1[|extract_folder1][,alias2|download_url2[|extract_folder2]]"

Parameters:
- alias: Short name for the extension (used in logs)
- download_url: Direct download URL for the extension zip file
- extract_folder: (Optional) Specific folder name inside the zip file if the extension is not in the root
Examples:
- MetaMask only:

      environment:
        - CHROME_EXTENSIONS=mm|https://github.com/MetaMask/metamask-extension/releases/download/v12.22.3/metamask-flask-chrome-12.22.3-flask.0.zip

- Just-Read only:

      environment:
        - CHROME_EXTENSIONS=justread|https://github.com/ZachSaucier/Just-Read/archive/master.zip|Just-Read-master

- Multiple extensions (MetaMask + Just-Read):

      environment:
        - CHROME_EXTENSIONS=mm|https://github.com/MetaMask/metamask-extension/releases/download/v12.22.3/metamask-flask-chrome-12.22.3-flask.0.zip,justread|https://github.com/ZachSaucier/Just-Read/archive/master.zip|Just-Read-master

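
To make the `alias|download_url[|extract_folder]` format concrete, here is a rough Python sketch of how such a string could be split into its parts. It is only an illustration of the format described above, not the parser the image actually uses; `ExtensionSpec` and `parse_chrome_extensions` are hypothetical names introduced for this example.

```python
from typing import List, NamedTuple, Optional


class ExtensionSpec(NamedTuple):
    """One entry of CHROME_EXTENSIONS (hypothetical helper type)."""
    alias: str
    download_url: str
    extract_folder: Optional[str] = None


def parse_chrome_extensions(value: str) -> List[ExtensionSpec]:
    """Split a CHROME_EXTENSIONS string into its entries.

    Entries are separated by ',' and the fields of one entry by '|'.
    Assumption: URLs contain neither ',' nor '|' themselves.
    """
    specs: List[ExtensionSpec] = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty input or a trailing comma
        parts = entry.split("|")
        if len(parts) not in (2, 3):
            raise ValueError(f"expected alias|url[|folder], got {entry!r}")
        alias, url = parts[0], parts[1]
        # The third field is optional; treat an empty one as absent.
        folder = parts[2] if len(parts) == 3 and parts[2] else None
        specs.append(ExtensionSpec(alias, url, folder))
    return specs
```

One consequence of this format is that a download URL containing a comma or a pipe character cannot be expressed unambiguously, so a direct, plain zip URL is the safest choice.
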
You can pass additional arguments to the Chrome binary like this:

    services:
      crawl-browser:
        image: nuhotetotniksvoboden/crawl-browser:latest
        command:
          - --proxy-server=http://proxy.mydomain.com:1081
          - --proxy-bypass-list=.mydomain.com

The Chrome instance is accessible via:
- Chrome DevTools Protocol: port 9222
- VNC viewer: port 7900 (for debugging)
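
As a starting point for talking to the browser, a client usually asks Chrome's HTTP endpoint `/json/version` for the browser-level WebSocket URL and then connects to that. The sketch below uses only the Python standard library and assumes port 9222 is published on `localhost`, as in the compose file above. The function names are illustrative and not part of this image.

```python
import json
from urllib.request import urlopen


def version_url(host: str = "localhost", port: int = 9222) -> str:
    """Build the URL of Chrome's DevTools version endpoint."""
    return f"http://{host}:{port}/json/version"


def extract_ws_url(payload: str) -> str:
    """Pull the browser WebSocket URL out of the /json/version response."""
    return json.loads(payload)["webSocketDebuggerUrl"]


def fetch_ws_url(host: str = "localhost", port: int = 9222,
                 timeout: float = 5.0) -> str:
    """Query a running browser and return its WebSocket debugger URL.

    Raises an OSError subclass (e.g. URLError) if the container
    is not reachable on the given host and port.
    """
    with urlopen(version_url(host, port), timeout=timeout) as resp:
        return extract_ws_url(resp.read().decode("utf-8"))
```

With the container running, `fetch_ws_url()` should return a `ws://` URL that CDP clients can connect to, for example through Playwright's `connect_over_cdp`.
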
This project includes a comprehensive test suite using Playwright and Chrome DevTools Protocol.

    # Run all tests
    make test

- Build the image:

      make build

- Run tests:

      make test

- Debug with VNC:

      make test-debug

  then open http://localhost:7900