Take a screenshot of an entire web page, not just the part that is visible above the fold (the browser viewport). The script does this by opening the page and automatically scrolling through it to force dynamically loaded images to load, and then saves the screenshot to a PNG file.
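The scroll-then-capture idea can be sketched with Selenium as follows. This is a minimal sketch under stated assumptions, not the code in screenshot.py: it assumes Firefox with geckodriver, a 1280x800 viewport and a fixed half-second pause per scroll step, and it captures the whole page by resizing the window to the document height before saving. The helper names scroll_offsets and capture_full_page are hypothetical.

```python
import time
from typing import List


def scroll_offsets(page_height: int, viewport_height: int) -> List[int]:
    """Vertical positions to scroll to, stepping one viewport at a time."""
    if viewport_height <= 0:
        raise ValueError("viewport_height must be positive")
    return list(range(0, max(page_height, 1), viewport_height))


def capture_full_page(url: str, path: str, width: int = 1280,
                      height: int = 800, pause: float = 0.5) -> None:
    """Hypothetical helper: scroll through url, then save a PNG to path."""
    from selenium import webdriver  # imported lazily; requires Selenium

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # no visible browser window
    driver = webdriver.Firefox(options=options)
    try:
        driver.set_window_size(width, height)
        driver.get(url)
        page_height = driver.execute_script("return document.body.scrollHeight")
        # Visit each viewport so lazily loaded images get a chance to load.
        for y in scroll_offsets(page_height, height):
            driver.execute_script("window.scrollTo(0, arguments[0]);", y)
            time.sleep(pause)
        # The page may have grown while loading: measure again, then make the
        # window as tall as the document so one screenshot covers all of it.
        page_height = driver.execute_script("return document.body.scrollHeight")
        driver.set_window_size(width, page_height)
        driver.save_screenshot(path)
    finally:
        driver.quit()
```

For example, capture_full_page("https://example.net/", "example.png"). In Selenium 4, the Firefox driver also provides save_full_page_screenshot(path), which captures the whole document without resizing the window.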
PNG files can become quite large, around 30 MB for the front page of a news site.
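A rough worked estimate shows why: a screenshot is a bitmap whose raw size is width x height x bytes per pixel. The page dimensions below (1280 pixels wide, 15,000 pixels tall) and 4 bytes per pixel (RGBA) are assumptions for illustration only. PNG compresses losslessly, so the file on disk is smaller than this raw figure, but it still grows with the area of the page.

```python
def raw_bitmap_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Uncompressed size of a bitmap; 4 bytes per pixel assumes RGBA."""
    return width * height * bytes_per_pixel


# Hypothetical long news front page: 1280 px wide, 15,000 px tall.
size = raw_bitmap_bytes(1280, 15_000)
print(f"{size / 1_000_000:.1f} MB before compression")  # 76.8 MB before compression
```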
- Install the dependencies: Selenium (which does the actual work of driving the browser) and python-slugify (which converts URLs into file names, e.g. http://google.com becomes http-google-com):
$ pip install selenium python-slugify
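To confirm that both packages installed correctly, a small standard-library check like the one below can help. It is a sketch with a hypothetical function name. Note that python-slugify is imported under the module name slugify, and the unrelated slugify package on PyPI uses the same module name, so this check cannot tell the two apart.

```python
import importlib.util

# Package name on PyPI -> module name used in "import ..."
REQUIRED = {"selenium": "selenium", "python-slugify": "slugify"}


def missing_packages(required=REQUIRED):
    """Return the PyPI names of packages whose modules cannot be found."""
    return [package for package, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing: " + ", ".join(missing) if missing else "All dependencies found.")
```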
Download a web driver (geckodriver for Firefox or ChromeDriver for Chrome). I recommend Firefox over Chrome for better compatibility.
Make sure Selenium can find the web driver by adding the directory that contains it to your PATH environment variable, as described in the Selenium installation guide.
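A quick way to check whether a driver executable is reachable through PATH is shutil.which(), which performs the same lookup a shell would. The sketch below assumes the default executable names geckodriver and chromedriver; find_drivers is a hypothetical helper.

```python
import shutil


def find_drivers(names=("geckodriver", "chromedriver")):
    """Map each driver name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_drivers().items():
        print(f"{name}: {path or 'not found on PATH'}")
```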
Then clone the repository:
$ git clone git@github.com:peterdalle/screenshot.git
Provide a URL or domain name as an argument:
$ python screenshot.py google.com
A file such as 2018-01-12_18-02_http-google-com.png is then saved in your current directory, prefixed with the current date and time (yyyy-mm-dd_hh-mm).
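The naming scheme can be sketched as follows. This is an approximation, not the script's code: simple_slug is a hypothetical regular-expression stand-in for python-slugify (which handles Unicode and other edge cases more carefully), and normalize_url assumes that a bare domain name gets an http:// prefix, which matches the example file name but is otherwise a guess.

```python
import re
from datetime import datetime


def simple_slug(text):
    """Rough stand-in for python-slugify: lowercase, runs of other chars -> '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def normalize_url(arg):
    """Hypothetical: prefix bare domain names with http://."""
    return arg if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", arg) else "http://" + arg


def screenshot_filename(arg, when=None):
    """yyyy-mm-dd_hh-mm time stamp, then the slugged URL, as a .png name."""
    when = when or datetime.now()
    return f"{when:%Y-%m-%d_%H-%M}_{simple_slug(normalize_url(arg))}.png"


print(screenshot_filename("google.com", datetime(2018, 1, 12, 18, 2)))
# 2018-01-12_18-02_http-google-com.png
```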
Provide multiple URLs or domain names as arguments:
$ python screenshot.py google.com bbc.com svt.se "https://example.net/search?q=test&p=3"
Note that the & character in URLs has a special meaning in the terminal/command prompt, so remember to enclose such URLs in quotes, as in the last argument above.
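Python's shlex module follows the quoting rules of POSIX shells such as bash (not the Windows Command Prompt), so it can illustrate the point: shlex.quote() wraps the URL in quotes because it contains & and ?, leaves a plain domain untouched, and splitting the quoted command line keeps the URL as a single argument.

```python
import shlex

url = "https://example.net/search?q=test&p=3"

# shlex.quote() adds quotes only when the string contains characters that
# are special to a POSIX shell, such as "&" and "?".
print(shlex.quote(url))        # 'https://example.net/search?q=test&p=3'
print(shlex.quote("bbc.com"))  # bbc.com

# Splitting the quoted command line the way a POSIX shell would keeps the
# whole URL together as one argument.
args = shlex.split(f"python screenshot.py {shlex.quote(url)}")
print(args)  # ['python', 'screenshot.py', 'https://example.net/search?q=test&p=3']
```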
You can also provide a file name (urls.txt) with one URL or domain name per line:
$ python screenshot.py urls.txt
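How a mix of URLs and file names could be turned into one list of targets is sketched below. The rule that any argument naming an existing file is read as a URL list, with blank lines skipped, is an assumption about the behavior; expand_arguments is a hypothetical name.

```python
import os
import sys


def expand_arguments(args):
    """Flatten command-line arguments into a list of URLs/domain names.

    An argument that names an existing file is read as one URL or domain
    per line (blank lines skipped); any other argument is used as-is.
    """
    targets = []
    for arg in args:
        if os.path.isfile(arg):
            with open(arg, encoding="utf-8") as f:
                targets.extend(line.strip() for line in f if line.strip())
        else:
            targets.append(arg)
    return targets


if __name__ == "__main__":
    for target in expand_arguments(sys.argv[1:]):
        print(target)
```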
Change the behavior of the program in the settings class. Each setting is documented there.
The most important setting is probably headless = True, which runs the browser in the background without opening a visible browser window.
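One way to turn on headless mode for Firefox with Selenium 4 is to pass the -headless argument through FirefoxOptions. The sketch below shows how a headless flag could map onto that; the Settings dataclass is a hypothetical stand-in, and the real settings class in screenshot.py may be organized differently.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Hypothetical subset; the real settings class may differ.
    headless: bool = True


def firefox_arguments(settings):
    """Command-line arguments to pass to Firefox for these settings."""
    return ["-headless"] if settings.headless else []


def make_driver(settings):
    """Start Firefox according to settings (requires Selenium + geckodriver)."""
    from selenium import webdriver  # imported lazily so the helpers above work without it

    options = webdriver.FirefoxOptions()
    for arg in firefox_arguments(settings):
        options.add_argument(arg)
    return webdriver.Firefox(options=options)
```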
Selenium seems to have trouble closing the web driver, which can leave many driver processes running and consuming memory. You may need to kill these processes now and then, especially if you take screenshots on a schedule with crontab.
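One way to reduce leftover processes is to make sure quit() is always called, even when a page fails to load. In Selenium, quit() ends the whole session and the driver process, whereas close() only closes the current window. The context manager below is a sketch with a hypothetical name; any processes that still survive can be killed with, for example, pkill geckodriver on Linux or macOS.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always call quit() on it,
    even if the code inside the with-block raises an exception."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


# Usage with Selenium (assumes Selenium and geckodriver are installed):
# from selenium import webdriver
# with managed_driver(webdriver.Firefox) as driver:
#     driver.get("https://example.net/")
#     driver.save_screenshot("example.png")
```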
Another approach is to use CutyCapt instead of Selenium, running it inside a virtual X server (Xvfb) with the following bash command:
xvfb-run --auto-servernum --server-num=1 --server-args="-screen 0 1024x8048x16" cutycapt --url="http://example.net/" --out="example.net.jpg"
bash_screenshot.py is just a wrapper around this command: it takes a URL as its input parameter and outputs a file named with a time stamp and the URL.
Use it as follows:
$ python bash_screenshot.py http://example.net/
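A Python wrapper in the same spirit could look like the sketch below. It is a hypothetical reimplementation, not the repository's bash_screenshot.py: it builds the xvfb-run/cutycapt command as an argument list (so no shell quoting is involved) and names the output file with a time stamp and a rough slug of the URL, mirroring the example file name 2018-01-01-18-40_http-www-example-net.jpg.

```python
import re
import subprocess
import sys
from datetime import datetime


def cutycapt_command(url, out):
    """The xvfb-run/cutycapt command above, as an argv list (no shell needed)."""
    return [
        "xvfb-run", "--auto-servernum", "--server-num=1",
        "--server-args=-screen 0 1024x8048x16",
        "cutycapt", f"--url={url}", f"--out={out}",
    ]


def output_name(url, when=None):
    """Time stamp plus a rough slug of the URL, saved as .jpg."""
    when = when or datetime.now()
    slug = re.sub(r"[^a-z0-9]+", "-", url.lower()).strip("-")
    return f"{when:%Y-%m-%d-%H-%M}_{slug}.jpg"


if __name__ == "__main__" and len(sys.argv) == 2:
    out = output_name(sys.argv[1])
    # Requires xvfb-run and cutycapt to be installed.
    subprocess.run(cutycapt_command(sys.argv[1], out), check=True)
    print(out)
```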
This will produce a file like 2018-01-01-18-40_http-www-example-net.jpg. Make sure to use .jpg as the file extension, since .png creates much larger files (JPG uses lossy compression).