Companion script for the article “Webpages Are Getting Larger Every Year, and Here’s Why it Matters”
Author: Jorge Orpinel Perez
© 2018 Pingdom AB.
Website Page Size Scraper
Python script that uses Selenium and headless Chrome to determine the average page size across a list of websites. The size measured for each site's home page includes transferSize and any other content loaded dynamically to display that page.
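One way to obtain such a figure is to read the browser's Performance API once the page has loaded and add up the transferSize of the navigation entry and of every resource entry. The sketch below makes that assumption; it is not necessarily how from_list.py works, and the names PERF_ENTRIES_JS, page_size_from_entries and average_page_size are hypothetical. Cross-origin resources served without a Timing-Allow-Origin header report a transferSize of 0, so totals gathered this way can undercount.

```python
# Hypothetical sketch: page size as the sum of transferSize over
# Performance API entries (navigation + resources).

# JavaScript to run in the loaded page, e.g. via driver.execute_script().
# Selenium converts the returned array of objects into a list of dicts.
PERF_ENTRIES_JS = """
return performance.getEntriesByType('navigation')
    .concat(performance.getEntriesByType('resource'))
    .map(e => ({name: e.name, transferSize: e.transferSize || 0}));
"""


def page_size_from_entries(entries):
    """Total bytes transferred for one page, given a list of entry dicts."""
    # Missing or null transferSize values are counted as 0 bytes.
    return sum(int(e.get("transferSize", 0) or 0) for e in entries)


def average_page_size(sizes):
    """Mean page size in bytes over all sites that were measured."""
    sizes = list(sizes)
    if not sizes:
        raise ValueError("no page sizes to average")
    return sum(sizes) / len(sizes)
```

With a WebDriver session open on a site's home page, `page_size_from_entries(driver.execute_script(PERF_ENTRIES_JS))` would give that page's size; collecting one value per site and passing the list to `average_page_size` gives the average.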
This tool was developed and run with Python 3.6.5 on macOS 10.13.
Later versions should also work.
Required Python package
- Python language bindings for Selenium WebDriver (selenium 3.14 was used)
To install, we will use virtualenv:
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
Virtualenv installs pip automatically.
Save a list of web page URIs (one per line) in a plain text file. A sample list of 50 top sites published by Alexa (September 2018) is included in 2018-09-15-alexa-topsites-50-preview.txt.
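A list in this format can be loaded by keeping every non-blank line. The sketch below is hypothetical (parse_url_lines, read_url_list and the default_scheme parameter are not taken from from_list.py). It also assumes that entries may be bare domains such as "example.com", which it prefixes with a scheme because WebDriver can only navigate to absolute URLs.

```python
def parse_url_lines(lines, default_scheme="http://"):
    """Turn one-URI-per-line text into a list of absolute URLs."""
    urls = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if "://" not in line:
            # Assumption: a bare domain gets default_scheme prepended,
            # since WebDriver navigation requires an absolute URL.
            line = default_scheme + line
        urls.append(line)
    return urls


def read_url_list(path, default_scheme="http://"):
    """Read a plain text file of URIs, one per line."""
    with open(path, encoding="utf-8") as f:
        return parse_url_lines(f, default_scheme)
```

Defaulting to `http://` lets sites that redirect to HTTPS do so on their own; pass `default_scheme="https://"` to skip that hop.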
Make sure the script is executable by your user:
chmod u+x from_list.py
You may now run it:
chromedriver 2> /dev/null &  # Listens on port 9515 by default. Runs in the background.
./from_list.py 2018-09-15-alexa-top-sites-50.txt
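Because chromedriver is started separately, a script can attach to it over HTTP on its default port, 9515, rather than launching its own driver. A minimal sketch follows, assuming a Selenium version whose webdriver.Remote accepts an options argument (recent releases do); chromedriver_url and open_headless_chrome are hypothetical names, not functions from from_list.py.

```python
def chromedriver_url(host="127.0.0.1", port=9515):
    """URL of a chromedriver started separately (9515 is its default port)."""
    return "http://{}:{}".format(host, port)


def open_headless_chrome(url=None):
    """Attach to an already-running chromedriver and start headless Chrome.

    selenium is imported here so this module can load without it installed.
    """
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Remote(
        command_executor=url or chromedriver_url(),
        options=options,
    )
```

A typical session would call `driver = open_headless_chrome()`, then `driver.get(...)` for each URL inside a `try` block, with `driver.quit()` in the matching `finally` so the browser closes even when a site fails to load.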
See the file docstring in from_list.py for further info.
Don't forget to stop chromedriver after running the Python script, e.g.:
fg  # Bring chromedriver to the foreground
^C  # Ctrl + C
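As an alternative to fg and Ctrl + C, a Python wrapper could own the chromedriver process and stop it even when a measurement run fails. The sketch below is hypothetical and not part of from_list.py: background_process starts a command with stderr discarded (mirroring `2> /dev/null`) and terminates it when the `with` block exits.

```python
import contextlib
import subprocess


@contextlib.contextmanager
def background_process(cmd, timeout=5):
    """Run cmd in the background and always stop it on exit."""
    proc = subprocess.Popen(cmd, stderr=subprocess.DEVNULL)
    try:
        yield proc
    finally:
        proc.terminate()  # polite stop (SIGTERM on Unix)
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()  # force it if it did not exit in time
            proc.wait()
```

For example, `with background_process(["chromedriver", "--port=9515"]):` followed by the measurement code would leave no chromedriver running afterwards. chromedriver may need a moment after starting before it accepts connections.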