kjam/web-scraping-speed-comparison

PyCon Web Scraping Speed Comparison

A web scraping speed comparison of LXML and BeautifulSoup, with additional tools for investigating Selenium and Scrapy.
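To give a rough idea of what a parser speed comparison can look like, here is a minimal sketch using `timeit`. It is not the code from this repository: the sample HTML, the `count_with_stdlib`, `build_candidates` and `compare` names, and the choice to count `<p>` tags are all assumptions made for illustration. LXML and BeautifulSoup are only timed if they are installed; the standard library parser is always included as a baseline.

```python
import timeit
from html.parser import HTMLParser

# Hypothetical sample document: 200 identical paragraphs.
SAMPLE_HTML = "<html><body>" + "<p class='row'>hello</p>" * 200 + "</body></html>"


class _TagCounter(HTMLParser):
    """Counts <p> start tags using only the standard library."""

    def __init__(self):
        super().__init__()
        self.paragraphs = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.paragraphs += 1


def count_with_stdlib(html):
    parser = _TagCounter()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def build_candidates():
    """Return {name: function} for every parser that can be imported."""
    candidates = {"html.parser": count_with_stdlib}
    try:
        import lxml.html
        candidates["lxml"] = lambda html: len(lxml.html.fromstring(html).xpath("//p"))
    except ImportError:
        pass  # lxml not installed; skip it
    try:
        from bs4 import BeautifulSoup
        candidates["beautifulsoup"] = lambda html: len(
            BeautifulSoup(html, "html.parser").find_all("p")
        )
    except ImportError:
        pass  # BeautifulSoup not installed; skip it
    return candidates


def compare(html=SAMPLE_HTML, number=20):
    """Time each available parser; returns {name: total seconds for `number` runs}."""
    return {
        name: timeit.timeit(lambda fn=fn: fn(html), number=number)
        for name, fn in build_candidates().items()
    }


if __name__ == "__main__":
    for name, seconds in sorted(compare().items(), key=lambda item: item[1]):
        print(f"{name:15s} {seconds:.4f}s")
```

Timings from a sketch like this depend heavily on the document and the machine, so treat them as relative numbers only.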

Virtual Env

If you'd like to use a virtual environment, follow the instructions below. It is not required for the tutorial, but it may be helpful.

For more details on virtual environments

If you don't have virtualenvwrapper and/or pip:

$ easy_install pip
$ pip install virtualenvwrapper

and read the additional instructions here. Then create the environment and install the requirements:

$ mkvirtualenv scraper_tutorial
$ pip install -r requirements.txt

LXML and Selenium

You will need both LXML and Selenium to run all of the tests in this repository.
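A quick way to confirm the libraries are importable before running the tests is sketched below. The `missing_modules` helper is hypothetical (it is not part of this repository), and the import names `lxml`, `selenium`, `bs4` and `scrapy` are the usual top-level module names for those packages.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the tools mentioned in this README.
    missing = missing_modules(["lxml", "selenium", "bs4", "scrapy"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All scraping libraries are importable.")
```

Anything reported as missing can usually be installed with the `pip install -r requirements.txt` step above.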

If you are using a Mac, I highly recommend Homebrew. It makes the pip install step much easier.

If you are using Windows, it may be worth running this inside a Linux virtual machine. If you are a Windows + Python guru, please follow these installation instructions. I can help as needed, but I have not programmed on Windows in more than 5 years.

Firefox Web Browser

Firefox is the default browser for Selenium's web driver. To use Selenium easily, please download and install Firefox.
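As a rough check that Selenium can drive Firefox, something like the following may help. It is a sketch, not code from this repository: `firefox_on_path` and `page_title` are hypothetical helpers, the `firefox` executable name is an assumption (it differs on some systems, and on a Mac the app is often not on the PATH), and recent Selenium versions may also need geckodriver.

```python
import shutil


def firefox_on_path(executable="firefox"):
    """Return True if an executable with this (assumed) name is on the PATH."""
    return shutil.which(executable) is not None


def page_title(url):
    """Open `url` in Firefox via Selenium and return the page title."""
    # Imported lazily so the PATH check above works without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # always close the browser window


if __name__ == "__main__":
    if firefox_on_path():
        print(page_title("https://www.python.org"))
    else:
        print("Firefox was not found on the PATH.")
```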

Questions?

/msg kjam on freenode or @kjam on twitter
