A data scraper that scans parliament websites and extracts data on MPs, their memberships, and their votes. The scraped data is collected for the Visegrad+ project and made accessible through the Visegrad+ parliament API.
Data is scraped for the following countries:
- Lower House
- Upper House
- Designed to run on a Unix distribution. Development was done on Ubuntu.
- Install cURL, which is required to download Python and/or virtualenv (on Ubuntu: sudo apt-get install curl).
- Install python-dev, which is required to compile third-party Python libraries.
$ sudo mkdir -p /home/projects/scrapers
$ cd /home/projects/scrapers
$ sudo git clone https://github.com/opendatakosovo/parldata-scraper.git
Get the VPAPI client and the server's SSH certificate:
- Install the required libraries for running the scraper.
$ bash install.sh
The scraper is executed by running the run.sh shell script, which accepts the following parameters:
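| Parameter | Type | Description |
|---|---|---|
| --countries | Comma-separated string | Countries (or individual houses) to scrape data from, or all for every available parliament. |
| --people | String | Set to yes to scrape MP data. |
| --votes | String | Set to yes to scrape cast votes. |
| --loop | String | Rerun the scraper in a loop, sleeping for the given interval between runs: a number of seconds (e.g. 180) or a number of days with a d suffix (e.g. 2d). |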
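The --loop parameter takes either a plain number of seconds or a number of days with a d suffix. A minimal Bash sketch of normalizing such a value to seconds, assuming a hypothetical loop_interval_seconds function that is not part of the scraper's scripts; rejecting any other suffix is also an assumption, since only d is documented:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of run.sh): convert a --loop value to seconds.
# Accepts plain seconds ("180") or days with a "d" suffix ("2d").
loop_interval_seconds() {
  local value="$1"
  if [[ "$value" =~ ^([0-9]+)d$ ]]; then
    # Days: 86400 seconds per day; 10# avoids octal parsing of leading zeros.
    echo $(( 10#${BASH_REMATCH[1]} * 86400 ))
  elif [[ "$value" =~ ^[0-9]+$ ]]; then
    # Already in seconds: normalize (e.g. strip leading zeros).
    echo $(( 10#$value ))
  else
    # Assumption: any other format is treated as an error.
    echo "invalid --loop value: $value" >&2
    return 1
  fi
}
```

For example, loop_interval_seconds 180 prints 180 and loop_interval_seconds 2d prints 172800. GNU sleep on Ubuntu also accepts a d suffix directly (sleep 2d), so a loop could pass the value through unchanged; converting to seconds simply makes the interval explicit, for instance when logging it.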
To illustrate how the scraper's parameters are used, consider the following examples.
Scrape people and vote data for Armenia and Georgia, rerunning the scraper every 3 minutes (180 seconds):
bash run.sh --countries armenia,georgia --people yes --votes yes --loop 180
Run the scraper every 2 days to retrieve people and vote data from the Armenian parliament:
bash run.sh --countries armenia --people yes --votes yes --loop 2d
Run the scraper every day to retrieve people and vote data from all available parliaments:
bash run.sh --countries all --people yes --votes yes --loop 1d
Scrape people and vote data for the lower and upper houses of the Belarusian parliament, rerunning the scraper every 3 minutes (180 seconds):
bash run.sh --countries belarus-lowerhouse,belarus-upperhouse --people yes --votes yes --loop 180
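The --countries examples above pass either whole countries or individual houses (belarus-lowerhouse, belarus-upperhouse) as a single comma-separated value. A minimal Bash sketch of splitting such a value into separate entries, assuming a hypothetical split_countries function that is not part of run.sh; expanding the all keyword is not handled here:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of run.sh): print each entry of a
# comma-separated --countries value on its own line.
split_countries() {
  local -a countries
  # Split on commas; the IFS override applies only to this read command.
  IFS=',' read -r -a countries <<< "$1"
  printf '%s\n' "${countries[@]}"
}
```

For example, split_countries belarus-lowerhouse,belarus-upperhouse prints belarus-lowerhouse and belarus-upperhouse on separate lines, so each house can be scraped as its own target.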