Crawl all metadata about the Klima- und Energieregionen (Austrian climate and energy model regions) listed at https://www.klimaundenergiemodellregionen.at/modellregionen/liste-der-regionen/
- Getting Started
- Prerequisites
- Deployment
- Built With
- Contributing
- Roadmap
- Versioning
- Authors
- License
- Acknowledgments
- Project History
These instructions will get you a copy of the project up and running on your local machine for development. See deployment for notes on how to deploy the project on a live system.
What you need to install the software, and how to install it
I recommend using the setup_development.sh script by running
./setup_development.sh
but if you don't want to do that, here is the complete list of dependencies:
- Python 3.6.8
- Python 3 PIP 9.0.1
- Python Venv 3.6.7-1
- Scrapy 1.7.3
- Sqlalchemy 1.3.7
- Pyexcel 0.5.15
- Pyexcel-ods 0.5.6
- Pyexcel-xls 0.5.8
- Pyexcel-xlsxw 0.4.2
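If you install the dependencies by hand, a small check against the pinned versions can catch mismatches early. The following is a minimal sketch, not part of the project: the function names are hypothetical, and it assumes you pass it the text printed by pip freeze inside the virtual environment.

```python
import re

# Pinned versions copied from the dependency list above.
PINNED = {
    "Scrapy": "1.7.3",
    "Sqlalchemy": "1.3.7",
    "Pyexcel": "0.5.15",
    "Pyexcel-ods": "0.5.6",
    "Pyexcel-xls": "0.5.8",
    "Pyexcel-xlsxw": "0.4.2",
}


def normalize(name):
    """Lower-case a package name and treat '-', '_' and '.' as the same character."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_freeze(text):
    """Turn 'name==version' lines (as printed by pip freeze) into a dict."""
    installed = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, editable installs and anything without '=='.
        if not line or line.startswith(("#", "-e")) or "==" not in line:
            continue
        name, _, version = line.partition("==")
        installed[normalize(name)] = version.strip()
    return installed


def find_mismatches(pinned, installed):
    """Return {name: (wanted, found)} for each pinned package that differs.

    'found' is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        found = installed.get(normalize(name))
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

An empty result from find_mismatches(PINNED, parse_freeze(freeze_text)) means every listed Python package is installed in the pinned version. Python, PIP and Venv themselves are not covered by this check.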
Activate the environment
source venv/bin/activate
Change to the scrapy project
cd kem
Start the crawler
scrapy crawl getcontacts
After the crawler finishes, you'll want to export the data:
./export.py results.db kem getcontacts 1 KEM-Contacts_YYYY-MM-DD
where 1 is the job id and YYYY-MM-DD should be replaced by the date on which you crawled the website.
The job id is printed at the beginning of the log file log.txt, in a line of the form: Job ID is: XX.
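The lookup of the job id and the assembly of the export command can be scripted. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the job id is assumed to be a plain number, and the argument order simply mirrors the example command above.

```python
import re
from datetime import date

# Matches the log line described above, for example "Job ID is: 1".
# Assumption: the job id consists of digits only.
JOB_ID_PATTERN = re.compile(r"Job ID is:\s*(\d+)")


def find_job_id(log_text):
    """Return the first job id found in the log text, or None if there is none."""
    match = JOB_ID_PATTERN.search(log_text)
    return int(match.group(1)) if match else None


def build_export_command(job_id, crawl_date, database="results.db"):
    """Build the argument list for export.py, following the example command above."""
    output_name = "KEM-Contacts_" + crawl_date.isoformat()  # YYYY-MM-DD
    return ["./export.py", database, "kem", "getcontacts", str(job_id), output_name]
```

To use it, read log.txt, pass its content to find_job_id, and hand the list returned by build_export_command(job_id, date.today()) to subprocess.run. Passing date.today() only gives the right file name if you export on the same day you crawled.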
- Ubuntu 18.04.3 LTS - The operating system I use
- Sublime Text 3 - The code editor I use
- Python 3.6.8 - The programming language
- Python 3 PIP 9.0.1 - The package manager of the programming language
- Python Venv 3.6.7-1 - The virtual environment tool of the programming language
- Scrapy 1.7.3 - The crawling framework
- Sqlalchemy 1.3.7 - The database interface library
- Pyexcel 0.5.15 - For exporting to spreadsheet formats
- Pyexcel-ods 0.5.6 - For exporting as ODS spreadsheet
- Pyexcel-xls 0.5.8 - For exporting as XLS spreadsheet
- Pyexcel-xlsxw 0.4.2 - For exporting as XLSX spreadsheet
Please open an issue if you want to help or have questions.
Things I plan to implement but haven't yet:
- Change the database schema so that each crawler has its own table, and make the exporter export that specific table.
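One possible shape for this roadmap item is sketched below. It is an illustration only, not the planned design: it uses the standard-library sqlite3 module instead of Sqlalchemy to stay self-contained, and the items_ table prefix, the job_id column and all function names are hypothetical.

```python
import re
import sqlite3

# Only plain identifiers are allowed, because table and column names
# cannot be passed to SQLite as bound parameters.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_identifier(name):
    """Return the name unchanged, or raise ValueError if it is not a plain identifier."""
    if not IDENTIFIER.fullmatch(name):
        raise ValueError("unsafe identifier: %r" % name)
    return name


def table_for(crawler_name):
    """Derive the name of the table that belongs to one crawler."""
    return "items_" + check_identifier(crawler_name)


def create_table(conn, crawler_name, columns):
    """Create the crawler's own table with one TEXT column per scraped field."""
    cols = ", ".join('"%s" TEXT' % check_identifier(c) for c in columns)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS "%s" (job_id INTEGER, %s)'
        % (table_for(crawler_name), cols)
    )


def export_rows(conn, crawler_name, job_id):
    """Return all rows of one job, read only from that crawler's table."""
    cursor = conn.execute(
        'SELECT * FROM "%s" WHERE job_id = ?' % table_for(crawler_name),
        (job_id,),
    )
    return cursor.fetchall()
```

With this layout the exporter needs only the crawler name and the job id to find its data, and each crawler can define its own set of columns.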
We use SemVer for versioning. For the versions available, see the tags on this repository.
- Max Fuxjäger - Initial work - MaxValue
This project is licensed under the MIT License - see the LICENSE.txt file for details.
This project was created because I (Max) was asked to crawl this website.