Web Crawler - KEM Contacts

Crawls all metadata about the Klima- und Energie-Modellregionen (KEM, the Austrian climate and energy model regions) listed at https://www.klimaundenergiemodellregionen.at/modellregionen/liste-der-regionen/

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development. See Deployment for how to run the crawler and export the results.

Prerequisites

The easiest way to install everything the project needs is to run the setup_development_environment.sh script:

./setup_development_environment.sh

If you would rather set things up manually, the dependencies are listed in requirements.txt, and the versions I use are given under Built With.

Deployment

Activate the environment

source venv/bin/activate

Change into the Scrapy project directory

cd kem

Start the crawler

scrapy crawl getcontacts

After the crawler finishes, you'll want to export the data:

./export.py results.db kem getcontacts 1 KEM-Contacts_YYYY-MM-DD

where 1 is the job ID and YYYY-MM-DD is the date on which you crawled the website.

The job ID is printed near the beginning of the log file log.txt, in a line of the form: Job ID is: XX.
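The job ID lookup and the date substitution described above can also be scripted. The following is a minimal sketch and not part of the repository: the helper names find_job_id, build_export_command and main are hypothetical, and it assumes that log.txt contains a line of the form "Job ID is: XX" with a numeric ID and that it is run from the directory containing export.py, results.db and log.txt.

```python
import re
import subprocess
from datetime import date
from pathlib import Path
from typing import List, Optional

# Assumption: the log contains a line such as "Job ID is: 1" (numeric ID).
JOB_ID_PATTERN = re.compile(r"Job ID is:\s*(\d+)")


def find_job_id(log_text: str) -> Optional[int]:
    """Return the first job ID found in the log text, or None if there is none."""
    match = JOB_ID_PATTERN.search(log_text)
    return int(match.group(1)) if match else None


def build_export_command(job_id: int, crawl_date: date) -> List[str]:
    """Build the export.py argument list for the given job ID and crawl date."""
    output_name = f"KEM-Contacts_{crawl_date.isoformat()}"
    return ["./export.py", "results.db", "kem", "getcontacts", str(job_id), output_name]


def main() -> None:
    """Read the job ID from log.txt and run the export for today's date."""
    job_id = find_job_id(Path("log.txt").read_text(encoding="utf-8"))
    if job_id is None:
        raise SystemExit("No 'Job ID is: ...' line found in log.txt")
    # Assumption: the crawl ran today; pass a different date otherwise.
    subprocess.run(build_export_command(job_id, date.today()), check=True)
```

Calling main() after a crawl runs the same export command as above, with the job ID and the date filled in. If the export happens on a later day than the crawl, pass the crawl date to build_export_command explicitly instead of date.today().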

Built With

Ubuntu 18.04.3 LTS - The operating system I use
Sublime Text 3 - The code editor I use
Python 3.6.8 - The programming language
Python 3 pip 9.0.1 - The package manager of the programming language
Python Venv 3.6.7-1 - The virtual environment tool used to isolate the project's dependencies
Scrapy 1.7.3 - The crawling framework
SQLAlchemy 1.3.7 - The database interface library
Pyexcel 0.5.15 - For exporting to spreadsheet formats
Pyexcel-ods 0.5.6 - For exporting as ODS spreadsheet
Pyexcel-xls 0.5.8 - For exporting as XLS spreadsheet
Pyexcel-xlsxw 0.4.2 - For exporting as XLSX spreadsheet

Contributing

Please open an issue if you want to help or have questions.

Roadmap

Things I plan to implement but haven't yet:

Change the database schema so that each crawler has its own tables, and make the exporter export only the table of the selected crawler.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

Max Fuxjäger - Initial work - MaxValue

License

This project is licensed under the MIT License - see the LICENSE.txt file for details.

Project History

This project was created because I (Max) was asked to crawl this website.
