Please note that this project is released with a Contributor Code of Conduct. By participating in this project, you agree to abide by its terms.
Since Diário Oficial seeks to become a trustful source of official governmental announcements, every data source must also be trustful. A private company publishing government gazettes without official permission, for instance, is not a trustful source.
At this moment, the project aims for the 100 top Brazilian municipalities by population, collecting all the gazettes since 2015. If you want to help us reach this goal, we're tracking the progress of crawlers in CITIES.md file. Have a look at it if you want to know where contributors are already working.
Starting to contribute
If you have never contributed to any open source project, there are tasks just for you. They are labeled "good first issue". Before starting, drop a comment saying you plan to work on it and if you have significant questions.
Sometimes the community organizes events for solving multiple issues in a short amount of time. Especially in these cases, an issue or pull request may be considered abandoned when is waiting for the contributor but doesn't receive updates for a couple of weeks. Just let us know you need more time, so we don't end up closing it by mistake.
Please refer to the README file to set up the project. Also, read the
Makefile for understanding the available tasks and what they run.
Writing crawlers for new municipalities
For collecting the gazettes from the official websites, we use a crawling framework
called Scrapy. You may find its
official tutorial helpful
to get started with the architecture. Our project can be found in the
Two commands may especially be useful:
scrapy shell and
Scrapy has a shell interface for experimenting with crawlers, or spiders (how prefers to call it). To see how the framework reads a webpage before writing a spider for it, try the following, where you can replace the URL for a different municipality website:
$ scrapy shell http://www2.portoalegre.rs.gov.br/dopa/
For running an existing spider, please refer to the README file.
Automated code formatting
If you followed the setup instructions, installing pre-commit hooks, it is possible that you will never
need to run these tools manually, as they will be execute before each commit. However, if you want
to run them in all files in the project, you have
make format command that will call these tools.
Guidelines to maintainers
This section is a space for project maintainers, since as code review and repository organization are part of the contribution process, all code review guidelines are of public interest to anyone in the community.
Responsibilities of a Querido Diario maintainer:
- Respect the code of conduct and ensure that anyone who may have suffered harassment has a help channel.
- Always justify a suggestion according to the practices already adopted in the project, legibility and simplicity. It is essential for a civil project to have a structure as easy as possible for beginners.
- The maintainers should run all the spiders before merging it.
- If a Pull Request have many commits and their messages are not clear, use the Squash and merge option. It is not necessary to do squash when the commits discipline is good, as an example: Pull Request.