Galician Official Journal (DOGA) Scraper
The easiest way to access the DOGA dispositions seems to be the DOGA search page, filtered by the desired dates.
This scraper was created and used as a source for the publication of this data journalism story.
The script expects a year as an input parameter and scrapes all the available documents into the data folder (created automatically). Inside it, the script creates a folder named after the year passed as an argument and stores the documents in two formats: PDF and HTML.
If unexpected behaviour is found, the script logs the details inside the logs folder (created automatically).
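The folder layout described above could be sketched roughly as follows. This is only an illustration, not the scraper's actual code: the helper names `prepare_output_dirs` and `store_document`, and the choice of one subfolder per format, are assumptions made for the example.

```ruby
require 'fileutils'

# Hypothetical helper: builds <base_dir>/<year>/pdf and <base_dir>/<year>/html
# and returns a hash mapping each format to its folder path.
def prepare_output_dirs(year, base_dir = 'data')
  %w[pdf html].map do |format|
    dir = File.join(base_dir, year.to_s, format)
    FileUtils.mkdir_p(dir) # creates missing parents; no error if the folder exists
    [format, dir]
  end.to_h
end

# Hypothetical helper: writes one document body into the folder for its format
# and returns the path of the written file.
def store_document(dirs, name, format, body)
  path = File.join(dirs.fetch(format), "#{name}.#{format}")
  File.binwrite(path, body) # binary write, so PDF bytes are stored untouched
  path
end
```

With these helpers, `dirs = prepare_output_dirs(2014)` followed by `store_document(dirs, 'example', 'html', page_body)` would write `data/2014/html/example.html`, where `page_body` stands for whatever content was downloaded.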
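One possible way to get this kind of logging, using Ruby's standard `Logger`, is shown below. The file name `scraper.log` and the helpers `build_logger` and `with_error_logging` are assumptions for this sketch; the real script may organise its logs differently.

```ruby
require 'fileutils'
require 'logger'

# Hypothetical helper: creates the logs folder if needed and opens a log file in it.
def build_logger(log_dir = 'logs')
  FileUtils.mkdir_p(log_dir)
  Logger.new(File.join(log_dir, 'scraper.log'))
end

# Hypothetical wrapper: runs the block and, if it raises, records the error
# with some context instead of aborting the whole scrape. Returns nil on error.
def with_error_logging(logger, context)
  yield
rescue StandardError => e
  logger.error("#{context}: #{e.class}: #{e.message}")
  nil
end
```

Wrapping each document download in `with_error_logging(logger, "document #{id}") { ... }` would let a long run continue past a single failing page while still leaving a trace to inspect afterwards.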
- require 'mechanize'
- require 'fileutils'
- require 'pty' # To buffer out the stdout
Execution of the script
To run the script:
$ rake scrape:DOGA