Skip to content

offenesdresden/ratsinfo-scraper

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
lib
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

ratsinfo-scraper

Scrape documents with associated metadata from http://ratsinfo.dresden.de

INSTALLATION

Get the code

git clone https://github.com/Mic92/ratsinfo-scraper.git

Get ruby (>= 2.0.0)

gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3
\curl -sSL https://get.rvm.io | bash -s stable
rvm install 2.2.3
rvm --default use 2.2.3

Install bundler

gem install bundler

Install Dependencies

cd ratsinfo
bundle install

USAGE

To start scraping use on other console:

rake

This will extract all documents to the path of the environment variable DOWNLOAD_PATH (defaults to "data") and convert it to xml files, containing metadata and full text of the pdfs

To scrape an individual session for example: http://ratsinfo.dresden.de/to0040.php?__ksinr=100

rake testmonth

To just download a tiny set of Data, only session data. Just for testing.

To display all tasks use:

rake -T

The download directory will have the following scheme:

  • each session have a directory, where the id is the directory name

  • every document belonging to this session will be extracted to this directory

  • additionally a JSON file is created, with the session id in its name. This is a machine-readable version of the index.htm file, which is contained in the document archives

We do now follow the OParl specification!

Deviations from the OParl spec:

  • Numerical id everywhere, because we don't yet serve the data on HTTP URIs

About

Scrape documents with associated metadata from http://ratsinfo.dresden.de

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published