Transform unstructured document collections to structured Linked Data

Ferenda is a Python library and framework for transforming unstructured document collections into structured Linked Data. It helps with downloading documents, parsing them to add explicit semantic structure and RDF-based metadata, finding relationships between documents, and publishing the results, including through a REST-based HTTP API.

Quick start

This example uses ferenda's project framework to download the 50 latest RFCs and W3C standards, parse them into structured, RDF-enabled XHTML documents, load all RDF metadata into a triplestore, and generate a web site of static HTML5 files that are usable offline:

pip install ferenda
ferenda-setup myproject
cd myproject
./ferenda-build.py ferenda.sources.tech.RFC enable
./ferenda-build.py ferenda.sources.tech.W3Standards enable
./ferenda-build.py all all --downloadmax=50 --staticsite --fulltextindex=False
open data/index.html

The same functionality can also be accessed through a Python API, if you want to use ferenda as part of a larger system. It's also possible to use just the parts of ferenda that you need (e.g. only the downloading and parsing features).
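
Short of importing ferenda directly, a larger system can also script the quick-start commands shown above. The following is a minimal sketch of that approach, not an example of ferenda's own Python API (see the documentation for that). The helper names ``build_commands`` and ``run_quickstart`` are hypothetical, and the sketch assumes a project already created with ``ferenda-setup myproject``:

```python
import subprocess

# Repository classes enabled in the quick start above.
REPOS = ("ferenda.sources.tech.RFC", "ferenda.sources.tech.W3Standards")


def build_commands(repos=REPOS, downloadmax=50):
    """Return the quick-start command lines as argument lists (hypothetical helper)."""
    script = "./ferenda-build.py"
    # One "enable" command per repository class.
    commands = [[script, repo, "enable"] for repo in repos]
    # Then a single run of every action for every enabled repository.
    commands.append([
        script, "all", "all",
        f"--downloadmax={downloadmax}", "--staticsite", "--fulltextindex=False",
    ])
    return commands


def run_quickstart(project_dir="myproject", dry_run=True):
    """Print each command (dry run) or execute it inside project_dir."""
    for cmd in build_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a step fails.
            subprocess.run(cmd, cwd=project_dir, check=True)
```

Calling ``run_quickstart()`` only prints the commands. Passing ``dry_run=False`` executes them, which downloads documents over the network and may take a while.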

More information

See http://ferenda.readthedocs.org/ for in-depth documentation.

Copyright and license

Most of the code is written by Staffan Malmgren and licensed under the main 2-clause BSD license.

Some bundled code is written by other authors and included in accordance with their respective licenses: