Invenio module to interact with Grobid API for metadata extraction from PDF.
- Free software: GPLv2 license
- Documentation: https://invenio-grobid.readthedocs.org.
This is an experimental developer preview release.
This module provides an interface for uploading PDFs to a Grobid instance and lets you submit the extracted metadata to a configurable callback.
NOTE: This package assumes you have set up a local Grobid REST service. For more information about this, read the official Grobid documentation.
pip install invenio-grobid
Note that you also need a running Grobid REST service.
Add the invenio_grobid package to your Invenio PACKAGES config in your overlay/config.py so that it is picked up by the Invenio application loader.
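As a rough sketch, the PACKAGES entry might look like the following. The surrounding list contents are placeholders: keep whatever your overlay already lists and only append the new entry.

```python
# overlay/config.py (sketch only; your overlay's real list will differ)
PACKAGES = [
    # ... the packages your overlay already lists go here ...
    'invenio_grobid',  # lets the application loader discover the module
]
```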
Configure the URL to your Grobid REST service with GROBID_HOST:
inveniomanage config set GROBID_HOST 'http://localhost:8080'
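Before uploading anything, it can help to confirm that something is listening at the configured address. The following is a minimal Python 3 sketch that only opens a TCP connection to the host and port of the URL; it does not call any Grobid endpoint and does not prove the listener is Grobid. The helper names `host_port` and `is_reachable` are hypothetical and not part of this module.

```python
import socket
from urllib.parse import urlparse


def host_port(url):
    """Split a URL such as 'http://localhost:8080' into (host, port)."""
    parsed = urlparse(url)
    # Fall back to the scheme's default port when none is given.
    port = parsed.port or (443 if parsed.scheme == 'https' else 80)
    return parsed.hostname, port


def is_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == '__main__':
    # Same value as the GROBID_HOST example above.
    print(is_reachable('http://localhost:8080'))
```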
If you want to change the handler that receives the metadata after extraction, update GROBID_RESULT_HANDLER:
inveniomanage config set GROBID_RESULT_HANDLER 'my_overlay.grobid:upload_handler'
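The value above is an import string of the form module:callable. A handler in my_overlay/grobid.py could be sketched roughly as below. This README does not document how the handler is called, so the single dict-like `metadata` argument and the `title` key are assumptions made for illustration; check the module documentation for the real signature.

```python
# my_overlay/grobid.py (hypothetical module matching the import string above)
import logging

logger = logging.getLogger(__name__)


def upload_handler(metadata):
    """Receive the metadata extracted by Grobid.

    ASSUMPTION: the handler is called with one dict-like argument.
    The actual call signature is not specified in this README.
    """
    # 'title' is an illustrative key, not a documented field.
    title = metadata.get('title', '<untitled>')
    logger.info('Received Grobid metadata for %s', title)
    # Replace this with your own logic, e.g. creating a record.
    return metadata
```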
The uploader interface is available under the /grobid endpoint by default, e.g. http://localhost:4000/grobid
- Choose a PDF to extract metadata from and hit Upload.
- Wait a bit and the metadata will be displayed.
- Click the Submit button to push the metadata to your GROBID_RESULT_HANDLER.
Special thanks to Joseph Boyd (@jcboyd) and Gilles Louppe (@glouppe) for Grobid support.
Happy hacking and thanks for flying Invenio Grobid.