License: CC BY 4.0 DOI

LEPIDEMO: LECTAUREP PIPELINE DEMONSTRATOR

Going from eScriptorium to TEI-Publisher

This demonstration implements a pipeline going from PAGE XML to TEI Publisher, developed as part of the LECTAUREP project.

LECTAUREP is a project jointly led by Inria (ALMAnaCH) and the Archives nationales de France (DMC). Its purpose is to facilitate the exploration of thousands of pages of directories listing the minutes and deeds drawn up by Parisian notaries between the beginning of the 19th century and the mid-20th century. To do so, LECTAUREP relies on automatic transcription performed with Kraken via the eScriptorium web application.

Images are loaded onto the platform, then transcribed and annotated, and finally exported as PAGE XML files. The last stage of the pipeline aims to offer users a platform to visualise, query and read the pages of the directories. An almost ready-to-use solution consists in using TEI-Publisher, which requires transforming the PAGE XML files into compliant TEI XML.

LEPIDEMO demonstrates how this transformation can be plugged into eScriptorium as a simple Python script.
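To give an idea of what such a transformation involves, here is a minimal sketch using only the Python standard library. It is not the LEPIDEMO script itself: the function names (`page_lines`, `lines_to_tei`), the sample document and the very reduced TEI structure (a bare header and a single paragraph with one `lb` per line) are assumptions made for illustration. The sketch reads the line-level transcriptions from `TextLine/TextEquiv/Unicode` in a PAGE XML export and wraps them in a small TEI document.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, heavily reduced PAGE XML export used only for this example.
SAMPLE_PAGE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page_001.jpg">
    <TextRegion id="r1">
      <TextLine id="l1"><TextEquiv><Unicode>Premier acte</Unicode></TextEquiv></TextLine>
      <TextLine id="l2"><TextEquiv><Unicode>Second acte</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def page_lines(page_xml):
    """Return the transcription of every TextLine found in a PAGE XML string."""
    root = ET.fromstring(page_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Only look at TextEquiv elements that are direct children of the line.
        for text_equiv in element:
            if local_name(text_equiv.tag) != "TextEquiv":
                continue
            for unicode_el in text_equiv:
                if local_name(unicode_el.tag) == "Unicode":
                    lines.append(unicode_el.text or "")
    return lines


def lines_to_tei(lines, title="Untitled page"):
    """Wrap transcribed lines in a minimal TEI document (assumed structure)."""
    ET.register_namespace("", TEI_NS)

    def tei(name):
        return f"{{{TEI_NS}}}{name}"

    root = ET.Element(tei("TEI"))
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    publication = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(publication, tei("p")).text = "Unpublished example"
    source = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source, tei("p")).text = "Converted from PAGE XML"

    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    paragraph = ET.SubElement(body, tei("p"))
    for line in lines:
        # Each physical line starts with <lb/>; its text is the tail of that element.
        ET.SubElement(paragraph, tei("lb")).tail = line
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(lines_to_tei(page_lines(SAMPLE_PAGE), title="page_001"))
```

A real conversion would also need to carry over regions, annotations and image coordinates, and to produce whatever TEI structure the TEI-Publisher application expects; the notebook is the reference for how LEPIDEMO actually does this.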

A Jupyter notebook

The demonstration can be followed step by step in the lepidemo.ipynb Jupyter notebook.

Installation

  • Create a Python virtual environment: `virtualenv -p python3 [ENVIRONMENT NAME]`
  • Activate it: `source [ENVIRONMENT NAME]/bin/activate`
  • Launch Jupyter: `jupyter notebook`
  • Open `lepidemo.ipynb` in the Jupyter file browser, then follow the instructions in the cells.

Cite this work

Chagué, A., & Scheithauer, H. LEPIDEMO, a Pipeline Demonstrator for LECTAUREP to go from eScriptorium to TEI-Publisher [Computer software]