Sandbox for an open-data version of the Digital Index of Middle English Verse.

icornelius/dimev-sandbox

This repository is a sandbox in which to prototype tools for cleanup, transformation, and validation of data curated by editors of DIMEV: An Open Access, Digital Edition of the "Index of Middle English Verse". Researchers interested in Middle English verse should consult dimev.net, not this repository, as the XML source files in this repository are a snapshot and will not be updated. They are for testing only. Commentary is welcome.

The repository also hosts source files for an experimental new DIMEV website, built with Jekyll and hosted by GitHub Pages. All this is very much work in progress. An inspiration is Andrew Dunning's prototype for a digital edition of Richard Sharpe, A Handlist of Latin Writers of Great Britain and Ireland Before 1540.

Repository contents

  • artefacts/ Warnings, reports, and CSV artefacts produced by the scripts in scripts/. Transformed source data are written to docs/ instead, for use by the Jekyll website builder.
  • DIMEV_XML/ DIMEV source files as of May 2023.
  • docs/ Source files and templates for a website. The contents of docs/_items/ are written by scripts/transform-Records.py.
  • schemas/ JSON schemas for validation of transformed source files.
  • scripts/ Python scripts for review and transformation of the files in DIMEV_XML. For details see comments at the head of each file.

Technical direction

  • Records.xml will be atomized (one file per <record>) to make effective use of Git distributed version control. Data will be parsed to identify irregularities, remediated (manually where necessary), and written to a new consistent structure. For instance, any field that may be an array must be an array (even if an array of one). After migration, subsequent updates to any file must validate against a schema. Early prototypes of data files are in docs/_items/. An early prototype of the schema is schemas/records.json. Cross references (i.e., those <record> items without an @xml:id) will be handled differently, tbd.
  • Manuscripts.xml and MSSIndex.xml will be de-duplicated. Data will be atomized (one file per <item>), parsed, remediated, and written to a new consistent structure. For an early partial prototype, see the output of scripts/transform-Manuscripts.py. Inscriptions.xml and PrintedBooks.xml will be handled similarly. After migration, subsequent updates to any file must validate against a schema.
  • Bibliography.xml. Data will be parsed and remediated (as above), written to a standard bibliographic data format, and imported to Zotero for distribution and curation on that platform. For a prototype of this conversion, see artefacts/bibliography.yaml; the schema is schemas/csl-data.json. To import tags we must target a format other than CSL JSON, per this discussion. Tags will be used to link bibliographic items to their objects, as in the Bodleian Library's bibliographical references for Western manuscripts. Links to online facsimiles of manuscripts will be handled differently, probably as a field within the data structure for manuscripts.
  • Glossary.xml tbd.
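
The atomization and array rule described for Records.xml can be sketched roughly as follows. This is a minimal illustration, not the logic of scripts/transform-Records.py: the child element names (`title`, `author`), the sample data, the `ARRAY_FIELDS` set, the helper names, and the choice of JSON as output format are all assumptions made for the example. Only the standard library is used.

```python
import json
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# ElementTree exposes the xml:id attribute under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample; these element names are not taken from Records.xml.
SAMPLE = """<records>
  <record xml:id="record-1">
    <title>First sample</title>
    <author>Anon.</author>
  </record>
  <record xml:id="record-2">
    <title>Second sample</title>
    <author>A</author>
    <author>B</author>
  </record>
  <record><title>Cross reference</title></record>
</records>"""

# Hypothetical: fields that may repeat, and so are always written as arrays.
ARRAY_FIELDS = {"author"}


def record_to_dict(record):
    """Convert one <record> element to a dict with consistent field types."""
    data = {"id": record.get(XML_ID)}
    for child in record:
        text = (child.text or "").strip()
        if child.tag in ARRAY_FIELDS:
            # Always a list, even when there is only one value.
            data.setdefault(child.tag, []).append(text)
        else:
            data[child.tag] = text
    return data


def atomize(xml_text, out_dir):
    """Write one JSON file per <record> that has an xml:id; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record in ET.fromstring(xml_text).iter("record"):
        if record.get(XML_ID) is None:
            continue  # cross references are skipped here (handling is tbd)
        data = record_to_dict(record)
        path = out_dir / f"{data['id']}.json"
        path.write_text(
            json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8"
        )
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in atomize(SAMPLE, tmp):
            print(path.name, json.loads(path.read_text(encoding="utf-8")))
```

Keeping one file per record means a Git diff touches only the records that changed. Validation of each written file against schemas/records.json would be a separate step and is not shown here.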
