jabrah edited this page Feb 5, 2015 · 14 revisions

The archive structure has been adapted from its original form, which contained only a single collection (Roman de la Rose), to accommodate new collections, such as the manuscripts in the Pizan collection and the heavily annotated early printed books in the Archaeology of Reading collection.

ARCHIVE/
   |
   | --- COLLECTION1/
   |        | --- <required files>
   |        | --- BOOK1/
   |        | --- BOOK2/
   |       ...      | --- <images>
   |                | --- <other files>
   |
   | --- COLLECTION2/
  ...       |
           ...

Importing Data Into the Archive

Archive

The archive base directory contains directories for each collection. If any other files exist in this directory, they will have no effect on the rest of the data.
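As a rough illustration of the rule above, the following sketch lists the collections in an archive by keeping only the subdirectories of the base directory and skipping any stray files. The function name `list_collections` is hypothetical and is not part of the archive tools.

```python
from pathlib import Path


def list_collections(archive_root):
    """Return the names of the collection directories under the archive root.

    Hypothetical helper: only directories are treated as collections,
    so any other files in the base directory are simply skipped.
    """
    root = Path(archive_root)
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())
```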

Collection

Each collection directory contains a separate directory for each manuscript. Alongside the manuscript directories, several files hold information that is shared between all manuscripts. These files are required for all collections.

  • character_names.csv : a CSV (comma-separated values) document listing characters that appear in the Rose poem, with their spelling variants. Columns: character ID number, "Site name", French variant, English name
  • illustration_titles.csv : a CSV document mapping simple IDs to full titles of illustrations that appear in many Rose manuscripts. Columns: image ID number, title
  • narrative_sections.csv : a CSV document containing information about the different narrative sections of the Rose poem. Columns: Section, Lines, Lecoy, Description
  • config.properties : a properties file that includes relevant configuration information, such as the supported languages for the collection
  • missing_image.tif : a placeholder image that should be displayed if a folio is missing.
  • .SHA1SUM : checksum information about the files in this directory.
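A minimal sketch of how the required files listed above could be checked for a collection. The file names come from the list; the function name `missing_collection_files` is an assumption made for illustration and does not describe the actual import tools.

```python
from pathlib import Path

# Required collection-level files, as listed above.
REQUIRED_COLLECTION_FILES = (
    "character_names.csv",
    "illustration_titles.csv",
    "narrative_sections.csv",
    "config.properties",
    "missing_image.tif",
    ".SHA1SUM",
)


def missing_collection_files(collection_dir):
    """Return the required file names that are absent from the collection directory."""
    collection = Path(collection_dir)
    return [
        name
        for name in REQUIRED_COLLECTION_FILES
        if not (collection / name).is_file()
    ]
```

An empty list means every required file was found; the check looks only at presence, not at the contents of the files.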

Manuscript (or book)

In each manuscript directory, some files are required, while others are optional. The name of a manuscript directory defines its manuscript ID (MSID). Files must follow a specific naming scheme in order to be identified by our tools: every file name should use the MSID as a prefix (MSID.filename.extension).
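The naming scheme can be sketched as a pair of small helpers. Both function names and the MSID `ExampleMS` used below are hypothetical; only the `MSID.filename.extension` pattern comes from the description above.

```python
def manuscript_file_name(msid, name, extension):
    """Build a file name of the form MSID.filename.extension."""
    return f"{msid}.{name}.{extension}"


def has_msid_prefix(msid, file_name):
    """Check that a file name starts with the MSID followed by a dot."""
    return file_name.startswith(msid + ".")


# Example with a made-up MSID:
image_list = manuscript_file_name("ExampleMS", "images", "csv")
print(image_list, has_msid_prefix("ExampleMS", image_list))
```

Including the trailing dot in the prefix check keeps a directory named `ExampleMS` from claiming files that belong to `ExampleMS2`.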

Required:

  • Image list (MSID.images.csv) : a CSV document that lists all image names along with each image's width and height in pixels.
  • Description (MSID.description_language.xml) : an XML document in a particular language. This document contains a prose description of the manuscript along with structural metadata. There must be one description file for each supported language.
  • Permission (MSID.permission_language.html) : an HTML document with a permission statement allowing us to use this manuscript. The file may contain just an HTML snippet rather than a complete HTML document.
  • Checksums (MSID.SHA1SUM) : file holding checksum information for all files in this directory. Each line of this file must contain a file name and its current SHA-1 checksum value.
  • Images : the image files, in TIFF format. They follow a strict naming convention, described on the Rosa File Naming page.
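The checksum file can be used to detect missing or altered files. The sketch below assumes each line follows the common `sha1sum` layout (hex digest, whitespace, file name); the text above only says that each line holds a file name and its SHA-1 value, so the column order is an assumption, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(manuscript_dir):
    """Return a list of (file name, problem) pairs for a manuscript directory.

    Assumes lines of the form "<sha1 hex digest>  <file name>" in
    MSID.SHA1SUM, where MSID is the name of the directory.
    """
    manuscript = Path(manuscript_dir)
    checksum_file = manuscript / f"{manuscript.name}.SHA1SUM"
    problems = []
    for line in checksum_file.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = line.split(None, 1)
        if len(parts) != 2:
            problems.append((line, "malformed"))
            continue
        expected, file_name = parts
        # sha1sum marks binary mode with a leading "*" on the file name.
        file_name = file_name.strip().lstrip("*")
        target = manuscript / file_name
        if not target.is_file():
            problems.append((file_name, "missing"))
        elif sha1_of_file(target) != expected.lower():
            problems.append((file_name, "mismatch"))
    return problems
```

An empty result means every listed file exists and matches its recorded checksum. Files that are present but not listed in the checksum file are not reported by this sketch.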

Optional:

  • Cropped image list (MSID.images.crop.csv) : a CSV document that lists all cropped image names along with each image's width and height in pixels.
  • Cropped image directory (cropped) : a directory containing the cropped images. This directory should be present if the cropped image list is present.
  • Image/Illustration tagging (MSID.imagetag.csv) : a CSV document containing information about the illustrations that appear in the manuscript. Columns: illustration ID, folio it appears on, illustration title, textual elements, initials, characters, costume, objects, landscape, architecture, other. Illustration titles in this document contain IDs that point to 'illustration_titles.csv' found in the collection directory.
  • Narrative tagging (MSID.nartag.csv) : a CSV document with information about the different narrative sections that appear in the manuscript. Parts of this data will point back to 'narrative_sections.csv' found in the collection directory.
  • Reduced tagging (MSID.redtag.txt) : a text file with scholarly tagging of the manuscript.
  • Transcriptions (MSID.transcription.xml) : an XML file with transcriptions of the text in the manuscript. Transcriptions are identified by folio, column, and line.
  • Annotations (MSID.aor.page.xml) : XML files containing annotations made by readers in the book.
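One dependency among the optional files is that the cropped image directory should accompany the cropped image list. A minimal sketch of that check, with a hypothetical function name and assuming the MSID is the name of the manuscript directory:

```python
from pathlib import Path


def cropped_images_consistent(manuscript_dir):
    """Return True unless the cropped image list exists without a 'cropped' directory."""
    manuscript = Path(manuscript_dir)
    crop_list = manuscript / f"{manuscript.name}.images.crop.csv"
    crop_dir = manuscript / "cropped"
    # No cropped list means nothing to check; otherwise the directory must exist.
    return (not crop_list.is_file()) or crop_dir.is_dir()
```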

Other files can appear in the manuscript directory, but they will be ignored by our tools. For example, some additional files may be present in a manuscript directory to aid in importing the data.