
Current Architecture

jabrah edited this page Nov 6, 2014 · 18 revisions

Source code

The source code is stored on GitHub in the rosa repository and is organized into Maven modules.

Documentation

Much of the source code lacks Javadoc. Other documentation is scattered across the JHU library wiki.

Archive layer

  • Code is contained in the rosa-core and rosa-tool modules.
  • Each collection of content, such as rose or pizan, is stored in a directory hierarchy.
  • The server holding the files backs them up to tape automatically.
  • The directories are shared over NFS.
  • The files in each collection must be named in certain ways and have certain formats and relationships to each other.

A Java framework with a command-line interface has been written to work with this content. It checks that the content meets various requirements, helps import new content, and performs operations on existing content. It can index the content with Lucene in the form the website requires, and it can transform the content into the formats needed for inclusion in the website.

The rose and pizan collections use the same file formats.

Archive Command Line Tool

Operations

  • update [-f] [-t threads] DERIVNAME ARCHIVENAME
    • Options:
      • -f: force
      • -t <threads>: specify number of threads to use
  • export-balaur-collection DIRECTORY
  • check-files NAME1 NAME2 ...
  • validate DERIVNAME ARCHIVENAME
  • check-bnf DERIVNAME ARCHIVENAME
  • split-bnf CHECKSUMS BNF_DIRECTORY
  • create-descriptions-from-spreadsheet TEMPLATE_FILE SPREADSHEET_FILE OUTPUT_FILE
  • prepare-bnf OUTPUT_DIRECTORY BNF_DIRECTORY
  • move-bnf COLLECTION_DIRECTORY MANUSCRIPT_DIRECTORY
  • replace-imagetag-illus-title-ids
  • conf-get COOKIE TYPE DESTINATION_DIRECTORY
    • TYPE
      • pages
      • img
      • trans
      • nartag
      • redtag
      • perm
      • desc
      • bib
  • print-ms-file-names MANUSCRIPT_NAME NUM_FOLIOS
  • print-paginated-file-names MANUSCRIPT_NAME NUM_PAGES
  • print-reduced-tagging-template BOOK_ID NUM_FOLIOS
  • ls
  • list
  • rename-files CSV_MAP OLD_DIRECTORY NEW_DIRECTORY
  • import-files FILE1 ...
  • test-narrative-mapping ARCHIVE_NAME USE_SYNCH_POINTS
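The usage line for `update` can be parsed roughly as sketched below. This is a hypothetical illustration, not the rosa-tool implementation: the class name `UpdateArgs`, the default of one thread when `-t` is omitted, and accepting `-f` anywhere in the argument list are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical parser for the documented usage line:
 *   update [-f] [-t threads] DERIVNAME ARCHIVENAME
 * Names and defaults are illustrative, not the rosa-tool API.
 */
final class UpdateArgs {
    final boolean force;
    final int threads;
    final String derivName;
    final String archiveName;

    private UpdateArgs(boolean force, int threads, String derivName, String archiveName) {
        this.force = force;
        this.threads = threads;
        this.derivName = derivName;
        this.archiveName = archiveName;
    }

    /** Parses the arguments that follow the "update" command name. */
    static UpdateArgs parse(String[] args) {
        boolean force = false;
        int threads = 1; // assumed default; the real tool's default is not documented here
        List<String> positional = new ArrayList<>();

        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-f")) {
                force = true;
            } else if (arg.equals("-t")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-t requires a thread count");
                }
                // NumberFormatException (an IllegalArgumentException) for non-numeric values
                threads = Integer.parseInt(args[++i]);
                if (threads < 1) {
                    throw new IllegalArgumentException("thread count must be positive");
                }
            } else {
                positional.add(arg);
            }
        }

        if (positional.size() != 2) {
            throw new IllegalArgumentException(
                "usage: update [-f] [-t threads] DERIVNAME ARCHIVENAME");
        }
        return new UpdateArgs(force, threads, positional.get(0), positional.get(1));
    }

    public static void main(String[] argv) {
        UpdateArgs a = parse(new String[] {"-f", "-t", "4", "rosedata", "rose"});
        System.out.println(a.force + " " + a.threads + " " + a.derivName + " " + a.archiveName);
    }
}
```

Keeping the options in a small immutable value object makes it easy to hand the parsed settings to whatever performs the update without passing raw strings around.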

Archive File System Structure

Current archive file hierarchy

  • Top Level - Collection directory
    • narrative_sections.csv
    • character_names.csv
    • illustration_titles.csv
    • missing_image.tif (Image to display for missing folios)
    • missing.txt (all folios in the collection that are missing images)
    • Zero or more directories containing individual manuscripts/books/etc. Directory name = ID
      • Images (see Rose File Naming for how they are named)
      • <ID>.crop.txt
      • <ID>.description_(en|fr).xml
      • <ID>.images.csv (name and dimensions of all images in a book)
      • <ID>.images.crop.csv
      • <ID>.imagetag.csv
      • <ID>.nartag.csv
      • <ID>.redtag.txt
      • <ID>.permission_(en|fr).html
      • <ID>.SHA1SUM
      • (sometimes) <ID>.transcription.<page>.txt
      • <ID>.transcription.xml
      • <ID>.bnf.filemap.csv
      • <ID>.bnf.foliation.xml
      • <ID>.bnf.MD5SUM
      • (directory) cropped
        • cropped images
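Each book directory carries an `<ID>.SHA1SUM` file. The sketch below shows one way such content could be generated with the JDK's `MessageDigest`, assuming the file follows the GNU `sha1sum` layout (hex digest, two spaces, file name) and covers the regular files directly inside the book directory, excluding itself. Both assumptions and the class name `Sha1Sums` are hypothetical; the archive's actual format and scope may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper that produces checksum lines for one book directory. */
final class Sha1Sums {
    /** Streams the file through SHA-1 so large images are not loaded into memory at once. */
    static String sha1Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /**
     * Returns "digest  filename" lines (assumed sha1sum layout) for the regular
     * files directly in bookDir, sorted by path, excluding the ID.SHA1SUM file.
     * Subdirectories such as "cropped" are skipped.
     */
    static List<String> checksumLines(Path bookDir, String id) throws IOException {
        String sumFile = id + ".SHA1SUM";
        List<Path> files;
        try (Stream<Path> entries = Files.list(bookDir)) {
            files = entries
                .filter(Files::isRegularFile)
                .filter(p -> !p.getFileName().toString().equals(sumFile))
                .sorted()
                .collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path f : files) {
            lines.add(sha1Hex(f) + "  " + f.getFileName());
        }
        return lines;
    }
}
```

Writing these lines to `<ID>.SHA1SUM` would, under the layout assumption above, let the standard `sha1sum -c` command verify a book directory independently of the Java tool.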

Required files:

  • In collection: narrative_sections.csv, illustration_titles.csv, character_names.csv, missing_image.tif
  • In book: descriptions, images.csv, permissions, SHA1SUM
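A check for these required files could look like the following minimal sketch. It assumes the missing-image file is named `missing_image.tif`, as in the hierarchy listing, and that a book satisfies the description and permission requirements with either the `en` or the `fr` variant. `RequiredFiles` and its methods are hypothetical names, not part of rosa-core.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checker for the required collection and book files. */
final class RequiredFiles {
    static final String[] COLLECTION_FILES = {
        "narrative_sections.csv", "illustration_titles.csv",
        "character_names.csv", "missing_image.tif"
    };

    /** Returns one message per required collection file that is absent. */
    static List<String> checkCollection(Path collectionDir) {
        List<String> problems = new ArrayList<>();
        for (String name : COLLECTION_FILES) {
            if (!Files.isRegularFile(collectionDir.resolve(name))) {
                problems.add("missing collection file: " + name);
            }
        }
        return problems;
    }

    /** The directory name is the book ID, so file names are derived from it. */
    static List<String> checkBook(Path bookDir) {
        String id = bookDir.getFileName().toString();
        List<String> problems = new ArrayList<>();
        // Assumption: one language variant is enough for descriptions and permissions.
        requireOne(bookDir, problems, id + ".description_en.xml", id + ".description_fr.xml");
        requireOne(bookDir, problems, id + ".images.csv");
        requireOne(bookDir, problems, id + ".permission_en.html", id + ".permission_fr.html");
        requireOne(bookDir, problems, id + ".SHA1SUM");
        return problems;
    }

    /** Records a problem unless at least one of the candidate files exists. */
    private static void requireOne(Path dir, List<String> problems, String... candidates) {
        for (String c : candidates) {
            if (Files.isRegularFile(dir.resolve(c))) {
                return;
            }
        }
        problems.add("missing book file: " + String.join(" or ", candidates));
    }
}
```

Returning a list of problems rather than failing on the first one lets a command such as `validate` report everything wrong with a collection in a single run.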

Websites

Rose, Pizan

The code is in the rose-website and pizan-website modules, with shared code in rosa-website-common. The two sites have similar functionality and UI but different skins, and each has some pages with functionality the other lacks. Both are built on GWT and run on Tomcat. They share a fair amount of code, but that sharing was introduced by refactoring after the fact.

An RPC service is used to search a Lucene index contained in the WAR.

The Maven POMs for these sites invoke the rosa command line tool to generate data for inclusion in the WAR, including various metadata and a Lucene index.

The website code itself is not well organized. It started life on one of the first GWT releases and grew over time. It should be refactored to take advantage of newer GWT features and restructured so that it can evolve more easily.

DLMM

A static website that links to the manuscript sites. It also hosts the prototype Shared Canvas viewer and runs on Tomcat. The website lives in a separate Git repository.

FSI Image server

The FSI Image Server is a commercial image server with an HTTP API running on fsiserver.library.jhu.edu.

IIIF endpoint

Implements version 1.0 of the IIIF Image API specification by dispatching calls to the FSI server.
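An IIIF Image API 1.0 request path has the form `{prefix}/{identifier}/{region}/{size}/{rotation}/{quality}[.{format}]`. The sketch below parses the part after the prefix and performs light validation; it is an illustration, not the endpoint's actual code. `IiifRequest` is a hypothetical name, size values are passed through unchecked, and the translation into FSI Server calls is omitted because the FSI HTTP API is not described on this page.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical parser for an IIIF Image API 1.0 request path after the service prefix:
 *   {identifier}/{region}/{size}/{rotation}/{quality}[.{format}]
 */
final class IiifRequest {
    // Quality values defined by version 1.0 (version 2.0 later renamed "native" to "default").
    private static final List<String> QUALITIES =
        Arrays.asList("native", "color", "grey", "bitonal");

    final String identifier; // left percent-encoded; decoding is omitted in this sketch
    final String region;     // "full", "x,y,w,h" or "pct:x,y,w,h"
    final String size;       // passed through unchecked in this sketch
    final double rotation;   // degrees clockwise
    final String quality;
    final String format;     // null when no extension is given

    private IiifRequest(String identifier, String region, String size,
                        double rotation, String quality, String format) {
        this.identifier = identifier;
        this.region = region;
        this.size = size;
        this.rotation = rotation;
        this.quality = quality;
        this.format = format;
    }

    static IiifRequest parse(String path) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        String[] parts = trimmed.split("/");
        if (parts.length != 5) {
            throw new IllegalArgumentException("expected 5 path segments: " + path);
        }
        checkRegion(parts[1]);

        double rotation = Double.parseDouble(parts[3]);
        if (!(rotation >= 0 && rotation <= 360)) { // also rejects NaN
            throw new IllegalArgumentException("rotation out of range: " + parts[3]);
        }

        // The last segment is the quality, optionally followed by ".format".
        String last = parts[4];
        int dot = last.lastIndexOf('.');
        String quality = dot < 0 ? last : last.substring(0, dot);
        String format = dot < 0 ? null : last.substring(dot + 1);
        if (!QUALITIES.contains(quality)) {
            throw new IllegalArgumentException("unknown quality: " + quality);
        }
        return new IiifRequest(parts[0], parts[1], parts[2], rotation, quality, format);
    }

    private static void checkRegion(String region) {
        if (region.equals("full")) {
            return;
        }
        String coords = region.startsWith("pct:") ? region.substring(4) : region;
        String[] xywh = coords.split(",", -1); // -1 keeps trailing empty values
        if (xywh.length != 4) {
            throw new IllegalArgumentException("bad region: " + region);
        }
        for (String v : xywh) {
            Double.parseDouble(v); // NumberFormatException for non-numeric values
        }
    }
}
```

Parsing into a small value object first keeps IIIF syntax handling separate from whatever backend-specific request is then built for the image server.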

Shared canvas endpoint

Implements the Shared Canvas specification directly, using JSON-LD to serialize the RDF. For each request, metadata is loaded dynamically from a rosa manuscript site, transformed into Shared Canvas, and serialized to JSON-LD. This was done before the IIIF Presentation API was created. Much of the code used to parse the metadata is duplicated from the website module.
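To give a sense of the serialized output, the sketch below writes a single canvas as JSON-LD by hand. It assumes Shared Canvas vocabulary with the type `sc:Canvas`, dimensions as `exif:width`/`exif:height`, and a `rdfs:label`; the canvas URI in the usage is made up, and the real endpoint's context and property choices may differ. `CanvasJsonLd` is a hypothetical name.

```java
/** Hypothetical hand-written JSON-LD serializer for one Shared Canvas canvas. */
final class CanvasJsonLd {
    /** Escapes a string for use inside a JSON string literal. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds a canvas node; the vocabulary choices are assumptions, see the lead-in. */
    static String canvas(String uri, String label, int width, int height) {
        return "{"
            + "\"@context\": {"
            + "\"sc\": \"http://www.shared-canvas.org/ns/\", "
            + "\"exif\": \"http://www.w3.org/2003/12/exif/ns#\", "
            + "\"rdfs\": \"http://www.w3.org/2000/01/rdf-schema#\"}, "
            + "\"@id\": \"" + escape(uri) + "\", "
            + "\"@type\": \"sc:Canvas\", "
            + "\"rdfs:label\": \"" + escape(label) + "\", "
            + "\"exif:width\": " + width + ", "
            + "\"exif:height\": " + height
            + "}";
    }

    public static void main(String[] args) {
        System.out.println(canvas("http://example.org/canvas/1r", "Folio 1r", 4000, 6000));
    }
}
```

A production version would normally use a JSON or RDF library instead of string concatenation; the hand-rolled escaping here only keeps the sketch dependency-free.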

Shared canvas test endpoint

Contains a small amount of test data with spatial information. This is needed because we have no real data with spatial information.
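Shared Canvas annotations can express spatial information by targeting a rectangular region of a canvas with a W3C Media Fragments `#xywh=x,y,w,h` fragment. The minimal sketch below builds and parses such targets, assuming that representation and pixel units; the format of the test endpoint's data is not described here, and `XywhRegion` is a hypothetical name.

```java
/**
 * Hypothetical value class for a rectangular canvas region expressed as a
 * Media Fragments "#xywh=x,y,w,h" fragment (pixel units only).
 */
final class XywhRegion {
    final int x, y, w, h;

    XywhRegion(int x, int y, int w, int h) {
        if (x < 0 || y < 0 || w <= 0 || h <= 0) {
            throw new IllegalArgumentException(
                "invalid region: " + x + "," + y + "," + w + "," + h);
        }
        this.x = x;
        this.y = y;
        this.w = w;
        this.h = h;
    }

    /** Returns the canvas URI with this region appended as an xywh fragment. */
    String target(String canvasUri) {
        return canvasUri + "#xywh=" + x + "," + y + "," + w + "," + h;
    }

    /** Parses a target URI; the optional "pixel:" unit is accepted, "percent:" is not. */
    static XywhRegion parse(String targetUri) {
        int i = targetUri.indexOf("#xywh=");
        if (i < 0) {
            throw new IllegalArgumentException("no xywh fragment: " + targetUri);
        }
        String spec = targetUri.substring(i + "#xywh=".length());
        if (spec.startsWith("pixel:")) {
            spec = spec.substring("pixel:".length());
        }
        String[] p = spec.split(",", -1);
        if (p.length != 4) {
            throw new IllegalArgumentException("expected x,y,w,h: " + spec);
        }
        // NumberFormatException (an IllegalArgumentException) for non-integer values
        return new XywhRegion(Integer.parseInt(p[0]), Integer.parseInt(p[1]),
                              Integer.parseInt(p[2]), Integer.parseInt(p[3]));
    }
}
```

Rejecting `percent:` keeps the sketch simple; supporting it would require the canvas dimensions to convert percentages into pixels.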

Shared canvas viewer

Based on GWT and HTML5. Allows multiple independent panels to browse manuscript repositories exposed as Shared Canvas. Some design information and documentation is on the JHU wiki; see https://wiki.library.jhu.edu/x/vo7VAQ for a postmortem.