John Abrahams edited this page Aug 8, 2018 · 22 revisions

Command line tool for handling data in the archive.

Group: rosa
Artifact: rosa-archive-tool
Version: 2.0.0-SNAPSHOT

Requirements:

  • list content
  • check content
  • verify content (bit check)
  • ability to handle all existing content
  • metadata transforms?
  • update checksums?
  • guess image order and write file?

Tool

To specify a custom location for the archive, use the option: -D archive.path=<archive_path>
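Passing the archive location on every invocation gets repetitive, so a small shell wrapper can help. The sketch below is not part of the tool: the function name rosa_tool and the environment variables ROSA_TOOL_JAR and ROSA_ARCHIVE_PATH are hypothetical names chosen for illustration. It also assumes that -D archive.path=... is accepted in the [options] position, directly after the command name.

```shell
#!/bin/sh
# Hypothetical convenience wrapper around the archive tool (not shipped with it).
# ROSA_TOOL_JAR and ROSA_ARCHIVE_PATH are assumed names used only in this sketch.
rosa_tool() {
    if [ "$#" -eq 0 ]; then
        echo "usage: rosa_tool <command> [options] [collectionId] [bookId]" >&2
        return 2
    fi

    jar="${ROSA_TOOL_JAR:-tool.jar}"
    cmd="$1"
    shift

    if [ -n "${ROSA_ARCHIVE_PATH:-}" ]; then
        # Assumption: the -D option may appear where [options] are expected,
        # i.e. right after the command name.
        java -jar "$jar" "$cmd" -D "archive.path=$ROSA_ARCHIVE_PATH" "$@"
    else
        # No custom location set: let the tool use its default archive path.
        java -jar "$jar" "$cmd" "$@"
    fi
}
```

With ROSA_ARCHIVE_PATH set, running rosa_tool list or rosa_tool check -checkBits <collectionId> <bookId> would forward the command to the jar with the archive path option added.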

  • list

    • list: list all book collections in the archive
    • list <collectionId> : list all books in the collection
    • list <collectionId> <bookId> : list all items in the book
      • -showErrors : if present, the tool shows any errors encountered while processing the list command.
  • check

    • check [options]: check everything in the archive
    • check [options] <collectionId> : check data in a collection
    • check [options] <collectionId> <bookId> : check data in a book
      • -checkBits forces the tool to check bit integrity along with data consistency.
  • update - update checksum information. If a file in the archive has been touched since the last checksum update, its checksum is recalculated and updated in the SHA1SUM file. If the SHA1SUM file does not exist, it is created and the checksums for all files in the directory are added. The same result can be achieved with the sha1sum command on Unix.

    • update [options] : update checksums of the whole archive
    • update [options] <collectionId> : update checksums for collection and all books in the collection
    • update [options] <collectionId> <bookId> : update checksums for the book
      • -force flag will force update all checksum values.
  • update-image-list

    • update-image-list [options] : update image lists for all books in all collections
    • update-image-list [options] <collectionId> : update all image lists for the collection
    • update-image-list [options] <collectionId> <bookId> : update image list for the book
      • -force forces the tool to overwrite any existing image lists. This flag must be used if a book already contains an image list, as the tool will not overwrite by default.
  • crop-images (this operation also generates an image list of the cropped images)

    • crop-images [options] : crop images in all books in all collections
    • crop-images [options] <collectionId> : crop images for all books in the collection
    • crop-images [options] <collectionId> <bookId> : crop all images in the book
      • -force : force the tool to crop book images, overwriting any existing cropped images. Any existing cropped image lists will be overwritten.
  • file-map - generate a file map for renaming files to fit the archive naming standards. The map is a best guess based on user input, so it may need to be edited by hand after it is generated. NOTE: if any misc images are present, their entries MUST be edited by hand after generation to give them a proper label.

    • file-map <collectionId> <bookId> : interactively generate a file map for the book
  • validate-xml - validate AoR transcription data against a schema

    • validate-xml <collectionId>
    • validate-xml <collectionId> <bookId>
  • rename-images - after a file map is generated, images can be renamed according to the mapping.

    • rename-images [ -R | -I ] <collectionId> <bookId>
      • -R --reverse : rename items according to the reverse relationship in the file map.
      • -I --change-id : change only the book ID part of the image names to match the book ID of the book.
  • rename-transcriptions - AoR transcriptions can be renamed (and the filename attribute changed accordingly) following the file map.

    • rename-transcriptions [ -R ] <collectionId> <bookId>
      • -R --reverse : rename items according to the reverse relationship in the file map.
  • generate-tei - if a book contains transcriptions in .txt files, use this command to transform the transcription data into a single valid TEI XML file. This file will validate against the tei-ms schema found at: http://www.tei-c.org/release/xml/tei/custom/schema/xsd/tei_ms.xsd

  • generate-annotation-map <collection_id> - generates the id_locations.csv file used when resolving internal references within a collection

  • check-aor - check AoR transcription files that have not been imported into the archive. Useful for checking transcription files that have just been pulled from GitHub.

Example:

  • java -jar tool.jar check-aor /tmp/aor/Domenichi true - the extra true argument signals that the provided path is a book, so all XML files in that directory are checked.
  • java -jar tool.jar check-aor /tmp/aor - checks the subdirectories of the path, treating each one as a "book". Each subdirectory is searched for XML files, which are then checked.
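The description of the update command notes that checksums can also be maintained with the Unix sha1sum command. A minimal sketch of that manual route follows, assuming GNU coreutils sha1sum and assuming the archive's SHA1SUM file uses sha1sum's standard "<hash>  <filename>" line format (the tool's exact format is not confirmed here). The function names write_sha1sum and verify_sha1sum are hypothetical. Unlike update without -force, this recalculates every checksum in the directory.

```shell
#!/bin/sh
# Sketch: rebuild and verify a SHA1SUM file for a single directory.
# Assumption: SHA1SUM holds standard sha1sum lines ("<hash>  <filename>").

# Recalculate checksums for every regular file in the directory
# (except SHA1SUM itself) and write them to SHA1SUM.
write_sha1sum() {
    (
        cd "$1" || exit 1
        for f in *; do
            if [ -f "$f" ] && [ "$f" != "SHA1SUM" ]; then
                sha1sum "$f"
            fi
        done > SHA1SUM
    )
}

# Check the files against the recorded checksums.
# Returns non-zero if any file is missing or does not match.
verify_sha1sum() {
    (
        cd "$1" || exit 1
        sha1sum -c --quiet SHA1SUM
    )
}
```

Calling write_sha1sum on a book directory followed by verify_sha1sum on the same directory gives a quick bit-level check outside the tool; it does not cover the data consistency checks that the check command performs.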