FastCat

System Description

FastCat is a Web-based system designed for historians and other researchers who need to manually digitize structured and semi-structured archival documents quickly and accurately in order to create their research datasets. It combines the ease of use and quick data entry functions of the classic spreadsheet with the information accuracy typically associated with a complex database. It does so by offering data entry templates designed to mirror, in the digital space, the structure and data entry logic of the original source.

In FastCat, archival documents are transcribed as ‘records’ belonging to specific ‘templates’, where a ‘template’ represents the structure of a single type of archival source. A record organizes the data and metadata in tables, offering functionalities such as nested tables and the selection of terms from a vocabulary. The system runs locally inside any modern web browser, with the option of automated synchronisation with an online database.

Data Curation with FastCat Team

FastCat Team is a special environment within FastCat that allows the collaborative curation of the transcribed data through the management of 'entities' and 'vocabulary terms'. With respect to the management of entities, users can inspect the main entity instances that appear in the data (e.g., names of persons or locations) and start curating them. A first, automated curation step applies a set of rules that give the same identity to entity instances sharing some common characteristics. The available curation actions then include: i) correcting entity names or other entity properties, ii) indicating that two or more entity instances refer to the same real-world entity and must therefore have the same identity (manual instance matching), and iii) indicating that a specific instance from a set of automatically matched instances is a different entity and must therefore have a different identity.

With respect to the curation of vocabulary terms, users can provide a preferred term in English as well as its broader term (if any). Storing broader terms gives the vocabulary a hierarchy, which can be very useful when exploring the data. For example, one can retrieve all data related to a general term through its narrower terms.
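The retrieval through narrower terms described above can be sketched as follows. This is only an illustration, not the FastCat Team implementation: the term list, the field names `label` and `broader`, and the function `expandTerm` are hypothetical, and the use of "-" for "no broader term" is an assumption borrowed from the template JSON shown later in this README.

```javascript
// Hypothetical vocabulary: each term names its broader term ("-" means none).
const terms = [
  { label: "ship", broader: "-" },
  { label: "steamship", broader: "ship" },
  { label: "sailing ship", broader: "ship" },
  { label: "brig", broader: "sailing ship" },
];

// Return the given term followed by every term below it in the hierarchy.
function expandTerm(vocabulary, root) {
  const result = [root];
  const seen = new Set(result);
  // Breadth-first walk: result grows while we iterate over it.
  for (let i = 0; i < result.length; i++) {
    for (const term of vocabulary) {
      if (term.broader === result[i] && !seen.has(term.label)) {
        seen.add(term.label);
        result.push(term.label);
      }
    }
  }
  return result;
}
```

A search for the general term "ship" could then match records tagged with any label returned by `expandTerm(terms, "ship")`.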

An important characteristic of FastCat Team is that it does not alter the data in the records as transcribed from the original sources. It achieves this by storing the curated data in a different database and maintaining the links to the original data. Maintaining this provenance information is very important for data verification and long-term validity, but also because data consolidation may be ambiguous and require further research and repeated revision at any time in the future.

More information about FastCat (and FastCat Team) is available here and in the following publication:

P. Fafalios, K. Petrakis, G. Samaritakis, K. Doerr, A. Kritsotaki, Y. Tzitzikas, and M. Doerr,
"FAST CAT: Collaborative Data Entry and Curation for Semantic Interoperability in Digital Humanities",
ACM Journal on Computing and Cultural Heritage, 2021.

(Paper PDF | BIB Entry)

Manuals

Getting Started

Built With

Dependency

FastCat uses the Handsontable library for some data entry functionalities. If you use this project for commercial purposes (whether in internal or externally facing projects), you need to purchase a Handsontable license.

Prerequisites

  • Java
  • Tomcat
  • CouchDB (installation documentation can be found here)

Installation and deployment

  1. Database configuration

    After a successful CouchDB installation, the database must have the following structure:

       ├── _users
       ├── admin
       ├── instances
       │   ├── instance 1
       │   ├── instance 2
       │   └── ...
       ├── public_records
       │   ├── record 1
       │   ├── record 2
       │   └── ...
       ├── public_vocabs
       └── templates
           ├── template 1
           └── etc
    
  2. Clone the repo

    git clone https://github.com/isl/FastCat.git
  3. The project is written mainly in JavaScript, so it can be deployed directly on a web server (e.g. Tomcat v7 or later). Before deployment, a basic configuration is needed: edit the database URLs in the /js/global.js file:

        "config": {
            "": {
                "http:": "http://[URL]:[PORT]",
                "https:": "https://[URL]"
            }      
        }
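The database structure from step 1 can be prepared by hand in the CouchDB administration interface. As an alternative, the sketch below creates the top-level databases through CouchDB's HTTP API (`PUT /{db}`). It assumes that each top-level entry in the structure above is a separate CouchDB database; the function names, the base URL and the credentials are placeholders introduced for this example and are not part of FastCat.

```javascript
// Top-level databases, taken from the structure listed in step 1.
const REQUIRED_DATABASES = [
  "_users", "admin", "instances", "public_records", "public_vocabs", "templates",
];

// Build one "PUT /{db}" request description per required database.
function buildCreateRequests(baseUrl) {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return REQUIRED_DATABASES.map((name) => ({ method: "PUT", url: `${root}/${name}` }));
}

// Send the requests. CouchDB answers 412 when a database already exists,
// so that status is treated as success here.
async function createDatabases(baseUrl, authHeader) {
  for (const request of buildCreateRequests(baseUrl)) {
    const response = await fetch(request.url, {
      method: request.method,
      headers: { Authorization: authHeader },
    });
    if (!response.ok && response.status !== 412) {
      throw new Error(`Could not create ${request.url}: HTTP ${response.status}`);
    }
  }
}
```

For a local installation the call might look like `createDatabases("http://localhost:5984", "Basic " + btoa("user:password"))`, where 5984 is CouchDB's default port and the credentials are placeholders.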

Configuration of Transcription Templates

Current Configuration

FastCat is currently configured for archival documents of Maritime History, in the context of the SeaLiT Project. Specifically, the following 20 templates are already available, each one representing a type of archival source:

  • Crew List (Ruoli di Equipaggio) (example record here)
  • Crew and displacement list (Roll) (example record here)
  • General Spanish Crew List (example record here)
  • Accounts book (example record here)
  • Payroll (of Greek ships) (example record here)
  • Payroll (of Russian Steam Navigation and Trading Company) (example record here)
  • Logbook (example record here)
  • Census La Ciotat (example record here)
  • First national all-Russian census of the Russian Empire (example record here)
  • Civil Register (example record here)
  • Inscription Maritime - Maritime Register of the State for La Ciotat (example record here)
  • List of ships (example record here)
  • Naval Ship Register List (example record here)
  • Register of Maritime personel (example record here)
  • Register of Maritime workers (Matricole della gente di mare) (example record here)
  • Sailors register (Libro de registro de marineros) (example record here)
  • Seagoing Personel (example record here)
  • Students Register (example record here)
  • Employment records (Shipyards of Messageries Maritimes, La Ciotat) (example record here)
  • Notarial Deeds (example record here)

Creation of a new FastCat template

To create a new template the following steps must be followed:

  1. FastCat Application

    Each template consists of two files:

    • templates/<template_name>.html example

    • templates/js/<template_name>.js example

    By editing these two files, a user can create or modify templates.

  2. Database

    Add the new template to the database:

    • Go to the templates directory and copy the JSON of an existing template, e.g.:
           {
             "_id": "Accounts book",
             "_rev": "24-2bc1faca593f4c74e7a707eb2cccdc15",
             "keywords": "Accounts book",
             "sourceLanguage": "Greek",
             "title": "Accounts book",
             "organization": "FORTH/IMS",
             "vocabularies": [
               {
                 "id": "collection_gr",
                 "label": "Collection",
                 "broader": "-"
               }
             ]
           }
    • Return to the templates directory and create a new document
    • Paste the JSON you copied before
    • Delete the "_rev" row and change the "_id" value to be exactly the same as the name of the new template <template_name>
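The copy-and-edit steps above can also be expressed in code. The following is a minimal sketch, assuming the existing template document has already been fetched as a JavaScript object; `makeTemplateDocument` is a hypothetical helper written for this example, not a FastCat function.

```javascript
// Derive a new template document from an existing one:
// drop "_rev" (CouchDB assigns a fresh revision on creation)
// and set "_id" to the new template name.
function makeTemplateDocument(existingDoc, templateName) {
  const { _rev, ...rest } = existingDoc;
  return { ...rest, _id: templateName };
}

// The existing template document shown above.
const accountsBook = {
  _id: "Accounts book",
  _rev: "24-2bc1faca593f4c74e7a707eb2cccdc15",
  keywords: "Accounts book",
  sourceLanguage: "Greek",
  title: "Accounts book",
  organization: "FORTH/IMS",
  vocabularies: [{ id: "collection_gr", label: "Collection", broader: "-" }],
};

// "My new template" is a placeholder for <template_name>.
const newTemplate = makeTemplateDocument(accountsBook, "My new template");
```

The resulting object has no `_rev` field and leaves `accountsBook` unchanged; it can then be saved as a new document in the templates database.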

Removing a template

To remove a template, just delete the corresponding entry from the templates directory in the database.
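If the deletion is done through CouchDB's HTTP API rather than its administration interface, the request is a `DELETE /{db}/{docid}` that carries the document's current revision in the `rev` query parameter. The sketch below only builds that URL; the helper name and the base URL are assumptions made for this example.

```javascript
// Build the URL for deleting a template document from the "templates" database.
// CouchDB requires the current revision of the document in the "rev" parameter.
function buildTemplateDeleteUrl(baseUrl, templateId, rev) {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${root}/templates/${encodeURIComponent(templateId)}?rev=${encodeURIComponent(rev)}`;
}
```

The returned URL would then be sent with the `DELETE` method, for example `fetch(url, { method: "DELETE" })` plus whatever authentication the installation requires.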

Contact

Acknowledgements

This work has received funding from the European Union's Horizon 2020 research and innovation programme under i) the European Research Council (ERC) grant agreement No 714437 (Project SeaLiT), and ii) the Marie Sklodowska-Curie grant agreement No 890861 (Project ReKnow).
