FastCat is a Web-based system designed for historians and other researchers who need to manually digitize structured and semi-structured archival documents quickly and accurately in order to create their research dataset. It combines the ease of use and quick data entry functions of the classic spreadsheet with the information accuracy typically associated with a complex database. It does so by offering data entry templates designed to mirror, in the digital space, the structure and data entry logic of the original source.
In FastCat, archival documents are transcribed as ‘records’ belonging to specific ‘templates’, where a ‘template’ represents the structure of a single type of archival source. A record organizes the data and metadata in tables, offering functionalities such as nested tables and the selection of terms from a vocabulary. The system runs locally inside any modern web browser, with the possibility of automated synchronisation with an online database.
Data Curation with FastCat Team
FastCat Team is a special environment within FastCat that allows the collaborative curation of the transcribed data through the management of 'entities' and 'vocabulary terms'. With respect to the management of entities, users can inspect the main entity instances that appear in the data (e.g., names of persons or locations) and start curating them. A first, automated curation step applies a set of rules that assign the same identity to entity instances sharing certain characteristics. The available curation actions then include: i) correcting entity names or other entity properties, ii) indicating that two or more entity instances refer to the same real-world entity and must therefore have the same identity (manual instance matching), and iii) indicating that a specific instance from a set of automatically matched instances is a different entity and must therefore have a different identity.
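As a rough illustration of the automated matching step, the sketch below gives the same identity to entity instances that share a type and a normalised name. The rule itself, the field names (`type`, `name`, `recordId`), and the function `assignIdentities` are assumptions made for this example; the rules FastCat Team actually applies are not listed in this document.

```javascript
// Hypothetical rule: two instances get the same identity when their
// type and their normalised name are equal.
function normalise(name) {
  return name.trim().toLowerCase().replace(/\s+/g, " ");
}

function assignIdentities(instances) {
  const identityByKey = new Map();
  return instances.map((instance) => {
    const key = `${instance.type}|${normalise(instance.name)}`;
    if (!identityByKey.has(key)) {
      // First time this key is seen: allocate the next identity number.
      identityByKey.set(key, identityByKey.size + 1);
    }
    // Return a copy so the transcribed instance is left untouched.
    return { ...instance, identity: identityByKey.get(key) };
  });
}

const matched = assignIdentities([
  { type: "person", name: "Giovanni  Rossi", recordId: "r1" },
  { type: "person", name: "giovanni rossi", recordId: "r2" },
  { type: "location", name: "Genova", recordId: "r1" },
]);
```

In this example the two person instances end up with the same identity, while the location receives a different one. Manual curation would then confirm or split such groups.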
With respect to the curation of vocabulary terms, users can provide a preferred term in English as well as its broader term (if any). Storing broader terms yields a hierarchy of terms, which can be very useful when exploring the data. For example, one can retrieve all data related to a general term through its narrower terms.
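The retrieval through narrower terms can be pictured with a small sketch. It assumes a plain object that maps each term to its broader term; the vocabulary content and the function `withNarrowerTerms` are hypothetical and do not reflect how FastCat Team stores its terms.

```javascript
// Hypothetical vocabulary: each term maps to its broader term (or null).
const broaderOf = {
  "ship": null,
  "sailing ship": "ship",
  "steamship": "ship",
  "brig": "sailing ship",
};

// Collect a term together with all of its narrower terms, transitively.
function withNarrowerTerms(term, broaderMap) {
  const result = [term];
  // The array grows while we iterate, so newly found terms are expanded too.
  for (let i = 0; i < result.length; i++) {
    for (const [candidate, broader] of Object.entries(broaderMap)) {
      if (broader === result[i] && !result.includes(candidate)) {
        result.push(candidate);
      }
    }
  }
  return result;
}

const shipTerms = withNarrowerTerms("ship", broaderOf);
```

A query for "ship" can then match any record whose term is in `shipTerms`, so records transcribed with "brig" are found without the transcriber having used the general term.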
An important characteristic of FastCat Team is that it does not alter the data in the records as transcribed from the original sources. It achieves this by storing the curated data in a different database and maintaining the links to the original data. Maintaining this provenance information is very important for data verification and long-term validity, but also because data consolidation may be ambiguous and require further research and repeated revision at any time in the future.
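The idea of keeping curated values apart from the transcription can be sketched as follows. The key format, the two stores, and the functions `curate` and `resolve` are assumptions for illustration only; they stand in for the separate databases mentioned above and are not FastCat Team's actual data model.

```javascript
// Hypothetical stores: transcribed data is read-only, curation lives apart.
const transcribed = Object.freeze({
  "rec1:row3:name": "Giov. Rossi",
});

const curated = new Map();

// Record a correction without touching the transcription; keep a link back.
function curate(sourceKey, correctedValue) {
  curated.set(sourceKey, { value: correctedValue, source: sourceKey });
}

// Prefer the curated value, but always report the original alongside it.
function resolve(sourceKey) {
  const original = transcribed[sourceKey];
  const entry = curated.get(sourceKey);
  return { original, value: entry ? entry.value : original };
}

curate("rec1:row3:name", "Giovanni Rossi");
const resolved = resolve("rec1:row3:name");
```

Because the correction is only an overlay, it can be revised or withdrawn later without any loss of what was originally transcribed.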
More information about FastCat (and FastCat Team) is available here and in the following publication:
P. Fafalios, K. Petrakis, G. Samaritakis, K. Doerr, A. Kritsotaki, Y. Tzitzikas, and M. Doerr,
"FAST CAT: Collaborative Data Entry and Curation for Semantic Interoperability in Digital Humanities",
ACM Journal on Computing and Cultural Heritage, 2021.
Manuals
- Instructions on how to use FastCat: FastCat User Guide
- Instructions on how to use FastCat Team: FastCat Team User Guide
FastCat uses the Handsontable library for some data entry functionalities. If you use this project for commercial purposes (whether in internal or externally facing projects), you need to purchase a Handsontable license.
- Java
- Tomcat
- CouchDB (installation documentation can be found here)
- Database configuration
After CouchDB has been installed, the database must have the following structure:
├── _users
├── admin
├── instances
│   ├── instance 1
│   ├── instance 2
│   └── ...
├── public_records
│   ├── record 1
│   ├── record 2
│   └── ...
├── public_vocabs
└── templates
    ├── template 1
    └── etc
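One possible way to prepare this structure is through CouchDB's HTTP API, where `PUT /{db}` creates a database. The sketch below assumes that the top-level entries of the tree are CouchDB databases and the nested entries are documents inside them; that reading, the base URL, the credentials, and the function names are assumptions, not something this README states.

```javascript
// Assumption: top-level entries of the tree are CouchDB databases.
// "_users" is a system database that the CouchDB setup normally creates itself.
const DATABASES = ["admin", "instances", "public_records", "public_vocabs", "templates"];

// Build one "PUT /{db}" request per database (CouchDB's create-database call).
function databaseCreationRequests(baseUrl) {
  const root = baseUrl.replace(/\/+$/, "");
  return DATABASES.map((name) => ({ method: "PUT", url: `${root}/${name}` }));
}

// Send the requests; user and password are placeholders for an admin account.
async function createDatabases(baseUrl, user, password) {
  const auth = "Basic " + btoa(`${user}:${password}`);
  for (const { method, url } of databaseCreationRequests(baseUrl)) {
    const response = await fetch(url, { method, headers: { Authorization: auth } });
    // 201 means created, 412 means the database already exists.
    console.log(url, response.status);
  }
}

// 5984 is CouchDB's default port; adjust to the local installation.
const requests = databaseCreationRequests("http://localhost:5984/");
```

Calling `createDatabases("http://localhost:5984", "<user>", "<password>")` would issue the requests. The same databases can also be created by hand in CouchDB's web interface.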
- Clone the repo
git clone https://github.com/isl/FastCat.git
- The project is written mainly in JavaScript, so it can be deployed directly on a web server (e.g., Tomcat v7 or later). Before deployment, configure the database URLs by editing the /js/global.js file:
"config": {
  "": {
    "http:": "http://[URL]:[PORT]",
    "https:": "https://[URL]"
  }
}
Current Configuration
FastCat is currently configured for archival documents of Maritime History, in the context of the SeaLiT Project. Specifically, the following 20 templates are already available, each one representing a type of archival source:
- Crew List (Ruoli di Equipaggio) (example record here)
- Crew and displacement list (Roll) (example record here)
- General Spanish Crew List (example record here)
- Accounts book (example record here)
- Payroll (of Greek ships) (example record here)
- Payroll (of Russian Steam Navigation and Trading Company) (example record here)
- Logbook (example record here)
- Census La Ciotat (example record here)
- First national all-Russian census of the Russian Empire (example record here)
- Civil Register (example record here)
- Inscription Maritime - Maritime Register of the State for La Ciotat (example record here)
- List of ships (example record here)
- Naval Ship Register List (example record here)
- Register of Maritime personel (example record here)
- Register of Maritime workers (Matricole della gente di mare) (example record here)
- Sailors register (Libro de registro de marineros) (example record here)
- Seagoing Personel (example record here)
- Students Register (example record here)
- Employment records (Shipyards of Messageries Maritimes, La Ciotat) (example record here)
- Notarial Deeds (example record here)
Creation of a new FastCat template
To create a new template, follow these steps:
- FastCat Application
Each template consists of two files:
By editing these two files, users can create or modify templates.
- Database
Add the new template to the database:
- Go to the templates directory and copy the JSON of an existing template, e.g.:
{
  "_id": "Accounts book",
  "_rev": "24-2bc1faca593f4c74e7a707eb2cccdc15",
  "keywords": "Accounts book",
  "sourceLanguage": "Greek",
  "title": "Accounts book",
  "organization": "FORTH/IMS",
  "vocabularies": [
    {
      "id": "collection_gr",
      "label": "Collection",
      "broader": "-"
    }
  ]
}
- Return to the templates directory and create a new document
- Paste the JSON you copied before
- Delete the "_rev" field and change the "_id" value so that it is exactly the name of the new template <template_name>
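The copy-and-edit steps above can also be done in code before the document is pasted into the database. This is a minimal sketch: the function `newTemplateDocument` and the name "My new template" are hypothetical, and only the two changes described in the steps are applied.

```javascript
// Build a new template document from a copied one: drop "_rev" so that
// CouchDB treats it as a new document, and set "_id" to the new name.
function newTemplateDocument(existing, templateName) {
  const { _rev, ...rest } = existing;
  return { ...rest, _id: templateName };
}

// The existing template shown above, used as the starting point.
const accountsBook = {
  _id: "Accounts book",
  _rev: "24-2bc1faca593f4c74e7a707eb2cccdc15",
  keywords: "Accounts book",
  sourceLanguage: "Greek",
  title: "Accounts book",
  organization: "FORTH/IMS",
  vocabularies: [{ id: "collection_gr", label: "Collection", broader: "-" }],
};

// "My new template" is a placeholder for <template_name>.
const created = newTemplateDocument(accountsBook, "My new template");
console.log(JSON.stringify(created, null, 2));
```

The remaining fields (for example "title" or "keywords") are carried over unchanged here and would still need to be edited by hand to describe the new template.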
Removing a template
To remove a template, delete the corresponding entry from the templates directory in the database.
- Fafalios Pavlos fafalios@ics.forth.gr
- Samaritakis Georgios samarita@ics.forth.gr
- Petrakis Kostas cpetrakis@ics.forth.gr
This work has received funding from the European Union's Horizon 2020 research and innovation programme under i) the European Research Council (ERC) grant agreement No 714437 (Project SeaLiT), and ii) the Marie Sklodowska-Curie grant agreement No 890861 (Project ReKnow).