CollateX – Software for Collating Textual Sources

CollateX is software designed to

  1. read multiple (≥ 2) versions of a text, splitting each version into parts (tokens) to be compared,
  2. identify similarities of and differences between the versions (including moved/transposed segments) by aligning tokens, and
  3. output the alignment results in a variety of formats for further processing, for instance to support the production of a critical apparatus or the stemmatical analysis of a text's genesis.
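
The read, align, and output steps listed above can be illustrated with a minimal sketch. It does not use CollateX or its algorithms: Python's standard `difflib.SequenceMatcher` stands in for the aligner, only two versions are compared, and transposed segments are not detected. The function names (`tokenize`, `align`, `to_json`) and the sample sentences are hypothetical, chosen only for this illustration.

```python
import difflib
import json
import re


def tokenize(text):
    """Step 1: split one version of the text into whitespace-delimited tokens."""
    return re.findall(r"\S+", text)


def align(tokens_a, tokens_b):
    """Step 2: align two token lists and mark each segment as match or variant.

    Assumption: difflib is a stand-in here, not the CollateX aligner.
    """
    matcher = difflib.SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    rows = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        rows.append({
            "type": "match" if tag == "equal" else "variant",
            "A": tokens_a[a1:a2],
            "B": tokens_b[b1:b2],
        })
    return rows


def to_json(rows):
    """Step 3: serialize the alignment for further processing."""
    return json.dumps(rows, ensure_ascii=False)


# Two hypothetical versions (witnesses) of the same sentence.
witness_a = "The quick brown fox jumps over the dog."
witness_b = "The brown fox jumps over the lazy dog."

table = align(tokenize(witness_a), tokenize(witness_b))
print(to_json(table))
```

Each row of `table` pairs a segment of version A with the corresponding segment of version B; an empty list on one side marks an omission or addition, which is the kind of information a critical apparatus records.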

It resembles software used to compute differences between files (e.g. diff) and the sequence alignment tools commonly used in bioinformatics. While CollateX shares some techniques and algorithms with those tools, it aims mainly for a flexible and configurable approach to finding similarities and differences in texts, sometimes trading computational soundness or complexity for the user's ability to influence results.

As such, it is primarily designed for use cases in disciplines like philology or, more specifically, the field of textual criticism, where the assessment of findings is based on interpretation and can therefore be supported by computational means but is not necessarily computable.

Please see the project website for further information.