
This repository contains CTSized Ancient Greek Literature texts


gcelano/CTSAncientGreekXML



Tokenized and sentence-split CTSized Ancient Greek Texts (v1.1.0)

This repository contains the graphic-word tokenized texts of the following two repositories (I also provide them in zipped format):

The texts were generated fully automatically from those original XML files that are well-formed and CTS-compliant (some are not). Some known conversion errors stem from annotation inconsistencies or errors in the original files, which I have not tried to fix. For example, an inconsistent CTS URN location in the XML file, or a poem whose verses are not all numbered, will generate errors (typically missing text).

Check the XQuery module in the scripts folder for details.

Each file contains the following information:

  • the @p attribute gives the passage (the full CTS URN is obtained by merging this value with the CTS URN of the text, found in the @text-cts attribute of the text element)
  • the @n attribute gives the running number of each word (numbering restarts whenever the passage changes)
  • the text() of each t element contains the word form
  • the optional @join attribute specifies whether a punctuation mark should be attached to the preceding (b) or the following (a) word
  • the optional @tag attribute records certain special elements that contained the given word in the source: namely add, del, unclear, surplus, supplied and seg, which can help identify editorial interventions
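The attributes listed above can be read with any XML parser. The following is a minimal Python sketch, not the repository's own tooling. It assumes that the text element is the document root, that no XML namespace is in use, and that the full CTS URN is formed by joining @text-cts and @p with a colon. The sample document and its URN are invented for illustration; the function names read_tokens and detokenize are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample in the shape described above (not taken from the repository).
SAMPLE = """<text text-cts="urn:cts:greekLit:tlg0012.tlg001.example">
  <t p="1.1" n="1">μῆνιν</t>
  <t p="1.1" n="2">ἄειδε</t>
  <t p="1.1" n="3" join="b">,</t>
  <t p="1.1" n="4" tag="supplied">θεὰ</t>
  <t p="1.2" n="1">οὐλομένην</t>
</text>"""


def read_tokens(xml_string):
    """Return one dict per t element, with the full CTS URN of its passage."""
    root = ET.fromstring(xml_string)
    # Assumption: the root element is the text element carrying @text-cts.
    text_urn = root.get("text-cts")
    tokens = []
    for t in root.iter("t"):
        tokens.append({
            # Assumption: text URN and passage are merged with a colon.
            "urn": f"{text_urn}:{t.get('p')}",
            "n": int(t.get("n")),
            "form": t.text or "",
            "join": t.get("join"),  # "b", "a", or None
            "tag": t.get("tag"),    # e.g. "supplied", or None
        })
    return tokens


def detokenize(tokens):
    """Rebuild a readable string, honouring @join on punctuation tokens."""
    out = []
    glue_next = False  # True when the previous token had join="a"
    for tok in tokens:
        if out and (tok["join"] == "b" or glue_next):
            out[-1] += tok["form"]  # attach to the preceding chunk
        else:
            out.append(tok["form"])
        glue_next = tok["join"] == "a"
    return " ".join(out)


tokens = read_tokens(SAMPLE)
print(tokens[0]["urn"])
print(detokenize(tokens))
```

Filtering on the "tag" key (for example, keeping only tokens whose value is "supplied") is one way to isolate editorial interventions.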

Changes from previous releases

From release 1.0.0:

  • Corrected the CTS URN structure by also taking the seg and p elements into account (div, seg, p, and l are now considered)
  • Added sentence splitting (based on the following characters: ".", "·", ";", ":")
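Splitting on the four characters listed above can be illustrated as follows. This is only a sketch of the general idea in Python, not the logic of the XQuery module in the scripts folder, which may handle edge cases differently. It assumes that punctuation marks are separate tokens and that the raised dot is U+00B7; the function name split_sentences is hypothetical.

```python
# Sentence-final characters as listed in the release notes.
# Assumption: the raised dot is U+00B7 (MIDDLE DOT); some texts may
# instead use U+0387 (GREEK ANO TELEIA), which would need adding.
SENTENCE_END = {".", "\u00b7", ";", ":"}


def split_sentences(forms):
    """Group a flat list of word forms into sentences.

    A sentence is closed whenever a token is one of SENTENCE_END;
    trailing tokens without final punctuation form a last sentence.
    """
    sentences = []
    current = []
    for form in forms:
        current.append(form)
        if form in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


print(split_sentences(["a", "b", ".", "c", ";", "d"]))
```

Keeping the closing punctuation inside each sentence means the original token sequence can be recovered by concatenating the groups.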

Cite

Cite the following work thus:

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.