TEI output error (CollateX Python 2.2) #67

djbpitt · 2018-08-19T11:35:23Z

Given:

%reload_ext autoreload
%autoreload 2
from collatex import *
collation = Collation()
collation.add_plain_witness("A", "The big gray koala.")
collation.add_plain_witness("B", "The big gray koala.")
collation.add_plain_witness("C", "The gray fuzzy koala lives in a tree.")
table = collate(collation)
print(table)

the table alignment is correct:

+---+-----+-----+------+-------+-------+-----------------+---+
| A | The | big | gray | -     | koala | -               | . |
| B | The | big | gray | -     | koala | -               | . |
| C | The | -   | gray | fuzzy | koala | lives in a tree | . |
+---+-----+-----+------+-------+-------+-----------------+---+

but the TEI output doesn’t recognize that all instances of “koala” agree. When we run:

tei = collate(collation, output="tei", indent=True)
print(tei)

we get:

<?xml version="1.0" ?>
<cx:apparatus xmlns="http://www.tei-c.org/ns/1.0" xmlns:cx="http://interedition.eu/collatex/ns/1.0">
	The 
	<app>
		<rdg wit="#A #B">big</rdg>
		<rdg wit="#C"/>
	</app>
	 
	gray 
	<app>
		<rdg wit="#C">fuzzy</rdg>
		<rdg wit="#A #B"/>
	</app>
	 
	<app>
		<rdg wit="#A #B">koala</rdg>
		<rdg wit="#C">koala</rdg>
	</app>
	 
	<app>
		<rdg wit="#C">lives in a tree</rdg>
		<rdg wit="#A #B"/>
	</app>
	.
</cx:apparatus>

The “koala” readings all agree and should therefore be output as plain text, not inside a <rdg>.
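
One way to spot this symptom mechanically is to scan the TEI output for <app> elements that contain more than one <rdg> with the same text. The following is only a minimal sketch using the standard library; `duplicate_readings` is a hypothetical helper, not part of CollateX, and `sample` is a shortened copy of the output above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# TEI namespace, as declared on the root element of the output above.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def duplicate_readings(tei_xml):
    """Return reading texts that occur in more than one <rdg> of the same <app>.

    Hypothetical helper for illustration; not a CollateX function.
    """
    root = ET.fromstring(tei_xml)
    duplicates = []
    for app in root.iter(TEI_NS + "app"):
        # An empty <rdg/> has text None; treat it as the empty string.
        counts = Counter((rdg.text or "") for rdg in app.findall(TEI_NS + "rdg"))
        duplicates.extend(text for text, n in counts.items() if n > 1)
    return duplicates


# Shortened copy of the problematic output shown above.
sample = (
    '<cx:apparatus xmlns="http://www.tei-c.org/ns/1.0" '
    'xmlns:cx="http://interedition.eu/collatex/ns/1.0">gray '
    '<app><rdg wit="#A #B">koala</rdg><rdg wit="#C">koala</rdg></app>'
    '.</cx:apparatus>'
)
print(duplicate_readings(sample))
```

An empty list would mean that no <app> repeats a reading; here the check reports "koala".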

Furthermore, there should not be two <rdg> children of the same <app> that have the same textual content. If we add another witness to remove the exact equality:

%reload_ext autoreload
%autoreload 2
from collatex import *
collation = Collation()
collation.add_plain_witness("A", "The big gray koala.")
collation.add_plain_witness("B", "The big gray koala.")
collation.add_plain_witness("D", "The big gray wombat.")
collation.add_plain_witness("C", "The gray fuzzy koala lives in a tree.")
table = collate(collation, segmentation=False, near_match=True)
print(table)

The table output is again correct:

+---+-----+-----+------+-------+--------+-------+----+---+------+---+
| A | The | big | gray | -     | koala  | -     | -  | - | -    | . |
| B | The | big | gray | -     | koala  | -     | -  | - | -    | . |
| D | The | big | gray | -     | wombat | -     | -  | - | -    | . |
| C | The | -   | gray | fuzzy | koala  | lives | in | a | tree | . |
+---+-----+-----+------+-------+--------+-------+----+---+------+---+

but the TEI output incorrectly puts the koalas in different <rdg> elements, so that:

tei = collate(collation, output="tei", indent=True, segmentation=False, near_match=True)
print(tei)

outputs:

<?xml version="1.0" ?>
<cx:apparatus xmlns="http://www.tei-c.org/ns/1.0" xmlns:cx="http://interedition.eu/collatex/ns/1.0">
	The 
	<app>
		<rdg wit="#A #B #D">big</rdg>
		<rdg wit="#C"/>
	</app>
	 
	gray 
	<app>
		<rdg wit="#C">fuzzy</rdg>
		<rdg wit="#A #B #D"/>
	</app>
	 
	<app>
		<rdg wit="#A #B">koala</rdg>
		<rdg wit="#C">koala</rdg>
		<rdg wit="#D">wombat</rdg>
	</app>
	 
	<app>
		<rdg wit="#C">lives</rdg>
		<rdg wit="#A #B #D"/>
	</app>
	 
	<app>
		<rdg wit="#C">in</rdg>
		<rdg wit="#A #B #D"/>
	</app>
	 
	<app>
		<rdg wit="#C">a</rdg>
		<rdg wit="#A #B #D"/>
	</app>
	 
	<app>
		<rdg wit="#C">tree</rdg>
		<rdg wit="#A #B #D"/>
	</app>
	.
</cx:apparatus>

These may be consequences of a single problem, the failure to recognize that the koalas belong together.
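
To make the expected behavior concrete, here is a rough sketch of how one column of the alignment table could be rendered: witnesses are grouped by the text of their reading, a column with a single group becomes plain text, and otherwise each group becomes one <rdg>. This is an illustration only. `render_column` and its input shape (a dict from siglum to token text, with `None` for a gap) are assumptions, not the CollateX implementation and not necessarily how the fix was made.

```python
from xml.sax.saxutils import escape


def render_column(cells):
    """Render one alignment column as TEI.

    cells maps a witness siglum to its token text, or None for a gap.
    Hypothetical helper for illustration; not a CollateX function.
    """
    # Group sigla by reading text, keeping first-seen order of readings.
    groups = {}
    for siglum, text in cells.items():
        groups.setdefault(text, []).append(siglum)

    # All witnesses agree: output plain text, with no <app> or <rdg>.
    if len(groups) == 1:
        (text,) = groups
        return escape(text) if text is not None else ""

    # Otherwise emit one <rdg> per distinct reading.
    rdgs = []
    for text, sigla in groups.items():
        wit = " ".join("#" + s for s in sigla)
        if text is None:
            rdgs.append(f'<rdg wit="{wit}"/>')
        else:
            rdgs.append(f'<rdg wit="{wit}">{escape(text)}</rdg>')
    return "<app>" + "".join(rdgs) + "</app>"


# The koala/wombat column from the second example.
print(render_column({"A": "koala", "B": "koala", "D": "wombat", "C": "koala"}))
# The koala column from the first example, where all witnesses agree.
print(render_column({"A": "koala", "B": "koala", "C": "koala"}))
```

Under these assumptions the first call puts A, B, and C into a single koala <rdg> next to the wombat reading, and the second call returns the bare word "koala".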


djbpitt · 2018-08-19T14:30:42Z

Fixed in #68

djbpitt added python bug labels Aug 19, 2018

djbpitt closed this as completed Aug 19, 2018
