
TEI output error (CollateX Python 2.2) #67

Closed
djbpitt opened this issue Aug 19, 2018 · 1 comment

djbpitt commented Aug 19, 2018

Given:

%reload_ext autoreload
%autoreload 2
from collatex import *
collation = Collation()
collation.add_plain_witness("A","The big gray koala.")
collation.add_plain_witness("B", "The big gray koala.")
collation.add_plain_witness("C","The gray fuzzy koala lives in a tree.")
table = collate(collation)
print(table)

the table alignment is correct:

+---+-----+-----+------+-------+-------+-----------------+---+
| A | The | big | gray | -     | koala | -               | . |
| B | The | big | gray | -     | koala | -               | . |
| C | The | -   | gray | fuzzy | koala | lives in a tree | . |
+---+-----+-----+------+-------+-------+-----------------+---+

but the TEI output doesn’t recognize that all instances of “koala” agree. When we run:

tei = collate(collation, output="tei", indent=True)
print(tei)

we get:

<?xml version="1.0" ?>
<cx:apparatus xmlns="http://www.tei-c.org/ns/1.0" xmlns:cx="http://interedition.eu/collatex/ns/1.0">
	The 
	<app>
		<rdg wit="#A #B">big</rdg>
		<rdg wit="#C"/>
	</app>
	 
	gray 
	<app>
		<rdg wit="#C">fuzzy</rdg>
		<rdg wit="#A #B"/>
	</app>
	 
	<app>
		<rdg wit="#A #B">koala</rdg>
		<rdg wit="#C">koala</rdg>
	</app>
	 
	<app>
		<rdg wit="#C">lives in a tree</rdg>
		<rdg wit="#A #B"/>
	</app>
	.
</cx:apparatus>

The “koala” readings all agree and should therefore be output as plain text, not inside a <rdg>.
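The expected behavior can be sketched as follows. This is only an illustration of the rule, not CollateX's actual code (the fix itself is not shown in this issue); the function name `render_column` and its arguments are hypothetical. It groups the witnesses of one alignment column by their text, emits plain text when every witness agrees, and otherwise emits one <rdg> per distinct reading.

```python
from xml.sax.saxutils import escape


def render_column(readings, all_sigla):
    """Render one alignment column as a TEI-like fragment (hypothetical helper).

    readings:  dict mapping witness siglum -> token text (None or "" for a gap)
    all_sigla: list of all witness sigla, in collation order
    """
    # Group sigla by reading text; dicts preserve insertion order (Python 3.7+).
    groups = {}
    for siglum in all_sigla:
        text = readings.get(siglum) or ""
        groups.setdefault(text, []).append(siglum)

    # All witnesses agree: output plain text, no <app>/<rdg> wrapper.
    if len(groups) == 1:
        return escape(next(iter(groups)))

    # Otherwise one <rdg> per distinct reading, so no two <rdg> share content.
    parts = ["<app>"]
    for text, sigla in groups.items():
        wit = " ".join("#" + s for s in sigla)
        if text:
            parts.append('<rdg wit="%s">%s</rdg>' % (wit, escape(text)))
        else:
            parts.append('<rdg wit="%s"/>' % wit)
    parts.append("</app>")
    return "".join(parts)


sigla = ["A", "B", "C"]
print(render_column({"A": "koala", "B": "koala", "C": "koala"}, sigla))
print(render_column({"A": "big", "B": "big", "C": None}, sigla))
```

Under this rule the “koala” column prints as bare text, while the “big” column still produces an <app> with an empty <rdg> for witness C.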

Furthermore, there should not be two <rdg> children of the same <app> that have the same textual content. If we add another witness to remove the exact equality:

%reload_ext autoreload
%autoreload 2
from collatex import *
collation = Collation()
collation.add_plain_witness("A","The big gray koala.")
collation.add_plain_witness("B", "The big gray koala.")
collation.add_plain_witness("D", "The big gray wombat.")
collation.add_plain_witness("C","The gray fuzzy koala lives in a tree.")
table = collate(collation,segmentation=False, near_match=True)
print(table)

The table output is again correct:

+---+-----+-----+------+-------+--------+-------+----+---+------+---+
| A | The | big | gray | -     | koala  | -     | -  | - | -    | . |
| B | The | big | gray | -     | koala  | -     | -  | - | -    | . |
| D | The | big | gray | -     | wombat | -     | -  | - | -    | . |
| C | The | -   | gray | fuzzy | koala  | lives | in | a | tree | . |
+---+-----+-----+------+-------+--------+-------+----+---+------+---+

but the TEI output incorrectly puts the koalas in different <rdg> elements, so that:

tei = collate(collation, output="tei", indent=True, segmentation=False, near_match=True)
print(tei)

outputs:

<?xml version="1.0" ?>
<cx:apparatus xmlns="http://www.tei-c.org/ns/1.0" xmlns:cx="http://interedition.eu/collatex/ns/1.0">
	The 
	<app>
		<rdg wit="#A #B #D">big</rdg>
		<rdg wit="#C"/>
	</app>
	 
	gray 
	<app>
		<rdg wit="#C">fuzzy</rdg>
		<rdg wit="#A #B #D"/>
	</app>
	 
	<app>
		<rdg wit="#A #B">koala</rdg>
		<rdg wit="#C">koala</rdg>
		<rdg wit="#D">wombat</rdg>
	</app>
	 
	<app>
		<rdg wit="#C">lives</rdg>
		<rdg wit="#A #B #D"/>
	</app>
	 
	<app>
		<rdg wit="#C">in</rdg>
		<rdg wit="#A #B #D"/>
	</app>
	 
	<app>
		<rdg wit="#C">a</rdg>
		<rdg wit="#A #B #D"/>
	</app>
	 
	<app>
		<rdg wit="#C">tree</rdg>
		<rdg wit="#A #B #D"/>
	</app>
	.
</cx:apparatus>

Both symptoms may be consequences of a single problem: the failure to recognize that the koala readings belong together.
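A quick way to check for this symptom in TEI output is to look for any <app> whose <rdg> children repeat the same text. The sketch below uses only the standard library; the helper name `find_duplicate_readings` is hypothetical and not part of CollateX, and it assumes the output uses the TEI namespace shown above.

```python
import xml.etree.ElementTree as ET

# The TEI namespace declared as the default namespace in the output above.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def find_duplicate_readings(tei_xml):
    """Return (text, first_wit, repeated_wit) for each repeated <rdg> text
    within a single <app> (hypothetical helper)."""
    root = ET.fromstring(tei_xml)
    problems = []
    for app in root.iter(TEI_NS + "app"):
        seen = {}  # reading text -> wit attribute of its first occurrence
        for rdg in app.findall(TEI_NS + "rdg"):
            text = (rdg.text or "").strip()
            wit = rdg.get("wit")
            if text in seen:
                problems.append((text, seen[text], wit))
            else:
                seen[text] = wit
    return problems


sample = (
    '<cx:apparatus xmlns="http://www.tei-c.org/ns/1.0" '
    'xmlns:cx="http://interedition.eu/collatex/ns/1.0">'
    '<app><rdg wit="#A #B">koala</rdg><rdg wit="#C">koala</rdg>'
    '<rdg wit="#D">wombat</rdg></app>'
    '</cx:apparatus>'
)
print(find_duplicate_readings(sample))
```

For the sample fragment, which mirrors the faulty “koala” <app> above, the check reports the repeated reading together with the two witness groups that should have been merged.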


djbpitt commented Aug 19, 2018

Fixed in #68

djbpitt closed this as completed Aug 19, 2018