Text offset error not detected? #15

Closed
kosloot opened this issue Dec 16, 2019 · 1 comment

@kosloot
Collaborator

kosloot commented Dec 16, 2019

An example given in proycon/folia#75 appears to contain an offset error that goes undetected by foliavalidator.
Full example:

<?xml version="1.0" encoding="UTF-8"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="doc" generator="libfolia-v1.11" version="2.2">
  <metadata type="native">
    <annotations>
      <correction-annotation />
      <text-annotation />
      <sentence-annotation />
      <string-annotation />
    </annotations>
  </metadata>
  <text xml:id="bug">
    <s xml:id="s.1">
      <t>Dit is een test</t>
      <t class="ocr">D!t 1S tezt</t>
      <str xml:id="str.1">
        <t offset="0">Dit</t>
        <t offset="0" class="ocr">D!t</t>
      </str>
      <str xml:id="str.2">
        <t offset="4">is</t>
        <t offset="4" class="ocr">1S</t>
      </str>
      <str xml:id="str.4">
        <t offset="11">test</t>
        <t offset="7" class="ocr">tezt</t>
      </str>
      <str xml:id="str.3"> <!-- I'm deliberately messing with the ordering here to emphasise that it has no meaning with strings-->
        <t offset="7">een</t>
      </str>
      <!-- and below an extra string example to emphasise that strings are not tokens: this overlaps with str.1 and str.2) -->
      <str xml:id="str.bonus">
        <t offset="3">t is</t>
        <t offset="3" class="ocr">t 1S</t>
      </str>
    </s>
  </text>
</FoLiA>

foliavalidator is happy with it:

foliavalidator tests/textproblem_3.xml 
Validated successfully: tests/textproblem_3.xml

folialint rejects this:

folialint tests/textproblem_3.xml
tests/textproblem_3.xml failed: Unresolvable text: Text for str(ID=str.bonus, textclass='current'), has incorrect offset 3
	original msg=Unresolvable text: Reference (ID s.1,class='current') found, but no text match at offset=3 Expected 't is' but got ' is '

This rejection is the desired behaviour: the offset for str.bonus should be 2, not 3.
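
For illustration, the check that folialint performs here can be approximated with a few lines of Python. This is only a rough sketch using the standard library, not the actual foliavalidator or libfolia code. The helper name `check_offsets` and the trimmed-down sample document are made up for this example, and it assumes that offsets count characters from the start of the parent's text in the same text class, and that each `<t>` holds plain text only.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "{http://ilk.uvt.nl/folia}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed version of the example above (hypothetical test input).
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="doc" version="2.2">
  <text xml:id="bug">
    <s xml:id="s.1">
      <t>Dit is een test</t>
      <t class="ocr">D!t 1S tezt</t>
      <str xml:id="str.1">
        <t offset="0">Dit</t>
        <t offset="0" class="ocr">D!t</t>
      </str>
      <str xml:id="str.bonus">
        <t offset="3">t is</t>
        <t offset="3" class="ocr">t 1S</t>
      </str>
    </s>
  </text>
</FoLiA>"""


def check_offsets(parent):
    """Return (str_id, textclass, offset, expected, found) for each mismatch."""
    # Text of the parent per text class; no class attribute means "current".
    parent_text = {
        t.get("class", "current"): (t.text or "")
        for t in parent.findall(FOLIA_NS + "t")
    }
    mismatches = []
    for string in parent.findall(FOLIA_NS + "str"):
        for t in string.findall(FOLIA_NS + "t"):
            if t.get("offset") is None:
                continue  # nothing to verify without an offset
            textclass = t.get("class", "current")
            reference = parent_text.get(textclass)
            if reference is None:
                continue  # parent has no text in this class
            offset = int(t.get("offset"))
            expected = t.text or ""
            # Slice the parent text at the claimed offset and compare.
            found = reference[offset:offset + len(expected)]
            if found != expected:
                mismatches.append(
                    (string.get(XML_ID), textclass, offset, expected, found)
                )
    return mismatches


root = ET.fromstring(SAMPLE)
sentence = root.find(".//" + FOLIA_NS + "s")
mismatches = check_offsets(sentence)
for str_id, textclass, offset, expected, found in mismatches:
    print(f"{str_id} ({textclass}): offset {offset}: "
          f"expected {expected!r} but got {found!r}")
```

On the trimmed sample this flags both text elements of str.bonus: at offset 3 the parent text yields ' is ' and ' 1S ' rather than 't is' and 't 1S'. With offset 2 both would match.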

@proycon proycon self-assigned this Dec 16, 2019
@proycon
Owner

proycon commented Dec 16, 2019

You're right, it should be 2 and foliavalidator should detect it.

@proycon proycon transferred this issue from proycon/folia Dec 18, 2019
@proycon proycon added the bug Something isn't working label Mar 8, 2020
@proycon proycon added this to the v2.3.0 milestone Aug 19, 2020
@proycon proycon added the ready Done but not released yet, pending closure on release label Aug 19, 2020
@proycon proycon closed this as completed Sep 2, 2020