'type conversions' in corrections #77

Open
kosloot opened this issue Dec 16, 2019 · 4 comments

@kosloot
Collaborator

kosloot commented Dec 16, 2019

Consider this very strange FoLiA file:

<?xml version="1.0" encoding="UTF-8"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="doc" generator="libfolia-v1.11" version="2.2">
  <metadata type="native">
    <annotations>
      <correction-annotation />
      <text-annotation />
      <pos-annotation set="bla"/>
      <paragraph-annotation />
    </annotations>
  </metadata>
  <text xml:id="bug">
    <correction>
      <new>
        <p xml:id="p">
          <t>paragraaf</t>
        </p>
      </new>
      <original>
        <pos xml:id="s" class="n">
        </pos>
      </original>
    </correction>
  </text>
</FoLiA>

Both foliavalidator and folialint accept this, but I assume this is abusing the correction node.
My impression is that we don't want a correction to modify the "type" of the subnode.
So I suggest adding a limitation here, preferably that all arguments are of the same type,
like all <w> or all <t>.
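
The proposed same-type restriction can be sketched as below. This is only an illustration using Python's standard `xml.etree.ElementTree`, not the check that foliavalidator, folialint, or libfolia actually perform; the helper names `local_name`, `correction_types`, and `is_type_consistent` are made up for this sketch, and the sample is a trimmed version of the file above.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"


def local_name(elem):
    """Strip the namespace from an ElementTree tag, e.g. '{ns}p' -> 'p'."""
    return elem.tag.rsplit("}", 1)[-1]


def correction_types(correction):
    """Collect the tag names found directly under <new> and <original>."""
    found = {}
    for part in ("new", "original"):
        node = correction.find(f"{{{FOLIA_NS}}}{part}")
        found[part] = {local_name(c) for c in node} if node is not None else set()
    return found


def is_type_consistent(correction):
    """True when <new> and <original> together hold at most one element type."""
    types = correction_types(correction)
    return len(types["new"] | types["original"]) <= 1


# Trimmed version of the example document (metadata omitted).
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <text xml:id="bug">
    <correction>
      <new><p xml:id="p"><t>paragraaf</t></p></new>
      <original><pos xml:id="s" class="n"/></original>
    </correction>
  </text>
</FoLiA>"""

root = ET.fromstring(SAMPLE)
for corr in root.iter(f"{{{FOLIA_NS}}}correction"):
    # Report the types on each side and whether they agree.
    print(correction_types(corr), is_type_consistent(corr))
```

For the sample, the correction mixes `p` and `pos`, so the check reports it as inconsistent.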

kosloot changed the title from "'type conversions in corrections" to "'type conversions' in corrections" on Dec 16, 2019
@proycon
Owner

proycon commented Dec 16, 2019

Agreed, type conversions should probably be checked and banned, especially if it's also a category conversion (like inline annotation to structural, as in your example).

kosloot added a commit to LanguageMachines/libfolia that referenced this issue Mar 26, 2024
@kosloot
Collaborator Author

kosloot commented Mar 26, 2024

This seems solved for libfolia: I added a check on type consistency.

@kosloot
Collaborator Author

kosloot commented Mar 27, 2024

@proycon, your remark "Especially if it's also a category conversion (like inline annotation to structural as in your example)" got me thinking.
The solution that I implemented in libfolia is probably too harsh: it disallows changing two sentences into one paragraph with two embedded sentences, as in the example below:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="folia.xsl"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="Walter" generator="libfolia-v2.12" version="2.5.3">
  <metadata type="native">
    <annotations>
      <token-annotation/>
      <paragraph-annotation/>
      <sentence-annotation/>
      <text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl"/>
      <correction-annotation/>
    </annotations>
  </metadata>
  <text xml:id="Walter.text">
    <correction xml:id="Walter.correction.1">
      <new>
        <p xml:id="par">
          <s xml:id="Walter.corr.s.1">
            <t>Dit is een zin.</t>
          </s>
          <s xml:id="Walter.corr.s.2">
            <t>Dit is nog een zin.</t>
          </s>
        </p>
      </new>
      <original auth="no">
        <s xml:id="Walter.s.1">
          <t>Dit is een zin.</t>
        </s>
        <s xml:id="Walter.s.2">
          <t>Dit is nog een zin</t>
        </s>
      </original>
    </correction>
  </text>
</FoLiA>

Correcting structure should be possible. And maybe correcting the annotation type too?
This will get rather complicated then.

But, bug alert!
The following file is invalid FoLiA (as it should be):

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="folia.xsl"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="Walter" generator="libfolia-v2.12" version="2.5.3">
  <metadata type="native">
    <annotations>
      <token-annotation/>
      <paragraph-annotation/>
      <sentence-annotation/>
      <text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl"/>
      <correction-annotation/>
    </annotations>
  </metadata>
  <text xml:id="Walter.text">
    <row xml:id="par">
      <cell>
        <w>
          <t>Dit is een zin.
          </t>
        </w>
      </cell>
    </row>
  </text>
</FoLiA>

But we can create this abomination using a correction:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="folia.xsl"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="Walter" generator="libfolia-v2.12" version="2.5.3">
  <metadata type="native">
    <annotations>
      <token-annotation/>
      <paragraph-annotation/>
      <sentence-annotation/>
      <text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl"/>
      <correction-annotation/>
    </annotations>
  </metadata>
  <text xml:id="Walter.text">
    <correction xml:id="Walter.correction.1">
      <new>
        <row xml:id="par">
          <cell>
            <w>
              <t>Dit is een zin.
              </t>
            </w>
          </cell>
        </row>
      </new>
      <original auth="no">
        <s xml:id="Walter.s.1">
          <t>Dit is een zin.</t>
        </s>
      </original>
    </correction>
  </text>
</FoLiA>

This is horrible! I assume that the functions that check whether a tag is appendable should look INTO the correction.
A lot of work and thinking is needed!
@proycon, please comment.
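
A rough sketch of what "looking into the correction" could mean: when checking which children a parent accepts, treat a `<correction>` as transparent and test the content of its `<new>` instead. Everything here is an assumption for illustration. `ALLOWED_CHILDREN` is a tiny made-up table, not the FoLiA specification, and `effective_children` and `misplaced_children` are hypothetical helpers, not libfolia functions. Whether `<original>` should be checked the same way is left open.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"

# Hypothetical, heavily reduced table of accepted children per parent.
# The real FoLiA rules are far richer; this only serves the example.
ALLOWED_CHILDREN = {
    "text": {"p", "s", "table", "correction"},
    "table": {"row", "correction"},
    "row": {"cell", "correction"},
}


def local_name(elem):
    """Strip the namespace from an ElementTree tag, e.g. '{ns}row' -> 'row'."""
    return elem.tag.rsplit("}", 1)[-1]


def effective_children(parent):
    """Yield the children of parent, replacing each <correction>
    by the (recursively resolved) content of its <new>."""
    for child in parent:
        if local_name(child) == "correction":
            new = child.find(f"{{{FOLIA_NS}}}new")
            if new is not None:
                yield from effective_children(new)
        else:
            yield child


def misplaced_children(parent):
    """Return the tag names that parent would not accept directly."""
    allowed = ALLOWED_CHILDREN.get(local_name(parent), set())
    return [
        local_name(c)
        for c in effective_children(parent)
        if local_name(c) not in allowed
    ]


# Trimmed version of the "abomination" above (metadata omitted).
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <text xml:id="Walter.text">
    <correction xml:id="Walter.correction.1">
      <new><row xml:id="par"><cell><w><t>Dit is een zin.</t></w></cell></row></new>
      <original auth="no"><s xml:id="Walter.s.1"><t>Dit is een zin.</t></s></original>
    </correction>
  </text>
</FoLiA>"""

text = ET.fromstring(SAMPLE).find(f"{{{FOLIA_NS}}}text")
print(misplaced_children(text))
```

With this table, the `row` hidden inside the correction is reported as misplaced under `text`, just as it would be without the correction wrapper.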

@kosloot
Collaborator Author

kosloot commented Mar 27, 2024

Additional questions about WHICH corrections are acceptable:

  1. Structure to structure seems OK to me, like adding a Paragraph around Sentences.
  2. Annotation to annotation? Like changing a Pos into a Lemma? Scary.
  3. Annotation to structure, or vice versa? That was the original issue, and can be ruled out, I assume.
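
The three cases above could be turned into a decision rule roughly as follows. This is a sketch of one possible policy, not a decision taken in this thread; the `CATEGORY` table is a small illustrative subset of the FoLiA element types, and `judge_correction` is a hypothetical helper.

```python
# Illustrative subset only; FoLiA defines many more element types.
CATEGORY = {
    "p": "structure",
    "s": "structure",
    "w": "structure",
    "pos": "inline",
    "lemma": "inline",
}


def judge_correction(original_tags, new_tags):
    """Classify a correction by the element types on both sides.

    Returns 'ok', 'suspicious', 'rejected', or 'unknown'.
    """
    tags = set(original_tags) | set(new_tags)
    if not all(tag in CATEGORY for tag in tags):
        return "unknown"  # type is outside this small table
    categories = {CATEGORY[tag] for tag in tags}
    if len(categories) > 1:
        return "rejected"  # case 3: annotation <-> structure
    if categories == {"structure"}:
        return "ok"  # case 1: structure -> structure
    if set(original_tags) == set(new_tags):
        return "ok"  # ordinary correction within one annotation type
    return "suspicious"  # case 2: e.g. pos -> lemma


print(judge_correction(["s", "s"], ["p"]))   # two sentences wrapped in a paragraph
print(judge_correction(["pos"], ["lemma"]))  # annotation type changes
print(judge_correction(["pos"], ["p"]))      # the original issue
```

Under this policy the three calls come out as `ok`, `suspicious`, and `rejected` respectively; what to do with the `suspicious` case is exactly the open question.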
