add aliases (short names) for set definitions. #31

Closed
kosloot opened this issue Aug 16, 2017 · 5 comments
Labels: enhancement, ready (Implemented but not released yet)

@kosloot
Collaborator

kosloot commented Aug 16, 2017

At the moment, having more than one annotation set in scope leads to a lot of bloat. For example:

<w xml:id="WR-P-E-J-0000000001.p.1.s.2.w.16">
  <t>genealogie</t>
  <pos class="N(soort,ev,basis,zijd,stan)" set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/frog-mbpos-cgn"/>
  <lemma class="genealogie"/>
  <morphology>
    <morpheme class="complex">
      <t>genealogie</t>
      <feat class="[[genealogisch]adjective[ie]]noun/singular" subset="structure"/>
      <pos class="N" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-clex"/>
      <morpheme class="complex">
        <feat class="N_A*" subset="applied_rule"/>
        <feat class="[[genealogisch]adjective[ie]]noun" subset="structure"/>
        <pos class="N" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-clex"/>
        <morpheme class="stem">
          <t>genealogisch</t>
          <pos class="A" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-clex"/>
        </morpheme>
        <morpheme class="affix">
          <t>ie</t>
          <feat class="[ie]" subset="structure"/>
        </morpheme>
      </morpheme>
      <morpheme class="inflection">
        <feat class="singular" subset="inflection"/>
      </morpheme>
    </morpheme>
  </morphology>
</w>

set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/frog-mbpos-cgn"/>
and especially
set="http://ilk.uvt.nl/folia/sets/frog-mbpos-clex"/>
are repeated a lot

Maybe it would be a good idea to introduce shorthand labels, like cgn-set and celex-set, to avoid all the bloat.

Something like this:

<pos-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/frog-mbpos-cgn" annotator="frog" annotatortype="auto" label="cgn"/>
<pos-annotation annotator="frog-mbma-1.0" annotatortype="auto" datetime="2017-04-20T16:48:45" set="http://ilk.uvt.nl/folia/sets/frog-mbpos-clex" label="celex"/>

Everywhere a set is used, you may use the label instead. When serializing, the label is preferred if one is provided.
Labels must be unique, of course.
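
The rules above (use the label wherever a set is expected, prefer the label when serializing, keep labels unique) could be sketched roughly as follows. This is only an illustration in Python, not code from libfolia or pynlpl; the `SetLabels` class and its method names are hypothetical.

```python
from typing import Dict, Optional


class SetLabels:
    """Hypothetical two-way registry between set URIs and short labels."""

    def __init__(self) -> None:
        self._set_by_label: Dict[str, str] = {}
        self._label_by_set: Dict[str, str] = {}

    def declare(self, set_uri: str, label: Optional[str] = None) -> None:
        """Register a declared set, optionally with a short label."""
        if label is None:
            return
        known = self._set_by_label.get(label)
        if known is not None and known != set_uri:
            # Labels must be unique: one label may not point at two sets.
            raise ValueError(f"label {label!r} is already used for {known!r}")
        self._set_by_label[label] = set_uri
        self._label_by_set[set_uri] = label

    def resolve(self, value: str) -> str:
        """Accept either a label or a full set URI; return the full set URI."""
        return self._set_by_label.get(value, value)

    def serialize(self, set_uri: str) -> str:
        """Prefer the label when writing out, if one was provided."""
        return self._label_by_set.get(set_uri, set_uri)


CGN = "https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/frog-mbpos-cgn"
CLEX = "http://ilk.uvt.nl/folia/sets/frog-mbpos-clex"

registry = SetLabels()
registry.declare(CGN, "cgn")
registry.declare(CLEX, "celex")
```

With this in place, `registry.resolve("celex")` gives back the full CELEX set URI, and `registry.serialize(CGN)` gives back the short form `cgn`. A set declared without a label is simply written out in full.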

@proycon
Owner

proycon commented Aug 16, 2017

Good idea. I'd suggest calling them alias rather than label, as label is already used in set definitions (the human-readable label).

@proycon proycon added this to the v1.5 milestone Aug 16, 2017
@kosloot
Collaborator Author

kosloot commented Aug 29, 2017

I added an 'alias' mechanism to libfolia. It lives in the 'alias' branch for now, as it breaks the ABI.

@proycon
Owner

proycon commented Sep 7, 2017

Still to be implemented for pynlpl (proycon/pynlpl#33)

@proycon proycon changed the title from "add labels (short names) for set definitions." to "add aliases (short names) for set definitions." Sep 25, 2017
proycon added a commit to proycon/pynlpl that referenced this issue Sep 25, 2017
proycon added a commit that referenced this issue Sep 25, 2017
@proycon proycon added the "ready" label (Implemented but not released yet) and removed the "question" label Sep 29, 2017
@proycon proycon closed this as completed Oct 8, 2017
proycon added a commit to proycon/foliapy that referenced this issue Sep 6, 2018
@kosloot
Collaborator Author

kosloot commented Jan 21, 2019

Well....
Given this document:

<?xml version="1.0" encoding="UTF-8"?>
<FoLiA xmlns="http://ilk.uvt.nl/folia" xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="doc" version="0.8" generator="libfolia-v0.4">
  <metadata>
    <annotations>
      <division-annotation set="a-set" alias="a"/>
      <division-annotation set="b-set" alias="b"/>
      <token-annotation set="a-set" alias="b"/>
      <token-annotation set="b-set" alias="a"/>
    </annotations>
  </metadata>
  <text xml:id="text">
    <div set="a-set">
      <s id="s.1">
        <w id="w.1" class="WORD" set="b">
          <t>test</t>
        </w>
      </s>
    </div>
    <div set="b">
      <s id="s.2">
        <w id="w.2" class="WORD" set="b-set">
          <t>test</t>
        </w>
      </s>
    </div>
  </text>
</FoLiA>

libfolia's folialint accepts it, but pynlpl's foliavalidator says:

Error on line 5: Invalid attribute alias for element division-annotation
Error on line 5: Element annotations has extra content: division-annotation
Error on line 3: Element metadata failed to validate content
Error on line 2: Element FoLiA failed to validate content
VALIDATION ERROR against RelaxNG schema (stage 1/2), in tests/aliases.xml
Invalid attribute alias for element division-annotation, line 5

Which one is right here?
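
To make the question concrete, here is a minimal Python sketch of one reading under which the document above is consistent: aliases are looked up per annotation type, so `b` can mean `a-set` for tokens and `b-set` for divisions. That scoping is an assumption made for illustration, not confirmed behaviour of libfolia or pynlpl. The `DECLARED_BY` mapping and both function names are hypothetical, and the embedded document is trimmed to the relevant elements.

```python
import xml.etree.ElementTree as ET

NS = "{http://ilk.uvt.nl/folia}"

# Assumption: which declaration element governs which annotation element.
DECLARED_BY = {"div": "division-annotation", "w": "token-annotation"}

DOC = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="doc">
  <metadata>
    <annotations>
      <division-annotation set="a-set" alias="a"/>
      <division-annotation set="b-set" alias="b"/>
      <token-annotation set="a-set" alias="b"/>
      <token-annotation set="b-set" alias="a"/>
    </annotations>
  </metadata>
  <text xml:id="text">
    <div set="a-set"><s><w class="WORD" set="b"><t>test</t></w></s></div>
    <div set="b"><s><w class="WORD" set="b-set"><t>test</t></w></s></div>
  </text>
</FoLiA>"""


def build_alias_table(root):
    """Map (declaration tag, alias) to the full set name."""
    table = {}
    for elem in root.iter():
        tag = elem.tag.replace(NS, "")
        if tag.endswith("-annotation") and elem.get("alias"):
            table[(tag, elem.get("alias"))] = elem.get("set")
    return table


def resolve_sets(root):
    """Return (tag, value as written, resolved set) in document order."""
    table = build_alias_table(root)
    resolved = []
    for elem in root.iter():
        tag = elem.tag.replace(NS, "")
        value = elem.get("set")
        if tag in DECLARED_BY and value is not None:
            # Fall back to the written value when it is not a known alias.
            full = table.get((DECLARED_BY[tag], value), value)
            resolved.append((tag, value, full))
    return resolved


RESOLVED = resolve_sets(ET.fromstring(DOC))
for tag, written, full in RESOLVED:
    print(f"<{tag} set={written!r}> resolves to {full!r}")
```

Under this per-type reading every `set` value resolves unambiguously. If aliases had to be unique across the whole document instead, the four declarations would conflict.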

@kosloot kosloot reopened this Jan 21, 2019
@proycon
Owner

proycon commented Feb 11, 2019

Right, that is addressed and solved in #65 (still to be released), so I think we can close this one.

@proycon proycon closed this as completed Feb 11, 2019