Annotation Issue (coarse.meto-fine.comp) #14

Open
EmanuelaBoros opened this issue Oct 17, 2023 · 1 comment · May be fixed by #15

@EmanuelaBoros

Hello!

I've noticed a possible missing entity type in COARSE-METO in HIPE-2022-v2.1-hipe2020-train-fr.tsv, where M. Théodore Reinach should (possibly) be a pers.ind (lines 2,141-2,150):

M	O	O	O	O	B-comp.title	O	_	_	NoSpaceAfter
.	O	O	O	O	I-comp.title	O	_	_	_
Théodore	O	O	O	O	B-comp.name	O	_	_	_
Reinach	O	O	O	O	I-comp.name	O	_	_	NoSpaceAfter
,	O	O	O	O	O	O	_	_	_
député	O	O	O	O	B-comp.function	O	_	_	_
radical	O	O	O	O	I-comp.function	O	_	_	_
de	O	O	O	O	I-comp.function	O	_	_	_
la	O	O	O	O	I-comp.function	O	_	_	EndOfLine
Savoie	B-loc	O	B-loc.adm.reg	O	I-comp.function	O	Q12745	_	NoSpaceAfter

Because I'm running several evaluations on my side, I'll also be checking the other annotated files in more depth and will open an issue for each problem I find (if any).
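
For anyone who wants to run the same kind of check on other files, here is a minimal sketch. It flags tokens that carry a component tag (comp.*) while the coarse column is still O, which is the pattern in the rows above. The function name and the column positions are assumptions read off the sample rows (ten tab-separated columns, coarse tag in the second, component tag in the sixth), not something taken from the official HIPE tooling.

```python
def find_orphan_components(lines, coarse_col=1, comp_col=5):
    """Return (line_number, token) pairs whose component tag has no coarse tag.

    Column indices are assumptions based on the sample rows in this issue.
    """
    orphans = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip blank lines, "# ..." metadata lines and a possible header row.
        if not line.strip() or line.startswith("# ") or line.startswith("TOKEN\t"):
            continue
        cols = line.split("\t")
        if len(cols) <= max(coarse_col, comp_col):
            continue  # malformed or truncated row
        has_component = cols[comp_col] not in ("O", "_")
        if has_component and cols[coarse_col] == "O":
            orphans.append((number, cols[0]))
    return orphans
```

Typical use would be `find_orphan_components(open(path, encoding="utf-8"))`. Consecutive line numbers in the result point at one mention that probably lost its enclosing entity.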

@e-maud
Contributor

e-maud commented Nov 3, 2023

Many thanks for spotting.

After some investigation, here are a few (strange) findings, noted for information and for the record.

  • The mention is in document EXP-1908-01-21-a-i0053 line 2130 (in HIPE-2022-data file and in CLEF-HIPE-2020-internal file).

  • In INCEpTION, the mention appears correctly annotated:

  • In the exported annotations in EXP-1908-01-21-a-i0053.xmi (CLEF-HIPE-2020-internal), the annotation is not there. See lines 1075 onward (permalink), where only comp.title|comp.name|comp.function are exported:
    <custom:ImpressoNamedEntity xmi:id="12992" sofa="1" begin="2104" end="2106" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="comp.title"/>
    <custom:ImpressoNamedEntity xmi:id="13005" sofa="1" begin="2107" end="2123" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="comp.name"/>
    <custom:ImpressoNamedEntity xmi:id="13018" sofa="1" begin="2125" end="2152" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="comp.function"/>
    <custom:ImpressoNamedEntity xmi:id="13031" sofa="1" begin="2146" end="2152" is_NIL="false" literal="false" noisy_ocr="false" unsolvable="false" unsolvable_linking="false" value="loc.adm.reg" wikidata_id="http://www.wikidata.org/entity/Q12745"/>

We will correct this in the current data and log the change. Since the problem seems to come from the xmi2IOB step, I wonder whether other (complex) entities were also lost along the way, and whether it is worth checking the export code.
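
As a starting point for that check, here is a rough sketch that scans one XMI export for comp.* annotations that are not covered by any non-component annotation. It relies only on the attributes visible in the excerpt above (begin, end, value) and on the element name ImpressoNamedEntity. The function name is made up for illustration, and the "a component must sit inside a parent span" rule is my assumption about what a healthy export looks like.

```python
import xml.etree.ElementTree as ET


def uncovered_components(xmi_text):
    """Return (begin, end, value) for comp.* spans with no enclosing entity.

    Assumption: every component should lie inside a non-comp annotation span.
    """
    root = ET.fromstring(xmi_text)
    spans = []
    for elem in root.iter():
        # Match on the local name so the namespace URI does not matter.
        if elem.tag.rsplit("}", 1)[-1] != "ImpressoNamedEntity":
            continue
        spans.append((int(elem.get("begin")), int(elem.get("end")), elem.get("value", "")))
    parents = [s for s in spans if not s[2].startswith("comp.")]
    return [
        (begin, end, value)
        for begin, end, value in spans
        if value.startswith("comp.")
        and not any(pb <= begin and end <= pe for pb, pe, _ in parents)
    ]
```

On the four annotations quoted above, this would report all three components, because the only other span (loc.adm.reg, 2146-2152) does not enclose any of them. Running it over every exported .xmi file would show whether the loss is a one-off or systematic.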

@mromanello What do you think?

@e-maud e-maud added the bug Something isn't working label Nov 3, 2023