UBERON synonyms containing extended characters #1173

Closed
paolaroncaglia opened this Issue Nov 11, 2015 · 9 comments

Hi,

Marking as urgent because this issue is preventing Gene Ontology editors from committing changes using OBO-Edit (we load UBERON along with GO, and we still use OBO-Edit for tasks that can't be performed easily in Protege, such as adding taxon constraints). These are the culprits:

4 fatal errors:
anal fin (UBERON:4000163) generated 1 error:
Synonym 1 of UBERON:4000163 cannot contain extended characters.
caudal fin (UBERON:4000164) generated 1 error:
Synonym 3 of UBERON:4000164 cannot contain extended characters.
celiac artery (UBERON:0001640) generated 1 error:
Synonym 1 of UBERON:0001640 cannot contain extended characters.
median fin (UBERON:4000162) generated 1 error:
Synonym 1 of UBERON:4000162 cannot contain extended characters.
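
As a rough illustration of the kind of check that produces these errors, the Python sketch below scans OBO-format text for synonym lines containing characters outside ASCII. The sample stanza, the function name `find_extended_synonyms`, and the reporting format are assumptions made for illustration; this is not OBO-Edit's actual implementation.

```python
import re

# Matches an OBO synonym line and captures the quoted synonym text.
SYNONYM_RE = re.compile(r'^synonym:\s*"((?:[^"\\]|\\.)*)"')

def find_extended_synonyms(obo_text):
    """Return (term_id, synonym) pairs whose synonym has non-ASCII characters."""
    hits = []
    term_id = None
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):          # new stanza, e.g. [Term]
            term_id = None
        elif line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        else:
            match = SYNONYM_RE.match(line)
            if match and not match.group(1).isascii():
                hits.append((term_id, match.group(1)))
    return hits

# Hypothetical stanza modelled on the caudal fin entry in this issue.
SAMPLE = """\
[Term]
id: UBERON:4000164
name: caudal fin
synonym: "nageoire caudale" EXACT []
synonym: "tail fin" RELATED []
synonym: "uropt\ufffdre" EXACT []
"""

for term, synonym in find_extended_synonyms(SAMPLE):
    print(f"{term}: {synonym!r}")
```

Running a check like this over the whole file before loading it would list every affected term at once rather than one error per load attempt.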

The fatal errors look like this; see the synonyms at the top of this entry:

http://www.ebi.ac.uk/ols/beta/ontologies/uberon/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FUBERON_4000164

Melanie (Courtot) confirmed that the error is in the source Uberon:

<!-- http://purl.obolibrary.org/obo/UBERON_4000164 -->

<owl:Class rdf:about="http://purl.obolibrary.org/obo/UBERON_4000164">
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">caudal fin</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/UBERON_4000162"/>
    <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The caudal fin is the most posterior median fin. It is composed of a complex of modified centra and modified neural and hemal arches and spines. </obo:IAO_0000115>
    <core:provenance_notes rdf:datatype="http://www.w3.org/2001/XMLSchema#string">This class was sourced from an external ontology (vertebrate_skeletal_anatomy). Its definitions, naming conventions and relationships may need to be checked for compatibility with uberon</core:provenance_notes>
    <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ZFA:0001058</oboInOwl:hasDbXref>
    <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">nageoire caudale</oboInOwl:hasExactSynonym>
    <oboInOwl:hasRelatedSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">tail</oboInOwl:hasRelatedSynonym>
    <oboInOwl:hasRelatedSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">tail fin</oboInOwl:hasRelatedSynonym>
    <oboInOwl:hasOBONamespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">uberon/phenoscape-anatomy</oboInOwl:hasOBONamespace>
    <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">uropt�re</oboInOwl:hasExactSynonym>
    <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">uropt�rygie</oboInOwl:hasExactSynonym>
    <oboInOwl:created_by rdf:datatype="http://www.w3.org/2001/XMLSchema#string">vertebrate_skeletal_anatomy_curators</oboInOwl:created_by>
</owl:Class>

It seems that someone was editing in UTF-8 and there was an issue with conversion (potentially with OBO-Edit, which AFAIK supports only ASCII).
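
The � characters in the snippet above are consistent with one specific kind of mismatch: Latin-1 bytes being decoded as UTF-8. The following Python sketch shows that direction and the opposite one. It only illustrates how such damage can arise; it is an assumption about, not a reconstruction of, what happened in the Uberon pipeline.

```python
correct = "uroptère"

# Latin-1 bytes read as UTF-8: the lone byte 0xE8 is not valid UTF-8,
# so a lenient decoder substitutes U+FFFD (the replacement character).
latin1_as_utf8 = correct.encode("latin-1").decode("utf-8", errors="replace")

# UTF-8 bytes read as Latin-1: each byte of the two-byte sequence for "è"
# becomes its own character.
utf8_as_latin1 = correct.encode("utf-8").decode("latin-1")

print(latin1_as_utf8)
print(utf8_as_latin1)

# The first kind of damage is lossy: "è" and "é" both collapse to U+FFFD,
# so the original text cannot be recovered from the damaged string alone.
assert "è".encode("latin-1").decode("utf-8", errors="replace") == \
       "é".encode("latin-1").decode("utf-8", errors="replace")
```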

Could this be fixed asap please?
Many thanks in advance.

Paola

balhoff Nov 11, 2015

Member

These synonyms seem to be housed in the Phenoscape ext file (http://sourceforge.net/p/phenoscape/code/HEAD/tree/trunk/vocab/edit/phenoscape-ext.owl), but it looks like this text encoding problem has been there for a long time. They should be fixed though (but would still have accents).

OBO-Edit should support UTF-8, unless something has broken in the last couple of years. There is a setting in Configuration Manager to allow extended characters—I wonder if that would help you in the immediate term.

cmungall Nov 11, 2015

Member

Problem averted for GO with this fix to the module extraction pipeline: http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/ontology/extensions/Makefile?r1=29572&r2=29750

cmungall Nov 11, 2015

Member

@balhoff any idea how to fix them in source?

@cmungall cmungall changed the title from URGENT - UBERON synonyms containing extended characters to UBERON synonyms containing extended characters Nov 11, 2015

balhoff Nov 11, 2015

Member

Yes, I should be able to update them. They are all correct in TAO (e.g. uroptère) so it must have been a mistaken encoding when TAO was pulled into Uberon.
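
Because the replacement character discards the original byte, one way to restore such synonyms is to match each damaged string against the intact strings in the source ontology, treating every U+FFFD as a wildcard. The Python sketch below shows the idea; the function name `restore` and the candidate list are hypothetical, and this is not necessarily how the fix was made.

```python
import re

REPLACEMENT = "\ufffd"

def restore(damaged, candidates):
    """Return the single candidate matching `damaged`, with U+FFFD as a wildcard.

    Returns None when no candidate, or more than one, matches.
    """
    pattern = re.compile(
        ".".join(re.escape(part) for part in damaged.split(REPLACEMENT))
    )
    matches = [c for c in candidates if pattern.fullmatch(c)]
    return matches[0] if len(matches) == 1 else None

# Hypothetical list of intact synonyms, standing in for the TAO source.
tao_synonyms = ["nageoire caudale", "uroptère", "uroptérygie"]

print(restore("uropt\ufffdre", tao_synonyms))
print(restore("uropt\ufffdrygie", tao_synonyms))
```

Returning None on ambiguous matches keeps the sketch from silently picking the wrong accent; those cases would need manual review.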

cmungall Nov 11, 2015

Member

Thanks Jim. I'll close this ticket if the fix survives the pipeline into the next release.

balhoff Nov 11, 2015

Member

Sounds good. Should we be using language tags for non-English synonyms?
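
To make the question concrete: in RDF/XML a language-tagged synonym carries an `xml:lang` attribute in place of the `rdf:datatype` used in the snippet earlier in this issue, since a literal typed as `xsd:string` cannot also carry a language tag. The Python sketch below builds such an element with the standard library. Whether Uberon should adopt this is exactly what is being asked, and the choice of `fr` for this synonym is an assumption.

```python
import xml.etree.ElementTree as ET

OBOINOWL = "http://www.geneontology.org/formats/oboInOwl#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Make the serializer use the familiar prefix instead of a generated one.
ET.register_namespace("oboInOwl", OBOINOWL)

# Element and attribute names use ElementTree's {namespace}local notation.
synonym = ET.Element(f"{{{OBOINOWL}}}hasExactSynonym", {f"{{{XML_NS}}}lang": "fr"})
synonym.text = "uroptère"

serialized = ET.tostring(synonym, encoding="unicode")
print(serialized)
```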

paolaroncaglia Nov 11, 2015

Just to confirm that we can save GO in OE fine now. Chris further remade the modules in r29754.
Thanks very much to both of you for your prompt help!

Paola

cmungall Nov 20, 2015

Member
$ grep 'uroptère' uberon.obo
synonym: "uroptère" EXACT [PSPUB:0000135]
synonym: "uroptère" EXACT [PSPUB:0000135]

will be fixed in next release

@cmungall cmungall closed this Nov 20, 2015

balhoff added a commit to obophenotype/uberon-phenoscape-ext that referenced this issue Feb 29, 2016
