Ontology parsing error adds extra column #32

Closed
caufieldjh opened this issue Apr 4, 2022 · 3 comments

@caufieldjh
Collaborator

Parsing ABD results in the edges including the following:

urn:uuid:c95fa7fd-b1f0-4f35-83d4-4fce72d8bbc0	http://brd.bsvgateway.org/api/organism/?id=620	biolink:subclass_of	http://brd.bsvgateway.org/api/organism/	rdfs:subClassOf	BioPortal	Anthology of Biosurveillance Diseases
urn:uuid:7a7d352b-0ebb-42a4-a950-fcf914d72fd4	http://brd.bsvgateway.org/api	ransmission/	biolink:subclass_of	owl:Thing	rdfs:subClassOf	BioPortal	Anthology of Biosurveillance Diseases
urn:uuid:b9b23088-74d8-45be-b1d8-9ef0085b1a8d	http://brd.bsvgateway.org/api/organism/?id=628	biolink:subclass_of	http://brd.bsvgateway.org/api/organism/	rdfs:subClassOf	BioPortal	Anthology of Biosurveillance Diseases

The middle line is the issue: it has eight tab-separated fields instead of seven. This breaks any merge in which these edges are included, raising a pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 1017, saw 8

It looks like there's some truncation/extra whitespace in the id. This ontology has a class named Transmission, so the affected id is most likely meant to start with the prefix http://brd.bsvgateway.org/api/transmission/
ABD has some other weirdness too, with lots of Error classes. See it on BioPortal at https://bioportal.bioontology.org/ontologies/ABD/?p=classes&conceptid=root
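
To locate rows like the middle one before a merge is attempted, a quick field-count check may help. The following is only a minimal sketch: the function name find_malformed_rows and the default of 7 expected fields are assumptions taken from the error message above, not part of the project's code, and a plain split on tabs ignores the quoting rules pandas applies.

```python
from typing import Iterable, List, Tuple


def find_malformed_rows(
    lines: Iterable[str], expected_fields: int = 7
) -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for every non-empty row whose
    tab-separated field count differs from expected_fields.

    Hypothetical helper for illustration only.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\n")
        if not row:
            continue  # skip blank lines
        count = len(row.split("\t"))
        if count != expected_fields:
            bad.append((number, count))
    return bad


# The three ABD edges quoted above, with tabs written as \t.
sample = [
    "urn:uuid:c95fa7fd-b1f0-4f35-83d4-4fce72d8bbc0\thttp://brd.bsvgateway.org/api/organism/?id=620\tbiolink:subclass_of\thttp://brd.bsvgateway.org/api/organism/\trdfs:subClassOf\tBioPortal\tAnthology of Biosurveillance Diseases",
    "urn:uuid:7a7d352b-0ebb-42a4-a950-fcf914d72fd4\thttp://brd.bsvgateway.org/api\transmission/\tbiolink:subclass_of\towl:Thing\trdfs:subClassOf\tBioPortal\tAnthology of Biosurveillance Diseases",
    "urn:uuid:b9b23088-74d8-45be-b1d8-9ef0085b1a8d\thttp://brd.bsvgateway.org/api/organism/?id=628\tbiolink:subclass_of\thttp://brd.bsvgateway.org/api/organism/\trdfs:subClassOf\tBioPortal\tAnthology of Biosurveillance Diseases",
]

malformed = find_malformed_rows(sample)
print(malformed)
```

For a real edge file, the same function can be given an open file handle, since it only needs an iterable of lines.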

@caufieldjh
Collaborator Author

This same error is likely to happen with other malformed edge values though, e.g., in ACESO:

urn:uuid:0b5fa719-01c3-41c7-acc1-815dd3edee05	http://purl.bioontology.org/ontology/SNOMEDCT/247997008	biolink:subclass_of	http://ontology.apa.org/apaonto	ermsonlyOUT%20(5).owl#Bullying	rdfs:subClassOf	BioPortal	Adverse Childhood Experiences Ontology

In this case, there's some weird parsing going on with an imported SNOMED term: https://bioportal.bioontology.org/ontologies/SNOMEDCT?p=classes&conceptid=247997008

Same result - wrong number of columns in that entry, pandas raises ParserError, merge fails.

@caufieldjh
Collaborator Author

Here's another example from ADALAB:

urn:uuid:f5b2b9c3-7ba3-45e4-8f91-8186dc2a5f7d	http://rdf.adalab-project.org/ontology/adalab	ranscriptionFactor	biolink:type	OBO:INO_0000008	type

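This must have something to do with tabs: every impacted value is missing the characters "/t" at exactly the point where the extra tab appears, so it looks like "/t" is being turned into a literal tab character somewhere during parsing.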
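
If that hypothesis holds, an affected row could in principle be repaired by putting "/t" back where the stray tab sits. The sketch below is an illustration under that assumption only: rejoin_split_iri is a hypothetical helper, the 7-field default comes from the ParserError above, and the heuristic (an http(s) field followed by a field with no colon) is a guess that fits the ABD and ACESO rows rather than a general rule. The restored IRIs are therefore also guesses.

```python
from typing import List, Optional


def rejoin_split_iri(
    fields: List[str], expected_fields: int = 7
) -> Optional[List[str]]:
    """Rejoin an IRI that was split in two by a stray tab.

    Returns the repaired field list, or None when the row does not have
    exactly one extra field or the split point is ambiguous.
    Hypothetical helper; assumes the tab replaced the characters "/t".
    """
    if len(fields) != expected_fields + 1:
        return None
    # A split IRI shows up as an http(s) field followed by a fragment
    # that is neither a CURIE nor a URL, i.e. it contains no colon.
    candidates = [
        i
        for i in range(len(fields) - 1)
        if fields[i].startswith(("http://", "https://"))
        and ":" not in fields[i + 1]
    ]
    if len(candidates) != 1:
        return None
    i = candidates[0]
    return fields[:i] + [fields[i] + "/t" + fields[i + 1]] + fields[i + 2:]


# The malformed ABD and ACESO rows quoted earlier in this thread.
abd = "urn:uuid:7a7d352b-0ebb-42a4-a950-fcf914d72fd4\thttp://brd.bsvgateway.org/api\transmission/\tbiolink:subclass_of\towl:Thing\trdfs:subClassOf\tBioPortal\tAnthology of Biosurveillance Diseases".split("\t")
aceso = "urn:uuid:0b5fa719-01c3-41c7-acc1-815dd3edee05\thttp://purl.bioontology.org/ontology/SNOMEDCT/247997008\tbiolink:subclass_of\thttp://ontology.apa.org/apaonto\termsonlyOUT%20(5).owl#Bullying\trdfs:subClassOf\tBioPortal\tAdverse Childhood Experiences Ontology".split("\t")

abd_fixed = rejoin_split_iri(abd)
aceso_fixed = rejoin_split_iri(aceso)
```

Regenerating the transform from the source ontology is still the safer fix; a patch like this only papers over files that were already written.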

@caufieldjh
Collaborator Author

Regenerating each transform with the version as of commit 76b2709 appears to resolve the issue, e.g. with ADALAB:

urn:uuid:b3e7008f-1a2a-440b-89c6-45fdba059072	http://rdf.adalab-project.org/ontology/adalab/transcriptionFactor	biolink:type	OBO:INO_0000008	type	BioPortal	AdaLab ontology
