retrieve sorted classes and class descendants #165

Closed
graybeal opened this issue Apr 29, 2020 · 11 comments

@graybeal
Contributor

Marcos has written up a detailed description of a capability CEDAR would benefit from: the ability to request a sorted list of the classes in an ontology, or of the descendants of a given class. It is described at https://docs.google.com/document/d/1rdU65dAubkRytCpXxTFZXsbqaAgLIgZoTTuaNZT4WDk/edit.

@mdorf
Member

mdorf commented May 12, 2020

Marcos' suggested approach is indeed the best option here. I am implementing this feature using the search endpoint. The request signatures will be as follows:

Ontology Classes

http://data.bioontology.org/search?ontologies=NCIT&sort=prefLabel

Descendant Classes

http://localhost:9393/search?subtree_ontology=NCIT&subtree_root_id=http%3A%2F%2Fncicb.nci.nih.gov%2Fxml%2Fowl%2FEVS%2FThesaurus.owl%23C16275&sort=prefLabel

As of now, prefLabel will be the only sortable parameter. If multiple ontologies are passed via the ontologies parameter, the sorting will be as follows:

acronym asc, prefLabel asc
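
For anyone building these requests programmatically, the two signatures above can be sketched as follows. This is only an illustration: the helper names are made up, the base URL is the production host quoted in this thread, and any authentication the API may require is left out.

```python
from urllib.parse import urlencode

# Host taken from the thread; swap in your own instance if needed.
BASE_URL = "http://data.bioontology.org/search"


def sorted_classes_url(ontology: str) -> str:
    """Build the URL that lists one ontology's classes sorted by prefLabel."""
    params = {"ontologies": ontology, "sort": "prefLabel"}
    return f"{BASE_URL}?{urlencode(params)}"


def sorted_descendants_url(ontology: str, root_id: str) -> str:
    """Build the URL that lists the descendants of root_id sorted by prefLabel."""
    params = {
        "subtree_ontology": ontology,
        "subtree_root_id": root_id,  # urlencode percent-encodes the class IRI
        "sort": "prefLabel",
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(sorted_classes_url("NCIT"))
print(sorted_descendants_url(
    "NCIT", "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C16275"
))
```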

@graybeal
Contributor Author

Does that sort order give prefLabel priority (because it is last)? I expect that's what CEDAR would want: to sort by prefLabel overall, and by acronym only within that. Maybe Marcos can confirm, if he hasn't already.

@marcosmro

@graybeal You are right, prefLabel should have the highest priority. Here is an example to clarify the desired behavior:
Ontology O1, with terms aaa1, ccc; Ontology O2, with terms aaa2, bbb.
Expected output: 1) aaa1; 2) aaa2; 3) bbb; 4) ccc
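
The difference between the two orderings can be shown with a small sketch over the example data above. It is plain Python sorting, not the search backend's actual sort, and the tuple layout is an assumption made for illustration.

```python
# (ontology acronym, prefLabel) pairs from the example above
terms = [("O1", "aaa1"), ("O1", "ccc"), ("O2", "aaa2"), ("O2", "bbb")]

# "acronym asc, prefLabel asc": terms end up grouped by ontology
grouped = sorted(terms, key=lambda t: (t[0], t[1]))

# "prefLabel asc, acronym asc": prefLabel has priority, acronym breaks ties
by_label = sorted(terms, key=lambda t: (t[1], t[0]))

print([label for _, label in grouped])   # ['aaa1', 'ccc', 'aaa2', 'bbb']
print([label for _, label in by_label])  # ['aaa1', 'aaa2', 'bbb', 'ccc']
```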

@mdorf
Member

mdorf commented May 13, 2020

Got it. I had assumed that you wanted terms grouped by ontology. I'll change the sort order.

mdorf added a commit to ncbo/ontologies_api that referenced this issue Jun 5, 2020
@mdorf
Member

mdorf commented Jun 26, 2020

This has been implemented and deployed to production.

@marcosmro

marcosmro commented Jul 7, 2020

@mdorf I'm integrating this capability into CEDAR and I've found an issue:

I want to retrieve the descendants of the class "Delivery Procedures", from the NLMVS ontology. If I use the "descendants" call, I get 89 results. However, if I use the search endpoint with the "subtree_root_id" parameter, I get 88 results. The class "Breech extraction with internal podalic version (procedure)" is missing.

Descendants endpoint, 89 results:
http://data.bioontology.org/ontologies/NLMVS/classes/http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FNLMVS%2F2.16.840.1.113762.1.4.1045.59/descendants?pagesize=200

Search endpoint, 88 results:
http://data.bioontology.org/search?subtree_ontology=NLMVS&subtree_root_id=http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FNLMVS%2F2.16.840.1.113762.1.4.1045.59&include=prefLabel,synonym,definition&page=1&pagesize=200&sort=prefLabel

This is a relatively small problem in this case (just one missing class), but it suggests an indexing issue that could cause more serious problems in other calls.
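
A comparison like the one above can be sketched as a set difference, assuming the class IDs from both responses have already been collected into lists (fetching and paging are omitted). The function name and the IDs below are hypothetical.

```python
def missing_from_search(descendant_ids, search_ids):
    """Return class IDs that the descendants listing has but search lacks."""
    return sorted(set(descendant_ids) - set(search_ids))


# Hypothetical IDs, for illustration only
descendants = ["http://example.org/A", "http://example.org/B", "http://example.org/C"]
search_hits = ["http://example.org/A", "http://example.org/C"]

print(missing_from_search(descendants, search_hits))  # ['http://example.org/B']
```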

@mdorf
Member

mdorf commented Jul 7, 2020

There is an index record for the class "Breech extraction with internal podalic version (procedure)" (http://purl.bioontology.org/ontology/SNOMEDCT/302382009), but it doesn't include any parents:

[Screenshot of the index record, taken 2020-07-07]

I plan to re-index the NLMVS ontology to see if that addresses the issue.

@mdorf mdorf reopened this Jul 7, 2020
@mdorf
Member

mdorf commented Jul 7, 2020

Re-indexing of NLMVS appears to have fixed the issue.

@mdorf mdorf closed this as completed Jul 7, 2020
@mdorf mdorf reopened this Jul 7, 2020
@marcosmro

Thanks for fixing that, @mdorf. Unfortunately, it seems that this issue is affecting other ontologies as well. For example, the number of classes from NCIT differs substantially when using the classes endpoint vs. the search endpoint:

http://data.bioontology.org/ontologies/NCIT/classes/ -> 159453 classes
http://data.bioontology.org/search?ontologies=NCIT -> 154149 classes

I wonder if there is something related to when the indexing is triggered that needs to be fixed.

@mdorf
Member

mdorf commented Jul 7, 2020

Unfortunately, the index is a copy of the data, and, like any copy, it is susceptible to discrepancies. Perhaps we need an automated way to verify that the number of records in the index corresponds with the total number of classes for each ontology.
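
As a rough sketch of what such a check might look like, assuming the per-ontology totals have already been gathered from the classes endpoint and from the index (the function name and input shape are hypothetical, and nothing like this is confirmed to exist in the codebase):

```python
def find_count_mismatches(class_counts, index_counts):
    """Map each ontology acronym whose counts differ to (classes, indexed)."""
    mismatches = {}
    for acronym, total in class_counts.items():
        indexed = index_counts.get(acronym, 0)  # treat an absent ontology as 0 indexed
        if indexed != total:
            mismatches[acronym] = (total, indexed)
    return mismatches


# NCIT figures reported earlier in this thread
class_counts = {"NCIT": 159453}
index_counts = {"NCIT": 154149}

for acronym, (total, indexed) in find_count_mismatches(class_counts, index_counts).items():
    print(f"{acronym}: {total} classes, {indexed} indexed, {total - indexed} missing")
```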

@jonquet

jonquet commented Feb 7, 2022

I would only suggest adding this parameter to the API documentation:
http://data.bioontology.org/documentation#nav_search
