Portal Search - EnvO #361
One actionable (and quick) task here is to store queries for later analysis. We can make that a standalone task while the rest of the issue is fleshed out.
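A minimal sketch of what storing queries could look like, assuming a SQLite table on the server side. The table name `search_log`, its columns, and the helpers `open_log`, `log_query`, and `top_queries` are hypothetical illustrations, not part of the portal.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical table: one row per search submitted on the portal.
# submitted_at holds a UTC ISO 8601 timestamp; facets holds comma-separated selections.
SCHEMA = """
CREATE TABLE IF NOT EXISTS search_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    submitted_at TEXT NOT NULL,
    query_text TEXT NOT NULL,
    facets TEXT
)
"""

def open_log(path=":memory:"):
    """Open (or create) the query log; in-memory by default for illustration."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def log_query(conn, query_text, facets=None):
    """Store the raw search string plus any selected facet terms."""
    conn.execute(
        "INSERT INTO search_log (submitted_at, query_text, facets) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), query_text, ",".join(facets or [])),
    )
    conn.commit()

def top_queries(conn, limit=10):
    """Most frequent query strings, ties broken alphabetically."""
    return conn.execute(
        "SELECT query_text, COUNT(*) AS n FROM search_log "
        "GROUP BY query_text ORDER BY n DESC, query_text LIMIT ?",
        (limit,),
    ).fetchall()

conn = open_log()
log_query(conn, "river", ["feature:river"])
log_query(conn, "river")
log_query(conn, "soil")
print(top_queries(conn))  # [('river', 2), ('soil', 1)]
```

Queries are stored raw here so that normalization choices (case, synonyms) can be made later during analysis rather than baked in at capture time.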
This is fuzzy to me... @pvangay, can you clarify how this issue relates to the existing EnvO browsers (https://sites.google.com/site/environmentontology/)?
The original request is to improve how the EnvO terms are displayed on the portal so that they reflect the hierarchy (similar to what's at https://www.ebi.ac.uk/ols/ontologies/envo). Each of the 3 terms has an underlying hierarchical structure; what's on the portal now is flat. But before any implementation, we need to have a broader discussion about how we should expose EnvO on the portal. I have lots of ideas about how researchers would/could use EnvO for search, refinement, etc., but none of them are backed up by actual data :) which indicates this is definitely an opportunity to put something in front of users to get feedback. Yet what do we put in front of users? This? Or are there alternatives? Tagging @cmungall because I thought he had some ideas.
We should think carefully about the strategy of getting feedback from users where (a) the data we have doesn't have the range of environments we will have in the future, and (b) we are asking them to imagine without putting forward specific possibilities. I'll give a high-level description of a general strategy for ontological faceting for now, but I think there is a more detailed discussion to be had about scientific use cases, UI/UX, and ontology content. I like how the current facets are dynamic and driven by the content in the database. Here are some small changes that can improve things:
For example, right now if we click on the feature facet, there are many terms like river, stream, watercourse, etc. But note that you get the same results whichever one you select: no discriminatory power.

You can see why if we feed these terms into a graph viewer (I am not suggesting we do this for users; this is by way of explanation for us in NMDC). The exact term that was used to annotate the samples was "river". It's good that the facet browsing uses inference, such that querying for "water body" correctly gives you annotations to "river". But unless there are annotations to other "water body" concepts, like say "hypoxic lake", you don't get any value from filtering by intermediate terms.

If you trim out terms that are not the MRCA (most recent common ancestor) of any pair of samples, then you get a list of terms, each of which yields a different set of samples. This is also a tractable set for nesting the facets visually. For example, let's say we had drilled down to a set of samples that were collected from 3 different environments:
The subgraph induced by these terms is:

When you strip to MRCAs you get:
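A minimal Python sketch of the MRCA trimming described above, assuming a hypothetical toy is_a hierarchy built from terms named in this thread (it is not EnvO's actual structure) and hypothetical samples s1 to s4. Facet selection uses inference (a term also returns samples annotated to its descendants). For trimming, the sketch keeps every directly annotated term plus the MRCA(s) of each pair of distinct annotated terms; keeping single-sample leaf terms selectable is our assumption. The hypothetical `contract` helper illustrates the curated-subset idea of dropping an intermediate term such as "watercourse" and re-linking its children.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy is_a hierarchy (child -> parents); NOT EnvO's real structure.
PARENTS = {
    "river": ["watercourse"],
    "creek": ["watercourse"],
    "watercourse": ["lotic water body"],
    "lotic water body": ["water body"],
    "hypoxic lake": ["lake"],
    "lake": ["lentic water body"],
    "lentic water body": ["water body"],
    "water body": [],
}

def ancestors(term, parents):
    """The term itself plus every term reachable through is_a links."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def facet_index(annotations, parents):
    """Inferred facet term -> samples annotated to it or to any descendant."""
    index = defaultdict(set)
    for sample, term in annotations.items():
        for anc in ancestors(term, parents):
            index[anc].add(sample)
    return dict(index)

def mrcas(a, b, parents):
    """Common ancestors of a and b that are not a proper ancestor of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {c for c in common
            if not any(d != c and c in ancestors(d, parents) for d in common)}

def trim_to_mrcas(annotations, parents):
    """Keep annotated terms plus the MRCA(s) of every pair of distinct annotated terms."""
    terms = sorted(set(annotations.values()))
    kept = set(terms)
    for a, b in combinations(terms, 2):
        kept |= mrcas(a, b, parents)
    return kept

def contract(parents, excluded):
    """Drop curated-out terms, linking each child to its nearest non-excluded ancestors."""
    def lifted(p):
        if p not in excluded:
            return {p}
        out = set()
        for q in parents.get(p, []):
            out |= lifted(q)
        return out
    return {t: sorted(set().union(*(lifted(p) for p in ps)))
            for t, ps in parents.items() if t not in excluded}

# Hypothetical samples and their annotations.
annotations = {"s1": "river", "s2": "river", "s3": "creek", "s4": "hypoxic lake"}
full = facet_index(annotations, PARENTS)
kept = trim_to_mrcas(annotations, PARENTS)
trimmed = {t: full[t] for t in kept}

# Untrimmed: 8 facet terms but only 5 distinct sample sets
# (e.g. "watercourse" and "lotic water body" both select s1, s2, s3).
print(len(full), len({frozenset(s) for s in full.values()}))        # 8 5
# Trimmed: 5 terms, each selecting a different set of samples.
print(len(trimmed), len({frozenset(s) for s in trimmed.values()}))  # 5 5

# Curated exclusion: drop "watercourse" and the MRCA of creek and river moves up.
print(mrcas("creek", "river", PARENTS))                             # {'watercourse'}
print(mrcas("creek", "river", contract(PARENTS, {"watercourse"})))  # {'lotic water body'}
```

The pairwise loop is quadratic in the number of distinct annotated terms, which should be acceptable for a drilled-down result set; whether to run it in the client or on the server is the engineering question raised in the thread.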
I would say that nesting the facets by rendering them as a tree in this way is a good way to provide dynamic drill-down that leverages the ontology groupings, but that is a wider UI/UX decision (trimming by MRCA removes polyhierarchical aspects; there are other strategies to get a tree rendering).

We can also interleave this strategy with curated subsets for intermediate nodes. E.g., we may decide that "watercourse" is not a useful grouping level; if we exclude it, then the MRCA of creek and river will be "lotic water body". We may decide that this also is not a useful grouping, in which case all roll up to "water body". We could also combine the 3 facets into one hierarchy this way.

Note that this exact same strategy could be used for any hierarchical system, e.g. KEGG for the function classification. We have code in JS and Python for doing some of this kind of thing (there are a few engineering challenges, e.g. do you load the ontology ahead of time into the client, or pre-process the facets on the server side?).

Straw man proposal for proceeding:
@subdavis I spoke with Chris regarding the question you raised yesterday ("How do I proceed?"). The two pieces I think you need to display the EnvO terms nested in a hierarchy are:

If you have additional questions, please comment.
I believe I've been able to describe a process for building the simplified tree in this notebook (https://observablehq.com/@jeffbaumes/ontology-directed-acyclic-graph-simplification). This is in JavaScript, but we could implement it similarly in Python on data ingestion and make it available to the client as a static tree for navigating EnvO. @cmungall, does this match what you had in mind?
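A minimal Python sketch of what an ingestion-time step could produce, assuming the trimmed term set has already been computed (the hypothetical `kept` set below, over a toy hierarchy that is not EnvO). This is not the notebook's algorithm; the helpers `nearest_kept_parents` and `to_nested` are hypothetical. Each kept term is attached to its nearest kept ancestor(s), and the result is serialized as a static JSON tree the client could load. Where a term has more than one nearest kept parent, this sketch keeps only the first, which is one place where information can be lost when forcing a tree.

```python
import json

# Hypothetical toy is_a hierarchy (child -> parents); NOT EnvO's real structure.
PARENTS = {
    "river": ["watercourse"],
    "creek": ["watercourse"],
    "watercourse": ["lotic water body"],
    "lotic water body": ["water body"],
    "hypoxic lake": ["lake"],
    "lake": ["lentic water body"],
    "lentic water body": ["water body"],
    "water body": [],
}

def ancestors(term, parents):
    """The term itself plus every term reachable through is_a links."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def nearest_kept_parents(kept, parents):
    """For each kept term, the kept proper ancestors not above another kept proper ancestor."""
    tree = {}
    for t in sorted(kept):
        above = (ancestors(t, parents) - {t}) & kept
        tree[t] = sorted(u for u in above
                         if not any(v != u and u in ancestors(v, parents) for v in above))
    return tree

def to_nested(tree):
    """Convert {term: [parent, ...]} into nested dicts, using only the first parent."""
    children = {t: [] for t in tree}
    roots = []
    for t in sorted(tree):
        if tree[t]:
            children[tree[t][0]].append(t)
        else:
            roots.append(t)

    def build(t):
        return {"label": t, "children": [build(c) for c in children[t]]}

    return [build(r) for r in roots]

# Hypothetical output of an MRCA-trimming step.
kept = {"creek", "river", "hypoxic lake", "watercourse", "water body"}
nested = to_nested(nearest_kept_parents(kept, PARENTS))
payload = json.dumps(nested, indent=2)  # static file the client could fetch
print(payload)
```

Serializing once at ingestion keeps the client simple (no ontology download), at the cost of the tree being fixed rather than recomputed for each drilled-down result set.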
@cmungall I will leave this assigned to you for now and move it to the August sprint. Let me know if it should be assigned to someone else.
@jeffbaumes, as discussed briefly in the call the other day, I think this is great for a first iteration. The step where you make it a tree could lose information that may be relevant to the optimal trimmed tree, but we can try more later; it will certainly be better than a flat list! As also discussed, this should naturally take care of filtering out the non-informative upper-level terms.
@jeffbaumes and @subdavis, do you need anything else from anyone for this issue? Let me know if I can help.
@zachmullen, see microbiomedata/nmdc-ontology#4 (comment) for the new data.
@zachmullen and @jeffbaumes, any update on this? Are you still actively working on it? I can move it to the September sprint, but if you're not working on it I can remove it and add the backlog label. Let me know. Thank you.
I think we can call this one done.
That's great! I'll close it. Thank you.
This is a request to work on navigation for EnvO such that we can put something in front of users and iterate. EnvO has a high learning curve and the portal has potential to be integrated into training efforts if done well. What's in the portal now does not show relationships between EnvO terms and is not intuitive.
This will require working with Chris and Aim 1 (I do not expect Kitware to solve this on their own). Integrating search with EnvO support for aliases would be really powerful. Can we track the searches people are doing, to try to understand EnvO vs. GOLD?
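A minimal sketch of alias-aware search, assuming a synonym table keyed by term label. The synonyms below are made up for illustration (real ones would come from the EnvO release), and `normalize`, `build_alias_index`, and `resolve` are hypothetical helpers, not an existing portal or EnvO API.

```python
# Hypothetical synonym table (term label -> aliases); NOT taken from EnvO.
SYNONYMS = {
    "creek": ["brook"],
    "hypoxic lake": ["oxygen-depleted lake"],
    "river": [],
}

def normalize(text):
    """Case-fold and collapse whitespace so 'Hypoxic  Lake' matches 'hypoxic lake'."""
    return " ".join(text.lower().split())

def build_alias_index(synonyms):
    """Map every normalized label or alias to the set of terms it names."""
    index = {}
    for term, aliases in synonyms.items():
        for name in [term, *aliases]:
            index.setdefault(normalize(name), set()).add(term)
    return index

def resolve(query, index):
    """Return the canonical term(s) a search string refers to, or [] if none."""
    return sorted(index.get(normalize(query), set()))

ALIAS_INDEX = build_alias_index(SYNONYMS)
print(resolve("Brook", ALIAS_INDEX))              # ['creek']
print(resolve("  Hypoxic   Lake ", ALIAS_INDEX))  # ['hypoxic lake']
print(resolve("glacier", ALIAS_INDEX))            # []
```

A resolved term could then be handed to the same inferred-facet lookup used for browsing, so a search for an alias returns the samples annotated to the term (and its descendants).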
Priority - High
Urgency - High