Portal Search - EnvO #361

ssarrafan · 2021-05-05T17:44:07Z

This is a request to work on navigation for EnvO so that we can put something in front of users and iterate. EnvO has a steep learning curve, and the portal has the potential to be integrated into training efforts if done well. What's in the portal now does not show relationships between EnvO terms and is not intuitive.

This will require working with Chris and Aim 1 (I do not expect Kitware to solve this on their own). Integrating search with EnvO support for aliases would be really powerful. Can we track the searches people are doing? (to try to understand EnvO vs. GOLD)

Priority - High
Urgency - High

jbeezley · 2021-05-05T18:15:14Z

One actionable (and quick) task here is to store queries for later analysis. We can make that a standalone task while the rest of the issue is fleshed out.
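A minimal sketch of what storing queries could look like, assuming each search arrives as a list of facet conditions. The table name, the `field`/`value` keys, and the `envo_feature`/`gold_ecosystem` field names are hypothetical and not taken from the portal code; SQLite is used only to keep the example self-contained.

```python
import json
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open (or create) a hypothetical search log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_log "
        "(ts TEXT NOT NULL, conditions TEXT NOT NULL)"
    )
    return conn


def log_search(conn, conditions):
    """Store one search as a UTC timestamp plus its conditions as JSON."""
    conn.execute(
        "INSERT INTO search_log (ts, conditions) VALUES (?, ?)",
        (datetime.now(timezone.utc).isoformat(), json.dumps(conditions, sort_keys=True)),
    )
    conn.commit()


def value_frequencies(conn, field):
    """Count how often each value was searched for in the given field."""
    counts = {}
    for (raw,) in conn.execute("SELECT conditions FROM search_log"):
        for cond in json.loads(raw):
            if cond.get("field") == field:
                counts[cond["value"]] = counts.get(cond["value"], 0) + 1
    return counts


conn = open_log()
log_search(conn, [{"field": "envo_feature", "value": "river"}])
log_search(conn, [{"field": "envo_feature", "value": "river"},
                  {"field": "gold_ecosystem", "value": "Aquatic"}])
log_search(conn, [{"field": "gold_ecosystem", "value": "Aquatic"}])
```

Counting by field afterwards (`value_frequencies(conn, "envo_feature")`) is one way to compare how often people search by EnvO versus GOLD.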

kfagnan · 2021-05-05T23:56:30Z

This is fuzzy to me... @pvangay can you clarify how this issue relates to the existing EnvO browsers?
https://sites.google.com/site/environmentontology/
https://www.ebi.ac.uk/ols/ontologies/envo

pvangay · 2021-05-06T01:01:10Z

The original request is to improve how the EnvO terms are displayed on the portal to reflect the hierarchy (similar to what's at https://www.ebi.ac.uk/ols/ontologies/envo). Each of the 3 terms has an underlying hierarchical structure -- what's on the portal now is flat.

But before any implementation, we need to have a broader discussion about how we should expose EnvO on the portal. I have lots of ideas about how researchers would/could use EnvO for search/refinement/etc. but none of them are backed up by actual data :) -- which indicates this is definitely an opportunity to put something in front of users to get feedback. Yet, what do we put in front of users? This? Or are there alternatives? Tagging @cmungall because I thought he had some ideas.

cmungall · 2021-05-06T02:16:33Z

We should think carefully about the strategy of getting feedback from users when (a) the data we have doesn't have the range of environments we will have in the future and (b) we are asking them to imagine without putting forward specific possibilities.

I'll give a high level description of a general strategy for ontological faceting for now but I think there is a more detailed discussion to be had about scientific use cases, UI/UX, and ontology content.

I like how the current facets are dynamic and driven by the content in the database. Here are some small changes that can improve things:

- nest the facets using the is-a/part-of graph
- use an EnvO slim to eliminate 'astronomical body part' and the like
- only include MRCAs (most recent common ancestors) of any directly used terms

For example, right now if we click on feature we see:

There are many terms like river, stream, watercourse, etc., but you get the same results whichever one you select, so they have no discriminatory power.

You can see why if we feed these terms into a graph viewer (I am not suggesting we do this for users, this is by way of explanation for us in NMDC):

The exact term that was used to annotate the samples was "river". It's good that the facet browsing uses inference, so that querying for "water body" correctly gives you annotations to "river". But unless there are annotations to other "water body" concepts, such as "hypoxic lake", you don't get any value from filtering by intermediate terms.
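To make the point concrete, here is a small sketch of inference-based faceting over a toy graph. The edges, sample ids, and function names are illustrative assumptions, not the real EnvO graph or the portal's code.

```python
from collections import defaultdict

# Toy is-a/part-of edges (child -> parents); NOT the real EnvO graph.
PARENTS = {
    "river": ["watercourse"],
    "watercourse": ["water body"],
    "hypoxic lake": ["water body"],
    "water body": [],
}


def ancestors(term, parents):
    """Return the term itself plus everything reachable via parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def facet_index(annotations, parents):
    """Map each term to the samples annotated to it or to any descendant."""
    index = defaultdict(set)
    for sample_id, term in annotations.items():
        for t in ancestors(term, parents):
            index[t].add(sample_id)
    return index


# Every sample is annotated directly to "river".
annotations = {"s1": "river", "s2": "river"}
index = facet_index(annotations, PARENTS)
```

With only "river" annotations, "river", "watercourse", and "water body" all select the same samples, which is why the intermediate facets add nothing until other water body terms are used.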

If you trim out terms that are neither used directly nor the MRCA of some pair of directly used terms, then you get a list of terms each of which yields a different set of samples. This is also a tractable set for nesting the facets visually.

For example, let's say we had drilled down to a set of samples that were collected from 3 different environments:

hypoxic lake (20 samples)
river (30 samples)
tidal creek (40 samples)

The subgraph induced by these terms is:

When you strip to MRCAs you get:

water body (90)
- hypoxic lake (20)
- watercourse (70)
  - river (30)
  - tidal creek (40)
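A rough sketch of the MRCA trimming on this example, under stated assumptions: the graph below is a simplified toy (not the real EnvO), all function names are made up for illustration, and when a term has several lowest kept ancestors the sketch just picks one alphabetically, which is only one of the possible ways to force a tree.

```python
from itertools import combinations

# Toy child -> parents edges; a simplification, NOT the real EnvO graph.
PARENTS = {
    "hypoxic lake": ["lake"],
    "lake": ["lentic water body"],
    "lentic water body": ["water body"],
    "river": ["watercourse"],
    "tidal creek": ["watercourse"],
    "watercourse": ["lotic water body"],
    "lotic water body": ["water body"],
    "water body": ["astronomical body part"],
    "astronomical body part": [],
}


def ancestors(term, parents):
    """Return the term itself plus everything reachable via parent edges."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def lowest(terms, parents):
    """Members of `terms` that are not an ancestor of another member."""
    return {c for c in terms
            if not any(c in ancestors(d, parents) for d in terms if d != c)}


def trimmed_terms(used, parents):
    """Directly used terms plus the MRCAs of every pair of used terms."""
    keep = set(used)
    for a, b in combinations(sorted(used), 2):
        keep |= lowest(ancestors(a, parents) & ancestors(b, parents), parents)
    return keep


def render(counts, parents):
    """Return the trimmed tree as indented lines with rolled-up counts."""
    keep = trimmed_terms(counts, parents)
    totals = {t: sum(n for u, n in counts.items() if t in ancestors(u, parents))
              for t in keep}
    # Parent in the trimmed tree: the lowest kept proper ancestor, if any.
    parent_of = {
        t: min(lowest((ancestors(t, parents) - {t}) & keep, parents), default=None)
        for t in keep
    }
    lines = []

    def walk(node, depth):
        lines.append("  " * depth + f"- {node} ({totals[node]})")
        for child in sorted(t for t in keep if parent_of[t] == node):
            walk(child, depth + 1)

    for root in sorted(t for t in keep if parent_of[t] is None):
        walk(root, 0)
    return lines


tree_lines = render({"hypoxic lake": 20, "river": 30, "tidal creek": 40}, PARENTS)
print("\n".join(tree_lines))
```

On this toy graph the intermediate terms "lake", "lentic water body", "lotic water body", and "astronomical body part" drop out because none of them is the MRCA of a pair of used terms.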

I would say that nesting the facets by rendering them as a tree in this way is a good way to provide dynamic drill-down that leverages the ontology groupings, but that is a wider UI/UX decision. Trimming by MRCA removes polyhierarchical aspects, and there are other strategies to get a tree rendering.

We can also interleave this strategy with curated subsets for intermediate nodes. E.g. we may decide that "watercourse" is not a useful grouping level; if we exclude it, then the MRCA of creek and river will be "lotic water body". We may decide that this is also not a useful grouping, in which case everything rolls up to "water body".
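One way to sketch the exclusion idea is to drop the curated terms from the candidate common ancestors before picking the lowest ones. The graph below is a toy simplification, and the function names and the `excluded` parameter are assumptions for illustration only.

```python
# Toy child -> parents edges; a simplification, NOT the real EnvO graph.
PARENTS = {
    "river": ["watercourse"],
    "tidal creek": ["watercourse"],
    "watercourse": ["lotic water body"],
    "lotic water body": ["water body"],
    "water body": ["astronomical body part"],
    "astronomical body part": [],
}


def ancestors(term, parents):
    """Return the term itself plus everything reachable via parent edges."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def mrcas(a, b, parents, excluded=frozenset()):
    """Lowest common ancestors of a and b, ignoring any excluded terms."""
    common = (ancestors(a, parents) & ancestors(b, parents)) - set(excluded)
    return {c for c in common
            if not any(c in ancestors(d, parents) for d in common if d != c)}


default = mrcas("river", "tidal creek", PARENTS)
no_watercourse = mrcas("river", "tidal creek", PARENTS, {"watercourse"})
no_lotic = mrcas("river", "tidal creek", PARENTS,
                 {"watercourse", "lotic water body"})
```

Here `default` is "watercourse", excluding it moves the grouping up to "lotic water body", and excluding both rolls everything up to "water body". The same `excluded` set could carry upper-level terms such as "astronomical body part".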

We could also combine the 3 facets into one hierarchy this way.

Note that this exact same strategy could be used for any hierarchical system - e.g. KEGG for the function classification.

We have code in JavaScript and Python for doing some of this kind of thing. There are a few engineering challenges, e.g. do you load the ontology ahead of time into the client, or do you pre-process the facets on the server side?

Straw man proposal for proceeding

1. Ontology group defines initial exclusion sets (e.g. astronomical body part). T-shirt size: Small
2. Kitware implements MRCA and exclusion set filtering. T-shirt size: Medium?
3. Kitware implements nesting/hierarchical layout of facets. T-shirt size: Medium/Large?
4. Deploy a test database instance that has many more samples (all public samples in GOLD that are EnvO-annotated). T-shirt size: Medium/Large
5. Iterate within NMDC, potentially expanding inclusion/exclusion sets depending on feedback
6. Test with larger user group

pvangay · 2021-05-06T15:28:20Z

@cmungall - agree re: need for a broader range of data to demonstrate value. Thanks for laying it out here and for the suggestions. Steps 1-2 of the straw man proposal seem like a reasonable start to me, but I'll let others chime in.

dehays · 2021-07-14T17:26:06Z

@subdavis Spoke with Chris regarding the questions you raised yesterday (How do I proceed?). The two pieces I think you need to display the EnvO terms as nested in hierarchy are:

- Only display the Most Recent Common Ancestor (MRCA) for paths in the EnvO graph
- We (probably @turbomam) will provide a list of terms to filter out of the display (i.e. the terrestrial body terms)

If you have additional questions, please comment.

jeffbaumes · 2021-07-22T22:06:21Z

I believe I've been able to describe a process for building the simplified tree in this notebook:

https://observablehq.com/@jeffbaumes/ontology-directed-acyclic-graph-simplification

This is in JavaScript, but we could implement it similarly in Python at data ingestion and make the result available to the client as a static tree to use for navigating EnvO.
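As a rough illustration of the "static tree" idea, the sketch below turns an already simplified tree (given as a term-to-parent map with sample counts) into nested JSON that a client could load as-is. The `term`/`count`/`children` keys and the function name are hypothetical; this is not the format used in the notebook or in the portal.

```python
import json


def to_nested(parent_of, counts):
    """Build a list of root nodes with nested children from a parent map."""
    nodes = {t: {"term": t, "count": counts.get(t, 0), "children": []}
             for t in parent_of}
    roots = []
    for term in sorted(parent_of):
        parent = parent_of[term]
        target = roots if parent is None else nodes[parent]["children"]
        target.append(nodes[term])
    return roots


# Simplified tree for the water body example; None marks a root.
parent_of = {
    "water body": None,
    "hypoxic lake": "water body",
    "watercourse": "water body",
    "river": "watercourse",
    "tidal creek": "watercourse",
}
counts = {"water body": 90, "hypoxic lake": 20, "watercourse": 70,
          "river": 30, "tidal creek": 40}

static_tree = to_nested(parent_of, counts)
payload = json.dumps(static_tree, indent=2)  # what ingestion would write out
```

Doing this once at ingestion means the client does not need to load the full ontology, at the cost of counts that reflect the whole database rather than the current search.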

@cmungall does this match what you had in mind?

ssarrafan · 2021-07-30T22:34:01Z

@cmungall I will leave this assigned to you for now and move to the August sprint. Let me know if it should be assigned to someone else.

cmungall · 2021-07-31T02:14:35Z

@jeffbaumes - as discussed briefly in the call the other day, I think this is great for a first iteration. I think the step where you make it a tree could lose information that may be relevant to the optimal trimmed tree, but we can try more later. It will certainly be better than a flat list!

And also as discussed, this should naturally take care of filtering the non-informative upper-level terms.

ssarrafan · 2021-08-04T17:39:48Z

@jeffbaumes and @subdavis do you need anything else from anyone for this issue? Let me know if I can help.

jeffbaumes · 2021-08-17T20:25:44Z

@zachmullen see microbiomedata/nmdc-ontology#4 (comment) for the new data.

ssarrafan · 2021-09-01T00:45:23Z

@zachmullen and @jeffbaumes any update on this? Are you still actively working on this? I can move it to the September sprint but if you're not working on it I can remove it and add the backlog label. Let me know. Thank you.

zachmullen · 2021-09-01T00:46:00Z

I think we can call this one done.

ssarrafan · 2021-09-01T00:55:24Z

> I think we can call this one done.

That's great! I'll close it. Thank you.

ssarrafan created this issue from a note in NMDC May 2021 Sprint (To do) May 5, 2021

kfagnan added the type: question Further information is requested label May 5, 2021

kfagnan removed this from To do in NMDC May 2021 Sprint May 5, 2021

jeffbaumes added the type: needs discussion label May 11, 2021

subdavis added the priority: high label May 11, 2021

ssarrafan assigned subdavis Jul 8, 2021

ssarrafan added this to To do in NMDC July 2021 Sprint via automation Jul 8, 2021

ssarrafan added this to the Sprint 4 milestone Jul 8, 2021

ssarrafan assigned cmungall Jul 13, 2021

jeffbaumes removed the type: needs discussion label Jul 15, 2021

ssarrafan removed this from To do in NMDC July 2021 Sprint Jul 30, 2021

ssarrafan added this to To do in NMDC August 2021 Sprint via automation Jul 30, 2021

ssarrafan modified the milestones: Sprint 4, Sprint 5 Jul 30, 2021

jeffbaumes assigned zachmullen and unassigned cmungall Aug 17, 2021

jeffbaumes mentioned this issue Aug 17, 2021

create json file for Kitware microbiomedata/nmdc-ontology#4

Closed

ssarrafan moved this from To do to In progress in NMDC August 2021 Sprint Aug 26, 2021

ssarrafan removed this from In progress in NMDC August 2021 Sprint Sep 1, 2021

ssarrafan added this to To do in NMDC September 2021 Sprint via automation Sep 1, 2021

ssarrafan modified the milestones: Sprint 5, Sprint 6 Sep 1, 2021

ssarrafan closed this as completed Sep 1, 2021

NMDC September 2021 Sprint automation moved this from To do to Done Sep 1, 2021

cmungall mentioned this issue Feb 18, 2022

unusual top level hierarchy for ENVO search #602

Open

jeffbaumes mentioned this issue Mar 7, 2024

data portal - environmental local scale search issue #1173

Open
