Revamp dataset selection UI #62

Closed
trvrb opened this issue Dec 13, 2019 · 18 comments
Labels
enhancement New feature or request priority: moderate To be resolved after high priority issues proposal Proposals that warrant further discussion

Comments

@trvrb
Member

trvrb commented Dec 13, 2019

The current "tile" UI to navigate to datasets on the splash page works well for a few items; it's attractive and approachable. However, I can imagine aiming for ~20 core datasets alongside about as many non-core datasets. I'd propose a general UI for dataset selection. This would be used on the splash page, but could also be used on, say, nextstrain.org/groups/blab/ to display blab-specific datasets.

I imagine this working by crawling the S3 bucket at some interval and collecting datasets along with the meta portion of their JSON (primarily the updated date).
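A minimal sketch of the collection step, assuming the crawler already has the list of object keys in the bucket (for example from boto3's `list_objects_v2` paginator) and some way to load each JSON. The sidecar suffixes treated as non-datasets are assumptions, and `dataset_keys` / `collect_updated` are hypothetical names:

```python
from typing import Callable, Dict, List

# Assumed suffixes of sidecar files that accompany a main Auspice JSON.
# They are not datasets in their own right, so they are not listed.
SIDECAR_SUFFIXES = ("_tip-frequencies.json", "_root-sequence.json", "_meta.json", "_tree.json")


def dataset_keys(keys: List[str]) -> List[str]:
    """Keep only keys that look like main dataset JSONs."""
    return sorted(
        k for k in keys
        if k.endswith(".json") and not k.endswith(SIDECAR_SUFFIXES)
    )


def collect_updated(keys: List[str], load_json: Callable[[str], dict]) -> Dict[str, str]:
    """Map each dataset key to the 'updated' value in its meta block."""
    index = {}
    for key in dataset_keys(keys):
        meta = load_json(key).get("meta", {})
        index[key] = meta.get("updated", "")
    return index
```

Passing the loader in as a function keeps the collation logic testable without network access; a real crawler would plug in an S3 fetch there.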

I wanted to combine datasets that begin with the same filestem, the way we do now; otherwise the 40 seasonal flu datasets will overwhelm the single Zika dataset. After playing around for a little while, here's what I came up with:

[Image: dataset-selection-closed]

The filestem is used to collect datasets under a single umbrella. The dataset owner is displayed alongside their logo, together with the date the most recent dataset was updated. The toggle on the left can be used to display a list of all datasets collected under the same umbrella:

[Image: dataset-selection-open]

This reuses the "list" styling in Auspice built for filters.
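The filestem grouping above could be sketched as follows, assuming the umbrella is the first underscore-separated token of the dataset name and that `updated` values are ISO 8601 dates (which sort correctly as plain strings). `group_by_filestem` is a hypothetical helper:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_filestem(updated: Dict[str, str]) -> Dict[str, Tuple[str, List[str]]]:
    """Group dataset names by their first '_'-separated token.

    Returns {stem: (most_recent_updated, sorted_member_names)}.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in updated:
        groups[name.split("_", 1)[0]].append(name)
    return {
        # The umbrella row shows the newest update among its members.
        stem: (max(updated[n] for n in members), sorted(members))
        for stem, members in sorted(groups.items())
    }
```

With this, 40 flu datasets collapse into one "flu" row that sits alongside a single "zika" row, and the toggle would reveal the member list.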

This is related to issue #48 (manifest). A Sketch mock-up can be found on Google Drive here.

@tsibley
Member

tsibley commented Dec 13, 2019

Looks very nice! I think this is a good direction.

Instead of using this table on the splash page, I do think it'd be nicer to keep a small number of selected tiles followed by a link to "see more…" that goes to another page which uses this table view to show all builds/datasets (nextstrain.org/core or something).

@tsibley tsibley added the enhancement New feature or request label Dec 13, 2019
@trvrb
Member Author

trvrb commented Dec 13, 2019

I'd be happy with something like this. I had thought about something like nextstrain.org/pathogens or nextstrain.org/datasets to give a full listing of core and non-core datasets, but nextstrain.org/core could just show a full listing of core datasets, the same way that nextstrain.org/groups/grubaughlab would show a listing of grubaughlab datasets.

I also imagine allowing fields to be sorted and having an elastic search box to filter rows.

@jameshadfield
Member

Seems good to be thinking about this.

@kairstenfay kairstenfay added the priority: moderate To be resolved after high priority issues label Aug 26, 2020
@trvrb trvrb added the proposal Proposals that warrant further discussion label Jan 22, 2021
@trvrb trvrb mentioned this issue Feb 10, 2021
@jameshadfield
Member

Notes from talking to @eharkins this morning while reviewing #271. (We felt that big UI changes to the components that multiple pages will use, e.g. /builds, /community, /flu and /sars-cov-2, were beyond the scope of that PR.)

General concept for the React component(s), which can be utilised across many pages:

[Image]

Example of /flu:
[Image]

Example of /groups:
[Image]

@trvrb
Member Author

trvrb commented Feb 19, 2021

Thanks for the detailed sketch @jameshadfield. I think these are really interesting thoughts. I have a few small comments, and I'll try to spend more time throwing together an alternative design. I do think spitballing a few options here is the best starting point.

Specific comments:

  1. Conceptually, casting a, say, 3-level hierarchy like "subtype" / "gene" / "resolution" for seasonal flu to a 2D grid, with level 1 as rows, level 2 as columns and level 3 as elements within the grid, feels weird. What happens with a 4th level?

In the flu example our paths are:

  • flu/seasonal/h3n2/ha/2y
  • flu/seasonal/h1n1pdm/ha/2y
  • flu/avian/h5n1/ha
    etc...

I'd argue that the dataset UI should mirror URL paths in order to work correctly for arbitrary datasets uploaded to a group's S3 bucket or posted to GitHub. In this case we actually have:

"animal flu type" / "subtype" / "gene" / "resolution" and "animal flu type" / "subtype" / "gene"

But for the flu example:

[Image: flu]

you've split the top-level into separate entries. I don't see how we can generically know that this is the right cut for an arbitrary set of hierarchical paths.

  2. Similar to the difficulty of having a generic hierarchical UI, I'm not sure how the above sketch would handle the diversity of hierarchies on the current neherlab Groups page:

[Image: neherlab]

I'm assuming this would then effectively be the current interface: a series of datasets in alphabetical order in columns. I don't like how an attempt at a hierarchical interface breaks down when it encounters a diverse array of datasets.

I want a single UI that could smoothly accommodate all datasets on nextstrain.org. This current sketch would have one UI for the neherlab group (with single datasets in columns) and a separate boxy layout for seasonal flu.

  3. I think I'd still very much prefer a "flatter" interface, for the reasons above, but also because I think it would interact more naturally with a filter / search dialog box. In the above flu example, how would filtering to "ha" work? In the seasonal flu grid, would datasets in the NA column be removed but the NA column remain?

Thanks again for sketching things out. I'll try to put something together for criticism.

@eharkins
Contributor

Thanks for the comments @trvrb! The hierarchical UI implementation very much depends on the hierarchy being encoded in the yaml rather than being inferred from paths. So in that sense it is as flexible as the encoding in the yaml and could allow a yaml made up of a mix of hierarchical and non-hierarchical groups in the same listing, and would only display hierarchies where they are encoded.
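A sketch of what "only display hierarchies where they are encoded" could mean in practice. The listing below is a hypothetical structure (the Python equivalent of parsed YAML) in which an entry nests only if it has `children`; the field names are assumptions, not the actual YAML schema:

```python
from typing import List


def render_listing(entries: List[dict], depth: int = 0) -> List[str]:
    """Render entries as indented lines.

    Entries with 'children' become a level of hierarchy; entries without
    them are plain leaves, so flat and nested groups can be mixed freely.
    """
    lines = []
    for entry in entries:
        lines.append("  " * depth + entry["name"])
        lines.extend(render_listing(entry.get("children", []), depth + 1))
    return lines


# A mixed listing: seasonal flu is nested, zika is a flat leaf.
listing = [
    {"name": "seasonal flu", "children": [
        {"name": "h3n2", "children": [{"name": "flu/seasonal/h3n2/ha/2y"}]},
    ]},
    {"name": "zika"},
]
```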

I will say while implementing the more structured layout I had some thoughts about how much more flexible a less structured UI could potentially be. The question is do we lose anything in the name of flexibility that we care about?

As an extreme example, we could list all datasets at the same level (in something collapsible so that it didn't take up too much space on the page), with search / filter being the main way of finding things. The only major flaw I can see with this is exploration when you don't know what you're looking for, but we can suggest filters that help with this. I would imagine this taking the best of both the Auspice filter UI:

[Image: Screen Shot 2021-02-18 at 4 42 35 PM]
and the sequence search UI:
[Image: Screen Shot 2021-02-18 at 5 07 03 PM]

You could imagine that, with a really good dataset search, pages like /influenza are not necessary at all until we have more bespoke resources and ways of exploring that pathogen or group of datasets, for example the map for sars-cov-2, or something analogous for flu which illustrates the hierarchy visually. Instead, the splash page could just feature the dataset search / filter UI, replacing cards in the "Explore pathogens" section with suggested filters/searches (we could still have the nice art from the cards, but clicking one would scroll to the dataset-search section and apply the appropriate search/filter).

I can try to implement something more on this end of the spectrum to contrast with the more structured approach, if you think it would be worth seeing. It's more difficult to implement, though, so perhaps I could start with a sketch for feedback.

@trvrb
Member Author

trvrb commented Feb 19, 2021

This is definitely the direction I was thinking towards. "flu" pages could just be a filter for "flu" in the dataset name, restricted to datasets that belong to "core" Nextstrain. I'd want to expose some metadata (maintained by, date updated, tip count) as well as the title in this flat list. Just like the Auspice intersecting filter UI, you could filter to "flu" in the dataset name and also to datasets belonging to "blab". Or you could have union filters like "blab" and "neherlab" when they apply to the same metadata element, again as in the Auspice filter.
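A sketch of those filter semantics, assuming each dataset is a flat record with a `name` and a hypothetical `contributor` field: values selected for the same field are combined as a union (OR), and different fields are intersected (AND). Treating `name` as a substring match and other fields as exact matches is an assumption:

```python
from typing import Dict, List, Set


def apply_filters(datasets: List[dict], filters: Dict[str, Set[str]]) -> List[dict]:
    """Union within a field, intersection across fields."""

    def matches(ds: dict, field: str, values: Set[str]) -> bool:
        if field == "name":
            # Dataset-name filters match anywhere in the name.
            return any(v in ds["name"] for v in values)
        return ds.get(field) in values

    return [
        ds for ds in datasets
        # Fields with no selected values impose no constraint.
        if all(matches(ds, f, vals) for f, vals in filters.items() if vals)
    ]


datasets = [
    {"name": "flu/seasonal/h3n2/ha/2y", "contributor": "nextstrain"},
    {"name": "ncov/19B", "contributor": "blab"},
    {"name": "flu/avian/h5n1/ha", "contributor": "blab"},
    {"name": "mers", "contributor": "neherlab"},
]
```

For example, `{"name": {"flu"}, "contributor": {"blab"}}` keeps only blab's flu dataset, while `{"contributor": {"blab", "neherlab"}}` keeps everything from either group.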

And I really like the idea of keeping a few small tiles (or new UI elements) that would serve to apply suggested filters.

@trvrb
Member Author

trvrb commented Feb 19, 2021

Okay. I've made a simple sketch with dataset-selection-v2.sketch at https://drive.google.com/drive/folders/1e97lJE8ONR9TV-kDd9XVM-S8S05mUr17.

Here, I've made a simple line list that has three elements: dataset name (derived by splitting the JSON filename on _), "contributor" (Nextstrain for core, GitHub user for Community or group name for Group) and "last updated":

[Image: dataset-select]
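A sketch of how each row's name and contributor might be derived, assuming dataset JSON filenames use _ as the separator and that the caller knows which source (core, community or group) a file came from. `describe_dataset` and the `source` values are hypothetical:

```python
def describe_dataset(filename: str, source: str, owner: str = "") -> dict:
    """Build the name and contributor columns for one row."""
    stem = filename[: -len(".json")] if filename.endswith(".json") else filename
    if source == "core":
        contributor = "Nextstrain"
    elif source in ("community", "group"):
        # GitHub user for community datasets, group name for Groups.
        contributor = owner
    else:
        raise ValueError(f"unknown source: {source}")
    return {"name": " / ".join(stem.split("_")), "contributor": contributor}
```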

This lends itself to simple filtering by name (matching "ha" for HA genes, matching "flu" for flu datasets, etc.) or by contributor. I've added carets to sort by column within the filtered set (the example is filtered by contributor).
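A minimal sketch of filtering rows and then sorting the filtered set by a chosen column, assuming rows carry hypothetical `name`, `contributor` and `updated` keys with ISO dates. The name match here is a plain case-insensitive substring test, so "ha" would also match any name that merely contains those letters:

```python
from typing import List


def filter_and_sort(rows: List[dict], query: str = "", contributor: str = "",
                    sort_key: str = "updated", descending: bool = True) -> List[dict]:
    """Filter by name substring and/or contributor, then sort by one column."""
    q = query.lower()
    kept = [
        r for r in rows
        if q in r["name"].lower()
        and (not contributor or r["contributor"] == contributor)
    ]
    # Sorting happens only within the filtered set, as with the column carets.
    return sorted(kept, key=lambda r: r[sort_key], reverse=descending)


rows = [
    {"name": "flu / seasonal / h3n2 / ha / 2y", "contributor": "Nextstrain", "updated": "2021-02-10"},
    {"name": "flu / seasonal / h3n2 / na / 2y", "contributor": "Nextstrain", "updated": "2021-02-12"},
    {"name": "ncov / 19B", "contributor": "blab", "updated": "2021-02-11"},
]
```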

Hovering over a dataset would reveal a bit more metadata using the same style of hover panel we do in Auspice:

[Image: dataset-select-hover]

Here, in addition to a unified dataset select UI that would include core, community and groups, each Groups page would contain the same dataset select UI but only for datasets belonging to this group.

Generally, I think this would be a nicely flexible and simple approach.

@trvrb
Member Author

trvrb commented Feb 19, 2021

A couple further considerations on this flat approach:

  1. I would suggest that URLs that point to hierarchical "collections" bring up the dataset select UI with just these options specified. For example, http://nextstrain.org/flu/seasonal/vic/ would bring up a dataset select UI with only
  • flu / seasonal / vic / ha / 2y
  • flu / seasonal / vic / ha / 3y
    etc...

or https://nextstrain.org/groups/blab/ncov would bring up a dataset select UI with only

  • ncov / 19B
  • ncov / 19B / 2021-02-11
    etc...

This would replace the current need for a manifest_guest.json file to dictate how to handle default selections. Side note here that I'd suggest populating the "Dataset" sidebar component for core Nextstrain datasets directly from the S3 bucket, the same way we populate "Dataset" sidebar component for Groups.

  2. We'd probably want a hidden flag or something of this sort as part of the Auspice JSON schema. Otherwise we'll have a bunch of ncov / global / 2020-03-01, etc... datasets crufting up the dataset UI (this is an issue regardless of the UI specifics). JSONs that are flagged as hidden are not private, but wouldn't be listed in the dataset select. This would also give Groups or Community pages a way to opt out of being surfaced on the nextstrain.org homepage.
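Both points could be sketched as below, assuming each dataset record has a "/"-separated `path` relative to its collection root (for a Groups page, relative to the group) and an optional `hidden` boolean. The `hidden` field name is a placeholder, since the comment only proposes such a flag. Matching is done on whole path segments, so a URL ending in /flu/seasonal/vi would not pick up vic datasets:

```python
from typing import List


def collection(datasets: List[dict], url_path: str) -> List[str]:
    """Datasets under a URL prefix, excluding those flagged as hidden."""
    prefix = [p for p in url_path.strip("/").split("/") if p]
    out = []
    for ds in datasets:
        parts = ds["path"].split("/")
        # Compare whole segments, not raw string prefixes.
        if parts[: len(prefix)] == prefix and not ds.get("hidden", False):
            out.append(ds["path"])
    return sorted(out)


datasets = [
    {"path": "flu/seasonal/vic/ha/2y"},
    {"path": "flu/seasonal/vic/ha/3y"},
    {"path": "flu/seasonal/h3n2/ha/2y"},
    {"path": "ncov/global/2020-03-01", "hidden": True},
    {"path": "ncov/global"},
]
```

Hidden datasets remain reachable by their own URL; they are simply left out of the listing.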

@trvrb
Copy link
Member Author

trvrb commented Feb 19, 2021

As Eli alludes to above, I would update the splash page to keep a few highlight tiles under "Explore pathogens" and "From the community" but mostly push prominent links to new URLs /core, /community and /groups that would offer the more advanced dataset select UI with the relevant entries. And perhaps a /datasets URL that has everything together under one roof.

[Image: splash-page]

@eharkins
Copy link
Contributor

Thanks for taking the time to sketch this out @trvrb! Judging by the proposal of a hidden flag and my interpretation of your design in general, I'm assuming that the source/backend/database file(s) for all datasets displayed here would be generated programmatically from S3 (and by other means for community and groups datasets)? I.e. does your proposal imply a shift from the current design, where a YAML list specifies the datasets to include, to a design where datasets are collected from one or more endpoints and some of them are explicitly excluded from being listed on nextstrain.org? This seems to be at least part of what you suggested in the initial posting of this issue and in subsequent comments, but I want to explicitly think through how that would work, since it differs from our current implementation beyond just the UI.

@eharkins
Contributor

I guess it can be a combination of approaches for sourcing a dataset listing, e.g. presence in the YAML for community datasets vs. presence on S3 for Nextstrain datasets (modified by the hidden property).

@trvrb
Member Author

trvrb commented Feb 25, 2021

Thanks Eli. I am indeed suggesting that for this generic dataset selection UI we'd generate the data on the fly by scraping the relevant S3 buckets. This could be generated into a JSON or YAML file that the nextstrain.org front-end would reference when displaying datasets, but the "source of truth" would be the S3 buckets.

For Community datasets, the script to trawl GitHub would be a bit more complicated, but the end product would be a JSON or YAML file similar to the one produced by the S3 trawling.

The hidden flag in the Auspice JSON would cause the script to skip this dataset when collating. I imagine this working similarly to robots.txt and web indexing.
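A sketch of the collation step, assuming each crawler (S3 for core and groups, GitHub for community) yields records with `path`, `updated` and an optional `hidden` value copied from the Auspice JSON. The record fields, `build_index` and the output layout are all hypothetical:

```python
import json
from typing import Dict, Iterable, List


def build_index(sources: Dict[str, Iterable[dict]]) -> str:
    """Collate records from several crawlers into one JSON listing.

    Hidden datasets are skipped, much as a web crawler honours robots.txt;
    they are not made private, just left out of the index.
    """
    entries: List[dict] = []
    for source, records in sources.items():
        for rec in records:
            if rec.get("hidden", False):
                continue
            entries.append({"source": source, "path": rec["path"], "updated": rec["updated"]})
    # Stable ordering keeps regenerated files easy to diff.
    entries.sort(key=lambda e: (e["source"], e["path"]))
    return json.dumps({"datasets": entries}, indent=2)
```

The front-end would read only the generated file, while the buckets and repositories stay the source of truth and the file can be regenerated at any interval.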

@trvrb
Member Author

trvrb commented Feb 25, 2021

I can further clarify how I imagine this view component would work (in terms of things like "infinite-scroll", etc...). Edit: In the below screenshots I've clarified that there's an internal vertical scrollbar with "infinite scroll".

@joverlee521
Contributor

The general search bar is useful if you know what you are looking for, but may not be great for general browsing. Can we have some dynamic dropdown boxes that would "guide" the dataset filtering? (This is similar to how Auspice dataset selection currently works.)

The "hierarchy" is hidden within these dynamic dropdown boxes, but the display of the datasets would be the infinite scroll list proposed above.

[Image: Screen Shot 2021-02-25 at 1 37 01 PM]
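A sketch of the guided dropdowns, assuming the hierarchy is read off "/"-separated dataset paths: each dropdown offers the distinct path segments that follow the choices already made. `next_options` is a hypothetical helper:

```python
from typing import List


def next_options(paths: List[str], chosen: List[str]) -> List[str]:
    """Options for the next dropdown, given selections in the earlier ones."""
    depth = len(chosen)
    options = set()
    for path in paths:
        parts = path.split("/")
        # Only paths consistent with earlier choices, and deep enough,
        # contribute a value for the next level.
        if parts[:depth] == chosen and len(parts) > depth:
            options.add(parts[depth])
    return sorted(options)


paths = ["flu/seasonal/h3n2/ha/2y", "flu/seasonal/h1n1pdm/ha/2y", "flu/avian/h5n1/ha", "zika"]
```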

@tsibley
Member

tsibley commented Feb 25, 2021

Agreed re: Jover's comments about discovery.

The dynamic dropdowns remind me of faceted search interfaces, which are commonly used for filtering but also play a nice role in enabling discovery by surfacing properties and values of potential interest.

I wonder about using facet lists for each hierarchy level, so you can a) pick more than one value and b) see more values to choose from at once without having to disclose a dropdown. Facet lists could also be used for other dataset properties, like contributor, last updated date range (within last week, within last month, within last 6 months, etc.).

As a convenient, rough example, here's a faceted search I've previously built:

[Image]

You can click on any value, which immediately updates the results list (not shown) as well as the counts for the values in other facets. Values within a single facet are ORed, values across facets are ANDed. There's also a freeform text filter (not shown) which is ANDed with any facet selections.
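A sketch of those semantics, assuming each item is a flat record with a `name` plus the same facet keys (the `contributor` and `segment` facets here are illustrative). Computing each facet's counts with every selection except its own is one common design choice, not necessarily what the interface above does; it keeps alternative values in a facet visible with counts that reflect what ORing them in would add:

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def faceted_search(items: List[dict], selected: Dict[str, Set[str]],
                   text: str = "") -> Tuple[List[dict], Dict[str, Counter]]:
    """OR within a facet, AND across facets, free text ANDed with everything."""

    def passes(item: dict, skip: str = "") -> bool:
        if text and text.lower() not in item["name"].lower():
            return False
        return all(item.get(facet) in values
                   for facet, values in selected.items()
                   if values and facet != skip)

    results = [item for item in items if passes(item)]
    facets = {key for item in items for key in item if key != "name"}
    counts = {facet: Counter(item[facet] for item in items if passes(item, skip=facet))
              for facet in facets}
    return results, counts


items = [
    {"name": "flu_seasonal_h3n2_ha_2y", "contributor": "nextstrain", "segment": "ha"},
    {"name": "flu_seasonal_h3n2_na_2y", "contributor": "nextstrain", "segment": "na"},
    {"name": "flu_avian_h5nx_ha", "contributor": "blab", "segment": "ha"},
]
```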

eharkins added a commit that referenced this issue Feb 27, 2021
This offers an alternative, more flexible way to list builds: listing them in a flat (non-hierarchical) interface that can be filtered. The listing can easily be converted into a table or another format.

See #62 (comment) for more details.
@trvrb
Member Author

trvrb commented Feb 27, 2021

I see what both Jover and Tom are getting at, and I appreciate the feedback, but I think there are a couple of things going on. The primary one is that we don't actually have labels for any of the levels of hierarchy, so we can't have a normal faceted interface, i.e. we don't know about "cohort", "tissue", etc. In the case of flu, Eli has gone in and manually curated levels like "subtype", "segment", etc., but generally we won't have this. We just have the words in the dataset file name, and words separated by "/".

My intent here is to treat this filtering as a bag-of-words model, so that the dataset flu_seasonal_h3n2_ha_2y is treated as having 5 words associated with it ("flu", "seasonal", "h3n2", "ha", "2y"). Filtering to "seasonal" would include all datasets that possess the word "seasonal" in their bag-of-words. This could include datasets like flu_seasonal_h3n2_ha_2y, but also datasets like cov_seasonal.
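A minimal sketch of the bag-of-words representation, assuming words are separated by either _ or / and compared case-insensitively (the lowercasing is an assumption). Matching is on whole words, so filtering to "h3" does not match "h3n2":

```python
import re
from typing import Set


def bag_of_words(dataset: str) -> Set[str]:
    """Split a dataset name on '_' or '/' into a set of lowercase words."""
    stem = re.sub(r"\.json$", "", dataset)
    return {w.lower() for w in re.split(r"[_/]", stem) if w}


def has_word(dataset: str, word: str) -> bool:
    """True if the word appears anywhere in the dataset's bag-of-words."""
    return word.lower() in bag_of_words(dataset)
```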

I think that dataset filtering alone provides the same sort of discoverability as the series of dropdowns that Jover proposes above. Here's how I imagine typing "h5n" into the filter box would proceed. You'd get the same sort of Auspice dropdown with autocomplete of all the words that match "h5n...", in this case "h5n1" and "h5nx". Selecting "h5nx" in the dropdown creates a blue pillbox with "h5nx" and the same eyeball / trash can as used in the Auspice interface. This filters to just datasets possessing the word "h5nx" in their bag-of-words. Notice that it doesn't matter where in the dataset name "h5nx" appears: it will filter to both flu_avian_h5nx_ha and mallard_h5nx.

[Image: dataset-selection]

If one were to subsequently filter to "pb1", you'd get two pillboxes under "Filtered to" and the dataset list would contain just their intersection, in this case just flu_avian_h5nx_pb1.
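The autocomplete and pillbox behaviour could be sketched as below, assuming dataset names use _ or / as word separators. Autocomplete offers every distinct word that starts with the typed text, and the active pills are combined by intersection. The function names are hypothetical:

```python
from typing import List, Set


def word_bag(name: str) -> Set[str]:
    """Words in a dataset name, treating '/' like '_'."""
    return {w for w in name.replace("/", "_").split("_") if w}


def autocomplete(names: List[str], typed: str) -> List[str]:
    """All distinct words, across every dataset, that start with the typed text."""
    vocab = set().union(*(word_bag(n) for n in names))
    return sorted(w for w in vocab if w.startswith(typed))


def filter_to(names: List[str], pills: Set[str]) -> List[str]:
    """Keep datasets whose bag-of-words contains every active pill."""
    return [n for n in names if pills <= word_bag(n)]


names = ["flu_avian_h5n1_ha", "flu_avian_h5nx_ha", "flu_avian_h5nx_pb1",
         "mallard_h5nx", "flu_seasonal_h3n2_ha_2y"]
```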


However, I take the general point about discoverability and I'd imagine you could do something like the following:

[Image: dataset-selection-tabs]

This is just picking the most common words across all the datasets' bags-of-words and giving a list of these words as suggestions. Clicking on "flu" would add a filter to "flu" and update the word counts (à la Tom's faceted UI above, except that things are necessarily flat).

[Image: dataset-selection-tabs-flu]
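A sketch of the suggestion counts, assuming _-separated dataset names. Words are counted only among datasets matching the active filters, and words that are already active filters are left out. Ties are broken alphabetically here (an assumption) so the suggestions are deterministic:

```python
from collections import Counter
from typing import List, Set, Tuple


def suggestions(names: List[str], active: Set[str], k: int = 5) -> List[Tuple[str, int]]:
    """Top-k words among datasets matching all active filter words."""
    bags = [set(n.split("_")) for n in names]
    matching = [b for b in bags if active <= b]
    counts = Counter(w for b in matching for w in b if w not in active)
    # Highest count first, then alphabetical among ties.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


names = ["flu_seasonal_h3n2_ha_2y", "flu_seasonal_h3n2_na_2y", "flu_avian_h5nx_ha", "ncov_global"]
```

Clicking a suggestion adds it to the active set, and recomputing shrinks the counts to the remaining datasets, which is the flat analogue of the faceted counts.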

Sketch file of the above is available as dataset-selection-v2-datasets.sketch on GDrive.

eharkins added a commit that referenced this issue Mar 11, 2021
@trvrb
Member Author

trvrb commented Apr 4, 2024

This issue of dataset UI was revisited in PR #803. The merged PR runs with the idea of a top-level "card" and then organizing datasets under this top-level. I'm going to go ahead and call this resolved by #803.

@trvrb trvrb closed this as completed Apr 4, 2024
6 participants