Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable a narrative to source multiple builds #1164

Merged
merged 27 commits into from
Aug 3, 2020

Conversation

eharkins
Copy link
Contributor

Description of proposed changes

This builds on an existing contribution #1071 (thanks to the amazing work of @salvatore-fxpig
!) to address issue #1050, i.e. enabling a narrative to source multiple builds (datasets). It mostly works, but needs more testing.

Some issues which might prevent this from getting merged (and I am still working to reproduce and define so that we can address them):

  1. There are a number of existing narratives where there is a typo in one of the slides specifying the dataset or they don't specify the build in exactly the same way. This wasn't an issue before, since having any of the datasets be different from the first slide doesn't have any effect, since that's the only one we used to load the dataset. With these changes, we fetch all the unique urls from all slides in a narrative, which might break such narratives as the ones I mentioned. @jameshadfield said:

All of the narratives we control (e.g. github.com/nextstrain/narratives) we can write a small script to flag where the builds differ (they should have all been the same so far). It was spelled out in the original spec that the dataset should change, so if there are problems then people will have to fix their narrative files*
* if the fetch fails for a dataset change, we should fall-back to the currently-in-use dataset & throw up a notification. This’d be nice, but not essential.

  1. Ensuring traditional narratives are not broken on mobile.
  2. Ensuring that we implement componentWillUnmount() methods which remove listeners set up in any of the react components which might be displayed and then unmounted when changing slides (and datasets) in a narrative. This is to prevent memory leaks or updates to unmounted components (this may or may not be an issue currently with the Entropy component - componentWillUnmount still not implemented, see the diff).
  3. I also am noticing a difference in some deme sizes between my branch and master on a given ncov narrative I am using to test that the existing narrative behavior is unchanged - so will want to fix that as well assuming it's both wrong and introduced by these changes.

Some goals it has inspired:

  1. The way we rerender all the main components when changing datasets in a narrative is with the react key prop. This makes it so we can avoid having to explicitly cover tons of cases of comparing props to guess if we should rerender a given component. This seems quite powerful and sets a cool precedent for other places where we'd like to do this in the app.
  2. See below about a good idea James had for testing narratives.

Some TODO items that can become separate issues (i.e. don't prevent this from being merged if we don't mind these imperfections), which we will still appreciate any help with cc @salvatore-fxpig:

  1. Ensuring multiple dataset narratives work on mobile.
  2. Improve dataset fetching performance strategy (see TODOs in the diff)
  3. Allow frequencies data to be fetched and displayed in narratives

Testing

I have been testing with this example narrative created by @jameshadfield and extending it as necessary. As James pointed out, maintaining a huge narrative (or set of narratives) that runs through many combinations of dataset and queries (and thus app states) could be immensely helpful in testing narratives manually as well as with automated tests. I will create an issue as a stub for this and as a place for discussion of how we want to use that big test narrative.

Thank you for contributing to Nextstrain!

salvatore-fxpig and others added 17 commits June 10, 2020 18:24
using redux to cache JSONs
this completes a bunch of TODOs (search TODO:1071 for remaining items):
1. clear the json cache when unmounting narrative component
2. rerender mounted react components with new dataset
upon redux state update
this is achieved using react "key" props
(https://reactjs.org/docs/lists-and-keys.html#keys)
3. wait for all datasets to be fetched using promise.all
4. fix loading narratives from not the first page
5. use the appropriate dataset when leaving
the narrative page to explore a dataset
@jameshadfield jameshadfield temporarily deployed to auspice-1050-mult-narra-sgj7hf June 11, 2020 02:26 Inactive
Copy link
Member

@jameshadfield jameshadfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @eharkins (and @salvatore-fxpig). This is looking great. I'll think a bit about how we're calling changePage over the coming days to make sure this is the right direction.

In addition to the in-line comments, In terms of to-dos before merge / auspice release we'll need:

  • A more expansive narrative to test with (e.g. using entropy panel, multiple datasets). This doesn't have to be huge for the purposes of this PR.
  • Write a small script to check all files in github.com/nextstrain/narratives and ensure each uses the same dataset for each block. We can fix these as we find them.
  • Write a test narrative with a misformed / non-existant dataset in one of it's blocks. I imagine there are narratives out there with this. Experiment with one of the incorrectly formatted narratives @eharkins identified above. This would have rendered fine on current master, but on this PR would result in the rejection of the Promise.all we use to fetch datasets. Ideally, we'd use the previous "good" dataset, but at the least we can print a helpful error notification.

// then fetching all the rest in the background so we can use them from the cache upon changing slides.
// Doing that presents the risk of a race case (if you change pages faster than a dataset can be fetched) so we are avoiding it for now.
// Instead we use Promise.all to ensure all the datasets are fetched before we render.
return Promise.all(blocks.map((block, i) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually makes a request for every block, regardless of whether the blocks use the same dataset (treeName) -- i.e. the test narrative fetches the zika dataset twice and the mumps/na dataset twice. Prior to the Promise.all I think we want to make a set of all treeName and secondTreeName across the narrative blocks, then perform a Promise.all on that (unique) set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a commit that should fix this. Thanks for pointing it out!

secondTreeDataset: false,
secondTreeName: false
};
// TODO:1050 A more performant fetching strategy would be fetching the dataset for the slide you land on,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this will add complexity and is not strictly necessary, thus it should be addressed as a subsequent PR. The current implementation fetches the datasets in parallel (via Promise.all) which it would be interesting to time against a fetch of a single dataset.

jsons: {} /* object with dataset names as keys and loaded dataset / narrative jsons as values */
}, action) => {
switch (action.type) {
case types.CACHE_JSONS:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have the ability to return to the current narrative -- click "explore the dataset" then click "return to narrative" -- we want to preserve the cache until a new dataset is loaded*, i.e. the CLEAN_START action. Currently this series of events causes Auspice to crash. This will also remove the need for a separate action here when the narratives unmount. (I realise I may have suggested to clear out the cache when we click "explore the dataset"!!)

* This should have the effect that changing the dataset to on which happened to be cached will load it from the cache rather than fetching it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Just pushed a fix that seems to fix the bug with this behavior and only clear the cache on CLEAN_START

}));

};
if (this.props.blocks[idx].dataset !== this.props.blocks[this.props.currentInFocusBlockIdx].dataset) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only section which needs to be transferred to the mobile narrative display for that to work?

@eharkins eharkins temporarily deployed to auspice-1050-mult-narra-sgj7hf June 11, 2020 20:40 Inactive
@salvatore-fxpig
Copy link
Contributor

Uuuh I was waiting for this to come up again!
Will give it a spin and see if I can help, the idea of using keys to forcing remounting is brilliant stuff. 👍

PS: I think adding uniq from lodash just changed the other-vendors chunkhash, if you're already publishing somewhere with long-term caching I would suggest using Set, it should transpile just fine anyway.

@salvatore-fxpig
Copy link
Contributor

I think the async stuff can be baked in pretty easily, something along these lines should do the trick
salvatore-fxpig@a5afd50

@salvatore-fxpig
Copy link
Contributor

Dumping some stuff here: #1167, hopefully makes it easier to play with

  • getting rid of _.uniq sets the hash of the other-vendors bundle right again.
  • Async loading
  • Fallback to landing block, and from landing block to block[0], for missing dataset (with updated 404 result returned by server rather than generic 500)

also rework narrative json fetching code, fixes bug with:
 "explore the data yourself" and "return to narrative" buttons
@eharkins eharkins temporarily deployed to auspice-1050-mult-narra-sgj7hf June 12, 2020 20:03 Inactive
@eharkins eharkins temporarily deployed to auspice-1050-mult-narra-sgj7hf June 12, 2020 20:04 Inactive
@eharkins
Copy link
Contributor Author

Thanks @salvatore-fxpig !

PS: I think adding uniq from lodash just changed the other-vendors chunkhash, if you're already publishing somewhere with long-term caching I would suggest using Set, it should transpile just fine anyway.

Makes sense. The reason I used _uniq instead of Sets was only because I wanted to use .map() on the unique list of datasets and not have to convert the Set back to an array to do that. And because when I used the jsons object keys to ensure no duplicate fetches, it didn't have that behavior because of a misunderstanding of promises on my part. I'm sure there is a better way though so I will check out 1167 and give it a test!

@eharkins eharkins temporarily deployed to auspice-1050-mult-narra-sgj7hf June 15, 2020 22:37 Inactive
@eharkins eharkins temporarily deployed to auspice-1050-mult-narra-sgj7hf June 17, 2020 01:07 Inactive
jameshadfield added a commit to nextstrain/narratives that referenced this pull request Jun 19, 2020
The original design of nextstrain narratives was that each "section" (paragraph) should define its own dataset, such that a single narrative could display different datasets (or the same one if each section specified the same dataset). This was never implemented -- i.e. in auspice currently only the dataset specified in the YAML frontmatter is ever used -- which has resulted in the inadvertent creation of narratives which specify different (and perhaps invalid) datasets.

Nearly every narrative file in this repo had inadvertently defined multiple datasets! Often this appears to be due to copy&paste of paragraphs from previous narratives etc. Auspice will soon "correctly" interpret these multiple datasets (nextstrain/auspice#1164).

This commit changes 206 narratives which previously specified multiple datasets so that each one now only specifies the dataset defined in the YAML frontmatter. This will preserve the current behavior in future versions of Auspice. The only exception is the "test_multiple-datasets.md" narrative.

There is also a small script which can be run periodically to identify narratives which define multiple narratives.

Closes #8
@jameshadfield jameshadfield mentioned this pull request Jul 29, 2020
@jameshadfield jameshadfield merged commit eb67db6 into master Aug 3, 2020
@trvrb trvrb deleted the 1050-mult-narratives branch January 11, 2021 05:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants