Don't bundle the data #744

Open
ivan-aksamentov opened this issue Jun 15, 2020 · 7 comments
Labels
  • good first issue (Good for newcomers)
  • help wanted (Extra attention is needed)
  • IMPORTANT (Take this immediately!)
  • s:conf (Scope: related to configuration)
  • s:data (Scope: related to data retrieval, parsing, transformation, storage, update)
  • s:infra (Scope: related to infrastructure, continuous integration, deployment)
  • t:feat (Type: request of a new feature, functionality, enhancement)

Comments

@ivan-aksamentov
Member

ivan-aksamentov commented Jun 15, 2020

🙋 Feature Request

🔦 Context

Currently all the data (default scenarios, case counts, age and severity distributions)
https://github.com/neherlab/covid19_scenarios/tree/master/src/assets/data
is being bundled into the app directly with webpack, using static import. For example:

import ageDistributionRaw from '../../assets/data/ageDistribution.json'

This is an easy solution

  • one line of code to load the data
  • data is always guaranteed to be present

but is not very practical:

😯 Describe the feature

We want to evaluate different mechanisms of loading and updating the data.
The new mechanism should:

  • allow for independent releases of the app and data
  • ensure robustness of data loading
  • not change the schemas too much, to allow backwards compat for existing URLs and file imports/exports

💻 Examples

💁 Possible Solution

For example, we could upload the data to a public S3 bucket, then load it with a plain HTTP request and validate it afterwards.
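
To make the "validate it afterwards" step concrete, here is a minimal sketch of a runtime check applied to a fetched payload before the app uses it. The `AgeDistribution` shape and all function names below are assumptions made for illustration; the real schema of the files in src/assets/data may differ.

```typescript
// Hypothetical shapes, for illustration only. The real schema may differ.
interface AgeGroupEntry {
  ageGroup: string
  population: number
}

interface AgeDistribution {
  name: string
  data: AgeGroupEntry[]
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

function isAgeGroupEntry(value: unknown): value is AgeGroupEntry {
  if (!isRecord(value)) return false
  const { ageGroup, population } = value
  return (
    typeof ageGroup === 'string' &&
    typeof population === 'number' &&
    Number.isFinite(population) &&
    population >= 0
  )
}

function isAgeDistribution(value: unknown): value is AgeDistribution {
  if (!isRecord(value)) return false
  const { name, data } = value
  return typeof name === 'string' && Array.isArray(data) && data.every(isAgeGroupEntry)
}

// Throws on the first invalid item, so that bad remote data never reaches the app state.
function parseAgeDistributions(raw: unknown): AgeDistribution[] {
  if (!Array.isArray(raw)) {
    throw new Error('Expected an array of age distributions')
  }
  return raw.map((item: unknown, index: number) => {
    if (!isAgeDistribution(item)) {
      throw new Error(`Invalid age distribution at index ${index}`)
    }
    return item
  })
}
```

A schema library could replace the hand-written guards. The point of the sketch is only that remote data is treated as `unknown` until it has been checked.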

We are open to other proposals.

Related

@ivan-aksamentov added the t:feat, good first issue, help wanted, s:infra, s:data, s:conf and IMPORTANT labels Jun 15, 2020
@r-s-rai
Collaborator

r-s-rai commented Jul 1, 2020

Hello,

Would it be possible to schedule a meeting about how you want the data implemented? My group and I were thinking about using an Amazon S3 bucket to store the data, but we wanted to coordinate with you guys since you'll be the ones in charge of the S3 bucket and managing it.

@rneher
Member

rneher commented Jul 1, 2020

Sure, let's discuss. How about tomorrow (Thu), late afternoon CEST, morning East Coast?

@r-s-rai
Collaborator

r-s-rai commented Jul 1, 2020 via email

@ivan-aksamentov
Member Author

ivan-aksamentov commented Jul 2, 2020

@r-s-rai Hello, sorry for missing the call.

I've set up an S3 bucket + CloudFront distribution + domain.

So the data is ready to be fetched from:
https://data.covid19-scenarios.org/ageDistribution.json
https://data.covid19-scenarios.org/scenarios.json
https://data.covid19-scenarios.org/caseCounts.json
https://data.covid19-scenarios.org/severityDistributions.json

This is just the contents of the src/assets/data directory.
If you replace the corresponding imports in src/io/defaults/get* with fetches (e.g. using axios) this should do the trick.
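
A minimal sketch of what such a replacement could look like. The names `GetJson`, `dataUrl` and `loadDataFile` are hypothetical and not part of the codebase; the transport is passed in as a function so that fetch, axios or a test stub can be plugged in.

```typescript
// Root of the data bucket, as listed above.
const DATA_ROOT_URL = 'https://data.covid19-scenarios.org'

// Hypothetical transport abstraction: anything that turns a URL into parsed JSON.
type GetJson = (url: string) => Promise<unknown>

// Builds the URL of one data file, e.g. "ageDistribution.json".
function dataUrl(filename: string): string {
  return `${DATA_ROOT_URL}/${encodeURIComponent(filename)}`
}

// Loads one data file and attaches the URL to any failure, which makes
// network problems easier to diagnose than a bare rejection.
async function loadDataFile(filename: string, getJson: GetJson): Promise<unknown> {
  const url = dataUrl(filename)
  try {
    return await getJson(url)
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error)
    throw new Error(`Unable to load "${url}": ${reason}`)
  }
}
```

With axios, `getJson` could be `(url) => axios.get(url).then((res) => res.data)`. The result is typed as `unknown` on purpose, because unlike a bundled import the payload has not been checked at build time.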

CORS is enabled in both S3 and CloudFront. However, preflight (OPTIONS) requests will probably not work, so keep that in mind.
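
In practice this means the data requests should stay "simple" in the CORS sense: a plain GET without custom headers. The sketch below is a simplified approximation of the browser rules, not a full implementation of the Fetch standard (for example, it conservatively treats a `Range` header as triggering a preflight and ignores header value limits). The function name `needsPreflight` is hypothetical.

```typescript
// Methods that do not trigger a preflight on their own.
const SAFELISTED_METHODS = new Set(['GET', 'HEAD', 'POST'])

// Request headers that are allowed without a preflight (simplified list).
const SAFELISTED_HEADERS = new Set(['accept', 'accept-language', 'content-language'])

// Content-Type is only safelisted for these three MIME types.
const SAFELISTED_CONTENT_TYPES = new Set([
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain',
])

// Returns true if a cross-origin request with this method and these headers
// would make the browser send an OPTIONS request first.
function needsPreflight(method: string, headers: Record<string, string> = {}): boolean {
  if (!SAFELISTED_METHODS.has(method.toUpperCase())) return true

  for (const name of Object.keys(headers)) {
    const lower = name.toLowerCase()
    if (SAFELISTED_HEADERS.has(lower)) continue

    if (lower === 'content-type') {
      // Ignore parameters such as "; charset=utf-8".
      const mime = ((headers[name] ?? '').split(';')[0] ?? '').trim().toLowerCase()
      if (SAFELISTED_CONTENT_TYPES.has(mime)) continue
    }

    return true
  }

  return false
}
```

For example, `needsPreflight('GET', { 'Content-Type': 'application/json' })` is true, so setting a JSON content type on a GET request for these files would likely break loading.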

Note however that this simple solution would introduce all kinds of new issues:

  • what if the data schema needs to be changed (as in "Split app data per region and load on demand" #743)? Developers cannot just modify the files in the bucket, because it would take down the production site.
  • previously the data was guaranteed to always be there, in the bundle. Now a fetch can fail for any reason, and this will require additional plumbing to mitigate.
  • requests are now done in series: first the bundle, then the data. If we proceed with "Split app data per region and load on demand" #743, the request chain will only become longer. Each request introduces additional latency.
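
One possible way to soften the second and third points, sketched under assumptions: start all data requests at once instead of one after another, and fall back to a bundled snapshot when a request fails. All names below (`loadWithFallback`, `loadAllInParallel`, `DataSpec`) are hypothetical. Keeping a bundled fallback is a trade-off, since it keeps part of the bundling this issue wants to remove.

```typescript
type Source = 'remote' | 'bundled'

interface Loaded<T> {
  data: T
  source: Source
}

interface DataSpec<T> {
  // Fetches (and ideally validates) the remote copy.
  loadRemote: () => Promise<T>
  // Snapshot shipped with the app, used only when the remote copy is unavailable.
  bundled: T
}

// Tries the remote copy first; any failure (network, HTTP status, validation)
// falls back to the bundled snapshot instead of breaking the app.
async function loadWithFallback<T>(spec: DataSpec<T>): Promise<Loaded<T>> {
  try {
    return { data: await spec.loadRemote(), source: 'remote' }
  } catch {
    return { data: spec.bundled, source: 'bundled' }
  }
}

// Starts every request before awaiting any of them, so the total latency is
// roughly that of the slowest request rather than the sum of all requests.
async function loadAllInParallel<T>(specs: DataSpec<T>[]): Promise<Loaded<T>[]> {
  return Promise.all(specs.map((spec) => loadWithFallback(spec))) // results keep the order of `specs`
}
```

The `source` field lets the UI tell the user when it is showing the bundled snapshot rather than fresh data. This does not address the first point; schema changes would still need some form of versioning on the bucket.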

So the entire adventure is probably more complicated than swapping the imports with requests.

Okay, sounds bad, but are there any other alternatives? I don't know.
So why don't you give it a try and we will see where it goes.
Please open a (draft) pull request early on to keep the discussion going.

Let me know if you have any questions or if you encounter any problems (especially with the AWS setup).

cc @rneher

@rneher
Member

rneher commented Jul 3, 2020

@r-s-rai -- as you see in Ivan's comments above, the operation turns out to be slightly trickier than anticipated. We suggest starting by just replacing the bundling of the JSON files with fetching. From there, one could then move towards fetching the case counts one by one.
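
For the later step, fetching case counts one by one, a small in-memory cache would avoid refetching a region the user has already selected. This is only a sketch: the per-region split does not exist yet, and `createRegionCache` and `FetchRegion` are hypothetical names.

```typescript
// Hypothetical loader for the data of a single region.
type FetchRegion<T> = (region: string) => Promise<T>

// Wraps a loader so that each region is requested at most once.
// The promise itself is cached, so concurrent calls for the same region
// share a single in-flight request.
function createRegionCache<T>(fetchRegion: FetchRegion<T>): FetchRegion<T> {
  const cache = new Map<string, Promise<T>>()

  return (region: string): Promise<T> => {
    const cached = cache.get(region)
    if (cached !== undefined) return cached

    const pending = fetchRegion(region).catch((error: unknown) => {
      // Do not cache failures, so that a later call can retry.
      cache.delete(region)
      throw error
    })
    cache.set(region, pending)
    return pending
  }
}
```

The cache lives only as long as the page, so a reload always fetches fresh data.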

@r-s-rai
Collaborator

r-s-rai commented Jul 9, 2020

Hello Ivan,

Would it be possible to schedule a meeting with you sometime soon to discuss the code?

Thank you,

Rohan Rai

@DaveedaKinG

Hello Rai,

I think it would be a really good idea to discuss this with you.

Thanks,

DaveedaKinG
