Handling Remote Data Requirements #1939
As per @Cadair's request on riot.im, please see some IRISpy use cases at sunpy/sunraster#108 (comment).
Functionality like this will also be needed for
Does this mean they might want to override the defaults with a local file? I don't think I envisaged that in the API designs at the top of the issue.
Also, I had a chat with @dstansby today about using this to get SPICE kernels for HelioPy, etc.
We also have GOES in SunPy: https://github.com/sunpy/sunpy/blob/master/sunpy/instr/goes.py#L85
And apparently LYRA as well (line 453 in commit 094c1c1).
This is an issue we have been kicking down the road for a while, but #1897 is pushing us to fix it.
Some functions in SunPy are going to need data files to work, whether that is instrument calibration data (e.g. for AIA response functions) or other remote data (e.g. for tutorials, see #1809), and no doubt other use cases I can't think of.
To my mind, the requirements are:
$HOME/sunpy/data/... when first needed.

The second requirement here poses an interesting question. If the data on the remote server has changed (deliberately, due to a calibration change etc.) and we store a SHA hash in the code, our downloads will error, and the only way to fix this is to do a new SunPy release with a bug fix. The alternative is to not store hashes, assume the data is what we anticipated and roll with it. I like pinning the version of the data to the code: it means that, in theory, as long as the remote data is available, one version of SunPy will always give the same answers (same code, same data). I prefer the fail-hard-and-early approach here, but I do think we should provide a mechanism to override this behaviour (i.e. skip download verification if the user knows what they are doing).
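A minimal sketch of the "fail hard and early, with an opt-out" policy, assuming SHA-256 as the pinned hash (the issue only says "sha hash"). The names `sha256sum`, `verify` and `HashMismatchError` are hypothetical and used only for illustration; they are not existing SunPy functions.

```python
import hashlib


class HashMismatchError(Exception):
    """Raised when a downloaded file does not match its pinned hash."""


def sha256sum(path):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hash, skip_check=False):
    """Error if ``path`` does not match the hash pinned in the code.

    ``skip_check=True`` is the escape hatch for users who know the
    remote file has changed deliberately.
    """
    if skip_check:
        return path
    actual = sha256sum(path)
    if actual != expected_hash:
        raise HashMismatchError(
            f"{path}: expected sha256 {expected_hash}, got {actual}"
        )
    return path
```

Pinning the hash means a changed upstream file is reported immediately instead of silently changing results; the `skip_check` flag keeps that strictness from blocking users until the next release.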
@dpshelio and I had a conversation about this, and in many cases the ideal solution to this problem is to work closely with the data providers. For the moment, however, I am assuming that we are working with arbitrary data on the internet, where there is no way to persuade the provider to version their data properly by putting it on something like Zenodo or Figshare.
Proposal
I suggest we add a data manager that maintains a record of the cache and provides various features. I propose the following user code:
Define a function that needs some data:
This adds the function name and its files to the cache. When the code is run, the downloader fetches the file, verifies that it matches the provided hash, puts it in a folder (probably named after the function) and then hands it to the function on request.
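A minimal sketch of how such a decorator could behave, assuming a hypothetical `DataManager` class with `require(name, urls, sha_hash)` and `get(name)` methods. These names are illustrative and not an existing SunPy API. It also assumes SHA-256 hashes and `urllib.request` for downloads. Files are fetched lazily on the first call and stored in one folder per function.

```python
import functools
import hashlib
import urllib.request
from pathlib import Path


class DataManager:
    """Hypothetical manager that downloads, verifies and caches files per function."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self._specs = {}  # name -> (urls, sha256 hex digest), filled by @require
        self._paths = {}  # name -> local Path, filled the first time a function runs

    def require(self, name, urls, sha_hash):
        """Decorator declaring that the wrapped function needs file ``name``."""
        self._specs[name] = (list(urls), sha_hash)

        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                # Download lazily: only when the function is actually called.
                if name not in self._paths:
                    self._paths[name] = self._fetch(func.__name__, name)
                return func(*args, **kwargs)
            return wrapper
        return decorator

    def get(self, name):
        """Return the local path of a file fetched for the running function."""
        return self._paths[name]

    def _fetch(self, func_name, name):
        urls, sha_hash = self._specs[name]
        folder = self.cache_dir / func_name  # one folder per function
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / name
        # Reuse a copy from an earlier session if it still matches the hash.
        if target.exists() and hashlib.sha256(target.read_bytes()).hexdigest() == sha_hash:
            return target
        # Otherwise try each URL in turn and keep the first verified copy.
        for url in urls:
            try:
                with urllib.request.urlopen(url) as response:
                    data = response.read()
            except OSError:  # urllib.error.URLError is a subclass of OSError
                continue
            if hashlib.sha256(data).hexdigest() == sha_hash:
                target.write_bytes(data)
                return target
        # Fail hard and early: no URL produced data matching the pinned hash.
        raise RuntimeError(f"no verified copy of {name!r} found at {urls}")
```

In use, a function would be decorated with `@manager.require("aia_response.dat", [url], sha256_hex)` and call `manager.get("aia_response.dat")` inside its body to obtain the local path.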
Skip hashsum check:
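One way to offer this opt-out is a context manager that flips a flag the verification step consults. `HashPolicy`, `skip_hash_check` and `check` are hypothetical names chosen for this sketch, and SHA-256 is assumed as the hash.

```python
import contextlib
import hashlib


class HashPolicy:
    """Hypothetical holder for the 'verify downloads' switch."""

    def __init__(self):
        self.skip = False

    @contextlib.contextmanager
    def skip_hash_check(self):
        """Temporarily disable hash verification inside a ``with`` block."""
        previous = self.skip
        self.skip = True
        try:
            yield
        finally:
            # Restore the previous setting even if the body raises.
            self.skip = previous

    def check(self, data, sha_hash):
        """Return ``data`` if it matches ``sha_hash`` or checking is skipped."""
        if not self.skip and hashlib.sha256(data).hexdigest() != sha_hash:
            raise ValueError("downloaded data does not match the pinned hash")
        return data
```

Scoping the opt-out to a `with` block (e.g. `with policy.skip_hash_check(): my_function()`) keeps verification on by default everywhere else.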
Download a different file (if the user knows there is a newer version):
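A sketch of a temporary override, assuming a hypothetical `FileRegistry` that maps each file name to a `(url, sha256)` pair. `override_file` and its `sha_hash=None` meaning "do not verify" are illustrative choices, not an existing SunPy API.

```python
import contextlib


class FileRegistry:
    """Hypothetical registry mapping a file name to (url, sha256 or None)."""

    def __init__(self):
        self._entries = {}

    def register(self, name, url, sha_hash):
        self._entries[name] = (url, sha_hash)

    def lookup(self, name):
        return self._entries[name]

    @contextlib.contextmanager
    def override_file(self, name, url, sha_hash=None):
        """Point ``name`` at a different URL for the duration of a ``with`` block.

        ``sha_hash=None`` means the replacement file is not verified.
        """
        original = self._entries[name]
        self._entries[name] = (url, sha_hash)
        try:
            yield
        finally:
            # Always restore the pinned entry afterwards.
            self._entries[name] = original
```

If the downloader uses `urllib.request.urlopen`, which also accepts `file://` URLs (e.g. from `pathlib.Path.as_uri()`), the same hook would cover overriding the defaults with a local file.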
The functionality of the remote_data_manager will be reasonably complex: it needs to maintain a cache on disk somewhere (probably in a JSON file or similar) and do the checking etc., but I don't want to describe all that here; those are implementation details.

ping @sunpy/sunpy-developers @wtbarnes