Convert remote files to FSFile objects automatically #2096
I'll be working more on this (to fix the failing tests, documentation, etc.), but waiting for possible comments until tomorrow. |
This is something I was hoping for when FSFile was first created, or rather before it was created. I wanted something similar for some of the fancier NetCDF URLs that can end with |
I should add...I was a little surprised it was added in the Scene. Is there maybe a spot lower in the code that this could work? |
Codecov Report
@@ Coverage Diff @@
## main #2096 +/- ##
==========================================
- Coverage 93.89% 93.83% -0.07%
==========================================
Files 283 283
Lines 42589 42860 +271
==========================================
+ Hits 39991 40219 +228
- Misses 2598 2641 +43
Documentation added to Any thoughts on:
Already existing alternatives to 2.:
|
We should maybe consider putting the documentation somewhere outside of quickstart. Our quickstart is starting to be a "learn everything about satpy"-start. |
1. I agree with @pnuu, but we could add an explicit comment to the documentation encouraging users to experiment with caching for speedups. 3. We don't know in advance what the user needs: the user may pass the files for all the channels but in the end read just a couple. By not preemptively downloading the data, we can save a lot of bandwidth :)
Added a short section on caching the remote files to documentation. |
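For illustration, the caching idea can be sketched with fsspec's URL chaining, where a `simplecache::` prefix keeps a local copy of each remote file. The helper name `with_local_cache` and the cache directory `/tmp/s3_cache` are hypothetical and only serve the example; the bucket URL and placeholder credentials are the ones used in the documentation snippet in this PR. With chained URLs, fsspec expects the storage options keyed per protocol.

```python
def with_local_cache(urls, cache_dir, remote_options=None):
    """Prefix remote URLs with ``simplecache::`` and build per-protocol options.

    Hypothetical helper for illustration only; it is not part of Satpy.
    """
    remote_options = remote_options or {}
    cached_urls = ["simplecache::" + url for url in urls]
    # Options for the caching layer itself.
    storage_options = {"simplecache": {"cache_storage": cache_dir}}
    for url in urls:
        # Options for the underlying remote protocol, e.g. "s3".
        protocol = url.split("://", 1)[0]
        storage_options.setdefault(protocol, dict(remote_options))
    return cached_urls, storage_options


filenames, storage_options = with_local_cache(
    ["s3://satellite-data-eumetcast-seviri-rss/H-000-MSG3*202204260855*"],
    cache_dir="/tmp/s3_cache",  # assumed location, pick any writable directory
    remote_options={"key": "ACCESSKEY", "secret": "VERYBIGSECRET"},
)
# These would then be handed to the Scene, for example:
# Scene(reader='seviri_l1b_hrit', filenames=filenames,
#       reader_kwargs={'storage_options': storage_options})
```

Whether caching actually speeds things up depends on how often the same files are read again, so it is worth experimenting with.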
Co-authored-by: David Hoese <david.hoese@ssec.wisc.edu>
LGTM
A couple of comments and requests, but otherwise this looks really good. Codecov says a lot of the code in satpy/utils.py is not covered. Are those old messages, or do more tests need to be added?
filenames = [
    's3://satellite-data-eumetcast-seviri-rss/H-000-MSG3*202204260855*',
]
storage_options = {
    "client_kwargs": {"endpoint_url": "https://PLACE-YOUR-SERVER-URL-HERE"},
    "secret": "VERYBIGSECRET",
    "key": "ACCESSKEY"
}
scn = Scene(reader='seviri_l1b_hrit', filenames=filenames, reader_kwargs={'storage_options': storage_options})
scn.load(['WV_073'])
I don't think I want regular unit tests running S3 downloads, but doctests may be "OK". I don't think we run doctests as part of CI as most of our examples use fake non-existent paths.
.. _reader_table:

.. list-table:: Satpy Readers capable of reading remote files using `fsspec`
In a future PR I'll really need to include this information in a big table of all the readers.
Agreed.
Forgot to answer this. All that code is covered, it's just |
So, I tested this with filenames generated by pytroll/trollmoves#114, using the following snippet:

from posttroll.message import Message
from getpass import getpass
from satpy import Scene

with open("message.txt") as fd:
    msg = Message(rawstr=fd.read())
filenames = [item["uri"] for item in msg.data["dataset"]]
password = getpass()
scn = Scene(
    filenames=filenames,
    reader="olci_l1b",
    reader_kwargs={
        "storage_options": {"ssh": {"password": password}},
        "engine": "h5netcdf",
    },
)
scn.load(["true_color"])
scn.save_dataset("true_color")

and it works fine. The message looks like this:
|
Sweet, thanks for the test! |
To make it possible to read files directly from a remote location using `fsspec` and `FSFile`, I'm adding a feature that automatically converts file paths that contain the transfer protocol to `FSFile` objects.

With this, for all readers that support reading from, for example, S3 object storage, the files can be given simply like this:

The credentials etc. are automatically read from the `fsspec` configuration file, which needs to be described somewhere in the documentation. Alternatively, the credentials etc. can be given via the Scene `reader_kwargs`.
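
As a rough illustration of the conversion described above, the sketch below separates paths that carry a transfer protocol from plain local paths and wraps the remote ones in `FSFile` objects. This is not the code from this PR: the function names `split_local_and_remote` and `to_fsfiles` are hypothetical, treating `file://` URLs as local is an assumption, and the wrapping step assumes that `fsspec.open_files` and `satpy.readers.FSFile` are available and behave as documented.

```python
from urllib.parse import urlparse


def split_local_and_remote(filenames):
    """Split paths into (local, remote) depending on whether they carry a protocol."""
    local, remote = [], []
    for fname in filenames:
        scheme = urlparse(str(fname)).scheme
        # A one-letter "scheme" is most likely a Windows drive letter (C:\...).
        # Treating "file://" URLs as local is an assumption of this sketch.
        if len(scheme) > 1 and scheme != "file":
            remote.append(fname)
        else:
            local.append(fname)
    return local, remote


def to_fsfiles(remote, storage_options=None):
    """Wrap remote paths in FSFile objects (assumes fsspec and satpy are installed)."""
    # Imported lazily so the splitting logic can be tried without fsspec or satpy.
    import fsspec
    from satpy.readers import FSFile

    if not remote:
        return []
    open_files = fsspec.open_files(remote, **(storage_options or {}))
    return [FSFile(open_file) for open_file in open_files]


local, remote = split_local_and_remote([
    "/data/local_file.nat",
    "s3://satellite-data-eumetcast-seviri-rss/H-000-MSG3*202204260855*",
])
```

Only `split_local_and_remote` is exercised here; `to_fsfiles` needs the relevant fsspec backend (for example `s3fs`) and valid credentials, either from the `fsspec` configuration or passed as `storage_options`.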