Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert remote files to FSFile objects automatically #2096

Merged
merged 37 commits into from May 9, 2022

Conversation

pnuu
Copy link
Member

@pnuu pnuu commented Apr 27, 2022

To make it possible to read files directly from a remote location using fsspec and FSFile, I'm adding a feature that automatically converts file paths that contain the transfer protocol to FSFile objects.

With this, all the readers supporting reading for example from S3 object storage the files can be given simply like this:

filenames = [
    's3://satellite-data-eumetcast-seviri-rss/H-000-MSG3__-MSG3_RSS____-WV_073___-000006___-202204260855-__',
    's3://satellite-data-eumetcast-seviri-rss/H-000-MSG3__-MSG3_RSS____-WV_073___-000007___-202204260855-__',
    's3://satellite-data-eumetcast-seviri-rss/H-000-MSG3__-MSG3_RSS____-WV_073___-000008___-202204260855-__',
    's3://satellite-data-eumetcast-seviri-rss/H-000-MSG3__-MSG3_RSS____-_________-EPI______-202204260855-__',
    's3://satellite-data-eumetcast-seviri-rss/H-000-MSG3__-MSG3_RSS____-_________-PRO______-202204260855-__',
]
scn = Scene(reader='seviri_l1b_hrit', filenames=filenames)
scn.load(['WV_073'])

The credentials etc. are automatically read from the fsspec configuration file, which needs to be described somewhere in the documentation. Alternately the credentials etc. can be given via Scene reader_kwargs.

  • Tests added
  • Fully documented

@pnuu pnuu added enhancement code enhancements, features, improvements component:scene labels Apr 27, 2022
@pnuu pnuu self-assigned this Apr 27, 2022
@pnuu
Copy link
Member Author

pnuu commented Apr 27, 2022

I'll be working more on this (to fix the failing tests, documentation, etc.), but waiting for possible comments until tomorrow.

@djhoese
Copy link
Member

djhoese commented Apr 27, 2022

This is something I was hoping for when FSFile was first created, or rather before it was created. I wanted something similar for some of the fancier NetCDF URLs that can end with #mode=bytes in URLs. Although @mraspaud wanted to keep this functionality separate I think for UX we need to have the convenience of something like this.

@djhoese
Copy link
Member

djhoese commented Apr 27, 2022

I should add...I was a little surprised it was added in the Scene. Is there maybe a spot lower in the code that this could work?

@pnuu
Copy link
Member Author

pnuu commented Apr 27, 2022

Scene was the first place I found handling the filenames, so thought it would be the most logical place to make this work for all supported readers.

@codecov
Copy link

codecov bot commented Apr 27, 2022

Codecov Report

Merging #2096 (c45fd89) into main (20434bf) will decrease coverage by 0.06%.
The diff coverage is 68.66%.

@@            Coverage Diff             @@
##             main    #2096      +/-   ##
==========================================
- Coverage   93.89%   93.83%   -0.07%     
==========================================
  Files         283      283              
  Lines       42589    42860     +271     
==========================================
+ Hits        39991    40219     +228     
- Misses       2598     2641      +43     
Flag Coverage Δ
behaviourtests 4.72% <10.66%> (+0.02%) ⬆️
unittests 94.49% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
satpy/readers/__init__.py 97.51% <ø> (ø)
satpy/utils.py 25.39% <18.96%> (-1.33%) ⬇️
satpy/scene.py 93.36% <100.00%> (+0.39%) ⬆️
satpy/tests/test_scene.py 99.49% <100.00%> (+0.03%) ⬆️
satpy/tests/test_utils.py 100.00% <100.00%> (ø)
satpy/readers/cmsaf_claas2.py 100.00% <0.00%> (ø)
satpy/tests/reader_tests/test_cmsaf_claas.py 100.00% <0.00%> (ø)
satpy/readers/viirs_l1b.py 96.15% <0.00%> (+0.37%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20434bf...c45fd89. Read the comment docs.

@coveralls
Copy link

coveralls commented Apr 28, 2022

Coverage Status

Coverage increased (+0.04%) to 94.436% when pulling c45fd89 on pnuu:feature-auto-fsfile into 20434bf on pytroll:main.

@pnuu
Copy link
Member Author

pnuu commented Apr 28, 2022

Documentation added to quickstart.html.

Any thoughts on:

  1. what would be a better place for this instead of Scene?
  2. should there be a way to pass the configuration in the code via Scene kwargs?

Already existing alternatives to 2.:

  • fsspec has an environment variable FSSPEC_CONFIG_DIR to point to the relevant config directory, this could be set by the user via os.environ
  • the fsspec.config.conf dictionary can be edited by hand

@djhoese
Copy link
Member

djhoese commented Apr 28, 2022

We should maybe consider putting the documentation somewhere outside of quickstart. Our quickstart is starting to be a "learn everything about satpy"-start.

@pnuu pnuu added this to In progress in PCW Spring 2022 via automation May 2, 2022
@pnuu pnuu moved this from In progress to Ready for review in PCW Spring 2022 May 2, 2022
@mraspaud
Copy link
Member

mraspaud commented May 4, 2022

1 I agree with @pnuu , but we could make an explicit comment in the documentation to encourage the user to experiment with caching for speedups.

3 We don't know in advance what the user needs, it could be that the user passes the files for all the channels but in the ends just reading a couple. By not preemptively downloading the data, we can save a lot of bandwidth :)

@pnuu
Copy link
Member Author

pnuu commented May 5, 2022

Added a short section on caching the remote files to documentation.

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple comments and requests, but otherwise looks really good. Codecov says a lot of the stuff in satpy/utils.py is not covered. Are those old messages or do more tests need to be added?

doc/source/remote_reading.rst Outdated Show resolved Hide resolved
Comment on lines +42 to +51
filenames = [
's3://satellite-data-eumetcast-seviri-rss/H-000-MSG3*202204260855*',
]
storage_options = {
"client_kwargs": {"endpoint_url": "https://PLACE-YOUR-SERVER-URL-HERE"},
"secret": "VERYBIGSECRET",
"key": "ACCESSKEY"
}
scn = Scene(reader='seviri_l1b_hrit', filenames=filenames, reader_kwargs={'storage_options': storage_options})
scn.load(['WV_073'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I want regular unit tests running S3 downloads, but doctests may be "OK". I don't think we run doctests as part of CI as most of our examples use fake non-existent paths.

doc/source/remote_reading.rst Show resolved Hide resolved
doc/source/remote_reading.rst Outdated Show resolved Hide resolved

.. _reader_table:

.. list-table:: Satpy Readers capable of reading remote files using `fsspec`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a future PR I'll really need to include this information in a big table of all the readers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot: @BENR0 started on that front in #1547 but the PR has been at draft stage since Feb 2021.

satpy/scene.py Outdated Show resolved Hide resolved
satpy/utils.py Outdated Show resolved Hide resolved
@pnuu
Copy link
Member Author

pnuu commented May 6, 2022

A couple comments and requests, but otherwise looks really good. Codecov says a lot of the stuff in satpy/utils.py is not covered. Are those old messages or do more tests need to be added?

Forgot to answer this. All that code is covered, it's just codecov being confused for some reason.

@mraspaud
Copy link
Member

mraspaud commented May 6, 2022

So, I tested this with filenames generated using pytroll/trollmoves#114 using the following snippet:

from posttroll.message import Message
from getpass import getpass
from satpy import Scene

with open("message.txt") as fd:
    msg = Message(rawstr=fd.read())
filenames = [item["uri"] for item in msg.data["dataset"]]

password = getpass()

scn = Scene(
    filenames=filenames,
    reader="olci_l1b",
    reader_kwargs={
        "storage_options": {"ssh": {"password": password}},
        "engine": "h5netcdf",
    },
)
scn.load(["true_color"])
scn.save_dataset("true_color")

and it works fine.

The message looks like this:

pytroll://segment/1b/safe-olci/s3b/ dataset myuser@myserver.smhi.se 2022-05-06T07:20:12.753644 v1.01 application/json {"dataset": [{"filesystem": {"cls": "fsspec.implementations.tar.TarFileSystem", "protocol": "tar", "args": [], "fo": "/local_disk/tellicast/received/TER-1/T01-S3B-01/S3B_OL_1_EFR____20220506T050140_20220506T050440_20220506T065548_0179_065_304_3780_MAR_O_NR_002.SEN3.tar", "target_protocol": "ssh", "target_options": {"host": "myserver.smhi.se", "protocol": "ssh"}}, "uri": "tar://S3B_OL_1_EFR____20220506T050140_20220506T050440_20220506T065548_0179_065_304_3780_MAR_O_NR_002.SEN3/Oa01_radiance.nc::ssh://myserver.smhi.se/local_disk/tellicast/received/TER-1/T01-S3B-01/S3B_OL_1_EFR____20220506T050140_20220506T050440_20220506T065548_0179_065_304_3780_MAR_O_NR_002.SEN3.tar", "uid": "tar://S3B_OL_1_EFR____20220506T050140_20220506T050440_20220506T065548_0179_065_304_3780_MAR_O_NR_002.SEN3/Oa01_radiance.nc"}, {"filesystem": {"cls": "fsspec.implementations.tar.TarFileSystem", "protocol": "tar", "args": [], "fo": "/local_disk/tellicast/received/TER-1/T01-S3B-01/S3B_OL_1_EFR____20220506T050140_20220506T050440_20220506T065548_0179_065_304_3780_MAR_O_NR_002.SEN3.tar", "target_protocol": "ssh", "target_options": {"host": "myserver.smhi.se", "protocol": "ssh"}}, "uri": "tar://S3B_OL_1_EFR____20220506T050140_20220506T050440_20220506T065548_0179_065_304_3780_MAR_O_NR_002.SEN3/Oa02_radiance.nc::ssh://myserver.smhi.se/local_disk/tellicast/received/TER-1/T01-S3B-01/S3B_OL_1_EFR____20220506T050140_20220506T050440_20220506T065548_0179_065_304_3780_MAR_O_NR_002.SEN3.tar", "uid": "tar://S3B_OL_1_EFR____20220506T050140_20220506T050440_20220506T065548_0179_065_304_3780_MAR_O_NR_002.SEN3/Oa02_radiance.nc"}, {"filesystem": {"cls": "fsspec.implementations.tar.TarFileSystem", "protocol": "tar", "args": [], "fo": "/local_disk/tellicast/received/TER-1/T01-S3B-01/S3B_OL_1_EFR____20220506T050140_20220506T050440_20220506T065548_0179_065_304_3780_MAR_O_NR_002.SEN3.tar", "target_protocol": "ssh", "target_options": {"host": "myserver.smhi.se", "protocol": "ssh"}}, "uri": "tar://S3B_OL_1_EFR____20220506T050140_20220506T050440_20220506T065548_0179_065_304_3780_MAR_O_NR_002.SEN3/Oa03_radiance.nc::ssh://myserver.smhi.se/local_disk/tellicast/received/TER-1/T01-S3B-01/S3B_OL_1_EFR____20220506T050140_20220506T050440_20220506T065548_0179_065_304_3780_MAR_O_NR_002.SEN3.tar", "uid": "tar://S3B_OL_1_EFR____20220506T050140_20220506T050440_20220506T065548_0179_065_304_3780_MAR_O_NR_002.SEN3/Oa03_radiance.nc"}
...

For the fun of it, here is the resulting image :)
true_color_20220506_111537

@pnuu
Copy link
Member Author

pnuu commented May 6, 2022

Sweet, thanks for the test!

@pnuu pnuu requested a review from djhoese May 9, 2022 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:scene enhancement code enhancements, features, improvements
Projects
Development

Successfully merging this pull request may close these issues.

None yet

6 participants