Replace SourceDataset and maybe Source* with Intake and SourceIntake #43
@benjaminleighton, @pbranson, I really like this idea. This would let us pare these Source objects down to a minimum and make better use of intake. I'd only suggest some changes to how we define the fields signature in this new SourceIntake class. I haven't been able to make this work yet though; have you got a working example? For instance, if I try this using one of the rompy test files:
I get:
But I can't seem to load dask from it by doing, for example:
@rafa-guedes Yes, I find Catalog.from_dict somewhat misleading. You need to use the … I spent some time playing around with dynamic generation of intake catalogs again and have come up with an example, which I think is the cleanest. The part I have struggled with the most is the intake catalogs: because the schema is so simple, intake expects that you just define the dictionary and dump it to YAML. There are two key bits of boilerplate code that make the code cleaner:
So the example above could be changed to:
A full example is here:
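As a rough illustration of the "define the dictionary and dump to YAML" step described above, here is a minimal sketch. It is not the example from the linked notebook. The helper names `make_catalog_dict` and `dump_catalog` are hypothetical, the `rasterio` driver is assumed to be provided by intake-xarray, and the standard-library `json` module stands in for a YAML dumper (JSON text is valid YAML 1.2), so the sketch runs without extra dependencies.

```python
import json


def make_catalog_dict(entries):
    """Build an intake-style catalog dict from (name, driver, urlpath) tuples.

    Hypothetical helper: it only assembles the plain dictionary layout
    (sources -> name -> driver/args) and does not call intake itself.
    """
    sources = {}
    for name, driver, urlpath in entries:
        sources[name] = {
            "driver": driver,
            "args": {"urlpath": urlpath},
        }
    return {"sources": sources}


def dump_catalog(catalog):
    """Serialise the catalog dict to text.

    JSON is a subset of YAML 1.2, so a YAML loader can read this text.
    A YAML library could be swapped in here if block-style YAML is preferred.
    """
    return json.dumps(catalog, indent=2)


# One source entry, using the file name from the issue description.
catalog = make_catalog_dict([("bathy", "rasterio", "bathy_temp.tif")])
catalog_text = dump_catalog(catalog)
print(catalog_text)
```

The resulting text could then be written to a file and opened as a catalog, or passed around as a string.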
xref: intake/intake#771
Worth watching this if you have time: Martin gives an overview of his thinking and prototyping for intake 2, which seems to overlap a lot with some of the challenges we are attempting to tackle with intake drivers, filters and catalogs: https://discourse.pangeo.io/t/sep-27-2023-intake-2-the-future-martin-durant/3706/3
This looks interesting @pbranson, thanks for sharing the link.
Sorry @pbranson, I let this slip by. Thanks for sharing your example / notebook; I tested it here and it works. I just wonder, though, what the workflow would be for using the SourceIntake class this way. The way I managed to make it work based on your example and explanation does not look very straightforward (I may be missing something). For example, using one of the test files in rompy after making the changes to the
I'm happy to implement these changes to SourceIntake if they are going to be useful, and also happy to review a pull request with the required changes. I think we may still want to keep other source classes such as SourceFile, since it may be easier to use that with an existing file in some cases.
The use case here is for circumstances where a catalog is generated by an external system and passed in as YAML, for instance from some other database that indexes forcing files. The catalog YAML can be serialised. I agree that for the single-file case its utility may not make as much sense, other than leveraging the intake layer to take the file from source to an in-memory container. Also note that the interface should be more like this (not as terse):
Assuming you are using a SourceIntake:
That looks easier, Paul. I have opened pull request #70 to implement this and assigned it to you.
Implemented in #70
Hi @rafa-guedes, talking to @pbranson this afternoon, he suggested that some of the functionality going into SourceDataset(s) and similar could go into intake catalogs combined with SourceIntake. This makes sense to me as well. It means that filtering and xarray wrappers can sit in intake and we don't have to rebuild that functionality.
For example we could replace
SourceFile(uri='bathy_temp.tif')
With something like:
Here, catalog_yaml is an alternative to catalog_uri that allows the catalog YAML to be embedded directly in a serialized SourceIntake object.
It might require an extension of the SourceIntake object like:
What do you think?
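To make the catalog_yaml idea concrete, here is a minimal sketch of an object that accepts either catalog_uri or catalog_yaml and survives a serialisation round trip. It is not the rompy implementation: the class name `SourceIntakeSketch`, the `dataset_id` field and the exactly-one-of rule are assumptions for illustration, and a plain dataclass with JSON is used so the sketch needs nothing beyond the standard library.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SourceIntakeSketch:
    """Hypothetical stand-in for SourceIntake with an embedded-catalog option."""

    dataset_id: str                      # assumed name of the catalog entry to open
    catalog_uri: Optional[str] = None    # path or URL of a catalog file
    catalog_yaml: Optional[str] = None   # catalog content embedded as a string

    def __post_init__(self):
        # Assumed rule: exactly one of the two catalog fields must be given.
        if (self.catalog_uri is None) == (self.catalog_yaml is None):
            raise ValueError("Provide exactly one of catalog_uri or catalog_yaml")

    def to_json(self) -> str:
        # The embedded YAML travels with the object as an ordinary string field.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "SourceIntakeSketch":
        return cls(**json.loads(text))


# Catalog text for the bathy_temp.tif file mentioned above (driver name assumed).
catalog_yaml = (
    "sources:\n"
    "  bathy:\n"
    "    driver: rasterio\n"
    "    args:\n"
    "      urlpath: bathy_temp.tif\n"
)

source = SourceIntakeSketch(dataset_id="bathy", catalog_yaml=catalog_yaml)
restored = SourceIntakeSketch.from_json(source.to_json())
```

Because the catalog is carried as a string, the serialised object does not depend on a catalog file being present wherever it is later deserialised.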