ECMWF / Copernicus Climate Data Store #40

Closed
rabernat opened this issue Nov 27, 2017 · 18 comments

@rabernat
Member

I just read an article about a new "climate data store" that is being developed by ECMWF:

https://www.ecmwf.int/en/newsletter/151/meteorology/climate-service-develops-user-friendly-data-store

This looks quite ambitious and very complex:

[Image: schematic]

Despite the highly customized architecture, there is an explicit mention of open-source and even xarray:

It was also decided that the CDS should be based on open source software where possible, so that other instances could be deployed if necessary. This is particularly important for the development of the toolbox: there is a vibrant community developing scientific libraries in Python, such as Numpy, Scipy, Pandas, xarray, dask, matplotlib etc. These libraries provide many of the algorithms required, and users from the weather and climate communities are already familiar with them. Making use of those libraries will therefore make it easier for users to contribute new additions to the toolbox.

We should keep this on our radar. Do any of the euro folks (e.g. @lesommer) have any connections to this group? It would be great to develop connections with ECMWF, as they are one of the largest providers of weather and climate data in the world.

@mrocklin
Member

cc @pelson

@rabernat
Member Author

One very active python person from ECMWF is @kynan.

@shoyer

shoyer commented Nov 29, 2017

cc @alexamici

@shoyer

shoyer commented Nov 30, 2017

I just left the ECMWF python workshop. ECMWF seems to be building/adapting many tools to use Python with Xarray/Dask, including:

  • a new GRIB reader based on eccodes (which they want to use as a new backend for xarray)
  • ECMWF's meteorological workstation application, MetView, which will support Python/xarray as a scripting language
  • The Copernicus Data Store seems to be using xarray + dask under the covers and also for an expert API (?): http://slides.com/francesconazzaro/europython-2017#/

They do seem to be a little new to open source, and none of these tools are actually public yet. I encouraged them to get involved in Pangeo and the broader community.

@darothen
Member

a new GRIB reader based on eccodes (which they want to use as a new backend for xarray)

@shoyer do you have any specific links or details on this effort? A good alternative to PyNIO for reading GRIB/GRIB2 files into xarray is a "killer feature" which opens up the tool to a broader community of researchers working in numerical weather prediction, where GRIB2 is the standard for dissemination of large-scale forecast model output from many NDCs.

@shoyer

shoyer commented Dec 19, 2017

@alexamici is leading these efforts for ECMWF. I don't think they have much to share publicly yet, but they hope to open source it. I encouraged him to add the xarray-specific backend logic into xarray proper so we can more easily maintain it.

@chiaral
Member

chiaral commented May 24, 2018

I am very late to this issue, but very interested in learning how GRIB2 reader support in xarray is evolving.

I just started using xarray+PyNIO with open_mfdataset(), a pre-processing function, and some looping to concatenate along multiple dimensions (as explained by @jhamman in this SO answer).

In theory it works great, but I am running into some issues.

Does anyone have experience with this?

@StephanSiemen

FYI, we plan to set up a call soon to present the CDS and our work on xarray_grib. Hopefully we can address your questions/comments then - see #302.

@rabernat
Member Author

FYI, the Copernicus portal has been released. It's open to the public:
https://cds.climate.copernicus.eu/

It's pretty cool! You can run Python code in their environment, kind of like a notebook. There are just a lot of new / unfamiliar APIs.

@shoyer

shoyer commented Jun 14, 2018

It looks like the cdstoolbox library has a bunch of routines for climate data analysis on xarray.DataArray objects, including routines that keep track of units:
https://devpi.copernicus-climate.eu//root/master/cdstoolbox/latest/+doc/index.html

This looks pretty cool and potentially broadly useful! Is it available outside of Copernicus as a stand-alone library and/or open source project? I think a lot of folks would be excited about this.

@alexamici

@shoyer I can give you some background as @bopen is leading the development of the CDS Toolbox.

The Toolbox is a distributed architecture, so things are not straightforward. The cdstoolbox module that you import in the applications only defines the work to be done as an abstract workflow; the actual processing is then done by a bunch of different tools and libraries on separate compute hosts.

On the other hand, it is true that quite a bit of the tooling is written in Python with xarray.DataArray as the main processing data structure, and @ecmwf (our contractor) has intended to open-source the code since the beginning. Unfortunately, in spite of the best intentions of @ecmwf, the legal team hasn't cleared us yet :/
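
To make the abstract-workflow idea above concrete, here is a minimal sketch in plain Python. It is not the cdstoolbox API: `Task`, `defer`, `execute`, `load`, and `anomaly` are hypothetical names invented for illustration, and `execute` runs everything locally where a real system would presumably dispatch work to separate compute hosts.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Task:
    """A node in a deferred workflow: it records a call but runs nothing."""
    func: Callable[..., Any]
    args: Tuple[Any, ...] = ()


def defer(func):
    """Wrap func so that calling it builds a Task instead of executing."""
    def wrapper(*args):
        return Task(func, args)
    return wrapper


def execute(node):
    """Local stand-in for a remote executor: resolve inputs, then run."""
    if not isinstance(node, Task):
        return node  # plain value, nothing to compute
    resolved = [execute(arg) for arg in node.args]
    return node.func(*resolved)


@defer
def load(values):
    # Hypothetical data-retrieval step.
    return list(values)


@defer
def anomaly(series):
    # Subtract the series mean from every element.
    mean = sum(series) / len(series)
    return [x - mean for x in series]


# Building the workflow only describes the work; no computation happens here.
workflow = anomaly(load((1.0, 2.0, 3.0)))

# The computation runs only when the description is handed to an executor.
result = execute(workflow)
```

The point of the split is that `workflow` is just a description that could be shipped elsewhere, which is why debugging such code feels different from running it line by line.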

@rabernat
Member Author

The cdstoolbox module that you import in the applications only define the work to be done as an abstract workflow, then actual processing is done by a bunch of different tools and libraries on separate compute hosts.

Hmmm...sounds a bit like an obscure library for parallel computing in python that I've been playing around with. 😄

In all seriousness, I totally agree that many of these routines would have broad interest from the community.

@shoyer - you should consider joining the Pangeo telecon with ECMWF (discussed in #302)

@fmaussion
Member

I played around with the CDS at last weekend's hackathon together with other programmers. I would say that the CDS is a great tool but still has some rough edges.

Pros:

  • very nice interface, engaging data exploration tool
  • it is the future of data-processing: someone has to do it first and be on the frontline
  • it works very well for some use cases, mostly extracting time series

Cons:

  • the simplest of all global use cases (compute ERA5's global monthly temperature climatology) won't work (the chunking part is not up to it yet: if you want to do it you'll have to do the chunking yourself) - Note that I'm not sure if pangeo/dask would be up to it either.
  • there is no interactive console like in pangeo.pydata.org -> you'll have to write a script without being able to debug it, run it, and wait until it blows up (which can take a long time)
  • yet another API to learn, and learning a new API without being able to mess around in a notebook is hard
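
As an illustration of the "do the chunking yourself" point in the first item above, here is a minimal sketch of a monthly climatology computed one chunk at a time. It is plain Python on synthetic (month, value) pairs, not CDS or ERA5 code; `iter_chunks` and `monthly_climatology` are hypothetical helper names, and in a real case each value would be a full global field rather than a single number.

```python
from collections import defaultdict
from itertools import islice


def iter_chunks(records, size):
    """Yield successive lists of at most `size` records."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def monthly_climatology(records, chunk_size):
    """Mean value per calendar month, processed one chunk at a time.

    `records` is an iterable of (month, value) pairs with month in 1..12.
    Only the running sums and counts are kept between chunks.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for chunk in iter_chunks(records, chunk_size):
        for month, value in chunk:
            sums[month] += value
            counts[month] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}


# Synthetic data: two years of monthly values, NOT real ERA5 output.
records = [
    (month, float(month + 10 * year))
    for year in range(2)
    for month in range(1, 13)
]
climatology = monthly_climatology(records, chunk_size=5)
```

With xarray the same reduction is usually written as `da.groupby("time.month").mean("time")`; the sketch only shows why the result does not require holding the full time series in memory at once.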

I fully understand the challenges behind the CDS @alexamici and this is not meant as criticism - I'm looking forward to using the CDS more and more. For some use cases I'm going to have to keep using MARS, though.

@fmaussion
Member

Regarding open-sourcing the climate toolbox part: I think it would be very nice to open-source the science part of the toolbox (e.g. climate indices) in order to build confidence in the results that the CDS is producing.

@stale

stale bot commented Aug 14, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 14, 2018
@stale

stale bot commented Aug 21, 2018

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.

@stale stale bot closed this as completed Aug 21, 2018
@cpaulik

cpaulik commented Nov 15, 2018

Sorry for re-opening this, but I would also really like to see this part of the CDS as open source, especially since I would not really produce anything meaningful on the CDS given that their data licence states the following:

6.2. All Intellectual Property Rights of new items created as a result of
modifying or adapting the Copernicus Products through the applications and
workflows accessible on the ECMWF Copernicus portals will belong to the European
Union

@jhamman
Member

jhamman commented Nov 16, 2018

@cpaulik - I agree. This sort of language in a licence is not particularly welcoming. I wonder if @StephanSiemen could help provide some clarity on the intention here or the most productive venue for providing feedback on the CDS licence.
