Add support for GOES and Himawari Satellite Imagery #222

Open
jacobbieker opened this issue Feb 4, 2024 · 12 comments · Fixed by #240
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@jacobbieker
Member

jacobbieker commented Feb 4, 2024

It would be great if we could support GOES and Himawari satellite imagery. Between the two GOES satellites and Himawari, this would then give Satip global geostationary coverage. The idea is essentially to make Satip less EUMETSAT-specific and more akin to NWP-Consumer, but for satellite imagery.

Detailed Description

GOES-16, -17, and -18 support could be fairly easy through Microsoft's Planetary Computer, which makes the NetCDF or GeoTIFF files easily available; these can be opened with rioxarray, which is already part of this repo. NetCDF Himawari live images are on AWS as well (GOES is also there), so data access should not be a problem. The public data is available from 2017 for GOES and from 2020 for Himawari.

Ideally, it would also support older GOES and Himawari imagery. Satpy supports GOES-13, -14, and -15 imagery, which goes back to around 2010, and Himawari is available in archives from JAXA. This could give a global, decade-ish archive of imagery, which could be quite useful for studies and for training models nearly anywhere in the world.

Context

I would use it in either Dagster or planetary-datasets for processing archival imagery and making it available on Hugging Face, which, in addition to the public near-real-time archives, would be quite an impactful project. Having the entire EUMETSAT MSG RSS dataset available (for just about a year) has already resulted in one paper we know of that uses the entire archive for solar forecasting. Extending that to global coverage is a natural next step.

Possible Implementation

It could be a combination of unifying the different ways of accessing the satellite imagery. For example, for Himawari data, I am creating a kerchunk dataset of the public archival NetCDF files for Himawari-8 and -9 here: https://huggingface.co/datasets/jacobbieker/himawari8-kerchunk so OCF could copy that and keep extending it. GOES-17, -18, and -19 could be pulled from Planetary Computer, or we could do something similar to Himawari and pull the data straight from the AWS NetCDF archive.

For older GOES imagery, the data is available from NOAA's CLASS archive, which is free to download and redistribute. The data actually goes back at least two generations of GOES satellites, so the archive could reach back to the early 2000s or earlier. But I would propose going back just one generation of GOES satellites, so 2011ish, which would mostly match the EUMETSAT archive (~2008).

Older Himawari imagery is available through DIAS, although there are more licensing restrictions, and I'm not sure whether we could redistribute the data. But we could at least include the more recent Himawari imagery that is publicly available.

jacobbieker added the enhancement and good first issue labels on Feb 4, 2024

@abhijelly

Hey @jacobbieker - for my understanding, is the task to download the GOES/Himawari data as NetCDF and upload it as a dataset on Hugging Face, similar to what you are trying to accomplish by creating the kerchunk dataset? Thanks

@jacobbieker
Member Author

Hi,

Somewhat. We want to add the ability to convert the native files from Himawari and GOES to Zarr, in a format similar to the Google Public Dataset version of the EUMETSAT imagery, so it can all be accessed in the same way. Ideally it would also work for the GOES-13 to -15 imagery, which is not available on AWS and has to be accessed from the NOAA CLASS archive.
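
As a rough illustration of the NetCDF-to-Zarr step described here, the following is a minimal sketch and not Satip's actual code. The function names `zarr_store_for` and `netcdf_to_zarr` are hypothetical, and the sketch makes no attempt to reproduce the layout of the Google Public Dataset version of the EUMETSAT imagery; it only shows the basic open-and-write round trip, assuming xarray, a NetCDF backend, and zarr are installed.

```python
from pathlib import Path


def zarr_store_for(nc_path: str, out_dir: str) -> Path:
    """Map 'some/dir/file.nc' to '<out_dir>/file.zarr' (hypothetical naming scheme)."""
    return Path(out_dir) / (Path(nc_path).stem + ".zarr")


def netcdf_to_zarr(nc_path: str, out_dir: str) -> Path:
    """Open one NetCDF file with xarray and write it out as a Zarr store."""
    # Imported lazily so the path helper above works without the optional
    # dependencies (xarray, a NetCDF backend such as netCDF4, and zarr).
    import xarray as xr

    store = zarr_store_for(nc_path, out_dir)
    with xr.open_dataset(nc_path) as ds:
        # mode="w" overwrites an existing store at the same location.
        ds.to_zarr(str(store), mode="w")
    return store
```

Any preprocessing needed to match the existing EUMETSAT Zarr format (renaming variables, rechunking, merging channels) would sit between opening the file and writing the store.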

@Rishikesh-Reddy

Rishikesh-Reddy commented Feb 28, 2024

Hi @jacobbieker - according to my understanding, the task is to add the capability for Satip to process and convert native GOES/Himawari files from NetCDF to Zarr format, similar to how it currently handles EUMETSAT data.

Additions to be made are:

  • Implement a download manager specifically for GOES and Himawari data (goes2go by Brian Blaylock can be useful for GOES).
  • Change app.py to accept a parameter and allow processing of all three data sources (EUMETSAT, GOES, and Himawari).
  • Enable the download of raw Himawari and GOES data in their native format and subsequent conversion to Zarr format.

As someone very new to the field of satellite image processing, I'd like to begin with a small task to gain a comprehensive understanding of the codebase. I would greatly appreciate any help you can offer in this regard.

@jacobbieker
Member Author

Hi, yes, that is all correct! The smallest first task would probably be to use goes2go to add a download manager for GOES, or alternatively to just download from the AWS bucket directly. The conversion to Zarr should also be fairly straightforward, as satpy can already load the NetCDF files from Himawari and GOES, so what satip needs to do is take that output and save it in a format similar to the current EUMETSAT data.

Of the two, I would probably try getting goes2go to download the data first; that might be the most straightforward.
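
For the alternative of reading from the AWS bucket directly, a small sketch of how the key prefix for one hour of data could be built. It assumes the public NOAA bucket layout of `<product>/<year>/<day-of-year>/<hour>/`; the function name `goes_hour_prefix` and the default product are illustrative choices, not part of Satip or goes2go.

```python
from datetime import datetime, timezone


def goes_hour_prefix(when: datetime, product: str = "ABI-L1b-RadF") -> str:
    """Return the S3 key prefix holding one hour of files for `product`.

    `when` must be timezone-aware; it is converted to UTC because the
    bucket is organised by UTC year, day of year, and hour.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    when = when.astimezone(timezone.utc)
    return f"{product}/{when:%Y}/{when:%j}/{when:%H}/"


# Example: the hour starting 2024-02-04 12:00 UTC on the GOES-16 bucket.
prefix = goes_hour_prefix(datetime(2024, 2, 4, 12, tzinfo=timezone.utc))
print(f"s3://noaa-goes16/{prefix}")
```

The resulting prefix could then be listed with any S3 client that supports anonymous access, and the matching files downloaded.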

@14Richa
Contributor

14Richa commented Mar 22, 2024

@jacobbieker Thanks for the steps. I tried taking a stab at it. The code snippet below can download the files from GOES.

# Fetch the most recent GOES file using goes2go's default settings
from goes2go.data import goes_latest
data = goes_latest()

This works for me locally. Would the next step be to convert this data to Zarr format? Let me know if my understanding is correct.

@jacobbieker
Member Author

Hi,

That is a good start, but we want to be able to give a datetime or a range of dates and have the downloader fetch all the images in that period, not just the latest ones. Once dates can be selected and downloaded, the next step would be converting the data to Zarr. For this, you should be able to open the downloaded NetCDF files with xarray and then save them out in Zarr format. Some preprocessing may be needed, but that would be the first step.
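
One way to select images within a date range is to filter on the scan start time embedded in each GOES ABI filename. The sketch below assumes the standard `_sYYYYJJJHHMMSSt_` start-time field (JJJ is the day of year, t is tenths of a second); the helper names and the sample filenames are made up for illustration.

```python
import re
from datetime import datetime, timezone

# Matches the scan start field, e.g. "_s20240351200204_".
_START_RE = re.compile(r"_s(\d{4})(\d{3})(\d{2})(\d{2})(\d{2})\d_")


def scan_start(filename: str) -> datetime:
    """Parse the UTC scan start time out of a GOES ABI filename."""
    match = _START_RE.search(filename)
    if match is None:
        raise ValueError(f"no scan start time found in {filename!r}")
    stamp = "".join(match.groups())  # YYYYJJJHHMMSS
    return datetime.strptime(stamp, "%Y%j%H%M%S").replace(tzinfo=timezone.utc)


def select_in_range(filenames, start: datetime, end: datetime) -> list[str]:
    """Return the files whose scan start lies in [start, end], oldest first."""
    chosen = [f for f in filenames if start <= scan_start(f) <= end]
    return sorted(chosen, key=scan_start)


# Illustrative filenames following the naming convention.
files = [
    "OR_ABI-L1b-RadF-M6C01_G16_s20240351310204_e20240351319512_c20240351319571.nc",
    "OR_ABI-L1b-RadF-M6C01_G16_s20240351200204_e20240351209512_c20240351209571.nc",
    "OR_ABI-L1b-RadF-M6C01_G16_s20240351150204_e20240351159512_c20240351159571.nc",
]
window = select_in_range(
    files,
    datetime(2024, 2, 4, 12, tzinfo=timezone.utc),
    datetime(2024, 2, 4, 13, tzinfo=timezone.utc),
)
print(window)
```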

@14Richa
Contributor

14Richa commented Mar 24, 2024

Hey @jacobbieker, Thanks for your reply.

I have raised a PR that adds the GOES Data Download Manager Script. I also have a couple of questions regarding its integration:

  1. I'm thinking about whether to incorporate the GOES Data Download Manager script into the existing download manager for EUMETSAT. Would it be more practical to merge these functionalities into a single manager, or would it be preferable to keep them separate?

  2. Additionally, we need to decide how to differentiate between commands for GOES and EUMETSAT downloads. One approach could be to use flags within the command structure to specify which satellite data to retrieve. However, I wanted to get your thoughts on whether this approach aligns with our objectives or if you have alternative ideas.

@jacobbieker
Member Author

Thanks for the PR! I'll look over it soon. For this architecture, we want the same interface for getting all the different satellite imagery, so integrating it with the current DownloadManager is my preferred approach. The differentiation could potentially be done by passing the provider to the DownloadManager (e.g. goes or eumetsat, with future ones adding jaxa or gk2a) and then internally picking the right code path for that provider. So yes, the idea in 2. is closer to what I was thinking.
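
On the command-line side, the flag-based selection discussed here might look like the sketch below. The `--provider`, `--start`, and `--end` option names are assumptions for illustration and are not taken from Satip's actual app.py.

```python
import argparse
from datetime import datetime

# Providers named in the discussion; jaxa or gk2a could be appended later.
PROVIDERS = ("eumetsat", "goes")


def parse_args(argv=None) -> argparse.Namespace:
    """Parse a hypothetical download command line."""
    parser = argparse.ArgumentParser(description="Download satellite imagery")
    parser.add_argument("--provider", choices=PROVIDERS, default="eumetsat",
                        help="which satellite data source to download from")
    parser.add_argument("--start", type=datetime.fromisoformat, required=True,
                        help="start of the download window (ISO 8601)")
    parser.add_argument("--end", type=datetime.fromisoformat, required=True,
                        help="end of the download window (ISO 8601)")
    return parser.parse_args(argv)


args = parse_args(["--provider", "goes",
                   "--start", "2024-02-04T00:00", "--end", "2024-02-04T06:00"])
print(args.provider, args.start, args.end)
```

Keeping `eumetsat` as the default means existing invocations would behave as before, while the parsed provider string can be passed straight through to the DownloadManager.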

@suleman1412
Contributor

Hi @jacobbieker, is this issue still open? From my understanding, we have to merge EUMETSAT and GOES in the DownloadManager file itself. One way I thought of doing this is by creating a common base class that the EUMETSAT and GOES classes each use, plus the actual DownloadManager, which acts as the main entry point. Please let me know if I'm heading in the right direction.
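
The base-class idea could be sketched roughly as follows. All class and method names other than DownloadManager are hypothetical, the download bodies are placeholders, and this does not reflect the interface of Satip's existing DownloadManager.

```python
from abc import ABC, abstractmethod
from datetime import datetime


class BaseDownloader(ABC):
    """Interface that every provider-specific downloader implements (hypothetical)."""

    @abstractmethod
    def download(self, start: datetime, end: datetime) -> list[str]:
        """Download all images in [start, end] and return the local file paths."""


class EumetsatDownloader(BaseDownloader):
    def download(self, start: datetime, end: datetime) -> list[str]:
        return []  # placeholder: the existing EUMETSAT logic would live here


class GoesDownloader(BaseDownloader):
    def download(self, start: datetime, end: datetime) -> list[str]:
        return []  # placeholder: e.g. goes2go or direct AWS access


class DownloadManager:
    """Single entry point that picks the code path for the requested provider."""

    _providers = {"eumetsat": EumetsatDownloader, "goes": GoesDownloader}

    def __init__(self, provider: str = "eumetsat"):
        if provider not in self._providers:
            known = ", ".join(sorted(self._providers))
            raise ValueError(f"unknown provider {provider!r}; expected one of: {known}")
        self.backend = self._providers[provider]()

    def download(self, start: datetime, end: datetime) -> list[str]:
        return self.backend.download(start, end)
```

With this shape, adding a new source such as Himawari would mean writing one more `BaseDownloader` subclass and registering it in `_providers`, without touching callers of `DownloadManager`.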

@iyui1223

Hi @jacobbieker, @suleman1412, while you are working on the common download class for the different satellites,
I will look into possible data sources for the Himawari satellites.
I'll start implementing a Himawari downloader once the direction is set.

The best source I could find is the WorldScienceDataBank run by NICT: https://sc-web.nict.go.jp/himawari/himawari-archive.html
https://sc-nc-web.nict.go.jp/wsdb_osndisk/shareDirDownload/03ZzRnKS
This site provides an archive for the entire Himawari series 1-9. According to JMA's data policy, data obtained from there can only be used for non-profit purposes, but there is no restriction on redistribution.

@jacobbieker
Member Author

Hi @suleman1412, sorry for the delayed response, but yes, a common base class used by the EUMETSAT and GOES classes, with DownloadManager as the main entry point, would be the way to go.

@jacobbieker
Member Author

Yeah, that would be great, @iyui1223. That data source is the same one I found too. So if you want to go ahead and start adding the Himawari downloader, we can integrate it with the common download class later.

6 participants