Add draft sections on openEO, STAC, COG, and gdalcubes #784

Merged
merged 3 commits into from May 7, 2022

appelmar
Contributor

This is a first draft for two sections on openEO, STAC, COG, and gdalcubes (see #755). Let me know if you have any ideas for improvement, find this too long, or miss anything.

For now, I've moved both sections into a larger section on "Bridges to cloud technologies and services", which perhaps would also relate to the upcoming rgee section?!

The openEO section does not include a code example but happy to add one if you think this is helpful.

@@ -14,6 +14,8 @@ library(terra)
library(qgisprocess)
library(Rsagacmd)
library(rgrass7)
library(rstac)
Collaborator

👍 will need to add to the Suggests in geocompkg

@@ -93,7 +95,7 @@ This chapter focuses on 'bridges' to three mature open source GIS products (see
One can also use R scripts from within QGIS\index{QGIS} (see https://docs.qgis.org/3.16/en/docs/training_manual/processing/r_intro.html).
Finally, it is also possible to use R from the GRASS GIS\index{GRASS} command line (see https://grasswiki.osgeo.org/wiki/R_statistics/rgrass7).
]
To complement the R-GIS bridges, the chapter ends with a very brief introduction to interfaces to spatial libraries (Section \@ref(gdal)) and spatial databases\index{spatial database} (Section \@ref(postgis)).
To complement the R-GIS bridges, the chapter ends with a very brief introduction to interfaces to spatial libraries (Section \@ref(gdal)), spatial databases\index{spatial database} (Section \@ref(postgis)), and cloud-based processing of Earth observation data (Section \@ref(cloud)).
Collaborator

👍

10-gis.Rmd Outdated


### STAC, COGs, and data cubes in the cloud
Major cloud computing providers (Amazon Web Services, Microsoft Azure / Planetary Computer, Google Cloud Platform, and others) host huge catalogs of open Earth observation data, such as the complete Sentinel-2 archive, on their platforms. We can use R to connect directly to these archives and process their data, ideally from a machine in the same cloud and region.
Collaborator

Great intro.

10-gis.Rmd Outdated

Three promising developments that make working with such image archives on cloud platforms _easier_ and _more efficient_ are the [SpatioTemporal Asset Catalog (STAC)](https://stacspec.org), the [cloud-optimized GeoTIFF (COG)](https://www.cogeo.org/) image file format, and the concept of data cubes. Below, we introduce these individual developments and briefly describe how they can be used from R.
Collaborator

👍

# Connect to the STAC-API endpoint for Sentinel-2 data
# and search for images intersecting our AOI
s = stac("https://earth-search.aws.element84.com/v0")
items = s |>
Collaborator

Looks good, note to self we need to switch to base pipe soon #616

@Robinlovelace
Collaborator

Many thanks for this @appelmar, looking great!

@Nowosad
Member

Nowosad commented Apr 29, 2022

Thank you a lot, Marius. I will review it in the next few days.

@appelmar
Contributor Author

Thanks, looking forward to discussing and improving them.

Member

@Nowosad Nowosad left a comment

@appelmar great sections: dense, informative, but also easy to follow.

I only added some small comments. One additional thing -- can you split each sentence into a separate line? We are using the "one line per sentence" approach in the geocompr book.

10-gis.Rmd Outdated

The SpatioTemporal Asset Catalog (STAC) is a general description format for spatiotemporal data that is used to describe a variety of datasets on cloud platforms, including imagery, SAR data, and point clouds. Besides simple static catalog descriptions, STAC-API provides a web service to query items (e.g. images) of catalogs by space, time, and other properties. In R, the **rstac** package [@simoes_rstac_2021] makes it possible to connect to STAC-API endpoints and search for items. In the example below, we request all images from the [Sentinel-2 Cloud-Optimized GeoTIFF (COG) dataset on Amazon Web Services](https://registry.opendata.aws/sentinel-2-l2a-cogs) that intersect with a predefined area and time of interest. The result contains all matching images, their metadata, and URLs pointing to the actual files on AWS.
Member

@appelmar consider explaining the SAR abbreviation (we are not using SAR data in the book)

Member

Do you know of any list of the most popular STAC-APIs? We, for example, could mention them in chapter 8.

Contributor Author

https://stacindex.org/ might be helpful. It is possible to search for public APIs only.
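For readers new to STAC-API, a minimal base R sketch of the spatiotemporal filtering such a search performs may help. The item IDs, bounding boxes, dates, and the `intersects_bbox()` helper are hypothetical and are not part of **rstac** or the STAC specification; bounding boxes follow the STAC order `c(xmin, ymin, xmax, ymax)` in longitude/latitude.

```r
# Hypothetical, hand-made item metadata standing in for a STAC catalog.
# Bounding boxes use the STAC/GeoJSON order c(xmin, ymin, xmax, ymax).
items = list(
  list(id = "item_a", bbox = c(7.0, 51.0, 8.0, 52.0), datetime = "2021-06-10"),
  list(id = "item_b", bbox = c(9.0, 53.0, 10.0, 54.0), datetime = "2021-06-15"),
  list(id = "item_c", bbox = c(7.5, 51.5, 8.5, 52.5), datetime = "2021-09-01")
)

# Hypothetical helper: do two bounding boxes overlap?
intersects_bbox = function(a, b) {
  a[1] <= b[3] && a[3] >= b[1] && a[2] <= b[4] && a[4] >= b[2]
}

# Illustrative area and time of interest
aoi = c(7.2, 51.2, 7.8, 51.8)
t0 = as.Date("2021-06-01")
t1 = as.Date("2021-07-31")

# Keep items that intersect the AOI and fall inside the time window
matches = Filter(function(it) {
  d = as.Date(it$datetime)
  intersects_bbox(it$bbox, aoi) && d >= t0 && d <= t1
}, items)

ids = sapply(matches, function(it) it$id)
ids
```

Only `item_a` survives: `item_b` lies outside the AOI and `item_c` is outside the time window. With **rstac**, this filtering happens on the STAC-API server; the client only sends query parameters such as `bbox` and `datetime` (arguments of `stac_search()`).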

10-gis.Rmd Outdated


### STAC, COGs, and data cubes in the cloud
Major cloud computing providers (Amazon Web Services, Microsoft Azure / Planetary Computer, Google Cloud Platform, and others) host huge catalogs of open Earth observation data, such as the complete Sentinel-2 archive, on their platforms. We can use R to connect directly to these archives and process their data, ideally from a machine in the same cloud and region.
Member

@appelmar could you add some indexes to your new sections (\index{} -- see examples of its use in the previous chapters)

Contributor Author

I've added some indexes for GDAL, STAC, COG, data cube, and cloud computing.

10-gis.Rmd Outdated

For larger areas of interest, requested images are still relatively difficult to work with: they may use different map projections, may spatially overlap, and the spatial resolution often depends on the spectral band. The **gdalcubes** package [@appel_gdalcubes_2019] can be used to abstract from individual images and to create and process image collections as four-dimensional data cubes.

The example below shows a minimal example to create a lower resolution (250m) maximum NDVI composite from the Sentinel-2 images returned by the previous STAC-API search.
Member

"The example below shows a minimal example" -> "The code below shows a minimal example"?

Contributor Author

Thanks, fixed!

library(gdalcubes)

# Create an image collection object from STAC items
collection = stac_image_collection(items$features, property_filter = function(x) {x[["eo:cloud_cover"]] < 10})
Member

Can you briefly explain the x[["eo:cloud_cover"]] < 10 code?

Contributor Author

The comment above now states "Filter images from STAC response by cloud cover and create an image collection object". Is this understandable or do you think I should mention some details how the function is applied in the text?

Member

This is the only line of code in your sections that I do not understand (I do not know where "eo:cloud_cover" came from...). It would be great if you could add a short explanation in the text.

Contributor Author

Thanks, I've added a short explanation (see here) after the code chunk. Does this help to understand the function?
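The `eo:cloud_cover` field is part of each STAC item's `properties`; it is defined by the STAC Electro-Optical (eo) extension as the estimated cloud cover in percent. As we understand the **gdalcubes** documentation, `stac_image_collection()` calls the `property_filter` function once per item with that item's properties as a named list and keeps the item only if the function returns `TRUE`. A base R sketch of that logic, with hypothetical property values:

```r
# Hypothetical properties of three STAC items as named lists
# (field names follow STAC conventions; the values are made up)
props = list(
  list(datetime = "2021-06-10T10:30:00Z", "eo:cloud_cover" = 3.2),
  list(datetime = "2021-06-15T10:30:00Z", "eo:cloud_cover" = 47.9),
  list(datetime = "2021-06-20T10:30:00Z", "eo:cloud_cover" = 8.0)
)

# Same filter function as in the gdalcubes call above:
# keep an item only if its cloud cover is below 10 percent
cloud_filter = function(x) {x[["eo:cloud_cover"]] < 10}

# Apply the filter to every item's properties
keep = vapply(props, cloud_filter, logical(1))
keep

filtered = props[keep]
length(filtered)
```

The double-bracket syntax `x[["eo:cloud_cover"]]` is needed because the colon makes the name non-syntactic, so `x$eo:cloud_cover` would not work.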

10-gis.Rmd Outdated
# Define spatiotemporal extent, resolution and CRS of the target data cube
v = cube_view(srs = "EPSG:3857", extent = collection, dx = 250, dy = 250, dt = "P1D")
Member

What is P1D?

Contributor Author

It's an ISO 8601 duration string, I've added this as a comment.
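In `dt = "P1D"`, `P` marks a period and `1D` means one day; `"P1M"` would be one month and `"P16D"` sixteen days. The sketch below uses a hypothetical helper, `parse_simple_duration()`, that handles only the simple `P<n>D/W/M/Y` forms (not the full ISO 8601 grammar), then counts daily time slices over an illustrative date range and the grid size implied by `dx = dy = 250`. It is simplified arithmetic and does not reproduce gdalcubes' own rounding rules.

```r
# Hypothetical helper: parse simple ISO 8601 durations such as "P1D" or "P16D".
# Only P<n>D, P<n>W, P<n>M and P<n>Y are supported.
parse_simple_duration = function(x) {
  m = regmatches(x, regexec("^P([0-9]+)([DWMY])$", x))[[1]]
  if (length(m) == 0) stop("unsupported duration: ", x)
  list(value = as.integer(m[2]), unit = m[3])
}

d = parse_simple_duration("P1D")
str(d)

# Time slices between two illustrative dates (both inclusive)
t0 = as.Date("2021-06-01")
t1 = as.Date("2021-06-30")
unit_words = c(D = "day", W = "week", M = "month", Y = "year")
slices = seq(t0, t1, by = paste(d$value, unit_words[[d$unit]]))
length(slices)  # one slice per day of June 2021

# Grid size for dx = dy = 250 map units (EPSG:3857 units are nominal metres)
# over an illustrative 100 km x 50 km extent
nx = ceiling(100000 / 250)  # columns
ny = ceiling(50000 / 250)   # rows
c(nx = nx, ny = ny)
```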

10-gis.Rmd Outdated
raster_cube(collection, v) |>
select_bands(c("B04", "B08")) |>
apply_pixel("(B08-B04)/(B08+B04)", "NDVI") |>
reduce_time("max(NDVI)") -> cube
Member

We are not using forward pipes in the book. Can you replace it with =?

Contributor Author

Done.
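The gdalcubes pipeline reviewed above selects the red (B04) and near-infrared (B08) bands, computes NDVI = (B08 - B04) / (B08 + B04) for every pixel and time slice, and then reduces the time dimension to its maximum. To show what these steps do to the data, here is a base R analogue on a tiny four-dimensional array (band, t, y, x); the reflectance values are invented and the code does not use gdalcubes.

```r
# Tiny illustrative cube (band, t, y, x) with made-up reflectances
cube = array(0, dim = c(2, 3, 2, 2),
             dimnames = list(band = c("B04", "B08"), t = c("t1", "t2", "t3"),
                             y = NULL, x = NULL))
cube["B04", , , ] = 0.1          # red: constant everywhere
cube["B08", "t1", , ] = 0.3      # near-infrared per time slice
cube["B08", "t2", , ] = 0.5
cube["B08", "t3", , ] = 0.2
cube["B08", "t2", 1, 1] = 0.15   # a lower NIR value at one pixel

# Analogue of select_bands(): red and NIR arrays with dimensions t, y, x
red = cube["B04", , , ]
nir = cube["B08", , , ]

# Analogue of apply_pixel(): NDVI for every pixel and time slice
ndvi = (nir - red) / (nir + red)

# Analogue of reduce_time("max(NDVI)"): maximum over time (dimension 1)
ndvi_max = apply(ndvi, c(2, 3), max)
ndvi_max  # pixel [1, 1] is 0.5, all other pixels about 0.667
```

Taking the maximum over time is a common way to suppress clouds and other low-NDVI observations, which is why a single disturbed time slice at pixel [1, 1] does not affect its composite value.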

10-gis.Rmd Outdated

### openEO

Besides hosting large data archives, numerous cloud-based services to process Earth observation data have been launched in recent years. openEO [@schramm_openeo_2021] is an initiative to support interoperability among cloud services by defining a common language for processing the data. The initial idea, described in an [r-spatial.org blog post](https://r-spatial.org/2016/11/29/openeo.html), is to make it possible for users to switch between cloud services easily, with as few code changes as possible. The [standardized processes](https://processes.openeo.org) use a multidimensional data cube model as an interface to the data. Implementations are available for eight different backends (see https://hub.openeo.org), to which users can connect with R, Python, JavaScript, QGIS, or a web editor to define (and chain) processes on collections. Since functionality and data availability differ among the backends, the **openeo** R package [@lahn_openeo_2021] dynamically loads the available processes and collections from the connected backend. Afterwards, users can load image collections, apply and chain processes, submit jobs, and explore and plot results.
Member

@appelmar can you think of a small code example for this section? I think it would be great to see openeo in action (if possible).

Contributor Author

AFAIK, all available services require a user account. Do you think it would be useful to include an example without connecting to a specific backend?

Member

Yes -- it could help people to do their first steps (after getting the user account).

Contributor Author

I've added an example, hope this is useful and easy to understand?
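Under the hood, openEO clients describe a workflow as a process graph: a JSON object whose nodes each name a `process_id` and its `arguments`, refer to other nodes' outputs via `from_node`, and mark the final node with `result: true`. The **openeo** R package builds such graphs for the user; the base R sketch below instead writes one by hand as a nested list, only to show the structure of a maximum-NDVI workflow. The collection ID `SENTINEL2_L2A`, the extents, and the band names are illustrative assumptions; actual names depend on the backend.

```r
# Hand-written openEO-style process graph as a nested R list (illustrative;
# collection ID, extents and band names are assumptions)
graph = list(
  load = list(
    process_id = "load_collection",
    arguments = list(
      id = "SENTINEL2_L2A",
      spatial_extent = list(west = 7.1, south = 51.8, east = 7.3, north = 52.0),
      temporal_extent = c("2021-06-01", "2021-07-01"),
      bands = c("B04", "B08")
    )
  ),
  ndvi = list(
    process_id = "ndvi",
    arguments = list(data = list(from_node = "load"), nir = "B08", red = "B04")
  ),
  max_t = list(
    process_id = "reduce_dimension",
    arguments = list(
      data = list(from_node = "ndvi"),
      dimension = "t",
      # child graph: 'from_parameter' receives the values along dimension t
      reducer = list(process_graph = list(
        max = list(process_id = "max",
                   arguments = list(data = list(from_parameter = "data")),
                   result = TRUE)
      ))
    )
  ),
  save = list(
    process_id = "save_result",
    arguments = list(data = list(from_node = "max_t"), format = "GTiff"),
    result = TRUE
  )
)

# Which top-level node produces the final result?
is_result = vapply(graph, function(node) isTRUE(node$result), logical(1))
names(graph)[is_result]

# Do all top-level 'from_node' references point to existing nodes?
refs = unlist(lapply(graph, function(node) node$arguments$data$from_node))
all(refs %in% names(graph))
```

If the **jsonlite** package is installed, `jsonlite::toJSON(list(process_graph = graph), auto_unbox = TRUE, pretty = TRUE)` turns this list into the corresponding JSON text.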

@appelmar
Contributor Author

appelmar commented May 2, 2022

Thanks, Jakub and Robin for your comments and improvements. The latest commit should now follow the "one line per sentence" approach and address the other comments!

@Nowosad
Member

Nowosad commented May 3, 2022

Great work, @appelmar.
@Robinlovelace -- can we merge this PR?

@Robinlovelace
Collaborator

> @Robinlovelace -- can we merge this PR?

Was just about to ask the same question, apologies for missing the question! Yes for sure, many thanks @appelmar and Jakub for the detailed review. And feel free to share it widely when the new section builds 🤞
