# Using STAC Catalogs and Accessing ARCO Data

This example uses the EDITO STAC API endpoint, [https://catalog.dive.edito.eu/](https://catalog.dive.edito.eu/). Other STAC endpoints are available; to use one of them, replace this URL with its address.
You can find more information about available catalogs [here](https://stacspec.org/en/about/datasets/#catalogs). Be aware that the structure and content of a STAC catalog may vary depending on the provider.

In [1]:
library(rstac)
library(purrr)

In [2]:
stac_endpoint_url <- 'https://catalog.dive.edito.eu/'

The following function filters collections by keyword, matching case-insensitively against each collection's ID and title. It is used below to select the collections that contain 'temperature' in their ID or title.

In [3]:
filter_collections_by_keyword <- function(collections, keyword) {
  Filter(function(col) {
    # isTRUE() handles collections without a title, for which grepl() returns logical(0)
    isTRUE(grepl(keyword, col$id, ignore.case = TRUE)) ||
      isTRUE(grepl(keyword, col$title, ignore.case = TRUE))
  }, collections$collections)
}
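
The matching logic can be checked without network access by running it over a small hand-made list. This is only a sketch: `mock_collections` and `matches_keyword` are hypothetical names introduced here, and the mock assumes that the response stores one list per collection under a `collections` element. One entry deliberately has no title, to show that a missing field does not break the filter.

```r
# Hypothetical stand-in for the response of the collections request:
# a list with a `collections` element holding one list per collection.
mock_collections <- list(collections = list(
  list(id = "climate_forecast-air_temperature", title = "Air temperature"),
  list(id = "sea_surface_height", title = "Sea surface height"),
  list(id = "sea_water_temperature")  # no title field
))

# Hypothetical helper: TRUE when the keyword occurs in the ID or the title.
# isTRUE() guards against a missing title, for which grepl() returns logical(0).
matches_keyword <- function(col, keyword) {
  isTRUE(grepl(keyword, col$id, ignore.case = TRUE)) ||
    isTRUE(grepl(keyword, col$title, ignore.case = TRUE))
}

mock_hits <- Filter(function(col) matches_keyword(col, "temperature"),
                    mock_collections$collections)
mock_hit_ids <- vapply(mock_hits, function(col) col$id, character(1))
print(mock_hit_ids)
```

Both temperature collections are kept, including the one without a title, and the sea surface height collection is dropped.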

Perform a request to retrieve collections from the STAC endpoint. This will return a list of collections available in the STAC catalog.

In [4]:
collections <- stac(stac_endpoint_url) %>%
  collections() %>%
  get_request()

Then filter the collections based on the keyword 'temperature'.

In [5]:
filtered_collections <- filter_collections_by_keyword(collections, "temperature")

Define a function that prints the title and ID of each filtered collection, numbered so that the user can choose one.

In [6]:
print_filtered_collections <- function(filtered_collections) {
  cat("Filtered collections:\n")
  for (i in seq_along(filtered_collections)) {
    cat(i, ": ", filtered_collections[[i]]$title, " (ID: ", filtered_collections[[i]]$id, ")\n", sep = "")
  }
}

The next function prompts the user to choose a collection from the printed list.
Enter the number corresponding to the collection you want to select.

In [7]:
choose_collection <- function() {
  cat("\nEnter the number of the collection you want to choose: ")
  as.integer(readLines(n = 1))
}

Check that the chosen index is a valid position in the list of filtered collections.

In [8]:
validate_chosen_index <- function(chosen_index, filtered_collections) {
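  # The length check avoids an error in && when no input was read at all
  length(chosen_index) == 1 && !is.na(chosen_index) &&
    chosen_index >= 1 && chosen_index <= length(filtered_collections)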
}
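
To illustrate which inputs pass this kind of bounds check, here is a minimal stand-alone sketch. `is_valid_index` is a hypothetical helper that takes the number of choices directly instead of the list, and the value 9 is an arbitrary example.

```r
# Hypothetical stand-alone version of the bounds check described above.
is_valid_index <- function(index, n_choices) {
  length(index) == 1 && !is.na(index) && index >= 1 && index <= n_choices
}

# as.integer() turns non-numeric input into NA (with a warning), which is rejected.
parsed <- suppressWarnings(as.integer(c("3", "abc", "0", "12")))
valid <- vapply(parsed, is_valid_index, logical(1), n_choices = 9)
print(valid)
```

Only the first input is accepted: "abc" parses to `NA`, and 0 and 12 fall outside the range 1 to 9.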

Print the ID and title of the chosen collection and return them as a list.

In [9]:
retrieve_and_print_collection_details <- function(chosen_collection) {
  col_id <- chosen_collection$id
  col_title <- chosen_collection$title
  cat("You chose:\n")
  cat("Collection ID:", col_id, "\n")
  cat("Collection Title:", col_title, "\n")
  list(col_id = col_id, col_title = col_title)
}

Create a STAC object and retrieve items for the chosen collection.
This function uses the STAC object to query the STAC Search API and returns the items found for the chosen collection.

In [10]:
create_stac_object_and_retrieve_items <- function(stac_endpoint_url, col_id) {
  stac_obj <- stac(stac_endpoint_url)
  items <- stac_obj %>%
    stac_search(collections = col_id) %>%
    get_request()
  items
}
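
A search request generally returns a single page of results, so a large collection may contain more items than one response holds. The sketch below shows one possible way to collect all pages with `items_fetch()` from rstac. `fetch_all_items` and `page_size` are hypothetical names, and it is an assumption that the endpoint exposes the pagination links that `items_fetch()` relies on.

```r
# Hypothetical helper: fetch every item of a collection, following pagination.
# rstac is called via `rstac::`, so defining the function needs no attached packages.
fetch_all_items <- function(stac_endpoint_url, col_id, page_size = 100) {
  first_page <- rstac::get_request(
    rstac::stac_search(
      rstac::stac(stac_endpoint_url),
      collections = col_id,
      limit = page_size
    )
  )
  # items_fetch() requests the remaining pages and merges them with the first one
  rstac::items_fetch(first_page, progress = FALSE)
}

# Example call (requires network access):
# all_items <- fetch_all_items("https://catalog.dive.edito.eu/",
#                              "climate_forecast-air_temperature")
# length(all_items$features)
```

For small collections the result should be the same as a single request; the difference only matters when the number of items exceeds the page size.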

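Print details of the items in the chosen collection.
This function prints the item ID, start and end datetime, and assets of each item, and collects them in a data frame. An asset is recorded as the item's ARCO asset when its URL ends in '.zarr' or '.parquet' and does not contain 'datalab'.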

In [11]:
print_item_details <- function(items, col_id) {
  cat("Number of items:", length(items$features), "\n")
  # Create an empty data frame to store item details
  item_details <- data.frame(
    Collection_ID = character(),
    Item_ID = character(),
    Start_Datetime = character(),
    End_Datetime = character(),
    Geometry = character(),
    Arco_Asset = character(),
    stringsAsFactors = FALSE
  )
  cat("Items in the chosen collection:\n")
  # Loop through each item in the collection, stored in the items$features list
  for (item in items$features) {
    # use the item$id to get the item details
    cat("Item ID: ", item$id, "\n")
    # search the item$properties for start_datetime and end_datetime
    cat("Start Datetime: ", item$properties$start_datetime, "\n")
    cat("End Datetime: ", item$properties$end_datetime, "\n")
    cat("Assets:\n")
    arco_asset <- NA
    # Loop through each asset in the item$assets list
    for (asset_name in names(item$assets)) {
      asset <- item$assets[[asset_name]]
      cat("  Asset Name: ", asset_name, "\n")
      cat("    Href: ", asset$href, "\n")
      cat("    Type: ", asset$type, "\n")
      if (!is.null(asset$title)) {
        cat("    Title: ", asset$title, "\n")
      }
      if (!is.null(asset$description)) {
        cat("    Description: ", asset$description, "\n")
      }
      # Check if the asset is a zarr or parquet file and not from datalab
      if (grepl("\\.zarr$|\\.parquet$", asset$href) && !grepl("datalab", asset$href)) {
        arco_asset <- asset$href
      }
    }
    # Store the item details in the data frame; a missing datetime becomes NA
    # so that every row has the same columns and rbind() does not fail
    start_datetime <- item$properties$start_datetime
    end_datetime <- item$properties$end_datetime
    geometry <- item$geometry
    item_details <- rbind(item_details, data.frame(
      Collection_ID = col_id,
      Item_ID = item$id,
      Start_Datetime = if (is.null(start_datetime)) NA_character_ else start_datetime,
      End_Datetime = if (is.null(end_datetime)) NA_character_ else end_datetime,
      Geometry = toString(geometry),
      Arco_Asset = arco_asset,
      stringsAsFactors = FALSE
    ))
  }
  print(item_details)
}
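
The asset selection rule used above can be tried in isolation on a hand-made item. This is a sketch under stated assumptions: `mock_item` and `find_arco_hrefs` are hypothetical names, and the URLs are invented for illustration. Unlike the loop above, which keeps only the last matching asset, this helper returns every matching URL.

```r
# Hypothetical item with three assets; the hrefs are made up for illustration.
mock_item <- list(
  id = "example-item",
  assets = list(
    thumbnail = list(href = "https://example.org/data/preview.png"),
    zarr = list(href = "https://example.org/data/cube.zarr"),
    notebook = list(href = "https://datalab.example.org/data/cube.zarr")
  )
)

# Hypothetical helper: return the hrefs that end in .zarr or .parquet
# and do not contain "datalab".
find_arco_hrefs <- function(item) {
  hrefs <- vapply(item$assets, function(asset) asset$href, character(1))
  is_arco <- grepl("\\.zarr$|\\.parquet$", hrefs) & !grepl("datalab", hrefs)
  unname(hrefs[is_arco])
}

arco_hrefs <- find_arco_hrefs(mock_item)
print(arco_hrefs)
```

Only the plain `.zarr` URL is selected here: the thumbnail fails the extension test and the third asset is excluded because its URL contains "datalab".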

This is the main block that calls the functions defined above to filter collections, choose a collection, 
retrieve and print collection details, and retrieve and print item details.

In [12]:
print_filtered_collections(filtered_collections)
chosen_index <- choose_collection()

if (validate_chosen_index(chosen_index, filtered_collections)) {
  chosen_collection <- filtered_collections[[chosen_index]]
  collection_details <- retrieve_and_print_collection_details(chosen_collection)
  items <- create_stac_object_and_retrieve_items(stac_endpoint_url, collection_details$col_id)
  print_item_details(items, collection_details$col_id)
} else {
  cat("Invalid choice. Exiting.\n")
}

Filtered collections:
1: Air equivalent potential temperature (Climate Forecast convention) (ID: climate_forecast-air_equivalent_potential_temperature)
2: Air equivalent temperature (Climate Forecast convention) (ID: climate_forecast-air_equivalent_temperature)
3: Air potential temperature (Climate Forecast convention) (ID: climate_forecast-air_potential_temperature)
4: Air pseudo equivalent potential temperature (Climate Forecast convention) (ID: climate_forecast-air_pseudo_equivalent_potential_temperature)
5: Air pseudo equivalent temperature (Climate Forecast convention) (ID: climate_forecast-air_pseudo_equivalent_temperature)
6: Air temperature (Climate Forecast convention) (ID: climate_forecast-air_temperature)
7: Air temperature anomaly (Climate Forecast convention) (ID: climate_forecast-air_temperature_anomaly)
8: Air temperature at cloud top (Climate Forecast convention) (ID: climate_forecast-air_temperature_at_cloud_top)
9: Air temperature at effective cloud top defined by inf