# STAC API Data Extraction in R

This notebook demonstrates how to interact with a STAC (SpatioTemporal Asset Catalog) API to fetch and filter JSON data using R. We will explore temperature-related catalogs, collections, and items, then filter for specific items based on time ranges and asset types.

## Step 1: Load Required Libraries

We first load the R packages used for HTTP requests (`httr`), JSON parsing (`jsonlite`), string interpolation (`glue`), data-frame manipulation (`dplyr`, `tidyr`) and date handling (`lubridate`).

```r
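library(httr)      # HTTP requests: GET(), content(), stop_for_status()
library(jsonlite)  # JSON parsing: fromJSON()
library(glue)      # string interpolation for building URLs
library(dplyr)     # data-frame manipulation
library(lubridate) # date-time handling
library(tidyr)     # reshaping data frames
```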

## Step 2: Define Base STAC Endpoint

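Here we define the URL of the root `catalog.json` of the STAC catalog we are querying. We also derive the catalog root, the folder that contains `catalog.json`, by dropping the final path component with `dirname()`; sub-catalog URLs are later built relative to this root.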

```r
# URL of the root STAC catalog
stac_endpoint_url <- "https://s3.waw3-1.cloudferro.com/emodnet/bio_oracle/stac/catalog.json"
# Catalog root: the URL without its final "catalog.json" component
stac_root <- dirname(stac_endpoint_url)
```

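`dirname()` is meant for file paths, but since URL paths also use `/` as a separator it returns everything before the last `/`. The sketch below, assuming the endpoint URL from the step above, compares it with an explicit regular expression that makes the "drop the last component" intent visible, then shows how a child document could be addressed relative to the root (`oceantemperature` is used only as an example folder name).

```r
# URL of the root catalog (same value as in the step above)
stac_endpoint_url <- "https://s3.waw3-1.cloudferro.com/emodnet/bio_oracle/stac/catalog.json"

# dirname() treats the URL like a file path and drops the last component
root_via_dirname <- dirname(stac_endpoint_url)

# Explicit alternative: remove the final "/" and everything after it
root_via_regex <- sub("/[^/]*$", "", stac_endpoint_url)

print(root_via_dirname)
print(identical(root_via_dirname, root_via_regex))

# Child documents can then be addressed relative to the root
file.path(root_via_regex, "oceantemperature", "catalog.json")
```

Both forms yield the same root here; the regex version does not depend on file-path semantics, which some readers may find easier to reason about for URLs.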
## Step 3: Perform API Request to Get Catalog Data

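We send a GET request for the root catalog, stop if the server returns an HTTP error, and parse the JSON body into an R list.

```r
# Request the root catalog
response <- GET(stac_endpoint_url)
# Raise an R error for 4xx/5xx responses instead of trying to parse an error page
stop_for_status(response)
# Parse the body; JSON arrays of objects (such as "links") become data frames
json_data <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
```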

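`fromJSON()` simplifies JSON arrays of objects into data frames, so the catalog's `links` array becomes a data frame with one row per link and columns such as `rel`, `href` and `title`. To show that shape without a network request, this sketch parses a small hand-written catalog; its content is hypothetical and not taken from the Bio-Oracle catalog.

```r
library(jsonlite)

# Hypothetical, trimmed-down STAC catalog used only for illustration
sample_catalog_text <- '{
  "type": "Catalog",
  "id": "example",
  "stac_version": "1.0.0",
  "description": "Illustrative catalog",
  "links": [
    {"rel": "root",  "href": "./catalog.json", "type": "application/json", "title": "Example root"},
    {"rel": "child", "href": "./oceantemperature/catalog.json", "type": "application/json", "title": "oceantemperature"},
    {"rel": "child", "href": "./salinity/catalog.json", "type": "application/json", "title": "salinity"}
  ]
}'

sample_catalog <- fromJSON(sample_catalog_text)

# The links array is simplified into a data frame: one row per link
str(sample_catalog$links)

# Child links point to sub-catalogs
sample_catalog$links[sample_catalog$links$rel == "child", c("title", "href")]
```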

## Step 4: Filter Catalogs Based on the String 'temperature'

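We keep the links of the root catalog whose `title` contains the string `temperature` (case-insensitively) and print those titles, narrowing the search to temperature-related sub-catalogs. Each sub-catalog URL is then built from its title, which relies on every title matching the name of the folder that holds that sub-catalog's `catalog.json`.

```r
# Filter the links of the root STAC catalog by a search string
catalog_selector <- "temperature"
# Match one column at a time: grep() on the whole data frame returns column, not row, positions
selected_catalogs <- json_data$links[grepl(catalog_selector, json_data$links$href, ignore.case = TRUE), ]
selected_catalogs_titles <- json_data$links[grepl(catalog_selector, json_data$links$title, ignore.case = TRUE), ]
print(selected_catalogs_titles$title)

# Generate sub-catalog links (relies on each title matching its folder name)
catalog_links <- glue("{stac_root}/{selected_catalogs_titles$title}/catalog.json")
print(catalog_links)
```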

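Two details of this filtering are easy to get wrong. `grep()` applied to a whole data frame coerces it column by column, so the positions it returns refer to columns rather than rows. And building URLs from `title` relies on titles matching folder names, whereas a child link's `href` already holds the relative path. A self-contained sketch on a hypothetical links table (the table, `demo_root` and the `demo_*` names are illustrative only):

```r
# Hypothetical links table, shaped like json_data$links after fromJSON()
demo_links <- data.frame(
  rel   = c("root", "child", "child"),
  href  = c("./catalog.json", "./oceantemperature/catalog.json", "./salinity/catalog.json"),
  title = c("Example root", "oceantemperature", "salinity"),
  stringsAsFactors = FALSE
)
demo_root <- "https://example.org/stac"  # placeholder root URL

# grep() on the whole data frame returns column positions (2 and 3 here), not rows
print(grep("temperature", demo_links))

# Filter rows on a single column and keep only child links
is_match <- demo_links$rel == "child" &
  grepl("temperature", demo_links$title, ignore.case = TRUE)
demo_selected <- demo_links[is_match, ]

# Resolve the relative hrefs against the root instead of relying on the title
demo_catalog_links <- paste0(demo_root, "/", sub("^\\./", "", demo_selected$href))
print(demo_catalog_links)
```

Using `href` assumes child links are relative to the root catalog, as they are in this sample; absolute hrefs would need to be used unchanged.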

## Step 5: Loop Through and Print Catalogs

We loop through the catalog links and print each one, which gives an overview of the selected catalogs.

```r
# Loop through and print each catalog link
for (i in seq_along(catalog_links)) {
  print(catalog_links[i])
}
```

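Because `catalog_links` is already a character vector, the loop can also be replaced by one vectorised call: `writeLines()` prints one URL per line without the `[1]` prefixes that `print()` adds. A small sketch using placeholder URLs in place of the real links:

```r
# Placeholder URLs standing in for catalog_links
placeholder_links <- c(
  "https://example.org/stac/catalog_a/catalog.json",
  "https://example.org/stac/catalog_b/catalog.json"
)

# One call, one URL per line
writeLines(placeholder_links)

# Equivalent loop that iterates over values rather than indices
for (link in placeholder_links) {
  print(link)
}
```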

## Step 6: Fetch and Print Each Internal Catalog's JSON Data

We fetch and print the JSON of each selected catalog, stopping on HTTP errors. The printed `links` show which collections each catalog contains, which we explore in the next step.

```r
# Fetch, parse and print each selected sub-catalog
for (i in seq_along(catalog_links)) {
  internal_catalog_response <- GET(catalog_links[i])
  stop_for_status(internal_catalog_response)
  internal_catalog_json <- fromJSON(content(internal_catalog_response, as = "text", encoding = "UTF-8"))
  print(internal_catalog_json)
}
```

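The request, status check and parse are repeated for every document in this notebook. A minimal sketch of a reusable helper follows; `read_stac_json()` is a name introduced here for illustration, not part of `httr` or `jsonlite`. The usage lines perform network requests, so they are left as comments.

```r
# Hypothetical helper: fetch a STAC JSON document and fail loudly on HTTP errors
read_stac_json <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)  # raise an R error for 4xx/5xx responses
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Example usage (network access required):
# internal_catalogs <- lapply(catalog_links, read_stac_json)
# names(internal_catalogs) <- catalog_links
# str(internal_catalogs[[1]], max.level = 1)  # compact overview instead of print()
```

Collecting the parsed catalogs in a named list keeps them available for later steps instead of only printing them.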

## Step 7: Fetch Collections from Catalogs

Next, we open the `oceantemperature` catalog, list the collections it links to, and for the `thetao_mean` collection collect the absolute URL of every item into the data frame `selected_collection_items`.

```r
collection_selector <- "thetao_mean"
selected_collections <- list()
selected_collection_items <- data.frame(item_link = character(), stringsAsFactors = FALSE)

for (i in seq_along(catalog_links)) {
  # Only descend into the ocean temperature catalog
  if (grepl("oceantemperature", catalog_links[i], ignore.case = TRUE)) {
    cat_link <- catalog_links[i]
    cat_response <- GET(cat_link)
    stop_for_status(cat_response)
    cat_json <- fromJSON(content(cat_response, as = "text", encoding = "UTF-8"))
    links <- cat_json$links$href
    for (j in seq_along(links)) {
      # Keep links that point to a collection.json (index with j, the inner loop variable)
      if (grepl("collection\\.json", links[j], ignore.case = TRUE)) {
        # "./thetao_mean/collection.json" -> "thetao_mean"
        collection <- gsub("^\\./", "", dirname(links[j]))
        selected_collections <- append(selected_collections, collection)
        print(collection)

        if (grepl(collection_selector, collection, ignore.case = TRUE)) {
          collection_link <- glue("{dirname(cat_link)}/{collection}/collection.json")
          collection_json <- fromJSON(content(GET(collection_link), as = "text", encoding = "UTF-8"))
          collection_links <- collection_json$links

          # Rows of the collection's links that point to items
          matched_rows <- collection_links[grepl("item", collection_links$rel, ignore.case = TRUE), ]
          if (nrow(matched_rows) > 0) {
            for (k in seq_len(nrow(matched_rows))) {
              # Resolve each relative item href against the collection folder
              item_href <- gsub("^\\./", "", matched_rows[k, "href"])
              item_link <- glue("{dirname(collection_link)}/{item_href}")
              selected_collection_items <- rbind(
                selected_collection_items,
                data.frame(item_link = as.character(item_link), stringsAsFactors = FALSE)
              )
            }
          } else {
            print(glue("No items found in collection: {collection}"))
          }
        }
      }
    }
  }
}
```

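The nested loop above grows `selected_collection_items` with `rbind()` one row at a time. Since a collection's `links` is already a data frame, the item links can be selected and resolved in one vectorised step. This sketch assumes, as the loop does, that item `href`s are relative to the collection's folder; `collection_item_links()`, the sample collection and its URLs are hypothetical. It matches `rel` exactly against `"item"` rather than with a substring search.

```r
# Hypothetical helper: absolute item URLs from a parsed collection.json
collection_item_links <- function(collection_json, collection_url) {
  links <- collection_json$links
  if (is.null(links) || nrow(links) == 0) return(character())
  item_hrefs <- links$href[links$rel %in% "item"]  # exact match on rel
  if (length(item_hrefs) == 0) return(character())  # avoid paste0() recycling to "folder/"
  # Drop a leading "./" and resolve against the collection's folder
  paste0(dirname(collection_url), "/", sub("^\\./", "", item_hrefs))
}

# Hypothetical parsed collection, shaped like fromJSON() output
demo_collection_url <- "https://example.org/stac/oceantemperature/thetao_mean/collection.json"
demo_collection <- list(links = data.frame(
  rel  = c("root", "parent", "item", "item"),
  href = c("../../catalog.json", "../catalog.json", "./item_a/item_a.json", "./item_b/item_b.json"),
  stringsAsFactors = FALSE
))

item_urls <- collection_item_links(demo_collection, demo_collection_url)
demo_items <- data.frame(item_link = item_urls, stringsAsFactors = FALSE)
print(demo_items)
```

Building the data frame once from a vector avoids the repeated copying that `rbind()` inside a loop incurs.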