# Monitoring Snow and Ice in Iceland
## Testing the New EOPF Zarr Data Format
by Robin Gummels, Humam Hikmat, Justin Krumböhmer, and Kian Jay Lenert

## Table of Contents
- [Introduction](#introduction)
- [Zarr Optimize Algorithm](#Zarr_Optimize_Algorithm)
- [Sentinel Native Algorithm](#sentinel_native_algorithm)
- [Algorithm Comparison](#algorithm_comparison)
- [Temporal Glacier Analysis](#temporal_glacier_analysis)
- [Conclusion](#conclusion)

## Introduction

Iceland’s glaciers are shrinking, with seasonal melt–freeze cycles superimposed on a persistent long‑term decline (see: [Icelandic glacier loss - AntarcticGlaciers.org](https://www.antarcticglaciers.org/glaciers-and-climate/glacier-recession/icelandic-glacier-loss/) and [A Revised Snow Cover Algorithm to Improve Discrimination between Snow and Clouds: A Case Study in Gran Paradiso National Park](https://www.mdpi.com/2072-4292/13/10/1957)). To monitor these changes at scale, we need methods that minimize data movement while preserving scientific fidelity. 

This makes glacier monitoring a natural use case for the new EOPF Zarr format. To find out what advantages and challenges it holds compared to the SAFE format, we carried out an analysis of Iceland's glacial areas twice: first using Sentinel-2 data in EOPF Zarr format, exploiting its chunked, cloud‑native nature, and then using satellite imagery in SAFE format, processing an entire Sentinel-2 tile.
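To illustrate why chunked access can reduce data movement, the following minimal sketch counts how many chunks of a tile a small window around a point would overlap. It is not the project's code: the chunk edge length `CHUNK_PIXELS`, the window size, and the helper `chunks_touched` are assumptions chosen for illustration. Only the tile size of 10980 × 10980 pixels at 10 m resolution is a property of Sentinel-2 tiles.

```python
import math

TILE_PIXELS = 10980   # Sentinel-2 tile edge length at 10 m resolution
CHUNK_PIXELS = 1830   # hypothetical chunk edge length, for illustration only


def chunks_touched(row, col, half_window, chunk=CHUNK_PIXELS, tile=TILE_PIXELS):
    """Return the (row, col) indices of all chunks that a square window overlaps."""
    # Clip the window to the tile so border pixels do not produce invalid indices.
    r0 = max(row - half_window, 0)
    r1 = min(row + half_window, tile - 1)
    c0 = max(col - half_window, 0)
    c1 = min(col + half_window, tile - 1)
    return [
        (r, c)
        for r in range(r0 // chunk, r1 // chunk + 1)
        for c in range(c0 // chunk, c1 // chunk + 1)
    ]


total_chunks = math.ceil(TILE_PIXELS / CHUNK_PIXELS) ** 2
# A half-window of 500 pixels corresponds to 5 km at 10 m resolution (assumed size).
touched = chunks_touched(row=5000, col=5000, half_window=500)
print(f"{len(touched)} of {total_chunks} chunks would be read")
```

Under these assumed numbers, only a small fraction of the tile's chunks overlaps the window, whereas a tile-based workflow reads the full tile regardless of the area of interest.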

Finally, we compare the benchmark results of both algorithms and give our assessment of the Zarr format.

## Zarr Optimize Algorithm

### Seeds

For this algorithm, we predefined seeds using [Iceland's land cover data](https://www.natt.is/en/resources/open-data/data-download) to locate glacial areas. Placing a centroid in each glacial area yields a dataset of 203 points. Placing one point at the center of each region of connected glaciers, or of glaciers located within 5 km of each other, yields a dataset of 21 points.
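One way such a grouping could be done is sketched below. This is an assumption, not necessarily how the 21 seeds were derived: it simplifies glaciers to points in a projected, metre-based coordinate system and merges all points that are linked by distances of at most 5 km (single-linkage grouping). The function names and the demo coordinates are hypothetical.

```python
import math


def group_points(points, max_dist):
    """Single-linkage grouping: points linked by distances <= max_dist share a group."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i, point in enumerate(points):
        groups.setdefault(find(i), []).append(point)
    return list(groups.values())


def group_centers(points, max_dist=5000.0):
    """Return the mean coordinate of each group, usable as one seed per group."""
    centers = []
    for group in group_points(points, max_dist):
        mean_x = sum(x for x, _ in group) / len(group)
        mean_y = sum(y for _, y in group) / len(group)
        centers.append((mean_x, mean_y))
    return centers


# Hypothetical coordinates in metres, not real glacier data.
demo = [(0.0, 0.0), (3000.0, 0.0), (6000.0, 0.0), (50000.0, 50000.0)]
print(group_centers(demo))
```

In the demo, the first three points form a chain with 3 km steps and collapse into one seed, while the distant fourth point keeps its own seed. Real glacier outlines are polygons, so a polygon-based distance would follow the description above more closely than this point-based simplification.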

The seeds are distributed across Iceland as follows:

In [2]:
import folium
import geopandas as gpd


def load_seeds(path):
    """Read a seed file and return it in WGS84 (EPSG:4326) for display in folium."""
    gdf = gpd.read_file(path)
    # Assume WGS84 if the file carries no CRS; otherwise reproject to WGS84.
    if gdf.crs is None:
        return gdf.set_crs("EPSG:4326")
    return gdf.to_crs("EPSG:4326")


seeds_file_21 = "./data/Iceland_Seeds_21.geojson"
seeds_file_203 = "./data/Iceland_Seeds_203.geojson"
df_seeds_21 = load_seeds(seeds_file_21)
df_seeds_203 = load_seeds(seeds_file_203)

m = folium.Map(zoom_start=10)

folium.GeoJson(
    df_seeds_203,
    marker=folium.CircleMarker(
        radius=1,
        color="red",
        fill=True,
        fill_color="red",
        fill_opacity=0.8
    )
).add_to(m)

folium.GeoJson(
    df_seeds_21,
    marker=folium.CircleMarker(
        radius=1,
        color="blue",
        fill=True,
        fill_color="blue",
        fill_opacity=0.8
    )
).add_to(m)

# Zoom the map to the extent of all 203 seeds; folium expects [lat, lon] pairs.
minx, miny, maxx, maxy = df_seeds_203.total_bounds
bounds = [[miny, minx], [maxy, maxx]]
m.fit_bounds(bounds)
m

idea of this algorithm

In [1]:
# insert code here

## Sentinel Native Algorithm

code explanation

In [2]:
# insert code here

## Algorithm Comparison

Now, we can compare the benchmarks of the two algorithms. We benchmarked CPU, RAM, iterations, runtime, computation time, and storage use.
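As a rough illustration of how some of these metrics can be collected, the sketch below wraps a function call and records wall-clock runtime, CPU time, and peak memory using only the standard library. It is a minimal example under stated assumptions, not the benchmarking code used in this project: the `benchmark` helper and `dummy_workload` are hypothetical, `tracemalloc` only sees allocations that are reported to Python's memory tracer, and iterations and storage use are not covered.

```python
import time
import tracemalloc


def benchmark(func, *args, **kwargs):
    """Run func once and return its result together with basic resource metrics."""
    tracemalloc.start()
    wall_start = time.perf_counter()   # wall-clock runtime
    cpu_start = time.process_time()    # CPU time of this process
    result = func(*args, **kwargs)
    cpu_s = time.process_time() - cpu_start
    wall_s = time.perf_counter() - wall_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    metrics = {
        "runtime_s": wall_s,
        "cpu_time_s": cpu_s,
        "peak_traced_mem_mb": peak_bytes / 1e6,
    }
    return result, metrics


def dummy_workload(n):
    """Stand-in for an analysis run; replace with the algorithm under test."""
    return sum(i * i for i in range(n))


result, metrics = benchmark(dummy_workload, 100_000)
print(metrics)
```

A large gap between `runtime_s` and `cpu_time_s` would suggest that a run spends most of its time waiting, for example on network or disk reads, rather than computing.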

### Benchmark Zarr Optimize Algorithm

In [None]:
# code

### Benchmark Sentinel Native Algorithm

In [None]:
# code

So overall, the (probably the zarr) algorithm (clearly) has a better performance. (detailed comparison, pros and cons)

## Temporal Glacier Analysis

do/can we still actually do this?

In [None]:
# code

results

## Conclusion

All in all, we conclude that the EOPF Zarr format is preferable to the SAFE format for an analysis like this. [insert more text]