# <center> <span style="color:red">**Accessing and Viewing the Data**</span> </center>

## Importing the Copernicus Marine toolbox
<hr style="border: solid 2px blue; margin-top: 1.5% ">

In [None]:
# The %pip magic installs into the environment of the running kernel
%pip install jupyter copernicusmarine ipykernel

In [None]:
import copernicusmarine

In [None]:
# Confirm the installed version and location of the toolbox
%pip show copernicusmarine

## Accessing the dataset stored in the group workspace (GWS)
<hr style="border: solid 2px blue; margin-top: 1.5% ">

In [None]:
# List the files and folders in the group workspace (GWS) directory; the leading "!" runs the line as a shell command
!ls /gws

In [None]:
# List the contents of the GLORYS product directory
!ls /gws/nopw/j04/co2clim/datasets/GLORYS/GLOBAL_MULTIYEAR_PHY_001_030/

In [None]:
# List the contents of the North Atlantic subset directory
!ls /gws/nopw/j04/co2clim/datasets/GLORYS/GLOBAL_MULTIYEAR_PHY_001_030/Subset_NorthAtlantic/

In [None]:
# List the contents of the regridded (regrid1x1) North Atlantic subset directory
!ls /gws/nopw/j04/co2clim/datasets/GLORYS/GLOBAL_MULTIYEAR_PHY_001_030/Subset_NorthAtlantic_regrid1x1/

In [None]:
# List the contents of the Zarr store that holds the mixed layer depth data
!ls /gws/nopw/j04/co2clim/datasets/GLORYS/GLOBAL_MULTIYEAR_PHY_001_030/Subset_NorthAtlantic_regrid1x1/cmems_mod_glo_phy_my_0.083deg_P1D-m_mlotst_100.00W-20.00E_0.00N-80.00N_1993-01-01-2021-06-30.zarr

### **Data title**
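*mlotst* → mixed layer depth  
*0.083deg* → spatial resolution of the source product (about 1/12°); this store sits in `Subset_NorthAtlantic_regrid1x1`, so the grid it actually holds is coarser than the name suggests  
*P1D-m* → daily-mean fields (one value per day)  
*100.00W-20.00E and 0.00N-80.00N* → longitude/latitude box (North Atlantic)  
*1993-01-01-2021-06-30* → time span of the dataset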
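
The same breakdown can be done programmatically. The sketch below splits the store name with a regular expression; the pattern and the helper name `describe_store_name` are assumptions inferred from this single file name, not an official Copernicus naming specification.

```python
import re

# Name of the Zarr store listed above
STORE_NAME = (
    "cmems_mod_glo_phy_my_0.083deg_P1D-m_mlotst_"
    "100.00W-20.00E_0.00N-80.00N_1993-01-01-2021-06-30.zarr"
)

# Pattern inferred from this one name (an assumption, not a formal spec)
PATTERN = re.compile(
    r"_(?P<resolution>[\d.]+)deg"
    r"_(?P<frequency>P\w+-\w)"
    r"_(?P<variable>[a-z0-9]+)"
    r"_(?P<lon_min>[\d.]+[WE])-(?P<lon_max>[\d.]+[WE])"
    r"_(?P<lat_min>[\d.]+[SN])-(?P<lat_max>[\d.]+[SN])"
    r"_(?P<start>\d{4}-\d{2}-\d{2})-(?P<end>\d{4}-\d{2}-\d{2})\.zarr$"
)


def describe_store_name(name):
    """Return the named components of a store name as a dict of strings."""
    match = PATTERN.search(name)
    if match is None:
        raise ValueError(f"Unrecognised store name: {name}")
    return match.groupdict()


parts = describe_store_name(STORE_NAME)
for key, value in parts.items():
    print(f"{key:>10}: {value}")
```

A name that does not follow this layout raises a `ValueError` instead of returning partial results.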

## Manipulating the data
<hr style="border: solid 2px blue; margin-top: 1.5% ">

In [None]:
import xarray as xr

# Open with xarray
ds = xr.open_zarr('/gws/nopw/j04/co2clim/datasets/GLORYS/GLOBAL_MULTIYEAR_PHY_001_030/Subset_NorthAtlantic_regrid1x1/cmems_mod_glo_phy_my_0.083deg_P1D-m_mlotst_100.00W-20.00E_0.00N-80.00N_1993-01-01-2021-06-30.zarr')

# Print a summary of your dataset
print(ds) 

In [None]:
# List the data variables in the dataset
ds.data_vars

### **Breaking down the data summary**

#### Dimensions
latitude: 81 points → 0° to 80°N  
longitude: 121 points → 100°W to 20°E  
time: 10,408 points → daily values from 1993‑01‑01 to 2021‑06‑30  
So each data variable is a 3D array: [time, latitude, longitude].  

#### Coordinates (axes of the dataset)
latitude: evenly spaced from 0°N → 80°N  
longitude: evenly spaced from 100°W → 20°E  
time: daily steps for ~28.5 years (10,408 days)  
*These are the actual coordinate values defining where and when the data applies.*  

#### Data variable
mlotst = ocean mixed layer thickness, defined by a density (sigma-theta) criterion  
Dimensions: (time, latitude, longitude) → 3D grid  
Stored as float32 (32-bit floating-point numbers)  
Total size if fully loaded: about 408 MB (10,408 × 81 × 121 values × 4 bytes)  
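
As a quick sanity check, the number of time steps and the in-memory size can be reproduced from the figures in the summary alone. This is a minimal sketch using only the standard library; the constant names are illustrative.

```python
from datetime import date

# Grid size and data type taken from the dataset summary
N_LAT, N_LON = 81, 121
BYTES_PER_FLOAT32 = 4

# Daily steps from 1993-01-01 to 2021-06-30, both ends included
n_days = (date(2021, 6, 30) - date(1993, 1, 1)).days + 1

# Total number of values and their size in megabytes (1 MB = 1e6 bytes)
n_values = n_days * N_LAT * N_LON
size_mb = n_values * BYTES_PER_FLOAT32 / 1e6

print(f"time steps: {n_days}")
print(f"size if fully loaded: {size_mb:.0f} MB")
```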

#### Storage & performance
Backed by a Dask array with chunks: (31, 81, 90)  
→ Each chunk contains 31 timesteps × 81 latitudes × 90 longitudes  
*Dask allows lazy computation, so the dataset isn’t fully loaded into memory at once.*
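
To get a feel for what this chunking means in practice, the sketch below counts the chunks and estimates the size of one full chunk. It assumes the chunk shape reported in the summary applies uniformly across the store, with smaller chunks only at the edges.

```python
import math

# Array shape and chunk shape taken from the dataset summary
shape = {"time": 10408, "latitude": 81, "longitude": 121}
chunks = {"time": 31, "latitude": 81, "longitude": 90}
BYTES_PER_FLOAT32 = 4

# Number of chunks along each dimension (the last one may be partial)
chunks_per_dim = {dim: math.ceil(shape[dim] / chunks[dim]) for dim in shape}
n_chunks = math.prod(chunks_per_dim.values())

# Size of one full chunk in megabytes (1 MB = 1e6 bytes)
full_chunk_mb = math.prod(chunks.values()) * BYTES_PER_FLOAT32 / 1e6

print(chunks_per_dim)
print(f"total chunks: {n_chunks}, full chunk size: {full_chunk_mb:.2f} MB")
```

With these numbers, longitude is split into two chunks (90 + 31 points), so an operation along longitude touches both of them.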

## Analysing the data
<hr style="border: solid 2px blue; margin-top: 1.5% ">

In [None]:
# Take the mean of the dataset over the time dimension
print(ds.mean(dim='time'))

In [None]:
# Annual-mean mixed layer depth (MLD) for each year; note that 2021 only covers January to June
print(ds.groupby('time.year').mean(dim='time'))
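
Because the record stops on 2021-06-30, the 2021 "annual" mean only averages January to June. The sketch below illustrates one way to keep complete years only. It runs on a small synthetic dataset (random values on a 3 × 3 grid, with the same variable and coordinate names as in the summary) so that it works without access to the GWS; the toy data and the 365-day threshold are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real store: one full year plus a partial year
time = pd.date_range("2020-01-01", "2021-06-30", freq="D")
lat = np.arange(0.0, 3.0)
lon = np.arange(-2.0, 1.0)
rng = np.random.default_rng(0)
values = rng.uniform(10, 200, size=(time.size, lat.size, lon.size))
toy = xr.Dataset(
    {"mlotst": (("time", "latitude", "longitude"), values.astype("float32"))},
    coords={"time": time, "latitude": lat, "longitude": lon},
)

mld = toy["mlotst"]

# Mean over the whole record, and mean per calendar year
time_mean = mld.mean(dim="time")
annual_mean = mld.groupby("time.year").mean(dim="time")

# Count the daily steps in each year at one grid point
days_per_year = mld.isel(latitude=0, longitude=0, drop=True).groupby("time.year").count()

# Keep only the years that have at least 365 daily values
full_years = [
    int(year)
    for year, n_days in zip(days_per_year["year"].values, days_per_year.values)
    if n_days >= 365
]
complete_years = annual_mean.sel(year=full_years)

print(days_per_year.to_series())
print(complete_years["year"].values)
```

On the real dataset the same steps would apply to `ds['mlotst']`, where the first and last years can be checked in the same way.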