# <center> <span style="color:red">**Accessing and Viewing the Data**</span> </center>

## Importing the Copernicus Marine Toolbox
<hr style="border: solid 2px blue; margin-top: 1.5% ">

In [None]:
# Install into the kernel's environment (%pip is the notebook magic for pip)
%pip install jupyter copernicusmarine ipykernel

In [None]:
import copernicusmarine

In [None]:
# Confirm which version of the toolbox is installed
%pip show copernicusmarine

## Accessing the dataset stored in the group workspace (GWS)
<hr style="border: solid 2px blue; margin-top: 1.5% ">

In [None]:
# List files and folders in the group workspace directory (the leading ! runs the line as a shell command)
!ls /gws

In [None]:
# List the data files in the dataset directory
!ls /gws/nopw/j04/co2clim/datasets/GLORYS/GLOBAL_MULTIYEAR_PHY_001_030/Subset_NorthAtlantic/

### **Data title**
*mlotst* → mixed layer depth  
*0.083deg* → spatial resolution (about 1/12°)  
*100.00W-20.00E and 0.00N-80.00N* → longitude/latitude box (North Atlantic)  
*1993-01-01-2021-06-30* → time span of the dataset
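
The same breakdown can be done in code. The sketch below pulls the fields out of the file name with a regular expression. The pattern is a hypothetical helper written for this one file name only. It is not part of the Copernicus Marine toolbox and may not fit other products.

```python
import re

# File name as listed in the group workspace directory.
name = ("cmems_mod_glo_phy_my_0.083deg_P1D-m_mlotst_"
        "100.00W-20.00E_0.00N-80.00N_1993-01-01-2021-06-30.zarr")

# Hypothetical pattern: assumes a W-E longitude box and an N-N latitude box,
# exactly as in the file name above.
pattern = re.compile(
    r"_(?P<res>[\d.]+)deg_.*?_(?P<var>[a-z]+)_"
    r"(?P<west>[\d.]+)W-(?P<east>[\d.]+)E_"
    r"(?P<south>[\d.]+)N-(?P<north>[\d.]+)N_"
    r"(?P<start>\d{4}-\d{2}-\d{2})-(?P<end>\d{4}-\d{2}-\d{2})\.zarr$"
)
parts = pattern.search(name).groupdict()

# Western longitudes are negative in the dataset coordinates.
lon_min, lon_max = -float(parts["west"]), float(parts["east"])
lat_min, lat_max = float(parts["south"]), float(parts["north"])

print(parts["var"], parts["res"], (lon_min, lon_max), (lat_min, lat_max))
print(parts["start"], "to", parts["end"])
```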

## Manipulating the data
<hr style="border: solid 2px blue; margin-top: 1.5% ">

In [None]:
import xarray as xr

# Open the Zarr store with xarray (lazy: no data values are loaded yet)
ds = xr.open_zarr('/gws/nopw/j04/co2clim/datasets/GLORYS/GLOBAL_MULTIYEAR_PHY_001_030/Subset_NorthAtlantic/cmems_mod_glo_phy_my_0.083deg_P1D-m_mlotst_100.00W-20.00E_0.00N-80.00N_1993-01-01-2021-06-30.zarr')

# Print a summary of your dataset
print(ds) 

In [5]:
# Show a summary of the data variables
ds.data_vars 

Data variables:
    mlotst   (time, latitude, longitude) float64 115GB dask.array<chunksize=(2081, 16, 16), meta=np.ndarray>

### **Breaking down the data summary**

##### **Dimensions**
latitude: 961 points → covers 0° to 80°N (with 0.083° resolution).  
longitude: 1441 points → covers 100°W to 20°E.  
time: 10,408 points → daily values from 1993-01-01 to 2021-06-30.  
*So each data variable is essentially a 3D array: [time, latitude, longitude].*

##### **Coordinates (axes of the dataset)** 
latitude: evenly spaced from 0°N → 80°N.  
longitude: evenly spaced from 100°W → 20°E.  
time: daily steps for ~28.5 years (10,408 days).  
*These are the coordinate values that label each position along a dimension.*
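
As a quick sanity check, the number of points along each dimension can be recomputed from the bounding box and the date range. This is a minimal sketch that assumes an exact grid spacing of 1/12° (the file name rounds it to 0.083) and one time step per day with no gaps.

```python
from datetime import date

step = 1 / 12  # assumed grid spacing in degrees

# Both end points are included, hence the +1.
n_lat = round((80 - 0) / step) + 1
n_lon = round((20 - (-100)) / step) + 1
n_time = (date(2021, 6, 30) - date(1993, 1, 1)).days + 1

print(n_lat, n_lon, n_time)  # 961 1441 10408
```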

##### **Data variables (the actual scientific data inside)** 
mlotst = mixed layer thickness    
Dimensions: (time, latitude, longitude) → 3D grid  
float64: stored as 64-bit floating-point numbers  

Backed by a dask array with chunk size (2081, 16, 16):  
the data is split into blocks of 2081 time steps × 16 latitudes × 16 longitudes.  
*This chunking is what allows parallel / lazy computation without loading the entire dataset.*
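
To get a feel for what this chunking means in practice, the sketch below estimates the number of chunks and the memory footprint of one chunk from the shape and chunk size in the summary. It assumes uniform chunking. Chunks at the edges of the array are smaller, so the per-chunk size is an upper bound.

```python
import math

shape = (10408, 961, 1441)  # (time, latitude, longitude)
chunks = (2081, 16, 16)     # chunk size reported by xarray
itemsize = 8                # bytes per float64 value

# Number of chunks along each dimension (the last chunk may be partial).
chunks_per_dim = [math.ceil(n / c) for n, c in zip(shape, chunks)]
n_chunks = math.prod(chunks_per_dim)

chunk_mb = math.prod(chunks) * itemsize / 1e6
total_gb = math.prod(shape) * itemsize / 1e9

print(chunks_per_dim, n_chunks)  # [6, 61, 91] 33306
print(f"{chunk_mb:.1f} MB per full chunk, {total_gb:.0f} GB in total")
```

Each full chunk is only a few megabytes, while the whole variable is about 115 GB, which is why operations are best left lazy until a reduced result is needed.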

## Analysing the data
<hr style="border: solid 2px blue; margin-top: 1.5% ">

In [6]:
# Take the mean of the dataset over the time dimension
print(ds.mean(dim='time'))

<xarray.Dataset> Size: 11MB
Dimensions:    (latitude: 961, longitude: 1441)
Coordinates:
  * latitude   (latitude) float32 4kB 0.0 0.08333 0.1667 ... 79.83 79.92 80.0
  * longitude  (longitude) float32 6kB -100.0 -99.92 -99.83 ... 19.83 19.92 20.0
Data variables:
    mlotst     (latitude, longitude) float64 11MB dask.array<chunksize=(16, 16), meta=np.ndarray>


In [7]:
# Annual mean MLD for each year (2021 covers only 1 January to 30 June)
print(ds.groupby('time.year').mean(dim='time'))

<xarray.Dataset> Size: 321MB
Dimensions:    (year: 29, latitude: 961, longitude: 1441)
Coordinates:
  * latitude   (latitude) float32 4kB 0.0 0.08333 0.1667 ... 79.83 79.92 80.0
  * longitude  (longitude) float32 6kB -100.0 -99.92 -99.83 ... 19.83 19.92 20.0
  * year       (year) int64 232B 1993 1994 1995 1996 ... 2018 2019 2020 2021
Data variables:
    mlotst     (year, latitude, longitude) float64 321MB dask.array<chunksize=(1, 16, 16), meta=np.ndarray>
Attributes:
    Conventions:               CF-1.4
    comment:                   CMEMS product
    copernicusmarine_version:  2.1.1
    history:                   2023/06/01 16:20:05 MERCATOR OCEAN Netcdf crea...
    institution:               MERCATOR OCEAN
    references:                http://www.mercator-ocean.fr
    source:                    MERCATOR GLORYS12V1
    title:                     daily mean fields from Global Ocean Physics An...
