![header](../figures/logos_partenaires._cmems_se.jpg)

# Read and download regional data

<div style="text-align: right"><i> 2023-04-27 READ_AND_DOWNLOAD_REGIONAL_DATA </i></div>

***
**Authors:**  CLS & Datlas <br>
**Copyright:** 2023 CLS & Datlas <br>
**License:** MIT

<div class="alert alert-block alert-success">
<h1><center>Read and download regional data</center></h1>
<h5> This notebook illustrates how to read the global data online, select the regional data of interest and save them locally in a NetCDF file. The example given here is for the Gulf Stream region (see the DC_2020 and DC_2021 data challenges). </h5>
     
</div>

***
**General Note 1**: Execute each cell through the <button class="btn btn-default btn-xs"><i class="icon-play fa fa-play"></i></button> button from the top MENU (or keyboard shortcut `Shift` + `Enter`).<br>
<br>
**General Note 2**: If, for any reason, the kernel is not working anymore, in the top MENU, click on the <button class="btn btn-default btn-xs"><i class="fa fa-repeat icon-repeat"></i></button> button. Then, in the top MENU, click on "Cell" and select "Run All Above Selected Cell".<br>
***


<div class="alert alert-danger" role="alert">

<h3>Learning outcomes</h3>

At the end of this notebook you will know how to:
<ul>
  <li>read the global data online,</li>
  <li>select the regional data of interest,</li>
  <li>save them locally in a NetCDF file.</li>
</ul>
    

</div>

In [1]:
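import os
from glob import glob

import numpy as np
import xarray as xr  # used below as `xr`; imported explicitly rather than relying on the wildcard imports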

In [2]:
import sys
sys.path.append('..')
from src.mod_plot import *
from src.mod_stat import *
from src.mod_read import *
from src.mod_spectral import *

In [3]:
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

<div class="alert alert-info" role="alert">

<h2>Experiment setup</h2>

</div>

In [4]:
# Example for Gulf Stream region
lon_min = 295                                          # domain min longitude
lon_max = 305                                          # domain max longitude
lat_min = 33.                                          # domain min latitude
lat_max = 43.                                          # domain max latitude

# Time slice if you are interested in a shorter evaluation period
time_min = '2019-01-01'                                # time min for analysis
time_max = '2019-12-31'                                # time max for analysis

# Saving directory and outputs
saving_dir = '../data/'                                       # saving directory path
name_maps = 'maps/DUACS_GS.nc'                                # regional maps file name
name_alg = 'independent_alongtrack/indep_nadir_GS.nc'         # regional independent nadir file name
name_drift = 'independent_drifters/indep_drifters_GS.nc'      # regional independent drifters file name
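
The output paths are later obtained by joining `saving_dir` with each file name, so the sub-directories have to exist and be writable before `to_netcdf` is called. The sketch below is one possible way to prepare them. It writes into a temporary directory (a stand-in for `saving_dir`, used here only so the example does not touch real data), and the `names` and `out_paths` dictionaries are illustrative helpers, not part of the notebook.

```python
import os
import tempfile

# Hypothetical stand-in for saving_dir, so the sketch does not touch real data
saving_dir = tempfile.mkdtemp()

# File names taken from the experiment setup cell
names = {
    'maps': 'maps/DUACS_GS.nc',
    'alg': 'independent_alongtrack/indep_nadir_GS.nc',
    'drift': 'independent_drifters/indep_drifters_GS.nc',
}

out_paths = {}
for key, name in names.items():
    path = os.path.join(saving_dir, name)
    # Create the parent directory if it is missing; no error if it already exists
    os.makedirs(os.path.dirname(path), exist_ok=True)
    out_paths[key] = path

for key, path in out_paths.items():
    print(key, '->', path)
```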

<div class="alert alert-info" role="alert">

<h2>Read global data online, select the region and save the file locally</h2>

</div>

## Data for reconstruction: Sea Surface Height from available nadirs (except Saral/AltiKa) 

## Data for evaluation: Sea Surface Height from Saral/AltiKa

In [5]:
%%time
path_catalog = "https://ige-meom-opendap.univ-grenoble-alpes.fr/thredds/catalog/meomopendap/extract/MEOM/OCEAN_DATA_CHALLENGES/2023a_SSH_mapping_OSE/independent_alongtrack/alg/2019/catalog.html" 
path_data =    "https://ige-meom-opendap.univ-grenoble-alpes.fr/thredds/fileServer/meomopendap/extract/MEOM/OCEAN_DATA_CHALLENGES/2023a_SSH_mapping_OSE/independent_alongtrack/alg/2019/"

     
list_of_files = retrieve_list_of_files_from_url(path_catalog, path_data)

xr.set_options(file_cache_maxsize=12)
ds_alg = xr.open_mfdataset(sorted(list_of_files), chunks={'time': 47151}, concat_dim='time', combine='nested')

# Keep only the evaluation period, then sort chronologically
ds_alg = ds_alg.where((ds_alg.time >= np.datetime64(time_min)) & (ds_alg.time <= np.datetime64(time_max)), drop=True)
ds_alg = ds_alg.sortby('time')

# Boolean mask of the along-track points strictly inside the regional box
lon = np.array(ds_alg.longitude.values)
lat = np.array(ds_alg.latitude.values)
ind_sel_time = (lon > lon_min) & (lon < lon_max) & (lat > lat_min) & (lat < lat_max)
ds_alg = ds_alg.isel({'time': ind_sel_time})
ds_alg.to_netcdf(saving_dir + name_alg)


CPU times: user 5min 3s, sys: 38.9 s, total: 5min 42s
Wall time: 10min 5s
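
The regional selection above relies on a boolean mask over the `time` dimension: a point is kept only if its longitude and latitude both fall strictly inside the box. The sketch below reproduces that logic on a handful of made-up positions (the coordinate values are illustrative, not real along-track data) and checks that combining the conditions with `&` gives the same mask as multiplying the boolean arrays.

```python
import numpy as np

# Regional box from the experiment setup
lon_min, lon_max = 295, 305
lat_min, lat_max = 33., 43.

# Synthetic along-track positions (illustrative values, not real data)
lon = np.array([290.0, 296.5, 300.0, 304.9, 310.0])
lat = np.array([35.0, 32.0, 40.0, 42.5, 38.0])

# Element-wise logical AND of the four conditions
inside = (lon > lon_min) & (lon < lon_max) & (lat > lat_min) & (lat < lat_max)

# Multiplying boolean arrays yields the same mask
inside_product = (lon < lon_max) * (lon > lon_min) * (lat < lat_max) * (lat > lat_min)

print(inside)
print(np.flatnonzero(inside))  # indices of the points that are kept
```

A mask of this kind can be passed directly to `isel` along the dimension it was built on.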


## Data for evaluation: Drifters

In [5]:
%%time
path_catalog = "https://ige-meom-opendap.univ-grenoble-alpes.fr/thredds/catalog/meomopendap/extract/MEOM/OCEAN_DATA_CHALLENGES/2023a_SSH_mapping_OSE/independent_drifters/catalog.html" 
path_data =    "https://ige-meom-opendap.univ-grenoble-alpes.fr/thredds/fileServer/meomopendap/extract/MEOM/OCEAN_DATA_CHALLENGES/2023a_SSH_mapping_OSE/independent_drifters/"

     
list_of_files = retrieve_list_of_files_from_url(path_catalog, path_data,'uv_')

xr.set_options(file_cache_maxsize=12)
ds_drift = xr.open_mfdataset(sorted(list_of_files), chunks={'time': 6091}, concat_dim='time', combine='nested')

# Keep only the evaluation period, then sort chronologically
ds_drift = ds_drift.where((ds_drift.time >= np.datetime64(time_min)) & (ds_drift.time <= np.datetime64(time_max)), drop=True)
ds_drift = ds_drift.sortby('time')

# Shift negative longitudes by 360° so they match the 0-360° convention of lon_min/lon_max
lon = np.array(ds_drift.longitude.values)
lat = np.array(ds_drift.latitude.values)
lon[lon < 0] = lon[lon < 0] + 360

# Boolean mask of the drifter points strictly inside the regional box
ind_sel_time = (lon > lon_min) & (lon < lon_max) & (lat > lat_min) & (lat < lat_max)
ds_drift = ds_drift.isel({'time': ind_sel_time})
ds_drift.to_netcdf(saving_dir + name_drift)


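
The drifter cell shifts negative longitudes by 360° before applying the box, because `lon_min` and `lon_max` are expressed in the 0-360° convention. The sketch below shows the effect on a few made-up longitudes (illustrative values only): without the shift, no point would fall inside the Gulf Stream box.

```python
import numpy as np

# Regional longitude bounds from the experiment setup (0-360° convention)
lon_min, lon_max = 295, 305

# Synthetic longitudes in the -180/180° convention (illustrative values)
lon = np.array([-75.0, -60.0, 0.0, 10.0, 179.5, -180.0])

# Shift negative values by 360°, as done in the drifter cell
lon_360 = np.where(lon < 0, lon + 360, lon)

# For these values the modulo operation gives the same result
lon_mod = np.mod(lon, 360)

inside_raw = (lon > lon_min) & (lon < lon_max)          # mask without the shift
inside_360 = (lon_360 > lon_min) & (lon_360 < lon_max)  # mask with the shift

print(lon_360)
print(inside_raw.sum(), inside_360.sum())
```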

## Data for comparison: DUACS maps

In [6]:
%%time
path_catalog = "https://ige-meom-opendap.univ-grenoble-alpes.fr/thredds/catalog/meomopendap/extract/MEOM/OCEAN_DATA_CHALLENGES/2023a_SSH_mapping_OSE/maps/DUACS_global_allsat-alg/catalog.html" 
path_data =    "https://ige-meom-opendap.univ-grenoble-alpes.fr/thredds/fileServer/meomopendap/extract/MEOM/OCEAN_DATA_CHALLENGES/2023a_SSH_mapping_OSE/maps/DUACS_global_allsat-alg/"

     
list_of_files = retrieve_list_of_files_from_url(path_catalog, path_data)

xr.set_options(file_cache_maxsize=12)
ds_maps = xr.open_mfdataset(sorted(list_of_files), chunks={'time': 1, 'latitude': 720, 'longitude': 1440}, concat_dim='time', combine='nested')

# Keep only the evaluation period
ds_maps = ds_maps.sel(time=slice(time_min, time_max))

# Select a region 0.5° wider on each side to avoid interpolation issues later
ds_maps = ds_maps.sel({'longitude': slice(lon_min - 0.5, lon_max + 0.5)})
ds_maps = ds_maps.sel({'latitude': slice(lat_min - 0.5, lat_max + 0.5)})
ds_maps.to_netcdf(saving_dir + name_maps)

CPU times: user 1min 30s, sys: 11.7 s, total: 1min 42s
Wall time: 3min 24s
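
The 0.5° margin matters when the maps are later interpolated onto observation points close to the edge of the box: every such point needs grid nodes on both sides. The sketch below illustrates this along longitude only, assuming a hypothetical 0.25° grid with cell-centred nodes (the actual grid of the maps is not specified here) and a made-up observation longitude `track_lon`.

```python
import numpy as np

# Regional longitude bounds from the experiment setup and the margin used above
lon_min, lon_max = 295, 305
margin = 0.5

# Hypothetical 0.25° grid with cell-centred nodes (an assumption for illustration)
grid_lon = np.arange(0.125, 360, 0.25)

# Inclusive selections, mimicking label-based slicing on a sorted coordinate
tight = grid_lon[(grid_lon >= lon_min) & (grid_lon <= lon_max)]
padded = grid_lon[(grid_lon >= lon_min - margin) & (grid_lon <= lon_max + margin)]

# Made-up observation just inside the western edge of the box
track_lon = 295.05

covered_tight = tight.min() <= track_lon <= tight.max()
covered_padded = padded.min() <= track_lon <= padded.max()

print(tight.min(), tight.max(), covered_tight)
print(padded.min(), padded.max(), covered_padded)
```

In this setting the observation falls outside the grid nodes of the tight selection but is bracketed by nodes of the padded one.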


### You can now use the evaluation notebooks by changing the data directory and file names!