## Subject:
In addition to the SST (sst.mnmean.nc), I uploaded these datasets to my GitHub to complete the description of the El Niño phenomenon.
 - precip.mon.mean.nc
 - msl_era5.nc
 - lwdn.nc
 - ewss_era5.nc
 - sla_aviso.nc
 
Using basic xarray commands as seen in TD1, could you describe which variables are in these NetCDF files? 
What is the spatial and temporal domain and resolution? 
What is the origin of the data? Are they based on "observations" or models? Why do I use quotes in "observations"?

Why did I choose these variables to illustrate El Niño? Before doing any analysis or plot, which patterns do you expect to see in the interannual variability of these variables in the tropical Pacific? And why?

Choose 1 atmospheric and 1 oceanic dataset among the 5 datasets.

Following what was done in TD2, compute the regression and the correlation maps of these data on the Nino 3.4 **SST** index.
Explain the units of the maps and how to read them. What is the conceptual difference between these 2 maps?
Give a physical interpretation of the results.

Following what was done in TD3, compute the EOFs of 1 atmospheric and 1 oceanic dataset (change dataset if possible). Plot the percentage of variance explained by the first 10 EOFs. Comment on this plot.

For one of the datasets, compute the variance of each PC. Compute the covariance between the 2 PCs. Compute the correlation between the 2 EOFs (see below for details). Which results should you get, and which do you get?

Plot the first 2 EOFs and their PCs. Next, compute the correlation of each of the first 2 PCs with the Nino 3.4 **SST** index. Comment on the results for the first EOF/PC. If you choose u10_era5.nc or slp.mnmean.nc as atmospheric data, you may exclude values outside 25°S-25°N to remove the high-latitude signal and simplify the analysis.


## Some advice/information for your Python scripts:

 - [There is a pdf file](https://github.com/massonseb/TDENS/blob/main/help_exam.pdf) which explains how to use this notebook to create your own notebook.
 - If you run into memory size limitations, reduce the memory footprint (see TD1) by extracting the data over a smaller domain (exclude higher latitudes and/or exclude parts of the Atlantic and/or Indian Oceans). You can also restart the Python kernel and re-execute only part of the commands, to avoid creating intermediate arrays that you don't need but that use memory.
 - When using sel(lon=...) to select the longitude, check the longitude range (0 -> 360 or -180 -> 180) and values (0, 1,...) or (0.5, 1.5...). The same applies when selecting the latitude; also check whether latitudes are stored in increasing or decreasing order.
 - Don't forget to look at the dataset and variable attributes to get information about the datasets. Beware: to limit the size of the datasets, I regridded some of them on a 1°x1° grid. Attributes related to the grid definition and resolution may be outdated -> look at the lon/lat values themselves.
 - Check what you are doing by plotting your (intermediate) data and their related information (such as their shape)
 ```python
# to get general information on your DataSet or DataArray
data
# to get the shape of the variable xxx
data.xxx.shape
# for a quick plot of the variable xxx
data.xxx.mean(dim='time').plot()
data.xxx.isel(time=0).plot()
 ```
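The coordinate checks recommended above (longitude convention, latitude ordering) can be sketched as follows. This is only an illustration on a synthetic grid: the variable name `xxx`, the grid values and the selected box are assumptions, not the content of the actual files.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for one of the files (hypothetical grid and variable name "xxx")
lon = np.arange(-179.5, 180.0, 1.0)   # -180 -> 180 convention, cell centres
lat = np.arange(89.5, -90.0, -1.0)    # latitudes stored in decreasing order
data = xr.Dataset(
    {"xxx": (("lat", "lon"), np.random.rand(lat.size, lon.size))},
    coords={"lat": lat, "lon": lon},
)

# 1) look at the coordinate values themselves, not only at the attributes
print(float(data.lon.min()), float(data.lon.max()))
print(float(data.lat[0]), float(data.lat[-1]))

# 2) convert longitudes to 0 -> 360, then sort both axes in increasing order
data = data.assign_coords(lon=data.lon % 360).sortby("lon").sortby("lat")

# 3) slices can now be written in increasing order (example box, chosen arbitrarily)
box = data.sel(lat=slice(-25, 25), lon=slice(120, 290))
print(box.xxx.shape)
```

If the latitudes are left in decreasing order, the slice has to be written the other way round, e.g. `slice(25, -25)`, otherwise the selection comes back empty.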
 - You will need to compute interannual anomalies of your dataset. You must first remove the linear trend at each point and then remove a monthly climatology (see TD1).
 - For the correlation and linear regression with Nino 3.4 (see TD2), make sure that you select the common period between your dataset and the Nino 3.4 SST index (even if xarray might do it automatically).
 - The datasets do not have any mask file available to build the weights (as was the case for the SST). Select 1 time step (any one) and plot a map of the data you chose to visualize it. Some data are defined everywhere, others have a specific value over the continents (e.g. nan, which means Not a Number). You can use [np.where](https://numpy.org/doc/stable/reference/generated/numpy.where.html) to build the mask.
 ```python
import numpy as np
import xarray as xr

# select 1 time step of the variable "xxx"
data = xr.open_dataset('my_file.nc')
mask = data.xxx.isel(time=0)
# to test whether mask contains nan
print(mask.notnull().all().data)  # returns True if there are no nan values
# to build the mask based on existing nan value
mask.data = np.where(np.isnan(mask.data), 0., 1.)
# to build the mask based on existing missing value
missing_value = ...  
mask.data = np.where(mask.data == missing_value, 0., 1.)
 ```
In all cases, plot a map of your weights to check that there are no errors!
 ```python
weights.plot()
 ``` 
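As a minimal sketch of how the mask could be turned into weights: the example below assumes that the weights are area weights proportional to cos(latitude), set to 0 where the data are missing. This choice, the synthetic field and the variable names are assumptions made for illustration; follow what was done for the SST in the TDs if it differs.

```python
import numpy as np
import xarray as xr

# Synthetic field with nan over a fake "continent" (hypothetical names and grid)
lat = np.arange(-29.5, 30.0, 1.0)
lon = np.arange(120.5, 290.0, 1.0)
field = xr.DataArray(
    np.random.rand(lat.size, lon.size),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
    name="xxx",
)
field[:10, :20] = np.nan

# mask: 1 where data are valid, 0 elsewhere
mask = xr.where(field.notnull(), 1.0, 0.0)

# assumption: area weights proportional to cos(latitude), 0 where masked
weights = np.cos(np.deg2rad(field.lat)) * mask

# weighted() does not accept nan in the weights, so check before using them
assert not bool(weights.isnull().any())
print(float(field.weighted(weights).mean()))
```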
 - Weighted spatial covariance is not (yet) directly available in xarray... Here is one method to compute the weighted covariance between X and Y:
 ```python 
x = ...
y = ...
xw = x.weighted(weights)
yw = y.weighted(weights)
z = ( x - xw.mean() ) * ( y - yw.mean() )
zw = z.weighted(weights)
cov = zw.mean()
 ``` 
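The recipe above can be wrapped in a small function and extended to a weighted correlation by dividing by the two weighted standard deviations. The helper names `weighted_cov` and `weighted_corr`, the dimension names and the synthetic maps are hypothetical; this is a sketch, not a function provided by xarray.

```python
import numpy as np
import xarray as xr

def weighted_cov(x, y, weights, dim=("lat", "lon")):
    """Weighted covariance of x and y over `dim` (hypothetical helper)."""
    xm = x.weighted(weights).mean(dim)
    ym = y.weighted(weights).mean(dim)
    return ((x - xm) * (y - ym)).weighted(weights).mean(dim)

def weighted_corr(x, y, weights, dim=("lat", "lon")):
    """Weighted correlation: covariance divided by the two standard deviations."""
    cov = weighted_cov(x, y, weights, dim)
    var_x = weighted_cov(x, x, weights, dim)
    var_y = weighted_cov(y, y, weights, dim)
    return cov / np.sqrt(var_x * var_y)

# Synthetic maps (hypothetical data) to check the helpers
rng = np.random.default_rng(0)
lat = np.arange(-9.5, 10.0, 1.0)
lon = np.arange(0.5, 30.0, 1.0)
coords = {"lat": lat, "lon": lon}
x = xr.DataArray(rng.standard_normal((lat.size, lon.size)), coords=coords, dims=("lat", "lon"))
y = 2.0 * x + 1.0   # y is a linear function of x
weights = np.cos(np.deg2rad(x.lat)) * xr.ones_like(x)

print(float(weighted_cov(x, y, weights)))
print(float(weighted_corr(x, y, weights)))
```

Since `y` is an exact linear function of `x` with a positive slope, the correlation should come out equal to 1 up to rounding, which is a convenient sanity check before applying the helpers to EOFs.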
 - The first time step of lwdn.nc is quite strange... I did not change it. It was like that when I downloaded it. => Always have, at least, a quick look at the data you are manipulating! You can, for example, remove this date with:
 ```python
# drop_isel returns a new object, so store the result
data = data.drop_isel(time=0)
 ``` 
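Coming back to the interannual anomalies required earlier in this list (linear trend removed at each point, then monthly climatology removed), here is a minimal sketch. The helper name `interannual_anomalies` and the synthetic data are hypothetical, and the fit assumes regularly spaced monthly time steps with no missing dates and no isolated nan in time; the method shown in TD1 remains the reference.

```python
import numpy as np
import pandas as pd
import xarray as xr

def interannual_anomalies(da):
    """Remove the linear trend at each point, then the monthly climatology (hypothetical helper)."""
    # time index 0, 1, 2, ... (assumes regularly spaced monthly steps)
    t = xr.DataArray(np.arange(da.sizes["time"], dtype=float),
                     coords={"time": da["time"]}, dims="time")
    tc = t - t.mean()
    dev = da - da.mean("time")
    # least-squares slope at each grid point
    slope = (tc * dev).sum("time") / (tc ** 2).sum()
    detrended = dev - slope * tc
    # mean of each calendar month, subtracted from the matching months
    clim = detrended.groupby("time.month").mean("time")
    return detrended.groupby("time.month") - clim

# Synthetic monthly data (hypothetical): trend + seasonal cycle + noise
time = pd.date_range("2000-01-01", periods=120, freq="MS")
rng = np.random.default_rng(0)
values = (
    0.01 * np.arange(120)[:, None, None]
    + np.sin(2 * np.pi * time.month.values / 12)[:, None, None]
    + 0.1 * rng.standard_normal((120, 4, 5))
)
da = xr.DataArray(values, coords={"time": time}, dims=("time", "lat", "lon"), name="xxx")

anom = interannual_anomalies(da)
print(anom.shape)
```

A quick check is that the mean of the anomalies over each calendar month is zero at every point; plotting one time series before and after is also worthwhile.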
