 ### <br><b> Introduction to data analysis</b>

-->This document can't be modified: you don't have permission to save changes to this notebook. 
<br>***So be sure to download your work before exiting!***  (use the **download** button above)

In this practical session (TP) we introduce xarray, an open-source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun!

We will work with a data set of climate variables from GRENOBLE-ST GEOIRS AEROPORT, downloaded from the METEO FRANCE site.
>In case you are curious:
    <u><br>https://donneespubliques.meteofrance.fr/</u>

Since retrieving the data through this interface is rather clumsy, a prepared version is available in the following file: /Data/Data.nc

This data file contains the following:

> <u>Variables:</u>
    <br>- max/min/mean temperature
    <br>- total precipitation 
  <br><u>Temporal coverage: </u>
        <br>1968/01/01-2022/01/31

<br><b>Notebook Overview:</b>

> 1) Statistical analysis of temporal changes 
  2) Distribution analysis of data
  3) Correlation analysis
  4) Analysis of the temporal evolution of snow in Grenoble 


You are here to learn, so don't hesitate to ask if you are stuck (instead of copying your colleague's code ;))

### >> <b> Load Libraries</b>


In [3]:
# render your plots inline in the notebook
%matplotlib inline 
import matplotlib.pyplot as plt # for plots 
import numpy as np # mathematical functions
import xarray as xr 
import pandas as pd 
from scipy import stats
import statsmodels.api as sm
from scipy.stats import gamma
from scipy.stats import norm
from scipy import signal

### >> <b> Get started with Xarray</b>

Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays, which allows for a more intuitive, more concise, and less error-prone developer experience. 
A variable then consists of: 

* dims: A tuple of dimension names.
* data: The N-dimensional array (typically, a NumPy or Dask array) storing the Variable’s data. It must have the same number of dimensions as the length of dims.
* attrs: An ordered dictionary of metadata associated with this array.
* encoding: Another ordered dictionary used to store information about how this variable’s data is represented on disk. 
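
The four components listed above can be inspected on a small hand-made variable. The following is only a minimal sketch: the values, the dimension name `time`, and the attribute contents are invented for illustration and do not come from Data.nc.

```python
import numpy as np
import xarray as xr

# Hypothetical toy variable: three values along a single "time" dimension
var = xr.Variable(
    dims=("time",),
    data=np.array([1.5, 2.0, -0.5]),
    attrs={"units": "degC", "long_name": "toy temperature"},
)

print(var.dims)      # tuple of dimension names
print(var.data)      # the underlying NumPy array
print(var.attrs)     # metadata dictionary
print(var.encoding)  # on-disk representation details (empty for in-memory data)
```

The number of dimensions of `var.data` matches `len(var.dims)`, as stated in the list above.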


In [22]:
# Open the netcdf file using xarray 
# Don't forget to change the file path ;)
da=xr.open_dataset('/home/jomaaf/TP_master/2022/Data/Data.nc')
# da

In [26]:
da.attrs
da.encoding
da.dims
da.coords

Coordinates:
  * time     (time) datetime64[ns] 1968-01-01 1968-01-02 ... 2022-01-31

**Select Variables:**

In [20]:
# Access individual variables of the dataset
pr=da.Precipitation
meanT=da.mean_T
minT=da.min_T
maxT=da.max_T

In [7]:
# select time (for instance)
da=da.sel(time=slice('1979-01-01','2021-12-31'))

In [21]:
# Annual averages 
Y=da.groupby('time.year').mean('time').Precipitation

# Monthly averages 
M=da.groupby('time.month').mean('time').Precipitation
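
The same `groupby` pattern extends to seasons through the virtual coordinate `time.season`. The sketch below uses a synthetic daily series (an assumption, standing in for one variable of the dataset) and the hypothetical name `toy`; it is one possible way to proceed, not the required one.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series with a seasonal cycle (assumption, not the real data)
time = pd.date_range("2000-01-01", "2003-12-31", freq="D")
rng = np.random.default_rng(0)
cycle = 8 * np.sin(2 * np.pi * (time.dayofyear.values - 100) / 365.25)
toy = xr.DataArray(
    10 + cycle + rng.normal(0, 2, time.size),
    coords={"time": time},
    dims="time",
    name="toy_T",
)

# One climatological mean per season (DJF, MAM, JJA, SON)
season_clim = toy.groupby("time.season").mean("time")

# One value per year for a given season: keep that season, then group by year
jja = toy.where(toy.time.dt.season == "JJA", drop=True)
jja_by_year = jja.groupby("time.year").mean("time")
```

Note that grouping DJF by calendar year puts December together with the January and February of the same year rather than with the following winter, which may or may not be what you want.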

## 1. Statistical analysis of temporal changes 
## Tasks
-----
**1.1)**
Calculate and plot the seasonal means for all data sets. Define the seasons as winter (DJF), spring (MAM), summer (JJA) and autumn (SON).  
Calculate and plot the annual means for all data sets.  

**1.2)** 
Calculate linear trends and their significance using Student's t-test at the 95% level for all mean time series.
Plot the trend lines on the same graphs as in 1.1. 
What do you conclude about the temporal changes of temperature and precipitation in Grenoble for the different seasons and for the annual values?

**1.3)** 
Calculate linear trends and their significance (using Student's t-test) for the daily Tmean and precipitation data. Compare your results with the annual trends for Tmean and precipitation. Conclude.


## Hints:
-------------------------------------------------------------------------------
<br>--> To group by season or year, use: [xarray.DataArray.groupby](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.groupby.html)
<br>--> To calculate linear trends use: [scipy.stats.linregress](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html)
<br>--> For Student's t-test use: [statsmodels.regression.linear_model.OLS](https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html), you may also need a [t-table](https://www.sjsu.edu/faculty/gerstman/StatPrimer/t-table.pdf)
>#### *Reminder:*
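$t=\frac{B}{S_{a}}$
<br>$S_{e}^2 =\frac{1}{n-2}\sum_{i=1}^{n} e_{i}^2$
<br>$S_{a}^2 =\frac{S_{e}^2}{\sum_{i=1}^{n}(x_{i}-\bar{x})^2}$
<br>$Y=A+Bx+e$
<br>*where:*
<br>$e$: residuals of the regression
<br>$A$: intercept of the regression
<br>$B$: slope of the regression
<br>$t$: t-statistic
<br>$n$: number of data points, $\bar{x}$: mean of $x$
<br>$S_{e}^2$: residual variance
<br>$S_{a}^2$: variance of the slope estimate ($S_{a}$ is its standard error)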

--> For plotting use: [matplotlib](https://matplotlib.org/stable/tutorials/introductory/pyplot.html)
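
As a minimal sketch of how the reminder formulas map onto code, the block below fits a trend to a synthetic annual series (an assumption, not the Grenoble data) with `scipy.stats.linregress` and recomputes the t-statistic by hand. The variable names follow the symbols of the reminder; the critical value is taken from `scipy.stats.t` instead of a printed t-table.

```python
import numpy as np
from scipy import stats

# Synthetic annual series (assumption): weak linear trend plus noise
rng = np.random.default_rng(1)
x = np.arange(1979, 2022)
y = 10.0 + 0.03 * (x - x[0]) + rng.normal(0.0, 0.5, x.size)

# Least-squares fit of Y = A + B x + e
res = stats.linregress(x, y)
A, B = res.intercept, res.slope

# t-statistic following the reminder formulas
n = x.size
e = y - (A + B * x)                                # residuals
Se2 = np.sum(e**2) / (n - 2)                       # residual variance
Sa = np.sqrt(Se2 / np.sum((x - x.mean())**2))      # standard error of the slope
t = B / Sa

# Two-sided critical value at the 95% level with n - 2 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 2)
significant = abs(t) > t_crit
```

`res.stderr` and `res.pvalue` returned by `linregress` should agree with the hand-computed `Sa` and with the decision in `significant`, which makes a convenient cross-check.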

## 2. Analysis of the distribution of data
## Tasks
-----
<br>**2.1)** Plot the histograms for Tmean and precipitation. Choose the bins carefully.
<br>Fit a Gaussian distribution to the daily Tmean. Plot it on the corresponding histogram and report the parameters of the distribution. 
<br>Fit a Gamma distribution to the daily precipitation data. Plot it on the corresponding histogram and report the parameters of the distribution.

**2.2)** Use the chi-squared test to assess the goodness of fit of the Gaussian distribution to Tmean. Use the K-S (Kolmogorov-Smirnov) test for the precipitation data. 
<br>Conclude how well these distributions fit the data.

## Hints
-----------
-->To plot the histograms check out: [matplotlib.pyplot.hist](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html)
<br>-->Gaussian distribution: [scipy.stats.norm](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html)
<br>-->Gamma distribution: [scipy.stats.gamma](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gamma.html)
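
The fitting step with these two scipy distributions might look like the sketch below. It runs on synthetic samples (assumptions, not the station data). Fixing the Gamma location at 0 and fitting only strictly positive precipitation values are choices made here for illustration, not requirements of the exercise.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Synthetic stand-ins for daily Tmean and wet-day precipitation (assumptions)
temps = rng.normal(loc=11.0, scale=7.0, size=5000)
wet_days = rng.gamma(shape=0.8, scale=6.0, size=3000)

# Gaussian: maximum-likelihood estimates of mean and standard deviation
mu, sigma = stats.norm.fit(temps)

# Gamma: location fixed at 0, so only shape and scale are estimated
shape, loc, scale = stats.gamma.fit(wet_days, floc=0)

# Fitted PDFs on a grid, e.g. to overlay on plt.hist(..., density=True)
x_t = np.linspace(temps.min(), temps.max(), 200)
pdf_t = stats.norm.pdf(x_t, mu, sigma)
x_p = np.linspace(wet_days.min(), wet_days.max(), 200)
pdf_p = stats.gamma.pdf(x_p, shape, loc=loc, scale=scale)

# Kolmogorov-Smirnov test of the sample against the fitted Gamma
ks = stats.kstest(wet_days, "gamma", args=(shape, loc, scale))
```

Keep in mind that the K-S p-value tends to be optimistic when the distribution parameters are estimated from the same sample that is being tested.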

## 3. Correlation analysis
## Tasks
-----
**3.1)** Calculate the correlation and its significance between annual maximum and minimum temperatures for the four seasons.
<br>Conclude.

**3.2)** Remove trends from the annual mean Tmax and Tmean data and calculate again the correlation and its significance.

**3.3)** Compare your results from 3.2. with 3.1. and conclude.

**3.4)** Calculate correlation and significance between mean temperature and precipitation with and without trend. 
<br>What do you conclude about the relationship between these two variables and the influence of trends on the results?

## Hints
-----------
-->To calculate the Pearson correlation: [scipy.stats.pearsonr](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html)
<br>-->For correlation significance you will use: [Critical Values of the Linear Correlation Coefficient](https://www.me.psu.edu/cimbala/me345/Exams/Critical_values_linear_correclation.pdf)
<br>-->To detrend: [xscale.signal.fitting.detrend(data, dim=None, type='linear')](https://xscale.readthedocs.io/en/latest/generated/xscale.signal.fitting.detrend.html)
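
To illustrate why detrending matters, the sketch below correlates two synthetic annual series (assumptions) that share a linear trend but have independent noise. It uses `scipy.signal.detrend` in place of the xscale function linked above, since `signal` is already imported at the top of this notebook; the two are assumed to be interchangeable for a simple linear detrend of a 1-D series.

```python
import numpy as np
from scipy import signal, stats

rng = np.random.default_rng(3)
years = np.arange(1979, 2022)

# Two synthetic series: common trend, independent noise (assumption)
a = 0.05 * (years - years[0]) + rng.normal(0.0, 0.3, years.size)
b = 0.05 * (years - years[0]) + rng.normal(0.0, 0.3, years.size)

# Correlation of the raw series (inflated by the shared trend)
r_raw, p_raw = stats.pearsonr(a, b)

# Remove the least-squares linear trend from each series, then correlate again
a_dt = signal.detrend(a, type="linear")
b_dt = signal.detrend(b, type="linear")
r_dt, p_dt = stats.pearsonr(a_dt, b_dt)

print(f"raw: r={r_raw:.2f}, detrended: r={r_dt:.2f}")
```

`pearsonr` also returns a p-value, which can be compared with the decision obtained from the table of critical values.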

## 4.  Analysis of the temporal evolution of snow in Grenoble
## Tasks
-----

Count the snowy days per year (temperature<0 and precipitation>0) and plot the time series. 
<br>Calculate the trend and its significance. Plot the trend line.
<br>What can you conclude about the temporal changes of snow cover in Grenoble?
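
One possible way to count such days is to build a boolean mask and sum it per year. The sketch below uses a synthetic dataset with the hypothetical variable names `T` and `P`; which temperature variable of the real file to use is left to you.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily temperature and precipitation (assumptions, not the real data)
time = pd.date_range("2000-01-01", "2002-12-31", freq="D")
rng = np.random.default_rng(4)
temp = 10 - 12 * np.cos(2 * np.pi * time.dayofyear.values / 365.25) + rng.normal(0, 3, time.size)
precip = np.where(rng.random(time.size) < 0.3, rng.gamma(0.8, 6.0, time.size), 0.0)
toy = xr.Dataset(
    {"T": ("time", temp), "P": ("time", precip)},
    coords={"time": time},
)

# A day counts as snowy when it is below freezing and has some precipitation
snowy = (toy["T"] < 0) & (toy["P"] > 0)

# Summing a boolean mask counts the True values in each year
snowy_per_year = snowy.groupby("time.year").sum("time")
print(snowy_per_year.values)
```

The resulting yearly series can then be passed to the same trend and significance analysis as in section 1.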
