## 3. Computing the climatology

In this section, we'll explore how to compute climate normals (climatology) for the variable of interest using the `Climatology` class from the SMADI package. The class provides the following functionality:

- [Compute normals at various time steps](#31-compute-normals-at-various-time-steps), including monthly, bimonthly, dekadal, weekly, and daily intervals.
- Support [computing different metrics](#32-computing-the-normals-using-different-metrics-median-max-etc) for the normals, such as the mean, median, minimum, and maximum.
- [Fill gaps in the time-series](#33-filling-the-gaps-and-smoothing-the-time-series-data-optional) using a user-defined window size, making the analysis more robust to missing data.
- [Smooth the data](#33-filling-the-gaps-and-smoothing-the-time-series-data-optional) by applying a moving-average window of user-defined size across the entire dataset, which reduces short-term noise and makes underlying patterns easier to identify.
- [Compute the climatology for a subset of the data](#34-computing-the-normals-for-a-subset-of-the-data) by specifying start and end dates, enabling analysis on specific time periods of interest.

### 3.1 Compute normals at various time steps


#### Load the data

```python
import pandas as pd
from smadi.data_reader import read_grid_point

# Set display options
pd.set_option("display.max_columns", 8)  # Limit the number of columns displayed
pd.set_option("display.precision", 2)  # Display floats with 2 decimal places

# Define the path to the ASCAT data
data_path = "/home/m294/ascat_dataset"

# Example: a grid point in Morocco
lon = -7.382
lat = 33.348

# Define the location of the observation point
loc = (lon, lat)

# Extract the ASCAT soil moisture time series for the given location.
# Provide the path to the ERA5-Land data via `era5_land_path` if you want to mask
# snow and frozen soil conditions. For more information about the dataset, see the
# ERA5-Land documentation. To download it, use the CDS API or
# https://ecmwf-models.readthedocs.io/en/latest/
data = read_grid_point(loc=loc, ascat_sm_path=data_path, read_bulk=False, era5_land_path=None)

# Get the ASCAT soil moisture time series
ascat_ts = data.get("ascat_ts")

# Display the first few rows of the time series data
ascat_ts.head()
```

```text
Reading ASCAT soil moisture: /home/m294/ascat_dataset
ASCAT GPI: 3611180 - distance:   23.713 m
```


|  | sm | sm_noise | as_des_pass | ssf | ... | sigma40 | sigma40_noise | num_sigma | sm_valid |
|---|---|---|---|---|---|---|---|---|---|
| 2007-01-01 21:02:04.161 | 34.86 | 3.24 | 0 | 0 | ... | -12.27 | 0.19 | 3 | True |
| 2007-01-02 11:03:22.807 | 23.16 | 3.27 | 1 | 0 | ... | -13.05 | 0.19 | 3 | True |
| 2007-01-03 10:42:47.739 | 33.05 | 3.23 | 1 | 0 | ... | -12.39 | 0.19 | 3 | True |
| 2007-01-03 22:00:39.007 | 25.6 | 3.24 | 0 | 0 | ... | -12.88 | 0.19 | 3 | True |
| 2007-01-05 10:01:27.519 | 28.73 | 3.24 | 1 | 0 | ... | -12.67 | 0.19 | 3 | True |


#### Monthly Climatology

```python
from smadi.climatology import Climatology

# Create a climatology object.
# agg_metric is the metric used to aggregate the observations within each time
# step before the climatology is computed; it can be "mean", "sum", "max", "min", etc.
cl = Climatology(df=ascat_ts, variable="sm", agg_metric="mean")

# Set the time step for computing the climatology.
# Supported time steps are "month", "bimonth", "dekad", "week", and "day".
cl.time_step = "month"

cl_df = cl.compute_normals()
cl_df.head(12)
```

|  | sm-mean | norm-mean |
|---|---|---|
| 2007-01-31 | 33.69 | 56.54 |
| 2007-02-28 | 38.15 | 46.62 |
| 2007-03-31 | 25.63 | 36.69 |
| 2007-04-30 | 24.85 | 33.54 |
| 2007-05-31 | 24.21 | 28.73 |
| 2007-06-30 | 20.65 | 20.18 |
| 2007-07-31 | 18.21 | 17.3 |
| 2007-08-31 | 16.26 | 17.69 |
| 2007-09-30 | 19.23 | 21.68 |
| 2007-10-31 | 23.37 | 30.02 |


**Note:** You can filter the result to a specific period by passing date parameters (`year`, `month`, `day`, etc.) to the `compute_normals` method.

The `bimonth` and `dekad` parameters can only be used when `time_step` is set to `"bimonth"` or `"dekad"`, respectively:

- **Dekad:** values range from 1 to 3 within each month, corresponding to the first, second, and third dekads (roughly 10-day periods) of the month.
- **Bimonth:** values are 1 or 2 within each month, corresponding to the first and second halves of the month.
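To make these indices concrete, the sketch below maps a calendar date to its dekad and bimonth number. It assumes dekad boundaries after days 10 and 20 and a bimonth boundary after day 15, which is consistent with the dates in the outputs of this section (for example, dekad 2 rows start on the 11th and bimonth 2 rows on or after the 16th); SMADI's internal definitions may differ in detail. `dekad_of` and `bimonth_of` are hypothetical helpers, not SMADI functions.

```python
from datetime import date


def dekad_of(d: date) -> int:
    """Return the dekad (1, 2 or 3) of the month that `d` falls in.

    Assumed boundaries: days 1-10 -> 1, days 11-20 -> 2, day 21 to month end -> 3.
    """
    if d.day <= 10:
        return 1
    if d.day <= 20:
        return 2
    return 3


def bimonth_of(d: date) -> int:
    """Return the half of the month (1 or 2) that `d` falls in.

    Assumed boundary: days 1-15 -> 1, day 16 to month end -> 2.
    """
    return 1 if d.day <= 15 else 2


# The second half of May and the third dekad of July, as in the examples
print(bimonth_of(date(2007, 5, 16)))  # 2
print(dekad_of(date(2007, 7, 22)))    # 3
```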

```python
cl_df = cl.compute_normals(month=2)  # February
cl_df.head(16)
```

|  | sm-mean | norm-mean |
|---|---|---|
| 2007-02-28 | 38.15 | 46.62 |
| 2008-02-29 | 44.75 | 46.62 |
| 2009-02-28 | 64.58 | 46.62 |
| 2010-02-28 | 78.51 | 46.62 |
| 2011-02-28 | 53.21 | 46.62 |
| 2012-02-29 | 25.46 | 46.62 |
| 2013-02-28 | 43.76 | 46.62 |
| 2014-02-28 | 46.8 | 46.62 |
| 2015-02-28 | 46.03 | 46.62 |
| 2016-02-29 | 42.02 | 46.62 |


#### Bimonthly Climatology

```python
cl.time_step = "bimonth"
cl_df = cl.compute_normals(month=5, bimonth=2)  # The second half of May
cl_df.head(24)
```

|  | sm-mean | bimonth | norm-mean |
|---|---|---|---|
| 2007-05-16 | 24.51 | 2 | 27.19 |
| 2008-05-17 | 15.44 | 2 | 27.19 |
| 2009-05-17 | 38.96 | 2 | 27.19 |
| 2010-05-17 | 40.61 | 2 | 27.19 |
| 2011-05-16 | 53.84 | 2 | 27.19 |
| 2012-05-16 | 15.11 | 2 | 27.19 |
| 2013-05-16 | 35.83 | 2 | 27.19 |
| 2014-05-16 | 15.53 | 2 | 27.19 |
| 2015-05-16 | 33.87 | 2 | 27.19 |
| 2016-05-16 | 18.85 | 2 | 27.19 |


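In the outputs above:

- `cl_df`: the resulting DataFrame, indexed by date, containing the aggregated soil moisture and its normal for each time step.
- `sm-mean`: the soil moisture for each time step (e.g. each month), obtained by aggregating the observations within that step with `agg_metric` (here, the mean).
- `bimonth` (bimonthly output only): the half of the month that the row belongs to.
- `norm-mean`: the normal for each time step, computed as the mean of the `sm-mean` values for that step (e.g. all Februaries) over the 16 years of observations (2007-2022).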
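The two-step computation described above (aggregate within each period, then average the same period across years) can be sketched in plain Python for the monthly case. This is a minimal illustration assuming the normal is the plain mean over years, not SMADI's implementation; `monthly_normals` is a hypothetical helper and the observations are made-up values.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_normals(observations, agg=mean):
    """Sketch of a monthly climatology (hypothetical helper).

    observations: iterable of (date, value) pairs.
    agg: function applied to the raw values inside each (year, month),
         playing the role of `agg_metric`.
    Returns two dicts:
      per_period[(year, month)] -> aggregated value (like "sm-mean")
      normals[month]            -> mean of per_period over all years (like "norm-mean")
    """
    buckets = defaultdict(list)
    for d, value in observations:
        buckets[(d.year, d.month)].append(value)

    # Step 1: aggregate the observations within each year-month
    per_period = {key: agg(values) for key, values in buckets.items()}

    # Step 2: average the same calendar month across all years
    by_month = defaultdict(list)
    for (_year, month), value in per_period.items():
        by_month[month].append(value)
    normals = {month: mean(values) for month, values in by_month.items()}
    return per_period, normals


# Made-up observations for February of two years
obs = [
    (date(2007, 2, 3), 30.0),
    (date(2007, 2, 20), 40.0),
    (date(2008, 2, 10), 50.0),
]
per_period, normals = monthly_normals(obs)
print(per_period)  # {(2007, 2): 35.0, (2008, 2): 50.0}
print(normals)     # {2: 42.5}
```

Passing `agg=sum` instead of the default mirrors setting `agg_metric="sum"`: the per-month values become totals, and the normals are averaged from those totals.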

### 3.2 Computing the normals using different metrics (median, max, etc.)

To compute the normals using different metrics such as the mean, median, minimum, and maximum, assign a list of the metrics of interest to the `normal_metrics` attribute.

For example, to compute the normals using the mean and median, define the list of metrics as follows:

```python
# Supported metrics are "mean", "median", "std", "min", "max"
cl.normal_metrics = ["mean", "median"]

# Compute weekly-based climatology
cl.time_step = "week"
cl_df = cl.compute_normals(week=12)  # The 12th week of the year
cl_df.head(10)
```

|  | sm-mean | norm-mean | norm-median |
|---|---|---|---|
| 2007-03-19 | 17.24 | 34.38 | 33.82 |
| 2008-03-17 | 35.11 | 34.38 | 33.82 |
| 2009-03-17 | 45.71 | 34.38 | 33.82 |
| 2010-03-22 | 41.76 | 34.38 | 33.82 |
| 2011-03-22 | 32.26 | 34.38 | 33.82 |
| 2012-03-19 | 10.64 | 34.38 | 33.82 |
| 2013-03-18 | 54.76 | 34.38 | 33.82 |
| 2014-03-17 | 21.66 | 34.38 | 33.82 |
| 2015-03-16 | 33.09 | 34.38 | 33.82 |
| 2016-03-22 | 50.84 | 34.38 | 33.82 |


```python
# Compute normals with multiple metrics

# Set the metrics for computing the climatology
cl.normal_metrics = ["mean", "median", "min", "max"]

cl.time_step = "dekad"
cl_df = cl.compute_normals(month=7, dekad=3)  # The third dekad of July

cl_df.head(12)
```

|  | sm-mean | dekad | norm-mean | norm-median | norm-min | norm-max |
|---|---|---|---|---|---|---|
| 2007-07-22 | 16.2 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2008-07-21 | 14.87 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2009-07-21 | 18.01 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2010-07-21 | 23.51 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2011-07-21 | 17.28 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2012-07-22 | 14.57 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2013-07-22 | 18.76 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2014-07-22 | 13.21 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2015-07-22 | 17.77 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
| 2016-07-21 | 11.9 | 3 | 17.24 | 17.43 | 11.74 | 23.73 |
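Each name in `normal_metrics` corresponds to a standard summary statistic applied to the per-year values of one period, and each result appears as a `norm-<metric>` column. The following sketch shows one way to dispatch metric names to functions with the standard library; the `METRICS` mapping and `compute_norms` are hypothetical, the input values are made up, and treating `"std"` as the sample standard deviation is an assumption.

```python
from statistics import mean, median, stdev

# Hypothetical mapping from metric names (as used in `normal_metrics`)
# to functions applied to the per-year values of one period.
METRICS = {
    "mean": mean,
    "median": median,
    "std": stdev,  # sample standard deviation (assumption)
    "min": min,
    "max": max,
}


def compute_norms(values, metrics):
    """Return {"norm-<metric>": value} for each requested metric."""
    unknown = set(metrics) - set(METRICS)
    if unknown:
        raise ValueError(f"Unsupported metrics: {sorted(unknown)}")
    return {f"norm-{m}": METRICS[m](values) for m in metrics}


# Made-up per-year values for one period (e.g. one dekad of the year)
yearly = [20.0, 35.0, 30.0, 45.0]
print(compute_norms(yearly, ["mean", "median", "min", "max"]))
# {'norm-mean': 32.5, 'norm-median': 32.5, 'norm-min': 20.0, 'norm-max': 45.0}
```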


### 3.3 Filling the gaps and smoothing the time series data (optional)

```python
# Fill the gaps and smooth the time series before computing the normals

cl.fillna = True
cl.fillna_window_size = 3  # Window size (days) used to fill gaps with the mean of the values in the window

cl.smoothing = True
cl.smooth_window_size = 31  # Size of the moving-average window

cl.time_step = "dekad"
cl.normal_metrics = ["mean", "median"]
cl_df = cl.compute_normals()

cl_df
```

|  | sm-mean | dekad | norm-mean | norm-median |
|---|---|---|---|---|
| 2007-01-01 | 31.01 | 1 | 56.56 | 53.74 |
| 2007-01-11 | 32.91 | 2 | 56.15 | 55.37 |
| 2007-01-21 | 36.22 | 3 | 54.60 | 55.32 |
| 2007-02-01 | 37.59 | 1 | 51.10 | 50.80 |
| 2007-02-11 | 36.08 | 2 | 46.69 | 46.69 |
| ... | ... | ... | ... | ... |
| 2022-11-11 | 40.30 | 2 | 49.92 | 44.94 |
| 2022-11-21 | 52.38 | 3 | 55.90 | 54.35 |
| 2022-12-01 | 64.36 | 1 | 59.26 | 59.08 |
| 2022-12-11 | 70.12 | 2 | 60.27 | 59.25 |
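Conceptually, the two options act on the time series before the normals are computed. The sketch below assumes a daily series in which `None` marks a missing day, centred windows for both steps (odd window sizes, truncated at the series edges), and gap filling before smoothing. These window semantics are assumptions, and `fill_gaps` and `moving_average` are hypothetical helpers rather than SMADI functions.

```python
from statistics import mean


def fill_gaps(values, window):
    """Replace None entries with the mean of the non-missing original values
    inside a centred window of `window` days (hypothetical helper).
    Gaps with no valid value in the window stay None."""
    half = window // 2
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        neighbours = [x for x in values[max(0, i - half): i + half + 1] if x is not None]
        filled.append(mean(neighbours) if neighbours else None)
    return filled


def moving_average(values, window):
    """Centred moving average over the valid values in each window
    (windows are truncated at the series edges; hypothetical helper)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = [x for x in values[max(0, i - half): i + half + 1] if x is not None]
        out.append(mean(chunk) if chunk else None)
    return out


daily = [10.0, None, 14.0, 16.0, None, None, 22.0]
filled = fill_gaps(daily, window=3)
print(filled)  # [10.0, 12.0, 14.0, 16.0, 16.0, 22.0, 22.0]
print(moving_average(filled, window=3))
```

In this sketch, a missing day is filled only if a valid value lies within `window // 2` days of it; otherwise it stays `None`, and the smoothing step then averages only the valid values in its window.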


### 3.4 Computing the normals for a subset of the data

To work on a subset of the data instead of the entire historical record, set the `timespan` attribute of the `Climatology` object to a `(start_date, end_date)` tuple. The normals are then computed only from observations within that period, which allows focused analysis of a specific time frame.

```python
# Set the start and end dates for the climatology with the `timespan` attribute
cl.timespan = ("2010-01-01", "2020-12-31")  # (start_date, end_date)
cl.time_step = "month"
cl.normal_metrics = ["mean", "median", "min", "max"]

# Note: fillna and smoothing are still enabled from the previous cell
cl_df = cl.compute_normals(month=1)
cl_df
```

|  | sm-mean | norm-mean | norm-median | norm-min | norm-max |
|---|---|---|---|---|---|
| 2010-01-31 | 76.25 | 55.39 | 54.38 | 35.16 | 76.25 |
| 2011-01-31 | 61.99 | 55.39 | 54.38 | 35.16 | 76.25 |
| 2012-01-31 | 40.92 | 55.39 | 54.38 | 35.16 | 76.25 |
| 2013-01-31 | 55.06 | 55.39 | 54.38 | 35.16 | 76.25 |
| 2014-01-31 | 54.38 | 55.39 | 54.38 | 35.16 | 76.25 |
| 2015-01-31 | 64.39 | 55.39 | 54.38 | 35.16 | 76.25 |
| 2016-01-31 | 35.16 | 55.39 | 54.38 | 35.16 | 76.25 |
| 2017-01-31 | 52.05 | 55.39 | 54.38 | 35.16 | 76.25 |
| 2018-01-31 | 68.01 | 55.39 | 54.38 | 35.16 | 76.25 |
| 2019-01-31 | 50.95 | 55.39 | 54.38 | 35.16 | 76.25 |
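Restricting the computation to a time span amounts to filtering the records before any aggregation. Here is a minimal sketch, assuming both bounds are inclusive and given as ISO `YYYY-MM-DD` strings as in the `timespan` tuple above; `within_timespan` is a hypothetical helper and the records are made-up values.

```python
from datetime import date


def within_timespan(records, timespan):
    """Keep (date, value) records whose date lies in the inclusive
    (start, end) window given as ISO 'YYYY-MM-DD' strings (hypothetical helper)."""
    start, end = (date.fromisoformat(s) for s in timespan)
    if start > end:
        raise ValueError("start date must not be after end date")
    return [(d, v) for d, v in records if start <= d <= end]


records = [
    (date(2009, 12, 31), 1.0),
    (date(2010, 1, 1), 2.0),
    (date(2020, 12, 31), 3.0),
    (date(2021, 1, 1), 4.0),
]
print(within_timespan(records, ("2010-01-01", "2020-12-31")))
# [(datetime.date(2010, 1, 1), 2.0), (datetime.date(2020, 12, 31), 3.0)]
```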
