# Defining the time periods for the creation of products

There are several ways to define the periods over which the climatologies are computed.  
In this notebook we present different approaches:
1. `TimeSelectorYW`,
2. `TimeSelectorYearListMonthList` and 
3. `TimeSelectorRunningAverage`.     

The most common is `TimeSelectorYearListMonthList` (which behaves similarly to the `yearlist` and `monthlist` files in the Fortran version of `DIVA`).

In [1]:
using DIVAnd
using Dates
using Statistics

## Specify lists of months and of years
Use `TimeSelectorYearListMonthList`.      
Let's work on two time periods, 1970-1990 and 1991-2010, on a monthly basis.

In [2]:
yearlist = [1970:1990, 1991:2010];
monthlists = 1:12;
TS1 = DIVAnd.TimeSelectorYearListMonthList(yearlist, monthlists)

TimeSelectorYearListMonthList{Vector{UnitRange{Int64}}, UnitRange{Int64}}(UnitRange{Int64}[1970:1990, 1991:2010], 1:12)

In [3]:
@show length(TS1);

length(TS1) = 24


Another example, with a single year range and four seasons:

In [4]:
yearlist = [1900:2017]
monthlist = [1:3,4:6,7:9,10:12]
TS1b = DIVAnd.TimeSelectorYearListMonthList(yearlist,monthlist);
@show length(TS1b);

length(TS1b) = 4


Assume that we have a time vector with these dates:

In [5]:
obstime = [DateTime(2001,4,1),DateTime(2002,2,1),DateTime(2018,3,1)]

3-element Vector{DateTime}:
 2001-04-01T00:00:00
 2002-02-01T00:00:00
 2018-03-01T00:00:00

Which observations would be used for the first time slice (winter, January to March)?

In [6]:
sel = DIVAnd.select(TS1b,1,obstime)

3-element BitVector:
 0
 1
 0

In [7]:
obstime[sel]

1-element Vector{DateTime}:
 2002-02-01T00:00:00
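
The selection rule can be pictured with plain `Dates` calls. The sketch below is not the DIVAnd implementation; it assumes that an observation belongs to a time slice when its year falls in the year range and its month falls in the month range, which is consistent with the result above. The helper name `in_slice` is hypothetical.

```julia
using Dates

# Hypothetical helper (not part of DIVAnd): true when `t` falls in both
# the year range and the month range of a time slice.
in_slice(t, yearrange, monthrange) =
    (Dates.year(t) in yearrange) && (Dates.month(t) in monthrange)

obstime = [DateTime(2001,4,1), DateTime(2002,2,1), DateTime(2018,3,1)]

# First slice of TS1b: years 1900:2017 and months 1:3
sel_sketch = [in_slice(t, 1900:2017, 1:3) for t in obstime]
```

Only the second date passes both tests: April 2001 is outside the months 1:3, and March 2018 is outside the years 1900:2017.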

Note that a time instance at the center of each time interval is given by `DIVAnd.ctimes(TS)`.  
These dates are saved in the NetCDF file together with the `climatology_bounds` from the [NetCDF CF convention](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#climatological-statistics).

In [8]:
DIVAnd.ctimes(TS1b)

4-element Vector{DateTime}:
 1958-02-16T00:00:00
 1958-05-16T00:00:00
 1958-08-16T00:00:00
 1958-11-16T00:00:00

In [9]:
yearlist = [y:y+9 for y in 1950:10:2000]

6-element Vector{UnitRange{Int64}}:
 1950:1959
 1960:1969
 1970:1979
 1980:1989
 1990:1999
 2000:2009

In [10]:
yearlist2 = []
for y in 1950:10:2020
    push!(yearlist2, y:y+9)
end
@show yearlist2

yearlist2 = Any[1950:1959, 1960:1969, 1970:1979, 1980:1989, 1990:1999, 2000:2009, 2010:2019, 2020:2029]


8-element Vector{Any}:
 1950:1959
 1960:1969
 1970:1979
 1980:1989
 1990:1999
 2000:2009
 2010:2019
 2020:2029

Note that every year range spans 10 years because the upper bound is inclusive.  
The last year range of `yearlist` covers these 10 years:

In [11]:
collect(yearlist[end])'

TS = DIVAnd.TimeSelectorYearListMonthList(yearlist,monthlist);

For this time selector, there are now $4 \times 6 = 24$ time slices (4 seasons for each of the 6 decades).

In [12]:
length(TS)

DIVAnd.ctimes(TS)[1:3]

3-element Vector{DateTime}:
 1954-02-16T00:00:00
 1954-05-16T00:00:00
 1954-08-16T00:00:00
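
The order of these center times suggests that the month list varies fastest: the first four slices belong to the first year range, the next four to the second, and so on. Under that assumption (inferred from the output above, not taken from the DIVAnd source), a slice index can be mapped back to its year and month ranges as sketched here. `slice_ranges` is a hypothetical helper.

```julia
yearlist = [y:y+9 for y in 1950:10:2000]
monthlist = [1:3, 4:6, 7:9, 10:12]

# Hypothetical helper: map a slice index n to (year range, month range),
# assuming that the month list varies fastest.
function slice_ranges(n, yearlist, monthlist)
    iyear, imonth = divrem(n - 1, length(monthlist))
    return yearlist[iyear + 1], monthlist[imonth + 1]
end

slice_ranges(1, yearlist, monthlist)   # (1950:1959, 1:3)
slice_ranges(6, yearlist, monthlist)   # (1960:1969, 4:6)
slice_ranges(24, yearlist, monthlist)  # (2000:2009, 10:12)
```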

## Specify lists of months and years with a time window
Use `TimeSelectorYW`.  
Let's work with 10-year windows centered on 1950, 1960, 1970, ..., 2010 (each period extends 5 years on either side of its central year).

In [13]:
years = 1950:10:2010;
yearwindow = 10;
monthlists = 1:12;
TS2 = TimeSelectorYW(years,yearwindow,monthlists)

TimeSelectorYearListMonthList{Vector{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, UnitRange{Int64}}(StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}[1945.0:1.0:1955.0, 1955.0:1.0:1965.0, 1965.0:1.0:1975.0, 1975.0:1.0:1985.0, 1985.0:1.0:1995.0, 1995.0:1.0:2005.0, 2005.0:1.0:2015.0], 1:12)

Note that with `TimeSelectorYW` we can obtain almost the same periods as in the first case (here the year 1990 belongs to both periods):

In [14]:
TS2b = TimeSelectorYW([1980, 2000], 20, monthlists)

TimeSelectorYearListMonthList{Vector{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, UnitRange{Int64}}(StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}[1970.0:1.0:1990.0, 1990.0:1.0:2010.0], 1:12)
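
Judging from the two outputs above, each period spans half a window on either side of the central year, with both ends included. The following sketch reproduces those ranges without DIVAnd. It is inferred from the printed results, not from the library source, and `centered_windows` is a hypothetical name.

```julia
# Hypothetical helper: year ranges of total width `yearwindow`
# centered on each entry of `years` (both bounds included).
centered_windows(years, yearwindow) =
    [(y - yearwindow/2):(y + yearwindow/2) for y in years]

w = centered_windows([1980, 2000], 20)

# w[1] runs from 1970 to 1990 and w[2] from 1990 to 2010,
# so the year 1990 belongs to both periods
shared_1990 = (1990 in w[1]) && (1990 in w[2])

# with inclusive bounds, a 20-year window covers 21 calendar years
nyears = length(w[1])
```

If strictly non-overlapping periods are needed, an explicit year list such as `[1970:1990, 1991:2010]` passed to `TimeSelectorYearListMonthList` avoids the shared boundary year.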

## Specify the total year range and the total window
Unlike in the previous case, the created periods never extend beyond the first and last years of the dataset.  
Thanks to Lennert (VLIZ) for providing the example and the code.     

In [15]:
function yearlists_(dataset_range, total_window_yrs)
    # Example: dataset_range = 2000:2012 and total_window_yrs = 10
    # will return: [2000:2009, 2001:2010, 2002:2011, 2003:2012]

    # number of windows that fit entirely inside dataset_range
    n_windows = length(dataset_range) - total_window_yrs + 1
    a = Array{UnitRange{Int64}, 1}(undef, n_windows)

    for i = 1:n_windows
        # window starting at the i-th year and spanning total_window_yrs years
        a[i] = dataset_range[i]:(dataset_range[i] + total_window_yrs - 1)
    end
    return a
end

yearlists_ (generic function with 1 method)

In [16]:
yearlists = yearlists_(1990:2010, 10);
TS3 = TimeSelectorYearListMonthList(yearlists,monthlists)
@show(TS3.yearlists[1]);
@show(TS3.yearlists[2])

TS3.yearlists[1] = 1990:1999
TS3.yearlists[2] = 1991:2000


1991:2000
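
The same sliding windows can also be written as a comprehension over the admissible start years. This is only an alternative sketch of the function above: `sliding_yearlists` is a hypothetical name, and it assumes that `dataset_range` is a unit range of integer years.

```julia
# Hypothetical alternative to `yearlists_`: one window per admissible start year.
function sliding_yearlists(dataset_range::UnitRange{Int}, total_window_yrs::Int)
    # last start year for which the window still ends inside dataset_range
    last_start = last(dataset_range) - total_window_yrs + 1
    return [y:(y + total_window_yrs - 1) for y in first(dataset_range):last_start]
end

windows = sliding_yearlists(2000:2012, 10)

# check that every window stays inside the dataset range
inside = all(w -> first(w) >= 2000 && last(w) <= 2012, windows)
```

A window longer than the dataset range simply yields an empty list, since no start year is admissible.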

## Time aggregation in climatologies

For a climatology, there are different ways to aggregate data in time. Common ways are:
* monthly climatology, aggregating all observations per month
* seasonal climatology
* yearly climatology
* decadal climatology

If the data coverage is sufficient, one can also make a seasonal climatology per decade, which resolves both the seasonal cycle and long-term changes.
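
Each of these choices corresponds to a particular pair of year and month lists. The values below are illustrative only: the span 1950 to 2009 is an arbitrary assumption, "yearly" is read here as pooling all months, and the slice count assumes one time slice per combination of year range and month range, as in the examples above.

```julia
# Illustrative year lists
allyears = [1950:2009]                     # a single range: all years pooled
decades  = [y:y+9 for y in 1950:10:2000]   # six separate decades

# Illustrative month lists
monthly  = [m:m for m in 1:12]             # one entry per month
seasonal = [1:3, 4:6, 7:9, 10:12]          # four 3-month seasons
yearly   = [1:12]                          # all months pooled

# Number of time slices = (number of year ranges) × (number of month ranges)
nslices(yearlist, monthlist) = length(yearlist) * length(monthlist)

nslices(allyears, monthly)    # 12: monthly climatology
nslices(allyears, seasonal)   # 4:  seasonal climatology
nslices(allyears, yearly)     # 1:  yearly climatology
nslices(decades, yearly)      # 6:  decadal climatology
nslices(decades, seasonal)    # 24: seasonal climatology per decade
```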

### Overlapping years

Sometimes it is desirable to have overlapping year ranges, to make a climatology similar to a running average.  
This can be achieved by a suitable definition of `yearlist`:

In [17]:
yearlist = [y:y+5 for y in 1990:2000]

11-element Vector{UnitRange{Int64}}:
 1990:1995
 1991:1996
 1992:1997
 1993:1998
 1994:1999
 1995:2000
 1996:2001
 1997:2002
 1998:2003
 1999:2004
 2000:2005

Every time slice is a 6-year average of data from the same season, and there are $4 \times 11 = 44$ time slices in this example.

In [18]:
monthlist

4-element Vector{UnitRange{Int64}}:
 1:3
 4:6
 7:9
 10:12

In [19]:
TS = DIVAnd.TimeSelectorYearListMonthList(yearlist,monthlist);
length(TS)

44

Since the year ranges overlap, the same observation is used in multiple time slices:

In [20]:
obstime = [DateTime(2000,1,1)]
for n = 1:length(TS)
    nobs = sum(DIVAnd.select(TS,n,obstime))
    if nobs > 0
        println("$nobs observation(s) are used in time slice $n")
    end
end

1 observation(s) are used in time slice 21
1 observation(s) are used in time slice 25
1 observation(s) are used in time slice 29
1 observation(s) are used in time slice 33
1 observation(s) are used in time slice 37
1 observation(s) are used in time slice 41


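As expected, the observation is used 6 times: once for each of the six year ranges that contain the year 2000 (from 1995:2000 to 2000:2005), each time in the first season (January to March).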
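
This count can be checked without DIVAnd, since it is simply the number of year ranges that contain the year 2000. The slice numbers computed below additionally assume that the month list varies fastest in the slice ordering, which matches the printed output.

```julia
yearlist = [y:y+5 for y in 1990:2000]
nseasons = 4   # number of entries in monthlist

# indices of the year ranges that contain the observation year
iyears = findall(r -> 2000 in r, yearlist)

# January falls in the first season (months 1:3), hence the offset of 1
slices = [(i - 1) * nseasons + 1 for i in iyears]
```

Here `iyears` holds the indices 6 to 11, and `slices` reproduces the slice numbers 21, 25, ..., 41 listed above.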