# Time aggregation

For a climatology, there are different ways to aggregate data in time. Common ways are:
* monthly climatology, aggregating all observations per month
* seasonal climatology
* yearly climatology
* decadal climatology

If the data coverage is sufficient, one can also make a seasonal climatology per decade, which allows one to resolve both the seasonal cycle and long-term changes.

In `DIVAnd`, the temporal aggregation is represented by a structure called a time selector. The most common one is `TimeSelectorYearListMonthList`, which behaves similarly to the `yearlist` and `monthlist` files of the Fortran version of DIVA.

```julia
using Dates
using DIVAnd
```

```julia
?TimeSelectorYearListMonthList
```


```
TS = TimeSelectorYearListMonthList(yearlists,monthlists)
```

The structure `TS` handles the time aggregation based on `yearlists` and `monthlists`. `yearlists` is a vector of ranges (containing start and end years), for example `[1980:1990,1990:2000,2000:2010]`.

`monthlists` is a vector of ranges (containing start and end months), for example `[1:3,4:6,7:9,10:12]`.

If a month range spans beyond December, then all months must be listed explicitly, for example `[2:4,5:6,7:9,[10,11,12,1]]` or `[2:4,5:6,7:9,[10:12;1]]`. Using `[2:4,5:6,7:9,10:1]` instead is a bug: `10:1` is an empty range in Julia, so this season would contain no month at all.
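The pitfall with `10:1` comes from plain Julia range semantics and can be checked without `DIVAnd`. The variable names below are only for illustration:

```julia
# In Julia a range whose stop is smaller than its start is empty,
# so 10:1 contains no month at all.
bad = 10:1
@show isempty(bad)          # true
@show (1 in bad)            # false

# Two equivalent ways to list October to January explicitly:
months_a = [10, 11, 12, 1]
months_b = [10:12; 1]       # vertical concatenation of the range 10:12 and the month 1
@show months_a == months_b  # true
```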

## Example

```julia
using DIVAnd

# seasonal climatology using all data from 1900 to 2017
# for winter (December to February), spring, summer and autumn
TS = DIVAnd.TimeSelectorYearListMonthList([1900:2017],[[12,1,2],[3,4,5],[6,7,8],[9,10,11]])
```


```julia
yearlist = [1900:2017]
monthlist = [1:3,4:6,7:9,10:12]

TS = DIVAnd.TimeSelectorYearListMonthList(yearlist,monthlist)
```

```
TimeSelectorYearListMonthList{Array{UnitRange{Int64},1},Array{UnitRange{Int64},1}}(UnitRange{Int64}[1900:2017], UnitRange{Int64}[1:3, 4:6, 7:9, 10:12])
```

The number of time instances defined in this time selector is 4:

```julia
length(TS)
```

```
4
```

Assume that we have a time vector with these dates:

```julia
obstime = [DateTime(2001,4,1),DateTime(2002,2,1),DateTime(2018,3,1)]
```

```
3-element Array{DateTime,1}:
 2001-04-01T00:00:00
 2002-02-01T00:00:00
 2018-03-01T00:00:00
```

Which observations would be used for the first time slice, i.e. the winter season (January to March)?

```julia
sel = DIVAnd.select(TS,1,obstime)
```

```
3-element BitArray{1}:
 0
 1
 0
```

```julia
obstime[sel]
```

```
1-element Array{DateTime,1}:
 2002-02-01T00:00:00
```

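Note that the observation from April 2001 is excluded because April is not in the month range `1:3`, and the observation from March 2018 is excluded because 2018 lies outside the year range `1900:2017`.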
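The selection rule described above can be mimicked with a plain membership test on year and month. The following is a minimal sketch assuming that rule; `select_sketch` is a hypothetical helper for illustration, not DIVAnd's implementation of `DIVAnd.select`:

```julia
using Dates

# Hypothetical re-implementation for illustration only (not DIVAnd's code):
# an observation belongs to a time slice when its year lies in the year range
# AND its month lies in the month range (a range or an explicit vector of months).
function select_sketch(yearrange, monthrange, obstime::AbstractVector{DateTime})
    return [year(t) in yearrange && month(t) in monthrange for t in obstime]
end

obstime = [DateTime(2001,4,1), DateTime(2002,2,1), DateTime(2018,3,1)]

# first time slice: years 1900 to 2017, months January to March
sel = select_sketch(1900:2017, 1:3, obstime)
@show sel
@show obstime[sel]
```

Because membership uses `in`, an explicit month vector such as `[12,1,2]` works the same way as a range in this sketch.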

The time instance at the "center" of each time interval is given by `DIVAnd.ctimes(TS)`. These dates are saved in the NetCDF file together with the `climatology_bounds`, as described in the [NetCDF CF convention](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#climatological-statistics).

```julia
DIVAnd.ctimes(TS)
```

```
4-element Array{Dates.DateTime,1}:
 1958-01-16T00:00:00
 1958-04-16T00:00:00
 1958-07-16T00:00:00
 1958-10-16T00:00:00
```
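The year in these dates coincides with the integer midpoint of the year range: 1958 for `1900:2017`, and 1954 for the `1950:1959` decade used further on. The sketch below only reproduces that arithmetic as an observation about the output; it is not DIVAnd's implementation of `ctimes`, and the name `midyear` is hypothetical:

```julia
# Integer midpoint of an inclusive year range; ÷ is integer division.
midyear(r::AbstractUnitRange{<:Integer}) = (first(r) + last(r)) ÷ 2

@show midyear(1900:2017)  # 1958
@show midyear(1950:1959)  # 1954
```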

```julia
yearlist = [y:y+9 for y in 1950:10:2000]
```

```
6-element Array{UnitRange{Int64},1}:
 1950:1959
 1960:1969
 1970:1979
 1980:1989
 1990:1999
 2000:2009
```

Note that every year range spans 10 years because the upper bound is inclusive. For example, the last year range covers the 10 years from 2000 to 2009:

```julia
collect(yearlist[end])'
```

```
1×10 LinearAlgebra.Adjoint{Int64,Array{Int64,1}}:
 2000  2001  2002  2003  2004  2005  2006  2007  2008  2009
```
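Since `a:b` includes both endpoints, it has `b - a + 1` elements. The following pure-Julia check (no `DIVAnd` needed; variable names are illustrative) confirms that the decades above do not share any year, whereas ranges written as `y:y+10` would share their boundary years, as in the docstring example `[1980:1990,1990:2000,2000:2010]`:

```julia
# Non-overlapping decades, as defined above.
decades = [y:y+9 for y in 1950:10:2000]
@assert all(length(r) == 10 for r in decades)

# Consecutive decades have an empty intersection ...
@assert all(isempty(intersect(decades[k], decades[k+1])) for k in 1:length(decades)-1)

# ... whereas y:y+10 makes neighboring ranges share their boundary year.
overlapping = [y:y+10 for y in 1950:10:2000]
@assert 1960 in overlapping[1] && 1960 in overlapping[2]
```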

```julia
TS = DIVAnd.TimeSelectorYearListMonthList(yearlist,monthlist);
```

For this time selector, there are now $4 \times 6 = 24$ time slices (4 seasons for each of the 6 decades):

```julia
length(TS)
```

```
24
```

```julia
DIVAnd.ctimes(TS)[1:3]
```

```
3-element Array{DateTime,1}:
 1954-01-16T00:00:00
 1954-04-16T00:00:00
 1954-07-16T00:00:00
```
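The first three center times all belong to the 1950s, which suggests that the month ranges vary fastest within each year range. Assuming this ordering (it is also consistent with the slice numbers printed in the overlapping-years example), a flat time-slice index `n` can be converted to a pair (year-range index, month-range index). `flat_index` and `slice_indices` are hypothetical helpers, not part of DIVAnd:

```julia
# Hypothetical helpers (not part of DIVAnd), assuming month ranges vary
# fastest within each year range.
flat_index(j, i, nmonths) = (j - 1) * nmonths + i
slice_indices(n, nmonths) = (div(n - 1, nmonths) + 1, mod(n - 1, nmonths) + 1)

nmonths = 4   # [1:3, 4:6, 7:9, 10:12]
nyears  = 6   # 1950:1959, ..., 2000:2009
@assert nyears * nmonths == 24

@show slice_indices(1, nmonths)   # (1, 1): 1950s, January to March
@show slice_indices(3, nmonths)   # (1, 3): 1950s, July to September
@show slice_indices(24, nmonths)  # (6, 4): 2000s, October to December
```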

## Overlapping years

Sometimes it is desirable to have overlapping year ranges, to make a climatology similar to a running average. This can be achieved by a suitable definition of `yearlist`:

```julia
yearlist = [y:y+5 for y in 1990:2000]
```

```
11-element Array{UnitRange{Int64},1}:
 1990:1995
 1991:1996
 1992:1997
 1993:1998
 1994:1999
 1995:2000
 1996:2001
 1997:2002
 1998:2003
 1999:2004
 2000:2005
```

Every time slice is a 6-year average of data from the same season, and there are $4 \times 11 = 44$ time slices in this example.
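As with any running average, the number of windows containing a given year is smaller near the ends of the period. This plain-Julia sketch (the helper `nwindows` is hypothetical) counts, for a few years, how many of the 6-year ranges above contain them:

```julia
# Overlapping 6-year windows, as in the yearlist above.
yearlist = [y:y+5 for y in 1990:2000]

# Number of windows that contain a given year.
nwindows(yr) = count(r -> yr in r, yearlist)

@assert length(yearlist) == 11
@assert all(length(r) == 6 for r in yearlist)

@show nwindows(1990)  # 1: only 1990:1995
@show nwindows(2000)  # 6: 1995:2000 up to 2000:2005
@show nwindows(2005)  # 1: only 2000:2005
```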

```julia
TS = DIVAnd.TimeSelectorYearListMonthList(yearlist,monthlist);
length(TS)
```

```
44
```

Since the year ranges overlap, the same observation is used in multiple time instances:

```julia
obstime = [DateTime(2000,1,1)]
for n = 1:length(TS)
    nobs = sum(DIVAnd.select(TS,n,obstime))
    if nobs > 0
        println("$nobs observation(s) are used in time slice $n")
    end
end
```

```
1 observation(s) are used in time slice 21
1 observation(s) are used in time slice 25
1 observation(s) are used in time slice 29
1 observation(s) are used in time slice 33
1 observation(s) are used in time slice 37
1 observation(s) are used in time slice 41
```


As expected, the observation is used 6 times, once for each of the 6 year ranges that contain the year 2000 (from `1995:2000` to `2000:2005`).
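The slice numbers printed above can be reproduced without `DIVAnd`, under two assumptions: an observation is selected when its year is in the year range and its month is in the month range, and month ranges vary fastest within each year range. This is a minimal sketch under those assumptions, not DIVAnd's implementation:

```julia
using Dates

yearlist  = [y:y+5 for y in 1990:2000]
monthlist = [1:3, 4:6, 7:9, 10:12]
t = DateTime(2000,1,1)

# Flat slice index (j - 1) * nmonths + i for every (year range j, month range i)
# that contains the observation; j is the outer loop, i the inner loop.
slices = [(j - 1) * length(monthlist) + i
          for j in eachindex(yearlist) for i in eachindex(monthlist)
          if year(t) in yearlist[j] && month(t) in monthlist[i]]

@show slices
```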