# Time selection for gridded datasets

A gridded dataset can be subset by time in two ways:
- Select data within a given time range.
- Select data conditionally (e.g., only certain seasons or only daytime data) by 
  restricting to certain years, months, days of year, or hours of the day.

In [1]:
import pymovebank as pmv 
import xarray as xr 
import pandas as pd 

In [2]:
# Print the start and end time in the dataset
def print_dataset_start_end(ds):
    print(f"Dataset start: {ds.time.min().values}")
    print(f"Dataset end: {ds.time.max().values}")

In [3]:
# ECMWF dataset 
filein = pmv.get_path("ECMWF_subset.nc")
ds = xr.load_dataset(filein)

In [4]:
print_dataset_start_end(ds)

Dataset start: 2008-01-01T00:00:00.000000000
Dataset end: 2008-12-31T23:00:00.000000000


## Selecting data within a certain time range 

``select_time_range`` selects data within a time range defined by a start time and an 
end time. If the start or end time is omitted, the function defaults to the earliest or 
latest time in the dataset, respectively. 

In [5]:
# Selecting a time slice 
ds2 = pmv.select_time_range(ds, start_time='2008-02-01 05:00', end_time='2008-03-01 13:00')
print_dataset_start_end(ds2)

Dataset start: 2008-02-01T05:00:00.000000000
Dataset end: 2008-03-01T13:00:00.000000000


In [6]:
# Selecting a time slice - give only start time 
ds2 = pmv.select_time_range(ds, start_time='2008-02-01')
print_dataset_start_end(ds2)

Dataset start: 2008-02-01T00:00:00.000000000
Dataset end: 2008-12-31T23:00:00.000000000


In [7]:
# Selecting a time slice - give only end time 
# (a date-only end time includes the whole of that day, as the output shows)
ds2 = pmv.select_time_range(ds, end_time='2008-01-11')
print_dataset_start_end(ds2)

Dataset start: 2008-01-01T00:00:00.000000000
Dataset end: 2008-01-11T23:00:00.000000000
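
For comparison, the same three selections can be sketched with plain xarray label-based slicing. This is only an illustration on synthetic data, not a description of how ``select_time_range`` is implemented; the dataset `demo` and the variable name `t2m` are made up for the example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly data covering 2008 (a stand-in for the ECMWF subset)
time = pd.date_range("2008-01-01 00:00", "2008-12-31 23:00", freq=pd.Timedelta(hours=1))
demo = xr.Dataset({"t2m": ("time", np.arange(time.size, dtype=float))}, coords={"time": time})

# Both bounds given: label-based slicing includes both endpoints
both = demo.sel(time=slice("2008-02-01 05:00", "2008-03-01 13:00"))

# Open-ended slices: None stands for the earliest or latest available time
from_start = demo.sel(time=slice("2008-02-01", None))
to_end = demo.sel(time=slice(None, "2008-01-11"))

for name, sub in [("both", both), ("start only", from_start), ("end only", to_end)]:
    print(name, sub.time.values[0], sub.time.values[-1])
```

As in the pymovebank output above, a date-only end bound such as `"2008-01-11"` keeps every time step on that day.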


## Conditional selection 

``select_time_cond`` is used to select data from certain years, months, days of 
year, or hours of day. These conditions can be applied in combination, and can be 
specified as either a list of specific values or as a range. 

In [8]:
# MODIS (MOD13A1) dataset 
filein = pmv.get_path("MOD13A1.006_500m_aid0001_all.nc")
ds = xr.load_dataset(filein)
print_dataset_start_end(ds)

Dataset start: 2000-02-18 00:00:00
Dataset end: 2009-02-18 00:00:00


The function can be used to select a list of specific (non-consecutive) years:

In [9]:
ds2 = pmv.select_time_cond(ds, years=[2000, 2005])

# Years in the resulting dataset
pd.unique(ds2.time.dt.year)

array([2000, 2005])

A range of years can also be specified; both endpoints are included:

In [10]:
ds2 = pmv.select_time_cond(ds, year_range=[2001, 2004])

# Years in the resulting dataset
pd.unique(ds2.time.dt.year)

array([2001, 2002, 2003, 2004])

A list of specific values and a range can be combined, in which case the selection is their union:

In [11]:
ds2 = pmv.select_time_cond(ds, months=[1, 2], month_range=[10, 12])

# Months in the resulting dataset
sorted(pd.unique(ds2.time.dt.month))

[1, 2, 10, 11, 12]
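
The union of a value list and an inclusive range can be reproduced with xarray's `.dt` accessor and `isin`. The sketch below uses a synthetic daily dataset (`demo` and `ndvi` are hypothetical names) and is an assumption about equivalent behaviour, not pymovebank's actual code.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily data for one year (a stand-in for the MODIS subset)
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
demo = xr.Dataset({"ndvi": ("time", np.zeros(time.size))}, coords={"time": time})

months = [1, 2]          # specific values
month_range = [10, 12]   # inclusive range: October through December

# Union of the explicit list and the expanded range
wanted = sorted(set(months) | set(range(month_range[0], month_range[1] + 1)))

# Boolean mask over the time axis, then positional selection with that mask
mask = demo.time.dt.month.isin(wanted)
demo_sel = demo.isel(time=mask.values)

print(np.unique(demo_sel.time.dt.month.values).tolist())
```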

In [12]:
# Switch back to the hourly ECMWF dataset 
filein = pmv.get_path("ECMWF_subset.nc")
ds = xr.load_dataset(filein)

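Conditions on different time components (here year, day of year, and hour of day) can also be combined; only times that satisfy all of them are kept: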

In [13]:
ds2 = pmv.select_time_cond(
    ds, years=[2008], dayofyear_range=[209, 220], hour_range=[10, 15]
)

In [14]:
# Days of year in the resulting dataset
sorted(pd.unique(ds2.time.dt.dayofyear))

[209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220]

In [15]:
# Hours of day in the resulting dataset
sorted(pd.unique(ds2.time.dt.hour))

[10, 11, 12, 13, 14, 15]
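
A combined selection like this one can be approximated in plain xarray by AND-ing one boolean mask per time component. This is a minimal sketch on synthetic hourly data, assuming inclusive range bounds as seen in the outputs above; `demo` and `t2m` are hypothetical names, and the code is not taken from pymovebank.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly data covering 2008 (a stand-in for the ECMWF subset)
time = pd.date_range("2008-01-01 00:00", "2008-12-31 23:00", freq=pd.Timedelta(hours=1))
demo = xr.Dataset({"t2m": ("time", np.zeros(time.size))}, coords={"time": time})

t = demo.time.dt
# One mask per condition; a time step must pass all of them
mask = (
    t.year.isin([2008])
    & (t.dayofyear >= 209) & (t.dayofyear <= 220)
    & (t.hour >= 10) & (t.hour <= 15)
)
demo_sel = demo.isel(time=mask.values)

print(np.unique(demo_sel.time.dt.dayofyear.values).tolist())
print(np.unique(demo_sel.time.dt.hour.values).tolist())
```

With 12 days and 6 hours per day, the synthetic selection contains 72 time steps.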