# How-to guide
This guide will show how to carry out key nchack operations. We will use a sea surface temperature data set and a depth-resolved ocean temperature data set.
The data sets can be downloaded using wget as follows:

In [None]:
import nchack as nc
import pandas as pd
import xarray as xr


In [None]:
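# clear_output is provided by IPython and hides the wget progress output
from IPython.display import clear_output
! wget ftp://ftp.cdc.noaa.gov/Datasets/COBE2/sst.mon.mean.nc
clear_output()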

## How to select years and months
If we want to select specific years and months, we can use the select_years and select_months methods.

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.select_years(1960)
tracker.select_months(1)
tracker.times()

## How to copy a data set
If you want to make a deep copy of a data set, use the built-in copy method. This method will return a new data set. Importantly, this method will also register the current state of the data set in a list of "safe files" that is only available to the module. This ensures that temporary files are deleted correctly.

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.select_years(1960)
tracker.select_months(1)
tracker1 = tracker.copy()
del tracker
tracker1.mean()
tracker1.to_xarray().sst.plot()

## How to clip to a region 
If you want to clip the data to a specific longitude and latitude box, you can use clip, with the longitude and latitude ranges given by lon and lat.

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.clip(lon = [-80, 20], lat = [40, 70])
tracker.to_xarray().sst.mean(dim = "time").plot()

## How to rename a variable
If we want to rename a variable, we use the rename method and supply a dictionary where the key-value pairs are the original and new names.

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.variables
tracker.rename({"sst": "temperature"})
tracker.variables

## How to create new variables
New variables can be created from arithmetic operations using either mutate or transmute. The mutate method will maintain the original variables, whereas transmute will not.
Both methods require a dictionary, where the key-value pairs are the names of the new variables and the arithmetic operations to perform.
The example below shows how to create a new variable with SST in kelvin.

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.mutate({"sst_k": "sst+273.15"})
tracker.variables
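
To make the difference between mutate and transmute concrete, here is a minimal pure-Python sketch that mimics the behaviour described above on a plain dictionary. It is an illustration only, not how nchack works internally: the helpers `mutate_vars` and `transmute_vars` are hypothetical, and the operations are passed as functions rather than strings.

```python
# Illustration only: a plain dictionary stands in for the variables in a data set.
# mutate_vars and transmute_vars are hypothetical helpers, not part of nchack.

def mutate_vars(variables, new_vars):
    """Return the original variables plus the newly derived ones."""
    result = dict(variables)
    for name, func in new_vars.items():
        result[name] = func(variables)
    return result


def transmute_vars(variables, new_vars):
    """Return only the newly derived variables."""
    return {name: func(variables) for name, func in new_vars.items()}


data = {"sst": [0.0, 10.0, 25.5]}
# Convert degrees Celsius to kelvin, mirroring the "sst+273.15" expression
to_kelvin = {"sst_k": lambda v: [x + 273.15 for x in v["sst"]]}

mutated = mutate_vars(data, to_kelvin)
transmuted = transmute_vars(data, to_kelvin)

print(sorted(mutated))     # original and new variable names
print(sorted(transmuted))  # new variable name only
```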

## How to calculate a spatial average
You can calculate a spatial average using the spatial_mean method. There are additional methods for maximums etc.

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.spatial_mean()
tracker.to_xarray().sst.plot()
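
On a regular longitude/latitude grid, cells shrink towards the poles, so a spatial average is usually weighted by cell area. The sketch below shows that idea in pure Python using cos(latitude) weights. Whether spatial_mean applies exactly this weighting is an assumption here, and `weighted_spatial_mean` is a hypothetical helper.

```python
import math


def weighted_spatial_mean(field, lats):
    """Area-weighted mean of field[lat_index][lon_index] on a regular grid.

    Each row is weighted by cos(latitude), which is proportional to the
    cell area on a regular longitude/latitude grid.
    """
    total = 0.0
    weight_sum = 0.0
    for row, lat in zip(field, lats):
        weight = math.cos(math.radians(lat))
        for value in row:
            total += weight * value
            weight_sum += weight
    return total / weight_sum


# Two latitude rows with two longitudes each (made-up values)
lats = [0.0, 60.0]
field = [[20.0, 20.0], [2.0, 2.0]]

mean_sst = weighted_spatial_mean(field, lats)
print(mean_sst)
```

The unweighted mean of these four values is 11, whereas the weighted mean is pulled towards the equatorial row because those cells cover more area.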

## How to calculate an annual mean
You can calculate an annual mean using the annual_mean method.

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.spatial_mean()
tracker.annual_mean()
tracker.to_xarray().sst.plot()

## How to calculate a rolling average
You can calculate a rolling mean using the rolling_mean method, with the window argument providing the number of time steps to average over. There are additional methods for rolling sums etc.
The code below will calculate a rolling mean of global SST using a 20-year window.

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.spatial_mean()
tracker.annual_mean()
tracker.rolling_mean(20)
tracker.to_xarray().sst.plot()
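
As a rough illustration of what the window means, the following pure-Python sketch averages each run of consecutive values. It is not nchack code, the function name is hypothetical, and it assumes the result only contains complete windows, so the output is shorter than the input.

```python
def rolling_mean(values, window):
    """Mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Five made-up annual means smoothed with a 3-step window
annual = [10.0, 11.0, 12.0, 13.0, 14.0]
smoothed = rolling_mean(annual, 3)
print(smoothed)
```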

## How to calculate temporal anomalies
You can calculate annual temporal anomalies using the annual_anomaly method. This requires a baseline period.

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.spatial_mean()
tracker.annual_anomaly(baseline = list(range(1960, 1979)))
tracker.to_xarray().anomaly.plot()
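
Two details are worth spelling out. First, Python's range excludes its end point, so `list(range(1960, 1979))` covers 1960 to 1978. Second, the sketch below assumes an anomaly is simply the annual mean minus the mean over the baseline years. The function `annual_anomalies` is a hypothetical pure-Python illustration, not the nchack implementation.

```python
def annual_anomalies(annual_means, baseline_years):
    """Subtract the baseline-period mean from every annual mean."""
    baseline = [annual_means[y] for y in baseline_years if y in annual_means]
    baseline_mean = sum(baseline) / len(baseline)
    return {year: value - baseline_mean for year, value in annual_means.items()}


# The baseline used in the guide: range() stops before 1979
guide_baseline = list(range(1960, 1979))
print(guide_baseline[0], guide_baseline[-1])

# Made-up annual means with a two-year baseline
annual_means = {1960: 10.0, 1961: 12.0, 1980: 14.0}
anomalies = annual_anomalies(annual_means, [1960, 1961])
print(anomalies)
```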

## How to split data by year etc
Files within a dataset can be split by year (split_year), day (split_day), year and month (split_year_month) or season (split_season). If we want to split by year, we can use the split_year method.

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.split_year()
tracker.size

## How to merge files in time
We can merge files based on time using merge_time. The code below splits the netcdf files by year and then merges them using merge_time.

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.split_year()
tracker.size
tracker.merge_time()
tracker.size
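
Conceptually, splitting by year groups the time steps into one piece per year, and merging in time concatenates the pieces back into a single ordered series. The pure-Python sketch below shows this round trip on a list of dates. The helpers `split_by_year` and `merge_in_time` are hypothetical and work on dates rather than netcdf files.

```python
from collections import defaultdict
from datetime import date


def split_by_year(steps):
    """Group time steps into one list per year, ordered by year."""
    groups = defaultdict(list)
    for step in steps:
        groups[step.year].append(step)
    return [groups[year] for year in sorted(groups)]


def merge_in_time(groups):
    """Concatenate the groups into a single time-ordered list."""
    return sorted(step for group in groups for step in group)


# Two years of monthly time steps
steps = [date(year, month, 1) for year in (1999, 2000) for month in range(1, 13)]

pieces = split_by_year(steps)
merged = merge_in_time(pieces)
print(len(pieces), len(merged))
```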

## How to do variables based merging
If we have two or more files that have the same time steps, but different variables, we can merge them using merge.
The code below will first create a dataset with a netcdf file with sst in K. It will then create a new dataset from this netcdf file and the original, and then merge them.

In [None]:
tracker1 = nc.open_data("sst.mon.mean.nc")
tracker2 = nc.open_data("sst.mon.mean.nc")
tracker2.transmute({"sst_k": "sst+273.15"})
new_tracker = nc.open_data([tracker1.current, tracker2.current])
new_tracker.current
new_tracker.merge()
new_tracker.variables

In some cases we will have two or more datasets we want to merge. In this case we can use the merge function as follows:

In [None]:
tracker1 = nc.open_data("sst.mon.mean.nc")
tracker2 = nc.open_data("sst.mon.mean.nc")
tracker2.transmute({"sst_k": "sst+273.15"})
new_tracker = nc.merge(tracker1, tracker2)
new_tracker.variables

## How to horizontally regrid data 
Variables can be regridded horizontally using regrid. This method requires the new grid to be defined. This can be a pandas data frame with lon/lat as columns, an xarray object, a netcdf file or a dataset.
I will demonstrate all four options by regridding SST to the North Atlantic.
Let's begin by getting a grid for the North Atlantic.

In [None]:
new_grid = nc.open_data("sst.mon.mean.nc")
new_grid.clip(lon = [-80, 20], lat = [30, 70])
new_grid.select_months(1)
new_grid.select_years(2000)

First, we will use the new dataset itself to do the regridding. I will calculate mean SST using the original data, and then regrid to the North Atlantic.

In [None]:
%%time
tracker = nc.open_data("sst.mon.mean.nc")
tracker.mean()
tracker.regrid(grid = new_grid)
tracker.to_xarray().sst.plot()

We can also do this using the netcdf file itself, whose path is given by new_grid.current.

In [None]:
%%time
tracker = nc.open_data("sst.mon.mean.nc")
tracker.mean()
tracker.regrid(grid = new_grid.current)
tracker.to_xarray().sst.plot()

In a similar way we can read the new_grid in as an xarray data set.

In [None]:
%%time
na_grid = xr.open_dataset(new_grid.current)
tracker = nc.open_data("sst.mon.mean.nc")
tracker.mean()
tracker.regrid(grid = na_grid)
tracker.to_xarray().sst.plot()

Finally, we can use a pandas data frame. In this case I will convert the xarray data set to a data frame.

In [None]:
%%time
na_grid = xr.open_dataset(new_grid.current)
na_grid = na_grid.to_dataframe().reset_index().loc[:,["lon", "lat"]]
tracker = nc.open_data("sst.mon.mean.nc")
tracker.mean()
tracker.regrid(grid = na_grid)
tracker.to_xarray().sst.plot()
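
A lon/lat grid can also be built from scratch instead of being derived from an existing file. The sketch below builds the two columns in pure Python for the same North Atlantic box. The helper `lonlat_columns` and the 10 degree spacing are assumptions made for illustration, and the guide does not state which spacings regrid accepts.

```python
from itertools import product


def lonlat_columns(lon_min, lon_max, lat_min, lat_max, step):
    """Return lon and lat columns covering every point of a regular grid."""
    n_lon = int(round((lon_max - lon_min) / step)) + 1
    n_lat = int(round((lat_max - lat_min) / step)) + 1
    lons = [lon_min + i * step for i in range(n_lon)]
    lats = [lat_min + j * step for j in range(n_lat)]
    # One row per (lon, lat) combination
    pairs = list(product(lons, lats))
    return {"lon": [p[0] for p in pairs], "lat": [p[1] for p in pairs]}


# Same box as the clip above, on a coarse (hypothetical) 10 degree grid
columns = lonlat_columns(-80, 20, 30, 70, 10)
print(len(columns["lon"]))
```

Wrapping the result with `pd.DataFrame(columns)` gives a data frame with lon and lat columns, which is the shape of data frame used in the cell above.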

## How to temporally interpolate
Temporal interpolation can be carried out using time_interp. This method requires a start date (start) of the format YYYY/MM/DD and an end date (end), and a temporal resolution (resolution), which is either 1 day ("daily"), 1 week ("weekly"), 1 month ("monthly"), or 1 year ("yearly"). 

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.time_interp(start = "1990/01/01", end = "1990/12/31", resolution = "daily")
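
The sketch below illustrates the two ingredients of this step in pure Python: expanding a start and end date in YYYY/MM/DD format into daily time steps, and estimating a value between two known time steps. It assumes linear interpolation, which the guide does not state explicitly, and the helpers `daily_dates` and `interpolate` are hypothetical.

```python
from datetime import date, datetime, timedelta


def daily_dates(start, end):
    """List every day from start to end inclusive (YYYY/MM/DD strings)."""
    first = datetime.strptime(start, "%Y/%m/%d").date()
    last = datetime.strptime(end, "%Y/%m/%d").date()
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def interpolate(target, d0, v0, d1, v1):
    """Linearly interpolate between (d0, v0) and (d1, v1) at date target."""
    fraction = (target - d0).days / (d1 - d0).days
    return v0 + fraction * (v1 - v0)


days = daily_dates("1990/01/01", "1990/12/31")
print(len(days))

# Made-up values on 1 January and 31 January, estimated for 11 January
value = interpolate(date(1990, 1, 11), date(1990, 1, 1), 10.0, date(1990, 1, 31), 13.0)
print(value)
```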

## How to calculate a monthly average from daily data
If you have daily data, you can calculate a monthly average using monthly_mean. There are also methods for maximums etc.

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.time_interp(start = "1990/01/01", end = "1990/12/31", resolution = "daily")
tracker.monthly_mean()

## How to calculate a monthly climatology
You can calculate a monthly climatology using the monthly_mean_climatology method. Note that CDO outputs the date of the final month.

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.select_years(list(range(1990, 2000)))
tracker.monthly_mean_climatology()
tracker.to_xarray().sst.plot(col = "time", col_wrap = 3)
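
In other words, a monthly climatology averages each calendar month across all selected years, giving 12 values. The pure-Python sketch below shows the grouping on made-up data. `monthly_climatology` is a hypothetical helper, not the nchack implementation. Note also that `list(range(1990, 2000))` covers 1990 to 1999, because range excludes its end point.

```python
from collections import defaultdict


def monthly_climatology(series):
    """Average values by calendar month; series maps (year, month) to a value."""
    by_month = defaultdict(list)
    for (year, month), value in series.items():
        by_month[month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(by_month.items())}


years = list(range(1990, 2000))  # 1990 to 1999 inclusive

# Made-up monthly values: a seasonal cycle plus a small yearly increase
series = {(y, m): float(m) + (y - 1990) for y in years for m in range(1, 13)}

climatology = monthly_climatology(series)
print(len(climatology), climatology[1])
```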

## How to calculate a seasonal climatology
You can calculate a seasonal climatology using the seasonal_mean_climatology method.

In [None]:
tracker = nc.open_data("sst.mon.mean.nc")
tracker.seasonal_mean_climatology()
tracker.to_xarray().sst.plot(col = "time")
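
A seasonal climatology follows the same idea as a monthly one, but pools months into seasons before averaging. The sketch below assumes the common meteorological seasons (DJF, MAM, JJA, SON). The season definition and the helper `seasonal_climatology` are assumptions for illustration, not taken from nchack.

```python
from collections import defaultdict

# Assumed season definition: December-February, March-May, and so on
SEASONS = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def seasonal_climatology(series):
    """Average values by season; series maps (year, month) to a value."""
    by_season = defaultdict(list)
    for (year, month), value in series.items():
        by_season[SEASONS[month]].append(value)
    return {season: sum(vals) / len(vals) for season, vals in by_season.items()}


# Two years of made-up monthly values
series = {(y, m): float(m) + (y - 1990) for y in (1990, 1991) for m in range(1, 13)}

seasonal = seasonal_climatology(series)
print(sorted(seasonal))
```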