# Input/Output

In this section, we illustrate how to read and write **gridded** climate data with `cfr`.

`cfr` provides a class called `ClimateField` to handle **gridded** climate data.
It is essentially a wrapper around an `xarray.DataArray`, with additional analysis and visualization functionality.

Specifically, `cfr` supports the following conversions:

- a netCDF file <=> `cfr.ClimateField`
- `xarray.DataArray` <=> `cfr.ClimateField`

Required data to complete this tutorial:

- GISTEMP surface temperature: [gistemp1200_GHCNv4_ERSSTv5.nc](https://data.giss.nasa.gov/pub/gistemp/gistemp1200_GHCNv4_ERSSTv5.nc.gz)
- HadCRUTv5 surface temperature: [HadCRUT.5.0.1.0.anomalies.ensemble_mean.nc](https://www.metoffice.gov.uk/hadobs/hadcrut5/data/current/non-infilled/HadCRUT.5.0.1.0.anomalies.ensemble_mean.nc)

```python
%load_ext autoreload
%autoreload 2

import cfr
import pandas as pd
import numpy as np
import xarray as xr
import os
os.chdir('/glade/u/home/fengzhu/Github/cfr/docsrc/notebooks/')  # path on the original author's machine; change it to the folder that contains ./data
```



## a netCDF file => `cfr.ClimateField`

The simplest case is a netCDF file that contains only one variable and uses the standard names `time`, `lat`, and `lon` for the time, latitude, and longitude dimensions.
In this case, we just create a `cfr.ClimateField` object and call its `.load_nc()` method with the path to the netCDF file as the argument.

Sometimes, however, the netCDF file comes with multiple variables, in which case we need to specify the variable name via the `vn` argument:

```python
dirpath = './data'
fd = cfr.ClimateField().load_nc(
    os.path.join(dirpath, 'gistemp1200_GHCNv4_ERSSTv5.nc'),  # path to the netCDF file
    vn='tempanomaly',  # specify the name of the variable to load
)
fd.da  # check the loaded `xarray.DataArray`
```
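When a file holds several variables, the name to pass as `vn` can be found by listing the dataset's data variables with plain xarray. The sketch below builds a small synthetic dataset in memory (the variable names `tempanomaly` and `time_bnds`, the extra `nv` dimension, and the grid sizes are made up for illustration) and keeps only the variables defined on the full `time`/`lat`/`lon` grid, which are the natural candidates for `vn`:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a multi-variable netCDF file (names and sizes are illustrative only)
time = pd.date_range('2000-01-01', periods=3, freq='MS')
lat = np.linspace(-88, 88, 4)
lon = np.linspace(2, 358, 6)
demo_ds = xr.Dataset(
    {
        'tempanomaly': (('time', 'lat', 'lon'), np.zeros((3, 4, 6))),
        'time_bnds': (('time', 'nv'), np.zeros((3, 2))),
    },
    coords={'time': time, 'lat': lat, 'lon': lon},
)

# All data variables stored in the dataset
print(list(demo_ds.data_vars))

# Only variables spanning (time, lat, lon) are gridded fields worth passing as `vn`
gridded = [name for name, var in demo_ds.data_vars.items()
           if set(var.dims) == {'time', 'lat', 'lon'}]
print(gridded)
```

With a real file, the same inspection works on the object returned by `xr.open_dataset(path)`.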

If the netCDF file uses different names for the latitude and longitude dimensions, we also need to specify them.
For instance, below we load the HadCRUT dataset, which uses different names for its coordinates:

```python
ds = xr.open_dataset(os.path.join(dirpath, 'HadCRUT.5.0.1.0.analysis.anomalies.ensemble_mean.nc'))
ds
```

We specify `lon_name` and `lat_name` in the arguments so that `.load_nc()` can load the data correctly.
`time_name` may also be specified if the time dimension is named differently in the netCDF file.
Once loaded, these coordinates are renamed to the standard `time`, `lat`, and `lon`.
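To see what the renaming amounts to, here is a plain-xarray sketch using `Dataset.rename` on a synthetic dataset with HadCRUT-style coordinate names (`latitude`, `longitude`) and a variable called `tas_mean`. It only reproduces the end result described above; we assume, without claiming this is how `cfr` implements `.load_nc()`, that the outcome is equivalent.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic dataset with non-standard coordinate names (illustrative values only)
demo_ds = xr.Dataset(
    {'tas_mean': (('time', 'latitude', 'longitude'), np.zeros((2, 3, 4)))},
    coords={
        'time': pd.date_range('2000-01-01', periods=2, freq='MS'),
        'latitude': [-60.0, 0.0, 60.0],
        'longitude': [0.0, 90.0, 180.0, 270.0],
    },
)

# Rename dimensions/coordinates to the standard names, then select the variable (the role of `vn`)
demo_da = demo_ds.rename({'latitude': 'lat', 'longitude': 'lon'})['tas_mean']
print(demo_da.dims)
```

Because each renamed coordinate shares its name with its dimension, `rename` updates both the coordinate variable and the dimension in one call.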

```python
dirpath = './data'
fd = cfr.ClimateField().load_nc(
    os.path.join(dirpath, 'HadCRUT.5.0.1.0.analysis.anomalies.ensemble_mean.nc'),
    time_name='time',       # specify the name of the time dimension
    lon_name='longitude',   # specify the name of the longitude dimension
    lat_name='latitude',    # specify the name of the latitude dimension
    vn='tas_mean',          # specify the name of the variable to load
)
fd.da  # check the loaded `xarray.DataArray`
```

## `cfr.ClimateField` => a netCDF file

A `cfr.ClimateField` can be saved to a netCDF file with the `.to_nc()` method:

```python
fd.to_nc('./data/fd.nc')
```

ClimateField.da["tas_mean"] saved to: ./data/fd.nc

Now let's check that the saved netCDF file loads correctly:

```python
da = xr.open_dataarray('./data/fd.nc')
da
```

## `xarray.DataArray` => `cfr.ClimateField`

Sometimes we load an `xarray.DataArray` first; we can then convert it to a `cfr.ClimateField` with the `.from_da()` method:

```python
fd = cfr.ClimateField().from_da(da)
fd.da
```
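If the `xarray.DataArray` is built in memory rather than read from disk, one reasonable approach is to give it the standard dimension names `time`, `lat`, and `lon` used throughout this tutorial before passing it to `.from_da()`. The sketch below does this with NumPy and pandas; the 5-degree grid, the monthly dates, and the variable name `tas` are arbitrary choices, and it is an assumption that these standard names are what `.from_da()` expects, as they are for `.load_nc()`. The `cfr` call is left as a comment so the snippet runs with xarray alone.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Arbitrary 5-degree global grid and monthly time axis (illustrative only)
lat = np.arange(-87.5, 90, 5.0)   # 36 latitude cell centers
lon = np.arange(2.5, 360, 5.0)    # 72 longitude cell centers
time = pd.date_range('2000-01-01', periods=12, freq='MS')

demo_da = xr.DataArray(
    np.zeros((time.size, lat.size, lon.size)),
    dims=('time', 'lat', 'lon'),              # standard dimension names
    coords={'time': time, 'lat': lat, 'lon': lon},
    name='tas',                               # hypothetical variable name
)

# With cfr installed, the conversion would then be:
# fd = cfr.ClimateField().from_da(demo_da)
print(demo_da.shape)
```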

## `cfr.ClimateField` => `xarray.DataArray`

The conversion from a `cfr.ClimateField` to an `xarray.DataArray` is trivial: simply access the `.da` attribute:

```python
da = fd.da
da
```