# Tutorial OceanDataset

If you need to install the OceanSpy dependencies, change `False` to `True` in the cell below.

In [1]:
if False:
    import sys
    !conda install --yes --prefix {sys.prefix} dask distributed bottleneck netCDF4
    !conda install --yes --prefix {sys.prefix} -c conda-forge xarray cartopy esmpy
    !{sys.executable} -m pip install geopy xgcm xmitgcm xesmf

Now install OceanSpy from my branch.

OceanSpy can be imported with `import oceanspy`. I like using the alias `ospy`.
OceanSpy is TAB-friendly, which means that you can discover a lot of information with TAB completion.  

OceanDataset objects are used by all OceanSpy functions. They combine [`xarray.Dataset`](http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html) with several other objects and variables needed by OceanSpy, such as [`xgcm.Grid`](https://xgcm.readthedocs.io/en/stable/grids.html#Grid-Objects), parameters, coefficients, variable names, and (in the future) some plotting information.

To create an OceanDataset, you only need a `dataset`.  
Let's use one of the `xarray` tutorial datasets to initialize an `oceandataset`.

In [2]:
import oceanspy as ospy
import xarray as xr
ds = xr.tutorial.open_dataset('rasm')
od = ospy.OceanDataset(ds)

When you print an `od`, the attributes added to the `ds` are shown.  
We didn't add anything yet, so `od` only shows the original `ds`. There are several methods to add attributes. Check out the documentation!

In [3]:
od

<oceanspy.OceanDataset>

Main attributes:
   .dataset: <xarray.Dataset>

Let's add a name and a description:

In [4]:
name = 'OceanDataset #1'
desc = 'My first OceanDataset!'
od = od.set_name(name).set_description(desc)
od

<oceanspy.OceanDataset>

Main attributes:
   .name: OceanDataset #1
   .description: My first OceanDataset!
   .dataset: <xarray.Dataset>

`oceanspy.OceanDataset` follows 2 main rules.  

#### Rule #1: Spies look for secret information.   
Whenever you ask for an attribute, let's say `name`, OceanDataset is actually reading and decoding information stored as global attributes in the `dataset`.  
In this simple case, `OceanSpy_name` and `OceanSpy_description` have been added to the dataset.

In [5]:
od.dataset.attrs

OrderedDict([('title',
              '/workspace/jhamman/processed/R1002RBRxaaa01a/lnd/temp/R1002RBRxaaa01a.vic.ha.1979-09-01.nc'),
             ('institution', 'U.W.'),
             ('source', 'RACM R1002RBRxaaa01a'),
             ('output_frequency', 'daily'),
             ('output_mode', 'averaged'),
             ('convention', 'CF-1.4'),
             ('references',
              'Based on the initial model of Liang et al., 1994, JGR, 99, 14,415- 14,429.'),
             ('comment',
              'Output from the Variable Infiltration Capacity (VIC) model.'),
             ('nco_openmp_thread_number', 1),
             ('NCO', '"4.6.0"'),
             ('history',
              'Tue Dec 27 14:15:22 2016: ncatted -a dimensions,,d,, rasm.nc rasm.nc\nTue Dec 27 13:38:40 2016: ncks -3 rasm.nc rasm.nc\nhistory deleted for brevity'),
             ('OceanSpy_name', 'OceanDataset #1'),
             ('OceanSpy_description', 'My first OceanDataset!')])
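
The idea of reading settings back out of the global attributes can be sketched with a plain dictionary. This is only a toy illustration: the helper `read_spy_attrs` is hypothetical, and the only thing taken from the output above is the `OceanSpy_` prefix.

```python
# Toy model of Rule #1: settings live in a plain attrs dict under a common prefix.
PREFIX = "OceanSpy_"  # prefix seen in the attrs output above


def read_spy_attrs(attrs):
    """Return {short_name: value} for every key that starts with PREFIX.

    Hypothetical helper for illustration, not part of OceanSpy.
    """
    return {
        key[len(PREFIX):]: value
        for key, value in attrs.items()
        if key.startswith(PREFIX)
    }


attrs = {
    "institution": "U.W.",
    "OceanSpy_name": "OceanDataset #1",
    "OceanSpy_description": "My first OceanDataset!",
}
spy_attrs = read_spy_attrs(attrs)
print(spy_attrs)
```

Attributes without the prefix (such as `institution`) belong to the original dataset and are left alone.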

To change an `od` attribute, you need to change the `ds` global attributes.  
Direct assignment does NOT work:

In [6]:
# Wrong way to do it!
new_name = 'OceanDataset #2'
new_desc = 'I learned how to change attributes!'
od.name = new_name

AttributeError: Set new `name` using .set_name

We have to use the class methods:

In [None]:
new_od = od.set_name(new_name, overwrite=True).set_description(new_desc, overwrite=True)
new_od
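
A read-only attribute paired with a `set_` method can be sketched in a few lines. `ToyOceanDataset` is a hypothetical stand-in, not the real class, and the rule that replacing an existing name needs `overwrite=True` is an assumption made for this sketch.

```python
import copy


class ToyOceanDataset:
    """Hypothetical stand-in, not the real oceanspy.OceanDataset."""

    def __init__(self, attrs=None):
        self._attrs = dict(attrs or {})

    @property
    def name(self):
        # Reading decodes the value from the stored attributes.
        return self._attrs.get("OceanSpy_name")

    @name.setter
    def name(self, value):
        # Direct assignment is refused, pointing the user to the method.
        raise AttributeError("Set new `name` using .set_name")

    def set_name(self, name, overwrite=False):
        # Assumed rule: replacing an existing name needs overwrite=True.
        if self.name is not None and not overwrite:
            raise ValueError("name already set; pass overwrite=True")
        new = copy.copy(self)  # leave the original object untouched
        new._attrs = {**self._attrs, "OceanSpy_name": name}
        return new


od1 = ToyOceanDataset().set_name("OceanDataset #1")
od2 = od1.set_name("OceanDataset #2", overwrite=True)
print(od1.name, "|", od2.name)
```

Because `set_name` returns a new object, calls can be chained and the input is never modified, which matches how the methods are used in this tutorial.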

#### Rule #2: Spies use fake identities.
The main reason behind this rule is that we want to be able to use OceanSpy with any dataset.
However, different datasets (especially products from different models) use different variable names.  
The dataset returned by `.dataset` is the fake identity! The real dataset is under `._ds` (similarly, there is `._grid`).  
`.dataset` reads `._ds` and returns it with the variables renamed. So OceanSpy's functions always operate on `._ds` and `._grid`, while users work with `.dataset` and `.grid`. 

Here is an example: `od.dataset` has dimensions `x` and `y`, and coordinates `xc` and `yc`.  
But the OceanSpy reference names for these variables are capitalized. We need to inform OceanSpy that `x` is actually `X`, `y` is `Y`, and so on.

If we try to rename the variables under `.dataset`, nothing will actually change:

In [7]:
# Wrong way to do it!
od.dataset['X'] = od.dataset['x']
od.dataset

<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 0001-01-01 00:00:00 ... 0001-01-01 00:00:00
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 ...
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltration Capacity...
    nco_openmp_thread_number:  1
    NCO:                       "4.6.0"
    history:                   Tue Dec 27 14:15:22 2016: ncatted -a dimension...
    OceanSpy_name:             OceanDataset #1
    OceanSpy_description:      My first OceanD
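
One plausible reason the assignment is lost is that `.dataset` builds a new, renamed object on every access. The toy below assumes exactly that behaviour, using dictionaries instead of `xarray` objects. `ToyAliased` is hypothetical and does not reproduce OceanSpy's implementation.

```python
class ToyAliased:
    """Hypothetical stand-in: reference names inside, user names outside."""

    def __init__(self, data, aliases):
        self._ds = dict(data)         # keyed by reference names, e.g. 'XC'
        self.aliases = dict(aliases)  # reference name -> user name

    @property
    def dataset(self):
        # Build a fresh, renamed dict on every access.
        return {self.aliases.get(k, k): v for k, v in self._ds.items()}


toy = ToyAliased({"XC": [0.0, 1.0]}, {"XC": "xc"})
view = toy.dataset
view["X"] = [0, 1]  # modifies only the temporary dict

print(sorted(toy.dataset))  # user-facing names
print(sorted(toy._ds))      # reference names
```

The new key `X` lives only in `view`. The next call to `toy.dataset` starts again from `_ds`, so the change never reaches the stored data.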

We have to use the proper class method:

In [8]:
name    = 'OceanDataset #3'
desc    = 'I learned how to set aliases!' 
aliases = {'X': 'x', 'Y': 'y', 'XC': 'xc', 'YC': 'yc'} 
od_aliases = od.set_name(name, overwrite=True).set_description(desc, overwrite=True).set_aliases(aliases) 

print('\n\n')
print('od_aliases.dataset has the same names of the original dataset:')
print(od_aliases.dataset)
print('\n\n')
print('od_aliases._ds has the OceanSpy reference names:')
print(od_aliases._ds)




od_aliases.dataset has the same names of the original dataset:
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 0001-01-01 00:00:00 ... 0001-01-01 00:00:00
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 ...
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltration Capacity...
    nco_openmp_thread_number:  1
    NCO:                       "4.6.0"
    history:                   Tue Dec 27 14:15:22 2016: ncatted -a dimension...
    OceanSpy_name:         

OceanSpy internal functions should always use `_ds` and `_grid`, while users should use `dataset` and `grid`. For example, let's create two simple OceanSpy functions:
1. Round latitude and longitude.
2. Compute the difference between rounded and unrounded coordinates.

In [9]:
def ospy_round_coords(od):
    # We need to do a shallow copy first because we want to keep the input od as it is.
    import copy
    od = copy.copy(od)
    # Now we can add new variables to od
    od._ds['XC_round'] = od._ds['XC'].round()
    od._ds['YC_round'] = od._ds['YC'].round()
    return od

def ospy_diff_round_coords(od):
    # We need to do a shallow copy first because we want to keep the input od as it is.
    import copy
    od = copy.copy(od)
    # Now we can add new variables to od
    od._ds['XC_diff_round'] = od._ds['XC_round'] - od._ds['XC']
    od._ds['YC_diff_round'] = od._ds['YC_round'] - od._ds['YC']
    return od
od_round_dev = ospy_round_coords(od_aliases)
od_round_dev = ospy_diff_round_coords(od_round_dev)
od_round_dev.dataset

<xarray.Dataset>
Dimensions:        (time: 36, x: 275, y: 205)
Coordinates:
  * time           (time) object 0001-01-01 00:00:00 ... 0001-01-01 00:00:00
    xc             (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
    yc             (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
Dimensions without coordinates: x, y
Data variables:
    Tair           (time, y, x) float64 ...
    XC_round       (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
    YC_round       (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
    XC_diff_round  (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
    YC_diff_round  (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:         

As a user, this is how you can add new variables to the oceandataset, working on `.dataset` with the original names:

In [10]:
ds_user = od_aliases.dataset
ds_user['xc_round'] = ds_user['xc'].round()
ds_user['yc_round'] = ds_user['yc'].round()
od_round_user = ospy.OceanDataset(ds_user)
od_round_user.dataset

<xarray.Dataset>
Dimensions:   (time: 36, x: 275, y: 205)
Coordinates:
  * time      (time) object 0001-01-01 00:00:00 ... 0001-01-01 00:00:00
    xc        (y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0
    yc        (y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0
Dimensions without coordinates: x, y
Data variables:
    Tair      (time, y, x) float64 ...
    xc_round  (y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0
    yc_round  (y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltra

Or, we can use OceanSpy methods:

In [11]:
xc_round = od_aliases.dataset['xc'].round().rename('xc_round')
yc_round = od_aliases.dataset['yc'].round().rename('yc_round')
od_round_user = od_aliases.add_DataArray(xc_round).add_DataArray(yc_round)
od_round_user.dataset
# Alternatively, you could merge the two DataArrays first, then use OceanDataset.merge_Dataset()

<xarray.Dataset>
Dimensions:   (time: 36, x: 275, y: 205)
Coordinates:
  * time      (time) object 0001-01-01 00:00:00 ... 0001-01-01 00:00:00
    xc        (y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0
    yc        (y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0
Dimensions without coordinates: x, y
Data variables:
    Tair      (time, y, x) float64 ...
    xc_round  (y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0
    yc_round  (y, x) float64 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltra

We want to tell OceanSpy that `xc_round` and `yc_round` are the variables that OceanSpy knows as `XC_round` and `YC_round`. Then we can use the OceanSpy function that computes the difference between rounded and unrounded coordinates. Without the aliases, the function fails:

In [12]:
# Wrong way to do it!
ospy_diff_round_coords(od_round_user)

KeyError: 'XC_round'

In [13]:
# Let's set the aliases first!
od_round_user = od_round_user.set_aliases({'XC_round': 'xc_round', 'YC_round': 'yc_round'}, overwrite=False)
od_round_user = ospy_diff_round_coords(od_round_user)
od_round_user.dataset

<xarray.Dataset>
Dimensions:        (time: 36, x: 275, y: 205)
Coordinates:
  * time           (time) object 0001-01-01 00:00:00 ... 0001-01-01 00:00:00
    xc             (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
    yc             (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
Dimensions without coordinates: x, y
Data variables:
    Tair           (time, y, x) float64 ...
    xc_round       (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
    yc_round       (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
    XC_diff_round  (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
    YC_diff_round  (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:         
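
The call above passes `overwrite=False` so that the new aliases are added to the existing ones instead of replacing them. The sketch below shows that combining rule on plain dictionaries. `merge_aliases` is a hypothetical helper, and the exact semantics of `overwrite` in OceanSpy are assumed here, so check the documentation.

```python
def merge_aliases(current, new, overwrite):
    """Toy rule (assumed): overwrite=True replaces, overwrite=False combines."""
    if overwrite:
        return dict(new)
    return {**current, **new}


current = {"X": "x", "Y": "y", "XC": "xc", "YC": "yc"}
extra = {"XC_round": "xc_round", "YC_round": "yc_round"}

combined = merge_aliases(current, extra, overwrite=False)
replaced = merge_aliases(current, extra, overwrite=True)

print(sorted(combined))  # old and new reference names together
print(sorted(replaced))  # only the new reference names
```

Under this assumption, replacing instead of combining would drop the aliases for `X`, `Y`, `XC`, and `YC`, and functions that look up `XC` would fail again.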

Do you prefer having `XC_diff_round` and `YC_diff_round` in lowercase? Just add aliases for them!  
It doesn't matter whether you set the aliases before or after running the function.

You can store your `od` in NetCDF format.

In [14]:
path = './my_first_od.nc'
od_round_user.to_netcdf(path)

Writing dataset to ./my_first_od.nc
[########################################] | 100% Completed |  0.1s


And re-open it:

In [15]:
od_from_nc = ospy.open_oceandataset.from_netcdf(path)
od_from_nc

Opening dataset from [./my_first_od.nc]


<oceanspy.OceanDataset>

Main attributes:
   .name: OceanDataset #3
   .description: I learned how to set aliases!
   .dataset: <xarray.Dataset>

More attributes:
   .aliases: <class 'dict'>
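
The aliases survive the round trip even though NetCDF global attributes can hold strings and numbers but not dictionaries. One way to achieve this is to encode the dictionary as a string before writing and decode it after reading. The sketch below uses JSON for that. Both the JSON encoding and the attribute name `OceanSpy_aliases` are assumptions for illustration, since the tutorial does not show how OceanSpy stores them.

```python
import json

aliases = {"X": "x", "Y": "y", "XC": "xc", "YC": "yc"}

# Encode: the dict becomes one string, which is a valid attribute value.
# 'OceanSpy_aliases' is a hypothetical attribute name.
attrs = {"OceanSpy_aliases": json.dumps(aliases)}

# Decode after reading the file back.
restored = json.loads(attrs["OceanSpy_aliases"])

print(type(attrs["OceanSpy_aliases"]).__name__)
print(restored == aliases)
```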

There are several methods that help initialize an `oceandataset` properly: they create the grid, add missing dimensions, and so on. Most of these methods are used by `oceanspy.open_oceandataset`.