# xarray

[xarray](https://xarray.pydata.org/en/stable/) [[github](https://github.com/pydata/xarray)] provides labeled N-dimensional arrays and datasets, extending pandas-style labeled indexing beyond two dimensions.

## Helper Libraries

xarray leverages other libraries for its I/O backends, including:

- netcdf4-python
- Pydap
- zarr
- cfgrib
- PyNIO
- scipy
- MetPy
- rasterio---see also [XArrayAndRasterio](https://github.com/robintw/XArrayAndRasterio) for some interop code.

## Additional Libraries

There are many [Xarray related projects](https://xarray.pydata.org/en/stable/related-projects.html), including:

### Geosciences

- MetPy
- salem!
- xarray-simlab
- xarray-topo

### Other

- sklearn-xarray
- psyplot!

## Libraries Using xarray

- [geoxarray](https://github.com/geoxarray/geoxarray)---immature
- ...

## Misc

- [xarray#2042](https://github.com/pydata/xarray/issues/2042) may be worth looking at for discussion of GeoTIFF, Rasterio, etc.


## xarray Tutorials

- https://geohackweek.github.io/nDarrays/ covers xarray as well as dask.
- https://github.com/jhamman/xarray_tutorial
- https://rabernat.github.io/research_computing/xarray.html covers xarray, previous lectures cover pandas and numpy.
- https://unidata.github.io/MetPy/latest/tutorials/xarray_tutorial.html


## Imports

In [None]:
%matplotlib inline

In [None]:
import numpy as np
import pandas as pd
import xarray as xr

## xarray Overview


## xarray Data Structures

[Data Structures](https://xarray.pydata.org/en/stable/data-structures.html)

- `DataArray`

    > xarray.DataArray is xarray’s implementation of a labeled, multi-dimensional array.

- `Dataset`

    > `xarray.Dataset` is xarray’s multi-dimensional equivalent of a `DataFrame`. It is a dict-like container of labeled arrays (`DataArray` objects) with aligned dimensions. It is designed as an in-memory representation of the data model from the **netCDF** file format.

- `Coordinates`

- `Variable`

    - ref [Variable objects | xarray Internals](https://xarray.pydata.org/en/stable/internals.html#variable-objects)
    
The [API reference](https://xarray.pydata.org/en/stable/api.html) is a concise overview of functions, classes, methods, etc! It's nice since it also categorizes methods by function and points out which are from ndarray.


### DataArray

[`xarray.DataArray`](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html) [[source](https://github.com/pydata/xarray/blob/master/xarray/core/dataarray.py)]


`DataArray` objects have four key properties:

- `values`---The array’s data as a numpy.ndarray
- `dims`
- `coords`
- `attrs`---may be arbitrary
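As a minimal sketch of these four properties, the cell below builds a small `DataArray` and inspects each one. The array contents, the dimension names `time` and `city`, and the `units` attribute are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 2 x 3 array of temperatures, labeled by time and city.
temps = xr.DataArray(
    np.array([[10.0, 12.5, 9.0], [11.0, 13.0, 8.5]]),
    dims=("time", "city"),
    coords={
        "time": pd.date_range("2020-01-01", periods=2),
        "city": ["A", "B", "C"],
    },
    attrs={"units": "degC"},  # arbitrary metadata
    name="temperature",
)

print(type(temps.values))  # the data as a numpy.ndarray
print(temps.dims)          # dimension names
print(list(temps.coords))  # coordinate names
print(temps.attrs)         # metadata dictionary
```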

Other `DataArray` attributes include:

- `name` (optional)
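- `data`---The array's underlying data, which may be a dask or numpy array. By contrast, `values` always returns a `numpy.ndarray`, loading lazy data into memory if needed.
- `dt`---accessor for datetime components when the array holds datetime-like values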
- `dtype`

Additional attributes:

- `indexes`---relates to Pandas index for dimensions, see also `get_index()`

Note that the attributes `name`, `values`, `data`, `dt`, `variable` exist only on `DataArray` (not `Dataset`).

Likewise the ndarray attributes `ndim`, `shape`, `size`, `dtype` exist only on `DataArray`, though `nbytes` and `chunks` are also properties on `Dataset`.

`DataArray` methods include:

- `assign_coords()`
- `assign_attrs()`

...

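- `rename()`---on a `DataArray`, renames the array itself (given a string) or its coordinates and dimensions (given a mapping); on a `Dataset`, renames variables, coordinates, and dimensions.

Two other methods worth noting:

- `count()`---counts the non-NaN values along the given dimension(s)
- `isel()`---indexes by integer position along named dimensions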

In [None]:
def print_dataarray(data):
    """Print properties of DataArray."""
    
    # Python 3 print() calls; the original used Python 2 print statements.
    print('dims', data.dims)
    print('coords', data.coords)
    print('attrs', data.attrs)
    print('name', data.name)
    # print('values', data.values)

The `attrs` property is a dictionary that can be used to store arbitrary attributes.


[Working with Multidimensional Coordinates](https://xarray.pydata.org/en/stable/examples/multidimensional-coords.html)


### Dataset

[`xarray.Dataset`](http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html) [[source](https://github.com/pydata/xarray/blob/master/xarray/core/dataset.py)]

`Dataset` objects have four key properties:

- `dims`
- `data_vars`---Dictionary of DataArray objects corresponding to data variables
- `coords`---Dictionary of xarray.DataArray objects corresponding to coordinate variables
- `attrs`---Dictionary of arbitrary attributes (metadata)
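A minimal sketch of a `Dataset` showing these properties follows. The variable names, shapes, and the `title` attribute are hypothetical; `sizes` is used here to show the dimension lengths as a plain mapping.

```python
import numpy as np
import xarray as xr

# Two hypothetical data variables sharing the same (y, x) dimensions.
ds = xr.Dataset(
    data_vars={
        "temperature": (("y", "x"), np.zeros((2, 3))),
        "pressure": (("y", "x"), np.ones((2, 3))),
    },
    coords={"x": [10, 20, 30], "y": [1, 2]},
    attrs={"title": "hypothetical example"},
)

print(dict(ds.sizes))      # dimension names mapped to lengths
print(list(ds.data_vars))  # names of the data variables
print(list(ds.coords))     # names of the coordinate variables
print(ds.attrs)            # metadata dictionary
```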

Other `Dataset` attributes include:

- ...

Note that the attributes `data_vars`, and `variables` exist only in `Dataset` (not in `DataArray`).

`Dataset` methods include:

- `info()`---unique to Dataset
- `assign()`---unique to Dataset

#### Notes

> Use dictionary indexing to pull out Dataset variables as DataArray objects.

> ...you can use attribute style access for reading (but not setting) variables and attributes

> Individual coordinates can be accessed from the coordinates (`coords`) by name, or even by indexing the data array itself...Coordinates can also be set or removed by using the dictionary like syntax.

The line between coordinates and variables is a little blurry. It seems like you can pull out coordinates, variables, and attributes using attribute style access.
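The access patterns quoted above can be sketched as follows. The dataset, the variable name `elevation`, and the `source` attribute are invented for the example.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"elevation": (("y", "x"), np.arange(6.0).reshape(2, 3))},
    coords={"x": [10, 20, 30], "y": [1, 2]},
    attrs={"source": "hypothetical"},
)

by_key = ds["elevation"]          # dictionary indexing returns a DataArray
by_attr = ds.elevation            # attribute-style access, for reading only
x_from_coords = ds.coords["x"]    # coordinate looked up by name in coords
x_from_key = ds["x"]              # the same coordinate via dictionary indexing
x_from_da = ds["elevation"]["x"]  # or by indexing the data array itself

# Setting a new variable requires the dictionary-like syntax.
ds["slope"] = ds["elevation"] * 0.5

print(ds.source)  # attributes can be read with attribute-style access too
```

Because variables, coordinates, and attributes all share this attribute-style lookup, dictionary indexing is the less ambiguous choice when names might collide.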


### Coordinates


### Variable

[`xarray.Variable`](https://xarray.pydata.org/en/stable/generated/xarray.Variable.html)

[`xarray.IndexVariable`](https://xarray.pydata.org/en/stable/generated/xarray.IndexVariable.html)---used for coordinate variables, perhaps in a roundabout way, inherits from `Variable`

Defined in [`variable.py`](https://github.com/pydata/xarray/blob/master/xarray/core/variable.py)

TODO review the way DataArrayCoordinates and DatasetCoordinates (defined in [`coordinates.py`](https://github.com/pydata/xarray/blob/master/xarray/core/coordinates.py)) are used by DataArray and Dataset.



---

## Topics

### Indexing

*automatic indexing*

Note https://github.com/pydata/xarray/issues/2028

`where()` is also useful
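A short sketch of positional indexing, label-based indexing, and `where()` on a made-up array (the dimension names and coordinate values are assumptions for illustration):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(12).reshape(3, 4),
    dims=("y", "x"),
    coords={"y": [10, 20, 30], "x": [1, 2, 3, 4]},
)

first_row = da.isel(y=0)       # positional: first index along y
row_20 = da.sel(y=20)          # label-based: where coordinate y equals 20
masked = da.where(da > 5)      # values failing the condition become NaN
n_valid = int(masked.count())  # count() skips NaN values

print(first_row.values)
print(row_20.values)
print(n_valid)
```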


### Masking

https://geohackweek.github.io/nDarrays/09-masking/


## Plot with xarray

[Plotting](https://xarray.pydata.org/en/stable/plotting.html)

The xarray plotting functions provide a thin wrapper around matplotlib.

`da.plot()` is a convenience method.

`da.plot.line()` and similar methods are accessible from the `plot` attribute.

`xarray.plot.plot()`, `xarray.plot.line()`, etc. are callable directly from the xarray plot submodule.

Most plotting functions require 1 or 2 dimensions, but `xarray.plot.imshow()` allows a third dimension as RGB or RGBA (specify in `rgb` argument).

What's with `infer_intervals` parameter in `pcolormesh()`? It relates to 

The [Parsing rasterio’s geocoordinates](https://xarray.pydata.org/en/stable/auto_gallery/plot_rasterio.html) example creates lat and lon coordinates from an existing crs.


## IO

xarray can work with a variety of file formats both local and networked.

[Reading and writing files](https://xarray.pydata.org/en/stable/io.html)

[`open_dataset()`](https://xarray.pydata.org/en/stable/generated/xarray.open_dataset.html) and [`open_dataarray()`](https://xarray.pydata.org/en/stable/generated/xarray.open_dataarray.html) can open network files, including netCDF and OPeNDAP.

[`open_mfdataset()`](https://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html)

Pass `decode_cf=False` or `decode_times=False` to disable CF decoding, either entirely or for time values only. See [CF Metadata Conventions](http://cfconventions.org/).

The `engine` parameter may be used to specify `pynio`, `cfgrib`, `pseudonetcdf`, or `pydap` (why pydap?).
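To see what the decoding flags change without needing a file on disk, the sketch below applies `xr.decode_cf()` to an in-memory dataset. This is a stand-in for what `open_dataset()` does on read, not a file-based example; the variable `t2m` and the time units are made up.

```python
import numpy as np
import xarray as xr

# Hypothetical raw dataset: time stored as integers with CF-style units.
raw = xr.Dataset(
    {"t2m": ("time", [280.0, 281.5, 279.0])},
    coords={"time": ("time", [0, 1, 2], {"units": "days since 2000-01-01"})},
)

decoded = xr.decode_cf(raw)                        # times become datetimes
undecoded = xr.decode_cf(raw, decode_times=False)  # times stay as integers

print(decoded["time"].values)
print(undecoded["time"].values)
```

Leaving times undecoded can be useful when a file's time units or calendar are malformed and need fixing before decoding.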

### MetPy

Simply importing metpy will activate accessors on xarray objects which adhere to CF conventions.

- x
- y
- vertical
- time

The

Additional attributes:

- `cartopy_crs`---get cartopy crs

`parse_cf()` may be necessary?

Reference:

- [Weather and climate data](https://xarray.pydata.org/en/stable/weather-climate.html)
- [xarray with MetPy Tutorial | MetPy](https://unidata.github.io/MetPy/latest/tutorials/xarray_tutorial.html)

MetPy/xarray interoperability is implemented in [metpy/xarray.py](https://github.com/Unidata/MetPy/blob/master/metpy/xarray.py).

## Topics

### Coordinates

Working with coordinates.

- `reset_coords()`
- `set_coords()` (Dataset only)

`assign_coords()`

`squeeze()`? `drop()`?

`drop_dims()` (Dataset only)
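A rough sketch of the coordinate methods listed above, using a made-up dataset in which `lon` starts out as an ordinary data variable:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "height": (("y", "x"), np.zeros((2, 3))),
        "lon": (("y", "x"), np.ones((2, 3))),
    }
)

with_coord = ds.set_coords("lon")           # promote a data variable to a coordinate
back = with_coord.reset_coords("lon")       # demote it to a data variable again
labeled = ds.assign_coords(x=[10, 20, 30])  # attach labels to the x dimension
dropped = with_coord.reset_coords("lon", drop=True)  # discard the coordinate

print(list(with_coord.coords))
print(list(back.data_vars))
print(labeled["x"].values)
```

Each call returns a new object, so `ds` itself still holds `lon` as a data variable afterwards.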

Beware of shallow copies, such as those returned by `.rename()`: I observed that removing a coordinate with `del` also affected the original. Is this a bug?


### Variables

`update()` is geared toward combining Datasets, whereas `assign()` lets you use keyword shorthand (i.e. assignment to an unquoted name).

`drop()`
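The difference between the two can be sketched as below, with invented variable names. The example uses `drop_vars()` for removal, on the assumption of a reasonably recent xarray; older versions would use `drop()` as noted above.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("x", np.array([1.0, 2.0, 3.0]))})

# assign() returns a new Dataset; keyword names become variable names,
# and a callable receives the dataset itself.
ds2 = ds.assign(b=ds["a"] * 2, c=lambda d: d["a"] + 1)

# update() modifies the Dataset in place from a mapping or another Dataset.
other = xr.Dataset({"d": ("x", np.array([7.0, 8.0, 9.0]))})
ds.update(other)

# drop_vars() removes variables by name and returns a new Dataset.
ds3 = ds2.drop_vars("b")

print(list(ds.data_vars))
print(list(ds2.data_vars))
print(list(ds3.data_vars))
```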

### Attributes

`assign_attrs()`...

## Scratch


In [None]:
import netCDF4  # the module name is case-sensitive

In [None]:
xr.open_dataset('/Users/smathews/Downloads/N11W009.hgt')