# Xarray
`xarray` is an open source project and Python package that augments `NumPy` arrays by adding labeled dimensions, coordinates and attributes. `xarray` is based on the NetCDF data model, making it the appropriate tool to open, process, and create datasets in NetCDF format.

In this lesson we will learn about the two core objects in `xarray`: the `xarray.DataArray` and the `xarray.Dataset`. We will also learn how to subset data from them.

## xarray.DataArray
The xarray.DataArray is the primary data structure of the xarray package: an n-dimensional array with labeled dimensions.

We can think of it as representing a single variable in the NetCDF data format: it holds the variable’s values, dimensions, and attributes.

Apart from variables, dimensions, and attributes, xarray introduces one more piece of information to keep track of a dataset’s content: in xarray a dimension can have one or more sets of **coordinates**. A dimension’s coordinates indicate the dimension’s values (the tick labels along that dimension).

For example, in our previous exercise about temperature measured in weather stations, latitude is a dimension, and the latitude’s coordinates are 30, 40, 50, 60, and 70 because those are the latitude values at which we are collecting temperature data. In that same exercise, time is a dimension, and its coordinates are 2022-09-01, 2022-09-02, and 2022-09-03.

In [4]:
#import the necessary packages:
import pandas as pd
import numpy as np
import xarray as xr


In [5]:
#Create an xarray.DataArray
#Step 1: Start by making a NumPy array of our temperature data
temp_data = np.array([np.zeros((5,5)),
         np.ones((5,5)),
         np.ones((5,5))*2]).astype(int)

**Dimensions and Coordinates**
To specify the dimensions of our upcoming xarray.DataArray, we must examine how we’ve constructed the numpy.ndarray holding the temperature data. The dimensions of temp_data are ordered as follows:
- the first dimension is time 
- the second is latitude
- the third is longitude
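
As a quick check of that ordering, the sketch below rebuilds the same `temp_data` array and inspects its shape. It only assumes NumPy is installed; nothing here goes beyond what the lesson already defines.

```python
import numpy as np

# Same construction as temp_data in the lesson: three 5x5 grids
# stacked along a new first axis
temp_data = np.array([np.zeros((5, 5)),
                      np.ones((5, 5)),
                      np.ones((5, 5)) * 2]).astype(int)

# Axis 0 has length 3 (time); axes 1 and 2 have length 5 (lat, lon)
print(temp_data.shape)

# Indexing the first axis returns the whole lat-lon grid for one time step
second_day = temp_data[1]
print(second_day)
```

The first axis has length 3 because three grids were stacked, which is why it plays the role of time.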

In [6]:
#names of dimensions:
dims = ('time', 'lat', 'lon')

#create coordinates along each dimension
coords = {'time': pd.date_range('2022-09-01','2022-09-03'),
         'lat': np.arange(70, 20, -10),
         'lon': np.arange(60,110, 10)}
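
Each set of coordinates needs one value per position along its dimension, so the lengths should line up with the `(3, 5, 5)` shape of the data. The sketch below is an illustrative sanity check; the `lengths` variable is introduced here and is not part of the lesson.

```python
import numpy as np
import pandas as pd

# Same coordinates as in the lesson
coords = {'time': pd.date_range('2022-09-01', '2022-09-03'),
          'lat': np.arange(70, 20, -10),
          'lon': np.arange(60, 110, 10)}

# Show the tick labels stored for each dimension
for name, values in coords.items():
    print(name, len(values), values)

# Hypothetical helper variable: number of labels per dimension, in order
lengths = tuple(len(values) for values in coords.values())
print(lengths)
```

Note that `np.arange` excludes its stop value, so the latitudes run from 70 down to 30 and the longitudes from 60 up to 100.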

In [8]:
#Add the attributes (metadata) as a dictionary
attrs = {'title': 'temp across weather station',
        'standard_name': 'air_temperature',
        'units': 'degrees Celsius',}

In [9]:
#Initialize the xarray.DataArray
temp = xr.DataArray(data = temp_data, 
                    dims = dims,
                    coords = coords,
                    attrs = attrs)
temp
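
Once the array exists, its pieces can be read back through properties of the `DataArray`. The following is a minimal, self-contained sketch that rebuilds the array with a reduced attribute dictionary and inspects it.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the lesson's array so this block runs on its own
temp = xr.DataArray(
    data=np.array([np.zeros((5, 5)), np.ones((5, 5)), np.ones((5, 5)) * 2]).astype(int),
    dims=('time', 'lat', 'lon'),
    coords={'time': pd.date_range('2022-09-01', '2022-09-03'),
            'lat': np.arange(70, 20, -10),
            'lon': np.arange(60, 110, 10)},
    attrs={'standard_name': 'air_temperature'},
)

print(temp.dims)                   # dimension names, in axis order
print(temp.shape)                  # length of each dimension
print(temp.coords['lat'].values)   # tick labels along lat
print(temp.attrs)                  # metadata dictionary
print(type(temp.values))           # the underlying NumPy array
```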

## Subsetting Data from an xarray.DataArray
An xarray.DataArray is a multi-dimensional array with labeled dimensions. To select data from it we need to specify which subsets along each dimension we are interested in. We can specify the data we need from each dimension either by relying on the dimension’s positions (dimension lookup by position) or by calling each dimension by its name (dimension lookup by name). Let’s see some examples.

**Example**

Suppose we want to know the temperature recorded by the weather station located at 40°0′N 80°0′E on September 1st, 2022.

There are two ways to do this:
- Dimension lookup by position
- Dimension lookup by name


In [10]:
# access dimensions by position, then use integers for indexing
temp[0,3,2]

In [11]:
# access dimensions by position, then use labels for indexing
temp.loc['2022-09-01', 40, 80]

In [12]:
# access dimensions by name, then use integers for indexing
temp.isel(time=0, lon=2, lat=3)
# access dimensions by name, then use labels for indexing
temp.sel(time='2022-09-01', lat=40, lon=80)
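
The four lookups above should all reach the same cell. The sketch below rebuilds the array, collects the four results side by side, and then selects a range of labels with `slice`. The `values` and `region` names are introduced here for illustration only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the lesson's temperature array so this block runs on its own
temp = xr.DataArray(
    data=np.array([np.zeros((5, 5)), np.ones((5, 5)), np.ones((5, 5)) * 2]).astype(int),
    dims=('time', 'lat', 'lon'),
    coords={'time': pd.date_range('2022-09-01', '2022-09-03'),
            'lat': np.arange(70, 20, -10),
            'lon': np.arange(60, 110, 10)},
)

# Four ways to reach the same cell: 2022-09-01, lat 40, lon 80
values = [
    temp[0, 3, 2].item(),                                  # position, integers
    temp.loc['2022-09-01', 40, 80].item(),                 # position, labels
    temp.isel(time=0, lat=3, lon=2).item(),                # name, integers
    temp.sel(time='2022-09-01', lat=40, lon=80).item(),    # name, labels
]
print(values)

# Label-based slices include both endpoints. lat is stored in descending
# order, so the slice bounds are given in descending order as well.
region = temp.sel(lat=slice(60, 40), lon=slice(70, 90))
print(region.shape)
```

Lookup by name does not depend on the order of the arguments, which is why `isel(time=0, lon=2, lat=3)` and `isel(time=0, lat=3, lon=2)` return the same value.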