# Introduction to Xarray for Working with Labeled Numerical Data Arrays

<div class="alert alert-success">
    
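## This notebook covers
- Xarray DataArray and Dataset data structures
- Accessing dimensions, coordinates, attributes, and data values
- Array properties that come from NumPy
- Estimating memory usage
- Data types and conversion
- Label-based and integer-based indexing and slicing
- Elementwise operations, broadcasting, comparison operators, and logic functions
- Math functions and array creation functions
- Useful Xarray functions for geosciences
- Array manipulation
- Converting NumPy and Pandas data structures to Xarray data structures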
</div>

<div class="alert alert-warning">

## Reminders

Remember, you can use Jupyter's built-in table of contents (hamburger on the far left) to jump from heading to heading.

---

This notebook will run in the MSUpy conda environment, which you created in a previous lesson. To load the MSUpy environment in this notebook go to the Kernel tab, select Change Kernel, then select the MSUpy kernel in the pop up window.

---

To turn on line numbers for code cells go to View menu and click Show Line Numbers.

</div>

# I. Importing Necessary Packages

In [1]:
import xarray as xr
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import glob

# II. Introduction to the Xarray Data Structure

Xarray builds upon many other Python packages, including NumPy, Pandas, SciPy, netCDF4, Matplotlib, and more. The Xarray package is most convenient when working with multidimensional data stored in netcdf data files. Xarray can also work with zarr, tiff, hdf, and grib files, though some of these formats require additional packages to be installed, and csv data can be brought in through Pandas.

From the [Xarray documentation](https://docs.xarray.dev/en/stable): "Xarray introduces labels in the form of dimensions, coordinates, and attributes on top of raw Numpy-like multidimensional arrays, which allows for a more intuitive, more concise, and less error-prone developer experience".

We'll cover what all that means shortly! Another gigantic benefit of Xarray is that it integrates well with the Dask package for parallel and distributed computing, which enables fast computation on large data. This aspect of Xarray is a bit too advanced for this course, but it's worth mentioning here, regardless. 

Xarray is under active community development and pushes new updates approximately monthly. This means that developers are actively working on improvements and expanded capabilities and that there will probably be useful updates more frequently than you may be used to.

## Data Structures - DataArray and Dataset

Xarray's core data structures are called the *DataArray* and the *Dataset*. A DataArray is an N-dimensional array of a single data variable with *labels* (metadata) that describe the array *dimensions*, *coordinates*, and *attributes* of the data. A Dataset contains one or more DataArrays which share one or more dimensions and coordinates. We will cover what all this Xarray terminology means below, but you can also find more detail in the [Xarray User Guide Terminology page](https://docs.xarray.dev/en/stable/user-guide/terminology.html).
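To make these terms concrete before we load a real file, here is a minimal sketch that builds a tiny DataArray by hand with the ```xr.DataArray``` constructor and then wraps it in a Dataset with ```xr.Dataset```. All values, coordinate labels, and attribute values below are made up for illustration and only loosely mirror the file we load next.

```python
import numpy as np
import pandas as pd
import xarray as xr

# hypothetical values: 3 monthly times on a 2 x 2 lat/lon grid
times = pd.date_range("2000-01-01", periods=3, freq="MS")
lats = np.array([30.0, 30.5], dtype=np.float32)
lons = np.array([-89.0, -88.5], dtype=np.float32)
values = np.arange(12, dtype=np.float32).reshape(3, 2, 2)

# a DataArray: data values plus dimension names, coordinates, and attributes
tmax = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={"time": times, "lat": lats, "lon": lons},
    attrs={"units": "degree_Celsius", "standard_name": "air_temperature"},
    name="tmax",
)

# a Dataset: one or more DataArrays that share dimensions and coordinates,
# plus optional file (global) attributes of its own
ds_demo = xr.Dataset({"tmax": tmax}, attrs={"institution": "example only"})

print(tmax.dims)                 # ('time', 'lat', 'lon')
print(list(ds_demo.data_vars))   # ['tmax']
```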

Let's look at an Xarray Dataset object and walk through all its components. We'll load data from the file ```data/nclimgrid/nclimgrid_tmax_199401-202312.nc```. This netcdf file contains the monthly averages of daily maximum surface air temperature (tmax) for 30 years on a spatial grid. The original source of this data is [NOAA Monthly U.S. Climate Gridded Dataset (NClimGrid)](https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ncdc:C00332) but the file we are working with has been subset in time and clipped to the state of Mississippi.

We can use the function ```xr.open_dataset()``` to easily load data from the netcdf file into an Xarray Dataset object. As you can see below, when we print the Dataset object ```ds``` to the screen, we get a ton of information. This information is what we meant above by the terms "metadata" and "labels".

In [2]:
# read contents of netcdf file into a Dataset object
ds = xr.open_dataset('data/nclimgrid/nclimgrid_tmax_199401-202312.nc')
ds

A Dataset object will contain one or more DataArrays, which are listed under the "Data variables" section of the info printed to the screen. Our Dataset object ```ds``` contains one DataArray called ```tmax```. 

Click on the paper icon to the right of the DataArray ```tmax``` and you will see even more metadata labels. These are called *variable attributes*. Variable attributes are contained in a Python dictionary where the dictionary keys are the attribute names ("units", "standard_name", etc.) and the dictionary values are the attribute values (e.g., "degree_Celsius","air_temperature"). If we were writing our own data to a netcdf file, we could include any variable attributes we want to help describe the data. But generally, when you are working with climate data, the convention is to use the Climate and Forecast (CF) Metadata Conventions.

<div class="alert alert-danger">

**Sidebar: [Climate and Forecast (CF) Metadata Conventions](https://cfconventions.org)** 

The CF Conventions are essentially a set of rules for how climate data should be described and written to data files in order to promote standardized data processing, eliminate ambiguities, and facilitate data sharing. The data file we are working with uses the CF Conventions, and that is why the ```tmax``` variable attributes have those specific names (e.g., "standard_name"): they come from the list of attributes in the [CF Metadata Conventions Appendix A: Attributes](https://cfconventions.org/cf-conventions/cf-conventions.html#attribute-appendix).  
</div>

Now, the data file itself may also have attributes called *file attributes* or *global attributes* that are separate from variable attributes. We can find these attributes in the print out above under the "Attributes" section. This particular file doesn't have any file attributes as indicated by the zero next to the Attributes section. You can imagine, though, that if you were to write your own netcdf file of data, you may want to include file attributes like "institution" or "Conventions" (these examples come from Appendix A of the CF Conventions linked above) to indicate inside your data file what institution created that file and what version of the CF Conventions was used.

Other information we can see about the ```tmax``` DataArray includes the data type (float32) and the dimension names and order (time, lat, lon). Unlike NumPy arrays, where axis 0, axis 1, axis 2, etc. can represent anything, there is no ambiguity about what each dimension represents with Xarray data structures because they are labeled. If we look toward the top of the print out we can see each dimension length. The ```tmax``` DataArray, which is stored inside the ```ds``` Dataset, contains temperature data for 360 times at 116 latitudes and 85 longitudes.   

Each of our three dimensions is associated with a *coordinate variable*. Coordinate variables (or simply "coordinates") enable us to include very useful metadata about the dimensions of our DataArrays. Click the data stack icon next to any of the coordinates in the print out above and we see the time, latitude, and longitude values that are associated with the data variables inside this Dataset (in this file there's only ```tmax```). Each value of a coordinate is indexed along one dimension (either time, latitude, or longitude) to a position in the data variable ```tmax```. For example, we see that the time coordinate begins in January 1994 and ends in December 2023, and since time is a coordinate of ```tmax```, we know that these times match the indexes of the ```tmax``` time dimension. Click the paper icon next to the time coordinate and we see that each coordinate variable also has its own attributes! This is because coordinates are also DataArray structures. Coordinate variables are 1-dimensional DataArrays that hold data dimension values and dimension attributes. Coordinates are super important as they allow us to select data using labels instead of index positions as we will see shortly.   

Now let's pull the ```tmax``` DataArray out of the Dataset into a new variable in our notebook called ```tx```. We can access the variables in a dataset using a dot and the variable name.

In [66]:
tx = ds.tmax
tx

The print out tells us that our new variable ```tx``` is an Xarray DataArray object. Notice that all of the variable attributes and coordinates that we saw associated with ```tmax``` inside the Dataset are still attached. We also get a little preview of some of the data values (which isn't too useful in this case since the preview is all nan).

An alternative syntax for the above looks a lot like how we accessed a single Series from a Pandas DataFrame.

In [67]:
tx = ds['tmax']
tx

We can get a dictionary-like object of the data variables within a Dataset object using the Dataset property ```.data_vars```.

In [68]:
ds.data_vars

Data variables:
    tmax     (time, lat, lon) float32 14MB nan nan nan nan ... 14.38 nan nan nan

To get a list of only the variable names in a dataset we can use Python's built-in ```list()``` function.

In [69]:
list(ds.data_vars)

['tmax']

This may be helpful if you need to automate the processing (using a loop) of multiple data variables within a Dataset. You could use the code above to get a list of variable names and then use those variable names one by one to access different data variables inside the Dataset.
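A minimal sketch of that loop pattern is below. Because our file holds only ```tmax```, it uses a small made-up Dataset with two hypothetical variables, ```tmax``` and ```tmin```, and applies the same operation (a mean over time) to each one.

```python
import numpy as np
import xarray as xr

# hypothetical Dataset with two data variables sharing the same dimensions
ds_demo = xr.Dataset(
    {
        "tmax": (("time", "x"), np.array([[20.0, 22.0], [24.0, 26.0]])),
        "tmin": (("time", "x"), np.array([[10.0, 12.0], [14.0, 16.0]])),
    },
    coords={"time": [0, 1], "x": [0, 1]},
)

# loop over the variable names and process each DataArray the same way
time_means = {}
for name in list(ds_demo.data_vars):
    da = ds_demo[name]                        # dictionary-style access by name
    time_means[name] = da.mean(dim="time")    # average over the time dimension
    print(name, time_means[name].values)
```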

## Accessing the Components of DataArray Objects

As we saw above, the main components in a DataArray structure are dimensions, coordinates, attributes, and the array of data values. We can access these components using various DataArray properties.
- ```.name``` returns the string name of the DataArray as it was named in the netcdf file
- ```.dims``` returns a tuple of dimension names
- ```.sizes``` returns a dictionary of dimension names and lengths
- ```.coords``` returns a dictionary-like object of coordinate information
- ```.attrs``` returns a dictionary of attributes
- ```.data``` returns the underlying NumPy array of data values 

In [70]:
# get DataArray name string
tx.name

'tmax'

In [71]:
# get tuple of dimension names
tx.dims

('time', 'lat', 'lon')

In [72]:
# get dictionary of dimension info
tx.sizes

Frozen({'time': 360, 'lat': 116, 'lon': 85})

In [73]:
# get dictionary of coordinate info
tx.coords

Coordinates:
  * time     (time) datetime64[ns] 3kB 1994-01-01 1994-02-01 ... 2023-12-01
  * lat      (lat) float32 464B 30.19 30.23 30.27 30.31 ... 34.9 34.94 34.98
  * lon      (lon) float32 340B -91.6 -91.56 -91.52 ... -88.19 -88.15 -88.1

In [74]:
# get all variable attributes
tx.attrs

{'references': 'GHCN-Monthly Version 3 (Vose et al. 2011), NCEI/NOAA, https://www.ncdc.noaa.gov/ghcnm/v3.php',
 'standard_name': 'air_temperature',
 'units': 'degree_Celsius',
 'valid_min': -100.0,
 'valid_max': 100.0,
 'long_name': 'Temperature, monthly average of daily maximums'}

To access only the data values in a DataArray without all the additional metadata attached we can use the ```.data``` property. Run the code cell below to see that the underlying structure holding the data values is a NumPy ndarray! We can think of the ```.data``` property as a conversion from an Xarray data structure to a NumPy data structure.

In [75]:
# get the underlying NumPy array of data values
print(type(tx.data))
tx.data

<class 'numpy.ndarray'>


array([[[  nan,   nan,   nan, ...,   nan,   nan,   nan],
        [  nan,   nan,   nan, ...,   nan,   nan,   nan],
        [  nan,   nan,   nan, ...,   nan,   nan,   nan],
        ...,
        [  nan,   nan,   nan, ...,  6.77,  6.79,   nan],
        [  nan,   nan,   nan, ...,  6.78,   nan,   nan],
        [  nan,   nan,   nan, ...,   nan,   nan,   nan]],

       [[  nan,   nan,   nan, ...,   nan,   nan,   nan],
        [  nan,   nan,   nan, ...,   nan,   nan,   nan],
        [  nan,   nan,   nan, ...,   nan,   nan,   nan],
        ...,
        [  nan,   nan,   nan, ..., 13.13, 13.16,   nan],
        [  nan,   nan,   nan, ..., 13.1 ,   nan,   nan],
        [  nan,   nan,   nan, ...,   nan,   nan,   nan]],

       [[  nan,   nan,   nan, ...,   nan,   nan,   nan],
        [  nan,   nan,   nan, ...,   nan,   nan,   nan],
        [  nan,   nan,   nan, ...,   nan,   nan,   nan],
        ...,
        [  nan,   nan,   nan, ..., 18.04, 18.06,   nan],
        ...

Inside a DataArray structure, attributes are attached to the data and coordinate variables and all the coordinate information is indexed to the data variable. This means **we can use coordinate names and attribute names to access specific coordinates and attributes directly from the data variable.** 

For example, we can access the DataArray object for a specific coordinate by using a dot and the coordinate name on our data variable.

In [76]:
# get a specific coordinate DataArray object
tx.lat

To access the attributes of a coordinate we can use the ```.attrs``` property on a coordinate DataArray object.

In [77]:
# get coordinate attributes
tx.lat.attrs

{'standard_name': 'latitude',
 'long_name': 'latitude',
 'units': 'degrees_north',
 'axis': 'Y',
 'valid_min': 24.562532,
 'valid_max': 49.3542}

Dictionary syntax will get us the attribute value paired to a particular attribute key.

In [78]:
tx.lat.attrs['standard_name']

'latitude'

Of course this also works for accessing any of the variable attributes attached to ```tmax```.

In [79]:
# get a specific variable attribute
tx.attrs['units']

'degree_Celsius'

Quick review of the syntax we've used and what type of object is returned:
- ```ds``` is a Dataset object
- ```tx = ds.tmax``` is a DataArray object 
- ```tx.data``` is a NumPy ndarray object
- ```tx.lat``` is a DataArray object
- ```tx.lat.attrs``` is a dictionary
- ```tx.lat.attrs['standard_name']``` is a string

Before we move on, you may have noticed that there is an "Indexes" section in the print out of each DataArray and Dataset we've looked at. Indexes are associated with coordinate variables. Coordinate variables are special in that their data values are held in two underlying data structures: a NumPy array and a Pandas Index. This is so the Xarray package can build functionality on top of both the NumPy and Pandas packages. The coordinate variable indexes in a DataArray enable fast label-based indexing as we'll see next. These indexes are working behind the scenes and we don't have to worry about accessing these indexes directly.
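If you are curious what those indexes look like, the sketch below uses the ```.indexes``` property on a small made-up DataArray. It shows that the latitude coordinate is backed by a Pandas Index, which is what turns a label such as 30.5 into an integer position.

```python
import numpy as np
import pandas as pd
import xarray as xr

# hypothetical 1-D DataArray with a latitude coordinate
da = xr.DataArray(
    np.array([1.0, 2.0, 3.0]),
    dims="lat",
    coords={"lat": [30.0, 30.5, 31.0]},
)

lat_index = da.indexes["lat"]     # the Pandas Index behind the coordinate
print(type(lat_index))            # a pandas Index class
print(lat_index.get_loc(30.5))    # label -> integer position: 1

# label-based .sel() relies on that lookup, so these two selections match
print(da.sel(lat=30.5).item(), da.isel(lat=1).item())   # 2.0 2.0
```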

You are probably realizing by now that the Xarray data structures are very complex as compared to the other data structures we've learned about thus far in the course. But this complexity will actually make data analysis a lot simpler and less error prone.

<div class="alert alert-info"> 

## Exercise 1: Getting Familiar with the Components of a DataArray

Use the variable ```tx``` that we've already created to complete the exercise.

A) What is the terminology for the metadata labels (such as "units" and "standard_name") that appear when you click the paper icon to the right of the ```tmax``` DataArray?
</div>

Type your answer here: variable attributes or attributes

<div class="alert alert-info"> 

B) What type of object will ```tx.time``` return?
</div>

Type your answer here: DataArray

<div class="alert alert-info"> 

C) What type of object will ```tx.time.data``` return?
</div>

Type your answer here: NumPy array

<div class="alert alert-info"> 

D) What type of object will ```tx.time.attrs``` return?
</div>

Type your answer here: dictionary

<div class="alert alert-info"> 

E) Save the ```tx``` longitude values to a new DataArray variable called ```lon```.
</div>

In [80]:
# add your code here
lon = tx.lon

<div class="alert alert-info"> 

F) Print the value of the ```tx``` variable attribute ```long_name```.
</div>

In [81]:
# add your code here
tx.attrs['long_name']

'Temperature, monthly average of daily maximums'

<div class="alert alert-info"> 

G) Print the value of the ```tx``` time coordinate attribute ```long_name```.
</div>

In [82]:
# add your code here
tx.time.attrs['long_name']

'Time, in monthly increments'

<div class="alert alert-info"> 

H) Convert the values of the ```tx``` time coordinate to a NumPy ndarray and save them to a new variable called ```time```.
</div>

In [83]:
# add your code here
time = tx.time.data
type(time)

numpy.ndarray

## Properties of Xarray DataArrays that Come from NumPy

The Xarray package incorporates much of the same functionality for its DataArray structure that NumPy has for its ndarray data structure. Here are some of the array properties we covered in the NumPy lesson that are also available with Xarray. These Xarray properties provide information only about an underlying NumPy array of data inside an Xarray DataArray structure. Documentation can be found in the [Xarray API Reference](https://docs.xarray.dev/en/latest/api.html#ndarray-attributes).

- ```xr.DataArray.shape```, tuple of dimension lengths
- ```xr.DataArray.ndim```, number of dimensions
- ```xr.DataArray.size```, total number of elements
- ```xr.DataArray.dtype```, data type
- ```xr.DataArray.nbytes```, total bytes consumed by the NumPy array

In [84]:
# tx is a DataArray
# get information about the underlying NumPy Array of data

print(tx.shape)   # tuple of dimension lengths
print(tx.ndim)    # number of dimensions
print(tx.size)    # total number of elements (360*116*85)
print(tx.dtype)   # data type
print(tx.nbytes)  # total bytes consumed by the NumPy array

(360, 116, 85)
3
3549600
float32
14198400


In [85]:
# the coordinate tx.time is a DataArray
# get information about the underlying NumPy Array of data

print(tx.time.shape)   # tuple of dimension lengths
print(tx.time.ndim)    # number of dimensions
print(tx.time.size)    # total number of elements 
print(tx.time.dtype)   # data type
print(tx.time.nbytes)  # total bytes consumed by the NumPy array

(360,)
1
360
datetime64[ns]
2880


## Estimating Memory Usage

Notice how the properties above provide information about a single underlying NumPy array within an Xarray DataArray structure. What if we want to know how much memory is consumed by the entire ```tx``` DataArray structure? Because there are so many components of a DataArray structure, we would need the ```.nbytes``` of the underlying NumPy array of temperature data, the ```.nbytes``` of the underlying NumPy arrays for all the coordinates, and the ```pd.Index.memory_usage()``` of the underlying Pandas Indexes of all the coordinates. We would also need to know how much memory is consumed by all of the attributes and other labels. **There is no convenient function for estimating the memory consumption of an entire DataArray object.** The best we can do is sum the ```.nbytes``` of all the underlying NumPy arrays (plus the memory usage of the Pandas Indexes). But this shouldn't be too much of an inconvenience since attributes and other labels generally take up a negligible amount of memory anyway. 

In [86]:
print('tx array',tx.nbytes/1E6,'MB')
print('time array',tx.time.nbytes/1E6,'MB')
print('lat array',tx.lat.nbytes/1E6,'MB')
print('lon array',tx.lon.nbytes/1E6,'MB')
print('time index',tx.indexes['time'].memory_usage()/1E6,'MB')
print('lat index',tx.indexes['lat'].memory_usage()/1E6,'MB')
print('lon index',tx.indexes['lon'].memory_usage()/1E6,'MB')

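# each coordinate is counted twice: once for its NumPy array and once for its
# Pandas Index (the index sizes printed above match the array sizes)
array_MBs = (tx.nbytes + 2*tx.time.nbytes + 2*tx.lat.nbytes + 2*tx.lon.nbytes)/1E6
print('entire tx DataArray structure is approximately',array_MBs,'MB')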

tx array 14.1984 MB
time array 0.00288 MB
lat array 0.000464 MB
lon array 0.00034 MB
time index 0.00288 MB
lat index 0.000464 MB
lon index 0.00034 MB
entire tx DataArray structure is approximately 14.205768 MB


The coordinate arrays and indexes in the ```tx``` DataArray occupy only a fraction of a megabyte. This may not be the case for variables with many more times, latitudes, or longitudes, but for gridded data like ours the 1-dimensional coordinates will typically occupy far less memory than the data array, because coordinate sizes grow with the *sum* of the dimension lengths while the data array grows with their *product*. So, for variables like ```tx```, a good estimate of the total size of the complex DataArray structure is simply the size of the underlying NumPy array of data, ```tx.nbytes```. 
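If you would rather not list every coordinate by hand, a loop over ```.coords``` gives the same kind of estimate for any DataArray. The sketch below uses a small made-up DataArray; it also shows that converting to a Dataset with ```.to_dataset()``` and reading ```Dataset.nbytes``` sums the data and coordinate arrays in one step. Unlike the calculation above, neither number counts the separate Pandas Index copies, and neither counts attributes.

```python
import numpy as np
import pandas as pd
import xarray as xr

# hypothetical DataArray shaped like a small gridded monthly dataset
da = xr.DataArray(
    np.zeros((12, 4, 5), dtype=np.float32),
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2000-01-01", periods=12, freq="MS"),
        "lat": np.linspace(30.0, 31.5, 4, dtype=np.float32),
        "lon": np.linspace(-90.0, -88.0, 5, dtype=np.float32),
    },
)

# data array plus every coordinate array, summed in a loop
coord_bytes = sum(coord.nbytes for coord in da.coords.values())
total_bytes = da.nbytes + coord_bytes

# Dataset.nbytes sums the data variable and its coordinate variables
print(total_bytes, da.to_dataset(name="demo").nbytes)   # the two numbers match
```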

## Data Types and Conversion

Netcdf files can contain binary, numeric, and string data, so Xarray is built to handle all these data types as well. When reading data from a netcdf file with Xarray, a lot happens behind the scenes as the file contents are divvied up into the different components of an Xarray data structure. Climate data variables and their coordinates from a netcdf file are usually interpreted as a NumPy numeric or datetime data type when they are read from a file into NumPy arrays (within the DataArray structure). Labels and attributes from a file are read into strings, tuples of strings, and dictionaries, as we've already seen (attribute values can be strings or numbers, e.g., ```valid_min```).

Xarray has implemented the ```.astype()``` function from NumPy for Xarray DataArrays so we can easily convert data types just as we did with NumPy. The same potential pitfalls about unsafe data type conversions apply with data in Xarray structures just as they did with NumPy. We won't cover that again here, but look back to the NumPy lesson if you need a refresher.

Let's convert ```tx``` from data type float32 to type float16.

In [87]:
tx = tx.astype(np.float16)
tx

Notice how the data type of the ```tx``` array changed but the data types of its coordinates remained the same. The temperature data and the coordinates that are indexed to that data are all stored in separate NumPy arrays within the Xarray DataArray structure. This is why changing the data type of one underlying NumPy array will not affect the type of any other underlying NumPy array within a DataArray object. We could also change the data type of a coordinate if we wanted to.

In [88]:
tx.coords['lon'] = tx.lon.astype(np.float64)
tx
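Converting to float16 halves the memory used by the data array, but it also rounds the values. A quick way to see the unsafe-conversion pitfall mentioned above is to compare a few float32 values with their float16 versions. The temperatures below are made-up examples in the same range as ```tx```.

```python
import numpy as np
import xarray as xr

# hypothetical temperatures (degrees C) stored as float32
da32 = xr.DataArray(np.array([14.38, 6.77, 18.06], dtype=np.float32), dims="x")
da16 = da32.astype(np.float16)

# float16 keeps only about 3 significant decimal digits, so values get rounded
diff = np.abs(da32.values - da16.values.astype(np.float32))
print(da16.dtype, float(diff.max()))
print(da32.nbytes, da16.nbytes)   # 12 bytes vs 6 bytes
```

Whether that rounding matters depends on your analysis; for temperatures reported to two decimal places it can change the last digit.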

# III. Indexing and Slicing

Now we're getting to the good stuff! Let's look at how the dimension labels and coordinates of a DataArray allow us to use label-based indexing and slicing.

## Label-based Selection with .sel()

The DataArray function [```.sel()```](https://docs.xarray.dev/en/latest/generated/xarray.DataArray.sel.html#xarray.DataArray.sel) allows us to use dimension and coordinate labels to select parts of a data variable.

### Indexing

Let's select temperature data at a single time.

In [89]:
tx.sel(time='2000-01-01')

Notice what was returned. We received a 2-dimensional DataArray: all the latitudes and longitudes of data for January 2000. If we think of our data as spatial maps of temperature arranged in a stack, where each map in the stack represents a different point in time, then we just selected a single map from the stack. This is similar to what we did in the NumPy lesson, except this time labels make the selection much clearer. We don't have to know which axis (0, 1, or 2) represents time because there is a dimension label 'time'. And we don't have to figure out which index along the time dimension represents January 2000 because there is a coordinate label for that. Notice that when we select a single time, we do not get a singleton dimension (the time dimension disappears), but we do retain the time coordinate label in case we need it later, which is very convenient. 

We can provide more dimension labels and coordinate values to ```.sel()``` if we want to select data in multiple dimensions.

In [90]:
# select all longitudes at a single latitude and single time
tx.sel(time='2000-01-01',lat=30.3542)

In [93]:
# select a single data value given the time, lat, and lon
tx.sel(time='2000-01-01',lat=30.3542,lon=-88.6875)

If we give ```.sel()``` an inexact latitude and longitude and also provide the parameter ```method="nearest"``` we can get the nearest data point in space.

In [103]:
# get the nearest point to inexact lat and lon
tx.sel(time='2000-01-01',lat=30.3,lon=-88.6, method="nearest")
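One caution with ```method="nearest"```: it always returns some grid cell, even if the requested point lies far outside the grid. ```.sel()``` also accepts a ```tolerance``` parameter that sets the maximum allowed distance between the requested label and the match. The sketch below uses a made-up 1-D latitude grid to show both outcomes.

```python
import numpy as np
import xarray as xr

# hypothetical 1-D grid with 0.5-degree latitude spacing
da = xr.DataArray(np.array([1.0, 2.0, 3.0]), dims="lat",
                  coords={"lat": [30.0, 30.5, 31.0]})

# 30.47 is within 0.1 degree of the grid value 30.5, so this succeeds
near = da.sel(lat=30.47, method="nearest", tolerance=0.1).item()
print(near)   # 2.0

# 32.0 is a full degree from the closest grid value (31.0), so instead of
# silently returning that distant cell, .sel() raises a KeyError
raised = False
try:
    da.sel(lat=32.0, method="nearest", tolerance=0.1)
except KeyError:
    raised = True
print(raised)   # True
```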

We can select with a list of labels as well. This is helpful, for example, if we want to select multiple times that aren't sequential.

In [94]:
# select multiple times that are not consecutive
tx.sel(time=['2000-01','2000-04','2000-07','2000-10'])

The list of labels we use for selecting can be in any order. The data will be returned in the same order as the label list.

In [96]:
# select multiple times that are not consecutive in whatever order
tx.sel(time=['2000-01','2000-10','2000-07','2000-04'])

Want to select all data for a single year? Easy! (This works because our time coordinate holds datetime values.)

In [97]:
# select one full year
tx.sel(time='2000')
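Partial date strings also work as slice bounds, so whole years or ranges of months can be selected by label. The sketch below uses a made-up monthly series running from 1999 through 2001.

```python
import numpy as np
import pandas as pd
import xarray as xr

# hypothetical monthly series covering 1999-2001 (36 months)
times = pd.date_range("1999-01-01", "2001-12-01", freq="MS")
da = xr.DataArray(np.arange(times.size, dtype=float), dims="time",
                  coords={"time": times})

one_year = da.sel(time="2000")                      # all 12 months of 2000
two_years = da.sel(time=slice("2000", "2001"))      # Jan 2000 through Dec 2001
summer = da.sel(time=slice("2000-06", "2000-08"))   # Jun, Jul, Aug 2000

print(one_year.sizes["time"], two_years.sizes["time"], summer.sizes["time"])  # 12 24 3
```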

### Slicing

We can also slice data using labels. Notice that just like we saw with Pandas DataFrames in the Pandas lesson, label-based slicing with Xarray is inclusive of the ending label.

Let's select all data for 6 months in time.

In [98]:
# a slice of 6 months
tx.sel(time=slice('2000-02','2000-07'))

Let's continue slicing... we can slice as many dimensions as we want.

In [99]:
# a slice of every dimension
tx.sel(time=slice('2000-02','2000-07'), lat=slice(30.3542,30.6042), lon=slice(-88.9375,-88.6875))

We can also combine label-based indexing and slicing together.

In [101]:
# one time, slice of lats and lons
tx.sel(time='2020-01',lat=slice(30.3542,30.6042), lon=slice(-88.9,-88.6))

Exact matches to the coordinate values are not required for slicing. With inexact slice values like below, the latitude slice we'll get will be 30.0 <= lat <= 30.5 and the longitude slice will be -89.0 <= lon <= -88.5.

In [108]:
# slice with inexact latitude, longitude values 
tx.sel(time='2020-01',lat=slice(30.,30.5), lon=slice(-89.0,-88.5))
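Both slicing rules can be checked on a made-up latitude grid with 0.25-degree spacing: a bound that is not a grid value simply limits the range, and a bound that is a grid value is included.

```python
import numpy as np
import xarray as xr

# hypothetical latitude grid every 0.25 degrees: 29.75, 30.0, ..., 31.0
lats = np.arange(29.75, 31.01, 0.25)
da = xr.DataArray(np.arange(lats.size, dtype=float), dims="lat",
                  coords={"lat": lats})

# inexact start (30.1 is not on the grid): keeps every lat with 30.1 <= lat <= 30.75
by_label = da.sel(lat=slice(30.1, 30.75))
print(by_label.lat.values)       # [30.25 30.5  30.75]

# both bounds on the grid: both end labels are included
both_on_grid = da.sel(lat=slice(30.0, 30.5))
print(both_on_grid.lat.values)   # [30.   30.25 30.5 ]
```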

### When Order Does and Does Not Matter with Label-based Selection

An important thing to note is that the **order of the dimension names inside ```.sel()``` is irrelevant**. For example, our ```tx``` array dimensions are ordered (time, lat, lon) but as long as we are using labels we can make selections with the dimension names in any order. This feature, in particular, can help you avoid a lot of coding mistakes.

In [109]:
tx.sel(lon=slice(-89.0,-88.5),lat=slice(30.,30.5),time='2020-01')

However, **order is important inside of the ```slice()``` function**. The slice needs to be ordered from left to right according to the order of the coordinate values. For example, because our latitudes are ordered ascending (30.187532 to 34.9792) we could not do ```slice(33,32)```. This applies to slices based on any coordinate. We may not receive an error immediately, but we will see in the print out of the metadata that zero latitudes are selected and that the data array has no contents, which will likely cause errors or misleading results later in your code.

In [113]:
tx.sel(lat=slice(33,32))

If the latitude coordinate in our data was descending instead of ascending (which you will find is the case with plenty of data), the values you put into ```slice()``` would still go from left to right, meaning the larger latitude would come first, followed by the smaller latitude. That said, it's probably better to reorder your coordinates so that they are all ascending, to reduce confusion.

Let's reorder latitude descending so we can see how this works. To reverse the order of your data along a coordinate you can use Xarray's ```.reindex()``` function in combination with some syntax we learned way back in the Python Language Basics lesson, ```[::-1]```.

In [121]:
# reverse order of latitudes
tx_reordered = tx.reindex(lat=tx.lat[::-1])
tx_reordered

Now, we can successfully make the same selection we tried a few notebook cells up, because now ```slice(33,32)``` matches the values of the latitude coordinate from left to right.

In [122]:
tx_reordered.sel(lat=slice(33,32))

For clarity, when we use ```.reindex()```, not only is the coordinate reordered, but the temperature data values are also reordered. This is because each coordinate includes an underlying NumPy array and Pandas Index, so each value of a coordinate references a particular position in the ```tx``` data array. You can think of the coordinate values as being attached to specific data values such that when you reorder a coordinate, the data values get reordered as well. This is a very convenient feature of Xarray. 

Let's verify that the data was in fact reordered when we reindexed the latitude coordinate. We can check this by looking at the temperature data values of ```tx``` and ```tx_reordered``` at a single time, single longitude, and all latitudes.

In [131]:
tx.sel(time='2020-01',lon=-89.5, method="nearest")

In [132]:
tx_reordered.sel(time='2020-01',lon=-89.5, method="nearest")

## Integer-based Selection

There are multiple other ways we can select data from Xarray data structures. 

### .isel()

The first method we'll cover is ```.isel()```. This function allows us to use dimension labels with integer positions (as opposed to coordinate labels). For example, if we want to select the first time of a DataArray without having to know the label for that particular time we could do:

In [133]:
# select with dim name and integer position
tx.isel(time=0)

Slices with ```.isel()``` work the same way. Use the dimension name and integer positions. We can get fancy and use ```slice(start, exclusive stop, step)``` to select every other month of the first year of data. And just like basic indexing in NumPy, the ending index of an integer slice in Xarray is exclusive.

In [134]:
# slice of every other month of the first year
tx.isel(time=slice(0,12,2))

### Basic Indexing with Square Brackets like NumPy

The other way to select with integer positions is to use no labels at all. This is the same as NumPy basic indexing. For example, to select the first time, first lat, first lon:

In [135]:
tx[0,0,0]

Now, let's select a slice in time. Remember, this type of indexing, just like NumPy, is exclusive of the ending index.

In [136]:
tx[0:12,0,0]

Of course, just like we learned in the NumPy lesson, the order of the dimensions matters when using this indexing method. If we want to slice times, we need to know that the time dimension is axis 0.
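To see how square-bracket indexing relates to ```.isel()```, the sketch below makes the same selection both ways on a made-up array with dimensions ordered (time, lat, lon). The ```.isel()``` version names each dimension, so the order of its arguments does not matter.

```python
import numpy as np
import xarray as xr

# hypothetical 3-D array with dims ordered (time, lat, lon)
da = xr.DataArray(np.arange(24.0).reshape(4, 2, 3), dims=("time", "lat", "lon"))

by_brackets = da[0:2, 0, 0]                            # relies on axis order
by_isel = da.isel(time=slice(0, 2), lat=0, lon=0)      # names each dimension
by_isel_reordered = da.isel(lon=0, lat=0, time=slice(0, 2))

print(by_brackets.values)   # [0. 6.]
```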

## Combining Label-based and Position-based Selection

Xarray is very flexible when it comes to different ways to select data within the DataArray structure. We can even combine different selection methods by stringing them together. Let's look at how to select the first time by position and latitude/longitude by label.

In [137]:
# combining .isel() and .sel()
tx.isel(time=0).sel(lat=slice(30.,30.5),lon=slice(-89.0,-88.5))

In [139]:
# combining .sel() with square bracket integer indexing
tx.sel(lat=slice(30.,30.5),lon=slice(-89.0,-88.5))[0,...]

# IV. Elementwise Operations, Broadcasting, Comparison Operators, Logic Operators, and Logic Functions

A dimension of an Xarray DataArray is the same as an axis of a NumPy ndarray except that each dimension of a DataArray is labeled with a name. For example, a 3D NumPy array would have the dimensions axis 0, axis 1, and axis 2, where each axis would represent something like time, latitude, and longitude. The equivalent Xarray 3D DataArray would have dimensions with actual names such as "time", "latitude", and "longitude".
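The practical payoff of named dimensions shows up in reductions. The sketch below computes a time mean of made-up data twice: once with NumPy, where we must remember that time is axis 0, and once with Xarray using ```dim="time"```. The DataArray method ```get_axis_num()``` maps a dimension name back to its NumPy axis number.

```python
import numpy as np
import xarray as xr

arr = np.arange(24.0).reshape(4, 2, 3)   # hypothetical (time, lat, lon) values

# NumPy: we have to remember that axis 0 is time
np_time_mean = arr.mean(axis=0)

# Xarray: the same reduction, requested by dimension name
da = xr.DataArray(arr, dims=("time", "lat", "lon"))
xr_time_mean = da.mean(dim="time")

print(xr_time_mean.dims)                                # ('lat', 'lon')
print(np.allclose(np_time_mean, xr_time_mean.values))   # True
print(da.get_axis_num("lon"))                           # 2
```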


Datasets contain one or more DataArrays which share one or more dimensions and coordinates. Each variable in a Dataset has its own attributes and the Dataset itself can have its own attributes as well (which come from the file attributes in each netcdf). 

Details of all Xarray functions (including what parameters to include as function inputs and what each function returns) can be found in the **[xarray API reference](https://docs.xarray.dev/en/stable/api.html)**. Xarray has pretty great documentation with usage examples; definitely check the **[xarray getting started](https://docs.xarray.dev/en/stable/getting-started-guide/index.html)** and **[xarray user guide](https://docs.xarray.dev/en/stable/user-guide/index.html)** documentation for help as you are learning. If you are stuck on something, Stack Overflow and Xarray's issues on GitHub are also useful. I personally often end up at those sites from Google searches like "python xarray how to ___". 




## Data Types



# Intro to Netcdf
Xarray is especially convenient for netcdf files written using the Climate and Forecast Metadata Conventions ([CF Metadata Conventions](https://cfconventions.org)). These metadata conventions are essentially a set of rules for how climate data should be described and written to data files in order to promote standardized data processing, eliminate ambiguities, and facilitate data sharing.




With xarray, printing a variable usually shows a summary of all the metadata labels attached to it, rather than a full listing of its values. The information above shows us that our pr data is the daily total precipitation aggregated from midnight to midnight local time, has units of mm per day, and is named 'prcp' in the netcdf file.
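The metadata shown in that printout is stored in the variable's `.attrs` dictionary, while the raw numbers are available through `.values`. Since the original file isn't reproduced here, this sketch builds a stand-in DataArray; its name and attribute values are assumptions chosen to mimic the description above.

```python
import numpy as np
import xarray as xr

# Stand-in for the precipitation variable; attribute values are illustrative only
pr = xr.DataArray(
    np.random.default_rng(0).random((2, 2, 2)),
    dims=("time", "lat", "lon"),
    name="prcp",
    attrs={"long_name": "daily total precipitation", "units": "mm/day"},
)

print(pr.name)            # prcp
print(pr.attrs["units"])  # mm/day

# .values returns the underlying NumPy array, stripped of all labels
print(type(pr.values))    # <class 'numpy.ndarray'>
```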

We can also see that the data has 3 dimensions (time, lat, lon), the length of each dimension, and that each dimension has a matching "coordinate", which is essentially an additional set of labels. Click on the paper and data stack icons to the right of each coordinate. The paper icon shows that each coordinate has its own attributes (standard_name, units, etc.). The data stack icon shows that each coordinate is also an array of values, similar to an index in Pandas. The beauty of coordinates is that they let us easily select a subset of a data variable using labels that correspond to the coordinate values.
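Here is a minimal sketch of label-based selection with coordinates, on a small hypothetical daily grid (the dates, latitudes, and longitudes are made up). The same grid cell is picked once by coordinate label with `.sel()` and once by integer position with `.isel()`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily grid: 5 days x 3 latitudes x 3 longitudes
times = pd.date_range("2000-01-01", periods=5, freq="D")
lats = [30.0, 30.5, 31.0]
lons = [-89.0, -88.5, -88.0]
da = xr.DataArray(
    np.arange(45, dtype=float).reshape(5, 3, 3),
    dims=("time", "lat", "lon"),
    coords={"time": times, "lat": lats, "lon": lons},
)

# Each coordinate is itself a labeled array, much like a pandas index
print(da["lat"].values)

# Select by coordinate label instead of by integer position
point = da.sel(time=pd.Timestamp("2000-01-03"), lat=30.5, lon=-88.5)
print(float(point))  # 22.0

# The equivalent positional selection uses .isel() with integer indices
print(float(da.isel(time=2, lat=1, lon=1)))  # 22.0
```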

Definitions for xarray terminology such as DataArray, Dataset, variable, dimension, coordinate, attribute can all be found in xarray's user guide on the **[xarray terminology page](https://docs.xarray.dev/en/stable/user-guide/terminology.html)**.

## Array Attributes

Each ndarray has a number of *attributes*. These may also be called array *properties*. We've already seen one of these above with ```.shape```. The others we will cover are ```.ndim```, ```.size```, ```.dtype```, ```.itemsize```, and ```.nbytes```. The full list of attributes can be found in
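A quick sketch of these attributes on a small array of zeros (the shape and dtype are arbitrary choices). An Xarray DataArray wrapping the same array exposes most of the same attributes; the dimension names `y` and `x` below are made up for illustration.

```python
import numpy as np
import xarray as xr

# A 2 x 3 array of 64-bit floats
arr = np.zeros((2, 3), dtype=np.float64)

print(arr.shape)     # (2, 3)
print(arr.ndim)      # 2
print(arr.size)      # 6
print(arr.dtype)     # float64
print(arr.itemsize)  # 8  (bytes per element)
print(arr.nbytes)    # 48 (size * itemsize)

# The same attributes on a DataArray wrapping the array
da = xr.DataArray(arr, dims=("y", "x"))
print(da.shape, da.ndim, da.size, da.dtype, da.nbytes)  # (2, 3) 2 6 float64 48
```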

# V. Math Functions, Array Creation Functions

# VI. Useful Xarray Functions for Geosciences

# VII. Array Manipulation

# VIII. Converting Numpy and Pandas Data Structures to Xarray Data Structures

Your data doesn't need to be provided in netcdf format in order to use xarray data structures. You can hold any numerical data array in an Xarray DataArray or Dataset. If you have metadata (or labels) for the dimensions of your data, you can add that into the DataArray or Dataset object. Here we'll read in data that is provided in a csv file and create an Xarray DataArray.
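One possible route is sketched below, assuming the csv has one column per label (date, station) and one data column. The csv contents are a hypothetical stand-in held in a string; with a real file you would pass its path to `pd.read_csv()`. Setting the label columns as the index and calling the pandas `Series.to_xarray()` method turns each index level into a dimension.

```python
import io

import pandas as pd
import xarray as xr

# Hypothetical CSV contents standing in for a real file
csv_text = """date,station,prcp
2000-01-01,A,0.0
2000-01-01,B,1.2
2000-01-02,A,3.4
2000-01-02,B,0.5
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Use the labeling columns as a (multi-)index; each level becomes a dimension
df = df.set_index(["date", "station"])

# Convert the data column (a pandas Series) into a 2D DataArray
prcp = df["prcp"].to_xarray()
prcp.attrs["units"] = "mm/day"  # attach metadata the CSV did not carry

print(prcp.dims)   # ('date', 'station')
print(prcp.shape)  # (2, 2)
```

If the values are already in a NumPy array, `xr.DataArray(values, dims=..., coords=...)` works just as well; the pandas route is convenient when the labels live in the csv columns.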

Why would you want to do this?

# IX. Input/Output (I/O) with Xarray

# X. Exercise: Putting it All Together

# XI. At a Glance: Language Covered

The NumPy functionality that we covered at a glance...

## NumPy Functions

```np.all()```, ```np.any()```, ```np.arange()```, ```np.argsort()```, ```np.argwhere()```, ```np.array()```, ```np.ceil()```, ```np.cos()```, ```np.concatenate()```, ```np.cumsum()```, ```np.diff()```, ```np.empty()```, ```np.expand_dims()```, ```np.float16()```, ```np.floor()```, ```np.full()```, ```np.genfromtxt()```, ```np.isfinite()```, ```np.isnan()```,
```np.linspace()```, ```np.load()```, ```np.loadtxt()```, ```np.log()```, ```np.logical_and()```, ```np.logical_or()```, ```np.logical_not()```, ```np.max()```, ```np.mean()```, ```np.median()```, ```np.min()```, ```np.nan_to_num()```, ```np.ones()```, ```np.percentile()```, ```np.ptp()```, ```np.quantile()```, ```np.radians()```, ```np.random.default_rng()```, ```np.reshape()```, ```np.round()```, ```np.save()```, ```np.savetxt()```, ```np.savez()```, ```np.sin()```, ```np.sort()```, ```np.squeeze()```, ```np.stack()```, ```np.std()```, ```np.sum()```, ```np.trunc()```, ```np.unique()```, ```np.var()```, ```np.where()```, ```np.zeros()```


## NumPy data structure (ndarray) methods
```.astype()```, ```.flatten()```, ```.sum()```

## NumPy data structure (ndarray) attributes

```.dtype```, ```.itemsize```, ```.nbytes```, ```.ndim```, ```.shape```, ```.size```  


## NumPy random number generator object (rng) methods
```.integers()```, ```.random()```, ```.uniform()``` 


## NumPy constants

```np.inf```, ```np.nan```, ```np.newaxis```

## Functions from other packages
```glob.glob()```, ```matplotlib.pyplot.imshow()```, ```matplotlib.pyplot.text()```, ```pandas.DataFrame()```, ```pandas.read_csv()```, ```pandas.DataFrame.to_csv()```, ```pandas.DataFrame.to_numpy()```

<div class="alert alert-success">

# XII. Learning More About NumPy

For more about NumPy, start on the [NumPy website](https://numpy.org/) where you can find:

- the getting started doc, user guide, and API reference documentation https://numpy.org/doc/stable/
- beginner and advanced tutorials, book suggestions, and videos links https://numpy.org/learn/

</div>