# Build a multidimensional DataArray and Dataset


We made up some data in the simple example. Also, did you notice that it was only one-dimensional? Let's go through the exercise of building a multidimensional dataset.

We are going to start with data that is just a collection of ordinary NumPy arrays, so we need to import NumPy as well as xarray.
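Here is a minimal setup cell, assuming NumPy and xarray are already installed. The version printout is only a sanity check.

```python
import numpy as np
import xarray as xr

# Print the versions so you know which APIs you are working with
print("numpy:", np.__version__)
print("xarray:", xr.__version__)
```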

### Credit

This lesson is based on Ryan Abernathey's online book: <https://earth-env-data-science.github.io/lectures/xarray/xarray_intro.html>.


# Example: ARGO float

![](operation_park_profile.jpg)


![](statusbig.png)

Let's start by loading some real data. This is ARGO float data that contains temperature and salinity measurements {What is an Argo float? How does it take data?}. The data are stored as NumPy arrays, or matrices: again, rows and columns. Let's draw on the board what the rows and columns represent. The data have coordinates such as time, depth, latitude, and longitude, which is what you would expect for data collected in the ocean.

When we first load the data, it is a collection of NumPy arrays. They are all separate objects, and what we'd like to do is stitch them together in a sensible way. To do this, we are going to create a DataArray and then a Dataset.

The arrays arrive bundled in one container because of how the file was saved. Let's break each component out into its own NumPy array.
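The original data file is not included here, so this sketch *assumes* it is a NumPy `.npz` archive. It builds a tiny made-up archive in memory with `np.savez`. The array names `S`, `T`, and `levels` and their values are placeholders. `np.load` returns a dict-like `NpzFile`, and indexing it by name pulls each component out as its own array.

```python
import io

import numpy as np

# Hypothetical stand-in for the saved ARGO file: write a few made-up arrays
# into an in-memory .npz archive, then read it back the way you would read a file.
buffer = io.BytesIO()
np.savez(buffer, S=np.full((4, 10), 35.0), T=np.full((4, 10), 15.0), levels=np.arange(4))
buffer.seek(0)

argo_data = np.load(buffer)  # NpzFile: a dict-like container of arrays
print(argo_data.files)       # names of the arrays stored in the archive

# Break each component out into its own NumPy array
S = argo_data["S"]
T = argo_data["T"]
levels = argo_data["levels"]
print(S.shape, T.shape, levels.shape)
```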

Remember from the previous notebook that the `DataArray` has these key properties:

* `data`: the N-dimensional array (NumPy or dask) holding the array's values, i.e. your actual data.

* `dims`: the dimension name for each axis. These are just names, like 'latitude', 'longitude', or 'time'.

* `coords`: a dictionary-like container of arrays that label each point, i.e. the actual values along each axis, such as the times or latitudes.

* `attrs`: a dictionary holding metadata, or 'attributes', such as the data units, the person who collected the data, and so on.



Let's take the salinity array `S` and create a DataArray for it.
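Here is a minimal sketch of this step. The real ARGO arrays are not reproduced here, so it builds a made-up salinity matrix: 4 levels by 10 dates, with a hypothetical 10-day spacing starting 2012-10-02. `da_salinity` is just an illustrative name. The last lines print each of the four key DataArray properties.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up stand-in for the ARGO salinity matrix (NOT real measurements):
# 4 depth levels (rows) x 10 dates (columns)
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")  # hypothetical 10-day cycle
S = 35.0 + 0.1 * rng.standard_normal((levels.size, date.size))

# dims names each axis in order; coords gives the label values along each axis
da_salinity = xr.DataArray(S, dims=["level", "date"], coords={"level": levels, "date": date})

print(da_salinity.data.shape)  # data: the underlying NumPy array
print(da_salinity.dims)        # dims: ('level', 'date')
print(da_salinity.coords)      # coords: label arrays for each dimension
print(da_salinity.attrs)       # attrs: empty until we add metadata
```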

OK, this is like the 1D fake bird data we made before, but now it is real 2D salinity data from the ocean.

Let's see what xarray does if we ask it to make a simple plot:
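Here is a sketch of the plotting call on a made-up 4 x 10 salinity matrix. It assumes matplotlib is installed. The `Agg` backend line only lets the cell run without a display, and you can drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

# Made-up salinity data (NOT real measurements): 4 levels x 10 dates
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
S = 35.0 + 0.1 * rng.standard_normal((levels.size, date.size))
da_salinity = xr.DataArray(S, dims=["level", "date"], coords={"level": levels, "date": date})

# For 2D data, xarray's default .plot() is a pcolormesh; the first dim goes on y
mesh = da_salinity.plot()
ax = mesh.axes
print(type(mesh).__name__, ax.get_xlabel(), ax.get_ylabel())
# da_salinity.plot(yincrease=False) would put level 0 at the top, like depth
plt.close("all")
```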

Nice! That is sort of amazing. Xarray knew that the salinity data are 2D, so by default it made a pseudocolor plot (`pcolormesh`) rather than a line plot. It also put time on the x axis and the levels (depth) on the y axis, because it read them from the DataArray's dimensions and coordinates. It also labeled our axes and formatted the dates.

But we aren't done with our DataArray yet. Remember the four parts of a DataArray? `data`, `dims`, `coords`, `attrs`. We can add other important information to the `attrs` part of the DataArray. Can you think of some important info?
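Here is a sketch of one possible set of metadata on made-up salinity data. The particular keys and values are illustrative choices, not requirements. When they are present, xarray's plotting uses `long_name` and `units` to build axis and colorbar labels.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up salinity data (NOT real measurements): 4 levels x 10 dates
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
S = 35.0 + 0.1 * rng.standard_normal((levels.size, date.size))
da_salinity = xr.DataArray(S, dims=["level", "date"], coords={"level": levels, "date": date})

# Metadata lives in the attrs dictionary; these keys are illustrative
da_salinity.attrs["units"] = "PSU"
da_salinity.attrs["long_name"] = "Salinity"
da_salinity.attrs["comment"] = "made-up values for illustration"
print(da_salinity.attrs)
```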

# Datasets

An xarray Dataset can hold multiple DataArrays. This makes the most sense when those DataArrays share dimensions and coordinates.

In our ARGO float example, temperature and salinity share the same dims and coords. So let's put them together into one Dataset that holds all our float observational data.

The Dataset constructor takes three arguments:

* `data_vars` should be a dictionary in which each key is the name of a variable and each value is either an already constructed DataArray or a tuple of the form `(dims, data[, attrs])`.

* `coords` should be a dictionary of the same form as `data_vars`.

* `attrs` should be a dictionary.

So here is an example for our argo data:
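Here is a sketch of the constructor for our float. It uses made-up stand-ins for the salinity, temperature, and pressure arrays, and the variable names `salinity` and `pressure` and all values are assumptions. Each data variable is given as a `(dims, data)` tuple, and the optional `attrs` argument holds Dataset-level metadata.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up stand-ins for the ARGO arrays (NOT real measurements): 4 levels x 10 dates
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
shape = (levels.size, date.size)
S = 35.0 + 0.1 * rng.standard_normal(shape)                          # salinity
T = 20.0 - 2.0 * levels[:, None] + 0.5 * rng.standard_normal(shape)  # temperature, deg C
P = np.repeat(10.0 * levels[:, None], date.size, axis=1)             # pressure, dbar

argo = xr.Dataset(
    data_vars={
        "salinity": (("level", "date"), S),
        "temperature": (("level", "date"), T),
        "pressure": (("level", "date"), P),
    },
    coords={"level": levels, "date": date},
    attrs={"comment": "synthetic stand-in for ARGO float data"},
)
print(argo)
```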


Let's talk through what all those parts are telling us when we print out `argo`. 
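Besides printing the whole Dataset, you can look at each part on its own. This sketch builds a small made-up `argo` Dataset and prints its dimension sizes, coordinate names, data variables, and attributes.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up data (NOT real measurements): 4 levels x 10 dates
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
shape = (levels.size, date.size)
argo = xr.Dataset(
    {
        "salinity": (("level", "date"), 35.0 + 0.1 * rng.standard_normal(shape)),
        "temperature": (("level", "date"), 15.0 + rng.standard_normal(shape)),
    },
    coords={"level": levels, "date": date},
    attrs={"comment": "synthetic example"},
)

print(dict(argo.sizes))      # dimension names and their lengths
print(list(argo.coords))     # coordinate names
print(list(argo.data_vars))  # data variable names
print(argo.attrs)            # Dataset-level metadata
```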

What about the latitude and longitude? Those seem important, and we'd like to use them for plotting and analysis later. They should be coordinates, right? They should be the same size as one of the existing coordinates, either level or date. What do you think?

To add a new coordinate, we can use:
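Here is a sketch of the naive approach with made-up longitudes, one per profile. If you assign a bare 1D array under a new name, xarray creates a brand-new `lon` dimension that none of the data variables use.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up data (NOT real measurements): 4 levels x 10 dates
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
T = np.full((levels.size, date.size), 15.0)
argo = xr.Dataset({"temperature": (("level", "date"), T)}, coords={"level": levels, "date": date})

lon = np.linspace(-40.0, -39.0, date.size)  # made-up longitudes, one per profile
argo.coords["lon"] = lon  # a bare 1D array under a new name becomes a NEW dimension "lon"

print(dict(argo.sizes))       # now has level, date, AND lon
print(argo.temperature.dims)  # temperature is still ('level', 'date')
```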

What we just did was add a whole new coordinate `lon`, which xarray treats as a new dimension unconnected to our data. But we know that each `lon` value belongs to a particular `date`, so we can associate `lon` with `date`. To do that, we set the dim of our new coord `lon` to be `date`. We can do the same for `lat`.
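Here is a sketch of the fix with made-up positions. Passing a `(dims, data)` tuple ties each value to the existing `date` dimension. If `lon` was already added as its own dimension, `argo.drop_vars("lon")` removes it first.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up data (NOT real measurements): 4 levels x 10 dates
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
T = np.full((levels.size, date.size), 15.0)
argo = xr.Dataset({"temperature": (("level", "date"), T)}, coords={"level": levels, "date": date})

lon = np.linspace(-40.0, -39.0, date.size)  # made-up longitudes, one per profile
lat = np.linspace(10.0, 11.0, date.size)    # made-up latitudes, one per profile

# (dims, data) tuple: lon and lat vary along the existing "date" dimension
argo.coords["lon"] = ("date", lon)
argo.coords["lat"] = ("date", lat)
print(argo)
```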

# Working with labeled data

We've built a nice Dataset for the ARGO float. It has temperature, salinity, and pressure data. Those data also have labeled dimensions/coordinates: level, date, lat, and lon.

Now we are going to start to see some of the power of xarray and how those labeled dimensions/coordinates make our analysis easier.

## Selecting data (indexing)

Let's say we want to look at a subset of the temperature data (just a slice). We can use standard NumPy notation to do this by giving the numbers of the rows and columns we are interested in. Let's say we want the second row and all columns (so this is like a time series at a particular level).

Using standard NumPy indexing, this would look like:
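Here is a sketch of positional indexing on made-up temperature data (4 levels by 10 dates). The commented-out `.plot()` call assumes matplotlib is available.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up temperature data (NOT real measurements): 4 levels x 10 dates
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
T = 20.0 - 2.0 * levels[:, None] + 0.5 * rng.standard_normal((levels.size, date.size))
argo = xr.Dataset({"temperature": (("level", "date"), T)}, coords={"level": levels, "date": date})

ts = argo.temperature[1]  # second row (level index 1), all dates: a time series
print(ts.dims)            # ('date',)
# ts.plot()               # line plot against date (requires matplotlib)
```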

What about a particular depth profile?

In standard NumPy indexing:
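Here is the same idea for a profile on made-up temperature data: fix the column and keep every row.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up temperature data (NOT real measurements): 4 levels x 10 dates
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
T = 20.0 - 2.0 * levels[:, None] + 0.5 * rng.standard_normal((levels.size, date.size))
argo = xr.Dataset({"temperature": (("level", "date"), T)}, coords={"level": levels, "date": date})

prof = argo.temperature[:, 1]  # all levels at the second date: a depth profile
print(prof.dims)               # ('level',)
# prof.plot()                  # requires matplotlib
```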


That seems easy enough. But let's say you want to look at the temperature profile from a particular day. How are you going to do that? Well, you'd need to use the `date` dimension: look up the date you want, find its index (its position in the list of dates), and then put that index into the `argo.temperature[:,1].plot()` line in place of the `1`.

This isn't impossible; it's the kind of thing you do all the time in MATLAB. But it's annoying and takes a few lines of code.
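To make the bookkeeping concrete, here is one way to do the manual lookup with NumPy. It uses made-up data, with dates assumed to be spaced 10 days apart from 2012-10-02. The code compares against the target date, takes the position of the match, and then indexes with it.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up temperature data (NOT real measurements): 4 levels x 10 dates
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
T = 20.0 - 2.0 * levels[:, None] + 0.5 * rng.standard_normal((levels.size, date.size))
argo = xr.Dataset({"temperature": (("level", "date"), T)}, coords={"level": levels, "date": date})

# Manual approach: find the position of the date, then index by position
target = np.datetime64("2012-10-22")
idx = int(np.flatnonzero(argo.date.values == target)[0])
profile = argo.temperature[:, idx]
print(idx)
```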

But xarray solves this problem! Using the `.sel()` method, you can 'select' part of your data based on its labels.

Here is how it works. Let's get the profile from October 22, 2012 by selecting on the `date` dimension:
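Here is a sketch of `.sel()` on made-up data, where 2012-10-22 is the third profile. The commented-out plot would draw the profile with depth increasing downward (`yincrease=False`) and assumes matplotlib.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up temperature data (NOT real measurements): 4 levels x 10 dates
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
T = 20.0 - 2.0 * levels[:, None] + 0.5 * rng.standard_normal((levels.size, date.size))
argo = xr.Dataset({"temperature": (("level", "date"), T)}, coords={"level": levels, "date": date})

# Select by label instead of by position
profile = argo.temperature.sel(date="2012-10-22")
print(profile.dims)  # ('level',)
# profile.plot(y="level", yincrease=False)  # requires matplotlib
```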

## Slicing data

We can also grab a range of days. Grabbing a run of consecutive data is typically called 'slicing'. We have to tell xarray that we want a slice of the `date` dimension, which we do with Python's built-in `slice(start, stop)`. Again, this is new syntax, so don't worry that you don't know it yet. You'll learn as you go, from examples and from reading the documentation for different packages.

Let's get a couple of months around our previous profile:
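Here is a sketch using a `slice` of date strings on made-up data. With label-based slicing both end points are included, so this keeps the 7 profiles from 2012-10-02 through 2012-12-01.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up temperature data (NOT real measurements): 4 levels x 10 dates
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
T = 20.0 - 2.0 * levels[:, None] + 0.5 * rng.standard_normal((levels.size, date.size))
argo = xr.Dataset({"temperature": (("level", "date"), T)}, coords={"level": levels, "date": date})

# Label-based slice: both the start and stop labels are included
fall = argo.temperature.sel(date=slice("2012-10-01", "2012-12-01"))
print(fall.sizes["date"])
# fall.plot(yincrease=False)  # requires matplotlib
```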

You can also use `.sel()` on the whole dataset to, for example, grab all your data from one day:
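On a whole Dataset, `.sel()` applies the same label to every variable that has a `date` dimension. In this made-up example it keeps salinity and temperature, and `date` becomes a scalar coordinate.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up data (NOT real measurements): 4 levels x 10 dates
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
shape = (levels.size, date.size)
S = 35.0 + 0.1 * rng.standard_normal(shape)
T = 20.0 - 2.0 * levels[:, None] + 0.5 * rng.standard_normal(shape)
argo = xr.Dataset(
    {"salinity": (("level", "date"), S), "temperature": (("level", "date"), T)},
    coords={"level": levels, "date": date},
)

one_day = argo.sel(date="2012-10-22")  # every variable, one date
print(one_day)
```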

# Math

We can do ordinary arithmetic on these DataArrays and Datasets:
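Here is a sketch on made-up data, assuming temperature is stored in degrees Celsius. Arithmetic works element by element and keeps the dimensions and coordinates. When applied to a Dataset, it is done to each data variable.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up data (NOT real measurements): 4 levels x 10 dates
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
shape = (levels.size, date.size)
S = 35.0 + 0.1 * rng.standard_normal(shape)
T = 20.0 - 2.0 * levels[:, None] + 0.5 * rng.standard_normal(shape)
argo = xr.Dataset(
    {"salinity": (("level", "date"), S), "temperature": (("level", "date"), T)},
    coords={"level": levels, "date": date},
)

temp_kelvin = argo.temperature + 273.15  # degrees C to kelvin, element by element
temp_f = argo.temperature * 9 / 5 + 32   # degrees C to Fahrenheit
doubled = argo * 2                       # applied to every data variable in the Dataset
```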

You can combine DataArrays that share dimensions to get derived products such as buoyancy:
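Here is a sketch using a linear equation of state, $b = g(\alpha T - \beta S)$. The constants are assumed round numbers for illustration, not values taken from this lesson: $g = 9.8$ m s$^{-2}$, $\alpha = 2 \times 10^{-4}$ K$^{-1}$, and $\beta = 7 \times 10^{-4}$ per unit salinity. The result gives buoyancy only up to an additive reference value, and the data are made up.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up data (NOT real measurements): 4 levels x 10 dates
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
shape = (levels.size, date.size)
S = 35.0 + 0.1 * rng.standard_normal(shape)
T = 20.0 - 2.0 * levels[:, None] + 0.5 * rng.standard_normal(shape)
argo = xr.Dataset(
    {"salinity": (("level", "date"), S), "temperature": (("level", "date"), T)},
    coords={"level": levels, "date": date},
)

g = 9.8       # gravitational acceleration, m/s^2
alpha = 2e-4  # assumed thermal expansion coefficient, 1/K
beta = 7e-4   # assumed haline contraction coefficient, per unit salinity

# Both inputs share (level, date), so this combines them point by point
buoyancy = g * (alpha * argo.temperature - beta * argo.salinity)
print(buoyancy.dims)
```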

We can also compute standard NumPy-style reductions, such as means and standard deviations, along named dimensions.
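Here is a sketch of reductions on made-up data. Instead of remembering which axis number means what, you pass the dimension name via `dim`. Like NumPy, xarray's `std` uses `ddof=0` by default.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up temperature data (NOT real measurements): 4 levels x 10 dates
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
T = 20.0 - 2.0 * levels[:, None] + 0.5 * rng.standard_normal((levels.size, date.size))
argo = xr.Dataset({"temperature": (("level", "date"), T)}, coords={"level": levels, "date": date})

mean_profile = argo.temperature.mean(dim="date")  # average over time: one value per level
std_profile = argo.temperature.std(dim="date")    # variability at each level
depth_mean = argo.temperature.mean(dim="level")   # average over depth: one value per date
```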

We can also average the whole Dataset. Xarray averages each of the data variables independently:
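Here is a sketch on made-up data. Calling `.mean()` on the Dataset reduces each data variable separately, and passing `dim` restricts the average to one named dimension.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up data (NOT real measurements): 4 levels x 10 dates
rng = np.random.default_rng(0)
levels = np.arange(4)
date = pd.date_range("2012-10-02", periods=10, freq="10D")
shape = (levels.size, date.size)
S = 35.0 + 0.1 * rng.standard_normal(shape)
T = 20.0 - 2.0 * levels[:, None] + 0.5 * rng.standard_normal(shape)
argo = xr.Dataset(
    {"salinity": (("level", "date"), S), "temperature": (("level", "date"), T)},
    coords={"level": levels, "date": date},
)

argo_mean = argo.mean()                  # each variable reduced over all its dimensions
argo_time_mean = argo.mean(dim="date")   # or reduce along one named dimension only
print(argo_mean)
```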

There are a lot more cool math/analysis functions we can do with xarray. We will see more of them later on.

The end...

# Breakout / exercise 02