# Lesson 4, Raster data

## Content
- 

## Context

Today we are getting to our core raster library: `xarray`. `xarray` is a great library for working with raster datasets. It has many built-in analysis methods and nice visualization defaults, and it was built by the scientific community. `xarray` is built on top of `numpy`, so all of our work this week will support us as we dive into `xarray`. We are going to look at opening data, inspecting data and ...

`xarray` takes the `numpy` arrays we were working with yesterday and makes them easier to work with by adding labels to the axes. This is a small change, but it has a huge effect on the ease of working with data.

Other xarray tutorials:
* [UW Geohackweek](https://geohackweek.github.io/nDarrays/)
* [Oceanhackweek/Scipy 2020](https://xarray-contrib.github.io/xarray-tutorial/)

# Gridded data and `xarray`

~Almost everyone using satellite images will be using raster data. **Raster data** is continuous, gridded data and in earth science a raster often represents an area in space.~

## `xarray` Data Structures

In [2]:
import xarray as xr

### 1 - `DataArray`

In [20]:
import numpy as np

In [35]:
sst_values = np.random.randint(0, high=75, size=(5, 6))

In [36]:
# Data values
sst_values

array([[51, 54, 20, 30,  3, 67],
       [50, 41, 41, 19, 70, 24],
       [ 5, 42, 49, 66, 34, 62],
       [27, 20, 54, 28, 12, 69],
       [42, 38, 28, 29, 71, 39]])

In [37]:
# Coordinate values
lats = [36, 37, 38, 39, 40]
lons = [-22, -21, -20, -19, -18, -17]

In [48]:
sst = xr.DataArray(sst_values, dims=['latitude', 'longitude'], coords=[lats, lons])
sst

`dims` specifies the names of the coordinate axes, while `coords` specifies the actual values at those points.

In [42]:
xr.DataArray(sst_values, coords=[lats, lons])

In [47]:
xr.DataArray(sst_values, dims=['latitude', 'longitude'])
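To see what each argument contributes, the sketch below builds the same array three ways and inspects the results. The values are a fixed stand-in for `sst_values` (an assumption made so the output is reproducible), and the names `full`, `coords_only`, and `dims_only` are only for illustration.

```python
import numpy as np
import xarray as xr

# Fixed stand-in for sst_values so the result does not change between runs
values = np.arange(30).reshape(5, 6)
lats = [36, 37, 38, 39, 40]
lons = [-22, -21, -20, -19, -18, -17]

# Names and labels: each axis has a name and coordinate values
full = xr.DataArray(values, dims=['latitude', 'longitude'], coords=[lats, lons])

# Labels only: xarray falls back to generic dimension names
coords_only = xr.DataArray(values, coords=[lats, lons])

# Names only: the axes are named, but carry no coordinate labels
dims_only = xr.DataArray(values, dims=['latitude', 'longitude'])

print(full.dims)
print(coords_only.dims)
print(list(dims_only.coords))
```

Without `dims`, the axes get placeholder names such as `dim_0`, which is why passing both arguments is usually the more readable choice.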

3D example

In [56]:
sst_3d_values = np.random.randint(0, high=75, size=(2, 5, 6))

In [59]:
depth = [500, 1000]

In [60]:
sst_3d = xr.DataArray(sst_3d_values, 
                      dims=['depth', 'latitude', 'longitude'], 
                      coords=[depth, lats, lons]
                     )
sst_3d
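Once an array has named dimensions, you can inspect it and reduce it by name instead of by axis number. The following is a minimal sketch, assuming fixed stand-in values rather than the random ones above; `depth_mean` is a name introduced here for illustration.

```python
import numpy as np
import xarray as xr

# Fixed stand-in for sst_3d_values
values = np.arange(60).reshape(2, 5, 6)
depth = [500, 1000]
lats = [36, 37, 38, 39, 40]
lons = [-22, -21, -20, -19, -18, -17]

sst_3d = xr.DataArray(values,
                      dims=['depth', 'latitude', 'longitude'],
                      coords=[depth, lats, lons])

# Inspect the structure
print(sst_3d.dims)            # dimension names
print(sst_3d.shape)           # length of each dimension
print(sst_3d.sizes['depth'])  # length of one dimension, looked up by name

# Average over depth by naming the dimension instead of using axis=0
depth_mean = sst_3d.mean(dim='depth')
print(depth_mean.dims)
```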

# Indexing and Selecting Values

`.sel`, `.isel`
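As a rough illustration of the difference: `.sel` selects by coordinate label, while `.isel` selects by integer position. The sketch below assumes a small array with fixed stand-in values on the same latitude/longitude grid used earlier; the variable names are illustrative only.

```python
import numpy as np
import xarray as xr

values = np.arange(30).reshape(5, 6)   # fixed stand-in values
lats = [36, 37, 38, 39, 40]
lons = [-22, -21, -20, -19, -18, -17]
sst = xr.DataArray(values, dims=['latitude', 'longitude'], coords=[lats, lons])

# .sel: pick a point by its coordinate labels
by_label = sst.sel(latitude=38, longitude=-20)

# .isel: pick the same point by its integer position along each dimension
by_position = sst.isel(latitude=2, longitude=2)

# Slices work with both; label-based slices include both endpoints
band = sst.sel(latitude=slice(37, 39))
first_two = sst.isel(latitude=slice(0, 2))

print(by_label.item(), by_position.item())
print(band.latitude.values)
print(first_two.latitude.values)
```

Note that the label slice `slice(37, 39)` keeps latitude 39, whereas the positional slice `slice(0, 2)` stops before position 2, just like a `numpy` slice.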

### 2 - `Dataset`
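
A `Dataset` groups several `DataArray`s that share coordinates under one object. Here is a minimal sketch, assuming a second, hypothetical `salinity` variable on the same grid (it is not part of this lesson's data) and randomly generated values.

```python
import numpy as np
import xarray as xr

lats = [36, 37, 38, 39, 40]
lons = [-22, -21, -20, -19, -18, -17]
rng = np.random.default_rng(0)  # seeded so the example is repeatable

ds = xr.Dataset(
    data_vars={
        # each entry is (dimension names, values)
        'sst': (('latitude', 'longitude'), rng.integers(0, 75, size=(5, 6))),
        # 'salinity' is a made-up variable, used only for illustration
        'salinity': (('latitude', 'longitude'), rng.uniform(30, 40, size=(5, 6))),
    },
    coords={'latitude': lats, 'longitude': lons},
)

# Pulling one variable out of a Dataset gives back a DataArray
sst = ds['sst']
print(type(sst).__name__)

# Selection applies to every variable in the Dataset at once
row = ds.sel(latitude=38)
print(row['sst'].shape, row['salinity'].shape)
```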

# Real Data

The small dataset we made manually in the first part of this notebook is quite useful for learning. Usually, though, you won't be making your own data; you'll be opening existing datasets. Let's try an example of that using a local AVIRIS data file.

In [3]:
filepath = '../data/subset_f180628t01p00r02_corr_v1k1_img'

In [8]:
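# Note: xr.open_rasterio was deprecated in xarray 0.20 and removed in later releases;
# rioxarray.open_rasterio is the documented replacement.
envi = xr.open_rasterio(filepath)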


In [19]:
envi

## Filepaths
The other part of a data loading statement to take note of is the string inside the parentheses, for example `'./data/englewood_3_12_21_usgs_water.tsv'`. This is called the **filepath**, and it describes the location of the data that you want to open. A few pieces of the anatomy of a filepath to notice:
* `/` - forward slashes signal that you have entered a new folder.
* `.tsv` - this is the file extension, which tells us what file format the data is stored in and informs how we open it.
* `.` - the period at the beginning tells the computer to start looking for data in the same place that the code is being run in.

Choosing to start your filepath with a `.` is called specifying a **relative filepath**, because you are telling the computer to start looking for the file relative to where the code is being run. If you move this file to another place on your computer and don't move the data with it, the import statement won't work anymore. The alternative to a relative filepath is an **absolute filepath**, in which case you start your filepath at the very tippy top of your computer's organizational structure (the root directory).

Other vocab notes:
* **directory** is the same thing as a folder.

To loop back to our example, we put together our filepath by defining the following directions for our computer:
1. start by specifying the current directory as the starting point: `.`
2. go into the data folder: `./data`
3. choose the file named englewood_3_12_21_usgs_water.tsv: `'./data/englewood_3_12_21_usgs_water.tsv'`

🎉 And there we have our file
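
The anatomy described above can be explored with Python's built-in `pathlib` module. This is a small sketch using the example filepath from this section; the file does not need to exist for these calls to work, and the variable names are only illustrative.

```python
from pathlib import Path

# A relative filepath: it starts from the current directory
relative = Path('./data/englewood_3_12_21_usgs_water.tsv')

print(relative.parts)          # the folders and file name that make up the path
print(relative.name)           # just the file name
print(relative.suffix)         # the file extension
print(relative.is_absolute())  # False for a relative filepath

# Anchor the path at the current working directory to get an absolute filepath
absolute = relative.resolve()
print(absolute.is_absolute())
```

The absolute version printed on your machine depends on where the code is run, which is exactly why a relative filepath breaks when the code and data are moved apart.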

<div class="alert alert-success">
    <i>File Format</i><br>
    <strong>ENVI</strong>: the data file has no file extension and should be accompanied by a <code>.hdr</code> file containing the metadata.<br>
    <i>Data Access</i><br>
    Data was accessed from a <strong>local file</strong>.
</div>

### 📝 Checking In

Consider an array of the following form called `example_array`:

|   |   |   |  |  |
|---|---|---|---|---|
| 1  | 2  |3   | 4 | 5 |
|  6 |  7 | 8 | 9 | 10 |
| 11 | 12 | 13 | 14 | 15 |
| 16 | 17 | 18 | 19 | 20 |

1. What is the value at index `[0, 2]`?
2. What is the result of `example_array[4]`?
3. What is the result of `example_array[1:3, 2]`?
4. Give the index to return the value 15.
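
If you would like to check your answers by running them, one way to build the array is sketched below. It assumes `example_array` is a `numpy` array holding the values 1 through 20 in 4 rows and 5 columns, matching the table above.

```python
import numpy as np

# Values 1..20 laid out row by row in a 4 x 5 grid, as in the table
example_array = np.arange(1, 21).reshape(4, 5)

print(example_array)
print(example_array.shape)
```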