# Reading with `xarray`

This notebook demonstrates some minimal raster/spatial operations on AVIRIS-NG files using `xarray`, which reads the ENVI rasters through rasterio.

<h2 id="tocheading">Table of Contents</h2>       
<br>
<div id="toc"></div>     

*This next cell calls a script to generate a TOC. It will display above when this notebook is opened in the Jupyter environment. Ignore.*

In [1]:
%%javascript
$.getScript('scripts/tocgen.js')


## Workflow

### Imports and example file
Import the required packages. Only two are needed:

In [2]:
import numpy as np
import xarray as xr

#### Unzip and open tarfile
AVIRIS-NG data files are distributed as gzipped tarfiles. See the document `<link>` for more details.

You can extract the example file with the `tarfile` module like this:
```python
import tarfile

with tarfile.open("data/ang20180814t224053rfl.tar.gz", "r:gz") as tar:
    tar.extractall()
```
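
Before extracting, it can be useful to check what an archive contains. The following is a minimal sketch using only the standard library. The tiny archive it builds is a stand-in so the snippet runs without the AVIRIS-NG download, and `list_members` and the file names are hypothetical.

```python
import os
import tarfile
import tempfile


def list_members(archive_path):
    """Return the member names of a gzipped tarfile without extracting it."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Build a small stand-in archive (hypothetical file names) to demonstrate.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "example_img.hdr")
with open(src, "w") as f:
    f.write("ENVI\n")

archive = os.path.join(tmpdir, "example.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(src, arcname="example_rfl/example_img.hdr")

members = list_members(archive)
print(members)
```

For the real data, the same helper would be pointed at `data/ang20180814t224053rfl.tar.gz`.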

See what's inside:

In [3]:
import glob
glob.glob("data/ang20180814t224053_rfl_v2r2/*")

['data/ang20180814t224053_rfl_v2r2\\ang20180814t224053_corr_v2r2_img',
 'data/ang20180814t224053_rfl_v2r2\\ang20180814t224053_corr_v2r2_img.hdr',
 'data/ang20180814t224053_rfl_v2r2\\ang20180814t224053_h2o_v2r2_img',
 'data/ang20180814t224053_rfl_v2r2\\ang20180814t224053_h2o_v2r2_img.hdr',
 'data/ang20180814t224053_rfl_v2r2\\ang20180814t224053_README_v2r2.txt']

### Reading an image

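`xarray` can read an ENVI raster image through rasterio. For most users this is a more convenient route to a netCDF-style data structure than working with GDAL directly. Note that `xr.open_rasterio` is deprecated in newer xarray releases in favor of the `rioxarray` package; the call below reflects the xarray version this notebook was run with.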

Open the example reflectance file:

In [4]:
img = 'data/ang20180814t224053_rfl_v2r2/ang20180814t224053_corr_v2r2_img'
hdr = 'data/ang20180814t224053_rfl_v2r2/ang20180814t224053_corr_v2r2_img.hdr'

ds = xr.open_rasterio(img)
ds

<xarray.DataArray (band: 425, y: 4207, x: 637)>
[1138940075 values with dtype=float32]
Coordinates:
  * band        (band) int32 1 2 3 4 5 6 7 8 ... 418 419 420 421 422 423 424 425
    wavelength  (band) float64 376.9 381.9 386.9 ... 2.496e+03 2.501e+03
Dimensions without coordinates: y, x
Attributes:
    transform:                (4.177675425873858, 2.9252398253903347, 447779....
    crs:                      +init=epsg:32603
    res:                      (5.1, 5.1)
    is_tiled:                 0
    nodatavals:               (-9999.0, -9999.0, -9999.0, -9999.0, -9999.0, -...
    bad_pixel_map:            /home/winstono/isat-dev/ang/cal/data/ANGv5_bad
    bands:                    425
    bbl:                       0.0 , 1.0 , 1.0 , 1.0 , 1.0 , 1.0 , 1.0 , 1.0 ...
    byte_order:               0
    correction_factors:        0.942615 , 1.00196 , 1.021672 , 1.042766 , 1.0...
    crosstrack_scatter_file:  /home/winstono/isat-dev/ang/cal/data/20170125_v...
    data_ignore_value:       

### Raster image shape
`xarray` exposes the shape of the image through the size of each dimension:

In [5]:
bands = ds.band.size # band count
cols = ds.x.size     # col count
rows = ds.y.size     # row count

print("bands:\t"+str(bands)) 
print("cols:\t"+str(cols))
print("rows:\t"+str(rows))

bands:	425
cols:	637
rows:	4207


### Geotransform
The geotransform is a tuple of parameters (6 floats) that defines the transformation from each pixel's x,y position in the image to its projected position in the reference coordinate system (affine transformation). 

**This website gives a clear, thorough explanation of affine transforms and their use in GIS:**         
http://www.quantdec.com/GIS/affine.htm

**More info on geographic transformation and GDAL's raster data model:**       
https://www.gdal.org/gdal_datamodel.html

Get the tuple from the `transform` attribute. Its elements are ordered `(xres, xrot, xmin, yrot, yres, ymax)`, which differs from the `(xmin, xres, xrot, ymax, yrot, yres)` order returned by GDAL's `ds.GetGeoTransform()`:

In [6]:
ds.attrs["transform"]

(4.177675425873858,
 2.9252398253903347,
 447779.369091,
 2.9252398253903347,
 -4.177675425873858,
 7185907.49943)
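
To make the ordering concrete, the sketch below applies the six coefficients to a pixel index and reorders them into GDAL's layout. It assumes the rasterio ordering `(a, b, c, d, e, f)`, where `x = a*col + b*row + c` and `y = d*col + e*row + f`. The names `pixel_to_projected` and `to_gdal_order` are hypothetical helpers, not part of xarray or GDAL.

```python
import math

# Coefficients copied from the `transform` attribute shown above.
transform = (4.177675425873858, 2.9252398253903347, 447779.369091,
             2.9252398253903347, -4.177675425873858, 7185907.49943)


def pixel_to_projected(transform, col, row):
    """Apply the affine transform to a (col, row) pixel index."""
    a, b, c, d, e, f = transform
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


def to_gdal_order(transform):
    """Reorder (a, b, c, d, e, f) into GDAL's (c, a, b, f, d, e) layout."""
    a, b, c, d, e, f = transform
    return (c, a, b, f, d, e)


origin = pixel_to_projected(transform, 0, 0)   # top-left corner of the raster
gdal_transform = to_gdal_order(transform)

# Length of one pixel step along the column direction; for a rotated grid
# this should be close to the 5.1 m reported in the `res` attribute.
pixel_width = math.hypot(transform[0], transform[3])

print(origin)
print(gdal_transform)
print(pixel_width)
```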

### Coordinate arrays

To make a CF-compliant netCDF we need:
* 1-dimensional arrays of x and y coordinates (2)
* 2-dimensional arrays of lon and lat coordinates (2)

First generate the arrays of x and y coordinates. Unpack the geotransform into its component parts and build 1-d arrays that start at the top-left corner of the raster and step by the x and y resolution terms. Note that this ignores the rotation terms (`xrot`, `yrot`), which are non-zero for this image, so the 1-d arrays should be treated as an approximation of the rotated grid:
```
xpos(i) = x_origin_in_meters + i * x_resolution
ypos(i) = y_origin_in_meters + i * y_resolution

x_coordinate_array = xpos(i) for i in 0 .. number_of_columns - 1
y_coordinate_array = ypos(i) for i in 0 .. number_of_rows - 1
```

In [7]:
# get the raster geotransform as its component parts
xres, xrot, xmin, yrot, yres, ymax = ds.attrs["transform"]

# generate coordinate arrays
xarr = xmin + xres * np.arange(cols)
yarr = ymax + yres * np.arange(rows)

print("x[0]: \t"+str(xarr[0]))
print("y[0]:\t"+str(yarr[0]))

x[0]: 	447779.369091
y[0]:	7185907.49943
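
Because the transform for this image has non-zero rotation terms, x and y each depend on both the column and the row index. The sketch below is an alternative to the 1-d arrays, not the notebook's own method: it evaluates the full affine transform on the pixel grid with NumPy. The small `cols` and `rows` values are stand-ins so the snippet runs on its own, and `affine_grid` is a hypothetical helper. Passing `offset=0.5` would give pixel centers instead of top-left corners.

```python
import numpy as np

# Same component order as the `transform` attribute.
xres, xrot, xmin, yrot, yres, ymax = (
    4.177675425873858, 2.9252398253903347, 447779.369091,
    2.9252398253903347, -4.177675425873858, 7185907.49943)

cols, rows = 4, 3  # stand-in sizes; the example image is 637 x 4207


def affine_grid(cols, rows, offset=0.0):
    """Return 2-d x and y arrays of shape (rows, cols) for a rotated grid."""
    col_idx, row_idx = np.meshgrid(np.arange(cols) + offset,
                                   np.arange(rows) + offset)
    x2d = xmin + col_idx * xres + row_idx * xrot
    y2d = ymax + col_idx * yrot + row_idx * yres
    return x2d, y2d


x2d, y2d = affine_grid(cols, rows)
print(x2d.shape)
print(x2d[0, 0], y2d[0, 0])
```

With a rotated grid, the projected x and y values are 2-d fields rather than 1-d axes, so whether they can serve as CF dimension coordinates is a separate design question that this sketch does not settle.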


### Get 2d arrays of latitudes and longitudes


`pyproj` is the Python interface to libproj (PROJ). Use pyproj to transform the first pixel's coordinates from UTM x, y to lon, lat:

In [8]:
from pyproj import Proj, transform

inproj = Proj(                  # add code to get proj4
    "+proj=utm +zone=3 +datum=WGS84 +units=m +no_defs") 
outproj = Proj(init="epsg:4326")
lon, lat = transform(inproj, outproj, xarr[0], yarr[0])

lon, lat

(-166.0989579203195, 64.79362088575823)
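
Newer pyproj releases deprecate the module-level `transform` function and the `Proj(init=...)` syntax in favor of the `Transformer` class. The sketch below repeats the first-pixel transformation with that API. It assumes pyproj 2.2 or later is installed and that EPSG:32603, the code in the image's `crs` attribute, is the right source CRS. `always_xy=True` keeps the (x, y) to (lon, lat) axis order used above.

```python
from pyproj import Transformer

# Build the transformer once; it can be reused for many points.
transformer = Transformer.from_crs("EPSG:32603", "EPSG:4326", always_xy=True)

# Top-left corner of the raster, copied from the output above.
x0, y0 = 447779.369091, 7185907.49943
lon0, lat0 = transformer.transform(x0, y0)

print(lon0, lat0)
```

The same `transformer.transform` call also accepts the flattened coordinate arrays in place of scalars.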

Expand the 1-d x and y arrays into 2-d grids with `np.meshgrid`, so that every pixel has its own x, y pair:

In [9]:
xarr2d, yarr2d = np.meshgrid(xarr, yarr)

print("Each array now has this shape:\t"+str(xarr2d.shape))

Each array now has this shape:	(4207, 637)
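
A toy example may help show what `np.meshgrid` returns. The values are arbitrary, chosen only to keep the output readable:

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0])   # 3 columns
y = np.array([100.0, 90.0])        # 2 rows

# Each output has shape (len(y), len(x)): x repeats along rows,
# y repeats along columns.
x2d, y2d = np.meshgrid(x, y)

print(x2d)
# [[10. 20. 30.]
#  [10. 20. 30.]]
print(y2d)
# [[100. 100. 100.]
#  [ 90.  90.  90.]]
```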


Flatten both arrays and pass them to the `pyproj.transform` function:

In [10]:
lonarr, latarr = transform(
    inproj,               # input raster srs
    outproj,              # output raster srs
    xarr2d.flatten(),     # flat 2d array of x coordinates
    yarr2d.flatten())     # flat 2d array of y coordinates

print("lon[0]:\t"+str(lonarr[0]))
print("lat[0]:\t"+str(latarr[0]))

lon[0]:	-166.0989579203195
lat[0]:	64.79362088575823


Return the flat arrays to the shape of the raster:

In [11]:
lonarr2d = lonarr.reshape(xarr2d.shape)
latarr2d = latarr.reshape(yarr2d.shape)

lonarr2d.shape

(4207, 637)

### Add coordinate arrays to the `xarray.Dataset`

In [12]:
ds["y"] = xr.DataArray(
    data=yarr, 
    dims=("y"), 
    name="y",
    attrs=dict(
        units="m",
        standard_name="projection_y_coordinate",
        long_name="y coordinate of projection"))

ds["x"] = xr.DataArray(
    data=xarr, 
    dims=("x"), 
    name="x",
    attrs=dict(
        units="m",
        standard_name="projection_x_coordinate",
        long_name="x coordinate of projection"))

ds["lon"] = xr.DataArray(
    data=lonarr2d, 
    dims=("y", "x"), 
    name="lon",
    attrs=dict(
        units="degrees_east",
        standard_name="longitude",
        long_name="longitude coordinate"))

ds["lat"] = xr.DataArray(
    data=latarr2d, 
    dims=("y", "x"), 
    name="lat",
    attrs=dict(
        units="degrees_north",
        standard_name="latitude",
        long_name="latitude coordinate"))

ds.name = "reflectance"
global_atts = ds.attrs
global_atts["Conventions"] = "CF-1.6"
ds.attrs = dict(
    units="unitless",
    _FillValue=-9999.,
    #coordinates="lon lat",
    grid_mapping="crs",
    standard_name="reflectance",
    long_name="atmospherically corrected surface reflectance")


ds = ds.to_dataset()
ds.attrs.update(global_atts)

### Encoding

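The `encoding` argument of `to_netcdf` sets per-variable options such as compression and packing. See the documentation for details:
http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_netcdf.html#xarray.Dataset.to_netcdf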


In [13]:
ds.to_netcdf(
    path="output/xarraytest.nc",
    encoding={
        "reflectance": {
            #"dtype": "int16",
            #"scale_factor": 0.1,
            "zlib": True,
            "complevel": 4}})
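
The commented-out `dtype` and `scale_factor` keys refer to CF-style packing, where floats are stored as scaled integers. The NumPy sketch below walks through that arithmetic by hand, which can help when choosing a scale factor before enabling those keys. The sample values and the `scale_factor` of 0.0001 are illustrative assumptions, not values taken from the AVIRIS-NG product, and xarray's own encoder may differ in detail.

```python
import numpy as np

# Illustrative reflectance values, with NaN marking a missing pixel.
reflectance = np.array([0.0123, 0.25, 0.9876, np.nan], dtype="float32")

scale_factor = 0.0001   # assumed value: one integer step = 0.0001 reflectance
fill_value = -9999      # matches the _FillValue used above

# Pack: divide by the scale factor, round, and store as int16.
packed = np.where(np.isnan(reflectance), fill_value,
                  np.round(reflectance / scale_factor)).astype("int16")

# Unpack: multiply back and restore missing values.
unpacked = np.where(packed == fill_value, np.nan, packed * scale_factor)

# Worst-case rounding error is about half a scale step.
max_error = np.nanmax(np.abs(unpacked - reflectance))

print(packed)
print(unpacked)
print(max_error)
```

With int16 storage the usable range is roughly -32768 to 32767 integer steps, so a scale factor that is too small will overflow and one that is too large will discard precision.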