## Landsat data loading

There is a general file-storage pattern of encoding meaningful information in file or directory names. We encountered it most recently when working with Landsat data. These data are stored as TIFF files, one band per file, and the band number in each filename is meaningful because it identifies which part of the spectrum the band covers.
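
As a minimal illustration, assuming filenames that end in `_band<N>.tif` like the ones in the cache listing in this example, the band number can be recovered from the name alone. `band_from_filename` is a hypothetical helper for illustration, not part of intake:

```python
from pathlib import Path


def band_from_filename(path):
    """Return the band number encoded in a name like '..._sr_band7.tif'.

    Hypothetical helper; assumes the last underscore-separated chunk
    of the file stem is 'band<N>'.
    """
    stem = Path(path).stem               # strip the '.tif' suffix
    suffix = stem.rsplit("_", 1)[-1]     # e.g. 'band7'
    if not suffix.startswith("band") or not suffix[4:].isdigit():
        raise ValueError(f"no band number in {path!r}")
    return int(suffix[4:])


print(band_from_filename("LT05_L1TP_042033_19881022_20161001_01_T1_sr_band7.tif"))  # prints 7
```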


In [1]:
```python
import intake
cat = intake.open_catalog('catalog.yml')
list(cat)
```

```
['l5', 'l5_proposed']
```

In this catalog, `l5` is set up the way a schema is defined on current master, and `l5_proposed` lays out a new way of defining a schema.

## Master

Note that the `band` coordinate is just a list of 1s, presumably because each single-band file is read with its own band index of 1. From the filenames we can see that what we actually want is [1, 2, 3, 4, 5, 7], but since the data are already concatenated, we have no way of knowing which band is which.
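
A toy model of the problem, using plain lists in place of arrays. It assumes that a reader reports the only band of a one-band file as index 1, which is consistent with the row of 1s in the output; `read_single_band` is a hypothetical stand-in, not an intake or xarray function. Concatenating first and labelling later loses which file each piece came from, while labelling each piece from its filename before concatenating keeps it:

```python
import re

# The six file names from the cached directory used in this example.
files = [
    f"LT05_L1TP_042033_19881022_20161001_01_T1_sr_band{b}.tif"
    for b in (1, 2, 3, 4, 5, 7)
]


def read_single_band(path):
    # Hypothetical stand-in for a raster reader: a one-band file reports
    # its only band as index 1, regardless of which band it really is.
    return {"band": [1], "source": path}


# Concatenate the per-file band indexes: every one of them is 1.
naive_band = [b for f in files for b in read_single_band(f)["band"]]

# Label each piece from its filename *before* concatenating.
labelled_band = [int(re.search(r"band(\d+)\.tif$", f).group(1)) for f in files]

print(naive_band)     # [1, 1, 1, 1, 1, 1]
print(labelled_band)  # [1, 2, 3, 4, 5, 7]
```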

In [2]:
```python
cat.l5.read_chunked()
```

```
<xarray.DataArray (band: 6, y: 7241, x: 7961)>
dask.array<shape=(6, 7241, 7961), dtype=int16, chunksize=(1, 256, 256)>
Coordinates:
  * y        (y) float64 4.414e+06 4.414e+06 4.414e+06 ... 4.197e+06 4.197e+06
  * x        (x) float64 2.424e+05 2.424e+05 2.425e+05 ... 4.812e+05 4.812e+05
  * band     (band) int64 1 1 1 1 1 1
Attributes:
    transform:   (30.0, 0.0, 242385.0, 0.0, -30.0, 4414215.0)
    crs:         +init=epsg:32611
    res:         (30.0, 30.0)
    is_tiled:    0
    nodatavals:  (-9999.0,)
```

### Some setup
I don't quite have this working with the cache yet, so to make this example work, copy the cached files into a new directory called `data` inside the example directory.

In [3]:
```
ls ~/.intake/cache/0088b75722009b0a583f65974c60bd87/
```

```
LT05_L1TP_042033_19881022_20161001_01_T1_sr_band1.tif
LT05_L1TP_042033_19881022_20161001_01_T1_sr_band2.tif
LT05_L1TP_042033_19881022_20161001_01_T1_sr_band3.tif
LT05_L1TP_042033_19881022_20161001_01_T1_sr_band4.tif
LT05_L1TP_042033_19881022_20161001_01_T1_sr_band5.tif
LT05_L1TP_042033_19881022_20161001_01_T1_sr_band7.tif
```


In [4]:
```
!mkdir data
```

In [5]:
```
!cp ~/.intake/cache/0088b75722009b0a583f65974c60bd87/* ./data
```
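
The same setup can also be done from plain Python, for example where shell escapes aren't available. A minimal sketch using only the standard library; `copy_cached_files` is a hypothetical helper, and the cache path in the usage line is the one from the listing in this example:

```python
import shutil
from pathlib import Path


def copy_cached_files(cache_dir, dest="data"):
    """Copy every regular file in `cache_dir` into `dest`; return the names."""
    src = Path(cache_dir).expanduser()      # expand the leading '~'
    dst = Path(dest)
    dst.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`: no error if it exists
    copied = []
    for f in sorted(src.iterdir()):
        if f.is_file():
            shutil.copy2(f, dst / f.name)   # copy2 also preserves timestamps
            copied.append(f.name)
    return copied


# Usage:
# copy_cached_files("~/.intake/cache/0088b75722009b0a583f65974c60bd87/")
```

Unlike the bare `!mkdir data`, re-running this does not fail when `data` already exists.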

## Proposal
In this implementation, an arbitrary number of fields can be specified in the file path using Python format notation. These fields are added to the xarray object as coordinates along the same dimension that the files are concatenated on. Coordinates are just sets of labels along a particular dimension, so there can be many coordinates along the same dimension. Making the filename fields coordinates rather than attributes allows each file to have a different value.
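
A minimal sketch of the idea, not the proposal's actual implementation: translate a Python format string into a regular expression with one named group per field, match it against each filename, and collect one list of values per field in file order. Each list is what would be attached as a coordinate along the concatenation dimension. The pattern below is an assumption reconstructed from the coordinate names in the output, not copied from `catalog.yml`, and format specs such as `{band:d}` are ignored:

```python
import os
import re
import string


def pattern_to_regex(pattern):
    """Compile a format string like 'x_{a}_band{b}.tif' into a regex whose
    named groups are the fields ('a', 'b'). Format specs are ignored."""
    parts = []
    for literal, field, _spec, _conv in string.Formatter().parse(pattern):
        parts.append(re.escape(literal))       # literal text must match exactly
        if field is not None:
            parts.append(f"(?P<{field}>.+?)")  # shortest run that lets the rest match
    return re.compile("".join(parts))


def fields_as_coords(pattern, paths):
    """Return {field: [value for each path]}, keeping the order of `paths`."""
    regex = pattern_to_regex(pattern)
    coords = {}
    for path in paths:
        m = regex.fullmatch(os.path.basename(path))
        if m is None:
            raise ValueError(f"{path!r} does not match {pattern!r}")
        for name, value in m.groupdict().items():
            coords.setdefault(name, []).append(value)
    return coords


# Hypothetical pattern for the Landsat files used in this example.
PATTERN = "LT05_L1TP_042033_{start_date}_{end_date}_01_T1_sr_band{band}.tif"
paths = [
    f"data/LT05_L1TP_042033_19881022_20161001_01_T1_sr_band{b}.tif"
    for b in (1, 2, 3, 4, 5, 7)
]
coords = fields_as_coords(PATTERN, paths)
print(coords["band"])  # ['1', '2', '3', '4', '5', '7']
```

The values stay strings here, which matches the `object` and `<U8` coordinates in the output of the next cell; converting them to integers or dates would be a separate design choice.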

In [6]:
```python
cat.l5_proposed.read_chunked()
```

```
<xarray.DataArray (band: 6, y: 7241, x: 7961)>
dask.array<shape=(6, 7241, 7961), dtype=int16, chunksize=(1, 256, 256)>
Coordinates:
  * y           (y) float64 4.414e+06 4.414e+06 ... 4.197e+06 4.197e+06
  * x           (x) float64 2.424e+05 2.424e+05 ... 4.812e+05 4.812e+05
  * band        (band) object '1' '2' '3' '4' '5' '7'
    end_date    (band) <U8 '20161001' '20161001' ... '20161001' '20161001'
    start_date  (band) <U8 '19881022' '19881022' ... '19881022' '19881022'
Attributes:
    transform:   (30.0, 0.0, 242385.0, 0.0, -30.0, 4414215.0)
    crs:         +init=epsg:32611
    res:         (30.0, 30.0)
    is_tiled:    0
    nodatavals:  (-9999.0,)
```