# 3.4 Stacking and interpolating data


[[up to 3.0](Chapter3_1_GDAL.ipynb)]


## 3.4.1 Introduction

In this section, we will:

* develop code to produce a stacked dataset of spatio-temporal data on a grid
* interpolate over any missing data
* smooth the dataset
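
As a rough preview of these three steps, here is a minimal sketch on synthetic data. It is not the code we develop in this section: the 2x2 grid, the linear test signal and the 3-sample moving average are all assumptions chosen to keep the example small.

```python
import numpy as np

# synthetic stand-in for a set of gridded images:
# 6 time steps on a 2x2 grid, increasing linearly in time
t = np.arange(6, dtype=float)
images = [np.full((2, 2), 1.0 + 0.5 * ti) for ti in t]

# 1. stack along a new leading (time) axis -> shape (time, y, x)
stack = np.array(images)

# flag two samples of one pixel as missing
stack[2, 0, 0] = np.nan
stack[3, 0, 0] = np.nan

# 2. interpolate linearly over the missing samples, pixel by pixel
filled = stack.copy()
for iy in range(stack.shape[1]):
    for ix in range(stack.shape[2]):
        series = stack[:, iy, ix]
        good = np.isfinite(series)
        filled[:, iy, ix] = np.interp(t, t[good], series[good])

# 3. smooth in time with a 3-sample moving average
# (mode='same' pads with zeros, so the first and last samples are damped)
kernel = np.ones(3) / 3.0
smoothed = np.apply_along_axis(
    lambda s: np.convolve(s, kernel, mode='same'), 0, filled)

print(stack.shape)
print(filled[:, 0, 0])
```

Looping over pixels is slow for real images, but it keeps the logic of each step easy to follow.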

### 3.4.1.1 Test your login

Let's first test your NASA login:

In [None]:
import geog0111.nasa_requests as nasa_requests
from geog0111.cylog import cylog
%matplotlib inline

url = 'https://e4ftl01.cr.usgs.gov/MOTA/MCD15A3H.006/2018.09.30/' 
        
# grab the HTML information
try:
    html = nasa_requests.get(url).text
    # test the first few characters of the html
    if html[:20] == '<!DOCTYPE HTML PUBLI':
        print('this seems to be ok ... ')
        print('use cylog().login() anywhere you need to specify the tuple (username,password)')
except Exception:
    print('login error ... try entering your username password again')
    print('then re-run this cell until it works')
    cylog(init=True)

### 3.4.1.2 Get the datasets for today

Now let's get the datasets we need for today.

**You should run this section before the class starts to save time.**

You are given the relevant filenames (for 2016 and 2017) in the files [`data/lai_filelist_{year}.dat.txt`](data/lai_filelist_2016.dat.txt). The datasets have been pre-downloaded for this exercise, but you need to copy them to your local filespace:

In [None]:
from geog0111.geog_data import procure_dataset

destination_folder = 'data'

# we have the filenames provided 
# in data/lai_filelist_2016.dat.txt
for year in [2016,2017]:
    control_file = f'data/lai_filelist_{year}.dat.txt'
    # read the file in
    filenames = open(control_file).read().split()

    # get the local files
    # set verbose=True if you want to see what is happening
    # in procure_dataset()
    done = [procure_dataset(f,
                            verbose=False,
                            destination_folder=destination_folder)
            for f in filenames]
    # done should be all True if this has worked
    # its length should be the number of files copied

    # get the dataset names
    gdal_fnames = [f'HDF4_EOS:EOS_GRID:"{destination_folder}/{file_name:s}":MOD_Grid_MCD15A3H:Lai_500m'
                   for file_name in filenames]

    # print the first 8 just to see it looks ok
    for f in gdal_fnames[:8]:
        print(f)


## 3.4.2 A time series

You should now know how to access and download datasets from the NASA servers and have developed functions to do this.

You should also know how to select a dataset from a set of hdf files, and mosaic, mask and crop the data to correspond to some vector boundary. This is a very common task in geospatial processing.

We now consider the case where we want to analyse a time series of data. We will use LAI over time to exemplify this.
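
For a time series, it helps to know which files belong to which date, so that each date can be mosaicked separately. The sketch below groups filenames by the acquisition date coded in the `AYYYYDDD` field of the MODIS filename. The filenames listed (in particular the production timestamps) are hypothetical, and `acquisition_date()` is a helper invented for this illustration, not part of `geog0111`.

```python
from collections import defaultdict
from datetime import datetime

# hypothetical filenames following the MODIS naming pattern
# product.AYYYYDDD.tile.collection.production_time.hdf
filenames = [
    'MCD15A3H.A2016001.h17v03.006.2016007073724.hdf',
    'MCD15A3H.A2016001.h18v03.006.2016007073726.hdf',
    'MCD15A3H.A2016005.h17v03.006.2016013012347.hdf',
    'MCD15A3H.A2016005.h18v03.006.2016013012348.hdf',
]

def acquisition_date(filename):
    """Return the date coded in the 'AYYYYDDD' field (year + day of year)."""
    date_field = filename.split('.')[1]   # e.g. 'A2016005'
    return datetime.strptime(date_field[1:], '%Y%j').date()

# collect the files (one per tile) for each date
by_date = defaultdict(list)
for f in filenames:
    by_date[acquisition_date(f)].append(f)

for date in sorted(by_date):
    print(date, by_date[date])
```

Each list in `by_date` could then be turned into GDAL dataset names and passed to a mosaicking function, giving one image per date.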



In [None]:


from osgeo import gdal
import numpy as np
import matplotlib.pyplot as plt

#def process_data(doy_year,tiles,vector_file, vector_where):
    
    
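def mosaic_and_mask_data(gdal_fnames, vector_file, vector_where):
    # stitch the input datasets together as an in-memory virtual raster
    stitch_vrt = gdal.BuildVRT("", gdal_fnames)
    # warp to an in-memory dataset, cutting to the vector feature
    # selected by vector_where; pixels outside the cutline are set to 200
    g = gdal.Warp("", stitch_vrt,
                  format='MEM', dstNodata=200,
                  cutlineDSName=vector_file,
                  cutlineWhere=vector_where)
    return g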


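# this part is to access a particular dataset in the file
# the files were copied to destination_folder above, so include it in the path
# note: filenames holds the list for the last year read above (2017)
gdal_fnames = [f'HDF4_EOS:EOS_GRID:"{destination_folder}/{file_name:s}":MOD_Grid_MCD15A3H:Lai_500m'
               for file_name in filenames]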

g = mosaic_and_mask_data(gdal_fnames, "data/TM_WORLD_BORDERS-0.3.shp",
                         "FIPS='GM'")

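# apply the LAI scaling factor (0.1): the no-data value 200 becomes 20
lai = np.array(g.ReadAsArray()).astype(float) * 0.1
# valid data mask: row and column indices of pixels below the scaled no-data value
mask = np.nonzero(lai < 20)
# bounding box of the valid data (+1 because slice ends are exclusive)
min_y = mask[0].min()
max_y = mask[0].max() + 1

min_x = mask[1].min()
max_x = mask[1].max() + 1

# crop to the bounding box
lai = lai[min_y:max_y, min_x:max_x]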

fig = plt.figure(figsize=(12,12))
im = plt.imshow(lai, interpolation="nearest", vmin=0, vmax=6,
                cmap=plt.cm.inferno_r)
plt.title(f'LAI {year}')
plt.colorbar()
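
The bounding-box crop in the cell above can be checked on a small synthetic array. This is only a sketch: the 5x6 grid, the value 3.0 and the position of the valid block are assumptions made for the test.

```python
import numpy as np

# synthetic 'image': no-data everywhere except a 3x3 block of valid values
nodata = 20.0
lai = np.full((5, 6), nodata)
lai[1:4, 2:5] = 3.0    # valid rows 1-3, valid columns 2-4

# row and column indices of the valid pixels
rows, cols = np.nonzero(lai < nodata)

# bounding box of the valid pixels (+1 because slice ends are exclusive)
min_y, max_y = rows.min(), rows.max() + 1
min_x, max_x = cols.min(), cols.max() + 1

cropped = lai[min_y:max_y, min_x:max_x]
print(cropped.shape)
```

The cropped array should contain only the valid block, which is a quick way to confirm that the `+ 1` on the upper bounds is needed.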