# Chapter 8 - Example: Land Data
### Changes in the vegetation index of an area through the years. The NDVI indicates the amount of vegetation at each grid point.

In this chapter we don't use data from the cloud. Instead, we show how to build a time series from data stored as temporally separated files on the internet, and how to analyze it. You'll see that it is not very different from previous chapters, except that there is no centralized repository for the data. In the future (hopefully soon), when these data are available in the cloud in a similar format, accessing them would be similar to chapters 6 and 7.

## This script reads NDVI (vegetation index) files from a `thredds` server, compiles the selected region and time, and then analyzes the change in vegetation index through time.

In [None]:
import warnings
warnings.simplefilter('ignore') 

import pandas as pd
import numpy as np
import xarray as xr
xr.set_options(display_style="html")  # display dataset nicely
import os
import re # regular expressions
from datetime import date
from calendar import month_abbr
import urllib.request # make sure the request submodule is loaded (used below as ur.request)
import urllib as ur # library to download files online
import requests # library to read files online 
import matplotlib.pyplot as plt 
import hvplot.pandas
import hvplot.xarray


In [None]:
# Select a region 
lat1, lat2 = 16, 18 # two latitudes for a range: lat1<lat2, resolution is 0.05 degrees
lon1, lon2 = -98,-96 # longitude range: lon1<lon2, valid values go from -180 to 180

# Select a day of year. 
# This script obtains data for the region for the selected day for each available year
mon = 7 # month
dy = 31 # day

***
## Obtaining the data

In [None]:
# Find what years are available for chosen day 
ayrs = [*range(1989,date.today().year)] # data available since 1989. Search up to, but not including, the current year (obtained with date.today)
iurl = 'https://www.ncei.noaa.gov/thredds/fileServer/' # thredds server link where the data is stored
flist = list() # initialization of file list 

# Look for the file of the selected day in each year
for i in ayrs: # each year has its own folder with a catalog page listing its files
    r = requests.get('https://www.ncei.noaa.gov/thredds/catalog/cdr/ndvi/files/'+str(i)+'/catalog.html')   # request the catalog page for that year
    x = re.split("\'",r.text) # split the page text (which comes in html format) at every ' symbol
    dtnm = '_'+str(i)+str(mon).zfill(2)+str(dy).zfill(2)+'_' # construct the date tag of the file name with year, month and day
    for j in x: # for each piece of text, look for the selected day
        if (dtnm in j) and ("catalog.html?" in j): # keep only the catalog links, not repeated or informational lines
            flist.append(j) # add the file link to our list
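
The filtering step above can be tried offline on a small hand-written piece of catalog-like HTML. This is only a sketch: the sample text, the file names in it, and the helper name `find_dataset_links` are made up for illustration, and the real catalog page may be laid out differently.

```python
import re

def find_dataset_links(html, year, month, day):
    """Return the catalog links in `html` whose file name carries the given date."""
    tag = f"_{year}{month:02d}{day:02d}_"  # date tag, e.g. "_20180731_"
    pieces = re.split("'", html)           # same split on single quotes as above
    return [s for s in pieces if tag in s and "catalog.html?" in s]

# Hypothetical sample text, not copied from the real server
sample = (
    "<a href='catalog.html?dataset=cdr/ndvi/files/2018/NDVI_20180730_x.nc'>a</a>"
    "<a href='catalog.html?dataset=cdr/ndvi/files/2018/NDVI_20180731_x.nc'>b</a>"
)

links = find_dataset_links(sample, 2018, 7, 31)
path = links[0].split("=")[1]  # part after "dataset=", to be appended to the fileServer URL
print(path)
```

Only the link whose name contains the requested date survives the filter; the text after the `=` sign is what gets appended to the file server address.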

In [None]:
# Download each file from our list, and select the region
os.makedirs('./data', exist_ok=True) # make sure the download folder exists
tdt = list() # initialize a list for time
for iy, i in enumerate(flist): 
    iurl2 = iurl + re.split('=',i)[1] # construct the url for each file
    print(iy, re.split('/',i)[-1]) # print the name of the file being downloaded
    # urllib's request module (no s at the end) downloads a file with a specific (non-text) format and saves it as tmp.nc
    tmp = ur.request.urlretrieve(iurl2, './data/tmp.nc')
    nvi = xr.open_dataset('./data/tmp.nc') # now we can use it as a regular local file
    tdt.append(nvi.time.data[0]) # add the date to our list
    
    # select the region and load it into memory; latitude goes from lat2 to lat1, following the order used in the file
    tmp1 = nvi.NDVI[0,:,:].sel(latitude=slice(lat2,lat1),longitude=slice(lon1,lon2)).load()
    nvi.close() # close the file only after the data has been read
    if iy==0: # if the first year, create an array to store the data
        nvi_dt = np.full((len(flist),tmp1.shape[0],tmp1.shape[1]), np.nan)
    nvi_dt[iy,:,:] = tmp1.data # store selected data
    os.remove('./data/tmp.nc') # delete the file, so we can keep downloading using the same name
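
The region is selected with `slice(lat2,lat1)`, that is, from the larger latitude to the smaller one. With label-based slicing, the bounds have to follow the order in which the coordinate is stored, so this order suggests that latitude runs from north to south in the files. Below is a minimal sketch on a synthetic grid; the grid itself is an assumption made for illustration and is not read from a real NDVI file.

```python
import numpy as np
import xarray as xr

# Synthetic 1-degree grid with latitude stored north to south (an assumption
# that mirrors the slice order used above; it is not read from a real file)
lat = np.arange(20, 14, -1)   # 20, 19, ..., 15
lon = np.arange(-99, -94)     # -99, ..., -95
field = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    dims=["latitude", "longitude"],
    coords={"latitude": lat, "longitude": lon},
)

lat1, lat2 = 16, 18
good = field.sel(latitude=slice(lat2, lat1))  # bounds follow the coordinate order
bad = field.sel(latitude=slice(lat1, lat2))   # opposite order: nothing is selected

print(good.latitude.values)
print(bad.latitude.size)
```

If a selection ever comes back empty, checking the order of the coordinate values is a good first step.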

In [None]:
# Build an xarray Data Array structure to make the analysis easier
ndvi = xr.DataArray(data=nvi_dt, dims=['time','latitude','longitude'], 
                    coords=dict(time=tdt,latitude=tmp1.latitude.values, longitude=tmp1.longitude.values),attrs=tmp1.attrs) 
ndvi # taking a peek

***
## The analysis

First, calculate a climatological field of the vegetation index by averaging all years.

In [None]:
mean_ndvi = ndvi.mean(axis=0) # average over the first axis (0), which is time. 
mean_ndvi.plot(cmap='YlGn') # change the color map to yellow to green for this particular data set
plt.title('NDVI Climatology for '+month_abbr[mon]+' '+str(dy).zfill(2))
plt.show()

### Now we calculate the spatial anomaly for a given year by subtracting the climatology from that year's field

In [None]:
syr = 2018 # select a year
# first we construct the time step index string with the selected year, month and day, then subtract the climatology
(ndvi.sel(time=str(syr)+'-'+str(mon).zfill(2)+'-'+str(dy).zfill(2))-mean_ndvi).plot() 
plt.show()
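
As a quick check of the anomaly definition, the anomalies of all years should average to zero at every grid point, because the climatology is the mean of those same years. Here is a minimal NumPy sketch with made-up numbers (not real NDVI values):

```python
import numpy as np

# Made-up stack of three yearly fields on a 2x2 grid: (time, latitude, longitude)
stack = np.array([
    [[0.2, 0.4], [0.6, 0.8]],
    [[0.3, 0.5], [0.6, 0.7]],
    [[0.4, 0.6], [0.6, 0.9]],
])

climatology = stack.mean(axis=0)  # average over time: one value per grid point
anomalies = stack - climatology   # broadcasting subtracts the same field from every year

print(anomalies[1])               # anomaly field for the second year
print(np.allclose(anomalies.mean(axis=0), 0.0))
```

Positive values mark grid points that were greener than average in that year, and negative values mark points that were less green.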

***
## For a time series analysis, we average each year's field over latitude and longitude

But we add a condition: we select only locations where the vegetation index is equal to or larger than 0.3. This means we are looking at the change in vegetation for locations that have at least 30% vegetation coverage at the beginning of the time series.

In [None]:
veg_mean = list() # list to store the mean vegetation index of each year
mask0 = ndvi[0,:,:]>=0.3 # boolean mask: True where the veg. index is >= 0.3 in the first time step
veg_area = mask0.sum() # number of grid points kept by the mask - needed if you want to calculate area later
for i in range(len(ndvi.time)): # loop over the years that were actually downloaded
    tmp=ndvi[i,:,:].where(mask0) # apply the mask for each year; other locations are set to NaN
    veg_mean.append(float(tmp.mean())) # the mean ignores the NaN values

yrs = ndvi.time.dt.year.values # years that match the downloaded fields
plt.bar(yrs,np.array(veg_mean)-np.nanmean(veg_mean))
plt.title('Vegetation Index Change for '+month_abbr[mon]+' '+str(dy).zfill(2))
plt.ylabel('NDVI')
plt.grid(True, alpha=0.3)
plt.show()
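
The masking idea can be checked on a tiny example: the mask is defined once, from the first time step, and each year then keeps its own values at the selected points only. This is a minimal NumPy sketch with made-up numbers, not real NDVI data:

```python
import numpy as np

# Made-up NDVI stack: (time, latitude, longitude)
stack = np.array([
    [[0.1, 0.4], [0.5, 0.2]],  # first year: only two points reach 0.3
    [[0.9, 0.6], [0.4, 0.9]],
])

keep = stack[0] >= 0.3                    # boolean mask from the first time step
masked = np.where(keep, stack, np.nan)    # keep each year's values where the mask is True
series = np.nanmean(masked, axis=(1, 2))  # one regional mean per year, ignoring NaN

print(series)
```

The two points excluded by the mask have high values in the second year, but they do not enter that year's mean because they were below the threshold at the start.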

***
# Resources


### Data and data sources: 
- [NDVI Normalized Difference Vegetation Index (Climate Data Record)](https://www.ncei.noaa.gov/products/climate-data-records/normalized-difference-vegetation-index) data. 
- [NDVI data list](https://www.ncei.noaa.gov/thredds/catalog/cdr/ndvi/catalog.html) 

### Other locations for MODIS and NDVI data
- [AWS](https://registry.opendata.aws/modis-astraea/)
- [AWS NASA NEX](https://registry.opendata.aws/nasanex/)
- [Earth Engine](https://developers.google.com/earth-engine/datasets/catalog/NOAA_CDR_AVHRR_NDVI_V5#description)
- [USGS](https://lpdaac.usgs.gov/products/mod13q1v006/)

### Other data in `thredds`
- [NCEI thredds](https://www.ncei.noaa.gov/thredds/catalog.html) NOAA National Centers for Environmental Information thredds catalog.
- [How to access data files in thredds](https://www.unidata.ucar.edu/software/tds/current/tutorial/CatalogPrimer.html)

### More on the libraries:
- [A short article on how to download files from url in Python](https://betterprogramming.pub/3-simple-ways-to-download-files-with-python-569cb91acae6)
- [urllib/request](https://docs.python.org/3/library/urllib.request.html?highlight=retrieve) library 
- Regular expressions [re](https://docs.python.org/3/howto/regex.html). A useful tool for manipulating strings. See this [tutorial](https://www.tutorialspoint.com/python/python_reg_expressions.htm) for a friendlier introduction.
