## LCC analysis

### Methods

1. Load geometries, co2flux amp trends, and LCC data and preprocess as necessary.
2. Clip to ROI (region of interest). Use a small region for testing purposes.
3. Reduce land cover data to the resolution of the co2flux data, creating new images with bands representing the fraction of cover per pixel per land type.
4. Calculate 
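
Step 3 can be illustrated with a minimal NumPy sketch. It assumes a categorical land cover grid whose shape is an exact multiple of the coarse grid; the helper `class_fractions`, the toy `labels` array, and the reduction factor are hypothetical and only show the idea of one fraction band per land type.

```python
import numpy as np

def class_fractions(labels, class_ids, factor):
    """Fraction of each class inside non-overlapping factor x factor blocks."""
    ny, nx = labels.shape
    if ny % factor or nx % factor:
        raise ValueError("grid shape must be divisible by factor")
    # Reshape so every coarse pixel owns a (factor, factor) block of fine pixels
    blocks = labels.reshape(ny // factor, factor, nx // factor, factor)
    # One output band per class: the mean of the boolean mask is the cover fraction
    return np.stack([(blocks == c).mean(axis=(1, 3)) for c in class_ids])

# Toy 4 x 4 land cover map using the forest codes 1-4
labels = np.array([
    [1, 1, 2, 2],
    [1, 4, 2, 2],
    [4, 4, 1, 2],
    [4, 4, 3, 4],
])
fractions = class_fractions(labels, class_ids=[1, 2, 3, 4], factor=2)
print(fractions.shape)  # (class, coarse_y, coarse_x)
```

When every fine pixel belongs to one of the listed classes, the bands sum to 1 in each coarse pixel, which is a convenient sanity check.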

### Notes

Consider doing the reduction in Earth Engine first, then downloading the results at lower resolution.

#### Applying the same CRS to all data

Using the common WGS84: EPSG 4326. This should be set on all rasters used. If the data are already in WGS84 but the CRS property is not set, use:  
`mydata.rio.write_crs("epsg:4326", inplace=True)`  
Otherwise, reproject to the target CRS with  
`mydata.rio.reproject("EPSG:4326")`  
When adding more datasets, they can be matched to the grid of the first (CRS, extent, and resolution) using:  
`mydata2 = mydata2.rio.reproject_match(mydata)`  

#### Check if missing data value is set
`mydata.rio.nodata` or `mydata.rio.encoded_nodata` will show the fill value if it is set.  
`mydata.rio.set_nodata(-9999, inplace=True)` sets the nodata attribute without modifying the data.  
`mydata.rio.write_nodata(-9999, inplace=True)` records the nodata value in the array's metadata as `_FillValue` (in the encoding instead when `encoded=True`). It does not replace values in the data.

Note that the reproject_match method from above will modify the nodata value of mydata2 to match that of mydata.  

Use the following to mask the missing data:  
```
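nodata = raster.rio.nodata  # fill value currently set on the raster
raster = raster.where(raster != nodata)  # replace fill values with NaN
raster.rio.write_nodata(nodata, encoded=True, inplace=True)  # keep the fill value in the encoding for writing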
```
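
Why the masking matters can be seen on a toy array. The sketch below uses plain xarray with a made-up fill value of -9999 and made-up data; it only shows how an unmasked fill value distorts a simple statistic.

```python
import numpy as np
import xarray as xr

nodata = -9999.0  # hypothetical fill value
raster = xr.DataArray(
    np.array([[1.0, 2.0], [nodata, 3.0]]),
    dims=("y", "x"),
)

# Same masking step as above: fill values become NaN
masked = raster.where(raster != nodata)

raw_mean = float(raster.mean())     # fill value is treated as real data
masked_mean = float(masked.mean())  # NaN is skipped by default
print(raw_mean, masked_mean)
```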

### Land Cover Data

- ENF: 1
- EBF: 2
- DNF: 3
- DBF: 4

In [None]:
# Import packages
# See: https://www.earthdatascience.org/courses/use-data-open-source-python/hierarchical-data-formats-hdf/open-MODIS-hdf4-files-python/
import os
import warnings
import numpy.ma as ma
from shapely.geometry import mapping, box
import geopandas as gpd
import earthpy as et
import earthpy.spatial as es
import earthpy.plot as ep
from rasterio.crs import CRS
import rasterio

import xarray as xr
from osgeo import gdal
import rioxarray as rio
import pandas as pd
import numpy as np
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import folium

%matplotlib inline
warnings.simplefilter('ignore')

In [None]:
# Load the co2 data ----
file_co2amp_trend = '../data/co2invSeasAmpTrend.nc'
co2amp = rio.open_rasterio(file_co2amp_trend)
co2amp.rio.write_crs(4326, inplace=True)
co2amp.coords['spatial_ref']

### Creating new time series rasters for each land cover

**Steps**
- Create empty xarray dataset
- Create empty dictionary to hold np arrays
- Inside loop over LC types:
    - Create an empty np.array
    - Create loop over files
    - Inside loop over files:
        - Read the data for the LC type
        - Add data to np.array
    - Create an xarray array using np.array and lat, lon, time dims
    - Add array to dataset
- Add crs info to dataset
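
The steps above can be sketched on synthetic data as follows. The yearly arrays, coordinates, and land cover names are hypothetical stand-ins for the real files, and the final CRS step is left out so that the example needs only NumPy and xarray.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-ins for the yearly files: one (class, lat, lon) array per year
lc_names = ["Water", "ENForest", "EBForest"]
years = np.array([2001, 2002])
lat = np.array([0.5, -0.5])
lon = np.array([10.0, 11.0, 12.0])
rng = np.random.default_rng(0)
yearly = [
    rng.integers(0, 101, size=(len(lc_names), lat.size, lon.size))
    for _ in years
]

ds = xr.Dataset()
for i, name in enumerate(lc_names):
    # Collect the layer of this land cover type from every year
    data = np.stack([arr[i] for arr in yearly])  # (time, lat, lon)
    ds[name] = xr.DataArray(
        data,
        dims=("time", "lat", "lon"),
        coords={"time": years, "lat": lat, "lon": lon},
        attrs={"long_name": name, "units": "percent"},
    )

print(ds)
```

Keeping all land cover types in one dataset is one option; writing one file per type, as done in the processing loop, keeps memory use lower for the full-resolution data.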

In [None]:

# Index for MCD12C1 Land_Cover_Type_1_Percent: IGBP land cover types
lcIndex = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
lcNames = ['Water', 'ENForest', 'EBForest', 'DNForest', 'DBForest', 
    'MixForest', 'ClosedShrub', 'OpenShrub', 'WoodySavanna',
    'Savanna', 'Grassland', 'PermWetland', 'Cropland',
    'Urban', 'CropNatMosiac', 'PermSnowIce', 'Barren']

path_in = '/Users/moyanofe/BigData/GeoSpatial/LandCover/LandCover_MODIS_MCD12/MCD12C1'
path_out = '/Users/moyanofe/BigData/GeoSpatial/LandCover/LandCover_MODIS_MCD12/MCD12C1_proc'
files_in = os.listdir(path_in)
files_in.sort() # Sort to order files by year (which is part of the file name)
# files_in = files_in[0:2] # shorten for testing only
years = np.arange(len(files_in)) + 2001  # assumes one file per year, starting in 2001

for i in range(len(lcIndex)): # [0]: #
    print(lcNames[i])
    # Take each LC layer from every yearly file and combine them into one array per land cover type
    for j in range(len(files_in)): # 
        print(j)
        file = files_in[j]
        print(file)
        path = os.path.join(path_in, file)
        ds_file = rio.open_rasterio(path, masked=True, variable='Land_Cover_Type_1_Percent').squeeze()
        ar = ds_file.Land_Cover_Type_1_Percent[i]
        # ar = ar.rio.reproject_match(co2amp, resampling=rasterio.enums.Resampling.average)
        ar = ar.expand_dims('time')
        # print(ar)
        if j == 0:
            stacked = ar
        else:
            stacked = xr.concat([stacked, ar], dim='time')
    stacked = stacked.rename(lcNames[i])
    stacked.attrs['long_name'] = lcNames[i]
    stacked.attrs['units'] = 'percent in integers'
    stacked['time'] = ('time', years)
    ds_modislc_igbp = xr.Dataset()
    ds_modislc_igbp[lcNames[i]] = stacked

    # Save each land cover time series to file
    file_out = 'MCD12C1.A2001-2021.061'+ lcNames[i] +'.nc'
    filepath_out = os.path.join(path_out, file_out)
    ds_modislc_igbp.to_netcdf(filepath_out)
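
The year labels above are built by counting files from 2001, which holds only if no year is missing from the folder. A minimal sketch of an alternative reads the year from each file name, assuming the MODIS `AYYYYDDD` date field; the file names and the helper `year_from_modis_name` are hypothetical.

```python
import re

def year_from_modis_name(filename):
    """Extract the year from the AYYYYDDD field of a MODIS-style file name."""
    match = re.search(r"\.A(\d{4})\d{3}\.", filename)
    if match is None:
        raise ValueError(f"no AYYYYDDD date field in {filename!r}")
    return int(match.group(1))

# Hypothetical file names, deliberately unsorted
files = ["MCD12C1.A2002001.061.hdf", "MCD12C1.A2001001.061.hdf"]
files.sort()
years = [year_from_modis_name(f) for f in files]
print(years)
```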
