# Data Preprocessing

MODIS data were downloaded using [AppEEARS](https://appeears.earthdatacloud.nasa.gov/), which allows for spatiotemporal subsetting of NASA remote sensing products. 
Data were selected to cover CONUS and converted from their native format to NetCDF. 
The original [Vegetation Indices Monthly-MOD13A3.061](https://lpdaac.usgs.gov/products/mod13c2v061/) product has a resolution of 1 km x 1 km, and the file size is ~2GB. 
To allow for upload to GitHub and processing in class, I downsample the data to a 10 km x 10 km grid. 
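
As a rough illustration of what this downsampling does, the sketch below averages non-overlapping 10 x 10 blocks of a small synthetic grid with NumPy. The function name `block_mean` and the synthetic array are made up for illustration and are not part of the notebook; the notebook itself uses xarray's `coarsen` for this step, which also coarsens the coordinates.

```python
import numpy as np


def block_mean(grid, factor=10):
    """Average non-overlapping factor x factor blocks of a 2-D array.

    Rows and columns that do not fill a complete block are dropped from
    the end, similar in spirit to boundary='trim' in xarray's coarsen.
    """
    n_rows = (grid.shape[0] // factor) * factor
    n_cols = (grid.shape[1] // factor) * factor
    trimmed = grid[:n_rows, :n_cols]
    # split each axis into (number of blocks, cells per block)
    blocks = trimmed.reshape(n_rows // factor, factor, n_cols // factor, factor)
    # average over the two "cells per block" axes
    return blocks.mean(axis=(1, 3))


# synthetic 25 x 43 "1 km" grid (made-up values, not MODIS data)
fine = np.arange(25 * 43, dtype=float).reshape(25, 43)
coarse = block_mean(fine, factor=10)
print(fine.shape, "->", coarse.shape)
```

Each coarse cell is the plain average of the 100 fine cells it covers, so the data volume drops by roughly a factor of 100.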

The data download requests are reproducible using the corresponding `JSON` files in the data folder. 

In [11]:
# Import the tools we are going to need today:
import matplotlib.pyplot as plt  # plotting library
import numpy as np  # numerical library
import xarray as xr  # netCDF library
import cartopy  # Map projections library
import cartopy.crs as ccrs  # Projections list
#import rioxarray as rxr
import glob

In [12]:
files = glob.glob(r'../data/*latlon_*.nc')
files

['../data\\MOD13A3.061_1km_aid0001_latlon_2020.nc',
 '../data\\MOD13A3.061_1km_aid0001_latlon_2020_10km.nc',
 '../data\\MOD13A3.061_1km_aid0001_latlon_2021.nc',
 '../data\\MOD13A3.061_1km_aid0001_latlon_2021_10km.nc',
 '../data\\MOD13A3.061_1km_aid0001_latlon_2022.nc',
 '../data\\MOD13A3.061_1km_aid0001_latlon_2022_10km.nc',
 '../data\\MOD13A3.061_1km_aid0001_latlon_2023.nc',
 '../data\\MOD13A3.061_1km_aid0001_latlon_2023_10km.nc']
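
The listing above mixes the 1 km source files with `_10km.nc` outputs from an earlier run. A minimal sketch of separating the two and deriving the output name, using only the standard library, could look like the following. The helper names `is_coarsened` and `coarsened_name` are hypothetical, and the suffix convention is simply read off the file names shown above.

```python
import os

SUFFIX = "_10km"  # marker appended to downsampled files


def is_coarsened(path):
    """Return True if the file name already carries the 10 km suffix."""
    stem, _ = os.path.splitext(path)
    return stem.endswith(SUFFIX)


def coarsened_name(path):
    """Build the output path by inserting the suffix before the extension."""
    stem, ext = os.path.splitext(path)
    return stem + SUFFIX + ext


# two example paths in the same naming scheme as the listing above
example_files = [
    "../data/MOD13A3.061_1km_aid0001_latlon_2020.nc",
    "../data/MOD13A3.061_1km_aid0001_latlon_2020_10km.nc",
]
sources = [f for f in example_files if not is_coarsened(f)]
targets = [coarsened_name(f) for f in sources]
print(sources)
print(targets)
```

Filtering this way keeps a repeated run from coarsening files that have already been coarsened.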

In [7]:
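for file in files:
    # skip outputs of a previous run so they are not coarsened a second time
    if file.endswith('_10km.nc'):
        continue
    with xr.open_dataset(file) as ds:
        # reduce resolution of the dataset for storage and speed reasons:
        # average 10 cells along lat, then 10 cells along lon
        ds_10 = ds.coarsen(lat=10, boundary='trim').mean().coarsen(lon=10, boundary='trim').mean()
        ds_10.to_netcdf(path=file[:-len('.nc')] + '_10km.nc')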

In [None]:
# redo the downsampling for the 2022 file only
files = glob.glob(r'../data/*latlon_2022.nc')
for file in files:
    with xr.open_dataset(file) as ds:
        # reduce resolution of the dataset for storage and speed reasons
        ds_10 = ds.coarsen(lat=10, boundary='trim').mean().coarsen(lon=10, boundary='trim').mean()
        ds_10.to_netcdf(path=file[:-len('.nc')] + '_10km.nc')