# Extract MODIS Site Data and Generate Samples
## MODIS Site Data
For each site in the LFMC sample data, extract the full time series of MODIS reflectance and snow-cover data, gap-fill it, and save it to CSV files. Note: if the output CSV files already exist, they are assumed to be correct and are not overwritten.
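The cache-then-extract behaviour described above can be sketched as follows. This is a minimal illustration, not the code in `timeseries_extractor`: the helper `load_or_extract`, the stand-in `fake_extract`, and the use of time-based linear interpolation for gap-filling are all assumptions made for the example.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def load_or_extract(csv_path, extract_fn):
    """Return cached site data if csv_path exists, else extract, gap-fill and save.

    Hypothetical helper: the real extractor may gap-fill differently.
    """
    if os.path.exists(csv_path):
        # Existing extracts are trusted and never overwritten
        return pd.read_csv(csv_path, index_col=0, parse_dates=True)
    df = extract_fn()
    # Assumed gap-filling: interpolate in time, then fill any gaps at the ends
    df = df.interpolate(method="time").ffill().bfill()
    df.to_csv(csv_path)
    return df


def fake_extract():
    # Stand-in for a GEE extraction: a daily series with one missing day
    dates = pd.date_range("2001-01-01", periods=5, freq="D")
    return pd.DataFrame({"band": [100.0, np.nan, 300.0, 400.0, 500.0]}, index=dates)


demo_path = os.path.join(tempfile.mkdtemp(), "site_demo.csv")
first = load_or_extract(demo_path, fake_extract)   # extracts, gap-fills, saves
second = load_or_extract(demo_path, fake_extract)  # reads the saved CSV instead
```

Because the second call finds the CSV file, it never invokes the extraction function, which is why keeping existing site files makes a re-run much cheaper.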
## MODIS Sample Data
For each sample, extract the MODIS reflectance time series. The length of the time series is set by `MODIS_TS_LENGTH`. A sample is rejected if the full time series cannot be extracted (its start or end falls outside the full site time series) or if the snow-cover data shows the pixel was snow-covered on the sampling date. The extracted MODIS data is combined into a single dataframe and saved. A second LFMC sample dataset containing only the valid samples is also created.
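To make the windowing rule concrete, here is a small sketch of how a fixed-length window ending shortly before the sampling date could be cut from a site time series. The function `extract_window` is hypothetical and is not the implementation of `get_sample_data` in `data_extract_utils`; it assumes the site dataframe has a daily, gap-free `DatetimeIndex`.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd


def extract_window(date_str, site_df, offset, length, freq=1):
    """Return the flattened window ending `offset` days before the sample date.

    Returns None when the window does not fit inside the site time series.
    Hypothetical helper; assumes a daily, gap-free DatetimeIndex.
    """
    sample_date = datetime.strptime(date_str, "%Y-%m-%d")
    end = sample_date - timedelta(days=offset)
    start = end - timedelta(days=(length - 1) * freq)
    if start < site_df.index[0] or end > site_df.index[-1]:
        return None  # start or end falls outside the site time series
    # Keep every freq-th day, then flatten day by day (bands vary fastest)
    window = site_df.loc[start:end].iloc[::freq]
    return window.to_numpy().flatten()


dates = pd.date_range("2001-01-01", periods=10, freq="D")
site_df = pd.DataFrame({"b1": np.arange(10), "b2": np.arange(10, 20)}, index=dates)

result = extract_window("2001-01-06", site_df, offset=1, length=3)
too_early = extract_window("2001-01-02", site_df, offset=1, length=3)
```

In this toy case `result` covers 2001-01-03 to 2001-01-05, while `too_early` is rejected because its window would start before the first day of the site data.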
## Notes
1. This notebook should be run after running the `Extract Auxiliary Data.ipynb` notebook.
2. It will take about 8 hours to run if there are no existing site extracts.
3. 806 samples will not be extracted due to snow cover.
4. There should be no invalid sites or pixels. Occasionally, extraction from GEE will fail for a site. If this happens, re-run the notebook, keeping the existing site CSV files so they are not re-extracted.

In [1]:
import os
import numpy as np
import pandas as pd
import time
from datetime import datetime
from datetime import timedelta

from timeseries_extractor import GeeTimeseriesExtractor
from data_extract_utils import get_sample_data, sort_key

Program parameters and constants

In [2]:
# Directories
DATA_DIR = r"G:\My Drive\LFMC from MODIS"
SAMPLE_DIR = os.path.join(DATA_DIR, "Globe-LFMC")
MODIS_DIR = os.path.join(DATA_DIR, "MCD43A4")
SNOW_DIR = os.path.join(DATA_DIR, "MOD10A1")
FINAL_DIR = os.path.join(DATA_DIR, "TrainingData")

# File Names
LFMC_SITES = os.path.join(SAMPLE_DIR, "LFMC_sites_dem.csv")
SAMPLES_INPUT = os.path.join(SAMPLE_DIR, "LFMC_samples_dem.csv")
MODIS_OUTPUT = os.path.join(FINAL_DIR, "modis.csv")
SAMPLES_OUTPUT = os.path.join(FINAL_DIR, "sample.csv")

# Other constants
MODIS_TS_LENGTH = 365
MODIS_TS_OFFSET = 1
MODIS_TS_FREQ = 1

GEE parameters: the current settings extract the data needed to reproduce the first paper.
- Reflectance product is MCD43A4 - daily reflectance, each value derived from a 16-day retrieval window
- Snow cover product is MOD10A1 - daily snow cover, used because MOD10A2 is not in GEE
- Scale is set to the native MODIS resolution

In [3]:
START_DATE = "2001-01-01"
END_DATE = "2019-01-01"  # Final day retrieved will be 2018-12-31

PRODUCT = "MODIS/006/MCD43A4"
BANDS = ["Nadir_Reflectance_Band1",
         "Nadir_Reflectance_Band2",
         "Nadir_Reflectance_Band3",
         "Nadir_Reflectance_Band4",
         "Nadir_Reflectance_Band5",
         "Nadir_Reflectance_Band6",
         "Nadir_Reflectance_Band7"]
SNOW_PRODUCT = "MODIS/006/MOD10A1"
SNOW_BANDS = ["NDSI_Snow_Cover"]

Connect to GEE

In [4]:
import ee
ee.Initialize()

Check if a sample was collected in snow conditions

In [5]:
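def is_snow_sample(date_str, bands_df, snow_df):
    # bands_df is unused here; it is kept so the call signature stays unchanged
    sample_date = datetime.strptime(date_str, '%Y-%m-%d')
    # Treat the pixel as snow-covered if NDSI snow cover on the sampling date is at least 10
    return snow_df[SNOW_BANDS[0]][sample_date] >= 10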
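As an illustration of the snow test, the sketch below applies the same threshold of 10 to a toy snow-cover table. The names `snow_on_date`, `SNOW_BAND`, `SNOW_THRESHOLD` and `toy_snow` are invented for the example, and it assumes the site snow dataframe is indexed by date, as the lookup in `is_snow_sample` suggests.

```python
from datetime import datetime

import pandas as pd

SNOW_BAND = "NDSI_Snow_Cover"
SNOW_THRESHOLD = 10  # same cut-off as used in is_snow_sample


def snow_on_date(date_str, snow_df):
    """Return True if the snow-cover value on the given date meets the threshold."""
    sample_date = datetime.strptime(date_str, "%Y-%m-%d")
    return bool(snow_df.loc[sample_date, SNOW_BAND] >= SNOW_THRESHOLD)


# Toy daily snow-cover values for one site (illustrative numbers only)
dates = pd.date_range("2005-01-01", periods=3, freq="D")
toy_snow = pd.DataFrame({SNOW_BAND: [0, 45, 9]}, index=dates)

flags = [snow_on_date(d, toy_snow) for d in ("2005-01-01", "2005-01-02", "2005-01-03")]
```

Only the second date is flagged: a value of 9 falls just below the threshold, so that sample would be kept.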

Create output directories if necessary

In [6]:
for directory in (MODIS_DIR, SNOW_DIR, FINAL_DIR):
    # exist_ok=True makes this a no-op when the directory is already there
    os.makedirs(directory, exist_ok=True)

Generate the MODIS sample data:

For each site
- Get the reflectance and snow data
- Then get the sample data for each sample at the site

In [7]:
sites = pd.read_csv(LFMC_SITES, float_precision="high")
samples = pd.read_csv(SAMPLES_INPUT, float_precision="high")
modis_extractor = GeeTimeseriesExtractor(PRODUCT, BANDS, START_DATE, END_DATE, dir_name=MODIS_DIR)
snow_extractor = GeeTimeseriesExtractor(SNOW_PRODUCT, SNOW_BANDS, START_DATE, END_DATE, dir_name=SNOW_DIR)
modis_data = []
valid_data = [False] * samples.shape[0]
invalid_pixels = []
snow_pixels = []
invalid_sites = []
for site_idx, site in sites.iterrows():
    print(f'Processing site {site.Site}')
    site_samples = samples[samples.Site == site.Site]
    try:
        modis_df = modis_extractor.get_and_save_data(site)
        snow_df = snow_extractor.get_and_save_data(site)
    except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
        print(f'Failed to extract data for {site.Site}')
        invalid_sites.append(site.Site)
        continue
    for index, sample in site_samples.iterrows():
        if is_snow_sample(sample["Sampling date"], modis_df, snow_df):
            print(f'Snow pixel: {sample["Sampling date"]}')
            snow_pixels.append(index)
        else:
            sample_data = get_sample_data(sample["Sampling date"], modis_df, MODIS_TS_OFFSET, MODIS_TS_LENGTH, MODIS_TS_FREQ)
            if sample_data is None or np.isnan(sample_data.sum()):
                invalid_pixels.append(index)
            else:
                modis_data.append([sample.ID] + list(sample_data))
                valid_data[index] = True

Processing site C4_1
Processing site C4_2
Processing site C4_3
Processing site C4_4
Processing site C4_5
Processing site C6_1
Snow pixel: 2012-08-15
Processing site C6_2
Processing site C6_3
Processing site C6_4
Processing site C6_5
Processing site C6_6
Processing site C6_7
Processing site C6_8
Processing site C6_9
Processing site C6_10
Processing site C6_11
Processing site C6_12
Processing site C6_13
Processing site C6_14
Processing site C6_16
Processing site C6_17
Processing site C6_18
Processing site C6_19
Processing site C6_20
Processing site C6_21
Processing site C6_22
Snow pixel: 2006-10-26
Snow pixel: 2008-02-29
Snow pixel: 2010-05-01
Processing site C6_23
Snow pixel: 2004-11-29
Processing site C6_24
Snow pixel: 2004-12-09
Snow pixel: 2005-01-10
Snow pixel: 2005-09-01
Snow pixel: 2006-01-27
Snow pixel: 2006-03-09
Snow pixel: 2006-12-01
Snow pixel: 2007-01-15
Snow pixel: 2007-01-18
Snow pixel: 2007-01-30
Snow pixel: 2007-02-15
Processing site C6_25
Snow pixel: 2004-12-09
...

Summary of sites/samples not extracted

In [8]:
print(f'Invalid sites: {len(invalid_sites)}; Invalid pixels: {len(invalid_pixels)}; Snow pixels: {len(snow_pixels)}')
print(invalid_sites)
print(invalid_pixels)
print(snow_pixels)

Invalid sites: 0; Invalid pixels: 0; Snow pixels: 806
[]
[]
[6181, 8612, 8634, 8657, 9656, 10805, 10807, 10824, 10827, 10831, 10848, 10850, 10851, 10853, 10854, 11546, 11548, 11564, 11567, 11570, 11588, 11590, 11591, 11592, 12827, 14232, 17520, 17544, 17888, 17922, 27495, 27565, 28368, 33330, 34382, 34400, 34401, 34402, 34403, 34404, 34406, 34426, 34442, 34443, 34462, 34464, 34465, 34466, 34467, 34497, 34498, 34499, 34519, 34520, 34521, 34522, 34524, 34543, 34545, 34566, 34567, 34568, 34569, 34588, 34599, 34600, 37588, 37599, 37605, 37607, 37608, 37614, 37642, 38695, 39871, 39992, 41845, 43284, 43887, 45804, 46450, 48441, 48563, 53285, 53300, 53307, 60183, 60184, 60205, 60207, 60210, 60212, 63010, 63062, 64662, 64685, 64702, 64722, 64723, 64724, 64725, 64756, 64777, 64778, 64779, 64798, 64820, 64840, 64852, 66003, 66283, 111, 120, 144, 898, 899, 916, 919, 939, 992, 993, 994, 995, 1006, 1019, 1062, 1106, 1107, 1334, 1370, 1759, 1787, 1798, 1799, 1815, 1820, 1994, 2050, 2051, 2058, 2066,

Save and display sample reflectance data

In [9]:
modis_data = pd.DataFrame(modis_data)
ts_days = (MODIS_TS_LENGTH - 1) * MODIS_TS_FREQ
modis_data.columns = ["ID"] + [f'{day-MODIS_TS_OFFSET:04}_{band+1}' for day in range(-ts_days, 1, MODIS_TS_FREQ) for band in range(len(BANDS))]
modis_data.sort_values('ID', inplace=True, key=lambda x: x.apply(sort_key))
modis_data.to_csv(MODIS_OUTPUT, index=False)
modis_data

,ID,-365_1,-365_2,-365_3,-365_4,-365_5,-365_6,-365_7,-364_1,-364_2,...,-002_5,-002_6,-002_7,-001_1,-001_2,-001_3,-001_4,-001_5,-001_6,-001_7
0,C4_1_1,1760,2497,1060,1461,3243,3128,2848,1752,2485,...,3285,3182,2553,1701,2738,900,1408,3285,3203,2568
1,C4_1_2,1757,2460,1060,1457,3251,3469,2965,1883,2640,...,3517,3526,2816,1820,2847,1015,1488,3506,3523,2816
2,C4_1_3,1966,2809,1124,1599,3497,3705,3108,1983,2826,...,3439,3521,2841,1896,2833,1036,1504,3452,3528,2847
3,C4_1_4,2003,2837,1147,1617,3407,3549,3155,2038,2891,...,3436,3480,2765,1849,2728,1039,1472,3404,3462,2754
4,C4_1_5,1929,2710,1135,1580,3368,3381,3080,1922,2710,...,3297,3368,2683,1815,2702,976,1420,3287,3326,2642
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65650,C13_4_14,490,1881,267,477,2053,1623,788,493,1888,...,2358,1922,981,652,2062,351,589,2421,1931,1005
65651,C13_4_15,538,1912,298,515,2163,1690,839,536,1907,...,2383,1840,948,625,2039,342,574,2369,1811,925
65652,C13_4_16,522,1846,287,491,2060,1665,801,524,1839,...,2294,1798,960,640,1977,345,562,2294,1805,961
65653,C13_4_17,500,1796,267,451,1953,1610,741,495,1785,...,2208,1492,830,601,1924,320,524,2197,1489,824
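
The column names in the table above encode the day relative to the sampling date and the band number. The sketch below wraps the same naming expression in a function (the name `make_columns` is introduced here for illustration) and runs it on a tiny three-day, two-band case so the pattern is easy to read.

```python
def make_columns(ts_length, ts_offset, ts_freq, num_bands):
    """Build column names of the form '<day>_<band>', day relative to the sampling date."""
    ts_days = (ts_length - 1) * ts_freq
    return ["ID"] + [f"{day - ts_offset:04}_{band + 1}"
                     for day in range(-ts_days, 1, ts_freq)
                     for band in range(num_bands)]


# Three days, two bands, window ending one day before the sampling date
columns = make_columns(ts_length=3, ts_offset=1, ts_freq=1, num_bands=2)
```

With the notebook's settings (`MODIS_TS_LENGTH = 365`, `MODIS_TS_OFFSET = 1`, seven bands) the same expression runs from `-365_1` to `-001_7`, which matches the header shown above.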


Save and display the valid samples

In [11]:
valid_samples = samples[valid_data].sort_values('ID', key=lambda x: x.apply(sort_key))
valid_samples.to_csv(SAMPLES_OUTPUT, index=False)
valid_samples

,ID,Latitude,Longitude,Sampling date,Sampling year,Land Cover,LFMC value,Site,Day_sin,Day_cos,Long_sin,Long_cos,Lat_norm,Elevation,Slope,Aspect_sin,Aspect_cos
73,C4_1_1,40.21458,-112.21868,2005-06-20,2005,120,156.76300,C4_1,-0.21352,0.97694,-0.92575,-0.37814,0.72341,0.26200,0.02222,-0.03490,0.99939
74,C4_1_2,40.21458,-112.21868,2005-07-05,2005,120,128.27700,C4_1,0.04302,0.99907,-0.92575,-0.37814,0.72341,0.26200,0.02222,-0.03490,0.99939
75,C4_1_3,40.21458,-112.21868,2005-07-21,2005,120,92.48200,C4_1,0.31311,0.94972,-0.92575,-0.37814,0.72341,0.26200,0.02222,-0.03490,0.99939
76,C4_1_4,40.21458,-112.21868,2005-08-08,2005,120,82.09300,C4_1,0.58779,0.80902,-0.92575,-0.37814,0.72341,0.26200,0.02222,-0.03490,0.99939
77,C4_1_5,40.21458,-112.21868,2005-08-23,2005,120,78.95300,C4_1,0.77488,0.63210,-0.92575,-0.37814,0.72341,0.26200,0.02222,-0.03490,0.99939
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60,C13_4_14,46.89791,-113.43535,2012-08-28,2012,71,102.44207,C13_4,0.83593,0.54884,-0.91751,-0.39771,0.76054,0.21083,0.07778,0.54464,0.83867
61,C13_4_15,46.89791,-113.43535,2012-09-04,2012,71,88.76436,C13_4,0.89584,0.44438,-0.91751,-0.39771,0.76054,0.21083,0.07778,0.54464,0.83867
62,C13_4_16,46.89791,-113.43535,2012-09-11,2012,71,88.79382,C13_4,0.94276,0.33347,-0.91751,-0.39771,0.76054,0.21083,0.07778,0.54464,0.83867
63,C13_4_17,46.89791,-113.43535,2012-09-18,2012,71,81.72345,C13_4,0.97601,0.21772,-0.91751,-0.39771,0.76054,0.21083,0.07778,0.54464,0.83867
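
Both outputs list `C4_...` IDs before `C13_...` IDs, which suggests the IDs are ordered numerically rather than as plain strings. The implementation of `sort_key` in `data_extract_utils` is not shown here, so the following is only a guess at the idea: a hypothetical `natural_key` that compares the numeric parts of each ID.

```python
import re

import pandas as pd


def natural_key(sample_id):
    """Hypothetical key: the numeric parts of an ID, e.g. 'C4_1_10' -> (4, 1, 10)."""
    return tuple(int(part) for part in re.findall(r"\d+", sample_id))


ids = pd.Series(["C13_4_14", "C4_1_2", "C4_1_10", "C6_1_1"])

# Same calling pattern as the notebook: the key is applied element-wise
ordered = ids.sort_values(key=lambda x: x.apply(natural_key)).tolist()

# Plain string sorting, for comparison
plain = sorted(ids)
```

Here `ordered` places `C4_1_2` before `C4_1_10` and `C13_4_14` last, whereas plain string sorting would put `C13_4_14` first.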
