# LFMC Estimation - Comparison Models
Runs the comparison tests:
1. Train the out-of-site architecture using the within-site scenario
2. Train the within-site architecture using the out-of-site scenario
3. Train the Modis-tempCNN architecture using the within-site scenario
4. Train the Modis-tempCNN architecture using the out-of-site scenario

In [1]:
import os
import json
import numpy as np
import pandas as pd

import initialise
import common
from model_utils import reshape_data
from modelling_functions import create_models, run_experiment
from model_parameters import ModelParams

## Directories and Input files
Change these settings as required
- `modis_csv`: The file containing extracted MODIS data for each sample, created by `Extract MODIS Data.ipynb`
- `prism_csv`: The file containing extracted PRISM data for each sample, created by `Extract PRISM Data.ipynb`
- `aux_csv`: The file containing extracted sample labels, DEM, climate zone and other auxiliary data, created by `Extract Auxiliary Data.ipynb`.

In [2]:
modis_csv = os.path.join(common.DATASETS_DIR, 'modis_365days.csv')
prism_csv = os.path.join(common.DATASETS_DIR, 'prism_365days.csv')
aux_csv = os.path.join(common.DATASETS_DIR, 'samples_365days.csv')

## Set up experiment parameters
If the experiment dictionary contains a 'tests' key whose value is not 'falsy' (False, None, 0, or an empty list), the value is assumed to be a list of tests to run. Each test runs with the model parameters it specifies; any parameter a test does not specify takes the value set in the main model_params dictionary, so it is the same for every test. A failed run can be restarted by setting the 'restart' key to the number of the test that failed. That test and the remaining tests will then be run.

If 'tests' is 'falsy' then a single test will be run using the parameters in the main model_params dictionary.
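
As a rough illustration of these rules, the sketch below expands an experiment into one parameter dictionary per test. It is not the code used by `run_experiment`; the helper `expand_tests`, the plain-dictionary merge, and the example values are assumptions made for illustration.

```python
import copy

def expand_tests(experiment, base_params):
    """Return (test_number, params) pairs for the tests that would run.

    Hypothetical helper: it mirrors the rules described above, not the
    actual logic inside run_experiment.
    """
    tests = experiment.get('tests')
    if not tests:
        # 'tests' is falsy: a single test using the base parameters
        return [(0, copy.deepcopy(base_params))]
    start = experiment.get('restart') or 0
    expanded = []
    for number, overrides in enumerate(tests):
        if number < start:
            continue  # completed before the failure, so skipped on restart
        params = copy.deepcopy(base_params)
        params.update(overrides)  # unspecified parameters keep the base value
        expanded.append((number, params))
    return expanded

# Made-up values, chosen only to show the merge and the restart behaviour
base = {'batchSize': 512, 'epochs': 50, 'dropoutRate': 0}
exp = {'tests': [{'epochs': 100}, {'dropoutRate': 0.5}, {'batchSize': 32}],
       'restart': 1}
for number, params in expand_tests(exp, base):
    print(number, params)
```

With `'restart': 1`, only tests 1 and 2 are returned, and each keeps the base value for every parameter it does not override.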

Other settings are:
- layerTypes: specifies which layers to include in the model
- Layer parameters should be specified as a list. The first entry in the list will be used for the first layer, etc.
- If the experiment includes changes to the layers, all non-default layer parameters need to be included. The parameters that are kept constant can be specified by including a key for the layer type in the experiment dictionary, and the value set to a dictionary of the constant parameters.

Model parameters that cannot be changed in tests are:
- \*Filename
- \*Channels
- targetColumn
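
A small check along the following lines could catch a test that tries to change one of these parameters before any training starts. This is only a sketch: `find_fixed_keys` is a hypothetical helper, and the notebook's modules may or may not perform such a check themselves.

```python
from fnmatch import fnmatchcase

# Patterns copied from the list above; the helper itself is hypothetical
FIXED_PATTERNS = ['*Filename', '*Channels', 'targetColumn']

def find_fixed_keys(test):
    """Return the keys of a test dictionary that match a fixed-parameter pattern."""
    return sorted(
        key for key in test
        if any(fnmatchcase(key, pattern) for pattern in FIXED_PATTERNS)
    )

# Made-up test dictionary with two parameters that should not be overridden
bad_test = {'epochs': 10, 'modisFilename': 'other.csv', 'prismChannels': 5}
print(find_fixed_keys(bad_test))
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform.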

Example of setting layer parameters:  
```
{'name': 'Filters',  
 'description': 'Test effect of different filter sizes on conv layers',  
 'tests': [{'conv': {'filters': [32, 32, 32]}},  
           {'conv': {'filters': [8, 8, 8]}},  
           {'conv': {'filters': [32, 8, 8]}},  
           {'conv': {'filters': [8, 32, 8]}},   
           {'conv': {'filters': [8, 8, 32]}},  
           {'conv': {'filters': [8, 16, 32]}},  
           {'conv': {'filters': [32, 16, 8]}}],  
 'conv': {'numLayers': 3, 'poolSize': [2, 3, 4]},  
 'restart': 0}
 ``` 
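
One way to read the example above: the constant `conv` settings at the experiment level are combined with the `conv` settings of each test, and entry `i` of every list applies to layer `i`. The sketch below spells that out. `layer_settings` is a hypothetical helper, and the rule that a test's values take precedence over the constant ones is an assumption.

```python
def layer_settings(experiment, test, layer_type):
    """Combine constant and per-test settings, then split them per layer.

    Hypothetical helper, not part of the notebook's modules.
    """
    constant = experiment.get(layer_type, {})
    varying = test.get(layer_type, {})
    merged = {**constant, **varying}  # assumption: per-test values win
    return [
        {key: value[i] for key, value in merged.items() if isinstance(value, list)}
        for i in range(merged['numLayers'])
    ]

experiment_example = {
    'tests': [{'conv': {'filters': [8, 16, 32]}}],
    'conv': {'numLayers': 3, 'poolSize': [2, 3, 4]},
}
first_test = experiment_example['tests'][0]
for layer in layer_settings(experiment_example, first_test, 'conv'):
    print(layer)
```

Each printed dictionary holds the filter count and pool size for one convolutional layer.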

#### Note
As this experiment uses different architectures for each test, the notebook doesn't import the model architecture parameters. Instead, model architecture parameters are set in each test in the experiment parameters. 

In [4]:
experiment = {
    'name': 'comparison_models',
    'description': 'Generate comparison models for scenarios using the "wrong" and the Modis-tempCNN architecture',
    'layerTypes': ['modisConv', 'prismConv', 'fc'],
    'tests': [
        {'dataSources': ['modis', 'prism', 'aux'], 'batchSize': 512, 'dropoutRate': 0, 'epochs': 50,
         'splitMethod': 'byYear', 'splitFolds': 4, 'splitYear': 2014,
         'auxColumns': ['Elevation', 'Slope', 'Aspect_sin', 'Aspect_cos', 'Long_sin', 'Long_cos', 'Lat_norm'],
         'auxOneHotCols': ['Czone3'], 'auxAugment': True,
         'fc': {'numLayers': 1, 'units': [128]},
         'modisConv': {'numLayers': 3, 'filters': [8, 8, 8], 'poolSize': [2, 3, 4]},
         'prismConv': {'numLayers': 3, 'filters': [8, 8, 8], 'poolSize': [2, 3, 4]}
        },
        {'dataSources': ['modis', 'prism', 'aux'], 'batchSize': 512, 'dropoutRate': 0.1, 'epochs': 100,
         'splitMethod': 'bySite', 'splitFolds': 10,
         'auxColumns': ['Elevation', 'Slope', 'Aspect_sin', 'Aspect_cos', 'Long_sin', 'Long_cos', 'Lat_norm'],
         'auxOneHotCols': ['Czone3'], 'auxAugment': True,
         'fc': {'numLayers': 3, 'units': [512, 512, 512]},
         'modisConv': {'numLayers': 5, 'filters': [8, 8, 8, 8, 8], 'poolSize': [0, 5, 2, 3, 4]},
         'prismConv': {'numLayers': 5, 'filters': [8, 8, 8, 8, 8], 'poolSize': [0, 5, 2, 3, 4]}
        },
        {'dataSources': ['modis', 'aux'], 'batchSize': 32, 'dropoutRate': 0.5, 'epochs': 100,
         'splitMethod': 'byYear', 'splitFolds': 4, 'splitYear': 2014,
         'auxColumns': ['Day_sin', 'Day_cos', 'Elevation', 'Slope', 'Aspect_sin', 'Aspect_cos', 'Long_sin', 'Long_cos', 'Lat_norm'], 
         'auxOneHotCols': [], 'auxAugment': False,
         'fc': {'numLayers': 2, 'units': [256, 256]},
         'modisConv': {'numLayers': 3, 'filters': [32, 32, 32], 'poolSize': [2, 3, 4]},
        },
        {'dataSources': ['modis', 'aux'], 'batchSize': 32, 'dropoutRate': 0.5, 'epochs': 100,
         'splitMethod': 'bySite', 'splitFolds': 10,
         'auxColumns': ['Day_sin', 'Day_cos', 'Elevation', 'Slope', 'Aspect_sin', 'Aspect_cos', 'Long_sin', 'Long_cos', 'Lat_norm'], 
         'auxOneHotCols': [], 'auxAugment': False,
         'fc': {'numLayers': 2, 'units': [256, 256]},
         'modisConv': {'numLayers': 3, 'filters': [32, 32, 32], 'poolSize': [2, 3, 4]},
        },
    ],
    'restart': None,
    'testNames': [
        'Out-of-site model within-site scenario',
        'Within-site model out-of-site scenario',
        'Modis-tempCNN within-site scenario',
        'Modis-tempCNN out-of-site scenario',
    ]
}

# Save and display experiment details
experiment_dir = os.path.join(common.MODELS_DIR, experiment['name'])
restart = experiment.get('restart')
if not os.path.exists(experiment_dir):
    os.makedirs(experiment_dir)
elif not restart:
    raise FileExistsError(f'{experiment_dir} exists but restart not requested')
experiment_file = f'experiment{restart}.json' if restart else 'experiment.json'
with open(os.path.join(experiment_dir, experiment_file), 'w') as f:
    json.dump(experiment, f, indent=2)
experiment

{'name': 'comparison_models',
 'description': 'Generate comparison models for scenarios using the "wrong" and the Modis-tempCNN architecture',
 'layerTypes': ['modisConv', 'prismConv', 'fc'],
 'tests': [{'dataSources': ['modis', 'prism', 'aux'],
   'batchSize': 512,
   'dropoutRate': 0,
   'epochs': 50,
   'splitMethod': 'byYear',
   'splitFolds': 4,
   'splitYear': 2014,
   'auxColumns': ['Elevation',
    'Slope',
    'Aspect_sin',
    'Aspect_cos',
    'Long_sin',
    'Long_cos',
    'Lat_norm'],
   'auxOneHotCols': ['Czone3'],
   'auxAugment': True,
   'fc': {'numLayers': 1, 'units': [128]},
   'modisConv': {'numLayers': 3, 'filters': [8, 8, 8], 'poolSize': [2, 3, 4]},
   'prismConv': {'numLayers': 3, 'filters': [8, 8, 8], 'poolSize': [2, 3, 4]}},
  {'dataSources': ['modis', 'prism', 'aux'],
   'batchSize': 512,
   'dropoutRate': 0.1,
   'epochs': 100,
   'splitMethod': 'bySite',
   'splitFolds': 10,
   'auxColumns': ['Elevation',
    'Slope',
    'Aspect_sin',
    'Aspect_cos',
   

## Set up model parameters
Set up and customise the model parameters. To find out more about any parameter, run `model_params.help('<parameter>')` after running this cell to create the ModelParams object.

In [5]:
# Customize model parameters
model_params = ModelParams()

model_params['modelName'] = experiment['name']
model_params['description'] = experiment['description']
model_params['modelClass'] = 'LfmcTempCnn'
model_params['modisFilename'] = modis_csv
model_params['prismFilename'] = prism_csv
model_params['auxFilename'] = aux_csv
model_params['modelRuns'] = common.EVALUATION_RUNS
model_params['seedList'] = [
    441, 780, 328, 718, 184, 372, 346, 363, 701, 358,
    566, 451, 795, 237, 788, 185, 397, 530, 758, 633,
    632, 941, 641, 519, 162, 215, 578, 919, 917, 585,
    914, 326, 334, 366, 336, 413, 111, 599, 416, 230,
    191, 700, 697, 332, 910, 331, 771, 539, 575, 457
]

model_params['tempDir'] = common.TEMP_DIR
model_params['modelDir'] = os.path.join(common.MODELS_DIR, model_params['modelName'])

# =============================================================================
# Parameters for parallel execution on GPUs
# =============================================================================
# model_params['gpuDevice'] = 1
# model_params['gpuMemory'] = 3800
# model_params['maxWorkers'] = 5

if not os.path.exists(model_params['modelDir']):
    os.makedirs(model_params['modelDir'])
    
model_params

{'modelName': 'comparison_models',
 'description': 'Generate comparison models for scenarios using the "wrong" and the Modis-tempCNN architecture',
 'modelClass': 'LfmcTempCnn',
 'modelDir': 'G:\\My Drive\\LFMC Data\\LFMC_ensembles\\Models\\comparison_models',
 'tempDir': 'C:\\Temp\\LFMC',
 'diagnostics': False,
 'dataSources': [],
 'restartRun': None,
 'saveModels': False,
 'saveTrain': None,
 'plotModel': True,
 'randomSeed': 1234,
 'modelSeed': 1234,
 'modelRuns': 2,
 'resplit': False,
 'seedList': [441,
  780,
  328,
  718,
  184,
  372,
  346,
  363,
  701,
  358,
  566,
  451,
  795,
  237,
  788,
  185,
  397,
  530,
  758,
  633,
  632,
  941,
  641,
  519,
  162,
  215,
  578,
  919,
  917,
  585,
  914,
  326,
  334,
  366,
  336,
  413,
  111,
  599,
  416,
  230,
  191,
  700,
  697,
  332,
  910,
  331,
  771,
  539,
  575,
  457],
 'maxWorkers': 1,
 'deterministic': False,
 'gpuDevice': 0,
 'gpuMemory': 0,
 'modisFilename': 'G:\\My Drive\\LFMC Data\\LFMC_ensembles\\Datase

## Prepare the data

In [6]:
modis_data = pd.read_csv(model_params['modisFilename'], index_col=0)
x_modis = reshape_data(np.array(modis_data), model_params['modisChannels'])
print(f'Modis shape: {x_modis.shape}')

prism_data = pd.read_csv(model_params['prismFilename'], index_col=0)
x_prism = reshape_data(np.array(prism_data), model_params['prismChannels'])
print(f'Prism shape: {x_prism.shape}')

aux_data = pd.read_csv(model_params['auxFilename'], index_col=0)
y = aux_data[model_params['targetColumn']]

Modis shape: (66946, 365, 7)
Prism shape: (66946, 365, 7)
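
The shapes printed above are (samples, time steps, channels). The implementation of `reshape_data` is not shown in this notebook. The sketch below assumes that each CSV row stores the time series flattened, with the channels of one day stored next to each other; the actual column order may differ, and `reshape_flat_series` is a hypothetical stand-in.

```python
import numpy as np

def reshape_flat_series(flat, channels):
    """Reshape (samples, timesteps * channels) to (samples, timesteps, channels).

    Hypothetical stand-in for reshape_data; assumes day-major column order.
    """
    samples, width = flat.shape
    if width % channels:
        raise ValueError(f'{width} columns cannot be split into {channels} channels')
    return flat.reshape(samples, width // channels, channels)

# Made-up data: 2 samples, 365 days, 7 channels
demo = np.arange(2 * 365 * 7).reshape(2, 365 * 7)
print(reshape_flat_series(demo, 7).shape)  # (2, 365, 7)
```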


## Build and run the models
Builds and trains the LFMC models. 

All models, predictions, and evaluation statistics are saved to the model directory (`model_params['modelDir']`), with each test and run saved to a separate sub-directory. For each model created, predictions and evaluation statistics are also returned as attributes of the `model` object. These are stored as nested lists; for a full experiment the structure is:
- Tests (omitted if not an experiment)
  - Runs (omitted for a single run)
    - Folds (for k-fold splitting)
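
A structure nested in this way can be walked with three loops. The sketch below uses made-up placeholder strings rather than real model objects, and `flatten_results` is a hypothetical helper; when a level is omitted (a single test or a single run), the corresponding loop would need to be dropped.

```python
def flatten_results(results):
    """Yield (test, run, fold, value) from results nested as tests > runs > folds."""
    for test_idx, runs in enumerate(results):
        for run_idx, folds in enumerate(runs):
            for fold_idx, value in enumerate(folds):
                yield test_idx, run_idx, fold_idx, value

# Placeholder values only: 2 tests x 2 runs x 3 folds
nested = [[[f't{t}r{r}f{k}' for k in range(3)] for r in range(2)] for t in range(2)]
rows = list(flatten_results(nested))
print(len(rows), rows[0], rows[-1])
```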

In [7]:
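def is_experiment():
    # True only if `experiment` is defined and has a non-falsy 'tests' entry
    try:
        return bool(experiment['tests'])
    except (NameError, KeyError, TypeError):
        return False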

In [8]:
X = {'modis': x_modis, 'prism': x_prism}
if is_experiment():
    models = run_experiment(experiment, model_params, aux_data, X, y)
else:
    print('Running a single test')
    with open(os.path.join(model_params['modelDir'], 'model_params.json'), 'w') as f:
        model_params.save(f)
    models = create_models(model_params, aux_data, X, y)

Experiment comparison_models - Generate comparison models for scenarios using the "wrong" and the Modis-tempCNN architecture






----------------------------------------------------------------------

test 2 - {'dataSources': ['modis', 'aux'], 'batchSize': 32, 'dropoutRate': 0.5, 'epochs': 100, 'splitMethod': 'byYear', 'splitFolds': 0, 'splitYear': 2014, 'auxColumns': ['Day_sin', 'Day_cos', 'Elevation', 'Slope', 'Aspect_sin', 'Aspect_cos', 'Long_sin', 'Long_cos', 'Lat_norm'], 'auxOneHotCols': [], 'auxAugment': False, 'fc': {'numLayers': 2, 'units': [256, 256]}, 'modisConv': {'numLayers': 3, 'filters': [32, 32, 32], 'poolSize': [2, 3, 4]}}

Auxiliary columns: ['Day_sin', 'Day_cos', 'Elevation', 'Slope', 'Aspect_sin', 'Aspect_cos', 'Long_sin', 'Long_cos', 'Lat_norm']
modis shape: (66946, 365, 7)
aux shape: (66946, 9)
comparison_models_test2_run0 training results: minLoss: 601.405, runTime: 916.460
comparison_models_test2_run1 training results: minLoss: 603.739, runTime: 919.600

----------------------------------------------------------------------

test 3 - {'dataSources': ['modis', 'aux'], 'batchSize': 32, 'dro

In [9]:
if is_experiment():
    for model in models:
        display(getattr(model, 'all_stats', None))
else:
    display(getattr(models, 'all_stats', None))

| | Bias | R | R2 | RMSE | ubRMSE |
|---|---|---|---|---|---|
| base | 0.64 | 0.75 | 0.57 | 24.81 | 24.8 |
| best | 0.43 | 0.75 | 0.57 | 24.85 | 24.84 |
| merge10 | 1.11 | 0.76 | 0.57 | 24.71 | 24.69 |
| ensemble10 | 1.52 | 0.76 | 0.57 | 24.74 | 24.7 |
| merge_best10 | 1.15 | 0.76 | 0.57 | 24.72 | 24.69 |

| | Bias | R | R2 | RMSE | ubRMSE |
|---|---|---|---|---|---|
| base | -3.2 | 0.74 | 0.55 | 25.41 | 25.21 |
| best | -4.73 | 0.74 | 0.54 | 25.69 | 25.25 |
| merge10 | -3.89 | 0.75 | 0.54 | 25.47 | 25.18 |
| ensemble10 | -3.8 | 0.75 | 0.55 | 25.4 | 25.11 |
| merge_best10 | -3.5 | 0.75 | 0.55 | 25.34 | 25.1 |

| | Bias | R | R2 | RMSE | ubRMSE |
|---|---|---|---|---|---|
| base | -6.2 | 0.66 | 0.4 | 27.77 | 27.07 |
| best | -4.62 | 0.66 | 0.4 | 27.79 | 27.41 |
| merge10 | -5.05 | 0.66 | 0.4 | 27.76 | 27.3 |
| ensemble10 | -4.91 | 0.67 | 0.42 | 27.34 | 26.9 |
| merge_best10 | -4.86 | 0.66 | 0.4 | 27.72 | 27.29 |
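
The notebook does not show how these statistics are computed. The sketch below uses common definitions, all of which are assumptions here: bias as the mean of prediction minus observation, R as the Pearson correlation, R2 as the coefficient of determination, and ubRMSE as the RMSE after removing the bias. The last definition is consistent with the reported values, for example 25.41² − 3.2² ≈ 25.21², and an R2 below the square of R (0.4 against 0.66² ≈ 0.44) is consistent with the coefficient of determination rather than the squared correlation. `evaluation_stats` is a hypothetical helper.

```python
import numpy as np

def evaluation_stats(y_true, y_pred):
    """Compute Bias, R, R2, RMSE and ubRMSE under the assumed definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    error = y_pred - y_true
    bias = error.mean()  # sign convention is an assumption
    rmse = np.sqrt((error ** 2).mean())
    ub_rmse = np.sqrt(((error - bias) ** 2).mean())  # equals sqrt(RMSE^2 - Bias^2)
    r = np.corrcoef(y_true, y_pred)[0, 1]
    r2 = 1 - (error ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()
    return {'Bias': bias, 'R': r, 'R2': r2, 'RMSE': rmse, 'ubRMSE': ub_rmse}

# Made-up LFMC values (%), for illustration only
observed = [100, 120, 80, 90]
predicted = [110, 130, 90, 100]
stats = evaluation_stats(observed, predicted)
print({name: round(float(value), 2) for name, value in stats.items()})
```

In this made-up case every prediction is 10 too high, so the whole error is bias: RMSE equals the bias and ubRMSE is zero.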
