### 0_methods_B_tuning_1_constrain_parameter_sets.ipynb  

# Tuning: constrain parameter sets
The goal is to submit 3000 runs for the tuning, each with its own parameter set. The constraints bring the parameter sets we try more in line with the literature and also shrink the parameter space we explore.  

The constraints are:
- sigmaPaCa < sigmaThCa
- sigmaPaDu < sigmaThDu
- sigmaPaNeph < sigmaThNeph
- sigmaPaNeph < sigmaPaDu
- sigmaThNeph < sigmaThDu  

In addition, we change the setIDs from 0, 1, ..., n-1 to 1000, 1001, ..., n+999, because slurm array jobs drop leading zeros (0001 becomes 1), thereby changing the names of output files etc.
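A minimal sketch of why the offset helps, assuming the task ID reaches the run script as a plain integer (the zero-padded strings and `n_sets` below are illustrative values, not taken from the actual job scripts):

```python
# Zero-padded IDs lose their padding once they are treated as integers
padded_ids = ["0000", "0001", "0042"]
as_task_ids = [str(int(s)) for s in padded_ids]
print(as_task_ids)

# Shifting by 1000 gives every ID the same number of digits
n_sets = 3000                      # assumed number of parameter sets
offset = 1000
shifted_ids = range(offset, offset + n_sets)
widths = {len(str(i)) for i in shifted_ids}
print(widths)
```

With an offset of 1000 all IDs have four digits as long as there are fewer than 9000 parameter sets.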

**Tuning 1: First iteration**  
We generate 96000 parameter sets and apply the constraints to them with this notebook. Expectation: each constraint roughly halves the number of parameter sets, so in total I expect 96000 / 2^5 = 3000. The result will not be exactly 3000, however, because different parameters have different intervals and because of the random sampling from the distribution I prescribed per parameter (normal or uniform).  

Result: the constraints removed far fewer parameter sets than expected (for the reasons mentioned above). Given the parameter ranges chosen, this makes sense, because most constraints are already almost fulfilled by our ranges, which follow the literature. Details: starting with 96000 parameter sets, 72043 were left after constraint 1; 60114 after constraint 2; then 52326; 44098; and finally 39424, which is a factor of 2.4 fewer than at the start. The file parameters_tuning1_sampled.csv of this iteration was not saved but replaced by that of the 2nd iteration. We do the 2nd iteration with 7200 parameter sets.
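The arithmetic behind these numbers can be retraced in a few lines. This is only a sketch using the counts reported above; the variable names are illustrative:

```python
# Counts reported above: at the start, then after each of the five constraints
counts = [96000, 72043, 60114, 52326, 44098, 39424]

# Fraction of sets surviving each constraint (the naive expectation was 0.5 each)
kept_fractions = [after / before for before, after in zip(counts, counts[1:])]

naive_expected = counts[0] / 2**5            # halving five times
reduction_factor = counts[0] / counts[-1]    # observed overall reduction

# Rough size of the second iteration's result when sampling 7200 sets
expected_second_iteration = 7200 / reduction_factor

for i, frac in enumerate(kept_fractions, start=1):
    print(f"constraint {i}: kept {frac:.0%}")
print(f"naive expectation: {naive_expected:.0f}")
print(f"overall reduction factor: {reduction_factor:.2f}")
print(f"expected sets from 7200: {expected_second_iteration:.0f}")
```

Every constraint keeps well over half of the remaining sets, which is why the overall reduction is a factor of about 2.4 rather than 32.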

**Tuning 1: Second iteration**  
We generate 7200 parameter sets. Expectation: 7200 / 2.4 = ca. 3000 after applying the constraints.  

**Tuning 2**  
We continue using this notebook for tuning 2.  

**Tuning 2: First iteration**  
This is again only a test to find the factor by which the constraints reduce the number of parameter sets.  
Of the 1000 parameter sets read in, 155 were left after the constraints, and every constraint on its own reduces the set by ca. 25-50%, as desired.  
So the factor is 1000/155 = 6.45.  

**Tuning 2: Second iteration**  
For 500 runs we generate 500 x 6.45 = 3225 parameter sets before constraints.  
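The sizing rule used here (number to sample = target runs x reduction factor) can be written as a short sketch. The variable names are illustrative, and the factor is rounded to two decimals as in the text:

```python
n_read, n_kept = 1000, 155            # test iteration reported above
factor = round(n_read / n_kept, 2)    # reduction factor, rounded as in the text

target_runs = 500
n_to_generate = round(target_runs * factor)
print(factor, n_to_generate)
```

Because the sampling is random, the number of sets that survive the constraints will only be approximately equal to `target_runs`.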

**1P5: 1st tuning with 5 parameters for Pa and 5 for Th (i.e. without Ws and kdes)**  
We do not apply any constraints; we only changed the ID to start from 1000 instead of 0.


<a id='setup-notebook'></a>
# Set up notebook
[go to top](#top)

**Easiest:** load conda environment for which this notebook worked.  
**Just use environment.yml (present in this folder) and follow** https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file

In [1]:
from pathlib import Path         # Path objects to avoid inter-platform trouble in file paths
import platform

########## SET FILE PATHS ######################################

# KEEP THE Path() FUNCTION AND USE FORWARD SLASHES '/' ON EVERY OPERATING SYSTEM
savedir = Path('./figures/')      # folder for saving figures & other output; (empty) directory needed
obsdir = Path('./data/')          # obs. data is loaded from here
modeldir = Path('./modeloutput/') # model output is loaded from here

## OR define modeldir depending on which system you are working on:
# if platform.system() == "Darwin":   # on Mac (e.g. laptop)
#    modeldir = Path('~/Documents/PHD/Bern3D/results')
# elif platform.system() == "Linux":  # on linux (e.g. cluster)
#    modeldir = Path('/storage/climatestor/Bern3dLPX/scheen/b3d_results')
# else:
#    raise Exception("unknown system", platform.system())

#############################################

## CHECK FILEPATHS
# expand paths because np.loadtxt can't handle home directory ~
savedir = savedir.expanduser()
obsdir = obsdir.expanduser()
modeldir = modeldir.expanduser()
def check_dir(path):
    if not path.exists():
        raise Exception('File path ' + str(path) + ' does not exist. Correct or create first.')
check_dir(savedir)
check_dir(obsdir)
check_dir(modeldir)

## IMPORT PACKAGES
# first time install missing packages via $conda install numpy OR $pip3 install numpy (be consistent)
import numpy as np
import xarray as xr                            # $conda install -c anaconda -n cartopy xarray; needs some time
import pandas as pd
import importlib as imp                        # to reload user-defined functions; aliased to 'imp', the name of the deprecated package it replaces
import math                                    # math.e or math.exp()
import xesmf as xe                             # regridding; install via conda-forge channel e.g. !conda install -c conda-forge xesmf -y

# plot-related packages:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.pylab as pylab
import matplotlib.gridspec as gridspec
from matplotlib.ticker import MultipleLocator
matplotlib.rcParams['savefig.bbox'] = 'tight'  # cuts off whitespace
import matplotlib.cm as cmp                    # colormaps
import cmcrameri.cm as cmcr                    # better colour maps (https://www.fabiocrameri.ch/colourmaps/); $conda install -c conda-forge cmcrameri
import cartopy.crs as ccrs
import seaborn as sns

## CHECK PYTHON VERSION
if 1/2 == 0:
    raise Exception("You are using python 2. Please use python 3 for a correct display of the figures.") 

## PLOT SETTINGS
# larger labels
params = {'legend.fontsize': 'x-large',
         'axes.labelsize': 'x-large',
         'axes.titlesize':'x-large',
         'xtick.labelsize':'x-large',
         'ytick.labelsize':'x-large'}
pylab.rcParams.update(params)

## LOAD USER-DEFINED FUNCTIONS
import functions as f                          # my own functions; call via f.function_name()

In [3]:
imp.reload(f)   # shows how to reload functions after editing functions.py w/o kernel restart

<module 'functions' from '/storage/climatestor/Bern3dLPX/scheen/notebooks_new/functions.py'>

<a id='load'></a>
# Load parameter sets with pandas
[go to top](#top)

In [5]:
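datadir = obsdir  # ASSUMPTION: datadir is not defined in the setup cell; taken here to be the data folder, adjust if the csv lives elsewhere
all_param_sets = pd.read_csv(datadir / "parameters_3P5_sampled.csv", index_col=None, encoding='utf-8-sig')  # encoding avoids \ufeff string in python3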
all_param_sets

Unnamed: 0,# setID,BGC_sigmaPaPOC,BGC_sigmaPaCa,BGC_sigmaPaOp,BGC_sigmaPaDu,BGC_sigmaPaNeph,BGC_sigmaThPOC,BGC_sigmaThCa,BGC_sigmaThOp,BGC_sigmaThDu,BGC_sigmaThNeph
0,0,0.041664,0.027694,0.003618,0.025605,0.007500,0.171562,0.108154,0.184250,0.007954,0.079776
1,1,0.032777,0.009552,0.091571,0.009172,0.007582,0.058784,0.367628,0.110147,0.118523,0.001203
2,2,0.030240,0.054182,0.179840,0.022319,0.020960,0.134542,1.716227,0.130726,0.103457,0.045642
3,3,0.011025,0.064817,0.006240,0.034878,0.051747,0.235281,1.807028,0.076880,0.118701,0.054947
4,4,0.030462,0.007852,0.149346,0.003379,0.002312,0.042884,0.332646,0.213407,0.034688,0.099854
...,...,...,...,...,...,...,...,...,...,...,...
2995,2995,0.020129,0.064073,0.158277,0.042453,0.050615,0.033920,1.832027,0.063243,0.046403,0.090363
2996,2996,0.024468,0.070220,0.134154,0.002926,0.007099,0.230160,1.076886,0.119036,0.044523,0.011031
2997,2997,0.017211,0.070603,0.045437,0.028234,0.011729,0.263597,0.944984,0.210549,0.018470,0.021141
2998,2998,0.025439,0.030531,0.172991,0.042291,0.023682,0.107265,0.898596,0.216407,0.023834,0.117130


<a id='load'></a>
# Apply constraints
[go to top](#top)

Recall that the constraints are:
- sigmaPaCa < sigmaThCa
- sigmaPaDu < sigmaThDu
- sigmaPaNeph < sigmaThNeph
- sigmaPaNeph < sigmaPaDu
- sigmaThNeph < sigmaThDu

In [6]:
# constrained_param_sets = all_param_sets.where(
#     all_param_sets.BGC_sigmaPaCa < all_param_sets.BGC_sigmaThCa).where(
#     all_param_sets.BGC_sigmaPaDu < all_param_sets.BGC_sigmaThDu).where(
#     all_param_sets.BGC_sigmaPaNeph < all_param_sets.BGC_sigmaThNeph).where(
#     all_param_sets.BGC_sigmaPaNeph < all_param_sets.BGC_sigmaPaDu).where(
#     all_param_sets.BGC_sigmaThNeph < all_param_sets.BGC_sigmaThDu).dropna()

# constrained_param_sets

# to not apply any constraints:
constrained_param_sets = all_param_sets

Tuning 1: So we indeed read in 7200 parameter sets, and after the constraints 2951 (ca. 3000) are left.  
Note that there are now gaps between the setIDs, as expected. We need to renumber them consecutively (0, 1, 2, ..., n-1) for our run procedure to be able to read the csv file.  
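As a sketch of how the filtering and renumbering fit together, the five constraints can also be applied one at a time with boolean masks, which makes it easy to see how many sets each constraint removes. The toy data below are uniform random numbers, not the distributions used for the tuning, and the names `toy_sets`, `constraints` and `counts` are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
cols = ["BGC_sigmaPaCa", "BGC_sigmaPaDu", "BGC_sigmaPaNeph",
        "BGC_sigmaThCa", "BGC_sigmaThDu", "BGC_sigmaThNeph"]

# Toy data only: uniform samples on [0, 1), not the real parameter ranges
toy_sets = pd.DataFrame(rng.uniform(size=(1000, len(cols))), columns=cols)
toy_sets.insert(0, "# setID", range(len(toy_sets)))

# Each pair means: first column must be smaller than second column
constraints = [
    ("BGC_sigmaPaCa", "BGC_sigmaThCa"),
    ("BGC_sigmaPaDu", "BGC_sigmaThDu"),
    ("BGC_sigmaPaNeph", "BGC_sigmaThNeph"),
    ("BGC_sigmaPaNeph", "BGC_sigmaPaDu"),
    ("BGC_sigmaThNeph", "BGC_sigmaThDu"),
]

remaining = toy_sets
counts = [len(remaining)]
for smaller, larger in constraints:
    remaining = remaining[remaining[smaller] < remaining[larger]]
    counts.append(len(remaining))
print("sets left after each constraint:", counts)

# Copy before modifying, so the assignment acts on an independent DataFrame
toy_constrained = remaining.copy()
# The surviving setIDs have gaps; renumber them consecutively, starting at 1000
toy_constrained["# setID"] = range(1000, 1000 + len(toy_constrained))
```

Filtering with a mask drops the rejected rows directly, so no `dropna()` step is needed afterwards.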

In [7]:
constrained_param_sets.columns

Index(['# setID', 'BGC_sigmaPaPOC', 'BGC_sigmaPaCa', 'BGC_sigmaPaOp',
       'BGC_sigmaPaDu', 'BGC_sigmaPaNeph', 'BGC_sigmaThPOC', 'BGC_sigmaThCa',
       'BGC_sigmaThOp', 'BGC_sigmaThDu', 'BGC_sigmaThNeph'],
      dtype='object')

In [8]:
n = len(constrained_param_sets)
# at the same time add 1000 to the setIDs because slurm array jobs drop leading zeros (0001 becomes 1), thereby changing names of output files etc.
constrained_param_sets['# setID'] = range(1000,n+1000)
constrained_param_sets

Unnamed: 0,# setID,BGC_sigmaPaPOC,BGC_sigmaPaCa,BGC_sigmaPaOp,BGC_sigmaPaDu,BGC_sigmaPaNeph,BGC_sigmaThPOC,BGC_sigmaThCa,BGC_sigmaThOp,BGC_sigmaThDu,BGC_sigmaThNeph
0,1000,0.041664,0.027694,0.003618,0.025605,0.007500,0.171562,0.108154,0.184250,0.007954,0.079776
1,1001,0.032777,0.009552,0.091571,0.009172,0.007582,0.058784,0.367628,0.110147,0.118523,0.001203
2,1002,0.030240,0.054182,0.179840,0.022319,0.020960,0.134542,1.716227,0.130726,0.103457,0.045642
3,1003,0.011025,0.064817,0.006240,0.034878,0.051747,0.235281,1.807028,0.076880,0.118701,0.054947
4,1004,0.030462,0.007852,0.149346,0.003379,0.002312,0.042884,0.332646,0.213407,0.034688,0.099854
...,...,...,...,...,...,...,...,...,...,...,...
2995,3995,0.020129,0.064073,0.158277,0.042453,0.050615,0.033920,1.832027,0.063243,0.046403,0.090363
2996,3996,0.024468,0.070220,0.134154,0.002926,0.007099,0.230160,1.076886,0.119036,0.044523,0.011031
2997,3997,0.017211,0.070603,0.045437,0.028234,0.011729,0.263597,0.944984,0.210549,0.018470,0.021141
2998,3998,0.025439,0.030531,0.172991,0.042291,0.023682,0.107265,0.898596,0.216407,0.023834,0.117130


<a id='export'></a>
# Export as .csv
[go to top](#top)

In [9]:
# export without the index column
constrained_param_sets.to_csv(datadir / "parameters_3P5_sampled_after_constraints.csv", sep=',', header=True, index=False)
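
A quick way to check an export like this is to read the file back and compare it with the DataFrame in memory. The sketch below does so for a small toy table written to a temporary folder; the file name and the three rows are illustrative only:

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "# setID": [1000, 1001, 1002],
    "BGC_sigmaPaCa": [0.027694, 0.009552, 0.054182],
    "BGC_sigmaThCa": [0.108154, 0.367628, 1.716227],
})

with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "toy_after_constraints.csv"   # hypothetical file name
    toy.to_csv(out_file, sep=",", header=True, index=False)
    read_back = pd.read_csv(out_file, index_col=None)

# With index=False no extra index column should appear in the file
same_columns = list(read_back.columns) == list(toy.columns)
same_ids = read_back["# setID"].tolist() == toy["# setID"].tolist()
same_values = np.allclose(read_back.to_numpy(), toy.to_numpy())
print(same_columns, same_ids, same_values)
```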