# Deep dive into thresholding continuous fields to make categorical fields

In this notebook we will explore the categorization process available in pyEMU.  This process was inspired by Todaro and others (2023), "Experimental sandbox tracer tests to characterize a two-facies aquifer via an ensemble smoother": [https://doi.org/10.1007/s10040-023-02662-1](https://doi.org/10.1007/s10040-023-02662-1)

In [None]:
import os
import shutil
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pyemu

First, let's generate a single multivariate Gaussian field.  We will use this as the "original array" throughout the rest of this notebook:

In [None]:
nrow = ncol = 50
delx = np.ones(ncol)
dely = np.ones(nrow)
v = pyemu.geostats.ExpVario(contribution=1.0,a=500)
gs = pyemu.geostats.GeoStruct(variograms=v)
ss = pyemu.geostats.SpecSim2d(delx=delx,dely=dely,geostruct=gs)
np.random.seed(122341)
org_arr = ss.draw_arrays(1,mean_value=10)[0,:,:]
assert org_arr.min() > 0.0
cb = plt.imshow(org_arr)
_ = plt.colorbar(cb)

Now let's set up a workspace:

In [None]:
ws = "temp_thresh"
if os.path.exists(ws):
    shutil.rmtree(ws)
os.makedirs(ws)

And save the original array in that workspace

In [None]:
orgarr_file = os.path.join(ws,"orgarr.dat")
np.savetxt(orgarr_file,org_arr)

The categorization process in pyEMU currently only supports 2 categories/facies.  So we need to define a `dict`, keyed by category number, where each entry holds the proportion of that category and the initial value used to fill it.  For example purposes, we will use the extrema of the original array as the fill values, so that we end up with a categorical array that has only two unique values: the min and max of the original array:

In [None]:
cat_dict = {1:[0.95,org_arr.min()],2:[0.05,org_arr.max()]}
thresharr_file,threshcsv_file = pyemu.helpers.setup_threshold_pars(orgarr_file,cat_dict=cat_dict,
                                                         testing_workspace=ws)
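To build some intuition for what a proportion and a fill value do, here is a minimal NumPy-only sketch of two-category thresholding. It is not the pyEMU implementation: `categorize_two_facies` is a hypothetical helper, the threshold is simply taken as a quantile of the array, and the choice that low values become category 1 is an assumption made for illustration.

```python
import numpy as np

def categorize_two_facies(arr, cat1_proportion, cat1_fill, cat2_fill):
    """Hypothetical helper: split `arr` into two categories at a quantile threshold."""
    # pick the threshold so that about `cat1_proportion` of the cells sit at or below it
    threshold = np.quantile(arr, cat1_proportion)
    # assumption for this sketch: low values -> category 1, high values -> category 2
    cat_arr = np.where(arr <= threshold, 1, 2)
    # replace each category with its fill value
    filled = np.where(cat_arr == 1, cat1_fill, cat2_fill)
    return cat_arr, filled, threshold

rng = np.random.default_rng(0)
demo_arr = rng.normal(10.0, 1.0, size=(50, 50))  # stand-in for the original array
cat_arr, filled, threshold = categorize_two_facies(
    demo_arr, 0.95, demo_arr.min(), demo_arr.max()
)
realized = (cat_arr == 1).mean()  # fraction of cells that ended up in category 1
print(realized, np.unique(filled))
```

The filled array contains only the two fill values, and the realized category 1 fraction lands very close to the requested proportion.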

Now let's see what was created as part of the setup process:

In [None]:
os.listdir(ws)

Notice that what we have are files named with the original array name plus a suffix.  Let's check them out:

In [None]:
df = pd.read_csv(threshcsv_file)
df

In [None]:
catarr_file = orgarr_file+".threshcat.dat"
print(catarr_file)
assert os.path.exists(catarr_file)

In [None]:
thresharr_file = orgarr_file+".thresharr.dat"
print(thresharr_file)
assert os.path.exists(thresharr_file)

In [None]:
threshcsv_file_results = orgarr_file+'.threshprops_results.csv'

In [None]:
def load_and_plot(save_name=None):
    cat_arr = np.loadtxt(catarr_file)
    new_arr = np.loadtxt(orgarr_file)
    thresh_arr = np.loadtxt(thresharr_file)
    # rescale for plotting: subtract the minimum, then divide by the original (pre-shift) maximum
    thresh_arr = (thresh_arr-thresh_arr.min())/thresh_arr.max()
    ddf = pd.read_csv(threshcsv_file_results)
    cat1_prop = ddf.loc[0,"proportion"]/ ddf.loc[:,"proportion"].sum()
    cat2_prop = ddf.loc[1,"proportion"]/ ddf.loc[:,"proportion"].sum()
    cat1_thresh = ddf.loc[0,"threshold"]
    cat2_thresh = ddf.loc[1,"threshold"]
    fig,axes = plt.subplots(1,3,figsize=(10,2.5))
    cb = axes[0].imshow(org_arr)
    plt.colorbar(cb,ax=axes[0])
    #cb = axes[2].imshow(cat_arr)#,vmin=org_arr.min(),vmax=org_arr.max())
    #plt.colorbar(cb,ax=axes[2])
    cb = axes[2].imshow(new_arr,vmin=org_arr.min(),vmax=org_arr.max())
    plt.colorbar(cb,ax=axes[2])
    cb = axes[1].imshow(thresh_arr)#,vmin=org_arr.min(),vmax=org_arr.max())
    plt.colorbar(cb,ax=axes[1])
    axes[1].contour(thresh_arr,levels=[cat1_thresh,cat2_thresh],colors=['w',"w"])
    axes[0].set_title("original array\n",loc="left",fontsize=8)
    axes[1].set_title("thresholding array\ncat 1 threshold:{0:3.4f}".\
                      format(cat1_thresh),loc="left",fontsize=8)
    #axes[2].set_title("categorized array\ncat 1 proportion: {0:3.4f}".\
    #                  format(cat1_prop)\
    #                  ,loc="left",fontsize=10)
    axes[2].set_title("new array\n{1:2.2f}% max, {0:2.2f}% min".format(cat1_prop*100,cat2_prop*100)
                      ,loc="left",fontsize=8)
    plt.tight_layout()
    for ax in axes:
        ax.set_yticks([])
        ax.set_xticks([])
    plt.tight_layout()
    if save_name is not None:
        plt.savefig(save_name)
    else:    
        plt.show()
    plt.close(fig)
    return cat_arr,new_arr,ddf

In [None]:
cat_arr,new_arr,newnew_df = load_and_plot()
newnew_df

So there it is! The original array for reference, the "thresholding array" (which is just a scaled and normed version of the original array) and the resulting "new array".
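A quick worked example of the scaling used in `load_and_plot` may help when reading the threshold values. The sketch below applies the same expression to a tiny made-up array and compares it with ordinary min-max scaling. Whether pyEMU uses exactly this scaling internally is an assumption here, not something this notebook confirms.

```python
import numpy as np

raw = np.array([[2.0, 4.0], [6.0, 10.0]])  # made-up values for illustration

# same expression as in load_and_plot: shift by the min, divide by the original max
scaled = (raw - raw.min()) / raw.max()

# ordinary min-max scaling, for comparison only
minmax = (raw - raw.min()) / (raw.max() - raw.min())

print(scaled)   # ranges from 0 to 1 - min/max
print(minmax)   # ranges from 0 to 1
```

With this expression the scaled array starts at 0 but tops out at `1 - min/max` (0.8 in this example) rather than at 1, which is worth keeping in mind when comparing threshold values against the plotted array.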


Now let's experiment - feel free to change the quantities in `new_df`:

In [None]:
new_df = df.copy()
new_df.loc[0,"threshproportion"] = .25
new_df.to_csv(threshcsv_file)
pyemu.helpers.apply_threshold_pars(threshcsv_file)
_,_,newnew_df = load_and_plot()
newnew_df.iloc[0]

Now let's sweep over a range of category 1 proportions and make some figs:

In [None]:
cat1_props = np.linspace(0.01,0.99,100)
cat1_props

In [None]:
for i,prop in enumerate(cat1_props):
    new_df = df.copy()
    new_df.loc[0,"threshproportion"] = prop
    new_df.to_csv(threshcsv_file)
    pyemu.helpers.apply_threshold_pars(threshcsv_file)
    save_name = os.path.join(ws,"fig_{0:04d}.png".format(i))
    _,_,newnew_df = load_and_plot(save_name=save_name)
    print(i," ",end='')
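What should we expect to see across the sweep? The NumPy-only sketch below mimics it on a random stand-in field, using a plain quantile threshold rather than pyEMU, and again assuming that low values form category 1. It checks that the realized proportion tracks the requested one and that each category 1 region contains the previous one, which is why the frames look like a single facies steadily growing.

```python
import numpy as np

rng = np.random.default_rng(42)
field = rng.random((50, 50))            # stand-in for the thresholding array
props = np.linspace(0.01, 0.99, 100)    # same sweep as cat1_props above

realized = []
nested = True
prev_mask = np.zeros_like(field, dtype=bool)
for prop in props:
    # assumption for this sketch: cells at or below the quantile are category 1
    mask = field <= np.quantile(field, prop)
    # every cell that was category 1 in the previous step should still be category 1
    nested = nested and bool(np.all(mask[prev_mask]))
    realized.append(mask.mean())
    prev_mask = mask

realized = np.array(realized)
max_err = np.abs(realized - props).max()  # largest gap between requested and realized
print(nested, max_err)
```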


And if you have `ffmpeg` installed, we can make a sweet-as animated gif:

In [None]:
fps = 15
pyemu.os_utils.run("ffmpeg -i fig_{0:04d}.png -vf palettegen=256 palette.png".format(int(len(cat1_props)/2)),cwd=ws)
pyemu.os_utils.run("ffmpeg -r {0} -y -s 1920X1080 -i fig_%04d.png -i palette.png -filter_complex \"scale=720:-1:flags=lanczos[x];[x][1:v]paletteuse\" fancy.gif".format(fps),
        cwd=ws)

![SegmentLocal](temp_thresh/fancy.gif "segment1")

So how does this work within the PEST world?  We can treat the thresholding array as an array we want to parameterize (maybe with pilot points?), and we can also parameterize the fill values and proportions in the "threshprops.csv" file.  This lets us manipulate the shape of the resulting categorical array that the forward model will use as an input. In turn, this yields variability in the simulated response of the system.  And away we go!
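As a toy illustration of that last point, the sketch below draws random proportions and fill values, builds a two-category array for each draw, and pushes it through a stand-in "model". Everything here is hypothetical: `toy_forward_model` is just a geometric mean, the parameter ranges are made up, and no PEST or pyEMU machinery is involved.

```python
import numpy as np

rng = np.random.default_rng(7)
thresh_field = rng.random((50, 50))  # stand-in for a parameterized thresholding array

def toy_forward_model(k_arr):
    # hypothetical "model": geometric mean of the property field
    return float(np.exp(np.log(k_arr).mean()))

responses = []
for _ in range(20):
    prop = rng.uniform(0.2, 0.8)        # category 1 proportion "parameter"
    fill_lo = rng.uniform(0.5, 2.0)     # category 1 fill value "parameter"
    fill_hi = rng.uniform(20.0, 50.0)   # category 2 fill value "parameter"
    # assumption for this sketch: low values of the field form category 1
    cat1 = thresh_field <= np.quantile(thresh_field, prop)
    k_arr = np.where(cat1, fill_lo, fill_hi)
    responses.append(toy_forward_model(k_arr))

responses = np.array(responses)
print(responses.min(), responses.max())
```

Each draw changes both the shape of the categorical array and the values inside it, so the toy response varies from draw to draw.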