# Effect of weights in the clustering

Now we know that 200 is a good number of clusters. We also know that it is better to use the real ideal and nadir values of the problem instead of the ones obtained from the clustered results. Finally, it is more justified to use the stand closest to the centroid as the representative of a cluster, even though the results are worse than when using the plain mean of the cluster.

The last thing not yet tried is changing the weights of the clusters in the optimization phase. So far the weighting has been based on the number of stands in each cluster, but it would be better to scale the weights so that they sum to 1.

In [1]:
%matplotlib inline
import seaborn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
from ASF import ASF
from gradutil import *
from pyomo.opt import SolverFactory
seedn = 1

First, let's load all the data.

In [2]:
%%time
revenue, carbon, deadwood, ha = init_boreal()
n_revenue = nan_to_bau(revenue)
n_carbon = nan_to_bau(carbon)
n_deadwood = nan_to_bau(deadwood)
n_ha = nan_to_bau(ha)
ide = ideal(False)
nad = nadir(False)
opt = SolverFactory('cplex')

In [3]:
x = pd.concat((n_revenue, n_carbon, n_deadwood, n_ha), axis=1)
x_stack = np.dstack((n_revenue, n_carbon, n_deadwood, n_ha))
#Normalize all the columns in 0-1 scale
x_norm = normalize(x.values)
x_norm_stack = normalize(x_stack)
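The source of `gradutil.normalize` is not shown here. The sketch below is a hypothetical min-max version, `minmax_normalize`, and it assumes that "0-1 scale" means min-max scaling along the last axis. For the 2-D `x.values` that gives per-column scaling. For the 3-D `x_stack` it gives per-objective scaling over all stands and treatment options.

```python
import numpy as np

def minmax_normalize(a):
    """Hypothetical stand-in for gradutil.normalize: scale each slice of the
    last axis to [0, 1] using its min and max over all other axes."""
    a = np.asarray(a, dtype=float)
    axes = tuple(range(a.ndim - 1))            # every axis except the last one
    lo = np.nanmin(a, axis=axes, keepdims=True)
    hi = np.nanmax(a, axis=axes, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)     # constant columns map to 0 instead of dividing by 0
    return (a - lo) / span

cols = minmax_normalize(np.array([[0., 10.], [5., 20.], [10., 30.]]))
print(cols[:, 0])
```

Whether the real helper scales per column, per objective, or globally matters for the ASF, so this assumption is worth checking against `gradutil`.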

Cluster the data into 200 clusters and calculate the corresponding weights in both ways: as raw stand counts and as counts scaled to sum to 1.

In [4]:
%%time 
nclust1 = 200
c, xtoc, dist = cluster(x_norm, nclust1, seedn, verbose=1)

In [5]:
rng = range(nclust1)
total_weight = len(x_norm)
w_orig = np.array([sum(xtoc == i) for i in rng])
w_scale = np.array([sum(xtoc == i)/total_weight for i in rng])
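The two weightings can also be computed in one pass with `np.bincount`. This is a minimal sketch that assumes the cluster labels (here `xtoc`) are non-negative integers `0 … nclust1-1`. The name `cluster_weights` is hypothetical.

```python
import numpy as np

def cluster_weights(labels, n_clusters):
    """Return (counts, shares): stands per cluster, and the same counts scaled to sum to 1."""
    counts = np.bincount(labels, minlength=n_clusters)  # one pass instead of n_clusters comparisons
    shares = counts / counts.sum()                      # counts.sum() == number of stands
    return counts, shares

labels = np.array([0, 2, 2, 1, 2, 0])
counts, shares = cluster_weights(labels, 3)
print(counts)  # [2 1 3]
```

Because every stand belongs to exactly one cluster, `counts.sum()` equals `len(x_norm)`. So `shares` matches `w_scale` and `counts` matches `w_orig`.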

Calculate new cluster centers by selecting, for each cluster, the stand closest to its centroid.

In [6]:
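# argmin gives a position within cluster i, so map it back to a global stand index
c_close = np.array([x_norm_stack[np.flatnonzero(xtoc == i)[np.argmin(dist[xtoc == i])]] for i in range(nclust1)])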
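The selection of the stand closest to the centroid is sketched below on toy data. It assumes, as the indexing `dist[xtoc == i]` implies, that `dist` holds each stand's distance to its own cluster centroid. It also assumes every cluster is non-empty, since `np.argmin` raises on an empty array. The key step is translating the argmin inside the cluster back to a row of the full data. The function name `closest_members` is hypothetical.

```python
import numpy as np

def closest_members(data, labels, dist, n_clusters):
    """For each cluster, return the row of `data` with the smallest distance to its centroid."""
    reps = []
    for k in range(n_clusters):
        members = np.flatnonzero(labels == k)      # global row indices of cluster k
        best = members[np.argmin(dist[members])]   # local argmin -> global row index
        reps.append(data[best])
    return np.array(reps)

data = np.arange(6).reshape(6, 1) * 10
labels = np.array([0, 1, 0, 1, 0, 1])
dist = np.array([0.5, 0.9, 0.1, 0.2, 0.7, 0.3])
reps = closest_members(data, labels, dist, 2)
print(reps)  # [[20] [30]]
```

Indexing `data` directly with the argmin inside the cluster would return row 1 (`[10]`) for both clusters here, which is not a member of cluster 0.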

Calculate the solutions for a reference point using the original weights.

In [7]:
ref = np.array((ide[0], nad[1]+1, nad[2]+1, nad[3]+1))
ASF_lambda = lambda x: ASF(ide, nad, ref, c_close, weights=x[0], scalarization=x[1])

In [28]:
orig_asf   = ASF_lambda((w_orig, 'asf'));   res_orig_asf = opt.solve(orig_asf.model)
orig_stom  = ASF_lambda((w_orig, 'stom'));  res_orig_stom = opt.solve(orig_stom.model)
orig_guess = ASF_lambda((w_orig, 'guess')); res_orig_guess = opt.solve(orig_guess.model)

Calculate the solutions for the same reference point using the scaled weights.

In [9]:
scale_asf   = ASF_lambda((w_scale, 'asf'));   res_scale_asf = opt.solve(scale_asf.model)
scale_stom  = ASF_lambda((w_scale, 'stom'));  res_scale_stom = opt.solve(scale_stom.model)
scale_guess = ASF_lambda((w_scale, 'guess')); res_scale_guess = opt.solve(scale_guess.model)
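Copy-pasted solve lines make it easy to store one result under another's name. A possible alternative is to loop over all (weighting, scalarization) pairs and key the results by name. In this sketch, `solve_all`, `build` and `solve` are hypothetical placeholders. In this notebook, `build` could be `lambda w, s: ASF(ide, nad, ref, c_close, weights=w, scalarization=s)` and `solve` could be `lambda p: opt.solve(p.model)`. Below they are replaced by stubs so the sketch runs on its own.

```python
from itertools import product

def solve_all(build, solve, weightings, scalarizations):
    """Build and solve one problem per (weighting, scalarization) pair.

    Results are stored under (weighting name, scalarization), so no result
    can silently overwrite another one.
    """
    problems, results = {}, {}
    for (wname, w), s in product(weightings.items(), scalarizations):
        problems[wname, s] = build(w, s)
        results[wname, s] = solve(problems[wname, s])
    return problems, results

# Stub builder and solver, only to show the call shape.
problems, results = solve_all(
    build=lambda w, s: (sum(w), s),
    solve=lambda p: f"solved {p[1]} with total weight {p[0]}",
    weightings={"orig": [2, 1, 3], "scale": [2 / 6, 1 / 6, 3 / 6]},
    scalarizations=("asf", "stom", "guess"),
)
print(len(results))  # 6
```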

In [10]:
model_to_real_values(x_stack, xtoc, scale_asf.model) - model_to_real_values(x_stack, xtoc, orig_asf.model)

In [11]:
model_to_real_values(x_stack, xtoc, scale_stom.model) - model_to_real_values(x_stack, xtoc, orig_stom.model)

In [12]:
model_to_real_values(x_stack, xtoc, scale_guess.model) - model_to_real_values(x_stack, xtoc, orig_guess.model)

Actually, there shouldn't be any difference between the results, because the scaling is linear.
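The linearity argument can be checked on a plain weighted-sum objective with made-up data. Dividing every weight by the number of stands divides every plan's objective by the same constant, so the best plan does not change. This sketch only models a weighted sum. Whether the full ASF solution is unchanged also depends on how the `ASF` class combines the weighted objectives with the ideal, nadir and reference values, which is not modelled here.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_clusters, n_options = 5, 4
values = rng.random((n_clusters, n_options))  # made-up value of each option in each cluster
counts = np.array([3, 1, 4, 1, 5])           # stands per cluster (original weights)
shares = counts / counts.sum()               # the same weights scaled to sum to 1

# Every plan picks one option per cluster: 4**5 = 1024 plans in total.
plans = np.array(list(itertools.product(range(n_options), repeat=n_clusters)))
picked = values[np.arange(n_clusters), plans]  # shape (n_plans, n_clusters)

obj_counts = picked @ counts   # weighted sum with stand counts
obj_shares = picked @ shares   # weighted sum with scaled weights

# Scaling every weight by 1/N scales every plan's objective by 1/N ...
print(np.allclose(obj_counts / counts.sum(), obj_shares))
# ... so the best plan under one weighting is also best under the other.
best = np.argmax(obj_shares)
print(np.isclose(obj_counts[best], obj_counts.max()))
```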

As you can see, the CPLEX optimizer is now used. Earlier, with the GLPK optimizer, there were differences in the ASF results. These were probably caused by numerical instabilities in the non-commercial solver.

Differences between solvers should be mentioned in the thesis!