# How to handle NaN values so that HA doesn't get marginalized?

So far the clustering has not worked as desired, and the problem has now been traced to the way NaN values are handled in the clustering. We need a better way to handle them.

In [5]:
from gradutil import *
from pyomo.opt import SolverFactory

In [6]:
seed = 2
opt = SolverFactory('glpk')
solutions = real_solutions()
revenue, carbon, deadwood, ha = init_boreal()
x = np.concatenate((revenue, carbon, deadwood, ha), axis=1)

## Set NaNs to the smallest existing value

Let's first try setting every NaN to the smallest value in its column.

In [3]:
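norm_data = x.copy()
# (row, column) coordinates of every NaN in the data
inds = np.where(np.isnan(norm_data))
# Replace each NaN with the smallest non-NaN value of its column
norm_data[inds] = np.take(np.nanmin(norm_data, axis=0), inds[1])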
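The fill above uses `np.where` to get the (row, column) coordinates of every NaN and `np.take` to look up the matching column minimum. The same steps on a small toy array, wrapped in a hypothetical helper `fill_nan_with_column_min`:

```python
import numpy as np

def fill_nan_with_column_min(data):
    """Hypothetical helper: replace every NaN with the smallest non-NaN value of its column."""
    filled = np.array(data, dtype=float)           # work on a copy, like x.copy()
    rows, cols = np.where(np.isnan(filled))        # coordinates of the NaNs
    col_min = np.nanmin(filled, axis=0)            # column minima, ignoring NaNs
    filled[rows, cols] = np.take(col_min, cols)    # pick the minimum of each NaN's column
    return filled

example = np.array([[1.0, np.nan],
                    [3.0, 4.0],
                    [np.nan, 2.0]])
print(fill_nan_with_column_min(example))
```

A column that is entirely NaN would stay NaN here, and `np.nanmin` emits a RuntimeWarning for such a column.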

Then normalize everything as before.

In [4]:
min_norm_x = normalize(norm_data)

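`normalize` comes from `gradutil`, whose source is not shown here. If it divides each column by a per-column scale that can be zero (for instance, a column whose values are all equal), guarding the division with `np.where(normax != 0., data / normax, 0)` still computes every quotient before selecting, so NumPy emits a RuntimeWarning even though the result is correct. A minimal sketch of a per-column min-max normalization that skips those divisions with `np.divide(..., where=...)`; `minmax_normalize` is a hypothetical stand-in and not necessarily what `gradutil.normalize` does:

```python
import numpy as np

def minmax_normalize(data):
    """Assumed stand-in for gradutil.normalize: scale each column to [0, 1].

    Columns with zero range are mapped to 0 without a divide warning.
    """
    data = np.asarray(data, dtype=float)
    shifted = data - np.nanmin(data, axis=0)   # column minimum becomes 0
    normax = np.nanmax(shifted, axis=0)        # per-column range
    # np.divide only divides where the condition holds; elsewhere `out` keeps its zeros.
    return np.divide(shifted, normax, out=np.zeros_like(shifted), where=normax != 0.)

print(minmax_normalize([[1.0, 5.0], [3.0, 5.0], [2.0, 5.0]]))
```

The second column is constant, so it comes out as all zeros instead of triggering a warning.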


In [11]:
%%time
nclust = 50
optim_revenue50, optim_carbon50, optim_deadwood50, optim_ha50 = cNopt(x, min_norm_x, min_norm_x, opt, nclust, seed)

CPU times: user 9.41 s, sys: 64 ms, total: 9.47 s
Wall time: 9.53 s


In [12]:
print('Relative differences to original values, 50 clusters')
print("(i) Harvest revenues difference {:.3f}".format((optim_revenue50-solutions['revenue'])/solutions['revenue']))
print("(ii) Carbon storage {:.3f}".format((optim_carbon50-solutions['carbon'])/solutions['carbon']))
print("(iii) Deadwood index {:.3f}".format((optim_deadwood50-solutions['deadwood'])/solutions['deadwood']))
print("(iv) Combined Habitat {:.3f}".format((optim_ha50-solutions['ha'])/solutions['ha']))

Relative differences to original values, 50 clusters
(i) Harvest revenues difference -0.107
(ii) Carbon storage -0.056
(iii) Deadwood index nan
(iv) Combined Habitat nan
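The same four relative differences, (optimized − original) / original, are printed after every run in this notebook. A small helper could compute them in one place. This sketch assumes `solutions` can be indexed like a dict with the keys `'revenue'`, `'carbon'`, `'deadwood'` and `'ha'` (as the cells above do); `relative_differences` and `report` are hypothetical names:

```python
def relative_differences(optimized, reference):
    """Relative difference (optimized - reference) / reference for each key in `optimized`."""
    return {key: (optimized[key] - reference[key]) / reference[key] for key in optimized}

# Labels matching the printed output above
LABELS = {
    "revenue": "(i) Harvest revenues difference",
    "carbon": "(ii) Carbon storage",
    "deadwood": "(iii) Deadwood index",
    "ha": "(iv) Combined Habitat",
}

def report(optimized, reference, nclust):
    """Print the relative differences in the same format as the cells above."""
    diffs = relative_differences(optimized, reference)
    print("Relative differences to original values, {} clusters".format(nclust))
    for key, label in LABELS.items():
        print("{} {:.3f}".format(label, diffs[key]))
```

Usage would look like `report({'revenue': optim_revenue50, 'carbon': optim_carbon50, 'deadwood': optim_deadwood50, 'ha': optim_ha50}, solutions, nclust)`. A NaN objective value is printed as `nan`, just as above.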


Filling with the column minimum is not enough to steer the optimization away from these points, and because two of the results are NaN, it tells us nothing about the clustering. We need to adjust the values used in the optimization as well, so that we can see how the clustering performs.
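A plausible reading of the NaN results, assuming `cNopt` sums the original `x` values of the options chosen by the optimizer (its source is not shown here): filling NaNs only in the data the optimizer sees does not stop it from picking such an option, and a single NaN in a sum makes the whole total NaN. A toy illustration with made-up values:

```python
import numpy as np

# Toy values of one objective: rows are stands, columns are options (made-up data).
values = np.array([[1.0, 2.0],
                   [np.nan, 3.0],
                   [4.0, 0.5]])
# Hypothetical optimizer choice: one option index per stand.
chosen = np.array([1, 0, 0])

picked = values[np.arange(len(chosen)), chosen]  # value of the chosen option per stand
total = picked.sum()              # one NaN makes the whole sum NaN
valid_total = np.nansum(picked)   # ignores the NaN, which would hide the problem
print(picked, total, valid_total)
```

Using `np.nansum` instead would make the number finite but silently drop the stand, so it does not fix the underlying issue.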

In [13]:
no_nan_x = x.copy()
inds = np.where(np.isnan(no_nan_x))
# Replace each NaN with (column min - column max), i.e. minus the column's range
no_nan_x[inds] = np.take(np.nanmin(no_nan_x, axis=0) - np.nanmax(no_nan_x, axis=0), inds[1])

This puts a large penalty on choosing options with NaN values in the optimization: each NaN is replaced by the negative of its column's range (`nanmin - nanmax`).
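The NaN fills in this notebook differ only by a scale factor on `nanmin - nanmax`: 1 for `no_nan_x` here, 1/2 in the "some penalty" section and 2 in the "ridiculous penalty" section. A sketch of that family as one hypothetical helper, `fill_nan_with_penalty`, on a toy array:

```python
import numpy as np

def fill_nan_with_penalty(data, factor=1.0):
    """Hypothetical helper: replace each NaN with factor * (column min - column max)."""
    filled = np.array(data, dtype=float)   # copy, so the input keeps its NaNs
    rows, cols = np.where(np.isnan(filled))
    # Column statistics ignore NaNs; computed before any NaN is overwritten.
    penalty = factor * (np.nanmin(filled, axis=0) - np.nanmax(filled, axis=0))
    filled[rows, cols] = np.take(penalty, cols)
    return filled

toy = np.array([[0.0, 10.0],
                [np.nan, 2.0],
                [6.0, np.nan]])
for factor in (0.5, 1.0, 2.0):
    print(factor, fill_nan_with_penalty(toy, factor).tolist())
```

For this toy array, the first column's range is 6 and the second's is 8, so with `factor=1.0` the NaNs become -6 and -8.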

In [14]:
%%time
nclust = 50
penalty_optim_revenue50, penalty_optim_carbon50, penalty_optim_deadwood50, penalty_optim_ha50 = cNopt(x, min_norm_x, no_nan_x, opt, nclust, seed)

CPU times: user 9.6 s, sys: 40 ms, total: 9.64 s
Wall time: 9.68 s


In [15]:
print('Relative differences to original values, 50 clusters')
print("(i) Harvest revenues difference {:.3f}".format((penalty_optim_revenue50-solutions['revenue'])/solutions['revenue']))
print("(ii) Carbon storage {:.3f}".format((penalty_optim_carbon50-solutions['carbon'])/solutions['carbon']))
print("(iii) Deadwood index {:.3f}".format((penalty_optim_deadwood50-solutions['deadwood'])/solutions['deadwood']))
print("(iv) Combined Habitat {:.3f}".format((penalty_optim_ha50-solutions['ha'])/solutions['ha']))

Relative differences to original values, 50 clusters
(i) Harvest revenues difference -0.014
(ii) Carbon storage -0.004
(iii) Deadwood index -0.011
(iv) Combined Habitat 0.920


Now all four objectives give numeric results, but Combined Habitat is off by 0.920, so it looks like the clustering is not working. We need another approach to handling the NaN values.

## Give NaNs some penalty

In [17]:
norm_data = x.copy()
inds = np.where(np.isnan(norm_data))
# Half the penalty used for no_nan_x: (column min - column max) / 2
norm_data[inds] = np.take((np.nanmin(norm_data, axis=0) - np.nanmax(norm_data, axis=0)) / 2, inds[1])
penalty_norm_x = normalize(norm_data)



In [22]:
%%time
nclust = 50
half_optim_revenue50, half_optim_carbon50, half_optim_deadwood50, half_optim_ha50 = cNopt(x, penalty_norm_x, no_nan_x, opt, nclust, seed)

CPU times: user 10.4 s, sys: 36 ms, total: 10.5 s
Wall time: 10.5 s


In [23]:
print('Relative differences to original values, 50 clusters')
print("(i) Harvest revenues difference {:.3f}".format((half_optim_revenue50-solutions['revenue'])/solutions['revenue']))
print("(ii) Carbon storage {:.3f}".format((half_optim_carbon50-solutions['carbon'])/solutions['carbon']))
print("(iii) Deadwood index {:.3f}".format((half_optim_deadwood50-solutions['deadwood'])/solutions['deadwood']))
print("(iv) Combined Habitat {:.3f}".format((half_optim_ha50-solutions['ha'])/solutions['ha']))

Relative differences to original values, 50 clusters
(i) Harvest revenues difference -0.014
(ii) Carbon storage -0.004
(iii) Deadwood index -0.015
(iv) Combined Habitat 0.893


That does not work either: Combined Habitat is still off by 0.893. We need something else.

## Give NaNs a ridiculous penalty

In [25]:
norm_data = x.copy()
inds = np.where(np.isnan(norm_data))
# Twice the penalty used for no_nan_x: 2 * (column min - column max)
norm_data[inds] = np.take((np.nanmin(norm_data, axis=0) - np.nanmax(norm_data, axis=0)) * 2, inds[1])
ridiculous_norm_x = normalize(norm_data)

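If `normalize` is a per-column min-max scaling (an assumption; its source is not shown here), a larger fill value does not move the filled entry further away after normalization, because the filled entry becomes the column minimum and always maps to 0. What changes is the rest of the column, which gets squeezed toward 1. A toy check with a hypothetical helper `penalize_then_minmax`:

```python
import numpy as np

def penalize_then_minmax(column, factor):
    """Fill NaNs with factor * (min - max), then min-max scale the column to [0, 1].

    The min-max step is an assumption about gradutil.normalize, for illustration only.
    """
    column = np.asarray(column, dtype=float)
    fill = factor * (np.nanmin(column) - np.nanmax(column))
    filled = np.where(np.isnan(column), fill, column)
    return (filled - filled.min()) / (filled.max() - filled.min())

toy = [0.0, 10.0, np.nan]  # hypothetical column: two real values and one NaN
for factor in (0.5, 1.0, 2.0):
    print(factor, np.round(penalize_then_minmax(toy, factor), 3))
```

For this toy column the NaN entry maps to 0 for all three factors, while the normalized gap between the two real values shrinks from about 0.667 (factor 0.5) to 0.5 (factor 1) to about 0.333 (factor 2). Under a different normalization, for example dividing by the column maximum without shifting, the picture would differ.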


In [26]:
%%time
nclust = 50
ridic_optim_revenue50, ridic_optim_carbon50, ridic_optim_deadwood50, ridic_optim_ha50 = cNopt(x, ridiculous_norm_x, no_nan_x, opt, nclust, seed)

CPU times: user 11.9 s, sys: 44 ms, total: 12 s
Wall time: 12 s


In [27]:
print('Relative differences to original values, 50 clusters')
print("(i) Harvest revenues difference {:.3f}".format((ridic_optim_revenue50-solutions['revenue'])/solutions['revenue']))
print("(ii) Carbon storage {:.3f}".format((ridic_optim_carbon50-solutions['carbon'])/solutions['carbon']))
print("(iii) Deadwood index {:.3f}".format((ridic_optim_deadwood50-solutions['deadwood'])/solutions['deadwood']))
print("(iv) Combined Habitat {:.3f}".format((ridic_optim_ha50-solutions['ha'])/solutions['ha']))

Relative differences to original values, 50 clusters
(i) Harvest revenues difference -0.014
(ii) Carbon storage -0.004
(iii) Deadwood index -0.015
(iv) Combined Habitat 0.896
