# Trying to calculate ideal and nadir using clustering

In [1]:
%matplotlib inline
import seaborn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
from ASF import ASF
from gradutil import *
from pyomo.opt import SolverFactory
from scipy.spatial.distance import euclidean
seedn = 2

First, let's load all the data.

In [2]:
%%time
xses = init_norms()
x = xses['x']
x_stack = xses['x_stack']
x_norm = xses['x_norm']
x_norm_stack = xses['x_norm_stack']

In [11]:
ide = ideal(False)
nad = nadir(False)
opt = SolverFactory('cplex')

Cluster the data whose columns have been normalized to the 0-1 scale.

In [13]:
%%time 
nclust = 300
seedn = 5
c, xtoc, dist = cluster(x_norm, nclust, seedn, verbose=0)

In [14]:
%%time
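nvar = len(x_norm)
# Weight of each non-empty cluster: its share of all the data points
w = np.array([sum(xtoc == i) for i in range(nclust) if sum(xtoc == i) > 0])/nvar
# Representative of each non-empty cluster: the member closest to the cluster mean
c_close = np.array([x_norm_stack[min(np.array(range(len(xtoc)))[xtoc == i],
                              key=lambda index: euclidean(x_norm[index],np.mean(x_norm[xtoc == i],axis=0)))]
                    for i in range(nclust) if sum(xtoc == i) > 0])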
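The weighting and representative-selection step above can be sketched on toy data. This is only a minimal illustration, assuming a 2-D array of points and an array of cluster labels; the function name `cluster_weights_and_representatives` and the toy arrays are hypothetical and not part of `gradutil`.

```python
import numpy as np

def cluster_weights_and_representatives(points, labels, n_clusters):
    """Return (weights, representative indices) for the non-empty clusters."""
    weights = []
    representatives = []
    for k in range(n_clusters):
        members = np.flatnonzero(labels == k)
        if members.size == 0:
            continue  # skip empty clusters, as the cell above does
        centroid = points[members].mean(axis=0)
        dists = np.linalg.norm(points[members] - centroid, axis=1)
        # the member closest to the cluster mean represents the cluster
        representatives.append(members[np.argmin(dists)])
        # the weight is the cluster's share of all points
        weights.append(members.size / len(points))
    return np.array(weights), np.array(representatives)

# Made-up toy data: six points, cluster 1 is intentionally empty
toy_points = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
                       [1.0, 1.0], [1.0, 0.8], [1.0, 0.9]])
toy_labels = np.array([0, 0, 0, 2, 2, 2])
toy_w, toy_rep = cluster_weights_and_representatives(toy_points, toy_labels, 3)
print(toy_w, toy_rep)
```

Using an actual member instead of the cluster mean keeps every representative a real data point, which seems to be the intent of `c_close`.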

In [15]:
%%time
data = c_close
weights = w
solver = SolverFactory('cplex')
problems = []
ress = []
for i in range(np.shape(data)[-1]):
    problems.append(BorealWeightedProblem(data[:, :, i], weights, nvar))
for j in range(len(problems)):
    ress.append(solver.solve(problems[j].model))
payoff = [[cluster_to_value(x_stack[:,:,i], res_to_list(problems[j].model), weights*nvar)
                   for i in range(np.shape(data)[-1])]
                  for j in range(len(problems))]
ide_clust = np.max(payoff, axis=0)
nad_clust = np.min(payoff, axis=0)
payoff_model = [[model_to_real_values(x_stack[:, :, i], problems[j].model,xtoc)
                 for i in range(np.shape(data)[-1])]
                  for j in range(len(problems))]
ide_model = np.max(payoff_model, axis=0)
nad_model = np.min(payoff_model, axis=0)
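How the ideal and nadir fall out of a payoff table can be seen on a small made-up example. The sketch assumes all objectives are maximised, as the `np.max`/`np.min` calls above suggest; the numbers are invented for illustration only.

```python
import numpy as np

# Hypothetical payoff table for three maximised objectives:
# row j holds all objective values at the solution that maximises objective j alone
toy_payoff = np.array([
    [10.0, 2.0, 1.0],
    [4.0, 8.0, 3.0],
    [5.0, 3.0, 6.0],
])

# Ideal: best value of each objective over the rows (here the diagonal)
toy_ideal = toy_payoff.max(axis=0)
# Nadir estimate: worst value of each objective over the rows
toy_nadir = toy_payoff.min(axis=0)
print(toy_ideal, toy_nadir)
```

Note that a nadir read from a payoff table is only an estimate of the true nadir, so both the real and the clustered nadir vectors should be treated as approximations.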

In [16]:

ide_clust, nad_clust

In [17]:
ide_model, nad_model

In [18]:
for p in payoff:
    print(p)

In [19]:
for p in payoff_model:
    print(p)

The interesting question now is whether the different ideal and nadir vectors really affect the results. Even though the surrogate ideal and nadir are both more averaged than the real ones, the optimization still works with the same averaged clusters.

## Effect of ideal and nadir

We can test this by running the same optimization (same reference point) with different ideal and nadir values. The "edges" of the Pareto front are especially interesting.
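Why the ideal and nadir can matter at all is easier to see with a scalarizing function written out. The `ASF` class is not shown in this notebook, so the sketch below assumes one common form of achievement scalarizing function for maximised objectives, where each objective is scaled by its ideal-nadir range; `asf_value`, `best_candidate` and all numbers are hypothetical.

```python
import numpy as np

def asf_value(f, ref, ideal, nadir, rho=1e-6):
    """Assumed ASF for maximised objectives; smaller values are better."""
    f, ref, ideal, nadir = (np.asarray(v, dtype=float) for v in (f, ref, ideal, nadir))
    scaled = (ref - f) / (ideal - nadir)  # normalise each objective by its range
    # max term plus a small augmentation term
    return scaled.max() + rho * scaled.sum()

# Two made-up candidate solutions with two objectives each
candidates = np.array([[8.0, 2.0], [5.0, 4.0]])
ref = np.zeros(2)

def best_candidate(ideal, nadir):
    """Index of the candidate with the smallest ASF value."""
    scores = [asf_value(f, ref, ideal, nadir) for f in candidates]
    return int(np.argmin(scores))

# Same reference point, two different ideal/nadir pairs
best_a = best_candidate(ideal=[10.0, 10.0], nadir=[0.0, 0.0])
best_b = best_candidate(ideal=[20.0, 4.0], nadir=[0.0, 0.0])
print(best_a, best_b)
```

In this toy case the preferred candidate changes when only the scaling changes, which is the kind of effect the following cells look for with the real and the clustered vectors.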

#### Reference to 0 0 0 0 

In [20]:
ref_test = np.array((0,0,0,0))

In [23]:
asf = ASF(ide, nad, ref_test, c_close, weights=w, nvar=nvar)
opt.solve(asf.model)
real_0 = model_to_real_values(x_stack, asf.model, xtoc)

In [24]:
asf = ASF(ide_clust, nad_clust, ref_test, c_close, weights=w, nvar=nvar)
opt.solve(asf.model)
cluster_0 = model_to_real_values(x_stack, asf.model, xtoc)

In [25]:
real_0-cluster_0

Well, there is a difference. As we can see, the real ideal and nadir give a smaller value for the revenue and greater values for all the rest.

#### Reference to ideal

In this test it is important to note whether we are referring to the real ideal or to the ideal of the clusters. The results are of course different in these two cases.

In [26]:
asf = ASF(ide, nad, ide, c_close, weights=w, nvar=nvar)
opt.solve(asf.model)
real_ide = model_to_real_values(x_stack, asf.model, xtoc)

In [27]:
asf = ASF(ide_clust, nad_clust, ide_clust, c_close, weights=w, nvar=nvar)
opt.solve(asf.model)
cluster_ide = model_to_real_values(x_stack, asf.model, xtoc)

In [28]:
real_ide-cluster_ide

The differences are still big, but in a different way than before. I still don't know what to say about that.

The essence of this could be described better if we try to optimize just one objective. So let's use the ideal of the carbon objective as the reference.

In [29]:
asf = ASF(ide, nad, np.array((0,ide[1],0,0)), c_close, weights=w, nvar=nvar)
opt.solve(asf.model)
real_carbon = model_to_real_values(x_stack, asf.model, xtoc)

In [30]:
asf = ASF(ide_clust, nad_clust, np.array((0,ide_clust[1],0,0)), c_close, weights=w, nvar=nvar)
opt.solve(asf.model)
cluster_carbon = model_to_real_values(x_stack, asf.model, xtoc)

In [31]:
real_carbon-cluster_carbon

This is exactly the same as when using the ideal!?

In [32]:
real_ide - real_carbon

We even get the same point...

How about the deadwood?

In [33]:
asf = ASF(ide, nad, np.array((0,0,ide[2],0)), c_close, weights=w, nvar=nvar)
opt.solve(asf.model)
real_deadwood = model_to_real_values(x_stack, asf.model, xtoc)

In [34]:
asf = ASF(ide_clust, nad_clust, np.array((0,0,ide_clust[2],0)), c_close, weights=w, nvar=nvar)
opt.solve(asf.model)
cluster_deadwood = model_to_real_values(x_stack, asf.model, xtoc)

In [35]:
real_deadwood-cluster_deadwood

And the habitat index?

In [36]:
asf = ASF(ide, nad, np.array((0,0,0,ide[3])), c_close, weights=w, nvar=nvar)
opt.solve(asf.model)
real_ha = model_to_real_values(x_stack, asf.model, xtoc)

In [37]:
asf = ASF(ide_clust, nad_clust, np.array((0,0,0,ide_clust[3])), c_close, weights=w, nvar=nvar)
opt.solve(asf.model)
cluster_ha = model_to_real_values(x_stack, asf.model, xtoc)

In [38]:
real_ha-cluster_ha

So it looks like there are some interesting relationships between the ideals:
- For the revenue and the carbon, the clustered ideal and nadir vectors differ more from the real values. Probably because of this, the results attained with reference points also differ more between the real ideal and the clustered ideal.
- For the deadwood and the habitat, the differences are smaller, and so the results for the references are also more accurate.
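One way to make the comparison in the list above more concrete would be to look at the per-objective relative gap between the real and the clustered vectors. This is only a sketch: `relative_gap` is a hypothetical helper and the numbers are made up, not the notebook's outputs.

```python
import numpy as np

def relative_gap(real, surrogate):
    """Per-objective gap between two vectors, relative to the real one."""
    real = np.asarray(real, dtype=float)
    surrogate = np.asarray(surrogate, dtype=float)
    return (real - surrogate) / np.abs(real)

# Made-up vectors for four objectives, for illustration only
toy_real = [100.0, 50.0, 20.0, 10.0]
toy_surrogate = [80.0, 45.0, 20.0, 9.5]
toy_gap = relative_gap(toy_real, toy_surrogate)
print(toy_gap)
```

In the notebook this could be called as `relative_gap(ide, ide_clust)` and `relative_gap(nad, nad_clust)`, assuming none of the real values is zero.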