# evaluomeR - optimal $k$ analysis

In [4]:
library("evaluomeR")
library("ISLR") 

options(scipen=10)

# Table of contents
* [Dataset](#dataset)
* [Analysis per metric](#single)
    * [Stability](#single_stab)
    * [Quality](#single_qual)
    * [Optimal K value](#single_optimal)
* [Multidimensional optimal k analysis](#all)
    * [Stability](#all_stab)
    * [Quality](#all_qual)
    * [Optimal K value](#all_optimal)

# Dataset <a class="anchor" id="dataset"></a>
We use the NCI60 dataset from the `ISLR` package. For testing purposes we keep only a subsample: the first 500 columns, that is, the `Description` column plus 499 metrics.

In [5]:
seed = 13606
set.seed(seed)

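# Load the expression matrix as a data frame (one row per cell line)
nci60 = as.data.frame(NCI60$data)
# Add a "labels" column and move it to the first position
nci60["labels"] = rownames(nci60)
nci60 = nci60[ , c("labels", names(nci60)[names(nci60) != "labels"])]
# Fill it with the cancer type of each cell line and rename it to "Description"
nci60["labels"] = NCI60$labs
colnames(nci60)[colnames(nci60) == 'labels'] <- 'Description'
# Keep the first 500 columns: Description plus 499 metrics
nci60 = nci60[1:500]
head(nci60)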

| | Description | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 490 | 491 | 492 | 493 | 494 | 495 | 496 | 497 | 498 | 499 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| V1 | CNS | 0.3 | 1.18 | 0.55 | 1.14 | -0.265 | -0.07 | 0.35 | -0.315 | -0.45 | ... | -0.43 | -0.035 | 0.1 | -0.285 | -0.14 | 0.01999023 | 0.37 | -0.38 | -0.3725 | -0.3200195 |
| V2 | CNS | 0.679961 | 1.289961 | 0.169961 | 0.379961 | 0.464961 | 0.579961 | 0.699961 | 0.724961 | -0.04003899 | ... | -0.330039 | -0.605039 | -0.580039 | -0.985039 | -0.550039 | 0.4199512 | 0.129961 | -0.09003899 | 0.03746101 | 0.0 |
| V3 | CNS | 0.94 | -0.04 | -0.17 | -0.04 | -0.605 | 0.0 | 0.09 | 0.645 | 0.43 | ... | 0.23 | -0.775 | -0.85 | -0.665 | -0.86 | 0.2399902 | -1.19 | -0.84 | -0.5125 | -0.8900195 |
| V4 | RENAL | 0.28 | -0.31 | 0.68 | -0.81 | 0.625 | -1.387779e-17 | 0.17 | 0.245 | 0.02 | ... | -0.18 | 0.385 | -0.68 | -0.115 | -0.66 | 0.1299902 | -0.6 | -0.52 | -0.3225 | -0.2600195 |
| V5 | BREAST | 0.485 | -0.465 | 0.395 | 0.905 | 0.2 | -0.005 | 0.085 | 0.11 | 0.235 | ... | -0.195 | -0.15 | -0.755 | -0.72 | -0.355 | -1.31500977 | -0.975 | -0.815 | -0.6775 | -1.3450195 |
| V6 | CNS | 0.31 | -0.03 | -0.1 | -0.46 | -0.205 | -0.54 | -0.64 | -0.585 | -0.77 | ... | -0.67 | -0.515 | -0.14 | -0.215 | -0.14 | 0.3099902 | -0.06 | -0.57 | -0.5425 | -0.5500195 |


# Analysis per metric <a class="anchor" id="single"></a>
This section demonstrates how to conduct an optimal $k$ analysis for each metric (feature) of an input dataset. Here we evaluate every $k$ in the range $k \in [3,6]$, excluding $k=2$ to avoid binary classifications. The CBI *kmeans* serves as the default clustering method.

In [6]:
k.range=c(3,6)
cbi="kmeans"
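
As a minimal sketch of how the two bounds in `k.range` relate to the values of $k$ that are evaluated, the snippet below expands the range in base R. It assumes that every integer between the lower and the upper bound is tested, which is consistent with the `k_3` to `k_6` columns of the result tables in this notebook. The names `k_values` and `col_names` are illustrative and not part of `evaluomeR`.

```r
# Same bounds as in the cell above
k.range <- c(3, 6)

# ASSUMPTION: every integer between both bounds is evaluated
k_values <- seq(k.range[1], k.range[2])

# Column names in the style used by the standardized result tables
col_names <- paste0("k_", k_values)
print(k_values)
print(col_names)
```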

### Stability <a class="anchor" id="single_stab"></a>
First, we compute the stabilities for the given range of $k$ with the `stabilityRange` method. Since `stabilityRange` returns an [ExperimentList](https://rdrr.io/bioc/MultiAssayExperiment/man/ExperimentList.html) object, we convert its output into a dataframe with the `standardizeStabilityData` method.

In [8]:
stab_range = stabilityRange(data=nci60, k.range=k.range, 
                            bs=100,
                            cbi=cbi)
stab = standardizeStabilityData(stab_range)


Data loaded.
Number of rows: 64
Number of columns: 500


Processing metric: 1(1)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 2(2)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 3(3)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 4(4)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 5(5)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 6(6)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 7(7)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 8(8)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 9(9)
	Calculation of k = 3


	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 73(73)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 74(74)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 75(75)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 76(76)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 77(77)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 78(78)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 79(79)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 80(80)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6


	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 144(144)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 145(145)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 146(146)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 147(147)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 148(148)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 149(149)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 150(150)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 151(151)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Proces

	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 215(215)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 216(216)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 217(217)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 218(218)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 219(219)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 220(220)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 221(221)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 222(222)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calcu

Processing metric: 285(285)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 286(286)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 287(287)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 288(288)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 289(289)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 290(290)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 291(291)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 292(292)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 293(293)
	Calculation of k = 3
	Calculation of k = 4


	Calculation of k = 6
Processing metric: 356(356)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 357(357)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 358(358)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 359(359)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 360(360)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 361(361)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 362(362)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 363(363)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 364(364)
	Calculation of k = 3


	Calculation of k = 5
	Calculation of k = 6
Processing metric: 427(427)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 428(428)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 429(429)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 430(430)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 431(431)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 432(432)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 433(433)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 434(434)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 435(435)


	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 498(498)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 499(499)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6


Stabilities for each metric and for each $k$ value can be seen in the `stab` dataframe:

In [11]:
head(stab, 10)

| Metric | k_3 | k_4 | k_5 | k_6 |
|---|---|---|---|---|
| 1 | 0.7978412 | 0.65059 | 0.6723714 | 0.76268 |
| 10 | 0.8310416 | 0.6689639 | 0.7643663 | 0.7447169 |
| 100 | 0.8981664 | 0.7936278 | 0.7809988 | 0.7072595 |
| 101 | 0.8425751 | 0.7249831 | 0.7716184 | 0.7360279 |
| 102 | 0.8469411 | 0.7891323 | 0.6730645 | 0.7290758 |
| 103 | 0.8992642 | 0.7846766 | 0.7748154 | 0.7883483 |
| 104 | 0.8180414 | 0.6761582 | 0.8054521 | 0.8640097 |
| 105 | 0.8424271 | 0.8266604 | 0.7490818 | 0.6286293 |
| 106 | 0.8908126 | 0.6269586 | 0.6590911 | 0.5875396 |
| 107 | 0.5450808 | 0.8377965 | 0.8555157 | 0.7682525 |
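
To read such a table programmatically, the following base R sketch picks the most stable $k$ for each metric. It works on a small hand-built data frame whose values are copied from three rows of the output above, so it runs without `evaluomeR`. The names `stab_example` and `most_stable_k` are illustrative only.

```r
# Values copied from three rows (metrics 1, 10 and 107) of the stability table
stab_example <- data.frame(
  k_3 = c(0.7978412, 0.8310416, 0.5450808),
  k_4 = c(0.65059,   0.6689639, 0.8377965),
  k_5 = c(0.6723714, 0.7643663, 0.8555157),
  k_6 = c(0.76268,   0.7447169, 0.7682525),
  row.names = c("1", "10", "107")
)

# For each row (metric), find the column holding the highest stability
best_col <- colnames(stab_example)[apply(stab_example, 1, which.max)]

# Strip the "k_" prefix to recover the numeric k
most_stable_k <- as.integer(sub("k_", "", best_col))
names(most_stable_k) <- rownames(stab_example)
print(most_stable_k)
```

For these three rows the most stable values are $k=3$ for metrics 1 and 10, and $k=5$ for metric 107.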


### Quality <a class="anchor" id="single_qual"></a>
The goodness (quality) analysis follows the same procedure as the stability analysis: the `qualityRange` method returns an [ExperimentList](https://rdrr.io/bioc/MultiAssayExperiment/man/ExperimentList.html), and the `standardizeQualityData` method transforms it into a dataframe.

In [10]:
qual_range = qualityRange(data=nci60, k.range=k.range,
                          cbi=cbi)
qual = standardizeQualityData(qual_range)


Data loaded.
Number of rows: 64
Number of columns: 500


Processing metric: 1(1)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 2(2)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 3(3)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 4(4)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 5(5)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 6(6)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 7(7)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 8(8)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 9(9)
	Calculation of k = 3


	Calculation of k = 5
	Calculation of k = 6
Processing metric: 73(73)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 74(74)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 75(75)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 76(76)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 77(77)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 78(78)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 79(79)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 80(80)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 81(81)
	Calculation of k 

	Calculation of k = 6
Processing metric: 144(144)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 145(145)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 146(146)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 147(147)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 148(148)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 149(149)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 150(150)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 151(151)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 152(152)
	Calculation of k = 3


	Calculation of k = 5
	Calculation of k = 6
Processing metric: 215(215)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 216(216)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 217(217)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 218(218)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 219(219)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 220(220)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 221(221)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 222(222)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 223(223)


	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 286(286)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 287(287)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 288(288)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 289(289)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 290(290)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 291(291)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 292(292)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 293(293)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Proces

	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 357(357)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 358(358)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 359(359)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 360(360)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 361(361)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 362(362)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 363(363)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 364(364)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calcu

Processing metric: 427(427)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 428(428)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 429(429)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 430(430)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 431(431)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 432(432)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 433(433)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 434(434)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 435(435)
	Calculation of k = 3
	Calculation of k = 4


	Calculation of k = 6
Processing metric: 498(498)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6
Processing metric: 499(499)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6


Qualities for each metric and for each $k$ value can be seen in the `qual` dataframe:

In [12]:
head(qual, 10)

| Metric | k_3 | k_4 | k_5 | k_6 |
|---|---|---|---|---|
| 1 | 0.569583 | 0.5037385 | 0.5475433 | 0.574357 |
| 10 | 0.5541852 | 0.5183693 | 0.5928258 | 0.5828366 |
| 100 | 0.6277726 | 0.6035017 | 0.6088439 | 0.5825842 |
| 101 | 0.5540026 | 0.5707742 | 0.5448087 | 0.5240356 |
| 102 | 0.557797 | 0.5500509 | 0.5498305 | 0.569866 |
| 103 | 0.5739099 | 0.5486029 | 0.5918741 | 0.5635921 |
| 104 | 0.5310612 | 0.4930869 | 0.5700545 | 0.6139769 |
| 105 | 0.5604474 | 0.5675506 | 0.5642461 | 0.4991574 |
| 106 | 0.5659673 | 0.501409 | 0.5079362 | 0.487848 |
| 107 | 0.5341215 | 0.596012 | 0.6206004 | 0.6312832 |
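
The log messages of `getOptimalKValue` later in this notebook refer to the quality score as a silhouette width. As a rough illustration of that index, and not of how `evaluomeR` computes it internally, the sketch below clusters a toy one-dimensional metric with base R `kmeans` and averages the silhouette widths returned by `cluster::silhouette` for each $k$. The toy data and the names `x` and `avg_sil` are assumptions made for this example.

```r
library(cluster)  # provides silhouette()

set.seed(13606)
# Toy one-dimensional metric with three separated groups (illustrative data only)
x <- c(rnorm(20, mean = 0, sd = 0.3),
       rnorm(20, mean = 5, sd = 0.3),
       rnorm(20, mean = 10, sd = 0.3))
d <- dist(x)  # pairwise distances required by silhouette()

# Average silhouette width of a k-means partition for each k in 3..6
avg_sil <- sapply(3:6, function(k) {
  clusters <- kmeans(x, centers = k, nstart = 10)$cluster
  mean(silhouette(clusters, d)[, "sil_width"])
})
names(avg_sil) <- paste0("k_", 3:6)
print(avg_sil)
```

Silhouette widths lie in $[-1, 1]$, and higher values indicate more compact, better separated clusters.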


### Optimal K value <a class="anchor" id="single_optimal"></a>
This section shows how to compute the optimal $k$ value of a dataset **per metric**.

In [13]:
k_opt = getOptimalKValue(stab_range, qual_range, k.range=k.range)
optimal_k = as.numeric(k_opt$Global_optimal_k)

Processing metric: 1

	Both Ks have a stable classification: '3', '6'

	Using '6' since it provides higher silhouette width

Processing metric: 2

	Stability k '3' is stable but quality k '6' is not

	Using '3' since it provides higher stability

Processing metric: 3

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 4

	Stability k '3' is stable but quality k '5' is not

	Using '3' since it provides higher stability

Processing metric: 5

	Maximum stability and quality values matches the same K value: '5'

Processing metric: 6

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 7

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 8

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 9

	Stability k '3' is stable but quality k '6' is not

	Using '3' since it provides higher stability

Processing metric: 10

	Both Ks have a stable classifi

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 80

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 81

	Both Ks have a stable classification: '3', '6'

	Using '6' since it provides higher silhouette width

Processing metric: 82

	Both Ks have a stable classification: '3', '6'

	Using '6' since it provides higher silhouette width

Processing metric: 83

	Maximum stability and quality values matches the same K value: '6'

Processing metric: 84

	Maximum stability and quality values matches the same K value: '4'

Processing metric: 85

	Stability k '5' is stable but quality k '6' is not

	Using '5' since it provides higher stability

Processing metric: 86

	Stability k '3' is stable but quality k '4' is not

	Using '3' since it provides higher stability

Processing metric: 87

	Stability k '3' is stable but quality k '4' is not

	Using '3' since it provides higher stability

Processing metric: 88

	Maximum st

Processing metric: 156

	Maximum stability and quality values matches the same K value: '6'

Processing metric: 157

	Maximum stability and quality values matches the same K value: '4'

Processing metric: 158

	Both Ks have a stable classification: '3', '5'

	Using '5' since it provides higher silhouette width

Processing metric: 159

	Stability k '4' is stable but quality k '6' is not

	Using '4' since it provides higher stability

Processing metric: 160

	Maximum stability and quality values matches the same K value: '6'

Processing metric: 161

	Both Ks have a stable classification: '3', '6'

	Using '6' since it provides higher silhouette width

Processing metric: 162

	Stability k '3' is stable but quality k '5' is not

	Using '3' since it provides higher stability

Processing metric: 163

	Stability k '3' is stable but quality k '5' is not

	Using '3' since it provides higher stability

Processing metric: 164

	Maximum stability and quality values matches the same K value: '5'

Pr

Processing metric: 230

	Both Ks have a stable classification: '3', '6'

	Using '6' since it provides higher silhouette width

Processing metric: 231

	Maximum stability and quality values matches the same K value: '4'

Processing metric: 232

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 233

	Both Ks have a stable classification: '3', '6'

	Using '6' since it provides higher silhouette width

Processing metric: 234

	Stability k '3' is stable but quality k '4' is not

	Using '3' since it provides higher stability

Processing metric: 235

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 236

	Both Ks do not have a stable classification: '4', '5'

	Using '5' since it provides higher silhouette width

Processing metric: 237

	Both Ks have a stable classification: '3', '5'

	Using '5' since it provides higher silhouette width

Processing metric: 238

	Maximum stability and quality values matches the same K v

Processing metric: 303

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 304

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 305

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 306

	Maximum stability and quality values matches the same K value: '4'

Processing metric: 307

	Both Ks have a stable classification: '3', '6'

	Using '6' since it provides higher silhouette width

Processing metric: 308

	Stability k '3' is stable but quality k '6' is not

	Using '3' since it provides higher stability

Processing metric: 309

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 310

	Maximum stability and quality values matches the same K value: '5'

Processing metric: 311

	Stability k '3' is stable but quality k '5' is not

	Using '3' since it provides higher stability

Processing metric: 312

	Stability k '3' is stable but quality k '5

Processing metric: 380

	Both Ks do not have a stable classification: '5', '6'

	Using '6' since it provides higher silhouette width

Processing metric: 381

	Maximum stability and quality values matches the same K value: '4'

Processing metric: 382

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 383

	Both Ks have a stable classification: '4', '6'

	Using '6' since it provides higher silhouette width

Processing metric: 384

	Stability k '5' is stable but quality k '6' is not

	Using '5' since it provides higher stability

Processing metric: 385

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 386

	Both Ks have a stable classification: '4', '6'

	Using '6' since it provides higher silhouette width

Processing metric: 387

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 388

	Maximum stability and quality values matches the same K value: '6'

Processing metric: 389


	Both Ks have a stable classification: '3', '5'

	Using '5' since it provides higher silhouette width

Processing metric: 455

	Both Ks have a stable classification: '3', '5'

	Using '5' since it provides higher silhouette width

Processing metric: 456

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 457

	Stability k '3' is stable but quality k '5' is not

	Using '3' since it provides higher stability

Processing metric: 458

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 459

	Both Ks have a stable classification: '3', '6'

	Using '6' since it provides higher silhouette width

Processing metric: 460

	Maximum stability and quality values matches the same K value: '5'

Processing metric: 461

	Maximum stability and quality values matches the same K value: '3'

Processing metric: 462

	Maximum stability and quality values matches the same K value: '5'

Processing metric: 463

	Both Ks have a stable classif

The following table shows, for each metric, the $k$ with the highest stability (`Stability_max_k`), the $k$ with the highest goodness or quality (`Quality_max_k`), and the overall $k$ that `evaluomeR` selects for that metric (`Global_optimal_k`).

In [14]:
head(k_opt, 10)

| Metric | Stability_max_k | Stability_max_k_stab | Stability_max_k_qual | Quality_max_k | Quality_max_k_stab | Quality_max_k_qual | Global_optimal_k |
|---|---|---|---|---|---|---|---|
| 1 | 3 | 0.7978412 | 0.569583 | 6 | 0.76268 | 0.574357 | 6 |
| 2 | 3 | 0.7903263 | 0.5609821 | 6 | 0.6903602 | 0.6005832 | 3 |
| 3 | 3 | 0.7816762 | 0.5523531 | 3 | 0.7816762 | 0.5523531 | 3 |
| 4 | 3 | 0.7888597 | 0.6147062 | 5 | 0.6697399 | 0.6149489 | 3 |
| 5 | 5 | 0.8061986 | 0.5616904 | 5 | 0.8061986 | 0.5616904 | 5 |
| 6 | 3 | 0.9137646 | 0.6170207 | 3 | 0.9137646 | 0.6170207 | 3 |
| 7 | 3 | 0.9116126 | 0.5920057 | 3 | 0.9116126 | 0.5920057 | 3 |
| 8 | 3 | 0.9448024 | 0.5850644 | 3 | 0.9448024 | 0.5850644 | 3 |
| 9 | 3 | 0.9576668 | 0.614336 | 6 | 0.7434382 | 0.628288 | 3 |
| 10 | 3 | 0.8310416 | 0.5541852 | 5 | 0.7643663 | 0.5928258 | 5 |
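
The decisions printed in the log can be mimicked with a small helper, sketched here under stated assumptions. `choose_k` is a hypothetical function, not part of `evaluomeR`. The rule is inferred only from the log messages above, and the 0.75 stability threshold is an assumption that happens to agree with the rows shown in this table.

```r
# Hypothetical helper, NOT part of evaluomeR: reproduces the logged decisions.
# ASSUMPTION: a classification is "stable" when its stability exceeds 0.75.
choose_k <- function(stab_k, stab_k_stab, stab_k_qual,
                     qual_k, qual_k_stab, qual_k_qual,
                     threshold = 0.75) {
  # Maximum stability and maximum quality point to the same k
  if (stab_k == qual_k) return(stab_k)
  stab_ok <- stab_k_stab > threshold
  qual_ok <- qual_k_stab > threshold
  # Only the most stable k is stable: keep it
  if (stab_ok && !qual_ok) return(stab_k)
  # Both stable or both unstable: prefer the higher silhouette width
  if (qual_k_qual >= stab_k_qual) qual_k else stab_k
}

# Rows copied from the table above (metrics 1, 2, 3, 9 and 10)
k1  <- choose_k(3, 0.7978412, 0.569583,  6, 0.76268,   0.574357)
k2  <- choose_k(3, 0.7903263, 0.5609821, 6, 0.6903602, 0.6005832)
k3  <- choose_k(3, 0.7816762, 0.5523531, 3, 0.7816762, 0.5523531)
k9  <- choose_k(3, 0.9576668, 0.614336,  6, 0.7434382, 0.628288)
k10 <- choose_k(3, 0.8310416, 0.5541852, 5, 0.7643663, 0.5928258)
print(c(k1, k2, k3, k9, k10))
```

With these inputs the helper returns 6, 3, 3, 3 and 5, matching the `Global_optimal_k` column for the same metrics.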


# Multidimensional optimal k analysis <a class="anchor" id="all"></a>
This section outlines how to determine a single optimal $k$ value for the dataset as a whole, rather than one per metric. Once again we consider a $k$ range such as $k \in [3, 6]$, excluding $k = 2$ to avoid binary classifications. The CBI *kmeans* method is used as the default clustering algorithm.

### Stability <a class="anchor" id="all_stab"></a>
Here we set the parameter `all_metrics=TRUE` in order to consider the stability of all the metrics as a whole.

In [15]:
stab_range = stabilityRange(data=nci60, k.range=k.range,
                            all_metrics=TRUE,
                            bs=100,
                            cbi=cbi)
stab = standardizeStabilityData(stab_range)
stab


Data loaded.
Number of rows: 64
Number of columns: 500


Processing all metrics, 'merge', in dataframe (499)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6


| Metric | k_3 | k_4 | k_5 | k_6 |
|---|---|---|---|---|
| all_metrics | 0.6342906 | 0.6680511 | 0.5301713 | 0.5237286 |


The table above does not list the individual metrics. Instead, it shows a combined variable named `all_metrics`, which represents the overall stability of the metrics across the dataset.

### Quality <a class="anchor" id="all_qual"></a>
As in the [Stability](#all_stab) section, the parameter `all_metrics=TRUE` must be set in order to consider all the metrics as a whole.

In [16]:
qual_range = qualityRange(data=nci60, k.range=k.range,
                          all_metrics=TRUE,
                          cbi=cbi)
qual = standardizeQualityData(qual_range)
qual


Data loaded.
Number of rows: 64
Number of columns: 500


Processing all metrics, 'merge', in dataframe (499)
	Calculation of k = 3
	Calculation of k = 4
	Calculation of k = 5
	Calculation of k = 6


| Metric | k_3 | k_4 | k_5 | k_6 |
|---|---|---|---|---|
| all_metrics | 0.1181279 | 0.1059742 | 0.1063796 | 0.08479927 |


### Optimal K value <a class="anchor" id="all_optimal"></a>
This section shows how to compute the optimal $k$ value considering all the metrics from **the whole dataset**.

In [17]:
k_opt = getOptimalKValue(stab_range, qual_range, k.range=k.range)
optimal_k = as.numeric(k_opt$Global_optimal_k)
k_opt

Processing metric: all_metrics

	Both Ks do not have a stable classification: '4', '3'

	Using '3' since it provides higher silhouette width



| Metric | Stability_max_k | Stability_max_k_stab | Stability_max_k_qual | Quality_max_k | Quality_max_k_stab | Quality_max_k_qual | Global_optimal_k |
|---|---|---|---|---|---|---|---|
| all_metrics | 4 | 0.6680511 | 0.1059742 | 3 | 0.6342906 | 0.1181279 | 3 |


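According to the `Global_optimal_k` column of the previous table, `evaluomeR` considers $k=3$ the optimal value for the given dataset. As the log shows, neither candidate ($k=4$ for stability, $k=3$ for quality) yields a stable classification, so $k=3$ is selected because it provides the higher silhouette width.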