**StatTOPSIS**

StatTOPSIS integrates the traditional Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) with statistical methods to enhance decision-making robustness. The approach combines multi-criteria decision analysis with sensitivity analysis, bootstrapping, and non-parametric statistical tests to ensure comprehensive evaluation and reliable rankings of alternatives.

Import necessary Python libraries

In [62]:
import numpy as np
import pandas as pd
from scipy.stats import rankdata, friedmanchisquare
import seaborn as sns

# Set display options to show all columns
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)

In [63]:
# Define the TOPSIS function with epsilon to avoid division by zero
def topsis(raw_data, weights, benefit_categories, epsilon=1e-10):
    m, n = raw_data.shape
    # Normalize the raw data
    divisors = np.sqrt(np.sum(raw_data ** 2, axis=0))
    normalized_data = raw_data / divisors

    # Apply weights
    weighted_data = normalized_data * weights

    # Determine Ideal and Negative Ideal Solutions
    ideal_solution = np.zeros(n)
    negative_ideal_solution = np.zeros(n)
    for j in range(n):
        if j in benefit_categories:
            ideal_solution[j] = np.max(weighted_data[:, j])
            negative_ideal_solution[j] = np.min(weighted_data[:, j])
        else:
            ideal_solution[j] = np.min(weighted_data[:, j])
            negative_ideal_solution[j] = np.max(weighted_data[:, j])

    # Calculate distances
    dist_to_ideal = np.sqrt(np.sum((weighted_data - ideal_solution) ** 2, axis=1))
    dist_to_negative_ideal = np.sqrt(np.sum((weighted_data - negative_ideal_solution) ** 2, axis=1))

    # Calculate TOPSIS scores with epsilon to prevent division by zero
    scores = dist_to_negative_ideal / (dist_to_ideal + dist_to_negative_ideal + epsilon)
    return scores

**Identification of Criteria and Weights**

Define the decision matrix, criteria, and weights for the alternatives.

In [64]:
# Identification of Criteria and Weights
categories = np.array(["Maximum Memory (GB)", "Supported Languages", "Free Tier (million invocations)", "On-Demand USD Cost (per million invocations)"])
alternatives = np.array(["AWS Lambda", "Microsoft Azure Functions", "Google Cloud Functions", "Oracle Cloud Functions"])
raw_data = np.array([
    [10, 8, 1, 0.20],
    [15, 7, 1, 0.20],
    [32, 7, 2, 0.40],
    [2, 6, 2, 0.20],
])

initial_weights = np.array([0.25, 0.25, 0.25, 0.25])
# Column indices of benefit criteria (higher is better); column 3 (cost) is a cost criterion
benefit_categories = {0, 1, 2}

# Display raw data and weights
raw_data_df = pd.DataFrame(data=raw_data, index=alternatives, columns=categories)
weights_df = pd.DataFrame(data=initial_weights, index=categories, columns=["Weights"])

print("Raw Data:")
display(raw_data_df)
print("Initial Weights:")
display(weights_df)

Raw Data:


Alternative,Maximum Memory (GB),Supported Languages,Free Tier (million invocations),On-Demand USD Cost (per million invocations)
AWS Lambda,10.0,8.0,1.0,0.2
Microsoft Azure Functions,15.0,7.0,1.0,0.2
Google Cloud Functions,32.0,7.0,2.0,0.4
Oracle Cloud Functions,2.0,6.0,2.0,0.2


Initial Weights:


Criterion,Weights
Maximum Memory (GB),0.25
Supported Languages,0.25
Free Tier (million invocations),0.25
On-Demand USD Cost (per million invocations),0.25


**Normalization of Data**

Normalization brings all criteria to a common, dimensionless scale so that criteria measured in different units can be compared. This notebook uses vector (Euclidean) normalization, the usual choice for TOPSIS: each value is divided by the square root of the sum of squares of its column, which for non-negative data yields values between 0 and 1. Other techniques, such as min-max or z-score normalization, can be applied depending on the nature of the data, although z-scores are not confined to that range.
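As a small illustration, the sketch below applies vector normalization and min-max normalization to the "Maximum Memory (GB)" column of the decision matrix. The variable names are chosen for this example only and are not used elsewhere in the notebook.

```python
import numpy as np

# "Maximum Memory (GB)" column from the decision matrix
column = np.array([10.0, 15.0, 32.0, 2.0])

# Vector (Euclidean) normalization: divide by the column's L2 norm
vector_norm = column / np.sqrt(np.sum(column ** 2))

# Min-max normalization: rescale so the smallest value maps to 0 and the largest to 1
min_max = (column - column.min()) / (column.max() - column.min())

print("vector :", vector_norm.round(6))
print("min-max:", min_max.round(6))
```

Vector normalization preserves the ratios between values, whereas min-max always maps the worst value of a column to exactly 0, which can exaggerate small differences.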


In [65]:
# Normalize the raw data
m, n = raw_data.shape
divisors = np.empty(n)
for j in range(n):
    column = raw_data[:, j]
    divisors[j] = np.sqrt(column @ column)
normalized_data = raw_data / divisors

# Normalize the weights to ensure that they sum up to 1
weights = initial_weights / np.sum(initial_weights)

normalized_data_df = pd.DataFrame(data=normalized_data, index=alternatives, columns=categories)

print("Normalized Data:")
display(normalized_data_df)

Normalized Data:


Alternative,Maximum Memory (GB),Supported Languages,Free Tier (million invocations),On-Demand USD Cost (per million invocations)
AWS Lambda,0.271864,0.568535,0.316228,0.377964
Microsoft Azure Functions,0.407795,0.497468,0.316228,0.377964
Google Cloud Functions,0.869964,0.497468,0.632456,0.755929
Oracle Cloud Functions,0.054373,0.426401,0.632456,0.377964


The weights were normalized above so that they sum to 1. Multiplying each normalized column by its weight gives the weighted normalized decision matrix.

In [66]:
# Weighted normalized decision matrix
weighted_data = normalized_data * weights

weighted_data_df = pd.DataFrame(data=weighted_data, index=alternatives, columns=categories)

print("Weighted Normalized Data:")
display(weighted_data_df)

Weighted Normalized Data:


Alternative,Maximum Memory (GB),Supported Languages,Free Tier (million invocations),On-Demand USD Cost (per million invocations)
AWS Lambda,0.067966,0.142134,0.079057,0.094491
Microsoft Azure Functions,0.101949,0.124367,0.079057,0.094491
Google Cloud Functions,0.217491,0.124367,0.158114,0.188982
Oracle Cloud Functions,0.013593,0.1066,0.158114,0.094491


**Determination of Ideal Solution and Negative Ideal Solution**

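The ideal solution (A+) and the negative ideal solution are the reference points against which the alternatives are evaluated. The ideal solution takes the best value of each criterion and the negative ideal solution the worst. Here criteria 0 to 2 are benefit criteria (higher is better) and criterion 3, the on-demand cost, is a cost criterion (lower is better).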
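The same selection can be sketched without an explicit loop by using a boolean mask over the criteria. The helper name `ideal_points` and the mask `benefit_mask` are assumptions made for this example; the input values are the weighted normalized data shown above.

```python
import numpy as np

def ideal_points(weighted, benefit_mask):
    """Return (ideal, negative_ideal) for a weighted normalized matrix.

    benefit_mask[j] is True when criterion j is a benefit criterion.
    """
    col_max = weighted.max(axis=0)
    col_min = weighted.min(axis=0)
    # Benefit criteria: best = max, worst = min. Cost criteria: the reverse.
    ideal = np.where(benefit_mask, col_max, col_min)
    negative_ideal = np.where(benefit_mask, col_min, col_max)
    return ideal, negative_ideal

weighted = np.array([
    [0.067966, 0.142134, 0.079057, 0.094491],
    [0.101949, 0.124367, 0.079057, 0.094491],
    [0.217491, 0.124367, 0.158114, 0.188982],
    [0.013593, 0.106600, 0.158114, 0.094491],
])
benefit_mask = np.array([True, True, True, False])

ideal, negative_ideal = ideal_points(weighted, benefit_mask)
print("A+:", ideal)
print("A-:", negative_ideal)
```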

In [67]:
# Determine the Ideal and Negative Ideal Solutions
a_pos = np.zeros(n)
a_neg = np.zeros(n)
for j in range(n):
    column = weighted_data[:, j]
    max_val = np.max(column)
    min_val = np.min(column)

    if j in benefit_categories:
        a_pos[j] = max_val
        a_neg[j] = min_val
    else:
        a_pos[j] = min_val
        a_neg[j] = max_val

ideal_df = pd.DataFrame(data=[a_pos, a_neg], index=["A+", "Negative"], columns=categories)
print("Ideal and Negative Ideal Solutions:")
display(ideal_df)

Ideal and Negative Ideal Solutions:


Solution,Maximum Memory (GB),Supported Languages,Free Tier (million invocations),On-Demand USD Cost (per million invocations)
A+,0.217491,0.142134,0.158114,0.094491
Negative,0.013593,0.1066,0.079057,0.188982


**Calculation of Similarity Scores**

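The core of TOPSIS is the similarity score of each alternative with respect to the ideal and negative ideal solutions. The ideal solution holds the best value of each criterion (the maximum for benefit criteria, the minimum for cost criteria) and the negative ideal solution the worst. For each alternative, S+ is the Euclidean distance to the ideal solution, S- is the distance to the negative ideal solution, and Ci = S- / (S+ + S-) is the relative closeness: values near 1 indicate an alternative that is close to the ideal and far from the negative ideal.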
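The similarity score can be written compactly as follows. The notation is an assumption made for illustration: $v_{ij}$ is the weighted normalized value of alternative $i$ on criterion $j$, and $a_j^{+}$, $a_j^{-}$ are the ideal and negative ideal values of criterion $j$.

$$
S_i^{+} = \sqrt{\sum_{j=1}^{n} \left(v_{ij} - a_j^{+}\right)^2}, \qquad
S_i^{-} = \sqrt{\sum_{j=1}^{n} \left(v_{ij} - a_j^{-}\right)^2}, \qquad
C_i = \frac{S_i^{-}}{S_i^{+} + S_i^{-}}
$$

For AWS Lambda, the values reported in this notebook are $S^{+} = 0.169138$ and $S^{-} = 0.114663$, so $C = 0.114663 / (0.169138 + 0.114663) \approx 0.404$.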

In [None]:
# Calculate the similarity scores
sp = np.zeros(m)
sn = np.zeros(m)
cs = np.zeros(m)

for i in range(m):
    diff_pos = weighted_data[i] - a_pos
    diff_neg = weighted_data[i] - a_neg
    sp[i] = np.sqrt(diff_pos @ diff_pos)
    sn[i] = np.sqrt(diff_neg @ diff_neg)
    cs[i] = sn[i] / (sp[i] + sn[i])

similarity_scores_df = pd.DataFrame(data=zip(sp, sn, cs), index=alternatives, columns=["S+", "S-", "Ci"])
print("Similarity Scores:")
display(similarity_scores_df)

Similarity Scores:


Alternative,S+,S-,Ci
AWS Lambda,0.169138,0.114663,0.404026
Microsoft Azure Functions,0.141123,0.130579,0.480597
Google Cloud Functions,0.096147,0.219408,0.695309
Oracle Cloud Functions,0.206971,0.123201,0.373143


**Ranking of Alternatives**

The final step ranks the alternatives by their relative closeness score in descending order: the alternative with the highest score is closest to the ideal solution and farthest from the negative ideal solution, and receives rank 1.
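As a minimal sketch, ranks can also be derived with NumPy alone by sorting the scores in descending order. The scores are the Ci values computed above; the variable names are chosen for this example. Unlike `rankdata`, whose default method assigns tied scores their average rank, `argsort` breaks ties arbitrarily, so this variant assumes all scores are distinct.

```python
import numpy as np

names = ["AWS Lambda", "Microsoft Azure Functions",
         "Google Cloud Functions", "Oracle Cloud Functions"]
scores = np.array([0.404026, 0.480597, 0.695309, 0.373143])

# Indices of the alternatives from best (highest score) to worst
order = np.argsort(-scores)

# Invert the permutation: ranks[i] is the 1-based rank of alternative i
ranks = np.empty(len(scores), dtype=int)
ranks[order] = np.arange(1, len(scores) + 1)

for i in order:
    print(ranks[i], names[i], scores[i])
```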

In [None]:
# Ranking of alternatives
initial_ranks = rankdata(-cs)
ranking_df = pd.DataFrame(data=zip(cs, initial_ranks), index=alternatives, columns=["TOPSIS Score", "Initial Rank"]).sort_values(by="Initial Rank")
print("Initial Ranking of Alternatives (Descending Order):")
display(ranking_df)

Initial Ranking of Alternatives (Descending Order):


Alternative,TOPSIS Score,Initial Rank
Google Cloud Functions,0.695309,1.0
Microsoft Azure Functions,0.480597,2.0
AWS Lambda,0.404026,3.0
Oracle Cloud Functions,0.373143,4.0


**Sensitivity Analysis**

Sensitivity analysis in the context of TOPSIS is performed to evaluate the robustness of the rankings by examining how variations in the criteria weights affect the results. This analysis ensures that the final rankings are reliable and not overly sensitive to changes in the assigned weights.

In [None]:
# Sensitivity Analysis: Varying weights for each criterion
def sensitivity_analysis(raw_data, initial_weights, benefit_categories, alternatives):
    sensitivities = {}
    # Obtain initial ranking with current weights
    base_scores = topsis(raw_data, initial_weights, benefit_categories)
    base_ranking = rankdata(-base_scores)

    for i in range(len(initial_weights)):
        # NOTE: the copy is made once per criterion and renormalized in place below,
        # so each delta after the first is applied to already-rescaled weights.
        altered_weights = initial_weights.copy()
        for delta in np.linspace(-0.1, 0.1, 5):  # shift the weight by -0.10 to +0.10 (absolute, not percent)
            if 0 <= initial_weights[i] + delta <= 1:
                altered_weights[i] = initial_weights[i] + delta
                # Renormalize so the weights sum to 1
                altered_weights /= np.sum(altered_weights)
                scores = topsis(raw_data, altered_weights, benefit_categories)
                ranking = rankdata(-scores)
                # Store the ranking keyed by (criterion index, delta)
                sensitivity_key = (i, delta)
                sensitivities[sensitivity_key] = pd.Series(ranking, index=alternatives)

    # Convert sensitivity results to DataFrame and align columns with initial ranking
    sensitivity_df = pd.DataFrame(sensitivities).T
    sensitivity_df.columns = alternatives  # Ensure correct column names for alternatives
    sensitivity_df.index.names = ['Criterion', 'Delta']
    sensitivity_df = sensitivity_df[ranking_df.sort_values("Initial Rank").index]

    return sensitivity_df

# Perform sensitivity analysis
sensitivity_df = sensitivity_analysis(raw_data, initial_weights, benefit_categories, alternatives)

print("Sensitivity Analysis:")
display(sensitivity_df)

Sensitivity Analysis:


Criterion,Delta,Google Cloud Functions,Microsoft Azure Functions,AWS Lambda,Oracle Cloud Functions
0,-0.1,1.0,2.0,4.0,3.0
0,-0.05,1.0,2.0,4.0,3.0
0,0.0,1.0,2.0,3.0,4.0
0,0.05,1.0,2.0,3.0,4.0
0,0.1,1.0,2.0,3.0,4.0
1,-0.1,1.0,2.0,3.0,4.0
1,-0.05,1.0,2.0,3.0,4.0
1,0.0,1.0,2.0,3.0,4.0
1,0.05,1.0,2.0,3.0,4.0
1,0.1,1.0,2.0,3.0,4.0
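In the loop above, `altered_weights` is copied once per criterion and then renormalized in place, so every delta after the first is applied on top of weights that were already rescaled. A possible variant, sketched here under the assumption that each perturbation should start from the original weights, builds a fresh copy for every delta. The helper name `perturb_weights` is hypothetical and not part of the notebook.

```python
import numpy as np

def perturb_weights(weights, index, delta):
    """Return a renormalized copy of `weights` with weights[index] shifted by `delta`.

    The input array is left untouched, so successive calls do not compound.
    """
    altered = np.array(weights, dtype=float)  # fresh copy on every call
    altered[index] += delta
    if altered[index] < 0:
        raise ValueError("perturbed weight would be negative")
    return altered / altered.sum()

base = np.array([0.25, 0.25, 0.25, 0.25])
for delta in np.linspace(-0.1, 0.1, 5):
    print(round(float(delta), 2), perturb_weights(base, 0, delta).round(4))
```

Rankings obtained with such a variant may differ from the table above, because the weights passed to `topsis` are no longer the same for every delta.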


**Bootstrapping Analysis**

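Bootstrapping analysis evaluates the variability and stability of the TOPSIS rankings: the rows of the decision matrix (the alternatives) are resampled with replacement, and the TOPSIS scores are recalculated for each resample. The spread of the resulting ranks indicates how robust the decision outcome is. Note that position k of a resample holds whichever alternative was drawn there, not necessarily alternative k, so the per-column rank statistics computed below describe positions in the resample rather than specific alternatives. This is why every mean rank comes out close to 2.5.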

In [None]:
# Bootstrapping Analysis: Generating bootstrap samples and calculating TOPSIS scores
def bootstrap_analysis(raw_data, initial_weights, benefit_categories, num_samples=1000):
    m, n = raw_data.shape
    bootstrap_scores = np.zeros((num_samples, m))

    for i in range(num_samples):
        bootstrap_sample_indices = np.random.choice(m, m, replace=True)
        bootstrap_sample = raw_data[bootstrap_sample_indices]
        bootstrap_scores[i] = topsis(bootstrap_sample, initial_weights, benefit_categories)

    return bootstrap_scores

bootstrap_scores = bootstrap_analysis(raw_data, initial_weights, benefit_categories)

# Analyzing the bootstrap results
bootstrap_ranks = np.array([rankdata(-scores) for scores in bootstrap_scores])
bootstrap_mean_ranks = np.mean(bootstrap_ranks, axis=0)
bootstrap_rank_intervals = np.percentile(bootstrap_ranks, [2.5, 97.5], axis=0)

# Display bootstrap analysis results
bootstrap_df = pd.DataFrame({
    "TOPSIS Score": topsis(raw_data, initial_weights, benefit_categories),
    "Initial Rank": initial_ranks,
    "Mean Rank": bootstrap_mean_ranks,
    "2.5% Rank": bootstrap_rank_intervals[0],
    "97.5% Rank": bootstrap_rank_intervals[1]
}, index=alternatives).sort_values(by="Initial Rank")

print("Bootstrap Analysis Results (Descending Order):")
display(bootstrap_df)

Bootstrap Analysis Results (Descending Order):


Alternative,TOPSIS Score,Initial Rank,Mean Rank,2.5% Rank,97.5% Rank
Google Cloud Functions,0.695309,1.0,2.4915,1.0,4.0
Microsoft Azure Functions,0.480597,2.0,2.504,1.0,4.0
AWS Lambda,0.404026,3.0,2.484,1.0,4.0
Oracle Cloud Functions,0.373143,4.0,2.5205,1.0,4.0
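In the cell above, column k of `bootstrap_scores` holds the score of whichever alternative was drawn into position k. One way to attribute scores to the original alternatives, sketched here as an assumption about the intended analysis rather than as the notebook's method, is to use the sampled indices to accumulate each score under the alternative it belongs to. The names `topsis_scores` and `bootstrap_by_alternative` are hypothetical; `topsis_scores` is a compact vectorized restatement of the TOPSIS steps used in this notebook.

```python
import numpy as np

def topsis_scores(data, weights, benefit_mask, eps=1e-10):
    """Vector-normalized TOPSIS closeness scores for the rows of `data`."""
    v = data / np.sqrt((data ** 2).sum(axis=0)) * weights
    ideal = np.where(benefit_mask, v.max(axis=0), v.min(axis=0))
    negative_ideal = np.where(benefit_mask, v.min(axis=0), v.max(axis=0))
    d_pos = np.sqrt(((v - ideal) ** 2).sum(axis=1))
    d_neg = np.sqrt(((v - negative_ideal) ** 2).sum(axis=1))
    return d_neg / (d_pos + d_neg + eps)

def bootstrap_by_alternative(data, weights, benefit_mask, num_samples=1000, seed=0):
    """Mean bootstrap score per ORIGINAL alternative (row of `data`)."""
    rng = np.random.default_rng(seed)
    m = data.shape[0]
    totals = np.zeros(m)
    counts = np.zeros(m)
    for _ in range(num_samples):
        idx = rng.integers(0, m, size=m)          # rows drawn with replacement
        scores = topsis_scores(data[idx], weights, benefit_mask)
        np.add.at(totals, idx, scores)            # credit each score to its source row
        np.add.at(counts, idx, 1)
    return totals / np.maximum(counts, 1)         # avoid 0/0 for rows never drawn

raw = np.array([
    [10, 8, 1, 0.20],
    [15, 7, 1, 0.20],
    [32, 7, 2, 0.40],
    [2, 6, 2, 0.20],
])
w = np.array([0.25, 0.25, 0.25, 0.25])
mask = np.array([True, True, True, False])

print(bootstrap_by_alternative(raw, w, mask).round(4))
```

With only four alternatives, many resamples contain duplicated rows or omit an alternative entirely, so the resulting averages should be read as a rough stability indication only.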


**Non-Parametric Tests**

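Non-parametric tests are used to evaluate the statistical significance of the differences in rankings obtained from the bootstrapping analysis. These tests do not assume a specific distribution for the data and are particularly useful for analyzing ordinal rankings. The Friedman test applied here treats each bootstrap resample as a block and each rank column as a treatment: a small p-value indicates that at least one column is ranked systematically differently from the others, while a large p-value gives no evidence of a difference.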
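To make the statistic concrete, the sketch below computes the Friedman chi-square value directly from a table of within-block ranks. It assumes there are no tied ranks and returns only the statistic; SciPy's `friedmanchisquare`, used in the next cell, additionally corrects for ties and returns a p-value. The function name `friedman_statistic` and the toy rank tables are made up for this example.

```python
import numpy as np

def friedman_statistic(ranks):
    """Friedman chi-square statistic for an (n_blocks, k_treatments) array of
    within-block ranks. Assumes no ties (no tie correction is applied)."""
    ranks = np.asarray(ranks, dtype=float)
    n, k = ranks.shape
    rank_sums = ranks.sum(axis=0)  # total rank received by each treatment
    return 12.0 / (n * k * (k + 1)) * np.sum(rank_sums ** 2) - 3.0 * n * (k + 1)

# Every block orders the three treatments the same way:
# the statistic reaches its maximum n * (k - 1) = 6, up to rounding
consistent = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

# Each treatment receives each rank exactly once: the statistic is 0, up to rounding
balanced = np.array([[1, 2, 3], [2, 3, 1], [3, 1, 2]])

print(friedman_statistic(consistent))
print(friedman_statistic(balanced))
```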

In [None]:
# Non-parametric Tests: Friedman Test
def friedman_test(bootstrap_ranks):
    # Perform the Friedman test
    stat, p = friedmanchisquare(*bootstrap_ranks.T)
    return stat, p

# Perform the Friedman test
stat, p = friedman_test(bootstrap_ranks)
print(f"Friedman Test Statistic: {stat}, p-value: {p}")

# Adding Friedman Test p-value to summary table
bootstrap_df["Friedman Test p-value"] = p
print("Final Summary Table with Friedman Test p-value (Descending Order):")
display(bootstrap_df)

Friedman Test Statistic: 0.5583008763379088, p-value: 0.9059084596267788
Final Summary Table with Friedman Test p-value (Descending Order):


Alternative,TOPSIS Score,Initial Rank,Mean Rank,2.5% Rank,97.5% Rank,Friedman Test p-value
Google Cloud Functions,0.695309,1.0,2.4915,1.0,4.0,0.905908
Microsoft Azure Functions,0.480597,2.0,2.504,1.0,4.0,0.905908
AWS Lambda,0.404026,3.0,2.484,1.0,4.0,0.905908
Oracle Cloud Functions,0.373143,4.0,2.5205,1.0,4.0,0.905908
