# Global Sensitivity Analysis for PULPO

In this notebook we show the workflow for the global sensitivity analysis (GSA) in PULPO.

In [None]:
%load_ext autoreload
%autoreload 2
import os
import sys

import numpy as np
import pandas as pd

sys.path.append('../')
from pulpo import pulpo

## 1. Defining the case

The sensitivity analysis is performed on a solution of the following LP:

$$
    \begin{align}
        & \underset{s, slack}{\text{min}}  && z_h \\
        & \text{s.t.}   && \sum_{j}(a_{i,j}\cdot s_j) = f_i && \forall i \\
        &               && s_j^{low} \leq s_j \leq s_j^{high} && \forall j \\
        &               && z_h = \sum_e \sum_j (q_{h,e}\cdot b_{e,j} \cdot s_j) && \forall h \\
    \end{align}
$$
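
To make the formulation concrete, the following minimal sketch evaluates it for a made-up two-process system. All numbers are illustrative assumptions and are not taken from the rice husk database. With a square technosphere matrix and no binding bounds, the balance constraint alone determines $s$, and the impact $z_h$ follows from $q$, $B$ and $s$.

```python
import numpy as np

# Hypothetical toy data (not taken from the rice husk database)
A = np.array([[1.0, -0.2],   # technosphere matrix a_{i,j}
              [0.0,  1.0]])
f = np.array([1.0, 0.5])     # final demand f_i
B = np.array([[2.0, 5.0]])   # one intervention flow, b_{e,j}
q = np.array([1.5])          # characterization factor q_{h,e} of one method h

# Balance constraint: sum_j a_{i,j} * s_j = f_i
s = np.linalg.solve(A, f)

# Impact: z_h = sum_e sum_j q_{h,e} * b_{e,j} * s_j
z = float(q @ B @ s)
print(s, z)
```

In the full problem the choices add degrees of freedom, so $s$ is found by the optimizer rather than by a direct linear solve.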


### 1.1. Creating the problem 


If the rice husk database has not been installed yet, install it first:

In [2]:
# pulpo.install_rice_husk_db()

Set the parameters of the rice husk example used to instantiate PULPO.

In [None]:
project = "rice_husk_example" 
database = "rice_husk_example_db"
method = "('my project', 'climate change')"
notebook_dir = os.path.dirname(os.getcwd())
directory = os.path.join(notebook_dir, 'develop_tests/data')

Create a **PulpoOptimizer** instance. This class is used to interact with the LCI database and solve the optimization problem. It is specified by the project, database, method and directory.

In [None]:
pulpo_worker = pulpo.PulpoOptimizer(project, database, method, directory)

Import LCI data. After initializing the PulpoOptimizer instance, the LCI data is imported from the database.

In [None]:
pulpo_worker.get_lci_data()

Specify the **functional unit**. In this case, the functional unit is 1 Mt of processed rice. PULPO implements a search function (```retrieve_processes```) to find the processes that match the specified reference products (alternatively: keys, process name, region).

In [None]:
rice_factory = pulpo_worker.retrieve_processes(reference_products='Processed rice (in Mt)')

demand = {rice_factory[0]: 1}

Specify the **choices**. Here, the choices are regional 🌐 choices for rice husk collection, and technological ⛏ choices for boiler type selection.

The auxiliary choices are needed to resolve the issue that rice husk, when not used in the boiler, must be burned instead.

(*At this point, just accept this. If you are curious about how this multi-functionality is technically addressed, refer to the paper or reach out.*)

In [None]:
## Rice husk collection
rice_husk_processes = ["Rice husk collection 1",
              "Rice husk collection 2",
              "Rice husk collection 3",
              "Rice husk collection 4",
              "Rice husk collection 5",]
rice_husk_collections = pulpo_worker.retrieve_processes(processes=rice_husk_processes)

In [None]:
## Boilers
boiler_processes = ["Natural gas boiler",
                     "Wood pellet boiler",
                     "Rice husk boiler"]
boilers = pulpo_worker.retrieve_processes(processes=boiler_processes)

In [None]:
## Auxiliary (ignore for now!)
auxiliar_processes = ["Rice husk market",
                       "Burning of rice husk"]
auxiliar = pulpo_worker.retrieve_processes(processes=auxiliar_processes)

In [None]:
## Combine to create the choices dictionary
## For each kind of choice, assign a 'label' (e.g. 'boilers')
## To each possible choice, assign a process capacity. In the 'unconstrained' case, set this value very high (e.g. 1e10, but depends on the scale of the functional unit)
choices = {'Rice Husk (Mt)': {rice_husk_collections[0]: 0.03,
                              rice_husk_collections[1]: 0.03,
                              rice_husk_collections[2]: 0.03,
                              rice_husk_collections[3]: 0.03,
                              rice_husk_collections[4]: 0.03},
           'Thermal Energy (TWh)': {boilers[0]: 1e10,
                                    boilers[1]: 1e10,
                                    boilers[2]: 1e10},
           'Auxiliar': {auxiliar[0]: 1e10,
                        auxiliar[1]: 1e10}}

**Instantiate** and **solve** the optimization model

In [None]:
pulpo_worker.instantiate(choices=choices, demand=demand)
results = pulpo_worker.solve()

In [None]:
import pyomo.environ as pyo
from pyomo.repn.plugins.baron_writer import *
import pandas as pd
from pathlib import Path
from IPython.display import display
def extract_results(instance, project, database, choices, constraints, demand, process_map, process_map_metadata, itervention_map, itervention_map_metadata, directory, name):
    """
    Args:
        instance: The Pyomo model instance.
        project (str): Name of the project.
        database (str): Name of the database.
        choices (dict): Choices for the model.
        constraints (dict): Constraints applied during optimization.
        demand (dict): Demand data used in optimization.
        process_map (dict): Mapping of process IDs to descriptions.
        process_map_metadata (dict): Metadata to the process_map
        itervention_map (dict): Mapping of intervention IDs to descriptions.
        itervention_map_metadata (dict): Metadata of the itervention_map.
        directory (str): Directory to save the results file.
        name (str): Name of the results file.
    """
    # Recover dictionary values
    list_of_vars = []
    for v in instance.component_objects(ctype=pyo.Var, active=True, descend_into=True):
        for e in v._data:
            v._data[e] = pyo.value(v[e])
        list_of_vars.append(v)

    result_data = {}
    inverse_process_map = dict((v, k) for k, v in process_map.items())
    inverse_itervention_map = dict((v, k) for k, v in itervention_map.items())
    # Raw results
    for v in list_of_vars:
        try:
            if str(v) == 'inv_flows' or str(v) == 'inv_vector':
                data = [(k, inverse_itervention_map[k], itervention_map_metadata[k], v) for k, v in v._data.items()]
            else:
                data = [(k, inverse_process_map[k], process_map_metadata[k], v) for k, v in v._data.items()]
            df = pd.DataFrame(data, columns=['ID', 'Process name', "Process metadata", 'Value'])
        except Exception:
            data = [(k, v) for k, v in v._data.items()]
            df = pd.DataFrame(data, columns=['Key', 'Value'])
        df.sort_values(by=['Value'], inplace=True, ascending=False)
        result_data[v.name] = df

    # Normalize database to a list if it is a string
    if isinstance(database, str):
        database = [database]

    # Store the metadata
    result_data["project and db"] = pd.DataFrame([f"{project}__{db}" for db in database])

    choices_data = {}
    for choice in choices:
        i = 0
        temp_dict = []
        for alt in choices[choice]:
            temp_dict.append((alt, i, instance.scaling_vector[process_map[alt.key]]))
            i+=1
        choices_data[(choice, 'Process')] = {'Process ' + str(i): process_map_metadata[process_map[alt.key]] for alt, i, val in temp_dict}
        choices_data[(choice, 'Capacity')] = {'Process ' + str(i): choices[choice][alt] for alt, i, val in temp_dict}
        choices_data[(choice, 'Value')] = {'Process ' + str(i): x for alt, i, x in temp_dict}
    result_data["choices"] = pd.DataFrame(choices_data)

    result_data["demand"] = pd.DataFrame({"demand":{
        process_map_metadata[process_map[key]] if key in process_map else key: demand[key]
        for key in demand
    }})
    result_data["constraints"] = pd.DataFrame({"Demand": {process_map_metadata[process_map[key]]: constraints[key] for key in constraints}})

    return result_data

def save_results(result_data, file_name):
    with pd.ExcelWriter(f"{directory}/results/{file_name}.xlsx") as writer:
        for sheet_name, dataframe in result_data.items():
            dataframe.to_excel(writer, sheet_name=sheet_name)

**Save** and **summarize** the results

In [None]:
result_data = extract_results(pulpo_worker.instance, pulpo_worker.project, pulpo_worker.database, choices, {}, demand,
                            pulpo_worker.lci_data['process_map'], pulpo_worker.lci_data['process_map_metadata'],
                            pulpo_worker.lci_data['intervention_map'], pulpo_worker.lci_data['intervention_map_metadata'],
                            pulpo_worker.directory, "")
result_data
# pulpo_worker.summarize_results(choices=choices, demand=demand, zeroes=True)

In [None]:
save_results(result_data, "new_test_extract_data")

Extract the scaling vector $s$ from the results

In [None]:
s_solution_df = result_data["scaling_vector"]["Value"]
s_solution_df.index = pd.MultiIndex.from_tuples(result_data["scaling_vector"]["Process name"])

## 2. Preparing the data for the sensitivity analysis

**Reformulating the problem for the sensitivity analysis**

We only consider uncertainty in the $B$ and $Q$ parameter matrices. The scaling vector $s$ is fixed to the optimal solution.

We will look at the environmental impact objective:

$$
    e(Q, B) =  Q \cdot B \cdot s
$$
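
As a minimal sketch of this reformulation, the toy example below evaluates $e(Q, B)$ for two hypothetical impact methods and then perturbs a single entry of $B$ while keeping $s$ fixed. The matrices and the helper `impact` are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical toy data: 2 methods, 3 intervention flows, 2 processes
Q = np.array([[1.0, 25.0, 0.0],    # method 1
              [0.0,  0.0, 3.0]])   # method 2
B = np.array([[10.0, 0.0],
              [ 0.1, 0.2],
              [ 0.0, 4.0]])
s = np.array([2.0, 0.5])           # fixed scaling vector from the optimum

def impact(Q, B, s):
    """Return e(Q, B) = Q B s, one entry per impact method."""
    return Q @ B @ s

e_nominal = impact(Q, B, s)

# Perturb a single entry of B by +10 % while keeping s fixed
B_perturbed = B.copy()
B_perturbed[0, 0] *= 1.1
e_perturbed = impact(Q, B_perturbed, s)

print(e_nominal, e_perturbed)
```

Only the first method reacts to the perturbation, because the perturbed flow has no characterization factor in the second method.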

### 2.1. Preparing the sampling of the parameter space

#### 2.1.1. Extracting the intervention and characterization matrices

Extracting the intervention matrix

In [None]:
intervention_matrix_df = pd.DataFrame.sparse.from_spmatrix(
    pulpo_worker.lci_data["intervention_matrix"], 
    index=pulpo_worker.lci_data["intervention_map"],
    columns=pulpo_worker.lci_data["process_map"]
    )

Extracting the characterization matrix

In [None]:
# Creating a fake multi method case for testing Q
key = list(pulpo_worker.lci_data["matrices"].keys())[0]
Q_dict = pulpo_worker.lci_data["matrices"].copy()
Q_dict["copy "+key] = pulpo_worker.lci_data["matrices"][key]
Q_dict

In [None]:
characterization_matrix_dfs = []
for method, characterization_matrix in Q_dict.items():
    method_index = {((method,) + intervention_flow):index for intervention_flow, index in pulpo_worker.lci_data["intervention_map"].items()}
    characterization_matrix_df = pd.DataFrame.sparse.from_spmatrix(
        characterization_matrix,
        index=method_index,
        columns=pulpo_worker.lci_data["intervention_map"]
    )
    characterization_matrix_dfs.append(characterization_matrix_df)
multi_characterization_matrix_df = pd.concat(characterization_matrix_dfs, axis=0)

### 2.2. Filtering out the biosphere flows $B_{i,j}$ that have a negligible impact

Calculate $Q\cdot B$

In [None]:
env_cost_matrix_df = multi_characterization_matrix_df.dot(intervention_matrix_df)

Calculate $Q\cdot B \cdot s$ element-wise, scaling each column $j$ of $Q\cdot B$ by $s_j$, to get the impact contribution of each $B_{i,j}$

In [None]:
impacts_matrix_df = env_cost_matrix_df.mul(s_solution_df, axis=1)
impacts_matrix_df_stacked = impacts_matrix_df.melt(ignore_index=False)
melted_index = pd.MultiIndex.from_frame(
    pd.concat(
        [
            impacts_matrix_df_stacked.index.to_frame(), 
            impacts_matrix_df_stacked[['variable_0','variable_1']]
        ], 
        axis=1
        )
)
impacts_matrix_df_stacked = impacts_matrix_df_stacked.drop(
    columns=['variable_0','variable_1']
    ).set_index(melted_index)
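
The reshaping step above can be illustrated on a small frame. This is a sketch with made-up labels; it uses `set_index(..., append=True)` as a more compact variant, which should produce the same index levels as the `pd.concat` construction but is not the notebook's own code.

```python
import pandas as pd

# Hypothetical 2x2 matrix with MultiIndex rows (flows) and columns (processes)
rows = pd.MultiIndex.from_tuples([("biosphere", "CO2"), ("biosphere", "CH4")])
cols = pd.MultiIndex.from_tuples([("db", "boiler"), ("db", "field")])
df = pd.DataFrame([[1.0, 0.0], [0.5, 2.0]], index=rows, columns=cols)

# melt keeps the row index (ignore_index=False) and moves the unnamed column
# levels into the helper columns 'variable_0' and 'variable_1'
stacked = df.melt(ignore_index=False)

# Append the helper columns to the index, so that every value is addressed by
# (flow levels..., process levels...)
stacked = stacked.set_index(["variable_0", "variable_1"], append=True)
print(stacked)
```

Each row of the stacked frame now corresponds to exactly one matrix element, which is the form needed to treat single elements as uncertain parameters.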

Find the reduced set of intervention flows by removing all flows whose absolute contribution $|QBs_{i,j}|$ is lower than $10^n$. The exponent $n$ is determined iteratively by checking how much the total impact changes when the flows with $|QBs_{i,j}| < 10^n$ are removed. The total impact should not change by more than 1%.

This is done for every impact category.

In [None]:
print("The total impact of the QBs with all biosphere flows is:")
for method, impact_value in impacts_matrix_df_stacked.groupby(level=0).sum().sum(axis=1).items():
    print("{}: {:.6E}".format(method, impact_value))
# droplevel drops the method descriptor and unique removes all duplicate intervention flows, since the intervention flows have been joined on the methods
print("With {} biosphere flows not equal to zero".format(impacts_matrix_df_stacked[impacts_matrix_df_stacked != 0].dropna().index.droplevel(0).unique().shape[0]))

In [None]:
n_B = -2

In [None]:
impacts_matrix_red_list = []
for method, impacts_matrix in impacts_matrix_df_stacked.groupby(level=0):
    impacts_matrix_gr0 = impacts_matrix[impacts_matrix != 0].dropna()
    impacts_matrix_log10 = np.log10(impacts_matrix_gr0.abs())
    print("The total impact of the reduced QBs is:")
    impacts_matrix_red = impacts_matrix_gr0[impacts_matrix_log10>n_B].dropna()
    total_QBs_red = impacts_matrix_red.sum().values[0]
    print("{:.6E}".format(total_QBs_red))
    print("This results in a percentage change of the total impact:")
    print("{:.2}%".format(100-total_QBs_red/impacts_matrix.sum().sum()*100))
    print("With {} biosphere flows".format(impacts_matrix_red.shape[0]))
    impacts_matrix_red_list.append(impacts_matrix_red)
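
The notebook sets the exponent `n_B` by hand. The sketch below shows one way the iterative search could be automated, under the assumption that the exponent is lowered step by step until the total changes by less than 1%. The contribution values and the helper `reduce_by_threshold` are hypothetical.

```python
import math

# Hypothetical contributions Q*B*s of single biosphere flows
contributions = {"CO2 @ boiler": 120.0, "CH4 @ field": 8.0,
                 "N2O @ field": 0.4, "CO @ transport": 0.003}

def reduce_by_threshold(contributions, n):
    """Keep contributions with |value| > 10**n; zeros are dropped."""
    return {k: v for k, v in contributions.items()
            if v != 0 and math.log10(abs(v)) > n}

total = sum(contributions.values())
n = 2
# Lower the exponent until the total impact changes by less than 1 %
while True:
    reduced = reduce_by_threshold(contributions, n)
    change = abs(total - sum(reduced.values())) / abs(total)
    if change < 0.01:
        break
    n -= 1

print("n = {}, kept {} flows, change = {:.2%}".format(n, len(reduced), change))
```

The loop always terminates, because a sufficiently small exponent keeps every non-zero contribution and the change becomes zero.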


Concatenate the impact matrices of the different impact categories again, for streamlined computations later on

In [None]:
multi_impact_matrix_reduced = pd.concat(impacts_matrix_red_list, axis=0)

#### Reduce the intervention matrix

1. Keep only the intervention flows which have a non-negligible impact


In [None]:
# Create stacked intervention matrix and retain all levels of multiindex
intervention_matrix_df_stacked = intervention_matrix_df.melt(ignore_index=False)
melted_index = pd.MultiIndex.from_frame(
    pd.concat(
        [
            intervention_matrix_df_stacked.index.to_frame(), 
            intervention_matrix_df_stacked[['variable_0','variable_1']]
        ], 
        axis=1
        )
)
intervention_matrix_df_stacked = intervention_matrix_df_stacked[["value"]].set_index(melted_index)

In [None]:
# Keep only the intervention flows which are still in the reduced impact stacked matrix
interventions_in_reduced_impact_matrix = multi_impact_matrix_reduced.index.droplevel(0).unique() # Drop the method name, now the index is the same as stacked intervention matrix
intervention_matrix_df_stacked_reduced = intervention_matrix_df_stacked.loc[interventions_in_reduced_impact_matrix]

### 2.3. Filtering out the unused characterization matrix elements

Stack the characterization matrix

In [None]:
multi_characterization_matrix_df_stacked = multi_characterization_matrix_df.melt(ignore_index=False)
multi_characterization_matrix_df_stacked = multi_characterization_matrix_df_stacked[["value"]]

1. Keep only the characterization factors $Q_i$ for which there are still biosphere flows (many biosphere flows of one kind are dropped)


In [None]:
# Drop the process database and process name, now the index is the same as stacked characterization matrix
characterization_factors_in_reduced_impact_matrix = multi_impact_matrix_reduced.index.droplevel(['variable_0', 'variable_1']).unique()
multi_characterization_matrix_df_stacked_reduced = multi_characterization_matrix_df_stacked.loc[characterization_factors_in_reduced_impact_matrix]

2. Drop all characterization factors equal to zero

In [None]:
multi_characterization_matrix_df_stacked_reduced = multi_characterization_matrix_df_stacked_reduced[multi_characterization_matrix_df_stacked_reduced != 0].dropna()

## 3. Getting the standard deviation of the parameter values

### 3.1. Set the uncertainty concept for the parameter values

In [None]:
# Invented uncertainty factors for testing
uncertainty_factors = {
    "matrizes" : {
        "('my project', 'climate change')": 0.1,
        "copy ('my project', 'climate change')": 0.5
    },
    "intervention_matrix": 0.15
}

### 3.2. Calculate the sigma values of the parameters

In [None]:
sigma_multi_characterization_matrix_df = multi_characterization_matrix_df_stacked_reduced.mul(pd.Series(uncertainty_factors['matrizes']), axis=0, level=0).abs()
sigma_intervention_matrix = intervention_matrix_df_stacked_reduced.mul(uncertainty_factors["intervention_matrix"]).abs()

## 4. Define the global sensitivity problem
### 4.1. Define the bound/interval of the parameters
Define the bounds as the nominal value, $\mu$, plus or minus the standard deviation, $\sigma$, for each parameter, $p$:

$[\mu_p - \sigma_p; \mu_p + \sigma_p]$

In [None]:
lb_intervention_matrix = intervention_matrix_df_stacked_reduced - sigma_intervention_matrix
ub_intervention_matrix = intervention_matrix_df_stacked_reduced + sigma_intervention_matrix
bounds_intervention_matrix = pd.concat([lb_intervention_matrix, ub_intervention_matrix], axis=1)
bound_B = bounds_intervention_matrix

In [None]:
lb_characterization_matrix = multi_characterization_matrix_df_stacked_reduced - sigma_multi_characterization_matrix_df
ub_characterization_matrix = multi_characterization_matrix_df_stacked_reduced + sigma_multi_characterization_matrix_df
bounds_characterization_matrix = pd.concat([lb_characterization_matrix, ub_characterization_matrix], axis=1)
bound_Q = bounds_characterization_matrix
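
The bound construction can be checked on single numbers. In this minimal sketch the helper `relative_bounds` is hypothetical and the nominal values are made up. It shows why the absolute value of $\sigma$ matters: for a negative nominal value the lower bound would otherwise end up above the upper bound, which samplers generally do not accept.

```python
def relative_bounds(nominal, factor):
    """Return (mu - sigma, mu + sigma) with sigma = |factor * mu|."""
    sigma = abs(factor * nominal)
    return nominal - sigma, nominal + sigma

# Hypothetical nominal values; 0.15 mirrors the intervention-matrix factor above
for mu in (2.0, -4.0):
    lower, upper = relative_bounds(mu, 0.15)
    print("mu = {:+.1f}: [{:+.2f}; {:+.2f}]".format(mu, lower, upper))
```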

### 4.2. Defining the sampling problems

In [None]:
problem_QB = {
    'num_vars': multi_characterization_matrix_df_stacked_reduced.shape[0] + intervention_matrix_df_stacked_reduced.shape[0],
    'names': multi_characterization_matrix_df_stacked_reduced.index.to_list() + intervention_matrix_df_stacked_reduced.index.to_list(),
    'bounds': bounds_characterization_matrix.values.tolist() + bounds_intervention_matrix.values.tolist()
}
problem_QB

### 4.3. Define the sampling method

Select the sampling method. Any sampling method from SALib can be chosen, but most sensitivity analysis methods have a sampling scheme that is best suited or even necessary. Here we use `saltelli` sampling, which generates the sample structure that the Sobol' analysis expects.

In [None]:
from SALib.sample import saltelli as sample_method

Choose the base sample size $N$. A power of two is used here, as recommended for Sobol' sequences.

In [None]:
N = 2**7
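
The base sample size is not the number of model evaluations. With Saltelli sampling, $N(2D+2)$ rows are generated for $D$ parameters when second-order indices are requested, and $N(D+2)$ otherwise. The short sketch below uses a hypothetical helper and made-up parameter counts to show how quickly this grows, which is why the parameter set is reduced beforehand.

```python
def saltelli_sample_count(N, D, calc_second_order=True):
    """Number of model evaluations generated by Saltelli sampling."""
    return N * (2 * D + 2) if calc_second_order else N * (D + 2)

N = 2**7
for D in (10, 100, 1000):   # hypothetical parameter counts
    print(D, saltelli_sample_count(N, D))
```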

### 4.4. Define the sensitivity analysis method

Select the sensitivity analysis method. Any method from SALib can be chosen, but keep in mind that it is usually coupled to the sampling method. For this case study we use the Sobol' method.

In [None]:
from SALib.analyze import sobol as SA_method

## 5. Perform the sensitivity analysis

$$
    e(Q, B) =  Q \cdot B \cdot s
$$

### 5.1. Sampling the $Q$ and $B$ arrays

In [None]:
sample_data_QB = sample_method.sample(problem_QB, N)

### 5.2. Solving for $Q \cdot B \cdot s$

Extract the matrices from the sample data

In [None]:
sample_Q = pd.DataFrame(sample_data_QB[:,:bound_Q.shape[0]], columns=bound_Q.index)
sample_B = pd.DataFrame(sample_data_QB[:,bound_Q.shape[0]:], columns=bound_B.index)
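
The split relies on the column order of the sample array matching the order of `names` in the problem definition: first all characterization factors, then all intervention flows. A minimal sketch with dummy numbers and hypothetical parameter counts:

```python
import numpy as np

n_Q, n_B = 2, 3                      # hypothetical parameter counts
# 4 dummy sample rows with one column per parameter
samples = np.arange(4 * (n_Q + n_B)).reshape(4, n_Q + n_B)

sample_Q = samples[:, :n_Q]          # first n_Q columns: characterization factors
sample_B = samples[:, n_Q:]          # remaining columns: intervention flows

print(sample_Q.shape, sample_B.shape)
```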

Manipulate the columns header and names to be able to perform matrix operations on the complete sample at once

In [None]:
# Rename the intervention matrix column headers
sample_B_cols = sample_B.columns.to_frame().reset_index(drop=True).rename(columns={0:"intervention_db", 1:"intervention_flow"})
sample_B.columns = pd.MultiIndex.from_frame(sample_B_cols)

In [None]:
# Rename the characterization matrix columns headers
sample_Q_cols = sample_Q.columns.to_frame().reset_index(drop=True).rename(columns={0:"method", 1:"intervention_db", 2:"intervention_flow"})
sample_Q.columns = pd.MultiIndex.from_frame(sample_Q_cols)

In [None]:
# Merge the characterization and intervention matrix columns
merged_cols_B_left = pd.merge(left=sample_B_cols, right=sample_Q_cols, on = ["intervention_db","intervention_flow"], how='inner')
merged_cols_B_right = pd.merge(left=sample_Q_cols, right=sample_B_cols, on = ["intervention_db","intervention_flow"], how='inner')

Expand the intervention matrix and the characterization matrix so we can perform dot products and get the impacts per biosphere flow and impact category

In [None]:
sample_Q_expanded = sample_Q.reindex(columns=pd.MultiIndex.from_frame(merged_cols_B_right))

In [None]:
sample_B_expanded = sample_B.reindex(columns=pd.MultiIndex.from_frame(merged_cols_B_left))
sample_B_expanded.columns = sample_B_expanded.columns.reorder_levels(sample_Q_expanded.columns.names)

Compute the environmental cost matrix 

In [None]:
multi_QB = sample_Q_expanded * sample_B_expanded

Compute the environmental impact and group the impacts by category for later analysis

In [None]:
QBs = {}
multi_QB.columns = multi_QB.columns.reorder_levels(['variable_0', 'variable_1','method', 'intervention_db', 'intervention_flow'])
for method, QB in multi_QB.groupby(level='method', axis=1):
    s_for_QBs = s_solution_df.reindex(index=QB.columns)
    QBs[method] = QB * s_for_QBs

### 5.3. Calculate the total output variance

Since we have multiple impact categories in the sampled data and we can only perform the sensitivity analysis per category, we have to specify the category from the data

In [None]:
selected_method = "('my project', 'climate change')"

In [None]:
e_QBs = QBs[selected_method].sum(axis=1)

In [None]:
e_QBs.var()

In [None]:
print("mu-sigma: {:.2E} \t mu: {:.2E} \t mu+sigma: {:.2E}".format(e_QBs.mean()-e_QBs.std(), e_QBs.mean(), e_QBs.mean()+e_QBs.std()))

#### 5.3.1. Show the distribution and the coefficient of variation of the output

In [None]:
e_QBs.plot.hist(bins=10)
e_QBs.shape

The coefficient of variation (standard deviation divided by the mean) of the total environmental impact

In [None]:
e_QBs.std()/e_QBs.mean()

### 5.4. Calculate the Sobol' indices

In [None]:
Si_QBs = SA_method.analyze(problem_QB, e_QBs.values)

In [None]:
total_Si_QBs, first_Si_QBs, second_Si_QBs = Si_QBs.to_df()

Since we currently perform one large sampling run that includes the parameters of all impact categories, the characterization factors of the other categories also appear in the list of sensitivity indices for the selected category. Their sensitivity indices should always be zero or close to zero.

In [None]:
total_Si_QBs

#### 5.4.1. Calculate total explained variance

In [None]:
print("The total explained variance is \n{:.4}%".format(total_Si_QBs["ST"].sum()*100))
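
Because $e = Q \cdot B \cdot s$ contains products of $Q$ and $B$ entries, the parameters interact, and the summed total-order indices can exceed 100%. The sketch below illustrates this analytically for a single product $y = x_1 x_2$ of two independent inputs. The nominal values, the uncertainty ranges and both helper functions are assumptions for illustration.

```python
def sobol_indices_product(mu1, var1, mu2, var2):
    """Analytic Sobol' indices of y = x1 * x2 for independent x1, x2."""
    v1 = mu2**2 * var1          # variance explained by x1 alone
    v2 = mu1**2 * var2          # variance explained by x2 alone
    v12 = var1 * var2           # variance from the x1-x2 interaction
    v = v1 + v2 + v12           # total variance of y
    first = (v1 / v, v2 / v)
    total = ((v1 + v12) / v, (v2 + v12) / v)
    return first, total

def uniform_variance(lower, upper):
    """Variance of a uniform distribution on [lower, upper]."""
    return (upper - lower) ** 2 / 12

# Hypothetical characterization factor q = 25 +/- 50 % and flow b = 2 +/- 15 %
var_q = uniform_variance(12.5, 37.5)
var_b = uniform_variance(1.7, 2.3)
first, total = sobol_indices_product(25.0, var_q, 2.0, var_b)
print("sum S1 = {:.4f}, sum ST = {:.4f}".format(sum(first), sum(total)))
```

The interaction term is counted once in each total-order index, so the sum of `ST` exceeds one by exactly the share that the sum of `S1` falls short of one.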

### 5.5. Plot the contribution to variance

Generate the data and the names for the contribution plot

In [None]:
def set_size(width, height, fraction=1):
    """ Set aesthetic figure dimensions to avoid scaling in latex.
 
    Parameters
    ----------
    width: float
            Width in pts
    height: float or None
            Height in pts; if None, the height follows from the golden ratio
    fraction: float
            Fraction of the width which you wish the figure to occupy
 
    Returns
    -------
    fig_dim: tuple
            Dimensions of figure in inches
    """
    # Width of figure
    fig_width_pt = width * fraction    
 
    # Convert from pt to inches
    inches_per_pt = 1 / 72.27
 
    # Golden ratio to set aesthetic figure height
    golden_ratio = (5**.5 - 1) / 2
 
    # Figure width in inches
    fig_width_in = fig_width_pt * inches_per_pt
    # Figure height in inches
    if height: #if height is specified
        fig_height_pt = height * fraction
        fig_height_in = fig_height_pt * inches_per_pt
    else:
        fig_height_in = fig_width_in * golden_ratio
 
    fig_dim = (fig_width_in, fig_height_in)
 
    return fig_dim

In [None]:
import matplotlib.pyplot as plt
import matplotlib as mpl
import textwrap

def plot_SA_barplot(data:pd.DataFrame, metadata:pd.DataFrame, colormap:pd.Series=pd.Series([]), bbox_to_anchor_lower:float = -0.6, bbox_to_anchor_center:float=0.5):
    """
        Barplot of the contributional variance of the parameters in an objective

        args:
            data:       dataframe with columns: "ST" and "ST_conf"
            metadata:   metadataframe with "bar_names" column and same indices as data
            colormap:   Series with color codes to each data index     `colormap = pd.Series(mpl.cm.tab20.colors[:data.shape[0]], index=data.index)`
            bbox_to_anchor_lower: negative float, scaled how much the legend is under the plot
    """
    # width = 180
    # height = 180
    width = 4.77*72.4#600
    height = None
    _, ax = plt.subplots(1, 1, figsize=set_size(width,height))

    # Data
    data = data.sort_values(["ST"], ascending=False)
    heights = data["ST"].values * 100
    yerrs = data["ST_conf"].values * 100
    bars = [textwrap.fill(string, 50) for string in metadata["bar_names"].reindex(data.index)]
    y_pos = range(len(bars))
    
    for height, y_po, yerr, indx in zip(heights, y_pos, yerrs, data.index):
        ax.bar(y_po, height, yerr=yerr, capsize=5, ecolor="gray", color=colormap[indx], alpha=0.9)
    ax.set_xticks([])
    if (data["ST"]<=1).all() and (data["ST"]>=0).all():
        ax.yaxis.set_major_formatter(mpl.ticker.PercentFormatter())
        ax.yaxis.set_major_locator(mpl.ticker.MultipleLocator(10))
        # For the minor ticks, use no labels; default NullFormatter.
        ax.yaxis.set_minor_locator(mpl.ticker.MultipleLocator(5))
    ax.legend(bars, loc='lower center', bbox_to_anchor=(bbox_to_anchor_center, bbox_to_anchor_lower), borderpad=1)
    ax.set_axisbelow(True)
    ax.yaxis.grid(color='gray', linestyle='dotted')
    return ax

In [None]:
metadata_total_Si_QBs = total_Si_QBs.index.to_frame(name='bar_names')['bar_names'].agg(' - '.join).to_frame()

In [None]:
colormap = pd.Series(mpl.cm.tab20.colors[:total_Si_QBs.shape[0]], index=total_Si_QBs.index)
plot_SA_barplot(data=total_Si_QBs, metadata=metadata_total_Si_QBs, colormap=colormap, bbox_to_anchor_center=1.7, bbox_to_anchor_lower=-.6)

### 5.6. Plot the main contributing variables to the total environmental impact

In [None]:
QBs_per_s_sample = QBs[selected_method]

Generate the data

In [None]:
QBs_tops_indcs = QBs_per_s_sample.mean().abs().sort_values(ascending=False).iloc[:5].index
data_QBs = pd.DataFrame([])
QBs_per_s_sample_scaled = QBs_per_s_sample/ e_QBs.mean()#.divide(QBs_per_s_sample.abs().sum(axis=1), axis="index")
data_QBs["ST"] = QBs_per_s_sample_scaled.mean()[QBs_tops_indcs]
data_QBs["ST_conf"] = QBs_per_s_sample_scaled.std()[QBs_tops_indcs]
metadata_QBs = data_QBs.index.to_frame()
metadata_QBs["bar_names"] = ["{} in {} ".format(meta["intervention_flow"], meta["variable_1"]) for _,meta in metadata_QBs.iterrows()]

Plot the total environmental impact for the top processes

In [None]:
from typing import Optional
# set matplotlib colormap
colormap_mpl = mpl.cm.Set3.colors

def plot_total_env_impact_barplot(data:pd.DataFrame,  metadata:pd.DataFrame, impact_category:str, colormap:pd.Series=pd.Series([]), savefig:Optional[bool]=None, bbox_to_anchor_center:float=0.5):
    """
        Barplot of the contributional variance of the parameters in total cost objective

        args:
            data:       dataframe with columns: "ST" and "ST_conf"
            metadata:   metadataframe with "bar_names" column and same indices as data
            impact_category:    name of environmental impact category
            savefig:    if true saves fig into specified path
    """
    bbox_to_anchor_lower = .7
    bbox_to_anchor_center = .7
    if colormap.empty:
        colormap = pd.Series(colormap_mpl[:data.shape[0]], index=data.index)
    else:
        act_indcs = [index for index in colormap.index if type(index[1]) == int]
        colormap_red = pd.Series(colormap[act_indcs].values, index=[indcs[1] for indcs in act_indcs])
        addtional_incs = data.index[~data.index.isin(colormap_red.index)]
        additional_colormap = pd.Series(
            colormap_mpl[colormap.shape[0]:colormap.shape[0]+addtional_incs.shape[0]], 
            index = addtional_incs
            )
        colormap = pd.concat([colormap_red, additional_colormap])


    ax = plot_SA_barplot(data, metadata, colormap=colormap, bbox_to_anchor_lower=bbox_to_anchor_lower, bbox_to_anchor_center=bbox_to_anchor_center)    
    ax.set_xlabel("Main environmental parameters")
    ax.set_ylabel("Contribution to total {} in [%]".format(impact_category))

    # Save figure
    if savefig:
        fileformat = "png"  # output format (assumed default; adjust as needed)
        plt.savefig(r"C:\Users\admin\OneDrive - Carbon Minds GmbH\Dokumente\13 Students\MA_Bartolomeus_Löwgren\02_code\03_optimization_framework\04_case_studies\02_plots\total_env_impact_barplot" + ".{}".format(fileformat), format=fileformat, bbox_inches='tight')


In [None]:
colormap = pd.Series(mpl.cm.tab20.colors[:data_QBs.shape[0]], index=data_QBs.index)
metadata_QBs = metadata_QBs[['bar_names']]
plot_total_env_impact_barplot(data_QBs, metadata=metadata_QBs, impact_category=selected_method, savefig=False)
