# Performance Investigation of Hybrid Life-Cycle Assessment Matrix Calculations

This notebook is available online in this Zenodo Record: [`doi:10.5281/zenodo.14786979`](https://doi.org/10.5281/zenodo.14786979)

Note that this investigation was originally run in January 2025 using a virtual environment with the following packages:

```
numpy==2.2.1
scipy==1.15.0
```

In [14]:
# scientific computing
import numpy as np
rng = np.random.default_rng(seed=42)
import pandas as pd
# data storage
import gzip
import pickle
# system libraries
import time

The code below implements the governing equation of environmentally extended input-output analysis:

\begin{align}
e &= \mathbf{C} \cdot \mathbf{B} \cdot (\mathbf{I} - \mathbf{A})^{-1} \cdot \vec{f} \\
[1 \times 1] &= [1 \times N] \cdot [N \times N] \cdot [N \times 1]
\end{align}

| Symbol | Dimension | Units | Description |
| ------ | --------- | ----- | ----------- |
| $e$ | $1 \times 1$ | kg(CO₂ eq.) |  environmental impact (scalar) |
| $\mathbf{C}$ | $N \times N$ | AUD |  total requirements matrix |
| $\mathbf{B}$ | $1 \times N$ | kg(CO₂ eq.)/AUD |  environmental satellite account |
| $\mathbf{I}$ | $N \times N$ | None |  identity matrix |
| $\mathbf{A}$ | $N \times N$ | AUD/AUD = None | technical coefficient matrix |
| $\vec{f}$ | $N \times 1$ | AUD | final demand vector |

Here, $N$ is the number of sectors in the economy.

For further reference, compare [Miller & Blair (2022)](https://doi.org/10.1017/9781108676212), Eq. (2.11) and Section 13.7.1.
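
As a minimal sketch of this equation, consider a hypothetical three-sector economy. All numbers below are invented for illustration, and the characterization step is assumed to be folded into $\mathbf{B}$, which is therefore already expressed in kg(CO₂ eq.)/AUD. The system $(\mathbf{I} - \mathbf{A}) \cdot \vec{x} = \vec{f}$ is solved directly instead of forming the inverse.

```python
import numpy as np

# Hypothetical 3-sector economy; every value here is made up for illustration.
A = np.array([
    [0.10, 0.20, 0.00],
    [0.05, 0.10, 0.30],
    [0.20, 0.00, 0.10],
])                                  # technical coefficients [N x N], AUD/AUD
B = np.array([0.5, 1.2, 0.3])       # emission intensities [1 x N], kg(CO2 eq.)/AUD
f = np.array([100.0, 0.0, 50.0])    # final demand [N x 1], AUD

# Solve (I - A) x = f rather than computing the inverse explicitly.
x = np.linalg.solve(np.eye(3) - A, f)   # total output per sector, AUD
e = B @ x                                # total impact, kg(CO2 eq.)

# Same quantity via the explicit inverse, for comparison only.
e_inverse = B @ np.linalg.inv(np.eye(3) - A) @ f
```

Both routes give the same scalar up to floating-point error. For large systems the solve is generally preferred over the explicit inverse.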

## Load Compressed Data from `pylcaio` Package

In [2]:
path = '/Users/michaelweinold/Library/CloudStorage/OneDrive-TheWeinoldFamily/Documents/University/PhD/Data/HLCA Matrices/hybrid_system.pickle'
with gzip.open(path, 'rb') as pickle_file:
    picklefile = pickle.load(file=pickle_file)


## Build Hybrid Matrices

For the definition of the matrices in the output of the [`pylcaio` package](https://github.com/MaximeAgez/pylcaio/tree/master), see [this section of the source code](https://github.com/MaximeAgez/pylcaio/blob/505898a39144ebc53c109e485644e3ea055ae0ae/src/pylcaio.py#L46). The matrices are defined as follows:

| Symbol | Dimension | Units | Description |
| ------ | --------- | ----- | ----------- |
| $\mathbf{A}_P$ | $M \times M$ | [kg] ("physical") | technosphere matrix |
| $\mathbf{A}_S$ | $N \times N$ | [\$] ("monetary") | technical coefficient matrix |
| $\mathbf{C}_U$ | $N \times M$ | None | upstream cut-off matrix |
| $\mathbf{B}_P$ | $R \times M$ | XXX | biosphere matrix |
| $\mathbf{B}_S$ | $P \times N$ | XXX | environmental satellite matrix |
| $\mathbf{C}_P$ | $ \times $ | XXX | characterization matrix process system |
| $\mathbf{C}_S$ | $ \times $ | XXX | characterization matrix sectoral system |

The hybrid matrices are built such that:

\begin{align}
\mathbf{A}_H &= \begin{bmatrix}
\mathbf{A}_P & \mathbf{0} \\
\mathbf{C}_U & \mathbf{A}_S
\end{bmatrix} \\
\mathbf{A}_H &= [(M+N) \times (M+N)]
\end{align}

and

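\begin{align}
\mathbf{B}_H &= \begin{bmatrix}
\mathbf{B}_P & \mathbf{0} \\
\mathbf{0} & \mathbf{B}_S
\end{bmatrix} \\
\mathbf{B}_H &= [(R+P) \times (M+N)]
\end{align}

Here, $R$ and $P$ are the numbers of rows of $\mathbf{B}_P$ and $\mathbf{B}_S$, so that $\mathbf{B}_H$ is block-diagonal, as in the code below.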

In [3]:
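# Convert the sparse pylcaio matrices to dense NumPy arrays
A_P = picklefile['A_ff'].toarray()
A_S = picklefile['A_io'].toarray()
C_U = picklefile['A_io_f'].toarray()
# Note: `A_H` holds the system matrix passed to the solver (identity minus the
# coefficient matrices on the diagonal blocks), not the raw coefficient matrix
A_H = np.block(
    [
        [np.eye(A_P.shape[0]) - A_P, np.zeros((A_P.shape[0], A_S.shape[0]))],
        [C_U, np.eye(A_S.shape[0]) - A_S]
    ]
)

B_S = picklefile['F_io'].toarray()
B_P = picklefile['F_f'].toarray()
# Block-diagonal layout: process flows and sectoral extensions occupy separate rows
B_H = np.block(
    [
        [B_P, np.zeros((B_P.shape[0], B_S.shape[1]))],
        [np.zeros((B_S.shape[0], B_P.shape[1])), B_S]
    ]
)

# Row 0 of each characterization matrix is taken as the climate change category
C_P_climate = picklefile['C_f'].toarray()[0,:]
C_S_climate = picklefile['C_io'].toarray()[0,:]
C_H_climate = np.concatenate((C_P_climate, C_S_climate), axis=0)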
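
The block layout can be checked on small random matrices before loading the full data. The sketch below mirrors the cell above, including its sign conventions. The sizes `M`, `N`, `R`, `P` and all values are hypothetical, and `rng_toy` and `system_matrix` are names introduced only for this illustration.

```python
import numpy as np

M, N = 3, 2   # hypothetical numbers of processes and sectors
R, P = 4, 2   # hypothetical numbers of elementary flows and satellite extensions
rng_toy = np.random.default_rng(seed=0)

# Small random stand-ins for the pylcaio matrices (values are arbitrary).
A_P = rng_toy.uniform(0.0, 0.1, size=(M, M))
A_S = rng_toy.uniform(0.0, 0.1, size=(N, N))
C_U = rng_toy.uniform(0.0, 0.1, size=(N, M))
B_P = rng_toy.uniform(size=(R, M))
B_S = rng_toy.uniform(size=(P, N))

# Same block layout as in the cell above.
system_matrix = np.block([
    [np.eye(M) - A_P, np.zeros((M, N))],
    [C_U, np.eye(N) - A_S],
])
B_H = np.block([
    [B_P, np.zeros((R, N))],
    [np.zeros((P, M)), B_S],
])
C_H = rng_toy.uniform(size=R + P)   # one characterization factor per flow row

# Unit demand on the first process, then the three-step calculation.
f = np.zeros(M + N)
f[0] = 1.0
x = np.linalg.solve(system_matrix, f)
flows = B_H @ x
burden = C_H @ flows
```

The shapes confirm that the dimensions chain correctly: `system_matrix` is $(M+N) \times (M+N)$, `B_H` is $(R+P) \times (M+N)$, and `burden` is a scalar.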

## Prepare Random Sample of Ecoinvent Processes

Every Ecoinvent process has an industry classification code according to the International Standard Industrial Classification of All Economic Activities (ISIC). We use the highest-level classification structure of the 21 (A-U) ISIC "sections" to group Ecoinvent processes (see ["Classification Structure"](https://unstats.un.org/unsd/classifications/Family/Detail/27)).

In [4]:
dict_isic_letters_and_numbers = {
    'A': ['01', '02', '03'],
    'B': ['05', '06', '07', '08', '09'],
    'C': [
        '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', 
        '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', 
        '31', '32', '33'
    ],
    'D': ['35'],
    'E': ['36', '37', '38', '39'],
    'F': ['41', '42', '43'],
    'G': ['45', '46', '47'],
    'H': ['49', '50', '51', '52', '53'],
    'I': ['55', '56'],
    'J': ['58', '59', '60', '61', '62', '63'],
    'K': ['64', '65', '66'],
    'L': ['68'],
    'M': ['69', '70', '71', '72', '73', '74', '75'],
    'N': ['77', '78', '79', '80', '81', '82'],
    'O': ['84'],
    'P': ['85'],
    'Q': ['86', '87', '88'],
    'R': ['90', '91', '92', '93'],
    'S': ['94', '95', '96'],
    'T': ['97', '98'],
    'U': ['99']
}
list_process_metadata_isic = [i for i in picklefile['PRO_f']['ISIC'].values()]
list_process_metadata_isic_numbers = [str(string)[:2] for string in list_process_metadata_isic]
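
To illustrate how a process is assigned to a section, the mapping can be inverted so that each two-digit division points to its letter. This is only a sketch on an excerpt of the dictionary above. The ISIC code strings are hypothetical examples, and their exact format in the Ecoinvent metadata is an assumption.

```python
# Excerpt of the section-to-division mapping defined above.
sections_to_divisions = {
    'A': ['01', '02', '03'],
    'B': ['05', '06', '07', '08', '09'],
    'D': ['35'],
}

# Inverted lookup: two-digit division -> section letter.
division_to_section = {
    division: letter
    for letter, divisions in sections_to_divisions.items()
    for division in divisions
}

# Hypothetical ISIC entries; only the first two characters are used,
# as in the cell above.
example_isic_codes = ['0111', '3510', '0610']
example_sections = [
    division_to_section.get(str(code)[:2]) for code in example_isic_codes
]
# example_sections is ['A', 'D', 'B']
```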

In [31]:
def get_sample_process_indices_from_ecoinvent_per_isic_letter(
    dict_isic_letters_and_numbers: dict,
    list_process_metadata_isic_numbers: list,
    number_of_indices_per_isic_letter: int,
) -> dict:
    """
    Gets a random sample of process indices from the ecoinvent database for each ISIC letter (=sections).

    If the number of processes for a given ISIC letter is less than the number of indices to be sampled,
    the function will sample all available processes.

    Parameters
    ----------
    dict_isic_letters_and_numbers : dict
        Dictionary with ISIC letters (=sections) as keys and lists of ISIC numbers (=divisions) as values.
        For example: {'A': ['01', '02', '03'], 'B': ['05', '06', '07', '08', '09'], ...}
    list_process_metadata_isic_numbers : list
        List of ISIC numbers for each process in the ecoinvent database.
        For example: ['01', '01', '02', '02', '02', '03', ...]
    number_of_indices_per_isic_letter : int
        Number of indices to be sampled for each ISIC letter.

    Returns
    -------
    dict
        Dictionary with ISIC letters as keys and lists of sampled process indices as values.
        For example: {'A': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'B': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], ...}
    """
    
    dict_isic_letters_and_indices = {}

    for isic_letter in dict_isic_letters_and_numbers.keys():
        list_of_process_indices = []
        for isic_number in dict_isic_letters_and_numbers[isic_letter]:
            list_of_process_indices += [index for index, element in enumerate(list_process_metadata_isic_numbers) if element == isic_number]
        dict_isic_letters_and_indices[isic_letter] = list_of_process_indices

    dict_return = {}

    for isic_letter, process_indices in dict_isic_letters_and_indices.items():
        if not process_indices:
            continue  # no processes in this ISIC section
        # sample all available processes if fewer than requested exist
        sample_size = min(number_of_indices_per_isic_letter, len(process_indices))
        dict_return[isic_letter] = rng.choice(
            a=process_indices,
            size=sample_size,
            replace=False
        ).tolist()
        
    return dict_return

dict_sample_process_indices = get_sample_process_indices_from_ecoinvent_per_isic_letter(
    dict_isic_letters_and_numbers=dict_isic_letters_and_numbers,
    list_process_metadata_isic_numbers=list_process_metadata_isic_numbers,
    number_of_indices_per_isic_letter=10
)
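
The two edge cases described in the docstring (sections with fewer processes than requested, and sections with none) can be seen on a toy input. The dictionary `indices_by_section` and the generator `rng_toy` are hypothetical names used only for this sketch.

```python
import numpy as np

rng_toy = np.random.default_rng(seed=42)

# Hypothetical process indices per ISIC section.
indices_by_section = {'A': [0, 1, 2, 3, 4, 5], 'B': [6, 7], 'C': []}
requested = 4

sampled = {}
for letter, indices in indices_by_section.items():
    if not indices:
        continue  # empty sections are skipped entirely
    size = min(requested, len(indices))  # cap at the number available
    sampled[letter] = rng_toy.choice(a=indices, size=size, replace=False).tolist()
```

Section `'A'` yields four distinct indices, section `'B'` yields both of its indices, and section `'C'` does not appear in the result.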

In [None]:
def generate_final_demand_vector(
    number_of_sectors: int,
    sector_index: int,
    demand_amount: float
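) -> np.ndarray:
    """
    Generates a final demand vector with a single non-zero entry.

    Parameters
    ----------
    number_of_sectors : int
        Length of the final demand vector (number of rows of the system matrix).
    sector_index : int
        Index of the sector or process on which demand is placed.
    demand_amount : float
        Amount of final demand placed on the selected sector or process.

    Returns
    -------
    np.ndarray
        One-dimensional array of zeros with `demand_amount` at position `sector_index`.
    """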
    f_vector = np.zeros(number_of_sectors)
    f_vector[sector_index] = demand_amount
    return f_vector


def compute_environmental_burden(
    A_H: np.ndarray,
    B_H: np.ndarray,
    C_H_climate: np.ndarray,
    sector_index: int,
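) -> tuple[float, float]:
    """
    Computes the environmental burden of one unit of final demand and measures the computation time.

    The linear system is solved with `np.linalg.solve` instead of an explicit matrix inversion.
    Only the solve and the two subsequent products are timed.

    Parameters
    ----------
    A_H : np.ndarray
        Hybrid system matrix, as built above.
    B_H : np.ndarray
        Hybrid environmental flows matrix.
    C_H_climate : np.ndarray
        Hybrid characterization vector for climate change.
    sector_index : int
        Index of the sector or process receiving one unit of final demand.

    Returns
    -------
    tuple[float, float]
        Environmental burden and wall-clock computation time in seconds.
    """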
    f_vector_H = generate_final_demand_vector(
        number_of_sectors=A_H.shape[0],
        sector_index=sector_index,
        demand_amount=1
    )
    start = time.time()
    vec_intermediate_demand = np.linalg.solve(A_H, f_vector_H)
    vec_environmental_flows = np.dot(B_H, vec_intermediate_demand)
    scal_environmental_burden = np.dot(C_H_climate, vec_environmental_flows)
    end = time.time()
    computation_time = end - start
    return scal_environmental_burden, computation_time
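
The same timing pattern can be tried on a small synthetic system before running the full hybrid matrices. The sketch below uses random data of a hypothetical size `n = 200`. It swaps in `time.perf_counter`, a monotonic high-resolution clock from the standard library, in place of the `time.time` call used above.

```python
import time
import numpy as np

rng_toy = np.random.default_rng(seed=0)
n = 200  # toy system size, far smaller than the real hybrid matrix

# Diagonally dominant random matrix, so the system is well conditioned.
system_matrix = np.eye(n) - rng_toy.uniform(0.0, 1.0 / (2 * n), size=(n, n))
B = rng_toy.uniform(size=(5, n))   # 5 hypothetical environmental flows
C = rng_toy.uniform(size=5)        # hypothetical characterization factors

results = []
for sector_index in (0, 10, 50):
    f = np.zeros(n)
    f[sector_index] = 1.0
    start = time.perf_counter()
    x = np.linalg.solve(system_matrix, f)
    burden = C @ (B @ x)
    elapsed = time.perf_counter() - start
    results.append((sector_index, float(burden), elapsed))
```

Each entry of `results` holds the sector index, the burden, and the elapsed time in seconds. The timings depend on hardware and the BLAS backend and are not comparable to those of the full system.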

Before starting calculations, ensure that your local NumPy is built against a fast [BLAS library](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) (e.g., Intel MKL, OpenBLAS, or Apple Accelerate).
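
One way to check this is NumPy's built-in configuration report. The layout of the output differs between NumPy versions, so the relevant entries have to be located by eye.

```python
import numpy as np

# Prints build information, including the BLAS/LAPACK libraries
# that this NumPy installation was linked against.
np.show_config()
```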

Note that on a 2021 MacBook Pro (M1 Max CPU) with NumPy v2.2.1 [built against Apple Accelerate](https://numpy.org/doc/2.0/release/1.21.0-notes.html#enable-accelerate-framework), one computation takes approximately 130-190 seconds.