# Performance Investigation of Hybrid Life-Cycle Assessment Path Calculations

This notebook is available online in this Zenodo Record: [`doi:10.5281/zenodo.14786979`](https://doi.org/10.5281/zenodo.14786979)

In [18]:
# scientific computing
import pandas as pd
# structural path analysis
import pyspa
# system
import time

## Load Sectoral Data

In [19]:
df_infosheet = pd.read_csv(
    filepath_or_buffer='https://raw.githubusercontent.com/hybridlca/pyspa/refs/heads/master/Infosheet_template.csv',
    header=0,
)
df_sectors: pd.DataFrame = df_infosheet[['Sector number', 'Name']]
df_sectors

|     | Sector number | Name                                 |
|-----|---------------|--------------------------------------|
| 0   | 1             | Sheep, Grains, Beef and Dairy Cattle |
| 1   | 2             | Poultry and Other Livestock          |
| 2   | 3             | Other Agriculture                    |
| 3   | 4             | Aquaculture                          |
| 4   | 5             | Forestry and Logging                 |
| ... | ...           | ...                                  |
| 109 | 110           | Gambling                             |
| 110 | 111           | Automotive Repair and Maintenance    |
| 111 | 112           | Other Repair and Maintenance         |
| 112 | 113           | Personal Services                    |


## Calculations of Paths

In [20]:
def run_structural_path_analysis(
    sector_id: int,
) -> pd.DataFrame:
    """
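    Run the structural path analysis for a given sector over a fixed list of cutoff values.

    For each cutoff, the wall-clock time of the `pyspa.get_spa` call (in seconds)
    and the resulting coverage of GHG emissions are recorded.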

    Parameters
    ----------
    sector_id : int
        Index of the sector to be analyzed.
    
    Returns
    -------
    pd.DataFrame
        DataFrame with the results of the structural path analysis.
        Of the form:

        | Cutoff | Computation time | SPA coverage |
        |--------|------------------|--------------|
        | 0.01   | 1.2              | 0.51         |
        | 0.001  | 13.4             | 0.65         |
        | ...    | ...              | ...          |

    """
    # Cutoff thresholds to test, from coarsest to finest
    list_cutoff = [0.1, 0.01, 0.001, 0.0001, 0.00001]
    list_comp_time = []
    list_spa_coverage = []
    for cutoff in list_cutoff:
        # perf_counter is a monotonic clock intended for measuring elapsed time
        start_time = time.perf_counter()
        sc = pyspa.get_spa(
            target_ID=sector_id,
            max_stage=20,
            a_matrix='https://raw.githubusercontent.com/hybridlca/pyspa/refs/heads/master/A_matrix_template.csv',
            infosheet='https://raw.githubusercontent.com/hybridlca/pyspa/refs/heads/master/Infosheet_template.csv',
            thresholds={'GHG_emissions': cutoff},
            thresholds_as_percentages=True,
            zero_indexing=True,
        )
        end_time = time.perf_counter()
        list_comp_time.append(end_time - start_time)
        list_spa_coverage.append(sc.get_coverage_of('GHG_emissions'))
    df_results = pd.DataFrame(
        data={
            'Cutoff': list_cutoff,
            'Computation time': list_comp_time,
            'SPA coverage': list_spa_coverage,
        }
    )
    return df_results

Before starting the calculations, make sure that your local NumPy is built against a fast [BLAS library](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) (e.g., Intel MKL, OpenBLAS, or Apple Accelerate). For reference, on a 2021 MacBook Pro (M1 Max) with NumPy v2.2.1 [built against Apple Accelerate](https://numpy.org/doc/2.0/release/1.21.0-notes.html#enable-accelerate-framework), the analysis of paths for a single sector may take multiple hours.
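One way to check which BLAS library NumPy was built against is to inspect the output of `np.show_config()`. The sketch below captures that output as text so it can be read or searched; the helper name `describe_numpy_build` is hypothetical and not part of NumPy or pyspa, and the exact layout of the output depends on the NumPy version.

```python
import contextlib
import io

import numpy as np


def describe_numpy_build() -> str:
    """Return the text that `np.show_config()` prints about the NumPy build.

    Hypothetical helper for illustration; not part of NumPy or pyspa.
    """
    buffer = io.StringIO()
    # np.show_config() prints to stdout, so redirect stdout into a buffer
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


config_text = describe_numpy_build()
print(f"NumPy {np.__version__}")
print(config_text)  # look for entries naming the BLAS/LAPACK libraries
```

The BLAS and LAPACK entries in the printed text name the libraries that were detected at build time.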

In [None]:
%%capture
import multiprocessing as mp
with mp.get_context("fork").Pool() as pool:  # "fork" context works better in Jupyter
    list_results_dataframes = pool.map(run_structural_path_analysis, df_sectors.index)
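`pool.map` returns one DataFrame per sector, in the same order as `df_sectors.index`. A minimal sketch of how these frames could be combined into a single table follows. It uses two small placeholder frames in place of the real results (the numbers are made up for illustration, not measurements), and the names `sector_ids`, `df_all` and `mean_time_per_cutoff` are assumptions introduced here.

```python
import pandas as pd

# Placeholder frames standing in for `list_results_dataframes`;
# the values are illustrative only, not measured results.
list_results_dataframes = [
    pd.DataFrame({
        'Cutoff': [0.1, 0.01],
        'Computation time': [1.0, 2.0],
        'SPA coverage': [0.5, 0.6],
    }),
    pd.DataFrame({
        'Cutoff': [0.1, 0.01],
        'Computation time': [3.0, 4.0],
        'SPA coverage': [0.4, 0.7],
    }),
]
sector_ids = [0, 1]  # stands in for `df_sectors.index`

# Stack the per-sector frames and keep the sector index as a column
df_all = (
    pd.concat(
        list_results_dataframes,
        keys=sector_ids,
        names=['Sector index', 'Row'],
    )
    .reset_index(level='Sector index')
    .reset_index(drop=True)
)

# Average computation time across sectors for each cutoff
mean_time_per_cutoff = df_all.groupby('Cutoff')['Computation time'].mean()
print(mean_time_per_cutoff)
```

Keeping the sector index as a column makes it possible to join the combined table back to `df_sectors` on the sector index.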