## Mapping to the ecoinvent database (EIDB)

- Combines the activity name with the reference product and the geographic location: more accurate, but takes longer to run.

In [1]:
# dataframe tools
import pandas as pd
import numpy as np
from tqdm import tqdm

# metrics functions
from sklearn.metrics import mean_absolute_percentage_error as mape
from sklearn.metrics import r2_score

# custom package
from caml import config
from caml.similarity import MLModel

# interactive input tools
import ipywidgets as widgets
from ipywidgets import VBox

In [2]:
import sys
sys.path.append('../Module')  # go up one level, then down into the Module folder
from lci_ml_mod import *

In [3]:
# If you have a specific version of the EIDB overview spreadsheet saved locally:
# eidb_df = pd.read_excel("EIDB_38.xlsx", sheet_name="Cut-Off AO")

# Or download it from ecoinvent directly (latest version as of Apr 16, 2023):
url = 'https://ecoinvent.org/wp-content/uploads/2022/12/Database-Overview-for-ecoinvent-v3.9.1.xlsx'
eidb_df = pd.read_excel(url, sheet_name="Cut-Off AO")

In [4]:
eidb_list = np.unique(eidb_df["Reference Product Name"].values)
eidb_act_list = np.unique(eidb_df["Activity Name"].values)

print("Total N of database is %d, unique Reference products is %d, and unique activity is %d"  % ( len(eidb_df), 
                        len(eidb_list), len(eidb_act_list)))

Total N of database is 21238, unique Reference products is 3550, and unique activity is 8278


Extract all unique activity names (N = 8278) and their corresponding reference products (RP; N = 3550). There are fewer reference products than activities because one product can be produced differently in different locations. `final_list` combines RP / activity name / location for every row, so it has the same length as the raw EIDB table (21238). Mapping against it is more accurate but takes longer.

In [5]:
eidb_df["Geography"] = eidb_df["Geography"].astype(str)
final_list = eidb_df[["Reference Product Name", "Activity Name", "Geography"]].apply("/".join, axis=1)

In [6]:
final_list = final_list.values

#### Use `final_list` as the reference list to map against. This takes longer (e.g., several minutes) because it contains every data entry (N = 21238 for EIDB v3.9 cut-off).

In [7]:
product_list = [
    "renewable electricity, hydro, CA-BC",
    "electricity, at consumer, low-voltage, Shanghai",
    "electric battery car",
]

In [8]:
model = MLModel(config.model_name)
cosine_scores = model.compute_similarity_scores(product_list, final_list)
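
The name `cosine_scores` suggests that each product description and each reference entry is embedded as a vector and then compared by cosine similarity. How `MLModel.compute_similarity_scores` does this internally is not shown here, so the following is only a minimal sketch under that assumption. The helper `cosine_similarity_matrix` and the toy vectors are hypothetical and are not part of `caml`.

```python
import numpy as np

def cosine_similarity_matrix(query_vecs, ref_vecs):
    """Return an (n_queries, n_refs) matrix of cosine similarities."""
    q = np.asarray(query_vecs, dtype=float)
    r = np.asarray(ref_vecs, dtype=float)
    # Scale every row to unit length so the dot product equals the cosine.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    return q @ r.T

# Toy 3-dimensional "embeddings", made up for illustration only.
toy_queries = np.array([[1.0, 0.0, 0.0],
                        [1.0, 1.0, 0.0]])
toy_refs = np.array([[2.0, 0.0, 0.0],
                     [0.0, 3.0, 0.0],
                     [1.0, 1.0, 0.0]])

# One row per query, one column per reference entry.
toy_scores = cosine_similarity_matrix(toy_queries, toy_refs)
```

With the real data, the score matrix would have one row per entry in `product_list` and one column per entry in `final_list`, which is why the full reference list takes longer to score.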

### Show only the LCI mapped with the highest score

In [9]:
map_single_lci(cosine_scores = cosine_scores, product_list = product_list, mapdb_list = final_list)

,your_product,LCI_mapped,cosine_score
0,"renewable electricity, hydro, CA-BC","electricity, high voltage/electricity production, hydro, run-of-river/CA-BC",0.777
1,"electricity, at consumer, low-voltage, Shanghai","electricity, low voltage/market for electricity, low voltage/HK",0.675
2,electric battery car,"electric motor, vehicle/market for electric motor, vehicle/GLO",0.647
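
Picking the best match per product amounts to taking the highest-scoring column in each row of the score matrix. The sketch below illustrates that idea on made-up numbers. `best_match_table` is a hypothetical stand-in, not the actual `map_single_lci` implementation, and the three-decimal rounding is assumed from the output shown above.

```python
import numpy as np
import pandas as pd

def best_match_table(scores, queries, references):
    """For each query row, keep the reference entry with the highest score."""
    scores = np.asarray(scores, dtype=float)
    best_idx = scores.argmax(axis=1)  # column index of the best score per row
    best_scores = scores[np.arange(len(best_idx)), best_idx]
    return pd.DataFrame({
        "your_product": list(queries),
        "LCI_mapped": [references[i] for i in best_idx],
        "cosine_score": best_scores.round(3),
    })

# Made-up scores: 2 queries against 3 reference entries.
demo_scores = np.array([[0.2, 0.9, 0.4],
                        [0.7, 0.1, 0.3]])
demo_queries = ["query A", "query B"]
demo_refs = ["ref 0", "ref 1", "ref 2"]

demo_best = best_match_table(demo_scores, demo_queries, demo_refs)
```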


### To see the N closest mapped LCIs (up to 20; set `n=` below)

As the output below shows, when the location is given only as Shanghai, the ML output is not precise. Try entering more detailed information, e.g., "Shanghai, eastern China"; give the model as much detail as possible.

In [10]:
map_multiple_lci(cosine_scores = cosine_scores, n=5, product_list = product_list, mapdb_list = final_list)

your_product,,LCI_mapped,ML_score
"renewable electricity, hydro, CA-BC",1,"electricity, high voltage/electricity production, hydro, run-of-river/CA-BC",0.777025
"renewable electricity, hydro, CA-BC",2,"electricity, high voltage/electricity production, hydro, reservoir, alpine region/CA-BC",0.758516
"renewable electricity, hydro, CA-BC",3,"electricity, high voltage/electricity production, hydro, pumped storage/CA-BC",0.751335
"renewable electricity, hydro, CA-BC",4,"electricity, high voltage/electricity production, wind, <1MW turbine, onshore/CA-BC",0.738592
"renewable electricity, hydro, CA-BC",5,"electricity, high voltage/electricity production, wind, 1-3MW turbine, onshore/CA-BC",0.724321
"electricity, at consumer, low-voltage, Shanghai",1,"electricity, low voltage/market for electricity, low voltage/HK",0.675108
"electricity, at consumer, low-voltage, Shanghai",2,"electricity, medium voltage/market for electricity, medium voltage/HK",0.668968
"electricity, at consumer, low-voltage, Shanghai",3,"electricity, low voltage/market for electricity, low voltage/SG",0.665731
"electricity, at consumer, low-voltage, Shanghai",4,"electricity, medium voltage/market for electricity, medium voltage/SG",0.649567
"electricity, at consumer, low-voltage, Shanghai",5,"electricity, low voltage/market for electricity, low voltage/QA",0.645833
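
A ranked top-N table like the one above can be built by sorting each row of the score matrix in descending order and keeping the first `n` columns. The following is a minimal sketch on made-up numbers. `top_n_table` and the index level name `rank` are hypothetical, and this is not the actual `map_multiple_lci` implementation.

```python
import numpy as np
import pandas as pd

def top_n_table(scores, queries, references, n=5):
    """Return the n highest-scoring reference entries for each query."""
    scores = np.asarray(scores, dtype=float)
    n = min(n, scores.shape[1])  # cannot return more matches than references
    # Negate the scores so that argsort yields descending order.
    order = np.argsort(-scores, axis=1)[:, :n]
    rows = []
    for qi, query in enumerate(queries):
        for rank, ri in enumerate(order[qi], start=1):
            rows.append((query, rank, references[ri], scores[qi, ri]))
    table = pd.DataFrame(
        rows, columns=["your_product", "rank", "LCI_mapped", "ML_score"]
    )
    return table.set_index(["your_product", "rank"])

# Made-up scores: 2 queries against 3 reference entries.
demo_scores = np.array([[0.2, 0.9, 0.4],
                        [0.7, 0.1, 0.3]])
demo_queries = ["query A", "query B"]
demo_refs = ["ref 0", "ref 1", "ref 2"]

demo_top2 = top_n_table(demo_scores, demo_queries, demo_refs, n=2)
```

Inspecting several candidates instead of only the top one makes it easier to spot cases like the Shanghai query, where the closest matches differ mainly in geography.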
