## Mapping to US Federal LCA Commons LCI databases

LCA Commons data repository:
- https://www.lcacommons.gov/lca-collaboration/

Within the repository, some data sets are for LCIA (e.g., ./ReCiPe, ./TRACI) or EF. We are interested in unit process LCI, and this notebook provides three LCI databases as examples:


- University of Washington Design for Environment Laboratory/Field Crop Production - <b>'UW_DfE_crop'</b>
- National Renewable Energy Laboratory/USLCI_2023_Q1_v1 - <b>'USLCI'</b>
- Federal Highway Administration/MTU Asphalt Pavement Framework - <b>'Hwy_pavement'</b>

You can also download another database that interests you, e.g., for animal product production, you may be interested in University of Arkansas/...

Please also note that the total number of data sets shown below each database includes all EFs, actors, and other data, whereas we are only interested in unit process LCI data sets. For example, among the 7314 data sets for USLCI, 642 are LCI processes (FY23_Q1 version).
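
To see how many data sets in a downloaded JSON-LD zip are actually processes, one option is to count the JSON files under each top-level folder. This is a minimal sketch, assuming the archive keeps unit processes in a `processes` folder (as the code below in this notebook does); the helper name `count_datasets_by_folder` and the toy archive contents are made up for illustration.

```python
import io
import json
import zipfile
from collections import Counter


def count_datasets_by_folder(zip_source):
    """Count the JSON files under each top-level folder of a JSON-LD zip."""
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            # Skip directory entries and files stored at the archive root
            if member.endswith("/") or "/" not in member:
                continue
            if member.endswith(".json"):
                counts[member.split("/", 1)[0]] += 1
    return counts


# Toy archive built in memory so the sketch runs without any download
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("processes/p1.json", json.dumps({"name": "Asphalt mix"}))
    archive.writestr("processes/p2.json", json.dumps({"name": "Crude oil"}))
    archive.writestr("flows/f1.json", json.dumps({"name": "CO2"}))
    archive.writestr("actors/a1.json", json.dumps({"name": "NREL"}))
buffer.seek(0)

toy_counts = count_datasets_by_folder(buffer)
print(toy_counts["processes"], "of", sum(toy_counts.values()), "data sets are processes")
```

Passing the path of a real download instead of `buffer` should give the per-folder breakdown for that database.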

For USLCI, the uslci-admin GitHub repo contains the full version history with download links (as openLCA .zolca or JSON-LD): https://github.com/uslci-admin/uslci-content/blob/dev/docs/release_info/release-downloads.md
   - most recent version (JSON-LD) as of April 2023: https://github.com/uslci-admin/uslci-content/blob/dev/downloads/uslci_fy23_q1_01_olca1_10_3_json_ld.zip

In [1]:
# dataframe tools
import pandas as pd
import numpy as np
from tqdm import tqdm

# metrics functions
from sklearn.metrics import mean_absolute_percentage_error as mape
from sklearn.metrics import r2_score

# custom package
from caml import config
from caml.similarity import MLModel

# interactive input tools
import ipywidgets as widgets
from ipywidgets import VBox

# for reading zipped data (JSON-LD)
import glob
import json
import zipfile
import os

In [2]:
import sys
sys.path.append('../Module')  #a level up & then down to Module folder
from lci_ml_mod import *

### Now select your mapping database; three are available:
- 'UW_DfE_crop'
- 'Hwy_pavement'
- 'USLCI'

A sector-specific database such as "Hwy_pavement" or "UW_DfE_crop" will give a more accurate LCI mapping if you are interested in that sector, so supply a relevant product list when you choose one.

In [3]:
USCommon = "Hwy_pavement"   #UW_DfE_crop    USLCI

In [4]:
# create a temporary folder for the unzipped files; we are only interested in the LCI unit processes (processes folder)
if USCommon == "USLCI":
    zipfile.ZipFile('data/uslci_fy23_q1_01_olca1_10_3_json_ld.zip').extractall('data/temp_zip/USLCI')
    file_location = os.path.join('data/temp_zip/USLCI',  "processes", '*.json')
elif USCommon == "Hwy_pavement":
    zipfile.ZipFile('data/Federal_Highway_Administration-mtu_pavement.zip').extractall('data/temp_zip/pavement')
    file_location = os.path.join('data/temp_zip/pavement',  "processes", '*.json')
elif USCommon == "UW_DfE_crop":
    zipfile.ZipFile('data/U_Washington_Design_for_Environment_Laboratory-Field_crop_production.zip').extractall('data/temp_zip/UWDfE')
    file_location = os.path.join('data/temp_zip/UWDfE',  "processes", '*.json')

In [5]:
us_list = []
for f in glob.glob(file_location):
    # each JSON file is one unit process; keep only its name for mapping
    with open(f, encoding="utf-8") as jsonfile:
        df = json.load(jsonfile)
        us_list.append(df['name'])
print("%s has total %d LCI unit processes" % (USCommon, len(us_list)))
us_list = np.array(us_list)

Hwy_pavement has total 1298 LCI unit processes


In [6]:
# the data has been read in, so delete the temporary folder
import shutil
shutil.rmtree("data/temp_zip/")
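
The extract, read, and delete steps above can also be collapsed by reading the JSON files straight out of the archive, so no temporary folder is needed. The following is a sketch under the same assumption that unit processes sit in a `processes` folder; `read_process_names` is a hypothetical helper, and the toy archive stands in for a real download.

```python
import io
import json
import zipfile


def read_process_names(zip_source, folder="processes"):
    """Return the 'name' field of every JSON file under `folder` in the zip."""
    names = []
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            if member.startswith(folder + "/") and member.endswith(".json"):
                # archive.open yields bytes; json.load decodes UTF-8 itself
                with archive.open(member) as handle:
                    names.append(json.load(handle)["name"])
    return names


# Toy archive built in memory; replace `toy_zip` with a real zip path
toy_zip = io.BytesIO()
with zipfile.ZipFile(toy_zip, "w") as archive:
    archive.writestr("processes/a.json", json.dumps({"name": "Asphalt mix 1 - virgin mix"}))
    archive.writestr("processes/b.json", json.dumps({"name": "Crude oil, extracted"}))
    archive.writestr("flows/c.json", json.dumps({"name": "not a process"}))
toy_zip.seek(0)

toy_names = read_process_names(toy_zip)
print(len(toy_names), "process names read without extracting")
```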

### Enter your product name list and map it against the selected database
- As shown in the code above, us_list.append(df['name']), only the data set name is extracted and mapped against your product. Feel free to add more constraints, e.g., product category code, geo_location, etc.
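
As a sketch of how such extra constraints could be pulled in, the helper below collects a category and location alongside the name. The field names `category` and `location` and their shapes are assumptions about the JSON-LD layout (they can differ between openLCA schema versions), so check them against one of your own process files; `describe_process` and the toy record are hypothetical.

```python
import json


def describe_process(process):
    """Collect the name plus optional category and location of one process dict."""
    category = process.get("category")
    # Assumption: category may be a plain string or a nested object with a "name"
    if isinstance(category, dict):
        category = category.get("name")
    location = process.get("location") or {}
    return {
        "name": process["name"],
        "category": category,
        "location": location.get("name"),
    }


# Hypothetical record, not taken from any LCA Commons database
toy_process = json.loads("""
{"name": "Crude oil, extracted",
 "category": "Mining/Oil and Gas Extraction",
 "location": {"name": "Canada"}}
""")

record = describe_process(toy_process)
# One option: append the extra fields to the text that gets embedded
text_for_matching = ", ".join(v for v in record.values() if v)
print(text_for_matching)
```

Whether richer text improves the mapping depends on the model and on how your product names are phrased, so it is worth comparing scores with and without the extra fields.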


In [7]:
product_list = [
    "oil sand, produced in Canada Alberta",
    "asphalt"
]

In [8]:
model = MLModel(config.model_name)
cosine_scores = model.compute_similarity_scores(product_list, us_list)
# to inspect the ranking: cosine_scores.sort(dim=1, descending=True)[1] gives the sorted indices (assuming a torch tensor)
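
For readers unfamiliar with the metric, here is a minimal NumPy sketch of a cosine similarity matrix between two sets of vectors. It is not the caml implementation; how `MLModel` embeds text and computes scores internally is not shown in this notebook, and the 2-D vectors below are toy values.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Cosine similarity between every row of `a` and every row of `b`."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Toy 2-D "embeddings"; real text embeddings have many more dimensions
product_vecs = np.array([[1.0, 0.0], [1.0, 1.0]])
dataset_vecs = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])

scores = cosine_similarity_matrix(product_vecs, dataset_vecs)
best = scores.argmax(axis=1)  # index of the closest data set for each product
print(scores.round(3))
print(best)
```

Each row of `scores` corresponds to one product and each column to one data set, which matches the (products x data sets) shape that the mapping functions below consume.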

### Show only the LCI mapped with the highest cosine_score

In [9]:
map_single_lci(cosine_scores = cosine_scores, product_list = product_list, mapdb_list = us_list)

| | your_product | LCI_mapped | cosine_score |
|---|---|---|---|
| 0 | oil sand, produced in Canada Alberta | Crude oil, extracted | 0.69 |
| 1 | asphalt | Asphalt mix 2 - 15% RAP, 3% RAS virgin liquid asphalt binder | 0.682 |


### To see the first N closest mapped LCI (up to 20; set n= below)

In [10]:
map_multiple_lci(cosine_scores =cosine_scores, n=5, product_list = product_list, mapdb_list = us_list)

| your_product | | LCI_mapped | ML_score |
|---|---|---|---|
| oil sand, produced in Canada Alberta | 1 | Crude oil, extracted | 0.690039 |
| oil sand, produced in Canada Alberta | 2 | Crude oil, at production | 0.681982 |
| oil sand, produced in Canada Alberta | 3 | Petroleum refined, for material use, at plant | 0.59365 |
| oil sand, produced in Canada Alberta | 4 | Crude palm kernel oil, at plant | 0.592934 |
| oil sand, produced in Canada Alberta | 5 | Petroleum refined, for energy use, at plant | 0.588461 |
| asphalt | 1 | Asphalt mix 2 - 15% RAP, 3% RAS virgin liquid asphalt binder | 0.68225 |
| asphalt | 2 | Portable asphalt mix | 0.66485 |
| asphalt | 3 | Asphalt binder, no additives, consumption mix, at terminal, from crude oil | 0.648857 |
| asphalt | 4 | Asphalt mix 1 - virgin mix | 0.645278 |
| asphalt | 5 | Asphalt mix 2 - 15% RAP, 3% RAS liquid asphalt binder with SBS | 0.638938 |
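
A ranking like the one above can be reproduced from any score matrix by sorting each row. The sketch below is an illustration, not the code of `map_multiple_lci`; `top_n_matches` is a hypothetical helper and the scores are made-up toy values rather than model output.

```python
import numpy as np


def top_n_matches(scores, names, n=5):
    """Return, for each row of `scores`, the n best (name, score) pairs."""
    order = np.argsort(-scores, axis=1)[:, :n]  # column indices, best first
    return [[(str(names[j]), float(scores[i, j])) for j in row]
            for i, row in enumerate(order)]


toy_db = np.array(["Crude oil, extracted", "Portable asphalt mix", "Crude oil, at production"])
toy_scores = np.array([[0.69, 0.10, 0.68],
                       [0.20, 0.66, 0.15]])

toy_top = top_n_matches(toy_scores, toy_db, n=2)
for product, matches in zip(["oil sand", "asphalt"], toy_top):
    print(product, matches)
```

If the scores come back as a torch tensor rather than a NumPy array, converting with `.cpu().numpy()` first should make the same helper usable.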
