Creating EcoInvent Matrix from .spold files
See: https://v34.ecoquery.ecoinvent.org/File/Files

At the bottom of that page, under supporting documents, there are instructions for building the matrix from .spold files.
First download and unzip the relevant system files. You may need a tool that converts .7z archives to .zip.

In [2]:
import os
import xml.etree.ElementTree as ET
import numpy as np
import scipy as sp
import pandas as pd
from scipy import sparse
from scipy.sparse import coo_matrix
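import scipy.sparse.linalg  # loads the submodule so sp.sparse.linalg.gmres resolves later on
from numpy.linalg import inv  # dense inverse (importing scipy's inv as well would just be shadowed by this one)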

First navigate to the appropriate folder - should be something like ~/Documents/D/Ecoinvent/ecoinvent-3.4_cutoff_ecoSpold02/

In [8]:
os.chdir("datasets")
dir = os.getcwd()

Change file extensions from .spold to .xml

In [10]:
for f in os.listdir():
    if f.endswith('.spold'):
        base = os.path.splitext(f)[0]
        # rename in place; os.rename returns None, so there is nothing to assign
        os.rename(f, base + '.xml')
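
The renaming step may not be strictly necessary: `ET.parse` reads the file contents and does not look at the extension, and the cells below only use `os.path.splitext` to get at the ids. A minimal sketch, assuming a made-up one-element file written to a temporary folder (the file name and contents are hypothetical stand-ins for a real dataset):

```python
import os
import tempfile
import xml.etree.ElementTree as ET

NS = '{http://www.EcoInvent.org/EcoSpold02}'

# Hypothetical stand-in for a real .spold dataset file
toy_xml = (
    '<ecoSpold xmlns="http://www.EcoInvent.org/EcoSpold02">'
    '<activityName>toy activity</activityName>'
    '</ecoSpold>'
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'actid_refid.spold')
    with open(path, 'w', encoding='utf-8') as fh:
        fh.write(toy_xml)
    # Parse the file directly, without renaming it to .xml
    root = ET.parse(path).getroot()
    names = [el.text for el in root.iter(NS + 'activityName')]

print(names)
```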

Parse each file using ElementTree, extract key attributes as described in step 1 in the Ecoinvent documentation and put them in a dataframe. This is the index for the technosphere exchanges.

In [11]:
NS = '{http://www.EcoInvent.org/EcoSpold02}'  # ecoSpold02 XML namespace
colnames = ['act_id', 'ref_id', 'act_name', 'ref_name', 'geog']
files = os.listdir()  # list the folder once so the file order stays fixed
idx = pd.DataFrame(data=[], index=list(range(0, len(files))), columns=colnames)
for i, fname in enumerate(files):
    # file names have the form <activity id>_<reference product id>.xml
    base = os.path.splitext(fname)[0]
    actid, refid = base.split("_")[:2]
    # use .loc[row, column]; chained idx.loc[i]['col'] = ... may write to a temporary copy
    idx.loc[i, 'act_id'] = actid
    idx.loc[i, 'ref_id'] = refid
    root = ET.parse(fname).getroot()
    for activityName in root.iter(NS + 'activityName'):
        idx.loc[i, 'act_name'] = activityName.text
    for geog in root.findall('.//' + NS + 'geography/' + NS + 'shortname'):
        idx.loc[i, 'geog'] = geog.text
    # reference product name: the intermediate exchange whose id matches refid
    for rf in root.findall(".//%sintermediateExchange[@intermediateExchangeId='%s']/%sname" % (NS, refid, NS)):
        idx.loc[i, 'ref_name'] = rf.text

Check to make sure it looks right:

In [12]:
idx.head(5)

Unnamed: 0,act_id,ref_id,act_name,ref_name,geog
0,27faf6b1-d2c2-4a4d-8db4-f9cb31b71087,4fcfb407-7879-42ad-9582-ab3fbfe5af10,"market for electricity, medium voltage, alumin...","electricity, medium voltage, aluminium industry","IAI Area, North America, without Quebec"
1,5f376bb5-c17c-4d1c-9a68-11a19045e5b5,637ee275-a239-4dcb-b084-abfa110dd65b,"treatment of waste newspaper, municipal incine...","electricity, for reuse in municipal waste inci...",RoW
2,aa415f54-2081-490d-bff1-52e6cb2d0b42,45b641f7-e903-4fa3-94ec-84ca4c567c32,heat and power co-generation unit construction...,"heat and power co-generation unit, organic Ran...",GLO
3,55819cd5-11d0-439f-be25-b3450b8b4be3,d1a3ebf3-e2d5-4da1-8090-664c9160aa33,market for 4-methyl-2-pentanone,4-methyl-2-pentanone,GLO
4,05988f8b-52ea-45ee-adc4-b9fc2cffda1d,f1d341ae-3435-4f11-b13c-793634d849a4,"miscanthus rhizome production, for planting","miscanthus, chopped",RoW
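
The main loop further down looks up row positions in `idx` with a boolean mask for every exchange, which scans the whole dataframe each time. One possible alternative, sketched here on a made-up two-row index, is to build a dictionary from `(act_id, ref_id)` to row position once. The ids are placeholders, and `idx_toy` and `pos` are names introduced only for this illustration:

```python
import pandas as pd

# Toy index with placeholder ids (real ids are UUIDs taken from the file names)
idx_toy = pd.DataFrame({
    'act_id': ['a1', 'a2'],
    'ref_id': ['p1', 'p2'],
})

# Map each (activity id, reference product id) pair to its row position
pos = {pair: i for i, pair in enumerate(zip(idx_toy['act_id'], idx_toy['ref_id']))}

# Dictionary lookup instead of a boolean mask over the whole dataframe
q = pos[('a2', 'p2')]
print(q)

# A pair that is not in the index returns None instead of raising
missing = pos.get(('a2', 'p1'))
```

Using `.get` also makes it easy to spot exchanges that point at an activity missing from the index, rather than failing on `w[0]`.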


Now navigate to the MasterData folder and use the ElementaryExchanges file to build an index for the elementary exchanges. This time there is only one file.

In [13]:
os.chdir("..")
os.chdir("MasterData")

In [14]:
tree = ET.parse('ElementaryExchanges.xml')
root = tree.getroot()
colnames = ['elex_id', 'subc_id', 'elex_name', 'comp_name', 'subc_name']
L = len(root.findall(".//{http://www.EcoInvent.org/EcoSpold02}elementaryExchange"))
elemidx = pd.DataFrame(data=[], index=list(range(0,L)), columns=colnames)

In [15]:
NS = '{http://www.EcoInvent.org/EcoSpold02}'
# Read every field from the same <elementaryExchange> element so the columns cannot drift out of alignment
rows = []
for elex in root.iter(NS + 'elementaryExchange'):
    comp = elex.find(NS + 'compartment')  # outer compartment element, carries the subcompartment id
    rows.append({
        'elex_id': elex.attrib['id'],
        'subc_id': comp.attrib['subcompartmentId'],
        'elex_name': elex.find(NS + 'name').text,
        'comp_name': comp.find(NS + 'compartment').text,
        'subc_name': comp.find(NS + 'subcompartment').text,
    })
elemidx = pd.DataFrame(rows, columns=colnames)

Check to make sure it looks right:

In [16]:
elemidx.head(5)

Unnamed: 0,elex_id,subc_id,elex_name,comp_name,subc_name
0,38a622c6-f086-4763-a952-7c6b3b1c42ba,e8d7772c-55ca-4dd7-b605-fee5ae764578,"1,4-Butanediol",air,urban air close to ground
1,541a823c-0aad-4dc4-9123-d4af4647d942,e8d7772c-55ca-4dd7-b605-fee5ae764578,1-Pentanol,air,urban air close to ground
2,8cbaa905-41b0-4327-8403-bf1c8eb25429,e8d7772c-55ca-4dd7-b605-fee5ae764578,1-Pentene,air,urban air close to ground
3,f681eb3c-854a-4f78-bcfe-76dfbcf9df3c,e1bc9a16-5b6a-494f-98ef-49f461b1a11e,"2,4-D",soil,agricultural
4,a0fec60d-3f74-48bf-a2d2-58c30fc13e53,e8d7772c-55ca-4dd7-b605-fee5ae764578,2-Aminopropanol,air,urban air close to ground
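
Because the later lookups take the first match for an `(elex_id, subc_id)` pair, it may be worth checking that the pairs are unique and that no cell is empty before going on. A small sketch on a made-up index (`elem_toy` and its ids are placeholders, not real data):

```python
import pandas as pd

elem_toy = pd.DataFrame({
    'elex_id': ['e1', 'e1', 'e2'],
    'subc_id': ['s1', 's2', 's1'],
})

# True if any (elex_id, subc_id) pair occurs more than once
has_dupes = bool(elem_toy.duplicated(subset=['elex_id', 'subc_id']).any())
# True if any cell is missing, e.g. because an expected element was absent
has_gaps = bool(elem_toy.isna().any().any())
print(has_dupes, has_gaps)
```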


We now have two index dataframes: elemidx for the elementary flows and idx for the technosphere inputs and outputs. Next we build the actual matrices: the A (technology) matrix and the B (elementary flow) matrix. First create two matrices of zeros of the appropriate size.

In [94]:
A = np.zeros((len(idx),len(idx)))
B = np.zeros((len(elemidx),len(idx)))
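
With roughly 14,000 activities, a dense `A` of float64 values takes about 14,000 x 14,000 x 8 bytes, or roughly 1.6 GB, almost all of it zeros. If memory is tight, one option is to collect (row, column, value) triplets while parsing and build a sparse matrix at the end. A minimal sketch with made-up numbers (the names `rows`, `cols`, `vals` and `A_toy` are only for illustration):

```python
import numpy as np
from scipy.sparse import coo_matrix

# Triplets collected while parsing: row index, column index, coefficient
rows = [0, 1, 0, 1]
cols = [0, 1, 1, 1]
vals = [1.0, 1.0, -0.5, -0.1]   # note the two entries for position (1, 1)

A_toy = coo_matrix((vals, (rows, cols)), shape=(2, 2)).tocsr()

# Duplicate (row, col) entries are summed during conversion: 1.0 + (-0.1) = 0.9
print(A_toy.toarray())
```

The summing of duplicates happens to match the "add the reference flow on the diagonal" step used below, since a self-consumption entry and the reference flow share the same position.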

Now we need to parse each of the spold files to find the coefficients, and put them in the right place in the matrix based on their index numbers. First navigate back to the datasets folder.

In [18]:
os.chdir("..")
os.chdir("datasets")

Next, open each file, parse the tree, identify the correct column in the matrix, find the coefficients and put them in. Off-diagonal coefficients in the A matrix are multiplied by -1 as described in the EcoInvent documentation. This code loops over ~14,000 files, so don't be surprised if it takes an hour or more to run.

In [95]:
files = os.listdir()  # list the folder once instead of twice per iteration
for i, fname in enumerate(files):
    base = os.path.splitext(fname)[0]
    actid = base.split("_")[0]
    refid = base.split("_")[1]
    tree = ET.parse(fname)
    root = tree.getroot()
    for IE in root.findall(".//{http://www.EcoInvent.org/EcoSpold02}intermediateExchange[@intermediateExchangeId='%s']/[{http://www.EcoInvent.org/EcoSpold02}outputGroup='0']" % refid):
        amt = float(IE.attrib['amount'])
    q = idx.index.values[(idx['act_id']==actid) & (idx['ref_id']==refid)]  ##this is the column index for A and B matrices
    q = q[0]  

    ## create some temporary empty lists
    k = []
    l = []  
    m = []
    n = []
    o = []
    p = []
    r = []
    s = []
    t = []
    u = []
    v = []
    z = []
    qc = []
    qb = []
    
    ## populate the A matrix
    for IE in root.findall(".//{http://www.EcoInvent.org/EcoSpold02}intermediateExchange[@activityLinkId]/[{http://www.EcoInvent.org/EcoSpold02}inputGroup='5']"):
        if float(IE.attrib['amount'])==0:
            continue
        k.append(IE.attrib['intermediateExchangeId'])
        l.append(IE.attrib['activityLinkId'])
        w = idx.index.values[(idx['act_id']==l[-1]) & (idx['ref_id']==k[-1])] ## find index for the intermediate exchange
        m.append(w[0])
        n.append(-1*float(IE.attrib['amount'])) ## multiply coefficient by -1
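    if len(k) > 0:
        qc = A[:,q]
        np.put(qc, m, n)
        A[:,q] = qc
    ## Add the reference flow on the diagonal for every activity, including those with no technosphere inputs (otherwise their diagonal stays 0 and A is singular). It is added rather than assigned because in a few cases there are losses within the industry itself
    A[q][q] = amt + A[q][q]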

    ## populate the B matrix
    # input group 4
    for LX in root.findall(".//{http://www.EcoInvent.org/EcoSpold02}elementaryExchange/[{http://www.EcoInvent.org/EcoSpold02}inputGroup='4']"):
        o.append(LX.attrib['elementaryExchangeId'])
        p.append(float(LX.attrib['amount']))    
    for LXc in root.findall(".//{http://www.EcoInvent.org/EcoSpold02}elementaryExchange/[{http://www.EcoInvent.org/EcoSpold02}inputGroup='4']/{http://www.EcoInvent.org/EcoSpold02}compartment"):       
        r.append(LXc.attrib['subcompartmentId'])
    for j, match in enumerate(o):
        w = elemidx.index.values[(elemidx['elex_id']==o[j]) & (elemidx['subc_id']==r[j])]
        s.append(w[0])
    if len(o)==0:
        w=0
    #output group 4
    for LX in root.findall(".//{http://www.EcoInvent.org/EcoSpold02}elementaryExchange/[{http://www.EcoInvent.org/EcoSpold02}outputGroup='4']"):
        t.append(LX.attrib['elementaryExchangeId'])
        u.append(float(LX.attrib['amount']))    
    for LXc in root.findall(".//{http://www.EcoInvent.org/EcoSpold02}elementaryExchange/[{http://www.EcoInvent.org/EcoSpold02}outputGroup='4']/{http://www.EcoInvent.org/EcoSpold02}compartment"):       
        v.append(LXc.attrib['subcompartmentId'])
    for j, match in enumerate(t):
        w = elemidx.index.values[(elemidx['elex_id']==t[j]) & (elemidx['subc_id']==v[j])]
        z.append(w[0])
    ## write inputs and outputs independently, so a dataset with elementary inputs but no outputs is still recorded
    if len(o) > 0:
        B[s, q] = p
    if len(t) > 0:
        B[z, q] = u

    ## clear the temporary variables
    del tree
    del root
    del actid
    del refid
    del k
    del l
    del m
    del n
    del w
    del o 
    del p
    del r
    del s
    del t
    del u
    del v
    del z
    del qc
    del qb
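
One thing to watch in the loop above: `np.put` overwrites, so if the same supplier appeared twice in one dataset only the last amount would survive. Whether that actually occurs in the ecoinvent files is not checked here; if it does, `np.add.at` accumulates repeated indices instead. A small sketch with made-up numbers:

```python
import numpy as np

rows = [2, 0, 2]               # row 2 appears twice
amounts = [-0.3, -0.1, -0.2]

col_put = np.zeros(4)
np.put(col_put, rows, amounts)     # the later value replaces the earlier one

col_add = np.zeros(4)
np.add.at(col_add, rows, amounts)  # repeated indices are summed

print(col_put)
print(col_add)
```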

Here is a sample calculation of the life cycle inventory (LCI) for one unit of clinker production in the US geography:

In [96]:
A_sp = sp.sparse.coo_matrix(A)
B_sp = sp.sparse.coo_matrix(B)

In [97]:
f = np.zeros(len(A))

In [101]:
i = idx.index[(idx['ref_name']=='clinker')&(idx['geog']=='US')]
idx.iloc[i]

Unnamed: 0,act_id,ref_id,act_name,ref_name,geog
13767,ecf5ad24-a861-4ed0-abae-42ad5ba5a882,1f41586d-0d8a-4c7c-8473-dd8351bab538,clinker production,clinker,US


In [102]:
np.put(f, i, 1)  # i is the row found above (13767 in this run; it depends on the file order)

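In [None]:
# gmres returns a tuple: the solution vector and a convergence flag (0 means it converged)
s, info = sp.sparse.linalg.gmres(A_sp, f)

In [None]:
# use the sparse matrix's own dot method; np.matmul does not handle scipy sparse matrices
g = B_sp.dot(s)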
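
To make the last two steps concrete, here is a toy system with two activities and one elementary flow. It uses the direct sparse solver `scipy.sparse.linalg.spsolve` rather than GMRES, which is a swapped-in choice for this sketch, and all numbers are made up. It solves A s = f for the scaling vector s and then computes the inventory g = B s:

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import spsolve

# Column 0: activity 0 produces 1 unit and consumes 0.5 units of activity 1's product
A_toy = csc_matrix(np.array([[1.0, 0.0],
                             [-0.5, 1.0]]))
# One elementary flow: 2 units per unit of activity 0, 4 units per unit of activity 1
B_toy = csc_matrix(np.array([[2.0, 4.0]]))

f_toy = np.array([1.0, 0.0])   # demand: one unit of activity 0's product

s_toy = spsolve(A_toy, f_toy)  # scaling vector
g_toy = B_toy.dot(s_toy)       # inventory of the elementary flow

print(s_toy, g_toy)
```

Activity 0 runs once and activity 1 runs half a time to supply it, so the flow total is 2 x 1 + 4 x 0.5 = 4.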