Creating EcoInvent Matrix from .spold files
See: https://v34.ecoquery.ecoinvent.org/File/Files

At the bottom of that page, under supporting documents, there are instructions for building the matrices from .spold files.
First download and unzip the relevant system model files. You may need a tool that can extract or convert .7z archives.

In [1]:
import os
import xml.etree.ElementTree as ET
import numpy as np
import scipy as sp
import pandas as pd
from scipy import sparse
from scipy.sparse import coo_matrix
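import scipy.sparse.linalg  # makes sp.sparse.linalg (e.g. gmres) available
# The two `inv` imports were dropped: the numpy one shadowed the scipy one, and neither is used below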

First navigate to the appropriate folder - should be something like ~/Documents/D/Ecoinvent/ecoinvent-3.4_cutoff_ecoSpold02/

In [2]:
os.chdir("datasets")
dir = os.getcwd()

Change file extensions from .spold to .xml

In [3]:
# ElementTree can parse a file with any extension, so this rename is optional
for f in os.listdir():
    if f.endswith('.spold'):
        base = os.path.splitext(f)[0]
        os.rename(f, base + '.xml')  # paths are relative to the current directory

Parse each file using ElementTree, extract key attributes as described in step 1 in the Ecoinvent documentation and put them in a dataframe. This is the index for the technosphere exchanges.

In [4]:
NS = '{http://www.EcoInvent.org/EcoSpold02}'
ref_path = f".//{NS}intermediateExchange[{NS}outputGroup='0']"  # reference product exchanges
colnames = ['act_id', 'ref_id', 'act_name', 'ref_name', 'geog']
files = os.listdir()  # list the folder once so every row maps to a fixed file
idx = pd.DataFrame(index=range(len(files)), columns=colnames)
for i, fname in enumerate(files):
    root = ET.parse(fname).getroot()
    # If a tag occurs more than once in a file, the last match is kept
    for activity in root.iter(NS + 'activity'):
        idx.loc[i, 'act_id'] = activity.attrib['id']  # .loc[row, col] avoids chained assignment
    for activityName in root.iter(NS + 'activityName'):
        idx.loc[i, 'act_name'] = activityName.text
    for geog in root.findall(f'.//{NS}geography/{NS}shortname'):
        idx.loc[i, 'geog'] = geog.text
    for IE in root.findall(ref_path):
        idx.loc[i, 'ref_id'] = IE.attrib['intermediateExchangeId']
    for rf in root.findall(ref_path + f'/{NS}name'):
        idx.loc[i, 'ref_name'] = rf.text
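
The repeated namespace string can also be handled with the `namespaces` argument that ElementTree's `find` and `findall` accept. The sketch below shows one way to pull the index fields out of a single dataset. The helper name `parse_activity` and the inline sample document are hypothetical; the sample only mimics the tags queried above and is far smaller than a real .spold file. Unlike the loop above, it keeps the first match rather than the last.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map for the namespace used by the queries in this notebook
NS = {"es": "http://www.EcoInvent.org/EcoSpold02"}

# Hypothetical, heavily trimmed stand-in for a .spold file
SAMPLE = """<ecoSpold xmlns="http://www.EcoInvent.org/EcoSpold02">
  <activityDataset>
    <activity id="act-1"><activityName>pea production</activityName></activity>
    <geography><shortname>GLO</shortname></geography>
    <intermediateExchange intermediateExchangeId="prod-1" amount="1">
      <name>protein pea</name><outputGroup>0</outputGroup>
    </intermediateExchange>
  </activityDataset>
</ecoSpold>"""


def parse_activity(root):
    """Return the index fields for one parsed dataset as a dict (first match wins)."""
    ref = root.find(".//es:intermediateExchange[es:outputGroup='0']", NS)
    return {
        "act_id": root.find(".//es:activity", NS).attrib["id"],
        "ref_id": ref.attrib["intermediateExchangeId"],
        "act_name": root.find(".//es:activityName", NS).text,
        "ref_name": ref.find("es:name", NS).text,
        "geog": root.find(".//es:geography/es:shortname", NS).text,
    }


record = parse_activity(ET.fromstring(SAMPLE))
print(record)
```

Collecting one such dict per file in a list and passing the list to `pd.DataFrame` would build the whole index in one step instead of filling it cell by cell.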

Check to make sure it looks right:

In [5]:
idx.head(5)

index,act_id,ref_id,act_name,ref_name,geog
0,c276df5a-9f59-4304-a25d-5d33d07303e9,1f41586d-0d8a-4c7c-8473-dd8351bab538,clinker production,clinker,US
1,679932c4-15f4-46d0-82e0-e126ee6ccb05,d69294d7-8d64-4915-a896-9996a014c410,electricity voltage transformation from medium...,"electricity, low voltage",SA
2,4c99ba2a-d1e2-4c41-910d-6ce77382db63,91667712-8403-49bd-95be-c929b47067c2,"market for transport, freight, sea, transocean...","transport, freight, sea, transoceanic ship",GLO
3,8fdf3ecb-6c94-4731-bca6-3cc2a5d6796b,0dab73c6-b214-4e9c-8c38-ab49d608637b,market for protein pea,protein pea,GLO
4,7fc96034-d15c-4ac1-b183-89c346a72821,cbcd0e88-5e08-4e2a-8353-ae6a4ba53ce2,"market for photovoltaic mounting system, for 5...","photovoltaic mounting system, for 570kWp open ...",GLO


Now navigate to the MasterData folder and use the ElementaryExchanges.xml file to build an index for the elementary exchanges. This time there is only one file.

In [6]:
os.chdir("..")
os.chdir("MasterData")

In [7]:
tree = ET.parse('ElementaryExchanges.xml')
root = tree.getroot()
colnames = ['elex_id', 'subc_id', 'elex_name', 'comp_name', 'subc_name']
L = len(root.findall(".//{http://www.EcoInvent.org/EcoSpold02}elementaryExchange"))
elemidx = pd.DataFrame(data=[], index=list(range(0,L)), columns=colnames)

In [8]:
# Read every field from the same elementaryExchange element so the columns cannot drift out of alignment
NS = '{http://www.EcoInvent.org/EcoSpold02}'
rows = []
for elex in root.iter(NS + 'elementaryExchange'):
    comp = elex.find(NS + 'compartment')  # holds the compartment and subcompartment names
    rows.append({
        'elex_id': elex.attrib['id'],
        'subc_id': comp.attrib['subcompartmentId'],
        'elex_name': elex.find(NS + 'name').text,
        'comp_name': comp.find(NS + 'compartment').text,
        'subc_name': comp.find(NS + 'subcompartment').text,
    })
elemidx = pd.DataFrame(rows, columns=colnames)

Check to make sure it looks right:

In [9]:
elemidx.head(5)

index,elex_id,subc_id,elex_name,comp_name,subc_name
0,38a622c6-f086-4763-a952-7c6b3b1c42ba,e8d7772c-55ca-4dd7-b605-fee5ae764578,"1,4-Butanediol",air,urban air close to ground
1,541a823c-0aad-4dc4-9123-d4af4647d942,e8d7772c-55ca-4dd7-b605-fee5ae764578,1-Pentanol,air,urban air close to ground
2,8cbaa905-41b0-4327-8403-bf1c8eb25429,e8d7772c-55ca-4dd7-b605-fee5ae764578,1-Pentene,air,urban air close to ground
3,f681eb3c-854a-4f78-bcfe-76dfbcf9df3c,e1bc9a16-5b6a-494f-98ef-49f461b1a11e,"2,4-D",soil,agricultural
4,a0fec60d-3f74-48bf-a2d2-58c30fc13e53,e8d7772c-55ca-4dd7-b605-fee5ae764578,2-Aminopropanol,air,urban air close to ground


We now have two index dataframes: elemidx for the elementary flows and idx for the technosphere inputs and outputs. Next we build the actual matrices: the A (technology) matrix and the B (elementary flow) matrix. First create two matrices of zeros of the appropriate size.

In [10]:
A = np.zeros((len(idx),len(idx)))
B = np.zeros((len(elemidx),len(idx)))
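
Dense arrays store every cell at 8 bytes each. The row labels in this notebook go past 13,900, so a dense float64 A of that size needs roughly 1.5 GB, almost all of it zeros. A possible alternative, sketched below with toy numbers, is to collect (row, column, value) triplets while parsing and build a sparse matrix once at the end. The variable names and values here are hypothetical.

```python
import numpy as np
from scipy.sparse import coo_matrix

n = 4  # toy size; hypothetical stand-in for len(idx)

# Triplets (row, column, value) as they might be collected while parsing
rows = [0, 1, 2, 3, 1, 3]
cols = [0, 1, 2, 3, 0, 2]
vals = [1.0, 1.0, 1.0, 1.0, -0.5, -0.2]

# Build in COO format, then convert to CSC for solving and column access
A_sparse = coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsc()

dense_bytes = n * n * np.dtype(float).itemsize
print(A_sparse.nnz, "stored values vs", n * n, "dense cells,", dense_bytes, "bytes if dense")
```

One difference to keep in mind: when the same (row, column) pair appears twice, converting from COO adds the values together, whereas writing into a dense array keeps only the last value.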

Now we need to parse each of the spold files to find the coefficients, and put them in the right place in the matrix based on their index numbers. First navigate back to the datasets folder.

In [11]:
os.chdir("..")
os.chdir("datasets")

Next, open each file, parse the tree, identify the correct column in the matrix, find the coefficients and put them in. Off-diagonal coefficients in the A matrix are multiplied by -1, as described in the EcoInvent documentation.
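
Written out, the convention used by the loop is the following. The symbols are notation introduced here for illustration: r_j is the reference product amount of activity j, a_ij is the amount of the product of activity i consumed by activity j, and b_kj is the amount of elementary flow k exchanged by activity j.

```latex
A_{jj} = r_j, \qquad A_{ij} = -a_{ij} \quad (i \neq j), \qquad B_{kj} = b_{kj}
```

With this convention, the scaling vector s for a demand vector f satisfies A s = f and the inventory is g = B s, which is what the calculation cells at the end of the notebook compute.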

In [17]:
NS = '{http://www.EcoInvent.org/EcoSpold02}'
for fname in os.listdir():
    root = ET.parse(fname).getroot()

    ## activity id and reference product (if several exchanges match, the last one is kept)
    for activity in root.iter(NS + 'activity'):
        actid = activity.attrib['id']
    for IE in root.findall(f".//{NS}intermediateExchange[{NS}outputGroup='0']"):
        refid = IE.attrib['intermediateExchangeId']
        amt = float(IE.attrib['amount'])
    ## this is the column index for the A and B matrices
    q = idx.index.values[(idx['act_id'] == actid) & (idx['ref_id'] == refid)][0]
    A[q, q] = amt  ## place the reference flow on the diagonal

    ## populate the A matrix with technosphere inputs (input group 5)
    for IE in root.findall(f".//{NS}intermediateExchange[{NS}inputGroup='5']"):
        prod_id = IE.attrib['intermediateExchangeId']
        link_id = IE.attrib['activityLinkId']  ## raises KeyError when the attribute is missing
        ## find the row index for the intermediate exchange
        row = idx.index.values[(idx['act_id'] == link_id) & (idx['ref_id'] == prod_id)][0]
        A[row, q] = -1 * float(IE.attrib['amount'])  ## multiply coefficient by -1

    ## populate the B matrix with elementary exchanges (input group 4, then output group 4)
    for group in ('inputGroup', 'outputGroup'):
        for LX in root.findall(f".//{NS}elementaryExchange[{NS}{group}='4']"):
            elex_id = LX.attrib['elementaryExchangeId']
            subc_id = LX.find(NS + 'compartment').attrib['subcompartmentId']
            row = elemidx.index.values[(elemidx['elex_id'] == elex_id) & (elemidx['subc_id'] == subc_id)][0]
            B[row, q] = float(LX.attrib['amount'])

KeyError: 'activityLinkId'

This is not working because certain activities have multiple exchanges with outputGroup 0 (see clinker production), and it is not clear how to tell which one is the actual reference flow. Also, the amounts for many of them are zero. If the amount is zero, the activityLinkId attribute is missing.
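
One possible way around both problems is sketched below. It rests on an assumption that should be checked against the EcoInvent documentation: that the reference flow is the outputGroup 0 exchange with a nonzero amount. Inputs with a zero amount or no activityLinkId are skipped, since they contribute nothing to A. The helper names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"es": "http://www.EcoInvent.org/EcoSpold02"}

# Hypothetical trimmed dataset: two outputGroup 0 exchanges (one with amount 0)
# and two inputs, one of which has amount 0 and no activityLinkId.
SAMPLE = """<ecoSpold xmlns="http://www.EcoInvent.org/EcoSpold02">
  <activityDataset>
    <activity id="act-1"/>
    <intermediateExchange intermediateExchangeId="p-main" amount="1">
      <outputGroup>0</outputGroup>
    </intermediateExchange>
    <intermediateExchange intermediateExchangeId="p-other" amount="0">
      <outputGroup>0</outputGroup>
    </intermediateExchange>
    <intermediateExchange intermediateExchangeId="p-in" amount="0.3" activityLinkId="act-2">
      <inputGroup>5</inputGroup>
    </intermediateExchange>
    <intermediateExchange intermediateExchangeId="p-unused" amount="0">
      <inputGroup>5</inputGroup>
    </intermediateExchange>
  </activityDataset>
</ecoSpold>"""


def reference_product(root):
    """ASSUMPTION: the reference flow is the outputGroup 0 exchange with a nonzero amount."""
    for ex in root.findall(".//es:intermediateExchange[es:outputGroup='0']", NS):
        amount = float(ex.attrib["amount"])
        if amount != 0:
            return ex.attrib["intermediateExchangeId"], amount
    return None  # no nonzero reference product found


def linked_inputs(root):
    """Yield (activityLinkId, product id, amount) for inputs that can be placed in A."""
    for ex in root.findall(".//es:intermediateExchange[es:inputGroup='5']", NS):
        link = ex.attrib.get("activityLinkId")  # None instead of a KeyError
        amount = float(ex.attrib["amount"])
        if link is None or amount == 0:
            continue  # nothing to place in the matrix
        yield link, ex.attrib["intermediateExchangeId"], amount


root = ET.fromstring(SAMPLE)
print(reference_product(root))
print(list(linked_inputs(root)))
```

If this rule is adopted, the idx dataframe would need to be built with the same rule so that the (act_id, ref_id) lookups still find a matching row.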

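Here is a sample calculation of the life cycle inventory, the starting point for an LCIA, for one unit of a product. The original target was protein pea production in the RoW geography; the lookup below uses clinker production in the US to inspect the index: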

In [17]:
A_sp = sp.sparse.coo_matrix(A)

In [13]:
f = np.zeros(len(A))

In [25]:
i = idx.index[(idx['ref_name']=='clinker')&(idx['geog']=='US')]
idx.iloc[i]

index,act_id,ref_id,act_name,ref_name,geog
0,c276df5a-9f59-4304-a25d-5d33d07303e9,1f41586d-0d8a-4c7c-8473-dd8351bab538,clinker production,clinker,US
1980,579b42bb-0d42-46de-88c6-89960b0543e0,1f41586d-0d8a-4c7c-8473-dd8351bab538,clinker production,clinker,US
2809,bda249e8-b0bb-49bd-b7b5-6d42d8b693f6,1f41586d-0d8a-4c7c-8473-dd8351bab538,clinker production,clinker,US
3790,f76277c9-4bd2-4d45-8560-210ff34ba76b,1f41586d-0d8a-4c7c-8473-dd8351bab538,clinker production,clinker,US
7171,65995787-8263-4cff-bfe9-8be179d86aca,1f41586d-0d8a-4c7c-8473-dd8351bab538,clinker production,clinker,US
7855,62fb3e9e-4053-4291-87c0-874dd45294f5,1f41586d-0d8a-4c7c-8473-dd8351bab538,clinker production,clinker,US
9677,f8c9ece5-ce33-4b78-a2d8-70f7df7806ae,1f41586d-0d8a-4c7c-8473-dd8351bab538,clinker production,clinker,US
9753,38affb53-b4c0-4dc1-9f74-ba507d35a2ef,1f41586d-0d8a-4c7c-8473-dd8351bab538,clinker production,clinker,US
10352,0681ddc2-1a75-40ab-8d6e-5d28197f0249,1f41586d-0d8a-4c7c-8473-dd8351bab538,clinker production,clinker,US
13902,972c31d6-a4af-41fa-b922-cf27753d9b20,1f41586d-0d8a-4c7c-8473-dd8351bab538,clinker production,clinker,US


In [15]:
np.put(f,[3,6],1)

In [None]:
s, info = sp.sparse.linalg.gmres(A_sp, f)  # gmres returns (solution, info); info == 0 means it converged

In [None]:
g = np.matmul(B, s)
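
To check the calculation chain independently of the parsing, it can help to run it on a toy system small enough to solve by hand. The sketch below uses made-up numbers and swaps the iterative gmres solver for SciPy's direct sparse solver `spsolve`, which is an alternative and not what the cells above use.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import spsolve

# Toy technosphere with 2 activities: diagonal = reference outputs,
# off-diagonal = inputs with the sign flipped.
# Activity 1 consumes 0.5 units of activity 0's product.
A = np.array([[1.0, -0.5],
              [0.0,  1.0]])

# Toy elementary flows: one flow, emitted per unit of each activity
B = np.array([[2.0, 3.0]])

f = np.array([0.0, 1.0])       # demand: one unit of activity 1's product

s = spsolve(csc_matrix(A), f)  # scaling vector, solves A s = f
g = B @ s                      # inventory vector

print(s)  # activity 0 runs at 0.5, activity 1 at 1.0
print(g)  # 2.0 * 0.5 + 3.0 * 1.0 = 4.0
```

By hand: activity 1 must run once, which requires 0.5 units from activity 0, so the total for the single elementary flow is 2.0 × 0.5 + 3.0 × 1.0 = 4.0.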