autoprox
automatically generates bridge processes as described in
Ingwersen et al. 2018
directly in openLCA.
For a process p
in a database with a set of background processes Q
,
autoprox
generates a set of bridge processes B
that connect the product
inputs and waste outputs of p
with corresponding product outputs and waste
inputs provided by the processes in Q
. This is done by a
Generator that takes the ID of the
process p
and Matcher M
as input.
For a product input or waste output fp
of p
that does not yet have a
provider process in Q
, the matcher M
generates a set of flow-score pairs
for the product outputs and waste inputs fq
with a provider process in Q
:
M: fp -> {(fq, sq) | fq in Q, sq in [0, 1]}
The generator selects then the top matching flows of fq
with the following
rule where epsilon
can be configured:
abs(1.0 - (sq_i / max(sq))) <= epsilon
A bridge process b
is then generated that has a corresponding exchange for
each of these matching product outputs or waste inputs. The quantitative
reference of b
is set to one unit of fp
and the amount of a matching flow
fq_i
is set to:
sq_i^2 / (sum(sq) * max(sq))
Only flows are currently selected that have the same reference flow property
as fp
so that every amount in b
has the same unit. The name of b
is
set to the name of the reference flow with a _bridge:
prefix and all processes
of B
are stored in the _bridge
category so that it is easy to identify
(and delete) them:
For p
, it should be then possible to create a product system that uses the
generated bridge processes B
to connect p
with Q
:
This matcher extracts the bigrams from the words of the names of the flows that are compared and computes the Sørensen–Dice coefficient of these sets of bigrams. It is fast and simple and gives good results for flow names that are relatively specific:
However, flow names in LCA names often contain terms like at plant
or
production mix
that will lead to imprecise results using this matcher
without a filter:
The InfoContentMatcher
computes the information content I(w)
of a word w
as:
I(w) = |w| * e^(-alpha * freq(w))
|w|
is the number of characters of w
and freq(w)
the absolute frequency of
w
in the flow names of fq
. With this, long words that are less frequent get
a higher weight than terms like at plant
when calculating the similarity
between two flow names. This fixes the concrete
example above:
However, words that have a high information content can describe completely different things:
This matcher calculates the similarities between flow names based on the
information content of the contained words as described above and a semantic
similarity score that is calculated as the path distance between two words
in WordNet. It uses the
WS4j API to calculate this
distance. The WordNet database that comes with WS4j is maybe a bit
outdated. Also, technical terms that are common in LCA databases are often not
present in WordNet. This is why this matcher currently does not give much
better results than the InfoContentMatcher
. However, combining lexical
matching, corpus statistics, and semantic similarities could in principal
give good results (see e.g. this paper).
The easiest way to run this project is to load it into a current version
of IntelliJ IDEA (e.g. the open source
community version). Adopt the process ID of p
and the databases path
in the main function and run it. In order
to use the WordNetPathMatcher
you need to setup the WS4j database as described
below.
WS4j is an archived Google Code project and a bit complicated to set up (see below) and is compatible with a relative old version of WordNet. An alternative could be JWI which supports to load a current WordNet database from a folder (just download and extract the WordNet database files to that folder):
val wordNetPath = "C:/Users/ms/Downloads/WNdb-3.0/dict"
val dict = RAMDictionary(File(wordNetPath), ILoadPolicy.NO_LOAD)
dict.open()
val idxWord = dict.getIndexWord("asphalt", POS.NOUN)
if (idxWord != null) {
val word = dict.getWord(idxWord.wordIDs[0])
val relSynsets = word.synset.relatedSynsets
...
}
However, WS4j provides a lot of features and algorithms that can be used easily while JWI provides a more low level API (but with a nice tutorial).
WS4j is an archived project
on Google Code but there is also a
Github clone available which seems to be the
version that is published in the Maven central repository. In order to run WS4j,
you need to put the configuration files
jawjaw.conf and similarity.conf
and the database file wnjpn.db
into the class-path. The wnjpn.db
file can
be extracted from the distribution packages from the
WS4j Google Code download pages.