# Example
Author: [Romain Sacchi](mailto:romain.sacchi@psi.ch), [Alvaro Hahn](mailto:alvaro.hahn-menacho@psi.ch)

``pathways`` allows calculating the LCA impacts of a product or system along a time axis, combining time series of demand with scenario-based LCA databases.


## Introduction

This notebook presents a mock case to illustrate the use of `pathways`. The diagram below introduces the proposed production system.

The goal of this exercise is to calculate the environmental impact (both direct and indirect) associated with meeting the demand over time (2020-2050) for **product A** under two different future scenarios.

We present the technosphere and biosphere matrices at each timestep. According to LCA conventions, the technosphere matrix lists the different activities in columns, and the different products in rows. Positive values indicate outputs from an activity, while negative values indicate inputs. For example, in 2020: *activity A*, to produce 1 unit of *product A*, demands 0.8 units of *product B* and directly emits 1.5 units of CO2. Concurrently, *activity B* consumes 0.2 units of *product E* and emits 0.2 units of CO2 to produce 1 unit of *product B*. [...]

For each timestep, we can identify different changes in the technosphere exchanges and emissions intensities caused by changes in the system.
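
The 2020 figures quoted above can be checked with a few lines of NumPy. This is a minimal sketch and not part of `pathways`: the matrix layout follows the convention described above, the demand of 1000 units is an illustrative figure, and because the rest of the system is elided here, *activity E* is assumed to have no inputs and no direct emissions (a placeholder chosen only to close the system).

```python
import numpy as np

# Technosphere matrix for 2020: rows = products (A, B, E), columns = activities (A, B, E).
# Positive values are outputs, negative values are inputs.
technosphere = np.array([
    [1.0, 0.0, 0.0],    # product A
    [-0.8, 1.0, 0.0],   # product B: 0.8 units consumed by activity A
    [0.0, -0.2, 1.0],   # product E: 0.2 units consumed by activity B
])

# Direct CO2 emissions per unit of output of each activity.
# The value for activity E is an assumption (placeholder), not taken from the example.
co2_per_unit = np.array([1.5, 0.2, 0.0])

# Final demand: 1000 units of product A (illustrative).
demand = np.array([1000.0, 0.0, 0.0])

# Supply required from each activity to meet the demand: solve A s = f.
supply = np.linalg.solve(technosphere, demand)

# Total CO2 is the direct emission of each activity scaled by its supply.
total_co2 = float(co2_per_unit @ supply)

print(supply)
print(total_co2)
```

Under these assumptions, the direct emissions of *activity A* and the indirect emissions of *activity B* both appear in the total, which is what "direct and indirect" refers to above.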

-------------------

![LCA_system_diagram](figures/example_illustration.png)

-------------------

## Application

We start by instantiating the `Pathways` class, giving it either the file path to the `datapackage.json` descriptor of a datapackage or the path to the datapackage itself (.zip file).

In [1]:
from pathways import Pathways
import numpy as np
p = Pathways(
    datapackage="datapackage_sample/datapackage.json",
    debug=True # when `debug` is True, a local pathways.log file is created and allows tracking the workflow
)

Invalid datapackage: Descriptor validation error: {'path': 'mapping/mapping.yaml', 'profile': 'data-resource', 'name': 'mapping', 'format': 'yaml', 'mediatype': 'text/yaml', 'encoding': 'utf-8'} is not valid under any of the given schemas at "resources/33" in descriptor and at "properties/resources/items/oneOf" in profile
Invalid datapackage: Descriptor validation error: 'data-resource' is not one of ['tabular-data-resource'] at "resources/33/profile" in descriptor and at "properties/resources/items/properties/profile/enum" in profile
Invalid datapackage: Descriptor validation error: {'path': 'classifications/classifications.yaml', 'profile': 'data-resource', 'name': 'classifications', 'format': 'yaml', 'mediatype': 'text/yaml', 'encoding': 'utf-8'} is not valid under any of the given schemas at "resources/34" in descriptor and at "properties/resources/items/oneOf" in profile
Invalid datapackage: Descriptor validation error: 'data-resource' is not one of ['tabular-data-resource'] at "r

At this point, you can access all the resources of the `datapackage.Package`, such as the scenario data.
We see that the demand for `technology A`, represented by `product A` (see `.mapping`), is 1'000 kilograms (see `.scenarios.attrs`) each year.

In [2]:
p.scenarios.to_dataframe("")

model,pathway,variables,region,year,Unnamed: 5
some model,Scenario A,technology A,EU,2020,1000.0
some model,Scenario A,technology A,EU,2030,1000.0
some model,Scenario A,technology A,EU,2040,1000.0
some model,Scenario A,technology A,EU,2050,1000.0
some model,Scenario B,technology A,EU,2020,1000.0
some model,Scenario B,technology A,EU,2030,1000.0
some model,Scenario B,technology A,EU,2040,1000.0
some model,Scenario B,technology A,EU,2050,1000.0


In [3]:
p.scenarios.attrs

{'units': {'technology A': 'kilogram'}}

We can also see the mapping used to map the scenario variables to the LCA datasets:

In [4]:
p.mapping

{'technology A': {'dataset': [{'name': 'activity A',
    'reference product': 'product A',
    'unit': 'kilogram'}],
  'scenario variable': 'technology A'}}
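
To make the link between scenario variables and LCA datasets concrete, the sketch below combines the three outputs shown above (demand, units and mapping) into one row per year. The values are copied from those outputs; the helper `functional_units` is hypothetical and is not part of the `pathways` API.

```python
# Values copied from the outputs shown above.
mapping = {
    "technology A": {
        "dataset": [
            {"name": "activity A", "reference product": "product A", "unit": "kilogram"}
        ],
        "scenario variable": "technology A",
    }
}
units = {"technology A": "kilogram"}
demand_per_year = {2020: 1000.0, 2030: 1000.0, 2040: 1000.0, 2050: 1000.0}


def functional_units(variable):
    """Hypothetical helper: list (year, dataset, product, amount, unit) for a variable."""
    datasets = mapping[variable]["dataset"]
    return [
        (year, ds["name"], ds["reference product"], amount, units[variable])
        for year, amount in demand_per_year.items()
        for ds in datasets
    ]


rows = functional_units("technology A")
for row in rows:
    print(row)
```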

We can also list the LCIA methods available.

In [5]:
p.lcia_methods[:5]

['CML v4.8 2016 no LT - acidification no LT - acidification (incl. fate, average Europe total, A&B) no LT',
 'CML v4.8 2016 no LT - climate change no LT - global warming potential [GWP100) no LT',
 'CML v4.8 2016 no LT - ecotoxicity: freshwater no LT - freshwater aquatic ecotoxicity [FAETP inf) no LT',
 'CML v4.8 2016 no LT - ecotoxicity: marine no LT - marine aquatic ecotoxicity (MAETP inf) no LT',
 'CML v4.8 2016 no LT - ecotoxicity: terrestrial no LT - terrestrial ecotoxicity (TETP inf) no LT']

And most importantly, once the `datapackage.Package` is loaded, we can use the method `Pathways.calculate()` to calculate the LCA impacts.

Arguments:

* `methods`: list[str]. LCIA methods to use. To get a complete list of available LCIA methods, call `.lcia_methods`
* `scenarios`: list[str]. List of scenarios you want to calculate the impacts for.
* `variables`: list[str]. List of variables you want to calculate the impacts for (if the demand for them is non-null)
* `regions`: list[str]. Regions for which you want to calculate the impacts, provided the specified variables have a non-null demand in these regions.
* `years`: list[int]. Years for which you want to calculate the impacts.
* `multiprocessing`: bool. Multiprocessing may accelerate the process, as it processes each year in parallel.
* `demand_cutoff`: float. Between 0 and 1. Defines a ratio below which demand values are ignored. The default is 0.001.
* `double_accounting`: list[list[str]]. List of predefined category paths to be adjusted to prevent double counting of activities. Each path indicates the hierarchical categories involved.
* `use_distributions`: int. Number of iterations to use for Monte Carlo analyses. The default is 0 (i.e., regular analysis).

In the example below, we use a stochastic approach (i.e., `use_distributions=500`), leveraging the uncertainty distributions defined for each exchange in the datapackage. `bw2calc` and the underlying library `stats_arrays` generate 500 pseudo-random values for each exchange and update the technosphere and biosphere matrices between iterations.
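
The idea behind the stochastic approach can be sketched with NumPy alone. This is an illustration, not how `bw2calc` or `pathways` is implemented: only one exchange is made uncertain, and its normal distribution (mean 0.8, standard deviation 0.05) is an assumption chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the sketch is reproducible
n_iterations = 500

# Assumed uncertainty: the input of product B into activity A follows a normal
# distribution centred on 0.8. The standard deviation is a made-up value.
b_into_a = rng.normal(loc=0.8, scale=0.05, size=n_iterations)

co2_per_unit = np.array([1.5, 0.2])   # direct CO2 of activities A and B
demand = np.array([1000.0, 0.0])      # 1000 units of product A

results = np.empty(n_iterations)
for i, amount in enumerate(b_into_a):
    # Rebuild the technosphere matrix with the sampled exchange value.
    technosphere = np.array([
        [1.0, 0.0],
        [-amount, 1.0],
    ])
    supply = np.linalg.solve(technosphere, demand)
    results[i] = co2_per_unit @ supply

# Summarise the distribution with the 5th, 50th and 95th percentiles.
p5, p50, p95 = np.percentile(results, [5, 50, 95])
print(p5, p50, p95)
```

Each iteration solves the same system with a different matrix, so the output is a distribution of impacts rather than a single value.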

In [6]:
p.calculate(
    methods=['EF v3.1 EN15804 - climate change - global warming potential (GWP100)',],
    regions=["EU",],
    scenarios=[
        "Scenario A",
        "Scenario B",
    ],
    variables=[
        "technology A",
    ],
    years=[
        2020,
        2030,
        2040,
        2050
    ],
    use_distributions=500,
    multiprocessing=True
)

Calculating LCA results for some model...
--- Calculating LCA results for Scenario A...
------ Calculating LCA results for 2050...
------ Calculating LCA results for 2020...
------ Calculating LCA results for 2040...
------ Calculating LCA results for 2030...
--- Calculating LCA results for Scenario B...
------ Calculating LCA results for 2050...
------ Calculating LCA results for 2040...
------ Calculating LCA results for 2030...
------ Calculating LCA results for 2020...


0% [########] 100% | ETA: 00:00:00

Statistical analysis files: /Users/romain/Library/Application Support/pathways/stats



Total time elapsed: 00:00:00


We can now access the attribute `.lca_results`, an `xarray.DataArray` in which the results are stored. We can reshape it slightly to present it as a `pandas.DataFrame`, for example.

In [7]:
# interpolate in-between years
arr = p.lca_results.interp(
    year=range(
        p.lca_results.coords["year"].values.min(),
        p.lca_results.coords["year"].values.max() + 1
    )
)
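
The interpolation above fills in the years between the calculated time steps. Assuming `interp` is used with its default linear method, its effect on a single series can be reproduced with NumPy. The impact values below are made up for illustration.

```python
import numpy as np

years = np.array([2020, 2030, 2040, 2050])
impacts = np.array([1660.0, 1500.0, 1200.0, 900.0])  # hypothetical values

# Every year from the first to the last time step, inclusive.
all_years = np.arange(years.min(), years.max() + 1)

# Linear interpolation between the known time steps.
interpolated = np.interp(all_years, years, impacts)

value_2025 = float(interpolated[all_years == 2025][0])
print(len(all_years))
print(value_2025)
```

With linear interpolation, 2025 falls halfway between the 2020 and 2030 values.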

In [8]:
df = arr.to_dataframe("value")

In [9]:
# drop zero and missing values, then turn the index levels into columns
df = df[df["value"] != 0.0]
df = df[~df["value"].isnull()]
print(len(df))
df = df.reset_index()

930


In [10]:
# pivottablejs is a very convenient way to visualize pivot tables
from pivottablejs import pivot_ui
from IPython.display import HTML
pivot_ui(df, outfile_path='example.html')

Impacts with process contributions

![impacts with process contribution](figures/fig2.png)

Impacts with breakdown by geographical location of impacts

![impacts with impacts origins](figures/fig3.png)

Sum of impacts for the 5th, 50th and 95th quantiles

![impacts with uncertainty](figures/fig1.png)

Comparison between two scenarios

![scenarios comparison](figures/fig4.png)

### Providing actual datapackages
While it is possible to manually build data packages such as the one used in this example, it is not very convenient when dealing with real LCA databases, which have more than half a million exchanges.
`premise` can output such a datapackage as a zip file that can be given directly to `pathways.Pathways`.

The following will produce a datapackage for all years contained in the IMAGE SSP2-RCP19 scenario.

In [None]:
from premise import *
import bw2data
from datapackage import Package
bw2data.projects.set_current("ei39")
ndb = PathwaysDataPackage(
    scenarios=[
        {"model": "image", "pathway": "SSP2-RCP19"},
    ],
    source_db="ecoinvent 3.9.1 cutoff", # <-- name of the database in the BW2 project. Must be a string.
    source_version="3.9", # <-- version of ecoinvent, here "3.9" to match the source database. Must be a string.
    key="tUePmX_S5B8ieZkkM7WUU2CnO8SmShwmAeWK9x2rTFo=",
)

ndb.create_datapackage(
    name="image-SSP2",
    contributors=[
        {"name": "Romain",
        "email": "r_s at me.com",}
    ]
)

This allows for more complex analyses, such as estimating the demand for cobalt of the global electricity system, as projected in that same IMAGE scenario:

Life cycle-based annual demand for cobalt, global electricity supply

![cobalt projection](figures/fig5.png)

### Avoiding double counting

When working with IAMs/ESMs, the focus is on the **total** quantity (e.g., electricity production, transport, etc.) taken from the IAM/ESM output, not on the supply chain as modeled by the LCA database. To avoid double counting, we need to identify the activities modeled by the IAM/ESM and set to zero the amounts they supply as inputs to all other activities.
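
One possible reading of this adjustment can be sketched with NumPy on a two-activity system. This is an illustration under stated assumptions, not the `pathways` implementation: *product B* is assumed to be the commodity already accounted for by the IAM/ESM, and the adjustment is assumed to clear its row in the technosphere matrix except for the diagonal.

```python
import numpy as np

# Rows = products (A, B), columns = activities (A, B).
technosphere = np.array([
    [1.0, 0.0],
    [-0.8, 1.0],   # activity A consumes 0.8 units of product B
])
demand = np.array([1000.0, 0.0])

# Assumption: product B (row 1) is already covered by the IAM/ESM totals.
iam_rows = [1]

adjusted = technosphere.copy()
for row in iam_rows:
    diagonal = adjusted[row, row]
    adjusted[row, :] = 0.0          # remove the product as an input to every activity
    adjusted[row, row] = diagonal   # keep the activity's own output

supply_before = np.linalg.solve(technosphere, demand)
supply_after = np.linalg.solve(adjusted, demand)

print(supply_before)  # activity B is called on to supply activity A
print(supply_after)   # activity B is no longer called on
```

After the adjustment, meeting the demand for *product A* no longer triggers any supply from *activity B*, so its impacts are not counted a second time.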

`pathways` allows users to select the categories to be considered in the double-counting adjustment. The predefined categories are illustrated below:

![LCA_system_diagram](figures/categories.png)

This adjustment can be applied using the `double_accounting` argument.

In [1]:
from pathways import Pathways
import numpy as np
p = Pathways(
    datapackage="datapackage_sample/datapackage.json",
    debug=True
)

p.calculate(
    methods=['EF v3.1 EN15804 - climate change - global warming potential (GWP100)',],
    regions=["EU",],
    scenarios=[
        "Scenario A",
        "Scenario B",
    ],
    variables=[
        "technology A",
    ],
    years=[
        2020,
        2030,
        2040,
        2050
    ],
    use_distributions=500,
    multiprocessing=True,
    double_accounting = [["Energy"]]
)

ValueError: Sheet 'Double accounting - Zeroed' already exists and if_sheet_exists is set to 'error'.

Some examples would be:

- If we are interested in zeroing "Electricity, industrial"

  `double_accounting = [["Energy", "Electricity", "Industrial"]]`

- To zero "Electricity, industrial", "Electricity, residential" and "Freight"

  `double_accounting = [["Energy", "Electricity"], ["Transport", "Freight"]]`

- To adjust for energy (both heat and electricity)

  `double_accounting = [["Energy"]]`
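
The examples above suggest that each path selects every category nested beneath it. A minimal sketch of that prefix-matching logic follows. The category list is hypothetical and only uses names mentioned in the examples ("Heat" is assumed to sit directly under "Energy"), and `select_categories` is an illustrative helper, not a `pathways` function.

```python
# Hypothetical leaf categories written as hierarchical paths.
CATEGORIES = [
    ("Energy", "Electricity", "Industrial"),
    ("Energy", "Electricity", "Residential"),
    ("Energy", "Heat"),
    ("Transport", "Freight"),
]


def select_categories(double_accounting):
    """Return the leaf categories covered by at least one of the given paths."""
    selected = []
    for leaf in CATEGORIES:
        for path in double_accounting:
            # A path covers a leaf when it is a prefix of the leaf's own path.
            if tuple(path) == leaf[:len(path)]:
                selected.append(leaf)
                break
    return selected


print(select_categories([["Energy", "Electricity", "Industrial"]]))
print(select_categories([["Energy", "Electricity"], ["Transport", "Freight"]]))
print(select_categories([["Energy"]]))
```

A shorter path therefore covers a broader part of the hierarchy, while a full path targets a single category.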