# Using `wurst` to get Brightway databases as documents

In [None]:
import bw2data as bd
import bw2io as bi
import bw2calc as bc
import wurst as w

In [None]:
bd.projects.set_current("Bicycle example")

In [None]:
data = w.extract_brightway2_databases(["🚲"])

Let's look at the data:

In [None]:
data[:2]

You can see (almost) the same data in the Excel workbook `lci-bike.xlsx`.

A key difference versus the graph approach is that we **don't have explicit links** - rather, we have edges which give _attributes_ of the desired sources or targets. We can manipulate these attributes (for example, to change the supplier or consumer - or to point to another database) before rebuilding the graph. The ability to make these systematic changes is very powerful.
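To make the idea of attribute-based edges concrete, here is a minimal sketch using a plain dictionary. The field values (`"steel"`, `"bike"`, `"GLO"`) and the `retarget` helper are hypothetical illustrations, not taken from the extracted data or from `wurst`; the real extracted edges may carry more or different keys.

```python
# Hypothetical edge: it describes its source by attributes, not by a node ID.
edge = {
    "name": "steel",          # assumed product name
    "unit": "kilogram",
    "location": "GLO",
    "database": "bike",       # assumed database name
    "amount": 2.5,
}


def retarget(edge, **new_attributes):
    """Return a copy of the edge whose attributes describe a different source."""
    return {**edge, **new_attributes}


# Point the same edge at a supplier in another database and location.
relinked = retarget(edge, database="better bike", location="DE")
print(relinked)
```

Because nothing is linked until the graph is rebuilt, changing a supplier is only a dictionary update.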

One limitation of this approach is that we can't easily traverse the graph structure to see, for example, the amounts needed two steps down the supply chain.

`wurst` has some simple searching functions - for example, we can get all products:

In [None]:
w.get_many(data, lambda x: x.get("type") == bd.labels.product_node_default)

By default this returns a generator - something which can produce the results on demand, but doesn't compute them ahead of time. Generators can be infinite (for example, one that yields the Fibonacci sequence), so it's useful that they don't run to completion when created.
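As an aside, an infinite generator can be sketched in plain Python. This example is independent of `wurst`; it only shows why lazy evaluation matters, and that you take as many values as you need.

```python
from itertools import islice


def fibonacci():
    """Yield Fibonacci numbers forever; nothing is computed until requested."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Creating the generator is instant; islice takes only the first ten values.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```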

In [None]:
list(w.get_many(data, lambda x: x.get("type") == bd.labels.product_node_default))

Instead of using anonymous `lambda` functions, we can also write a proper filter function:

In [None]:
def get_bicycle_product(node):
    return node['type'] == bd.labels.product_node_default and node['name'] == 'bicycle'

In [None]:
list(w.get_many(data, get_bicycle_product))

`get_many` is little more than a shortcut for a filtering comprehension, but `get_one` guarantees exactly one result: it raises an error if the filter matches nothing, or if it matches more than one item:

In [None]:
w.get_one(data, get_bicycle_product)

In [None]:
w.get_one(data, lambda x: x.get("type") == bd.labels.product_node_default)

In [None]:
w.get_one(data, lambda x: x.get("type") == "missing")
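The last two cells appear intended to show the failure cases: a filter that matches several nodes, and one that matches none. The sketch below is a simplified stand-in for this behaviour, not the actual `wurst` implementation. The function names, the exception classes, and the sample nodes are all hypothetical.

```python
class NoMatch(Exception):
    """Hypothetical error: the filters matched nothing."""


class TooManyMatches(Exception):
    """Hypothetical error: the filters matched more than one item."""


def get_many_sketch(data, *filters):
    """Lazily yield every item that passes all filters."""
    for item in data:
        if all(f(item) for f in filters):
            yield item


def get_one_sketch(data, *filters):
    """Return the single matching item, or raise if there is not exactly one."""
    results = list(get_many_sketch(data, *filters))
    if not results:
        raise NoMatch
    if len(results) > 1:
        raise TooManyMatches
    return results[0]


# Made-up nodes; the type strings are assumptions standing in for bd.labels values.
nodes = [
    {"name": "bicycle", "type": "product"},
    {"name": "carbon fibre", "type": "product"},
    {"name": "bicycle production", "type": "process"},
]

bicycle = get_one_sketch(nodes, lambda x: x["name"] == "bicycle")
print(bicycle)
```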

The core idea when working with documents as data is to create a workflow with composable functions. These functions can be filters, but they can also do modification.

Wurst provides some basic capabilities, but the [toolz](https://toolz.readthedocs.io/) library has a lot more of these types of functions for future use.
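A minimal sketch of such a composable workflow, using only the standard library. The step builders (`keep_type`, `add_comment`), the `pipeline` helper, and the sample data are hypothetical names for illustration; `toolz.pipe` offers a similar way of threading data through a sequence of functions.

```python
from functools import reduce


def keep_type(node_type):
    """Build a filter step that keeps only nodes of one type."""
    def step(data):
        return (ds for ds in data if ds.get("type") == node_type)
    return step


def add_comment(text):
    """Build a modification step that yields annotated copies of each node."""
    def step(data):
        return ({**ds, "comment": text} for ds in data)
    return step


def pipeline(data, *steps):
    """Feed the data through each step in order."""
    return reduce(lambda acc, step: step(acc), steps, data)


# Made-up nodes; the type strings are assumptions standing in for bd.labels values.
sample = [
    {"name": "bicycle", "type": "product"},
    {"name": "bicycle production", "type": "process"},
]

result = list(pipeline(sample, keep_type("process"), add_comment("reviewed")))
print(result)
```

Each step takes an iterable and returns an iterable, so filters and modifications can be chained in any order.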

Let's do a systematic change - let's assume that with future technology all inputs reduce by ten percent, except for carbon fibre production.

In [None]:
def process_but_not_carbon_fibre(obj):
    return obj["type"] == bd.labels.process_node_default and obj["name"] != 'carbon fibre production'

with w.debug_logging():
    for ds in w.get_many(data, process_but_not_carbon_fibre):
        for edge in w.technosphere(ds):
            w.rescale_exchange(edge, 0.9, remove_uncertainty=False)

Finally, we can write the modified data back to a database. This can be the same database or a new one - in this case we write to a new one.

In [None]:
w.write_brightway2_database(
    data, 
    "better bike", 
    metadata={"comment": "Made inputs 10% less"}, 
    products_and_processes=True
)

# Exercise

Extract the data from the '🚲' database again, and add a new particulates elementary flow, a new electricity generation process, and an electricity product.

Add an exchange to emit particulates from the carbon fibre production, and an exchange which consumes electricity in the bicycle manufacture.

Add the following exchanges to the electricity generation:

- Production of electricity
- Emission of carbon dioxide

You can add new nodes by appending a dictionary to the extracted data:

```python
data = w.extract_brightway2_databases(["🚲"])
data.append({<data here>})
```

Exchanges need to have the following attributes:

- database: str
- name: str
- location: str (for products)
- categories: tuple[str] (for elementary flows)
- unit: str
- amount: float

Processes need to have the following attributes:

- database: str
- code: str
- name: str
- location: str
- exchanges: list

Products need to have the following attributes:

- database: str
- code: str
- name: str
- unit: str
- location: str

Elementary flows need to have the following attributes:

- database: str
- code: str
- name: str
- unit: str
- categories: list
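Before appending new nodes, it can help to check them against the attribute lists above. The following is a minimal sketch of such a check. The helper names and the example electricity product (including its database name, code, and unit) are hypothetical; only the required keys themselves come from the lists above.

```python
# Required keys per node kind, copied from the lists above.
REQUIRED_NODE_KEYS = {
    "process": {"database", "code", "name", "location", "exchanges"},
    "product": {"database", "code", "name", "unit", "location"},
    "elementary flow": {"database", "code", "name", "unit", "categories"},
}
EXCHANGE_COMMON_KEYS = {"database", "name", "unit", "amount"}


def missing_node_keys(node, kind):
    """Return the required keys that the node dictionary lacks."""
    return sorted(REQUIRED_NODE_KEYS[kind] - set(node))


def missing_exchange_keys(exchange, target_kind):
    """Check an exchange that points at a product or an elementary flow."""
    extra = {"location"} if target_kind == "product" else {"categories"}
    return sorted((EXCHANGE_COMMON_KEYS | extra) - set(exchange))


# Hypothetical new product; every value here is an illustrative assumption.
electricity = {
    "database": "bike with electricity",
    "code": "electricity",
    "name": "electricity",
    "unit": "kilowatt hour",
    "location": "GLO",
}

print(missing_node_keys(electricity, "product"))
print(missing_node_keys(electricity, "process"))
```

The filters earlier in this notebook rely on each node having a `type` key, so new nodes will most likely need one as well; this sketch does not check for it.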

When you are done adding the data, check internal linking by running the following:

```python
from wurst.brightway.write_database import link_internal_products_processes
data = link_internal_products_processes(data)
```

If everything is linked, you can write the new database.

```python
w.write_brightway2_database(
    data, 
    <new database name>, 
    products_and_processes=True
)
```

You can then check that your data generates a valid matrix by running an inventory (but not impact assessment) calculation:

```python
bike = bd.get_node(name="bicycle", database=<new database name>)
lca = bc.LCA({bike: 1})
lca.lci()
lca.supply_array
```