# Visualize the tax and transfer system

The tax and transfer system is complex, and so is its representation in gettsim. A visualization helps you understand the internal structure and how to implement custom reforms.

If you are unfamiliar with the general interface of gettsim, please visit the [tutorial](tutorial.ipynb) first.

We dive right into gettsim. The following cells reuse the code from the test for "zu versteuerndes Einkommen" (taxable income).

First, we import some necessary modules, functions and variables.

In [1]:
import pandas as pd

from gettsim import compute_taxes_and_transfers
from gettsim import get_policies_for_date
from gettsim import plot_dag

from gettsim.config import ROOT_DIR
from gettsim.tests.test_zu_versteuerndes_eink import INPUT_COLS

Here, we load the test data and select only observations from 2018.

In [2]:
df = (
    pd.read_csv(ROOT_DIR / "tests" / "test_data" / "test_dfs_zve.csv", usecols=INPUT_COLS)
    .query("jahr == 2018")
)

The following three cells contain the usual call to gettsim.

1. Load parameters and policy functions.
2. Define the user columns. These are variables that gettsim should not compute but take from the data.
3. Compute the targets by calling `compute_taxes_and_transfers` with the appropriate arguments. Note that we also pass `return_dag=True`, which is important for the next steps.

In [3]:
params_dict, policy_func_dict = get_policies_for_date(
    policy_date="2018",
    groups=["eink_st_abzuege", "soz_vers_beitr", "kindergeld", "eink_st"],
)

In [4]:
user_columns = [
    "ges_krankenv_beitr_m",
    "arbeitsl_v_beitr_m",
    "pflegev_beitr_m",
    "rentenv_beitr_m",
]

In [5]:
result, dag = compute_taxes_and_transfers(
    df,
    user_columns=user_columns,
    user_functions=policy_func_dict,
    targets=[
        "_zu_verst_eink_kein_kinderfreib_tu",
        "_zu_verst_eink_kinderfreib_tu",
        "kinderfreib_tu",
        "altersfreib",
        "sum_brutto_eink",
    ],
    params=params_dict,
    return_dag=True
)

The natural question is: "How does gettsim compute the quantities under `targets`?" The second return value of `compute_taxes_and_transfers`, the `dag`, holds the answer.

First of all, what is a DAG? DAG is short for [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph), a special kind of network of nodes and edges in which every edge has a direction and no path leads back to its starting node. In gettsim's DAG, nodes are individual columns, either in the DataFrame passed to `compute_taxes_and_transfers` or computed inside gettsim. Edges represent dependencies between nodes: an edge pointing from A to B means that A is necessary to compute B. With that said, let us take a look at the graph.
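Before looking at the real graph, the edge convention can be made concrete with a toy example. The sketch below is not gettsim's internal data structure. It stores a small graph as a plain dictionary, and the column names (`gross_income`, `allowance`, `taxable_income`) as well as the helper `ancestors` are hypothetical and chosen only for illustration.

```python
# Toy DAG, NOT gettsim's internal representation.
# Each key is a node; each value is the set of nodes it depends on,
# i.e. the nodes with an edge pointing to the key.
# All names here are hypothetical.
dependencies = {
    "gross_income": set(),                          # taken from the data
    "allowance": {"gross_income"},                  # gross_income -> allowance
    "taxable_income": {"gross_income", "allowance"},
}


def ancestors(node, dependencies):
    """Return every node that is needed, directly or indirectly, for `node`."""
    found = set()
    stack = list(dependencies[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(dependencies[current])
    return found


print(sorted(ancestors("taxable_income", dependencies)))
```

Because the graph is acyclic, following the edges backwards always terminates at columns that come straight from the data.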

We focus on the variable `"kinderfreib_tu"`, which is the "Kinderfreibetrag" for each tax unit. We can plot the variable and its surrounding nodes with the following call to `plot_dag`. The list passed to `selectors` restricts the plot to `"kinderfreib_tu"` and the nodes around it.

In [8]:
plot_dag(
    dag,
    selectors=[
        "_anz_erwachsene_tu",
        "anz_kindergeld_kinder_tu",
        "kinderfreib_tu",
        "_zu_verst_eink_kein_kinderfreib_tu",
        "_zu_verst_eink_kinderfreib_tu",
    ],
    plot_kwargs={"plot_width": 600, "plot_height": 600}
);

The plot shows a small graph with five nodes. The flow of computation starts at the top and ends at the bottom.

1. The upper left node is `_anz_erwachsene_tu`, the number of adults in a tax unit. If you hover over the node, you can see the source code of the function that computes this intermediate column.
2. The upper right node, `anz_kindergeld_kinder_tu`, computes the total claim to child benefits for each tax unit.
3. Both variables are necessary to compute the node at the center left of the graph, `kinderfreib_tu`.
4. The node at the center right of the graph is `_zu_verst_eink_kein_kinderfreib_tu`. This column is computed from other variables that are left out of the graph for brevity.
5. Finally, at the bottom of the figure is the target node `_zu_verst_eink_kinderfreib_tu`.

During execution, all dependencies of a node are computed before the node itself is computed.
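This ordering rule is what a topological sort produces. As a rough sketch, and not a description of how gettsim schedules its computations internally, the five nodes from the plot can be ordered with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The two inputs of `kinderfreib_tu` follow the description above. The two inputs of `_zu_verst_eink_kinderfreib_tu` are an assumption read off the layout of the plot.

```python
from graphlib import TopologicalSorter

# Map each node to the nodes it depends on.
# The dependencies of the target node are assumed from the plot layout.
dependencies = {
    "kinderfreib_tu": {"_anz_erwachsene_tu", "anz_kindergeld_kinder_tu"},
    "_zu_verst_eink_kinderfreib_tu": {
        "kinderfreib_tu",
        "_zu_verst_eink_kein_kinderfreib_tu",
    },
}

# static_order() yields every node only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())

for position, node in enumerate(order, start=1):
    print(position, node)
```

Several orders are valid for this graph, since nodes that do not depend on each other can be computed in any sequence. In every valid order, the target `_zu_verst_eink_kinderfreib_tu` comes last.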