# Using the statistics module in PyPSA

The `statistics` module is used to easily extract information from your networks. This is useful when inspecting your solved networks and creating first visualizations of your results.

With the `statistics` module, you can look at different metrics of your network. The implemented metrics are:
    
- Capital expenditure
- Operational expenditure
- Installed capacities
- Optimal capacities
- Supply
- Withdrawal
- Curtailment
- Capacity factor
- Revenue
- Market value
- Energy balance

Now let's look at an example.

In [None]:
import pypsa
import matplotlib.pyplot as plt

First, we open an example network we want to investigate.

In [None]:
n = pypsa.examples.scigrid_de()

Let's get an overview of all statistics by calling:

In [None]:
n.statistics()

So far the `statistics` are not so interesting, because we have not solved the network yet. We can only see that the network already has some installed capacities for different components.

You can see that `statistics` returns a `pandas.DataFrame`. Its MultiIndex gives the type of network component on the first level (e.g. *Generator*, *Line*, ...) and the carrier of those components on the `carrier` level. For example, the generators in `n.generators` have the carriers *Brown Coal*, *Gas* and so on.
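Selecting from such a MultiIndex table works with plain pandas indexing. The following is a minimal sketch on a hand-made stand-in table, not on real PyPSA output: the values, the column name and the name of the first index level (`component`) are assumptions for illustration, and the level names in your PyPSA version may differ.

```python
import pandas as pd

# Toy stand-in for the table returned by n.statistics(); all values are made up.
index = pd.MultiIndex.from_tuples(
    [("Generator", "Brown Coal"), ("Generator", "Gas"), ("Line", "AC")],
    names=["component", "carrier"],  # assumed level names
)
stats = pd.DataFrame({"Installed Capacity": [200.0, 100.0, 50.0]}, index=index)

# Select all rows of one component via the first index level.
generators = stats.loc["Generator"]

# Select a single (component, carrier) entry of one metric.
gas_capacity = stats.loc[("Generator", "Gas"), "Installed Capacity"]
```

The same `.loc` pattern should carry over to the real result, as long as you use the component and carrier names that appear in your own table.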

Now let's solve the network.

In [None]:
n.optimize(n.snapshots[:4])

Now we can look at the `statistics` of the solved network.

In [None]:
n.statistics().round(1)

As you can see, much more information is now available. There are still no capital expenditures in the network, because we only performed an operational optimization with this example network.

If you are interested in a specific metric, e.g. curtailment, you can run

In [None]:
curtailment = n.statistics.curtailment()
curtailment

Note that when calling a specific metric, the `statistics` module returns a `pandas.Series`.
To find the unit of the data returned by `statistics`, you can inspect the `attrs` attribute of the `DataFrame` or `Series`.

In [None]:
curtailment.attrs

So curtailment is given in `MWh`. You can also customize your request.

For this you have various options:
1. You can select the components for which you want to get the metric with the argument `comps`. Careful: `comps` has to be a list of strings.

In [None]:
n.statistics.curtailment(comps=["Generator"])

2. For metrics which have a time dimension, you can choose the aggregation method or decide not to aggregate them at all. Use the `aggregate_time` argument to specify what you want to do.

For example, the mean curtailment per time step is

In [None]:
n.statistics.curtailment(comps=["Generator"], aggregate_time="mean")

Or you can retrieve the curtailment time series by not aggregating over time at all.

In [None]:
n.statistics.curtailment(comps=["Generator"], aggregate_time=False).iloc[:, :3]

3. You can choose how you want to group the components of the network and how to aggregate the groups. By default, the components are grouped by their carriers and summed. However, you can change this by providing different `groupby` and `aggregate_groups` arguments.

In [None]:
n.statistics.curtailment(comps=["Generator"], groupby=["bus"], aggregate_groups="max")

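You have now obtained, for every bus in the network, the maximum over the generators attached to that bus. Since `aggregate_time` was left at its default, each generator's curtailment is first aggregated over time, and `aggregate_groups="max"` then picks the largest of these values per bus.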
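To make the interplay of time aggregation and group aggregation concrete, here is a small pandas-only sketch. It is not PyPSA's implementation: the generator names, buses and values are invented, and it assumes that the time dimension is reduced first and the groups second. It also ignores snapshot weightings, which a real network may apply when summing over time.

```python
import pandas as pd

# Toy per-generator curtailment time series in MW; names and values are made up.
series = pd.DataFrame(
    {"t0": [0.0, 2.0, 1.0], "t1": [4.0, 0.0, 1.0]},
    index=pd.Index(["wind_a", "wind_b", "solar_c"], name="Generator"),
)
bus = pd.Series(["bus1", "bus1", "bus2"], index=series.index, name="bus")

# Step 1: reduce the time dimension (analogous to aggregate_time="sum" / "mean").
total_per_generator = series.sum(axis=1)
mean_per_generator = series.mean(axis=1)

# Step 2: reduce the components within each group (analogous to aggregate_groups).
max_per_bus = total_per_generator.groupby(bus).max()
sum_per_bus = total_per_generator.groupby(bus).sum()
```

In this toy data, `max_per_bus` reports the largest per-generator total at each bus, while `sum_per_bus` adds the totals of all generators at the bus.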

When inspecting your network, it often helps to visualize the tables. Because the results are pandas objects, you can plot them directly, for example the generation (supply) of the generators.

In [None]:
n.statistics.supply(comps=["Generator"]).div(1e3).plot.bar(title="Generation in GWh")

Or you could plot the generation time series of the generators.

In [None]:
fig, ax = plt.subplots()
n.statistics.supply(comps=["Generator"], aggregate_time=False).iloc[:, :4].div(
    1e3
).T.plot.area(
    title="Generation in GW",
    ax=ax,
    legend=False,
    linewidth=0,
)
ax.legend(bbox_to_anchor=(1, 0), loc="lower left", title=None, ncol=1)

Finally, we want to look at the energy balance of the network. The energy balance is not included in the overview of the statistics module. To calculate the energy balance, you can do

In [None]:
n.statistics.energy_balance()

Note that there is now an additional index level for the bus carrier. This is because an energy balance is defined for every bus carrier. You can find the bus carriers in your network by looking at `n.buses.carrier.unique()`. This network has only one bus carrier, AC, which corresponds to electricity. However, a network can have further bus carriers, for example heat or CO$_2$ in a sector-coupled network. Therefore, for many `statistics` functions you have to be careful about the units of the values, because the unit is not always given by the `attrs` of the `DataFrame` or `Series`.
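As a rough illustration of working with several bus carriers, the sketch below builds a toy energy balance by hand and selects one bus carrier from it. Everything in it is an assumption for illustration: the index level names (`component`, `carrier`, `bus_carrier`), the entries, the values and the contents of `attrs`. Check the index names and `attrs` of your own result before relying on them.

```python
import pandas as pd

# Toy energy balance; level names, entries and values are made up.
index = pd.MultiIndex.from_tuples(
    [
        ("Generator", "Gas", "AC"),
        ("Load", "-", "AC"),
        ("Link", "heat pump", "heat"),
        ("Load", "-", "heat"),
    ],
    names=["component", "carrier", "bus_carrier"],  # assumed level names
)
balance = pd.Series([10.0, -10.0, 3.0, -3.0], index=index)
balance.attrs.update({"name": "Energy Balance", "unit": "carrier dependent"})

# Keep only the entries that belong to one bus carrier.
ac_balance = balance.xs("AC", level="bus_carrier")

# Supply and withdrawal should cancel for every bus carrier.
net_per_bus_carrier = balance.groupby(level="bus_carrier").sum()

# Read the unit defensively, since the metadata may not name a single unit.
unit = balance.attrs.get("unit", "unknown")
```

Splitting the result by bus carrier before summing or plotting avoids mixing quantities with different units, such as electricity and heat.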