
5 create minimal set of variable functions for final energy #10

Merged
maxnutz merged 22 commits into main from
5-create-minimal-set-of-variable-functions-for-final-energy
Mar 24, 2026

Conversation

@maxnutz (Owner) commented Mar 23, 2026

Basis structure

Minimal set of Statistics-Functions

  • Statistics functions for a variable can be added to statistics_functions.py; the mapping to the IAMC-formatted variable needs to be added to configs/mapping.default.yaml
  • For requirements and naming conventions for statistics functions, see the corresponding section in README.md

Note

Statistics functions are executed for a single pypsa Network, NOT for a NetworkCollection, as some statistics parameters are not yet applicable to NetworkCollections.
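For illustration, the contract described above can be sketched as follows. This is not the PR's actual code: the `energy_balance` argument values and the fake-network helper are assumptions; only the function name (from the mapping) and the "Series with a `unit` index level" shape follow the PR.

```python
import pandas as pd

def Final_Energy_by_Carrier__Electricity(n):
    """Final electricity demand (sketch following the naming convention)."""
    # The real function would call n.statistics.energy_balance(...) on a
    # pypsa.Network; the argument values here are illustrative only.
    return n.statistics.energy_balance(components=["Link"], groupby=["carrier", "unit"])

# Minimal stand-in so the output shape can be checked without pypsa installed.
class _FakeStatistics:
    def energy_balance(self, components, groupby):
        idx = pd.MultiIndex.from_tuples([("electricity", "MWh")], names=groupby)
        return pd.Series([1234.5], index=idx)

class _FakeNetwork:
    statistics = _FakeStatistics()

series = Final_Energy_by_Carrier__Electricity(_FakeNetwork())
```

The important property is that the returned Series keeps a MultiIndex level named `"unit"`, which the post-processing step relies on.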

Summary by Sourcery

Add initial final energy statistics functions and integrate them into the PyPSA-to-IAMC processing workflow, including basic configuration, utilities, and tests.

New Features:

  • Add a mapping mechanism from IAMC variable names to statistics function names via the default mapping configuration.
  • Provide a utilities module with EU27 country code mappings and unit conversions for PyPSA outputs.
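A hypothetical excerpt of what such a mapping entry in configs/mapping.default.yaml could look like — the function names appear in this PR, but the exact IAMC variable keys shown here are assumptions:

```yaml
# IAMC variable name -> name of the function in statistics_functions.py
Final Energy|Electricity: Final_Energy_by_Carrier__Electricity
Final Energy|Transportation: Final_Energy_by_Sector__Transportation
```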

Enhancements:

  • Update network collection loading to read explicit NetCDF files from the results directory.
  • Extend Network_Processor to execute per-network statistics functions, post-process their results by investment year, and build a combined IAMC-compatible dataset.
  • Map internal unit labels to IAMC units and map ISO country codes to EU27 country names during pyam.IamDataFrame construction.
  • Clarify and expand the README with conventions for defining and registering variable statistics functions.
  • Refine internal class documentation for Network_Processor in Copilot instructions to reflect new APIs and attributes.

Tests:

  • Add unit tests for the new statistics functions to verify output structure and variable naming.
  • Add unit tests for EU27 country code mappings in the utilities module.

@maxnutz maxnutz self-assigned this Mar 23, 2026
@maxnutz maxnutz linked an issue Mar 23, 2026 that may be closed by this pull request
sourcery-ai bot (Contributor) commented Mar 23, 2026

Reviewer's Guide

Implements the minimal statistics-function framework and integration for final energy variables, wiring PyPSA network results through variable-specific functions into an IAMC-formatted pyam.IamDataFrame, adds EU27 metadata and unit mapping utilities, config defaults, and basic tests.

Sequence diagram for per-network variable calculation and aggregation

sequenceDiagram
    participant workflow_py as workflow_py
    participant np as Network_Processor
    participant cfg as config_yaml
    participant nc as pypsa_NetworkCollection
    participant n as pypsa_Network
    participant sf as statistics_functions_py
    participant utils as utils_py
    participant dsd as nomenclature_DSD
    participant pyam_df as pyam_IamDataFrame

    workflow_py->>np: __init__(path_config)
    np->>cfg: read config file
    np->>np: _read_pypsa_network_collection()
    np-->>nc: pypsa_NetworkCollection(file_list)
    np->>dsd: read_definitions()
    np->>np: _read_mappings() -> functions_dict

    loop for each investment_year_network in network_collection
        np->>nc: get network by index
        nc-->>n: pypsa_Network

        loop for each variable in dsd.variable
            np->>sf: call mapped function(n)
            activate sf
            sf->>n: n.statistics.energy_balance(...)
            n-->>sf: pd_Series result
            deactivate sf

            np->>np: _postprocess_statistics_result(variable, result)
            np-->>np: pd_DataFrame year_df_partial
        end

        np->>np: concat partial results -> year_df
        np-->>np: year_df with column investment_year
    end

    np->>np: merge year_df for all years -> ds_with_values
    np->>utils: EU27_COUNTRY_CODES, UNITS_MAPPING
    np->>np: structure_pyam_from_pandas(ds_with_values)
    np-->>pyam_df: pyam_IamDataFrame
    np-->>workflow_py: dsd_with_values (pyam_IamDataFrame)
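The per-network loop in the sequence diagram can be sketched in Python. All details here are assumptions inferred from the diagram (the real logic lives in Network_Processor.calculate_variables_values and _postprocess_statistics_result); the stub statistics function only demonstrates the data flow.

```python
import pandas as pd

def calculate_variables_values(networks_by_year, variables, functions_dict):
    """Sketch of the diagram's loop: per network, per variable, then merge."""
    year_frames = []
    for year, n in networks_by_year.items():
        partials = []
        for variable in variables:
            func = functions_dict.get(variable)
            if func is None:
                continue  # variables without a mapping entry are skipped
            result = func(n)  # pypsa.statistics-style Series
            partial = result.rename("value").reset_index()
            partial["variable"] = variable
            partials.append(partial)
        if partials:
            year_df = pd.concat(partials, ignore_index=True)
            year_df["investment_year"] = year
            year_frames.append(year_df)
    return pd.concat(year_frames, ignore_index=True) if year_frames else None

def _stub(n):  # stand-in statistics function for the demonstration
    idx = pd.MultiIndex.from_tuples([("electricity", "MWh")], names=["carrier", "unit"])
    return pd.Series([1.0], index=idx)

demo = calculate_variables_values(
    {2030: None, 2040: None},
    ["Final Energy|Electricity"],
    {"Final Energy|Electricity": _stub},
)
```

The resulting long-format table (one row per variable, unit, and investment year) is what structure_pyam_from_pandas would then turn into a pyam.IamDataFrame.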

Class diagram for updated Network_Processor, utils, and statistics_functions integration

classDiagram
    class Network_Processor {
        +path_config: pathlib_Path
        +config: dict
        +country: str
        +definition_path: pathlib_Path
        +mapping_path: pathlib_Path
        +output_path: pathlib_Path
        +network_results_path: pathlib_Path
        +model_name: str
        +scenario_name: str
        +network_collection: pypsa_NetworkCollection
        +dsd: nomenclature_DataStructureDefinition
        +functions_dict: dict
        +dsd_with_values: pyam_IamDataFrame
        +path_dsd_with_values: pathlib_Path
        +__init__(path_config: pathlib_Path)
        +_read_config() dict
        +_read_mappings() dict
        +_read_pypsa_network_collection() pypsa_NetworkCollection
        +read_definitions() nomenclature_DataStructureDefinition
        +_execute_function_for_variable(variable: str, n: pypsa_Network) pd_Series
        +_postprocess_statistics_result(variable: str, result: pd_Series) pd_DataFrame
        +structure_pyam_from_pandas(df: pd_DataFrame) pyam_IamDataFrame
        +calculate_variables_values() None
        +write_output_to_xlsx() None
    }

    class statistics_functions_py {
        +Final_Energy_by_Carrier__Electricity(n: pypsa_Network) pd_DataFrame
        +Final_Energy_by_Sector__Transportation(n: pypsa_Network) pd_DataFrame
    }

    class utils_py {
        +EU27_COUNTRY_CODES: dict~str,str~
        +UNITS_MAPPING: dict~str,str~
    }

    class config_default_yaml {
        +country: str
        +definitions_path: str
        +output_path: str
        +network_results_path: str
        +model_name: str
        +scenario_name: str
    }

    class mapping_default_yaml {
        +Final_Energy_by_Carrier__Electricity: str
        +Final_Energy_by_Sector__Transportation: str
    }

    class pypsa_NetworkCollection
    class pypsa_Network {
        +name: str
        +statistics: pypsa_StatisticsAccessor
    }
    class pypsa_StatisticsAccessor {
        +energy_balance(components: list, carrier: str, groupby: list) pd_Series
    }

    class nomenclature_DataStructureDefinition {
        +variable: pd_Series
    }

    class pyam_IamDataFrame
    class pd_DataFrame
    class pd_Series

    Network_Processor --> pypsa_NetworkCollection : owns
    Network_Processor --> nomenclature_DataStructureDefinition : uses
    Network_Processor --> pyam_IamDataFrame : creates
    Network_Processor --> statistics_functions_py : calls
    Network_Processor --> utils_py : imports
    Network_Processor --> config_default_yaml : reads
    Network_Processor --> mapping_default_yaml : reads
    pypsa_NetworkCollection --> pypsa_Network : contains
    pypsa_Network --> pypsa_StatisticsAccessor : has

File-Level Changes

Introduce initial statistics functions and conventions for final energy variables and wire them into the Network_Processor workflow.
  • Define two placeholder statistics functions for electricity-by-carrier and transportation-by-sector final energy that currently call pypsa statistics and return their result.
  • Document the expected function signature and return structure in README, including naming conventions and mapping behavior.
  • Extend Network_Processor to look up and invoke variable-specific statistic functions per network, postprocess their Series output into DataFrames, and merge results across investment years into a single table used to build the pyam.IamDataFrame.
pypsa_validation_processing/statistics_functions.py
README.md
pypsa_validation_processing/class_definitions.py
Enhance configuration, utilities, and metadata handling for regions and units.
  • Add a utils module providing EU27 country-code mapping and unit normalization mapping, and use it when building the IAMC data (region name and unit mapping).
  • Update default configuration values (country, definitions path, network path, model, scenario, output path) to a concrete example setup.
  • Adjust the GitHub Copilot instructions diagram to include utils, updated method names, and corrected dsd_with_values type.
pypsa_validation_processing/utils.py
pypsa_validation_processing/configs/config.default.yaml
.github/copilot-instructions.md
pypsa_validation_processing/class_definitions.py
Add tests for the new statistics functions and utilities and introduce IAMC-variable-to-function mapping entries.
  • Add unit tests for the two statistics functions that assert basic structural properties of their DataFrame outputs.
  • Add unit tests for the EU27 country-code mapping to verify presence and integrity of all entries.
  • Extend the default mapping YAML to map two IAMC final-energy variables to their corresponding statistics functions.
tests/test_statistics_functions.py
tests/test_utils.py
pypsa_validation_processing/configs/mapping.default.yaml

Assessment against linked issues

  • #1: Implement the processing pipeline module that reads IAMC variable definitions and YAML mappings, reads a PyPSA NetworkCollection, executes mapped statistics functions per variable and per investment year, and aggregates results into a pyam.IamDataFrame.
  • #1: Implement the mapping and statistics-function mechanism for a minimal set of final energy IAMC variables, including YAML mappings, pypsa.statistics-based functions, and supporting utilities for units and country codes.
  • #1: Provide at least one example or automated test that demonstrates the end-to-end workflow from IAMC variable definitions and PyPSA networks through to a pyam.IamDataFrame output. Not fully addressed: the PR adds tests for individual statistics functions and utilities, but it does not include an example or test that runs the full workflow (Network_Processor + mappings + NetworkCollection) to produce a pyam.IamDataFrame, as required by the acceptance criterion.
  • #5: Implement statistics functions in statistics_functions.py for the two specified variables (`Final Energy [by Carrier] Electricity` and `Final Energy [by Sector] Transportation`), following the naming convention, embedding them into the existing structure, and including a dummy pypsa.statistics-based implementation.
  • #5: Register the new statistics functions in configs/mapping.default.yaml so that each IAMC variable name maps to the correct function.
  • #5: Provide numpy-style docstrings for the new statistics functions and integrate them into the workflow so that the overall process can be run and adapted as needed.


sourcery-ai bot left a comment

Hey - I've found 5 issues and left some high-level feedback:

  • The statistics_functions contract is inconsistent across the codebase: the top-level docstring and tests assume functions take a NetworkCollection and return a long-format DataFrame, while Network_Processor._execute_function_for_variable and the README now expect a Network and a Series; align the signature, return type, and tests to a single, clearly documented interface.
  • The new implementations of Final_Energy_by_* call n.statistics.energy_balance(...), but the tests still use a dummy object with no statistics attribute and expect a preformatted DataFrame, so either adapt the tests to use a minimal pypsa.Network (or a realistic mock of statistics.energy_balance) or reintroduce the placeholder behavior described in the docstrings.
  • In calculate_variables_values, deriving investment_year from n.name[-4:] and assuming container_investment_years[0] exists is brittle; consider a more robust way to obtain the year (e.g., from network metadata or a parsed name) and handle the case where no statistics are returned without indexing an empty list.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `statistics_functions` contract is inconsistent across the codebase: the top-level docstring and tests assume functions take a `NetworkCollection` and return a long-format `DataFrame`, while `Network_Processor._execute_function_for_variable` and the README now expect a `Network` and a `Series`; align the signature, return type, and tests to a single, clearly documented interface.
- The new implementations of `Final_Energy_by_*` call `n.statistics.energy_balance(...)`, but the tests still use a dummy object with no `statistics` attribute and expect a preformatted `DataFrame`, so either adapt the tests to use a minimal `pypsa.Network` (or a realistic mock of `statistics.energy_balance`) or reintroduce the placeholder behavior described in the docstrings.
- In `calculate_variables_values`, deriving `investment_year` from `n.name[-4:]` and assuming `container_investment_years[0]` exists is brittle; consider a more robust way to obtain the year (e.g., from network metadata or a parsed name) and handle the case where no statistics are returned without indexing an empty list.

## Individual Comments

### Comment 1
<location path="pypsa_validation_processing/class_definitions.py" line_range="226-235" />
<code_context>
+        container_investment_years = []
</code_context>
<issue_to_address>
**issue:** Handle the case where no values are computed to avoid IndexError on `container_investment_years[0]`.

If the network collection is empty or all variable functions return `None`, `container_investment_years` will be empty and accessing index 0 will raise an `IndexError`. Please guard this access (e.g., return early or allow `ds_with_values` to be `None` as before) when the list is empty.
</issue_to_address>
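One way to realize the suggested guard (the function name is hypothetical; the real code lives inside Network_Processor.calculate_variables_values):

```python
import pandas as pd

def merge_investment_years(container_investment_years):
    """Merge per-year DataFrames, tolerating an empty input list."""
    if not container_investment_years:
        # Nothing was computed: return None instead of raising IndexError
        # on container_investment_years[0].
        return None
    merged = container_investment_years[0]
    for year_df in container_investment_years[1:]:
        merged = merged.merge(year_df, how="outer")
    return merged

empty_result = merge_investment_years([])
```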

### Comment 2
<location path="pypsa_validation_processing/class_definitions.py" line_range="194-198" />
<code_context>
         df = df.rename(
             columns={k: v for k, v in col_renaming_dict.items() if k in df.columns}
         )
+        df["unit_pypsa"] = df["unit_pypsa"].map(UNITS_MAPPING)
         # drop columns not needed

</code_context>
<issue_to_address>
**suggestion:** Mapping units without a fallback can introduce NaNs for unmapped units.

`Series.map` will set any unit not in `UNITS_MAPPING` to `NaN`, which can break `IamDataFrame` construction or cause units to be dropped. If you only want to normalize known units, consider:

```python
df["unit_pypsa"] = df["unit_pypsa"].map(UNITS_MAPPING).fillna(df["unit_pypsa"])
```

This keeps unknown units while still applying the mapping where defined.

```suggestion
        df = df.rename(
            columns={k: v for k, v in col_renaming_dict.items() if k in df.columns}
        )
        df["unit_pypsa"] = df["unit_pypsa"].map(UNITS_MAPPING).fillna(df["unit_pypsa"])
        # drop columns not needed
```
</issue_to_address>
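A runnable demonstration of the fallback behavior; the `UNITS_MAPPING` here is a hypothetical stand-in for the one in utils.py:

```python
import pandas as pd

# Hypothetical mapping covering only one unit.
UNITS_MAPPING = {"MWh": "MWh/yr"}

df = pd.DataFrame({"unit_pypsa": ["MWh", "tCO2"]})
# .map alone would turn the unmapped "tCO2" into NaN; .fillna with the
# original column keeps unknown units unchanged.
df["unit_pypsa"] = df["unit_pypsa"].map(UNITS_MAPPING).fillna(df["unit_pypsa"])
```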

### Comment 3
<location path="tests/test_statistics_functions.py" line_range="18-27" />
<code_context>
+REQUIRED_COLUMNS = {"variable", "unit", "year", "value"}
+
+
+class _DummyNetworkCollection:
+    """Minimal stand-in for pypsa.NetworkCollection used in unit tests.
+
+    The current statistics functions are placeholders and do not call any
+    methods on the network collection, so an empty class is sufficient.
+    """
+
+
+@pytest.fixture
+def dummy_network_collection():
+    return _DummyNetworkCollection()
+
</code_context>
<issue_to_address>
**issue (testing):** The dummy network type used in tests does not match the real function signature and will cause attribute errors.

These functions now accept a real `pypsa.Network` and call `n.statistics.energy_balance(...)`, but `_DummyNetworkCollection` has no `statistics` attribute, so these tests will raise `AttributeError` before reaching any assertions. To reflect the real contract, either:

- provide a minimal fake `Network` with a `statistics.energy_balance(...)` method returning controlled data, or
- use a real `pypsa.Network` and monkeypatch `statistics.energy_balance`.

This ensures the tests cover the actual integration with the `statistics` API rather than accepting any object.
</issue_to_address>
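One way to realize this suggestion is a minimal fake whose `statistics.energy_balance` returns controlled data. All class names here are hypothetical; only the call contract (`n.statistics.energy_balance(...)` returning a Series with a `unit` index level) follows the PR.

```python
import pandas as pd

class FakeStatistics:
    """Controlled stand-in for pypsa's statistics accessor."""
    def energy_balance(self, **kwargs):
        idx = pd.MultiIndex.from_tuples(
            [("electricity", "MWh")], names=["carrier", "unit"]
        )
        return pd.Series([42.0], index=idx)

class FakeNetwork:
    statistics = FakeStatistics()

def test_result_has_unit_level():
    result = FakeNetwork().statistics.energy_balance(components=["Link"])
    assert "unit" in result.index.names

test_result_has_unit_level()
```

Alternatively, a real `pypsa.Network` could be used with `statistics.energy_balance` monkeypatched, which would also exercise the attribute lookup path.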

### Comment 4
<location path="README.md" line_range="71" />
<code_context>
+    ...
+```
+
+**The returned `Series` is of the structure of the direct outcome of a `pypsa.statistics` - Function.** It therefore must have a multi-level index that includes a level named `"unit"` so that the post-processing step can extract the unit information. It is possible to return multiple values with different units. This is then forwarded and further processed as two indipendend rows of the pyam.IamDataFrame.
+
+### Mapping File
</code_context>
<issue_to_address>
**issue (typo):** Fix the typo "indipendend" → "independent".

In the last sentence, change "two indipendend rows" to "two independent rows".

```suggestion
**The returned `Series` is of the structure of the direct outcome of a `pypsa.statistics` - Function.** It therefore must have a multi-level index that includes a level named `"unit"` so that the post-processing step can extract the unit information. It is possible to return multiple values with different units. This is then forwarded and further processed as two independent rows of the pyam.IamDataFrame.
```
</issue_to_address>
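To illustrate the README contract quoted in this comment: a result Series may carry entries with two different units, and each becomes an independent row downstream. The values below are made up for the demonstration.

```python
import pandas as pd

# A pypsa.statistics-style Series with two differently-united entries.
idx = pd.MultiIndex.from_tuples(
    [("electricity", "MWh"), ("hydrogen", "t")],
    names=["carrier", "unit"],
)
result = pd.Series([100.0, 2.5], index=idx)

# Post-processing flattens it: one row per (carrier, unit) pair.
rows = result.rename("value").reset_index()
```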

### Comment 5
<location path="README.md" line_range="84-85" />
<code_context>
+
+At runtime, `Network_Processor` reads this mapping, looks up the function for each defined variable, and calls it for every network in the collection.  Variables without a mapping entry are silently skipped. 
+
+### Register a new variable-statistics
+To register a new variable
+- add an entry to the mapping file
+- implement the corresponding function
</code_context>
<issue_to_address>
**suggestion (typo):** Improve grammar in the section title and introductory line for registering a new variable.

The heading and intro sentence are grammatically off. Consider renaming the heading to something like “Register a new variable statistic” or “Register a new variable statistics function,” and updating the intro line to “To register a new variable, you need to:” or “To register a new variable:” before the bullet list.
</issue_to_address>


maxnutz and others added 8 commits March 23, 2026 14:45
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
…ergy' of github.com:maxnutz/pypsa-validation_processing into 5-create-minimal-set-of-variable-functions-for-final-energy
…onding postprocessing steps; include general debugging steps
Copilot AI and others added 2 commits March 24, 2026 10:39
…ace] syntax and lock file v6

Co-authored-by: maxnutz <81740567+maxnutz@users.noreply.github.com>
Agent-Logs-Url: https://github.com/maxnutz/pypsa_validation_processing/sessions/1df2d786-39b3-430a-ba4a-f5e31fa8ffc9
…ariable-functions-for-fin

Fix CI: bump pixi to v0.66.0 to support `[workspace]` manifest and lock file v6
@maxnutz maxnutz merged commit 366333e into main Mar 24, 2026
2 checks passed

Development

Successfully merging this pull request may close these issues.

create minimal set of variable-functions for final energy
Create initial code-structure

2 participants