
5 create minimal set of variable functions for final energy #10

Merged
maxnutz merged 22 commits into main from
5-create-minimal-set-of-variable-functions-for-final-energy
Mar 24, 2026

Conversation

@maxnutz (Owner) commented Mar 23, 2026

Basis structure

Minimal set of Statistics-Functions

  • Statistics functions for a variable can be added to statistics_functions.py; the mapping to the IAMC-formatted variable needs to be added to configs/mapping.default.yaml
  • For requirements and naming conventions for statistics functions, see the corresponding section in README.md

Note

Statistics functions are executed for a single pypsa Network, NOT for a NetworkCollection, as some statistics parameters are not yet applicable to NetworkCollections.
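For illustration, the contract described above can be sketched as follows. This is not the PR's actual code: the `energy_balance` argument values and the fake-network helper are assumptions; only the function name (from the mapping) and the "Series with a `unit` index level" shape follow the PR.

```python
import pandas as pd

def Final_Energy_by_Carrier__Electricity(n):
    """Final electricity demand (sketch following the naming convention)."""
    # The real function would call n.statistics.energy_balance(...) on a
    # pypsa.Network; the argument values here are illustrative only.
    return n.statistics.energy_balance(components=["Link"], groupby=["carrier", "unit"])

# Minimal stand-in so the output shape can be checked without pypsa installed.
class _FakeStatistics:
    def energy_balance(self, components, groupby):
        idx = pd.MultiIndex.from_tuples([("electricity", "MWh")], names=groupby)
        return pd.Series([1234.5], index=idx)

class _FakeNetwork:
    statistics = _FakeStatistics()

series = Final_Energy_by_Carrier__Electricity(_FakeNetwork())
```

The important property is that the returned Series keeps a MultiIndex level named `"unit"`, which the post-processing step relies on.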

Summary by Sourcery

Add initial final energy statistics functions and integrate them into the PyPSA-to-IAMC processing workflow, including basic configuration, utilities, and tests.

New Features:

  • Add a mapping mechanism from IAMC variable names to statistics function names via the default mapping configuration.
  • Provide a utilities module with EU27 country code mappings and unit conversions for PyPSA outputs.
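A hypothetical excerpt of what such a mapping entry in configs/mapping.default.yaml could look like — the function names appear in this PR, but the exact IAMC variable keys shown here are assumptions:

```yaml
# IAMC variable name -> name of the function in statistics_functions.py
Final Energy|Electricity: Final_Energy_by_Carrier__Electricity
Final Energy|Transportation: Final_Energy_by_Sector__Transportation
```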

Enhancements:

  • Update network collection loading to read explicit NetCDF files from the results directory.
  • Extend Network_Processor to execute per-network statistics functions, post-process their results by investment year, and build a combined IAMC-compatible dataset.
  • Map internal unit labels to IAMC units and map ISO country codes to EU27 country names during pyam.IamDataFrame construction.
  • Clarify and expand the README with conventions for defining and registering variable statistics functions.
  • Refine internal class documentation for Network_Processor in Copilot instructions to reflect new APIs and attributes.

Tests:

  • Add unit tests for the new statistics functions to verify output structure and variable naming.
  • Add unit tests for EU27 country code mappings in the utilities module.

@maxnutz maxnutz self-assigned this Mar 23, 2026
@maxnutz maxnutz linked an issue Mar 23, 2026 that may be closed by this pull request
sourcery-ai bot (Contributor) commented Mar 23, 2026

Reviewer's Guide

Implements the minimal statistics-function framework and integration for final energy variables, wiring PyPSA network results through variable-specific functions into an IAMC-formatted pyam.IamDataFrame, adds EU27 metadata and unit mapping utilities, config defaults, and basic tests.

Sequence diagram for per-network variable calculation and aggregation

sequenceDiagram
    participant workflow_py as workflow_py
    participant np as Network_Processor
    participant cfg as config_yaml
    participant nc as pypsa_NetworkCollection
    participant n as pypsa_Network
    participant sf as statistics_functions_py
    participant utils as utils_py
    participant dsd as nomenclature_DSD
    participant pyam_df as pyam_IamDataFrame

    workflow_py->>np: __init__(path_config)
    np->>cfg: read config file
    np->>np: _read_pypsa_network_collection()
    np-->>nc: pypsa_NetworkCollection(file_list)
    np->>dsd: read_definitions()
    np->>np: _read_mappings() -> functions_dict

    loop for each investment_year_network in network_collection
        np->>nc: get network by index
        nc-->>n: pypsa_Network

        loop for each variable in dsd.variable
            np->>sf: call mapped function(n)
            activate sf
            sf->>n: n.statistics.energy_balance(...)
            n-->>sf: pd_Series result
            deactivate sf

            np->>np: _postprocess_statistics_result(variable, result)
            np-->>np: pd_DataFrame year_df_partial
        end

        np->>np: concat partial results -> year_df
        np-->>np: year_df with column investment_year
    end

    np->>np: merge year_df for all years -> ds_with_values
    np->>utils: EU27_COUNTRY_CODES, UNITS_MAPPING
    np->>np: structure_pyam_from_pandas(ds_with_values)
    np-->>pyam_df: pyam_IamDataFrame
    np-->>workflow_py: dsd_with_values (pyam_IamDataFrame)
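The per-network loop in the sequence diagram can be sketched in Python. All details here are assumptions inferred from the diagram (the real logic lives in Network_Processor.calculate_variables_values and _postprocess_statistics_result); the stub statistics function only demonstrates the data flow.

```python
import pandas as pd

def calculate_variables_values(networks_by_year, variables, functions_dict):
    """Sketch of the diagram's loop: per network, per variable, then merge."""
    year_frames = []
    for year, n in networks_by_year.items():
        partials = []
        for variable in variables:
            func = functions_dict.get(variable)
            if func is None:
                continue  # variables without a mapping entry are skipped
            result = func(n)  # pypsa.statistics-style Series
            partial = result.rename("value").reset_index()
            partial["variable"] = variable
            partials.append(partial)
        if partials:
            year_df = pd.concat(partials, ignore_index=True)
            year_df["investment_year"] = year
            year_frames.append(year_df)
    return pd.concat(year_frames, ignore_index=True) if year_frames else None

def _stub(n):  # stand-in statistics function for the demonstration
    idx = pd.MultiIndex.from_tuples([("electricity", "MWh")], names=["carrier", "unit"])
    return pd.Series([1.0], index=idx)

demo = calculate_variables_values(
    {2030: None, 2040: None},
    ["Final Energy|Electricity"],
    {"Final Energy|Electricity": _stub},
)
```

The resulting long-format table (one row per variable, unit, and investment year) is what structure_pyam_from_pandas would then turn into a pyam.IamDataFrame.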

Class diagram for updated Network_Processor, utils, and statistics_functions integration

classDiagram
    class Network_Processor {
        +path_config: pathlib_Path
        +config: dict
        +country: str
        +definition_path: pathlib_Path
        +mapping_path: pathlib_Path
        +output_path: pathlib_Path
        +network_results_path: pathlib_Path
        +model_name: str
        +scenario_name: str
        +network_collection: pypsa_NetworkCollection
        +dsd: nomenclature_DataStructureDefinition
        +functions_dict: dict
        +dsd_with_values: pyam_IamDataFrame
        +path_dsd_with_values: pathlib_Path
        +__init__(path_config: pathlib_Path)
        +_read_config() dict
        +_read_mappings() dict
        +_read_pypsa_network_collection() pypsa_NetworkCollection
        +read_definitions() nomenclature_DataStructureDefinition
        +_execute_function_for_variable(variable: str, n: pypsa_Network) pd_Series
        +_postprocess_statistics_result(variable: str, result: pd_Series) pd_DataFrame
        +structure_pyam_from_pandas(df: pd_DataFrame) pyam_IamDataFrame
        +calculate_variables_values() None
        +write_output_to_xlsx() None
    }

    class statistics_functions_py {
        +Final_Energy_by_Carrier__Electricity(n: pypsa_Network) pd_DataFrame
        +Final_Energy_by_Sector__Transportation(n: pypsa_Network) pd_DataFrame
    }

    class utils_py {
        +EU27_COUNTRY_CODES: dict~str,str~
        +UNITS_MAPPING: dict~str,str~
    }

    class config_default_yaml {
        +country: str
        +definitions_path: str
        +output_path: str
        +network_results_path: str
        +model_name: str
        +scenario_name: str
    }

    class mapping_default_yaml {
        +Final_Energy_by_Carrier__Electricity: str
        +Final_Energy_by_Sector__Transportation: str
    }

    class pypsa_NetworkCollection
    class pypsa_Network {
        +name: str
        +statistics: pypsa_StatisticsAccessor
    }
    class pypsa_StatisticsAccessor {
        +energy_balance(components: list, carrier: str, groupby: list) pd_Series
    }

    class nomenclature_DataStructureDefinition {
        +variable: pd_Series
    }

    class pyam_IamDataFrame
    class pd_DataFrame
    class pd_Series

    Network_Processor --> pypsa_NetworkCollection : owns
    Network_Processor --> nomenclature_DataStructureDefinition : uses
    Network_Processor --> pyam_IamDataFrame : creates
    Network_Processor --> statistics_functions_py : calls
    Network_Processor --> utils_py : imports
    Network_Processor --> config_default_yaml : reads
    Network_Processor --> mapping_default_yaml : reads
    pypsa_NetworkCollection --> pypsa_Network : contains
    pypsa_Network --> pypsa_StatisticsAccessor : has

File-Level Changes

Introduce initial statistics functions and conventions for final energy variables and wire them into the Network_Processor workflow.
  • Define two placeholder statistics functions for electricity-by-carrier and transportation-by-sector final energy that currently call pypsa statistics and return their result.
  • Document the expected function signature and return structure in README, including naming conventions and mapping behavior.
  • Extend Network_Processor to look up and invoke variable-specific statistic functions per network, postprocess their Series output into DataFrames, and merge results across investment years into a single table used to build the pyam.IamDataFrame.
pypsa_validation_processing/statistics_functions.py
README.md
pypsa_validation_processing/class_definitions.py
Enhance configuration, utilities, and metadata handling for regions and units.
  • Add a utils module providing EU27 country-code mapping and unit normalization mapping, and use it when building the IAMC data (region name and unit mapping).
  • Update default configuration values (country, definitions path, network path, model, scenario, output path) to a concrete example setup.
  • Adjust the GitHub Copilot instructions diagram to include utils, updated method names, and corrected dsd_with_values type.
pypsa_validation_processing/utils.py
pypsa_validation_processing/configs/config.default.yaml
.github/copilot-instructions.md
pypsa_validation_processing/class_definitions.py
Add tests for the new statistics functions and utilities and introduce IAMC-variable-to-function mapping entries.
  • Add unit tests for the two statistics functions that assert basic structural properties of their DataFrame outputs.
  • Add unit tests for the EU27 country-code mapping to verify presence and integrity of all entries.
  • Extend the default mapping YAML to map two IAMC final-energy variables to their corresponding statistics functions.
tests/test_statistics_functions.py
tests/test_utils.py
pypsa_validation_processing/configs/mapping.default.yaml

Assessment against linked issues

  • #1: Implement the processing pipeline module that reads IAMC variable definitions and YAML mappings, reads a PyPSA NetworkCollection, executes mapped statistics functions per variable and per investment year, and aggregates results into a pyam.IamDataFrame.
  • #1: Implement the mapping and statistics-function mechanism for a minimal set of final energy IAMC variables, including YAML mappings, pypsa.statistics-based functions, and supporting utilities for units and country codes.
  • #1: Provide at least one example or automated test that demonstrates the end-to-end workflow from IAMC variable definitions and PyPSA networks through to a pyam.IamDataFrame output. Not fully addressed: the PR adds tests for individual statistics functions and utilities, but it does not include an example or test that runs the full workflow (Network_Processor + mappings + NetworkCollection) to produce a pyam.IamDataFrame, as required by the acceptance criterion.
  • #5: Implement statistics functions in statistics_functions.py for the two specified variables (`Final Energy [by Carrier] Electricity` and `Final Energy [by Sector] Transportation`), following the naming convention, embedding them into the existing structure, and including a dummy pypsa.statistics-based implementation.
  • #5: Register the new statistics functions in configs/mapping.default.yaml so that each IAMC variable name maps to the correct function.
  • #5: Provide numpy-style docstrings for the new statistics functions and integrate them into the workflow so that the overall process can be run and adapted as needed.


sourcery-ai bot left a comment

Hey - I've found 5 issues and left some high-level feedback:

  • The statistics_functions contract is inconsistent across the codebase: the top-level docstring and tests assume functions take a NetworkCollection and return a long-format DataFrame, while Network_Processor._execute_function_for_variable and the README now expect a Network and a Series; align the signature, return type, and tests to a single, clearly documented interface.
  • The new implementations of Final_Energy_by_* call n.statistics.energy_balance(...), but the tests still use a dummy object with no statistics attribute and expect a preformatted DataFrame, so either adapt the tests to use a minimal pypsa.Network (or a realistic mock of statistics.energy_balance) or reintroduce the placeholder behavior described in the docstrings.
  • In calculate_variables_values, deriving investment_year from n.name[-4:] and assuming container_investment_years[0] exists is brittle; consider a more robust way to obtain the year (e.g., from network metadata or a parsed name) and handle the case where no statistics are returned without indexing an empty list.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `statistics_functions` contract is inconsistent across the codebase: the top-level docstring and tests assume functions take a `NetworkCollection` and return a long-format `DataFrame`, while `Network_Processor._execute_function_for_variable` and the README now expect a `Network` and a `Series`; align the signature, return type, and tests to a single, clearly documented interface.
- The new implementations of `Final_Energy_by_*` call `n.statistics.energy_balance(...)`, but the tests still use a dummy object with no `statistics` attribute and expect a preformatted `DataFrame`, so either adapt the tests to use a minimal `pypsa.Network` (or a realistic mock of `statistics.energy_balance`) or reintroduce the placeholder behavior described in the docstrings.
- In `calculate_variables_values`, deriving `investment_year` from `n.name[-4:]` and assuming `container_investment_years[0]` exists is brittle; consider a more robust way to obtain the year (e.g., from network metadata or a parsed name) and handle the case where no statistics are returned without indexing an empty list.

## Individual Comments

### Comment 1
<location path="pypsa_validation_processing/class_definitions.py" line_range="226-235" />
<code_context>
+        container_investment_years = []
</code_context>
<issue_to_address>
**issue:** Handle the case where no values are computed to avoid IndexError on `container_investment_years[0]`.

If the network collection is empty or all variable functions return `None`, `container_investment_years` will be empty and accessing index 0 will raise an `IndexError`. Please guard this access (e.g., return early or allow `ds_with_values` to be `None` as before) when the list is empty.
</issue_to_address>
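One way to realize the suggested guard (the function name is hypothetical; the real code lives inside Network_Processor.calculate_variables_values):

```python
import pandas as pd

def merge_investment_years(container_investment_years):
    """Merge per-year DataFrames, tolerating an empty input list."""
    if not container_investment_years:
        # Nothing was computed: return None instead of raising IndexError
        # on container_investment_years[0].
        return None
    merged = container_investment_years[0]
    for year_df in container_investment_years[1:]:
        merged = merged.merge(year_df, how="outer")
    return merged

empty_result = merge_investment_years([])
```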

### Comment 2
<location path="pypsa_validation_processing/class_definitions.py" line_range="194-198" />
<code_context>
         df = df.rename(
             columns={k: v for k, v in col_renaming_dict.items() if k in df.columns}
         )
+        df["unit_pypsa"] = df["unit_pypsa"].map(UNITS_MAPPING)
         # drop columns not needed

</code_context>
<issue_to_address>
**suggestion:** Mapping units without a fallback can introduce NaNs for unmapped units.

`Series.map` will set any unit not in `UNITS_MAPPING` to `NaN`, which can break `IamDataFrame` construction or cause units to be dropped. If you only want to normalize known units, consider:

```python
df["unit_pypsa"] = df["unit_pypsa"].map(UNITS_MAPPING).fillna(df["unit_pypsa"])
```

This keeps unknown units while still applying the mapping where defined.

```suggestion
        df = df.rename(
            columns={k: v for k, v in col_renaming_dict.items() if k in df.columns}
        )
        df["unit_pypsa"] = df["unit_pypsa"].map(UNITS_MAPPING).fillna(df["unit_pypsa"])
        # drop columns not needed
```
</issue_to_address>
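A runnable demonstration of the fallback behavior; the `UNITS_MAPPING` here is a hypothetical stand-in for the one in utils.py:

```python
import pandas as pd

# Hypothetical mapping covering only one unit.
UNITS_MAPPING = {"MWh": "MWh/yr"}

df = pd.DataFrame({"unit_pypsa": ["MWh", "tCO2"]})
# .map alone would turn the unmapped "tCO2" into NaN; .fillna with the
# original column keeps unknown units unchanged.
df["unit_pypsa"] = df["unit_pypsa"].map(UNITS_MAPPING).fillna(df["unit_pypsa"])
```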

### Comment 3
<location path="tests/test_statistics_functions.py" line_range="18-27" />
<code_context>
+REQUIRED_COLUMNS = {"variable", "unit", "year", "value"}
+
+
+class _DummyNetworkCollection:
+    """Minimal stand-in for pypsa.NetworkCollection used in unit tests.
+
+    The current statistics functions are placeholders and do not call any
+    methods on the network collection, so an empty class is sufficient.
+    """
+
+
+@pytest.fixture
+def dummy_network_collection():
+    return _DummyNetworkCollection()
+
</code_context>
<issue_to_address>
**issue (testing):** The dummy network type used in tests does not match the real function signature and will cause attribute errors.

These functions now accept a real `pypsa.Network` and call `n.statistics.energy_balance(...)`, but `_DummyNetworkCollection` has no `statistics` attribute, so these tests will raise `AttributeError` before reaching any assertions. To reflect the real contract, either:

- provide a minimal fake `Network` with a `statistics.energy_balance(...)` method returning controlled data, or
- use a real `pypsa.Network` and monkeypatch `statistics.energy_balance`.

This ensures the tests cover the actual integration with the `statistics` API rather than accepting any object.
</issue_to_address>
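One way to realize this suggestion is a minimal fake whose `statistics.energy_balance` returns controlled data. All class names here are hypothetical; only the call contract (`n.statistics.energy_balance(...)` returning a Series with a `unit` index level) follows the PR.

```python
import pandas as pd

class FakeStatistics:
    """Controlled stand-in for pypsa's statistics accessor."""
    def energy_balance(self, **kwargs):
        idx = pd.MultiIndex.from_tuples(
            [("electricity", "MWh")], names=["carrier", "unit"]
        )
        return pd.Series([42.0], index=idx)

class FakeNetwork:
    statistics = FakeStatistics()

def test_result_has_unit_level():
    result = FakeNetwork().statistics.energy_balance(components=["Link"])
    assert "unit" in result.index.names

test_result_has_unit_level()
```

Alternatively, a real `pypsa.Network` could be used with `statistics.energy_balance` monkeypatched, which would also exercise the attribute lookup path.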

### Comment 4
<location path="README.md" line_range="71" />
<code_context>
+    ...
+```
+
+**The returned `Series` is of the structure of the direct outcome of a `pypsa.statistics` - Function.** It therefore must have a multi-level index that includes a level named `"unit"` so that the post-processing step can extract the unit information. It is possible to return multiple values with different units. This is then forwarded and further processed as two indipendend rows of the pyam.IamDataFrame.
+
+### Mapping File
</code_context>
<issue_to_address>
**issue (typo):** Fix the typo "indipendend" → "independent".

In the last sentence, change "two indipendend rows" to "two independent rows".

```suggestion
**The returned `Series` is of the structure of the direct outcome of a `pypsa.statistics` - Function.** It therefore must have a multi-level index that includes a level named `"unit"` so that the post-processing step can extract the unit information. It is possible to return multiple values with different units. This is then forwarded and further processed as two independent rows of the pyam.IamDataFrame.
```
</issue_to_address>
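To illustrate the README contract quoted in this comment: a result Series may carry entries with two different units, and each becomes an independent row downstream. The values below are made up for the demonstration.

```python
import pandas as pd

# A pypsa.statistics-style Series with two differently-united entries.
idx = pd.MultiIndex.from_tuples(
    [("electricity", "MWh"), ("hydrogen", "t")],
    names=["carrier", "unit"],
)
result = pd.Series([100.0, 2.5], index=idx)

# Post-processing flattens it: one row per (carrier, unit) pair.
rows = result.rename("value").reset_index()
```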

### Comment 5
<location path="README.md" line_range="84-85" />
<code_context>
+
+At runtime, `Network_Processor` reads this mapping, looks up the function for each defined variable, and calls it for every network in the collection.  Variables without a mapping entry are silently skipped. 
+
+### Register a new variable-statistics
+To register a new variable
+- add an entry to the mapping file
+- implement the corresponding function
</code_context>
<issue_to_address>
**suggestion (typo):** Improve grammar in the section title and introductory line for registering a new variable.

The heading and intro sentence are grammatically off. Consider renaming the heading to something like “Register a new variable statistic” or “Register a new variable statistics function,” and updating the intro line to “To register a new variable, you need to:” or “To register a new variable:” before the bullet list.
</issue_to_address>


maxnutz and others added 8 commits March 23, 2026 14:45
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
…ergy' of github.com:maxnutz/pypsa-validation_processing into 5-create-minimal-set-of-variable-functions-for-final-energy
…onding postprocessing steps; include general debugging steps
Copilot AI and others added 2 commits March 24, 2026 10:39
…ace] syntax and lock file v6

Co-authored-by: maxnutz <81740567+maxnutz@users.noreply.github.com>
Agent-Logs-Url: https://github.com/maxnutz/pypsa_validation_processing/sessions/1df2d786-39b3-430a-ba4a-f5e31fa8ffc9
…ariable-functions-for-fin

Fix CI: bump pixi to v0.66.0 to support `[workspace]` manifest and lock file v6
@maxnutz maxnutz merged commit 366333e into main Mar 24, 2026
2 checks passed

Development

Successfully merging this pull request may close these issues.

create minimal set of variable-functions for final energy
Create initial code-structure

2 participants