Conversation
…ests Co-authored-by: maxnutz <81740567+maxnutz@users.noreply.github.com>
…le still reading in pypsa.NetworkCollection
…e-functions Complete Workflow including placeholder statistics-functions
…EADME docs, fix copilot-instructions Code Structure Co-authored-by: maxnutz <81740567+maxnutz@users.noreply.github.com>
…te-country-codes
8 tasks
Contributor
Reviewer's GuideImplements the minimal statistics-function framework and integration for final energy variables, wiring PyPSA network results through variable-specific functions into an IAMC-formatted pyam.IamDataFrame, adds EU27 metadata and unit mapping utilities, config defaults, and basic tests. Sequence diagram for per-network variable calculation and aggregationsequenceDiagram
participant workflow_py as workflow_py
participant np as Network_Processor
participant cfg as config_yaml
participant nc as pypsa_NetworkCollection
participant n as pypsa_Network
participant sf as statistics_functions_py
participant utils as utils_py
participant dsd as nomenclature_DSD
participant pyam_df as pyam_IamDataFrame
workflow_py->>np: __init__(path_config)
np->>cfg: read config file
np->>np: _read_pypsa_network_collection()
np-->>nc: pypsa_NetworkCollection(file_list)
np->>dsd: read_definitions()
np->>np: _read_mappings() -> functions_dict
loop for each investment_year_network in network_collection
np->>nc: get network by index
nc-->>n: pypsa_Network
loop for each variable in dsd.variable
np->>sf: call mapped function(n)
activate sf
sf->>n: n.statistics.energy_balance(...)
n-->>sf: pd_Series result
deactivate sf
np->>np: _postprocess_statistics_result(variable, result)
np-->>np: pd_DataFrame year_df_partial
end
np->>np: concat partial results -> year_df
np-->>np: year_df with column investment_year
end
np->>np: merge year_df for all years -> ds_with_values
np->>utils: EU27_COUNTRY_CODES, UNITS_MAPPING
np->>np: structure_pyam_from_pandas(ds_with_values)
np-->>pyam_df: pyam_IamDataFrame
np-->>workflow_py: dsd_with_values (pyam_IamDataFrame)
Class diagram for updated Network_Processor, utils, and statistics_functions integrationclassDiagram
class Network_Processor {
+path_config: pathlib_Path
+config: dict
+country: str
+definition_path: pathlib_Path
+mapping_path: pathlib_Path
+output_path: pathlib_Path
+network_results_path: pathlib_Path
+model_name: str
+scenario_name: str
+network_collection: pypsa_NetworkCollection
+dsd: nomenclature_DataStructureDefinition
+functions_dict: dict
+dsd_with_values: pyam_IamDataFrame
+path_dsd_with_values: pathlib_Path
+__init__(path_config: pathlib_Path)
+_read_config() dict
+_read_mappings() dict
+_read_pypsa_network_collection() pypsa_NetworkCollection
+read_definitions() nomenclature_DataStructureDefinition
+_execute_function_for_variable(variable: str, n: pypsa_Network) pd_Series
+_postprocess_statistics_result(variable: str, result: pd_Series) pd_DataFrame
+structure_pyam_from_pandas(df: pd_DataFrame) pyam_IamDataFrame
+calculate_variables_values() None
+write_output_to_xlsx() None
}
class statistics_functions_py {
+Final_Energy_by_Carrier__Electricity(n: pypsa_Network) pd_DataFrame
+Final_Energy_by_Sector__Transportation(n: pypsa_Network) pd_DataFrame
}
class utils_py {
+EU27_COUNTRY_CODES: dict~str,str~
+UNITS_MAPPING: dict~str,str~
}
class config_default_yaml {
+country: str
+definitions_path: str
+output_path: str
+network_results_path: str
+model_name: str
+scenario_name: str
}
class mapping_default_yaml {
+Final_Energy_by_Carrier__Electricity: str
+Final_Energy_by_Sector__Transportation: str
}
class pypsa_NetworkCollection
class pypsa_Network {
+name: str
+statistics: pypsa_StatisticsAccessor
}
class pypsa_StatisticsAccessor {
+energy_balance(components: list, carrier: str, groupby: list) pd_Series
}
class nomenclature_DataStructureDefinition {
+variable: pd_Series
}
class pyam_IamDataFrame
class pd_DataFrame
class pd_Series
Network_Processor --> pypsa_NetworkCollection : owns
Network_Processor --> nomenclature_DataStructureDefinition : uses
Network_Processor --> pyam_IamDataFrame : creates
Network_Processor --> statistics_functions_py : calls
Network_Processor --> utils_py : imports
Network_Processor --> config_default_yaml : reads
Network_Processor --> mapping_default_yaml : reads
pypsa_NetworkCollection --> pypsa_Network : contains
pypsa_Network --> pypsa_StatisticsAccessor : has
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Contributor
There was a problem hiding this comment.
Hey - I've found 5 issues, and left some high level feedback:
- The
statistics_functionscontract is inconsistent across the codebase: the top-level docstring and tests assume functions take aNetworkCollectionand return a long-formatDataFrame, whileNetwork_Processor._execute_function_for_variableand the README now expect aNetworkand aSeries; align the signature, return type, and tests to a single, clearly documented interface. - The new implementations of
Final_Energy_by_*calln.statistics.energy_balance(...), but the tests still use a dummy object with nostatisticsattribute and expect a preformattedDataFrame, so either adapt the tests to use a minimalpypsa.Network(or a realistic mock ofstatistics.energy_balance) or reintroduce the placeholder behavior described in the docstrings. - In
calculate_variables_values, derivinginvestment_yearfromn.name[-4:]and assumingcontainer_investment_years[0]exists is brittle; consider a more robust way to obtain the year (e.g., from network metadata or a parsed name) and handle the case where no statistics are returned without indexing an empty list.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `statistics_functions` contract is inconsistent across the codebase: the top-level docstring and tests assume functions take a `NetworkCollection` and return a long-format `DataFrame`, while `Network_Processor._execute_function_for_variable` and the README now expect a `Network` and a `Series`; align the signature, return type, and tests to a single, clearly documented interface.
- The new implementations of `Final_Energy_by_*` call `n.statistics.energy_balance(...)`, but the tests still use a dummy object with no `statistics` attribute and expect a preformatted `DataFrame`, so either adapt the tests to use a minimal `pypsa.Network` (or a realistic mock of `statistics.energy_balance`) or reintroduce the placeholder behavior described in the docstrings.
- In `calculate_variables_values`, deriving `investment_year` from `n.name[-4:]` and assuming `container_investment_years[0]` exists is brittle; consider a more robust way to obtain the year (e.g., from network metadata or a parsed name) and handle the case where no statistics are returned without indexing an empty list.
## Individual Comments
### Comment 1
<location path="pypsa_validation_processing/class_definitions.py" line_range="226-235" />
<code_context>
+ container_investment_years = []
</code_context>
<issue_to_address>
**issue:** Handle the case where no values are computed to avoid IndexError on `container_investment_years[0]`.
If the network collection is empty or all variable functions return `None`, `container_investment_years` will be empty and accessing index 0 will raise an `IndexError`. Please guard this access (e.g., return early or allow `ds_with_values` to be `None` as before) when the list is empty.
</issue_to_address>
### Comment 2
<location path="pypsa_validation_processing/class_definitions.py" line_range="194-198" />
<code_context>
df = df.rename(
columns={k: v for k, v in col_renaming_dict.items() if k in df.columns}
)
+ df["unit_pypsa"] = df["unit_pypsa"].map(UNITS_MAPPING)
# drop columns not needed
</code_context>
<issue_to_address>
**suggestion:** Mapping units without a fallback can introduce NaNs for unmapped units.
`Series.map` will set any unit not in `UNITS_MAPPING` to `NaN`, which can break `IamDataFrame` construction or cause units to be dropped. If you only want to normalize known units, consider:
```python
df["unit_pypsa"] = df["unit_pypsa"].map(UNITS_MAPPING).fillna(df["unit_pypsa"])
```
This keeps unknown units while still applying the mapping where defined.
```suggestion
df = df.rename(
columns={k: v for k, v in col_renaming_dict.items() if k in df.columns}
)
+ df["unit_pypsa"] = df["unit_pypsa"].map(UNITS_MAPPING).fillna(df["unit_pypsa"])
# drop columns not needed
```
</issue_to_address>
### Comment 3
<location path="tests/test_statistics_functions.py" line_range="18-27" />
<code_context>
+REQUIRED_COLUMNS = {"variable", "unit", "year", "value"}
+
+
+class _DummyNetworkCollection:
+ """Minimal stand-in for pypsa.NetworkCollection used in unit tests.
+
+ The current statistics functions are placeholders and do not call any
+ methods on the network collection, so an empty class is sufficient.
+ """
+
+
+@pytest.fixture
+def dummy_network_collection():
+ return _DummyNetworkCollection()
+
</code_context>
<issue_to_address>
**issue (testing):** The dummy network type used in tests does not match the real function signature and will cause attribute errors.
These functions now accept a real `pypsa.Network` and call `n.statistics.energy_balance(...)`, but `_DummyNetworkCollection` has no `statistics` attribute, so these tests will raise `AttributeError` before reaching any assertions. To reflect the real contract, either:
- provide a minimal fake `Network` with a `statistics.energy_balance(...)` method returning controlled data, or
- use a real `pypsa.Network` and monkeypatch `statistics.energy_balance`.
This ensures the tests cover the actual integration with the `statistics` API rather than accepting any object.
</issue_to_address>
### Comment 4
<location path="README.md" line_range="71" />
<code_context>
+ ...
+```
+
+**The returned `Series` is of the structure of the direct outcome of a `pypsa.statistics` - Function.** It therefore must have a multi-level index that includes a level named `"unit"` so that the post-processing step can extract the unit information. It is possible to return multiple values with different units. This is then forwarded and further processed as two indipendend rows of the pyam.IamDataFrame.
+
+### Mapping File
</code_context>
<issue_to_address>
**issue (typo):** Fix the typo "indipendend" → "independent".
In the last sentence, change "two indipendend rows" to "two independent rows".
```suggestion
**The returned `Series` is of the structure of the direct outcome of a `pypsa.statistics` - Function.** It therefore must have a multi-level index that includes a level named `"unit"` so that the post-processing step can extract the unit information. It is possible to return multiple values with different units. This is then forwarded and further processed as two independent rows of the pyam.IamDataFrame.
```
</issue_to_address>
### Comment 5
<location path="README.md" line_range="84-85" />
<code_context>
+
+At runtime, `Network_Processor` reads this mapping, looks up the function for each defined variable, and calls it for every network in the collection. Variables without a mapping entry are silently skipped.
+
+### Register a new variable-statistics
+To register a new variable
+- add an entry to the mapping file
+- implement the corresponding function
</code_context>
<issue_to_address>
**suggestion (typo):** Improve grammar in the section title and introductory line for registering a new variable.
The heading and intro sentence are grammatically off. Consider renaming the heading to something like “Register a new variable statistic” or “Register a new variable statistics function,” and updating the intro line to “To register a new variable, you need to:” or “To register a new variable:” before the bullet list.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
…ergy' of github.com:maxnutz/pypsa-validation_processing into 5-create-minimal-set-of-variable-functions-for-final-energy
…onding postprocessing steps; include general debugging steps
…ace] syntax and lock file v6 Co-authored-by: maxnutz <81740567+maxnutz@users.noreply.github.com> Agent-Logs-Url: https://github.com/maxnutz/pypsa_validation_processing/sessions/1df2d786-39b3-430a-ba4a-f5e31fa8ffc9
…ariable-functions-for-fin Fix CI: bump pixi to v0.66.0 to support `[workspace]` manifest and lock file v6
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Basis structure
Minimal set of Statistics-Functions
statistic_functions.pyand mapping to IAMC-formatted variable needs to be added toconfigs/mapping.default.yamlNote
statistics functions are executed for a single pypsa-Network, NOT for a NetworkCollection, as some statistics-parameters are not yet applicable to NetworkCollections.
Summary by Sourcery
Add initial final energy statistics functions and integrate them into the PyPSA-to-IAMC processing workflow, including basic configuration, utilities, and tests.
New Features:
Enhancements:
Tests: