Fix runoff in case runoff data are missed for some hydropower plants #757

ekatef · 2023-06-12T02:37:06Z

This PR fixes an issue that appeared in a workflow for India with GADM clustering on. The problem arises when some hydropower plants are located on islands (for India, the Kalpong hydropower plant), for which ERA5 can miss the data.

Changes proposed in this Pull Request

Hydropower indices are matched with plant names.
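The matching described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code of this PR: the function name `split_plants_by_runoff`, the bus and plant labels, and the layout of `inflow_buses` (bus id as index, plant name as value) are hypothetical.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def split_plants_by_runoff(inflow_buses: pd.Series, inflow_plants: pd.Index):
    """Split hydro plants into those with and without runoff data.

    inflow_buses: bus id (index) -> plant name (values); hypothetical layout.
    inflow_plants: plant names for which runoff time series exist.
    """
    has_data = inflow_buses.isin(inflow_plants)
    plants_with_data = inflow_buses[has_data]
    missing_plants = inflow_buses[~has_data]
    if not missing_plants.empty:
        # report the plants that will be skipped instead of failing later
        logger.warning(
            "Runoff data missing for %d hydro plant(s): %s",
            len(missing_plants),
            ", ".join(missing_plants.astype(str)),
        )
    return plants_with_data, missing_plants


# hypothetical example: one plant (e.g. on an island) has no runoff data
inflow_buses = pd.Series(
    ["plant_a", "plant_island", "plant_b"], index=["bus_1", "bus_2", "bus_3"]
)
inflow_plants = pd.Index(["plant_b", "plant_a"])
plants_with_data, missing_plants = split_plants_by_runoff(inflow_buses, inflow_plants)
```

Here the plant without runoff data is reported in a warning and left out, while the remaining plants keep their bus labels.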

Checklist

I consent to the release of this PR's code under the AGPLv3 license and non-code contributions under CC0-1.0 and CC-BY-4.0.
I tested my contribution locally and it seems to work fine.
Code and workflow changes are sufficiently documented.
Newly introduced dependencies are added to envs/environment.yaml and doc/requirements.txt.
Changes in configuration options are added in all of config.default.yaml and config.tutorial.yaml.
Add a test config or line additions to test/ (note tests are changing the config.tutorial.yaml)
Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
A note for the release notes doc/release_notes.rst is amended in the format of previous release notes, including reference to the requested PR.

davide-f · 2023-06-14T08:25:40Z

Hello @ekatef :)

This PR looks straightforward to me, thanks!

Is it complete in your opinion?

davide-f

I have a comment to add: we should make sure the order is preserved when using plant and name.
The order may be lost during the assignment, and with it the correspondence between a plant and its inflow.

ekatef · 2023-06-14T09:19:26Z

Hello @davide-f and thanks for the review! :)
Agree on the comment and will look into it.

Basically, I don't have much to add to the PR, except that it would be nice to have successful CI runs 🙂 Hopefully, #761 will resolve this weird issue with the Windows CI.

ekatef · 2023-06-19T21:42:38Z

As far as I understand, the correspondence between the intersection_plants and idxs_to_keep vectors is ensured by defining the latter through subsetting with a boolean pd.Series built from intersection_plants:

pypsa-earth/scripts/add_electricity.py

Lines 468 to 470 in 20e5061

           idxs_to_keep = inflow_buses[
               inflow_buses.isin(intersection_plants)
           ].index

So, the order of elements in intersection_plants and idxs_to_keep should match by definition, meaning that it's safe to use the first one for subsetting and the second one as the coordinates.

@davide-f could you please double-check whether this makes sense to you? :)
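For reference, a small sketch of how boolean subsetting with `isin` behaves in pandas. The labels are made up, and `intersection_plants` is deliberately given in a different order than `inflow_buses`, which may not happen in the actual script. The filtered index follows the order of `inflow_buses`, so the plant names are safest taken from the same filtered Series.

```python
import pandas as pd

# hypothetical mapping: bus id (index) -> plant name (values)
inflow_buses = pd.Series(["p3", "p1", "p2", "p4"], index=[10, 11, 12, 13])

# deliberately ordered differently from inflow_buses
intersection_plants = pd.Index(["p1", "p2", "p3"])

# boolean mask keeps the rows of inflow_buses in their original order
kept = inflow_buses[inflow_buses.isin(intersection_plants)]
idxs_to_keep = kept.index
plants_in_same_order = kept.to_numpy()
```

In this toy case the pairs (10, "p3"), (11, "p1"), (12, "p2") stay aligned because both vectors come from `kept`; pairing `idxs_to_keep` with `intersection_plants` directly would only be aligned if the latter already followed the order of `inflow_buses`.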

ekatef · 2023-06-19T22:09:01Z

A couple of additional points that should be checked in the context of this PR.

Would any additional warnings be helpful? Currently a warning is triggered when missing_plants is not empty, and the reported problem is perfectly consistent with the data issues this PR is focused on. However, technically the current issue is caused by the occurrence of intersection_plants. So, the question is whether there is anything special about the encountered issue that would be worth reporting.
There are some entries in inflow.indexes["plant"] which look as if they were inherited from the contested territories issue ('not found.14_1', 'not found.3_1'); it would be nice to understand the particular reason.

ekatef · 2023-06-20T13:58:35Z

Update on both points above:

A warning related to the presence of missing_plants is enough to properly process this issue. A non-empty intersection_plants only means that there are some plants with available runoff data. Some naming revision could help to make this clearer.
Names like not found* have propagated to the renewable profiles from regions_onshore.geojson. This problem is observed for IN but, for some reason, not for CN. I have added Occurence of "non-found" as shape_id in a regions_onshore shapefile for some countries #772 to track that.
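A quick way to spot such placeholder names could look like the sketch below. The two `not found.*` labels are the ones quoted in this thread; the other labels and the variable names are hypothetical, and a plain `pd.Index` stands in for `inflow.indexes["plant"]`.

```python
import pandas as pd

# stand-in for inflow.indexes["plant"]; "region_1" and "region_2" are invented labels
plant_labels = pd.Index(["region_1", "not found.14_1", "region_2", "not found.3_1"])

# boolean mask for labels that look like propagated placeholders
placeholder_mask = plant_labels.str.startswith("not found")
placeholders = plant_labels[placeholder_mask]
```

Listing `placeholders` for a given country may help to check whether the issue tracked in #772 affects it.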

ekatef · 2023-06-20T13:59:56Z

@davide-f my feeling is that this PR is ready. Would be grateful if you could look into it :)

ekatef · 2023-06-25T15:30:27Z

As a side note, the error can also be reproduced for CD with alternative clustering. There is no runoff data for the Ruzizi and the two Rutshuru power plants. I suspect the reason is that these plants are located on the very border of the DRC with Rwanda, as shown below, which probably leads to the catchment areas not being captured properly when integrating ERA5 by area.

davide-f · 2023-06-28T22:33:30Z

scripts/add_electricity.py

-                idxs_to_keep = inflow_buses[
-                    inflow_buses.isin(intersection_plants)
-                ].index
+                idxs_to_keep = inflow_buses[inflow_buses.isin(plants_with_data)].index


I think we need to improve the readability of the code because it is easy to get lost with the indices.
Something I realize now is that this code seems to be a duplication, and we may need to debug it together.

It is initially defined as the values of a subset of inflow_buses whose index is idxs_to_keep.
I think that, to improve readability, we could define idxs_to_keep outside the if rather than inside it.
plants_with_data may be the view inflow_buses[inflow_buses.isin(inflow.indexes["plant"])], whose index is idxs_to_keep and whose values are the plants.
We could revise the code that way so that it is easier to understand.

What do you think? Let me know if the above is clear
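The suggestion above might look roughly like this. It is a sketch under assumptions, not the merged code: a plain `pd.Index` named `inflow_plant_index` stands in for `inflow.indexes["plant"]`, and the bus ids and the name `plant_x` are invented (Ruzizi and Rutshuru are used only as examples of plants without runoff data, as reported earlier in the thread).

```python
import pandas as pd

# hypothetical mapping: bus id (index) -> plant name (values)
inflow_buses = pd.Series(["Ruzizi", "plant_x", "Rutshuru"], index=["7", "12", "30"])

# stand-in for inflow.indexes["plant"]: plants that do have runoff data
inflow_plant_index = pd.Index(["plant_x", "plant_y"])

# defined once, outside any conditional: index = buses, values = plants
plants_with_data = inflow_buses[inflow_buses.isin(inflow_plant_index)]

# both vectors are read from the same object, so they share length and order
idxs_to_keep = plants_with_data.index
plants_to_keep = plants_with_data.to_numpy()

if not plants_with_data.empty:
    n_kept = len(plants_with_data)
else:
    n_kept = 0
```

Reading both vectors from one filtered Series makes the bus-to-plant correspondence visible at a glance.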

ekatef · 2023-06-30

@davide-f I agree that this change looks very natural. However, I needed about an hour to understand it, so I am absolutely aligned with your idea to make this fragment a bit more readable! :D

I have implemented a possible approach. Along with the revised definition of idxs_to_keep, there are the following changes:

move the reading of PHS, phs = ppl.query('technology == "Pumped Storage"'), to a less crowded part;

revise the names of the data frames ror and hydro to make it clear that both are derived from ppl and contain run-of-river and reservoir data, respectively;

change inflow_idx to hydro_ppl_idx to avoid an association with inflow indices, which can be a bit misleading.

I'm not sure all these changes are really necessary, so I'd suggest selecting only those that are really needed. Could you please check whether these changes actually reduce the reading time? :)

ekatef · 2023-07-03T13:20:53Z

Tested on "CD": the change resolves the problem that previously arose when alternative_clustering is on.

What I'm not sure about is the update of plants_to_keep: shouldn't it be included in this fragment along with the update of network_buses_to_keep?

                plants_with_data = inflow_buses[inflow_buses.isin(plants_to_keep)]
                network_buses_to_keep = plants_with_data.index

davide-f · 2023-07-03T14:10:08Z

For the sake of avoiding problems, that's a good point: you could add a row below to update the value of plants_to_keep.
From the point of view of functionality, that's not needed for the current code, as that value is updated below anyway.
However, I'd be inclined to update the value to avoid potential problems in the future.
Good point :)
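To illustrate why it helps to keep the two updates next to each other, here is a minimal sketch. It assumes a pandas DataFrame named `inflow` as a stand-in for the runoff time series (the thread refers to `inflow.indexes["plant"]`, so the actual object is indexed differently), and all labels and values are invented.

```python
import pandas as pd

# stand-in for the runoff time series: one column per plant (invented values)
inflow = pd.DataFrame(
    {"plant_a": [1.0, 2.0], "plant_b": [3.0, 4.0], "plant_c": [5.0, 6.0]},
    index=pd.date_range("2013-01-01", periods=2, freq="D"),
)

# hypothetical mapping: bus id (index) -> plant name (values)
inflow_buses = pd.Series(
    ["plant_c", "plant_missing", "plant_a"], index=["bus_1", "bus_2", "bus_3"]
)

# both definitions side by side, derived from the same mapping
plants_with_data = inflow_buses[inflow_buses.isin(inflow.columns)]
network_buses_to_keep = plants_with_data.index
plants_to_keep = plants_with_data.to_numpy()

# select by plant name, then relabel by bus; safe because the order is shared
inflow_t = inflow[plants_to_keep].copy()
inflow_t.columns = network_buses_to_keep
```

If `plants_to_keep` were defined elsewhere and drifted out of sync with `network_buses_to_keep`, the relabelling step would silently attach runoff series to the wrong buses or fail on a length mismatch.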

ekatef · 2023-07-03T14:26:24Z

Great! Thank you :) Then let's try it this way. To me, it also looks a bit more consistent to keep the updates of network_buses_to_keep and plants_to_keep next to each other.

Sorry for the mess in commits. Actually, I can squash them after the final revision ;)

davide-f · 2023-07-03T16:29:10Z

scripts/add_electricity.py

-            if not intersection_plants.empty:
+            # if there are any plants for which runoff data are available
+            if not plants_with_data.empty:
+                network_buses_to_keep = plants_with_data.index


Hello :)

I saw that you removed plants_to_keep = plants_with_data.to_numpy() here; maybe we misunderstood each other.
I understood your proposal was to add its definition here as well, without moving it from below.

Having the two definitions here makes it easy to see that the two values are of equal length and consistent: plants_with_data indeed represents a mapping, and with that it is clear.
Having the definitions spread across the previous 20 lines is unfortunately not so clear.

I see two options:

keep it as it was before

or have it both here and above

What do you think?

ekatef · 2023-07-03

I tried to avoid duplicating the definition of plants_to_keep. But I agree that having the definition twenty lines above might not be so obvious... :)

My personal preference is to have plants_to_keep in both fragments, but I'm also happy to keep the original version if you find it more consistent.

davide-f

@ekatef for me it's good to go :)
Let me know if you want to squash the last two commits or go as-is ;)

ekatef · 2023-07-03T21:20:25Z

Super! Squash done ;)

davide-f · 2023-07-03T21:28:02Z

Merged :D

ekatef · 2023-07-03T21:28:57Z

Thank you so much ;)

Fix for IN with gadm clustering

9275f93

ekatef changed the title ~~Fix runoff in case of projected power plants are present in data~~ Fix runoff in case runoff data are missed for some hydropower plants Jun 12, 2023

Fix names assignment for a case when nothing is missed

1276668

davide-f approved these changes Jun 14, 2023

davide-f reviewed Jun 14, 2023

Merge branch 'main' into fix_in_runoff

16d9d55

Trying to improve naming

30e1695

davide-f reviewed Jun 28, 2023

ekatef force-pushed the fix_in_runoff branch from 48c0fd1 to 30e1695 Compare July 3, 2023 11:12

Revise definitions of the indices

2324a73

davide-f reviewed Jul 3, 2023

davide-f approved these changes Jul 3, 2023

Add definition of plants_to_keep

0a41411

ekatef force-pushed the fix_in_runoff branch from ecf9dce to 0a41411 Compare July 3, 2023 21:18

davide-f merged commit 7900bc7 into pypsa-meets-earth:main Jul 3, 2023
4 checks passed

ekatef deleted the fix_in_runoff branch November 14, 2023 22:19
