Fix runoff in case runoff data are missed for some hydropower plants #757

Merged
merged 6 commits into from
Jul 3, 2023

Conversation

ekatef
Collaborator

@ekatef ekatef commented Jun 12, 2023

This PR fixes an issue that appeared in a workflow for India with GADM clustering on. The problem arises when there are hydropower plants on islands, namely the Kalpong hydropower plant for India, for which ERA5 can lack data:

image

Changes proposed in this Pull Request

Hydropower indices are matched with plant names.

Checklist

  • I consent to the release of this PR's code under the AGPLv3 license and non-code contributions under CC0-1.0 and CC-BY-4.0.
  • I tested my contribution locally and it seems to work fine.
  • Code and workflow changes are sufficiently documented.
  • Newly introduced dependencies are added to envs/environment.yaml and doc/requirements.txt.
  • Changes in configuration options are added in all of config.default.yaml and config.tutorial.yaml.
  • Add a test config or line additions to test/ (note tests are changing the config.tutorial.yaml)
  • Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
  • A note for the release notes doc/release_notes.rst is amended in the format of previous release notes, including reference to the requested PR.

@ekatef ekatef changed the title Fix runoff in case of projected power plants are present in data Fix runoff in case runoff data are missed for some hydropower plants Jun 12, 2023
@davide-f
Member

Hello @ekatef :)

This PR sounds straightforward to me, thanks!

Is it complete in your opinion?

Member

@davide-f davide-f left a comment

I have a comment to add: we should make sure the order is guaranteed when using plant and name.
When doing the assignment, the order may be lost, and hence the correspondence between a plant and its inflow may be lost.

@ekatef
Collaborator Author

ekatef commented Jun 14, 2023

Hello @davide-f and thanks for the review! :)
Agree on the comment and will look into it.

Basically, I don't have much to add to the PR, except that it would be nice to have successful CI runs 🙂 Hopefully, #761 will resolve this weird issue with the Windows CI.

@ekatef
Collaborator Author

ekatef commented Jun 19, 2023

As far as I understand, the correspondence between the intersection_plants and idxs_to_keep vectors is ensured by defining the latter by subsetting inflow_buses with a boolean pd.Series:

idxs_to_keep = inflow_buses[
inflow_buses.isin(intersection_plants)
].index

So the order of elements in intersection_plants and idxs_to_keep should match by definition, meaning that it's safe to use the first for subsetting and the second as the coordinates.
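A minimal sketch of the ordering argument, using made-up bus and plant labels rather than the real workflow data. It assumes that the plant names are taken from the same boolean subset of inflow_buses as the index; under that assumption each kept bus stays paired with its own plant, in the order of inflow_buses, regardless of how the runoff plants are ordered.

```python
import pandas as pd

# Hypothetical data: index = network bus labels, values = plant identifiers.
inflow_buses = pd.Series(
    ["plant_c", "plant_a", "plant_x", "plant_b"],
    index=["bus_0", "bus_1", "bus_2", "bus_3"],
)
# Plants with runoff data, deliberately listed in a different order.
plants_in_runoff = ["plant_a", "plant_b", "plant_c"]

# Boolean pd.Series aligned with inflow_buses.
mask = inflow_buses.isin(plants_in_runoff)
subset = inflow_buses[mask]

idxs_to_keep = subset.index
intersection_plants = subset.to_numpy()

# Both vectors come from the same subset, so they are paired element-wise.
pairs = list(zip(idxs_to_keep, intersection_plants))
print(pairs)
```

Note that the pairing follows the order of inflow_buses, not the order of plants_in_runoff; plant_x has no runoff data and is dropped together with bus_2.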

@davide-f could you please double-check whether this makes sense to you? :)

@ekatef
Collaborator Author

ekatef commented Jun 19, 2023

A couple of additional points that should be checked in the context of this PR.

  1. Would any additional warnings be helpful? Currently a warning is triggered when missing_plants is not empty, and the reported problem is perfectly consistent with the data issues this PR focuses on. However, technically the current issue is caused by the occurrence of intersection_plants. So the question is whether there is anything special about the encountered issue that would be worth reporting.

  2. There are some entries in inflow.indexes["plant"] which look like they were inherited from the contested territories issue ('not found.14_1', 'not found.3_1'); it would be nice to understand the particular reason.

@ekatef
Collaborator Author

ekatef commented Jun 20, 2023

Update on both points above:

  1. A warning about the presence of missing_plants is enough to properly handle this issue. If intersection_plants is non-empty, it only means that there are some plants with available runoff data. Some renaming could be beneficial to make this clearer.
  2. Names like not found* have propagated to the renewable profiles from regions_onshore.geojson. This problem is observed for IN but, for some reason, not for CN. I have added Occurence of "non-found" as shape_id in a regions_onshore shapefile for some countries #772 to track that.
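For illustration, here is a hedged sketch of how plants without runoff data could be separated and reported. The function name split_plants, the log message, and the sample labels are assumptions made for this example and are not taken from the repository.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def split_plants(inflow_buses: pd.Series, runoff_plants: pd.Index):
    """Return (plants_with_data, missing_plants) for hypothetical inputs."""
    has_data = inflow_buses.isin(runoff_plants)
    plants_with_data = inflow_buses[has_data]
    missing_plants = pd.Index(inflow_buses[~has_data].unique())
    if not missing_plants.empty:
        # A single warning is enough: it names every plant lacking runoff data.
        logger.warning(
            "No runoff data for %d hydro plant(s): %s",
            len(missing_plants),
            ", ".join(map(str, missing_plants)),
        )
    return plants_with_data, missing_plants


# Made-up example: one plant (e.g. on an island) has no runoff series.
inflow_buses = pd.Series(["p1", "island_plant", "p2"], index=["b1", "b2", "b3"])
runoff_plants = pd.Index(["p1", "p2", "not found.14_1"])
plants_with_data, missing_plants = split_plants(inflow_buses, runoff_plants)
```

In this sketch a non-empty plants_with_data carries no extra information worth a warning of its own; only missing_plants signals a data problem.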

@ekatef
Collaborator Author

ekatef commented Jun 20, 2023

@davide-f my feeling is that this PR is ready. Would be grateful if you could look into it :)

@ekatef
Collaborator Author

ekatef commented Jun 25, 2023

As a side note, the error can also be reproduced for CD with alternative clustering. There is no runoff data for the Ruzizi and the two Rutshuru power plants. I suspect the reason is that these plants are located on the very border of the DRC with Rwanda, as shown below, which probably leads to the catchment areas not being captured properly when integrating ERA5 by area.

image

idxs_to_keep = inflow_buses[
inflow_buses.isin(intersection_plants)
].index
idxs_to_keep = inflow_buses[inflow_buses.isin(plants_with_data)].index
Member

I think we need to improve the readability of the code because it is easy to get lost with indices.
Something that I realize now is that this code seems to be a duplication, and we may need to debug it together.

Initially, it is defined as the values of a subset of inflow_buses, whose index is idxs_to_keep.
I think that, to improve readability, we could define idxs_to_keep outside the if rather than inside it.
plants with data may be the view inflow_buses[inflow_buses.isin(inflow.indexes["plant"])], whose index is idxs_to_keep and whose values are the plants.
We may revise the code in that way so that it is easier to understand.

What do you think? Let me know if the above is clear
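The suggestion above could look roughly like the following sketch. The helper name plants_view and the sample labels are hypothetical, and a plain pd.Index stands in for inflow.indexes["plant"].

```python
import pandas as pd


def plants_view(inflow_buses: pd.Series, runoff_plants) -> pd.Series:
    """One view: index -> network buses to keep, values -> plant names."""
    return inflow_buses[inflow_buses.isin(runoff_plants)]


# Hypothetical inputs; runoff_plants stands in for inflow.indexes["plant"].
inflow_buses = pd.Series(["p2", "p9", "p1"], index=["b0", "b1", "b2"])
runoff_plants = pd.Index(["p1", "p2", "p3"], name="plant")

# The view is defined once, outside the `if`.
plants_with_data = plants_view(inflow_buses, runoff_plants)

if not plants_with_data.empty:
    # Index and values are read from the same object, so they cannot drift apart.
    idxs_to_keep = plants_with_data.index
    plants = plants_with_data.to_numpy()
```

Keeping a single view means a reader only has to track one object to know which bus belongs to which plant.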

Collaborator Author

@davide-f agree that this change looks very natural. However, I needed about an hour to understand it, so I'm absolutely aligned with your idea to make this fragment a bit more readable! :D

I have implemented a possible approach. Along with the revision of the idxs_to_keep definition, there were the following changes:

  1. move the PHS reading, phs = ppl.query('technology == "Pumped Storage"'), to a less crowded part;
  2. revise the names of the data frames ror and hydro to make it clear that both were derived from ppl and contain run-of-river and reservoir data, respectively;
  3. change inflow_idx to hydro_ppl_idx to avoid an association with inflow indices, which can be a bit misleading.

I'm not sure all these changes are really necessary. I'd suggest selecting only those which are really needed. Could you please check whether these changes really reduce the reading time? :)

@ekatef
Collaborator Author

ekatef commented Jul 3, 2023

Tested on "CD" and confirmed to resolve the problem which previously arose when alternative_clustering is on.

What I'm not sure about is the update of plants_to_keep: shouldn't it be included in this fragment along with the update of network_buses_to_keep?

                plants_with_data = inflow_buses[inflow_buses.isin(plants_to_keep)]
                network_buses_to_keep = plants_with_data.index

@davide-f
Member

davide-f commented Jul 3, 2023

Tested on "CD" and confirmed to resolve the problem which previously arose when alternative_clustering is on.

What I'm not sure about is the update of plants_to_keep: shouldn't it be included in this fragment along with the update of network_buses_to_keep?

                plants_with_data = inflow_buses[inflow_buses.isin(plants_to_keep)]
                network_buses_to_keep = plants_with_data.index

That's a good point: to avoid problems, you could add a line below to update the value of plants_to_keep.
Functionally, the current code doesn't need it, since that value is updated below anyway.
However, I'd be inclined to update the value to avoid potential problems in the future.
Good point :)

@ekatef
Collaborator Author

ekatef commented Jul 3, 2023

That's a good point: to avoid problems, you could add a line below to update the value of plants_to_keep. Functionally, the current code doesn't need it, since that value is updated below anyway. However, I'd be inclined to update the value to avoid potential problems in the future. Good point :)

Great! Thank you :) Then let's try it this way. To me, it also looks a bit more consistent to keep the updates of both network_buses_to_keep and plants_to_keep next to each other.
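As a rough sketch of why updating both names side by side matters: the plant names select the runoff series and the bus labels become the new column names, so the two must have the same length and order. A pandas DataFrame with made-up values is used here as a stand-in for the runoff dataset, which is an assumption of this example and not the data structure used in the workflow.

```python
import pandas as pd

# Hypothetical stand-in for the runoff data: rows = time steps, columns = plants.
inflow = pd.DataFrame(
    {"plant_a": [1.0, 2.0], "plant_b": [3.0, 4.0], "plant_c": [5.0, 6.0]},
    index=pd.date_range("2013-01-01", periods=2, freq="D"),
)
# index = network buses, values = plants; plant_missing has no runoff series.
inflow_buses = pd.Series(
    ["plant_c", "plant_missing", "plant_a"],
    index=["bus_0", "bus_1", "bus_2"],
)

plants_with_data = inflow_buses[inflow_buses.isin(inflow.columns)]

inflow_t = None
if not plants_with_data.empty:
    # Both updates sit next to each other and come from the same mapping.
    network_buses_to_keep = plants_with_data.index
    plants_to_keep = plants_with_data.to_numpy()

    # Select the series by plant name, then relabel the columns by network bus.
    inflow_t = inflow.loc[:, plants_to_keep].copy()
    inflow_t.columns = network_buses_to_keep
```

If plants_to_keep still held the full list including plant_missing, the selection would fail or the relabelling would have mismatched lengths, which is the kind of problem the side-by-side update guards against.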

Sorry for the mess in commits. Actually, I can squash them after the final revision ;)

if not intersection_plants.empty:
# if there are any plants for which runoff data are available
if not plants_with_data.empty:
network_buses_to_keep = plants_with_data.index
Member

Hello :)

I saw that you removed plants_to_keep = plants_with_data.to_numpy() here; maybe we misunderstood each other.
I understood your proposal was to add its definition here as well, without moving it from below.

Having the two definitions here makes it easy to see that the two values are of equal length and consistent: plants_with_data indeed represents a mapping, and with that it is clear.
Having the definitions spread across the previous 20 lines is unfortunately not so clear.

I see two options:

  1. keep it as it was before
  2. or have it both here and above

What do you think?

Collaborator Author

Hello :)

I saw that you removed plants_to_keep = plants_with_data.to_numpy() here; maybe we misunderstood each other. I understood your proposal was to add its definition here as well, without moving it from below.

Having the two definitions here makes it easy to see that the two values are of equal length and consistent: plants_with_data indeed represents a mapping, and with that it is clear. Having the definitions spread across the previous 20 lines is unfortunately not so clear.

I see two options:

  1. keep it as it was before
  2. or have it both here and above

What do you think?

I have tried to avoid duplicating the plants_to_keep definition. But I agree that it might not be so obvious to have a definition twenty lines above... :)

My personal preference is to have plants_to_keep in both fragments, but I'm also happy to keep the original version if you feel it is more consistent.

Member

@davide-f davide-f left a comment

@ekatef for me it's good to go :)
Let me know if you want to squash the last two commits or go as-is ;)

@ekatef
Collaborator Author

ekatef commented Jul 3, 2023

@ekatef for me it's good to go :) Let me know if you want to squash the last two commits or go as-is ;)

Super! Squash done ;)

@davide-f davide-f merged commit 7900bc7 into pypsa-meets-earth:main Jul 3, 2023
4 checks passed
@davide-f
Member

davide-f commented Jul 3, 2023

Merged :D

@ekatef
Collaborator Author

ekatef commented Jul 3, 2023

Merged :D

Thank you so much ;)

@ekatef ekatef deleted the fix_in_runoff branch November 14, 2023 22:19