Fix geometry issues - clean version #563
for more information, see https://pre-commit.ci
A test on ["PK", "KG"] works.
Tests on both ["PK"] and ["PK", "KG"] were successful. I think it's ready for review now.
If we intend to keep this output, it may be better to add a proper snakemake output. But I'm not sure it's really necessary, as such situations seem to be quite rare.
Here are the changes in the way the fixes are applied. @davide-f, I would be grateful for your opinion on this and for a review.
I've added some comments; I fear there has been some misunderstanding, hopefully not.
scripts/_helpers.py
@@ -551,7 +551,7 @@ def country_name_2_two_digits(country_name):
2-digit country name
"""
if (
-country_name
+country_name.any()
This is not the intended behavior. It should be removed.
Okay! So your answer to my second question (attached to the review notification) is "no". Agreed, the "where" modification doesn't look great. Commit reverted.
scripts/build_osm_network.py
no_data_countries = set(country_list).difference(set(bus_country_list)) | ||
# it may happen that bus_country_list contains entries not relevant as a country name (e.g. "not found") | ||
# difference can't give negative values; the following will return only releant country names | ||
no_data_countries = set(country_list).difference(set(bus_country_list)) |
It may be worth using symmetric_difference here.
Because we can't handle the two difference cases symmetrically:
- set(country_list).difference(set(bus_country_list)) = countries from the countries parameter of the config which don't have any data to restore a buses dataframe [meaning that we need to generate some data for such areas];
- set(bus_country_list).difference(set(country_list)) = countries from the buses dataframe which are not in the countries parameter of the config and hence were not requested by the user to be included into the model [which basically means that something went wrong earlier in the workflow].
Initially, the following code chunk was intended to address the first situation, but its condition, len(bus_country_list) != len(country_list), captured both of them. See pypsa-earth/scripts/build_osm_network.py, line 833 in fa799bf:
if len(bus_country_list) != len(country_list):
My proposal would be to use:
difference_countries = set(country_list).symmetric_difference(set(bus_country_list))
difference_countries should be empty if the two sets match; otherwise, difference_countries contains the items that are missing from one set or the other.
If country_list is ["AG"] and bus_country_list is ["AG", "Something"], the current revision doesn't catch the difference, while the previous one using len does.
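A quick way to compare the three candidate conditions side by side on the example above, plus one extra case (made up for illustration) where both lists have the same length but different contents. The helper name mismatch_checks is hypothetical and not part of pypsa-earth.

```python
def mismatch_checks(country_list, bus_country_list):
    """Evaluate the three candidate conditions discussed in this thread.

    Hypothetical helper for illustration only.
    """
    requested, found = set(country_list), set(bus_country_list)
    return {
        # previous revision: compares list lengths only
        "len": len(bus_country_list) != len(country_list),
        # current revision: requested countries missing from the buses
        "difference": bool(requested.difference(found)),
        # proposal: mismatches in either direction
        "symmetric_difference": bool(requested.symmetric_difference(found)),
    }


# Extra, non-requested entry in the bus list
print(mismatch_checks(["AG"], ["AG", "Something"]))
# -> {'len': True, 'difference': False, 'symmetric_difference': True}

# Same length, different contents: the length check misses it
print(mismatch_checks(["AG", "KG"], ["AG", "XX"]))
# -> {'len': False, 'difference': True, 'symmetric_difference': True}
```

symmetric_difference flags both situations, but on its own it does not tell which of the two cases occurred.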
Agreed, if the case you mention arises, it should be fixed. But the approach should be different. Would it be a good idea to have an additional check on set(bus_country_list).difference(set(country_list)) and throw an error if it happens? Because that would mean that our attempt to fix it in build_shapes has failed.
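A minimal sketch of that proposal, keeping the two cases separate: countries in the buses that were never requested raise an error, while requested countries without bus data are returned so that data can be generated for them. The function name check_bus_countries and the error message are assumptions, not existing pypsa-earth code.

```python
def check_bus_countries(country_list, bus_country_list):
    """Hypothetical check for build_osm_network (illustration only).

    Raises if the buses contain countries the user did not request,
    which would mean the upstream fix in build_shapes has failed.
    Returns the requested countries that have no bus data at all.
    """
    requested = set(country_list)
    found = set(bus_country_list)

    unexpected_countries = found.difference(requested)
    if unexpected_countries:
        raise ValueError(
            "Buses contain countries that were not requested: "
            f"{sorted(unexpected_countries)}"
        )

    # countries for which some data has to be generated
    return requested.difference(found)


no_data_countries = check_bus_countries(["PK", "KG"], ["PK"])  # {"KG"}
```

With this sketch, placeholder entries such as "not found" (mentioned in the code comment in build_osm_network.py) would also trigger the error, so they may need to be filtered out before the check.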
scripts/build_shapes.py
return row["GID_0"] | ||
|
||
|
||
def build_gadm_df(file, layer, cc): |
Maybe there has been a misunderstanding; I was proposing to move just the filtering of the GID_0 component to a separate function.
Basically, keep get_GADM_layer as it was and only add a line or two there, like these:
if {check config option drop}:
    geodf.drop(list of indices not matching, axis=0, inplace=True)
elif {check config option set to country}:
    geodf["COUNTRY"] = country_code
If more lines are needed, we may define a function that contains the rows above and more (e.g. the output of the non-standard zones). By default, the output file shall not be saved.
def filter_gadm_flag(geodf, config, save_non_standard_geo=False):
    .... stuff
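One way the filtering above could look as a standalone function, sketched under several assumptions: the option is passed as a plain mode string instead of being read from the config, the values "drop" and "set_by_country" are placeholder names, country_code is assumed to use the same format as GID_0, and a pandas DataFrame stands in for the GeoDataFrame (GeoDataFrame subclasses DataFrame, so the same indexing applies). Saving the non-standard zones is left out.

```python
import pandas as pd


def filter_gadm_flag(geodf, country_code, mode="set_by_country"):
    """Handle GADM rows whose GID_0 differs from the requested country.

    Hypothetical sketch: `mode` stands in for a config option.
    """
    non_standard = geodf["GID_0"] != country_code
    if mode == "drop":
        # discard the non-standard zones
        return geodf.loc[~non_standard].copy()
    if mode == "set_by_country":
        # keep all geometries but attribute them to the requested country
        out = geodf.copy()
        out["COUNTRY"] = country_code
        return out
    raise ValueError(f"Unknown mode: {mode!r}")


# Toy GADM-like table with one non-standard GID_0 entry (made-up codes)
gadm = pd.DataFrame({"GID_0": ["PAK", "XYZ", "PAK"], "GID_1": ["a", "b", "c"]})
print(filter_gadm_flag(gadm, "PAK", mode="drop"))
print(filter_gadm_flag(gadm, "PAK"))
```

Returning a copy instead of dropping in place keeps the input table unchanged, which may make the two modes easier to compare.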
Mmmm... I agree that the code could be better structured, but I don't quite get your idea.
Besides, I'm not sure that drop is a good default option. Apart from ethical concerns, it would require some additional changes in the code (I'll attach a picture to the main PR conversation to explain what the matter is).
So I suggest focusing in this PR on following GADM conventions, and introducing the custom_prescribe and (probably) drop options as the next step.
I also think that by default shapes shouldn't be dropped; it was just a simple proposal to provide two options.
scripts/build_shapes.py
@@ -267,8 +298,10 @@ def eez(countries, geo_crs, country_shapes, EEZ_gpkg, out_logging=False, distanc
)

ret_df = ret_df.apply(lambda x: make_valid(x))
-country_shapes = country_shapes.apply(lambda x: make_valid(x))
+# country_shapes may consist of different geometries which need to be united
+country_shapes = country_shapes.apply(lambda x: make_valid(x)).unary_union
Why unary_union?
To avoid having multiple offshore shapes, which is the case if the onshore dataframe for a country contains multiple geometries.
Each non-standard GADM code leads to an additional geometry entry in country_shapes. For some reason, this results in duplication of the offshore shapes for this country when calculating the difference in ret_df.difference(country_shapes_with_buffer).
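A likely explanation, sketched with toy boxes rather than real shapes and assuming both series are indexed by country code: when both operands of GeoSeries.difference are GeoSeries, geopandas aligns them on their index before the element-wise operation, so a country label that appears twice on the onshore side is paired with the single offshore row twice. Differencing against one merged geometry skips the alignment. Requires geopandas and shapely.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy data (assumption): "PK" has two onshore rows, e.g. one coming from
# a non-standard GADM code, and a single offshore row.
offshore = gpd.GeoSeries([box(0, 0, 10, 10)], index=["PK"])
onshore = gpd.GeoSeries([box(0, 0, 4, 10), box(4, 0, 6, 10)], index=["PK", "PK"])

# GeoSeries vs GeoSeries: rows are aligned on the index first, so the single
# offshore "PK" row is paired with each onshore "PK" row -> two results.
per_row = offshore.difference(onshore)

# GeoSeries vs one geometry: no alignment, one result per offshore row.
merged = offshore.difference(onshore.unary_union)

print(len(per_row), len(merged))  # -> 2 1
```

Recent geopandas releases deprecate unary_union in favour of union_all(); the behaviour shown here is the same with either.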
My question was because the unary_union merges all shapes of all countries, and I was wondering whether that (a) introduces additional computational time that may not be required and (b) alters the results, as a single EEZ of one country would be compared to a shape merged by the unary_union.
Have you tested it with multiple countries and checked that the output shapes are OK?
Yes, I have tested it and it looks OK. But I'd be happy to understand the underlying behaviour of difference and unary_union better. It looks like there is some implicit grouping by country name and looping inside it. Regarding performance, I would be very interested in your opinion; I haven't (yet) considered this aspect properly.
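The "implicit grouping by country name" is most likely the index alignment geopandas performs when both operands of a binary operation are GeoSeries. If merging all countries into a single shape turns out to be a concern for (a) or (b), one possible alternative is to merge geometries per country label only, so that alignment becomes one-to-one. This is a sketch under assumptions: union_per_country is a hypothetical helper, the index is assumed to hold country codes, and the boxes are toy data.

```python
import geopandas as gpd
from shapely.geometry import box
from shapely.ops import unary_union


def union_per_country(shapes):
    """Merge all geometries sharing an index label into one row per label.

    Hypothetical helper (not pypsa-earth code); assumes the index holds
    country codes, possibly with duplicates.
    """
    codes = list(shapes.index.unique())
    merged = [unary_union(list(shapes.loc[[code]])) for code in codes]
    return gpd.GeoSeries(merged, index=codes, crs=shapes.crs)


# Toy data (assumption): "PK" has two onshore rows, "KG" has one
onshore = gpd.GeoSeries(
    [box(0, 0, 4, 10), box(4, 0, 6, 10), box(20, 0, 25, 10)],
    index=["PK", "PK", "KG"],
)
offshore = gpd.GeoSeries([box(0, 0, 10, 10), box(20, 0, 30, 10)], index=["PK", "KG"])

# One merged onshore shape per country: each offshore shape is compared
# only with its own country's land.
result = offshore.difference(union_per_country(onshore))
print(result.area)
```

Unlike a global unary_union, this does not clip an EEZ by a neighbouring country's land; whether that matters depends on the data, so it would need checking on real shapes before preferring either approach.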
This reverts commit 75a581c.
Fix technical error
@davide-f, thanks a lot for the review. I'd say it's an iterative work process, which seems to be converging :) Regarding the picture: points in green are buses, points in red are power plants extracted from PPM. So it looks like, when dropping the non-standard areas, we may need further modifications along the workflow: e.g. we may need to fix generators which belong to the requested area but can't be located there.
…ountry_geom_fixes_clean
Thanks @ekatef! :D
Thank you for your guidance, @davide-f :) The result of squashing is in #570. That is the result of Could you please have a look?
Can we close this PR?
@davide-f, yes absolutely! :) |
Closed, to be finalized in #572.
Changes proposed in this Pull Request
This is a cleaned-up version of #532.
Checklist
- envs/environment.yaml and envs/environment.docs.yaml.
- config.default.yaml and config.tutorial.yaml.
- test/ (note tests are changing the config.tutorial.yaml)
- doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
- doc/release_notes.rst is amended in the format of previous release notes, including reference to the requested PR.