Fix geometry issues - clean version #563

ekatef · 2023-01-10T22:43:22Z

Changes proposed in this Pull Request

This is a cleaned-up version of #532

Checklist

I tested my contribution locally and it seems to work fine.
Code and workflow changes are sufficiently documented.
Newly introduced dependencies are added to envs/environment.yaml and envs/environment.docs.yaml.
Changes in configuration options are added in all of config.default.yaml and config.tutorial.yaml.
Add a test config or line additions to test/ (note tests are changing the config.tutorial.yaml)
Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
A note for the release notes doc/release_notes.rst is amended in the format of previous release notes, including reference to the requested PR.


ekatef · 2023-01-10T22:44:01Z

A test on ["PK", "KG"] works


ekatef · 2023-01-12T21:07:04Z

Tests on both ["PK"] and ["PK", "KG"] were successful. I think it's ready for review now.
A couple of questions remain.

Do we really need to write the non-standard regions to a file? Currently this is done by writing a CSV without a proper Snakemake output declaration:

https://github.com/ekatef/pypsa-earth/blob/53c2c04b7555ad054d66742bc2375f7d1d956a19/scripts/build_shapes.py#L114-L121

If we intend to keep this output, it may be better to declare a proper Snakemake output. But I'm not sure it's really necessary, as such situations seem to be quite rare.

When replacing apply with where, it turned out that the name-transforming functions may need to be vectorised:

75a581c

Here, changing how country_name_2_two_digits is applied required adding any to the function definition to make it work. Obviously, a more thorough revision of the country_name_2_two_digits code is needed to make it work properly both on a single string and on a list of strings. But again, I'm not sure if it's worth it :)

@davide-f, would be grateful for your opinion on this and the review

davide-f

I've added some comments. I fear there has been some misunderstanding; I hope not.

README.md

davide-f · 2023-01-12T21:31:44Z

scripts/_helpers.py

@@ -551,7 +551,7 @@ def country_name_2_two_digits(country_name):
        2-digit country name
    """
    if (
-        country_name
+        country_name.any()


This is not the intended behavior. It should be removed.

Okay! So your answer to my second question (attached to the review notification) is "no". Agreed, the where modification doesn't look great. Commit reverted.
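For reference, a minimal sketch of one way to accept both a single string and a list of strings without putting any() into the scalar condition. The lookup table and both function names are hypothetical stand-ins, not the actual code of country_name_2_two_digits:

```python
from typing import Iterable, Union

# Hypothetical lookup table used only for illustration;
# the real helper relies on a proper country-name converter.
_NAME_TO_ISO2 = {"Pakistan": "PK", "Kyrgyzstan": "KG"}


def name_to_iso2(country_name: str) -> str:
    """Convert one country name to a two-letter code (scalar input only)."""
    return _NAME_TO_ISO2.get(country_name, "not found")


def names_to_iso2(country_names: Union[str, Iterable[str]]):
    """Accept one name or an iterable of names, leaving the scalar helper untouched."""
    if isinstance(country_names, str):
        return name_to_iso2(country_names)
    return [name_to_iso2(name) for name in country_names]
```

The scalar function keeps its plain truthiness checks, and the wrapper is the only place that needs to know about collections.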

scripts/build_bus_regions.py

davide-f · 2023-01-12T21:36:36Z

scripts/build_osm_network.py

-        no_data_countries = set(country_list).difference(set(bus_country_list))
+    # it may happen that bus_country_list contains entries not relevant as a country name (e.g. "not found")
+    # difference can't give negative values; the following will return only releant country names
+    no_data_countries = set(country_list).difference(set(bus_country_list))


may be worth using symmetric_difference here

Because we can't handle two difference cases as symmetrical:

set(country_list).difference(set(bus_country_list)) = countries from the countries parameter of the config which don't have any data to restore a buses dataframe [meaning that we need to generate some data for such areas]

set(bus_country_list).difference(set(country_list)) = countries from the buses dataframe which are not in the countries from the config and hence were not requested by the user to be included into the model [which basically means that something went wrong in the workflow before]

Initially the following code chunk was intended to address the first situation, but the condition len(bus_country_list) != len(country_list) captured both of them (see

pypsa-earth/scripts/build_osm_network.py

Line 833 in fa799bf

if len(bus_country_list) != len(country_list):

). That led to trouble when a non-standard code appeared in the buses list. So I had to introduce a fix precisely to avoid mixing the two situations. (Sorry for not being clear enough when explaining it in our discussion!)

My proposal would be to use:
difference_countries = set(country_list).symmetric_difference(set(bus_country_list))
difference_countries should be empty if the two sets match; otherwise it contains the items that are missing from one set or the other.

If country_list is ["AG"] and bus_country_list is ["AG", "Something"], the current revision doesn't catch the difference, while the previous one using len does.

Agreed, if the case you mention arises, it should be fixed.

But the approach should be different. Would it be a good idea to add a check on set(bus_country_list).difference(set(country_list)) and raise an error if it is non-empty? That would mean that our attempt to fix it in build_shapes has failed.
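The two directions discussed in this thread can be sketched as follows. This is only an illustration of the proposal, not the code in build_osm_network.py; the function name check_bus_countries and the error message are assumptions:

```python
def check_bus_countries(country_list, bus_country_list):
    """Return configured countries without bus data; fail on unrequested codes."""
    requested = set(country_list)
    found = set(bus_country_list)

    # Codes present in the buses but never requested in the config:
    # this would mean an earlier fix in the workflow has failed.
    unexpected = found.difference(requested)
    if unexpected:
        raise ValueError(
            f"Buses contain codes not requested in the config: {sorted(unexpected)}"
        )

    # Requested countries with no bus data: data must be generated for them.
    return requested.difference(found)
```

With country_list = ["AG"] and bus_country_list = ["AG", "Something"], the first check raises, while a plain symmetric_difference would return {"Something"} without saying which side the item came from.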

davide-f · 2023-01-12T21:48:20Z

scripts/build_shapes.py

+        return row["GID_0"]
+
+
+def build_gadm_df(file, layer, cc):


Maybe there has been a misunderstanding; I was proposing to move to a separate function just the filtering of the GID_0 component.

Basically, keep get_GADM_layer as it was and only add a line or two like these:

    if {check config option drop}:
        geodf.drop(list of indices not matching, axis=0, inplace=True)
    elif {check config option set to country}:
        geodf["COUNTRY"] = country_code

If more lines are needed, we may define a function that contains the rows above and more (e.g. the output of the non-standard zones). By default, the output file shall not be saved:

def filter_gadm_flag(geodf, config, save_non_standard_geo=False): .... stuff

Mmmm... Agreed that the code could be better structured, but I don't quite get your idea.

Besides, I'm not sure that drop is a good default option. Apart from ethical concerns, it would require some additional changes in the code. (I'll attach a picture to the main PR conversation to explain the issue.)

So I suggest that this PR focus on following GADM conventions, with introducing custom_prescribe and (probably) a drop option as the next step.

I also think that shapes shouldn't be dropped by default; it was just a simple proposal to provide two options.
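A rough sketch of the two options discussed above, assuming plain dicts stand in for GeoDataFrame rows. The function name, the mode values and the "ZZZ" code are hypothetical placeholders, and the default keeps all shapes, as agreed in this thread:

```python
def filter_gadm_rows(rows, gid_code, country_code, mode="set_country"):
    """Handle rows whose GID_0 differs from the expected GADM code.

    rows: list of dicts with a "GID_0" key (stand-ins for GeoDataFrame rows).
    mode: "set_country" keeps every row and labels it with country_code;
          "drop" removes rows whose GID_0 does not match gid_code.
    """
    if mode == "set_country":
        return [{**row, "COUNTRY": country_code} for row in rows]
    if mode == "drop":
        return [dict(row) for row in rows if row["GID_0"] == gid_code]
    raise ValueError(f"Unknown mode: {mode!r}")


# "ZZZ" is a placeholder for a non-standard GID_0 value.
rows = [{"GID_0": "PAK"}, {"GID_0": "ZZZ"}]
kept = filter_gadm_rows(rows, "PAK", "PK")
dropped = filter_gadm_rows(rows, "PAK", "PK", mode="drop")
```

Here kept still has both rows, each labelled "PK", while dropped contains only the "PAK" row.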

davide-f · 2023-01-12T21:51:00Z

scripts/build_shapes.py

@@ -267,8 +298,10 @@ def eez(countries, geo_crs, country_shapes, EEZ_gpkg, out_logging=False, distanc
    )

    ret_df = ret_df.apply(lambda x: make_valid(x))
-    country_shapes = country_shapes.apply(lambda x: make_valid(x))
+    # country_shapes may consist of different geometries which need to be united
+    country_shapes = country_shapes.apply(lambda x: make_valid(x)).unary_union


why unary_union?

To avoid having multiple offshore shapes, which happens if the onshore dataframe for a country contains multiple geometries.

Each non-standard GADM code leads to an additional geometry entry in the countries_shape. For some reason this results in duplication of the offshore shapes for this country when calculating the difference in ret_df.difference(country_shapes_with_buffer).

My question was because unary_union merges the shapes of all countries, and I was wondering whether that (a) introduces additional computational time that may not be required and (b) alters the results, as the EEZ of a single country would be compared to the shape merged by unary_union.
Have you tested it with multiple countries and checked that the output shapes are ok?

Yes, I have tested it and it looks ok. But I'd be happy to better understand the underlying behaviour of difference and unary_union. It looks like there is some implicit grouping by country name and looping inside it. Regarding performance, I would be very interested in your opinion; I haven't (yet) considered this aspect properly.
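A toy illustration with plain shapely geometries of why subtracting onshore pieces one by one differs from subtracting their union. The boxes are made-up shapes; whether the duplication seen in build_shapes.py arises in exactly this way is an assumption, and the sketch says nothing about the performance or multi-country questions raised above:

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Made-up shapes: one EEZ-like square and two onshore pieces of one country.
eez = box(0, 0, 10, 10)
onshore_parts = [box(0, 0, 4, 4), box(6, 6, 10, 10)]

# Subtracting each piece separately yields one offshore shape per piece,
# and each of them still covers the other onshore piece.
per_part = [eez.difference(part) for part in onshore_parts]

# Subtracting the union yields a single offshore shape with all land removed.
merged = eez.difference(unary_union(onshore_parts))
```

In this toy case per_part holds two overlapping shapes, while merged is a single shape that excludes both onshore pieces.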


ekatef · 2023-01-14T18:16:56Z

I've added some comments. I fear there has been some misunderstanding; I hope not.

@davide-f, thanks a lot for the review. I'd say it's an iterative work process. Which seems to be converging :)

Regarding the drop option: the point is that some of the contested areas contain substations and/or power plants:

Green points are buses; red points are power plants extracted from PPM.

So it looks like dropping the non-standard areas may require further modifications along the workflow: e.g. we may need to fix generators which belong to the requested area but can't be located there.


davide-f · 2023-01-21T16:19:55Z

Thanks @ekatef ! :D
I think it's a good base, and I can finalize the details starting from here.
I can take over for the small changes, but first would you mind squashing to reduce the history?
Once that is done, I'd like to add a few commits, then we comment on the PR and finalize.


ekatef · 2023-01-21T21:18:37Z

Thank you for your guidance @davide-f :)
Have tried to implement your suggestions

The result of squashing is in #570. It was produced with git merge --squash country_geom_fixes_clean. I'm not sure if having a new branch and opening a new PR for that is the best approach, but I wanted to avoid using reset to guarantee that the history is kept. What I was not able to do is get rid of the README modifications :( Again, some git spells could probably be used to modify the history, but I'm not sure it's really a good idea.

Could you please have a look?

davide-f · 2023-01-22T23:38:46Z

Can we close this PR?

ekatef · 2023-01-23T09:56:20Z

@davide-f, yes absolutely! :)

ekatef · 2023-01-23T09:56:50Z

Closed to be finalized in #572

github-actions bot and others added 13 commits January 8, 2023 21:03

docs(contributor): contrib-readme-action has updated readme

8519d6f

Merge branch 'main' of https://github.com/ekatef/pypsa-earth

f314100

Add a dependency

89e6ecc

Unify the outline

05afc81

Fix a no_data condition

1c53fab

Add an import

17627fe

Put a GID checking into a function

ecf6f3b

Wrap building of a gadm dataframe into a function

dec9682

Apply fixing functions to gadm building

770147d

Unite shapes for a country

bd0a3e0

Add a geometry repair

219877a

Add geometry repair

27b57d7

[pre-commit.ci] auto fixes from pre-commit.com hooks

446a6ac

for more information, see https://pre-commit.ci

ekatef and others added 7 commits January 11, 2023 23:48

Fix formatting

b9d78b1

Add TODO

87b2d70

Merge remote-tracking branch 'origin/country_geom_fixes_clean' into c…

e70c635

…ountry_geom_fixes_clean

Replace apply with where

75a581c

[pre-commit.ci] auto fixes from pre-commit.com hooks

d2b1350

for more information, see https://pre-commit.ci

Amend a release note

9d1672c

Merge remote-tracking branch 'origin/country_geom_fixes_clean' into c…

53c2c04

…ountry_geom_fixes_clean

ekatef marked this pull request as ready for review January 12, 2023 21:07

davide-f reviewed Jan 12, 2023

View reviewed changes

ekatef added 4 commits January 14, 2023 18:50

Revert "Replace apply with where"

2424c36

This reverts commit 75a581c.

Update a docstring

4a2227b

Revise path

838630a

Fix technical error

Use list instead of set

2c01d3e

Add TODOs

b2cffde

ekatef and others added 14 commits January 20, 2023 02:19

Rename refactoring

501c1cd

Fix order of arguments

2c125e2

Fix typo

8a375c9

Fix typo

f8f0c97

Merge remote-tracking branch 'origin/country_geom_fixes_clean' into c…

38b9151

…ountry_geom_fixes_clean

Fix typos

d585246

Put back the comment

0a4d10f

[pre-commit.ci] auto fixes from pre-commit.com hooks

1065e20

for more information, see https://pre-commit.ci

Improve a comment

44ab3e6

Merge remote-tracking branch 'origin/country_geom_fixes_clean' into c…

3744508

…ountry_geom_fixes_clean

Replace a debug value with a normal one

001452f

Simplify subsetting

66d4fec

Fix defaults

7cb2812

[pre-commit.ci] auto fixes from pre-commit.com hooks

dd7b68d

for more information, see https://pre-commit.ci

ekatef and others added 7 commits January 21, 2023 23:12

Revise the conditions order

9adcf5a

Revise warnings

d642628

Add a TODO

3519b0d

Merge remote-tracking branch 'origin/country_geom_fixes_clean' into c…

6b28f35

…ountry_geom_fixes_clean

[pre-commit.ci] auto fixes from pre-commit.com hooks

b938590

for more information, see https://pre-commit.ci

Merge branch 'main' of https://github.com/pypsa-meets-earth/pypsa-earth

699901a

Merge branch 'main' into country_geom_fixes_clean

5030858

ekatef mentioned this pull request Jan 21, 2023

Fix geometry issues - squash #570

Closed

7 tasks

davide-f mentioned this pull request Jan 22, 2023

Finalize clean_build_shapes #572

Merged

7 tasks

ekatef closed this Jan 23, 2023

ekatef deleted the country_geom_fixes_clean branch November 14, 2023 22:18
