Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix geometry issues - clean version #563

Closed
wants to merge 77 commits into from

Conversation

ekatef
Copy link
Collaborator

@ekatef ekatef commented Jan 10, 2023

Changes proposed in this Pull Request

That is a cleaned version of #532

Checklist

  • I tested my contribution locally and it seems to work fine.
  • Code and workflow changes are sufficiently documented.
  • Newly introduced dependencies are added to envs/environment.yaml and envs/environment.docs.yaml.
  • Changes in configuration options are added in all of config.default.yaml and config.tutorial.yaml.
  • Add a test config or line additions to test/ (note tests are changing the config.tutorial.yaml)
  • Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
  • A note for the release notes doc/release_notes.rst is amended in the format of previous release notes, including reference to the requested PR.

@ekatef
Copy link
Collaborator Author

ekatef commented Jan 10, 2023

A test on ["PK", "KG"] works

@ekatef
Copy link
Collaborator Author

ekatef commented Jan 12, 2023

Tests both on ["PK"] and ["PK", "KG"] was successful. I think it's ready for review now.
There are a couple of questions left.

  1. do we really need to keep output of the non-standard regions to a file? Currently it's done with writing a csv without proper Snakemake declarations:

https://github.com/ekatef/pypsa-earth/blob/53c2c04b7555ad054d66742bc2375f7d1d956a19/scripts/build_shapes.py#L114-L121

If we are intended to keep this output, it may be better to add a proper snakemake output. But I'm not sure if it's really necessary as it looks like such situations are quite rare.

  1. when implementing where instead of apply it was found that to do so there may be a need to vectorise the name-transforming functions:

75a581c

Here changes in a way of applying country_name_2_two_digits required to add any to the function definition to make it work. Obviously, more throughout revision of country_name_2_two_digits code is needed to make if work properly both on a single string and on a list of strings. But again, I'm not sure if it's worth it :)

@davide-f, would be grateful for your opinion on this and the review

@ekatef ekatef marked this pull request as ready for review January 12, 2023 21:07
Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some comments, I fear there have been some misunderstanding; hope not

README.md Show resolved Hide resolved
@@ -551,7 +551,7 @@ def country_name_2_two_digits(country_name):
2-digit country name
"""
if (
country_name
country_name.any()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no the intended behavior. It should be removed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay! So, your answer to my second questions (attached to the review notification) is "no". Agree, the where modification doesn't look great. Commit reverted

scripts/build_bus_regions.py Show resolved Hide resolved
no_data_countries = set(country_list).difference(set(bus_country_list))
# it may happen that bus_country_list contains entries not relevant as a country name (e.g. "not found")
# difference can't give negative values; the following will return only releant country names
no_data_countries = set(country_list).difference(set(bus_country_list))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be worth using symmetric_difference here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we can't handle two difference cases as symmetrical:

  1. set(country_list).difference(set(bus_country_list)) = countries from the countries parameter of the config which don't have any data to restore a buses dataframe [meaning that we need to generate some data for such areas]

  2. set(bus_country_list).difference(set(country_list)) = countries from the buses dataframe which are not in the countries from the config and hence were not requested by the user to be included into the model [which basically means that something went wrong in the workflow before]

Initially the following code chunk was intended to address the first situation but the condition captured both of them being len(bus_country_list) != len(country_list) ( see

if len(bus_country_list) != len(country_list):
). That lead to troubles when a non-standard code appeared in the buses list. So, I had to introduce a fix exactly to avoid mixing two situations. (Sorry for not being clear enough when explaining it in our discussion!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My proposal would be to use:
difference_countries = set(country_list).symmetric_difference(set(bus_country_list))
difference_countries should be empty if the two sets match, othewise difference_countries contains the items that are missing in one set or in the other one

if country list is ["AG"]and bus_country_list is ["AG", "Something"], the current revision doesn't catch the difference while the previousone using len does

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, if the case you mention arise, it should be fixed

But the approach should be different. Would it be probably a good idea to have an additional check on set(bus_country_list).difference(set(country_list)) and throw an error if it happens? Because this would mean that our attempt to fix it in build_shapes has failed

return row["GID_0"]


def build_gadm_df(file, layer, cc):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there has been a misunderstanding; I was proposing to move to a separate function just the filtering of the GID_0 component.

Basically, keeping the get_GADM_layer as it was and there only adding a line or two like these:

if {check config option drop}:
  geodf.drop(list of indices not matching, axis=0, inplace=True);
elif {check config option set to country}:
  geodf["COUNTRY"] = country_code

if more lines are needed, we may define a function that contains the rows above and beyond ( e.g. the output of the non-standard zones). By default, the output file shall not be saved

def filter_gadm_flag(geodf, config, save_non_standard_geo=False):
   .... stuff

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmm... Agree that the code could be better structured but don't quite get what your idea

Besides, I'm not sure that drop is a good default option. Apart of ethical concerns, it'd require some additional changes in the code. (I'll attach a picture to the main PR conversation to explain what is the matter)

So, I suggest to focus in this PR on following GADM conventions with introducing custom_prescribe and (probably) drop option as the next step

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think too that by default shapes shouldn't be dropped, it was just a simple proposal to provide 2 options

@@ -267,8 +298,10 @@ def eez(countries, geo_crs, country_shapes, EEZ_gpkg, out_logging=False, distanc
)

ret_df = ret_df.apply(lambda x: make_valid(x))
country_shapes = country_shapes.apply(lambda x: make_valid(x))
# country_shapes may consist of different geometries which need to be united
country_shapes = country_shapes.apply(lambda x: make_valid(x)).unary_union
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why unary_union?

Copy link
Collaborator Author

@ekatef ekatef Jan 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid having multiple offshore shapes which is the case if the onshore dataframe for a country contains multiple geometries

Each non-standard GADM code leads to an additional geometry entry in the countries_shape. For some reasons this results is duplication of the offshore shapes for this country when calculating the difference in ret_df.difference(country_shapes_with_buffer)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was because the unary_union merges all shapes of all countries and I was wondering if that introduces (a) additional computational time that may not be required and (b) alter the results as a single eez of one country would be compared to a merged shape by the unary_union.
Have you tested it with multiple countries and check that the output shapes are ok?

Copy link
Collaborator Author

@ekatef ekatef Jan 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have tested it and it looks ok. But I'd be happy to understand the underlying behaviour of difference and unary_union better. It looks like there is some implicit grouping by country name and looping inside it. Regarding performance, would be very interested in your opinion. This aspect I haven't (yet) considered properly

@ekatef
Copy link
Collaborator Author

ekatef commented Jan 14, 2023

I've added some comments, I fear there have been some misunderstanding; hope not

@davide-f, thanks a lot for the review. I'd say it's an iterative work process. Which seems to be converging :)

Regarding drop option: the point is that some of the contended areas contain substations and/or power plants:

image

image

points in greens are buses, points in reds are powerplants extracted from PPM

So, it looks like when dropping the non-standard areas we may need to add further modifications along the workflow: e.g. we may need to fix generators which belong to the requested area but can't be located there

@davide-f
Copy link
Member

Thanks @ekatef ! :D
I think it's a good base, I think I can finalize the details starting from here.
I can take over for little changes, but first would you mind to do some squashing to reduce the history?
Once that is done, I'd like to add few commits, then we comment the PR and finalize

@ekatef ekatef mentioned this pull request Jan 21, 2023
7 tasks
@ekatef
Copy link
Collaborator Author

ekatef commented Jan 21, 2023

Thank you for your guidance @davide-f :)
Have tried to implement your suggestions

Result of squashing is in #570. That is the result of git merge --squash country_geom_fixes_clean Not sure if it's the best approach to have a new branch and open a new PR for that. But I wanted to avoid using reset to guarantee that the history will be kept. What I was not able to do is to get rid of README modifications :( Again, some git spells probably could be used to modify the history but not sure if it's really a good idea

Could you please have a look?

@davide-f davide-f mentioned this pull request Jan 22, 2023
7 tasks
@davide-f
Copy link
Member

Can we close this PR?

@ekatef
Copy link
Collaborator Author

ekatef commented Jan 23, 2023

@davide-f, yes absolutely! :)

@ekatef
Copy link
Collaborator Author

ekatef commented Jan 23, 2023

Closed to be finalized in #572

@ekatef ekatef closed this Jan 23, 2023
@ekatef ekatef deleted the country_geom_fixes_clean branch November 14, 2023 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants