Testing of regions management when loading OSM data #249

Merged
merged 101 commits into from
Feb 15, 2022

@ekatef (Collaborator) commented Jan 25, 2022

Changes proposed in this Pull Request

This pull request aims to test the management of countries and regions worldwide when downloading OSM data.

The goals are:

  • Check that differences between ISO and an OSM server (GeoFabrik) are resolved correctly. In particular, there are the following kinds of differences:
    • the continent a country belongs to is defined differently (e.g. Georgia)
    • a number of countries are merged in OSM (e.g. GCC)
  • Check and comment on island countries
    • in world_iso
    • in world_geofk
  • Check the completeness and correctness of the continent_regions dictionary used to define macro-regions
  • Avoid duplicate keys across dictionaries (like "LA" for both Laos and Latin America)
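
The last goal can be checked mechanically. The following is a minimal sketch, not the code from this PR: `find_key_collisions`, `country_names` and `region_shortcuts` are hypothetical names, and the dictionary contents are illustrative excerpts only.

```python
def find_key_collisions(*dictionaries):
    """Return {key: [indices of the dictionaries that define it]} for shared keys."""
    first_seen = {}   # key -> index of the first dictionary that defines it
    collisions = {}   # key -> indices of all dictionaries that define it
    for index, mapping in enumerate(dictionaries):
        for key in mapping:
            if key in first_seen:
                collisions.setdefault(key, [first_seen[key]]).append(index)
            else:
                first_seen[key] = index
    return collisions


# Hypothetical excerpts, for illustration only
country_names = {"LA": "Laos", "GE": "Georgia"}
region_shortcuts = {"LA": "Latin America", "EU": "Europe"}

collisions = find_key_collisions(country_names, region_shortcuts)
print(collisions)
```

Comparing separate dictionaries is deliberate: a key repeated inside a single Python dict literal is silently overwritten, so such a duplicate cannot be detected after the dictionary has been built.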

Test by continents and regions

Let's see how the workflow runs when OSM data are loaded for different regions of the world. The macro-region shortcuts used for testing are defined according to config_osm_data.continent_regions:

  • Africa
    • Western African region
    • Central African region
    • Eastern African region
    • Southern African region
  • Asia
    • Western Asia
    • Central Asia
    • Far East
    • South-Eastern Asia
    • Southern Asia
    • Middle East
  • Australia-Oceania
  • Europe
    • Scandinavian region
    • Eastern region
    • Central region
    • Balkan region
    • Western region
    • Southern region
  • Russia
  • North-America
  • South-America
  • Central-America
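
How a macro-region shortcut might be expanded into country codes can be sketched as follows. This is an illustration under assumptions: `expand_countries` is a hypothetical helper, and the `continent_regions` excerpt below only reuses the "central_america" shortcut and country list mentioned later in this conversation; the real dictionary lives in config_osm_data and may differ.

```python
# Hypothetical excerpt of a region dictionary, for illustration only
continent_regions = {
    "central_america": ["BZ", "CR", "HN", "GT", "NI", "PA", "SV"],
}


def expand_countries(requested, regions):
    """Replace region shortcuts by their country codes, keeping order and dropping repeats."""
    expanded = []
    for item in requested:
        # An item that is not a known shortcut is treated as a country code
        for code in regions.get(item, [item]):
            if code not in expanded:
                expanded.append(code)
    return expanded


countries = expand_countries(["central_america", "HN", "MX"], continent_regions)
print(countries)
```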

Checklist

  • I tested my contribution locally and it seems to work fine.
  • Code and workflow changes are sufficiently documented.
  • Newly introduced dependencies are added to envs/environment.yaml and envs/environment.docs.yaml.
  • Changes in configuration options are added in all of config.default.yaml, config.tutorial.yaml, and test/config.test1.yaml.
  • Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
  • A note for the release notes doc/release_notes.rst is amended in the format of previous release notes.

@ekatef (Collaborator, Author) commented Feb 10, 2022

@davide-f, could you please have a look at the PR? I think it's mainly done.

The tests of OSM data loading across the globe were generally successful, but there were some local problems:

  1. OSM data are not available for a few countries (Guinea-Bissau, Somalia, Guyane);
  2. OSM power data availability seems to be limited for some areas, especially in Africa, Central and South America;
  3. there may be some peculiarities with the regions merged in OSM: e.g. I was not able to see any grid extracted for Bahrain and Singapore;
  4. sometimes data loading (or, more precisely, data pre-processing during loading) fails for a single country, but this can be resolved by increasing the requested area.

@davide-f (Member) commented:

Hi @ekatef, good job! I answer point by point below:

  1. It seems that some data are available (guyane, guinea-bissau). Do you mean that there are no useful extracted electrical data? If that's the case, I think we can't do anything about that (yet, waiting for detect_energy :)). Here we should be able to download the pbfs; if no data are available, that's a result.
  2. Understandable
  3. Ok, the important thing is that the pbfs are successfully processed. If there are no data, that's ok: the workflow will be able to process them once they become available (we may work towards that in the future :))
  4. Can you elaborate on this? I don't fully understand the point

@davide-f davide-f marked this pull request as ready for review February 12, 2022 21:27
@ekatef (Collaborator, Author) commented Feb 14, 2022

Hi @davide-f, many thanks for checking!
Some additional comments:

  1. data availability

> It seems that some data are available (guyane, guinea-bissau), you mean that there are no useful extracted electrical data? If that's the case, I think we can't do anything about that (yet, waiting for detect_energy :)). Here we should be able to download the pbfs, if no data are available, that's a result.

Yes, exactly: some OSM data for Somalia and Guinea-Bissau are available (and the pbf files are loaded by download_osm_data), but some raw power data files are empty. E.g. for Somalia there is some information on lines, but data on cables and substations are missing.

With Guyane there seems to be a naming problem. Strictly speaking, the GeoFabrik page on guyane relates to French Guiana ("GF"), while I was not able to find GeoFabrik data for the Co‑operative Republic of Guyana ("GY").
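
A naming gap of this kind can be reported before any download is attempted. The sketch below is only an illustration: `geofabrik_pages` is a hypothetical lookup table holding a small excerpt (the "GF" entry follows the observation above), and `split_by_availability` is not a function of the workflow.

```python
# Hypothetical excerpt: ISO 3166-1 alpha-2 code -> GeoFabrik page name
geofabrik_pages = {
    "GF": "guyane",   # French Guiana, as noted above
    "SO": "somalia",
}


def split_by_availability(codes, pages):
    """Split the requested ISO codes into those with a known page and those without."""
    found = {code: pages[code] for code in codes if code in pages}
    missing = [code for code in codes if code not in pages]
    return found, missing


found, missing = split_by_availability(["GF", "GY"], geofabrik_pages)
print("found:", found)
print("missing:", missing)
```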

  2. problems with pre-processing

> sometimes data load (or more precise data pre-processing during load) for a single country fails but it can be resolved by increase of the requested area.
> Can you elaborate on this? I don't fully understand the point

That seems to be connected with the data frame subsetting here:

https://github.com/pypsa-meets-africa/pypsa-africa/blob/dfca0162648bf9872d10b73adb5f080d3f33acb0/scripts/download_osm_data.py#L441-L446

For some areas, subsetting df_all_feature["lonlat"] raises a KeyError, while increasing the requested area resolves the issue. That is the case for Honduras, for example: setting countries: ["HN"] results in KeyError: 'lonlat', while countries: ["central_america"] or countries: ["BZ", "CR", "HN", "GT", "NI", "PA", "SV"] works properly, including the extraction of data for Honduras.
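
One way such a KeyError can arise in pandas is a data frame built from an empty list of records, which has no columns at all. Whether that is the actual cause here has not been confirmed in this thread, so the guard below is only a sketch under that assumption; `select_lonlat` is a hypothetical helper and not part of download_osm_data.py.

```python
import pandas as pd


def select_lonlat(df_all_feature: pd.DataFrame) -> pd.Series:
    """Return the 'lonlat' column, or an empty Series if the column is absent."""
    if "lonlat" not in df_all_feature.columns:
        # Assumed failure mode: no features were extracted, so the frame has no columns
        return pd.Series([], dtype=object, name="lonlat")
    return df_all_feature["lonlat"]


# A frame built from no records has no columns, so plain indexing would raise KeyError
empty_frame = pd.DataFrame([])
print(select_lonlat(empty_frame).empty)

# A frame with at least one record exposes the column as usual
filled_frame = pd.DataFrame([{"id": 1, "lonlat": [(-87.2, 14.1)]}])
print(len(select_lonlat(filled_frame)))
```

Returning an empty Series lets downstream steps handle "no data for this country" explicitly instead of failing on a missing column.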

@davide-f (Member) commented:

Thank you for your comments.
I think that we can merge the branch; please keep track of these problems in an issue.
The set of affected areas may also be larger; @EmreYorat told me about similar issues for Armenia, for example.
Let's keep track of such issues, understand why they occur and fix them; however, the current branch is ready to merge.

Thanks @ekatef !

@davide-f davide-f closed this Feb 15, 2022
@davide-f davide-f reopened this Feb 15, 2022
@davide-f davide-f merged commit 44b6236 into pypsa-meets-earth:main Feb 15, 2022
@ekatef (Collaborator, Author) commented Feb 16, 2022

@davide-f, thank you for the review and merging. Great to know that this point is done :)

Ok, I'll add an issue about the problems with the structure of the data frames extracted from OSM. Apart from the KeyErrors in download_osm_data, there are also ValueErrors when calling clean_osm_data, and both problems seem to be connected.

pz-max pushed a commit that referenced this pull request Sep 24, 2022
Testing of regions management when loading OSM data