# Load OpenStreetMap Data

This notebook demonstrates how we obtain spatial data on power transmission lines. Our main data source is OpenStreetMap (OSM). The `download_osm_data.py` script extracts OSM data for a world region requested by the user, and `config_osm_data.py` contains the configuration data needed for the extraction.

## Set working folder

In [1]:
# move to the folder that contains the pypsa-earth repository
import os
import sys

if not os.path.isdir("pypsa-earth"):
    os.chdir("../..")
# make the pypsa-earth scripts importable
sys.path.append(os.path.join(os.getcwd(), "pypsa-earth", "scripts"))

## Import necessary packages

Load Python packages and set display options:

In [2]:
import logging
import time

import pandas as pd
import requests
import urllib3

pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", 70)

logger = logging.getLogger(__name__)

Import the local configuration objects used to load OSM data:

In [3]:
from config_osm_data import continent_regions, continents, iso_to_geofk_dict, world_iso

## Management of geographical data

OSM data are organized by continents, macro-regions and countries. Input country codes should follow the ISO standard and are then translated into a valid OSM data request.

The two-level `world_iso` dictionary stores this organization according to ISO conventions: its keys are continent names, and each value maps ISO country codes to country names. We define two helper functions to work with it. The first, `list_countries()`, collects the country codes of each continent into a list of lists, while the second, `getContinentCountry()`, retrieves the continent and country names for a given country code.

In [4]:
def list_countries(w_dc):
    countries_list = []

    for continent in w_dc:
        country = w_dc[continent]
        countries_list.append(list(country.keys()))

    return countries_list


def getContinentCountry(code):
    for continent in world_iso:
        country = world_iso[continent].get(code, 0)
        if country:
            return continent, country
    return continent, country


list_word_iso_countries = list_countries(world_iso)
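
To make the two-level layout concrete, here is a minimal sketch on a toy dictionary. The name `toy_world_iso`, its contents and both helper names are made up for illustration; the real `world_iso` is only assumed to follow the same continent -> code -> name shape.

```python
# Hypothetical miniature of a continent -> {ISO code -> country name} mapping
toy_world_iso = {
    "Africa": {"MA": "morocco", "NG": "nigeria"},
    "Europe": {"XK": "kosovo"},
}


def flatten_codes(w_dc):
    """Return one flat list with the country codes of all continents."""
    return [code for countries in w_dc.values() for code in countries]


def find_continent_country(code, w_dc):
    """Return (continent, country name) for a code, or (None, None) if unknown."""
    for continent, countries in w_dc.items():
        if code in countries:
            return continent, countries[code]
    return None, None


print(flatten_codes(toy_world_iso))
print(find_continent_country("NG", toy_world_iso))
print(find_continent_country("ZZ", toy_world_iso))
```

Returning `(None, None)` for an unknown code is a choice made for this sketch only; `getContinentCountry()` above instead returns the last continent together with `0` when the code is not found.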

### Tackle ISO-OSM differences

Let's transform the list of ISO codes into a flat set:

In [5]:
iso_set = set(sum(list_word_iso_countries, []))

However, in OSM the data of some countries are grouped together.
The full list is provided in the `iso_to_geofk_dict` dictionary, whose keys are ISO codes and whose values are the Geofabrik codes of the extracts that contain the corresponding data. For more details on the accepted Geofabrik codes, please refer to the `earth-osm` documentation.

In [6]:
iso_to_geofk_dict

{'EH': 'MA',
 'SN': 'SN-GM',
 'GM': 'SN-GM',
 'KM': 'comores',
 'SG': 'MY',
 'BN': 'MY',
 'SA': 'QA-AE-OM-BH-KW',
 'KW': 'QA-AE-OM-BH-KW',
 'BH': 'QA-AE-OM-BH-KW',
 'QA': 'QA-AE-OM-BH-KW',
 'AE': 'QA-AE-OM-BH-KW',
 'OM': 'QA-AE-OM-BH-KW',
 'PS': 'PS-IL',
 'IL': 'PS-IL',
 'SM': 'IT',
 'VA': 'IT',
 'HT': 'haiti-and-domrep',
 'DO': 'haiti-and-domrep',
 'PA': 'panama',
 'NF': 'AU',
 'MP': 'american-oceania',
 'GU': 'american-oceania',
 'AS': 'american-oceania',
 'CP': 'ile-de-clipperton',
 'PF': 'polynesie-francaise',
 'VU': 'vanuatu',
 'TK': 'tokelau',
 'MH': 'marshall-islands',
 'PN': 'pitcairn-islands',
 'WF': 'wallis-et-futuna',
 'XK': 'RS-KM',
 'BS': 'bahamas',
 'BB': 'central-america',
 'CU': 'cuba',
 'RE': 'reunion',
 'YT': 'mayotte',
 'GG': 'guernsey-jersey',
 'JE': 'guernsey-jersey',
 'IM': 'isle-of-man',
 'GP': 'guadeloupe',
 'JM': 'jamaica',
 'TT': 'central-america',
 'AG': 'central-america',
 'DM': 'central-america',
 'LC': 'central-america',
 'VC': 'central-america',
 'KN': 'c
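
To see which ISO codes end up in the same extract, the mapping can be inverted. The sketch below uses a handful of entries copied from the output above; `sample_iso_to_geofk` and `group_by_extract()` are illustrative names, not part of `config_osm_data.py`.

```python
from collections import defaultdict

# A few entries copied from the printed dictionary
sample_iso_to_geofk = {
    "SN": "SN-GM",
    "GM": "SN-GM",
    "PS": "PS-IL",
    "IL": "PS-IL",
    "EH": "MA",
}


def group_by_extract(mapping):
    """Invert an ISO -> Geofabrik mapping into Geofabrik -> list of ISO codes."""
    groups = defaultdict(list)
    for iso_code, geofk_code in mapping.items():
        groups[geofk_code].append(iso_code)
    return dict(groups)


shared = group_by_extract(sample_iso_to_geofk)
for geofk_code, iso_codes in shared.items():
    print(geofk_code, iso_codes)
```

Each key of `shared` is one Geofabrik code, so a request for either `SN` or `GM` would point to the same `SN-GM` entry.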

### Work with macroregions

The built-in `continent_regions` dictionary contains shortcuts for different regions of the world. To see how it works, let's unpack it and keep only the unique country codes:

In [7]:
macro_regions_list = list(continent_regions.values())
# flatten list and keep unique elements only
macro_reg_set = set(sum(macro_regions_list, []))

The macro-regions dictionary contains fewer countries than the full set of ISO country codes:

In [8]:
print(len(macro_reg_set))
print(len(iso_set))

199
215


The missing country codes can be translated into plain language with the `getContinentCountry()` function:

In [9]:
for cnt in list(iso_set - macro_reg_set):
    print(getContinentCountry(cnt))

('Oceania', 'niue')
('Africa', 'canary-islands')
('Europe', 'kosovo')
('Europe', 'jersey')
('Oceania', 'cook-islands')
('Europe', 'isle of man')
('SouthAmerica', 'falkland-islands')
('Oceania', 'tokelau')
('NorthAmerica', 'guadeloupe')
('Oceania', 'ile-de-clipperton')
('Oceania', 'pitcairn-islands')
('Europe', 'faroe islands')
('NorthAmerica', 'puerto-rico')
('Africa', 'mayotte')
('Oceania', 'french-polynesia')
('Oceania', 'guam')
('Oceania', 'northern-mariana-islands')
('Oceania', 'american-samoa')
('Europe', 'vatican')
('Africa', 'reunion')
('Oceania', 'wallis-and-futuna')
('Oceania', 'norfolk-island')
('Europe', 'guernsey')
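
The list above contains more entries than the size difference printed earlier (215 - 199 = 16). One possible explanation, not checked in this notebook, is that `macro_reg_set` also holds codes that are absent from `iso_set`: the difference of two set sizes equals the number of missing elements only when one set is a subset of the other. A toy sketch with arbitrarily chosen codes:

```python
# Toy sets; "ZZ" stands for a code that is unknown to the ISO set
iso_codes = {"MA", "NG", "XK", "NU"}
region_codes = {"MA", "NG", "ZZ"}

missing_from_regions = iso_codes - region_codes  # in ISO set, not in regions
extra_in_regions = region_codes - iso_codes  # in regions, not in ISO set

size_gap = len(iso_codes) - len(region_codes)

print(size_gap)
print(sorted(missing_from_regions))
print(sorted(extra_in_regions))
```

Here the sizes differ by one, yet two codes are missing from the regions, because one extra code compensates in the count. Printing `macro_reg_set - iso_set` in the notebook would show whether the same happens with the real data.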


Spatial data on Somalia, Guinea-Bissau and Guyane are not yet present in OSM.

### Work with continent codes

Note that there are three kinds of input for the regions:
1) a two-letter ISO country code, as listed in `world_iso`
2) a shortcut for a world region, as defined in `continent_regions`
3) the full name of a continent, which should correspond to one of the `world_iso` keys

In [10]:
print(world_iso.keys())

dict_keys(['Africa', 'Asia', 'Oceania', 'Europe', 'NorthAmerica', 'SouthAmerica'])


There are two-letter continent codes as well:

In [11]:
print(continents)

{'LA': 'NorthAmerica', 'SA': 'SouthAmerica', 'AS': 'Asia', 'OC': 'Oceania', 'AF': 'Africa', 'EU': 'Europe'}


However, continent codes can't be used as geographical inputs, because some of them coincide with country codes:

In [12]:
for cnt in set(continents).intersection(iso_set):
    print(cnt, getContinentCountry(cnt))

AF ('Asia', 'afghanistan')
LA ('Asia', "lao-people's-democratic-republic")
AS ('Oceania', 'american-samoa')
SA ('Asia', 'saudi-arabia')


# Check Availability of OSM data

The requested geographical code is used to construct a URL for requesting OSM data from the Geofabrik server. The URL consists of the continent and country names written according to the Geofabrik conventions. The OSM naming of grouped countries is kept in the `iso_to_geofk_dict` dictionary, as explained above. The `get_continent_geofk()` function converts a continent name into the Geofabrik format, and `build_url()` forms a (hopefully valid) URL that points to the required data extract on the Geofabrik server.

In [13]:
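def get_continent_geofk(cnt):
    # assumption: entries other than "africa" follow the directory names
    # used on download.geofabrik.de
    dict_format = {
        "africa": "africa",
        "northamerica": "north-america",
        "southamerica": "south-america",
        "oceania": "australia-oceania",
    }
    # names without a special spelling (e.g. "asia", "europe") are only lowercased
    return dict_format.get(cnt.lower(), cnt.lower())


def build_url(country_code, update, verify, iso_to_geofk_dict=iso_to_geofk_dict):
    # `update` and `verify` are accepted but not used in this function
    continent, country_name = getContinentCountry(country_code)
    if country_code in iso_to_geofk_dict:
        geofabrik_filename = f"{iso_to_geofk_dict[country_code]}-latest.osm.pbf"
    else:
        geofabrik_filename = f"{country_name}-latest.osm.pbf"

    geofabrik_url = f"https://download.geofabrik.de/{get_continent_geofk(continent)}/{geofabrik_filename}"
    return geofabrik_url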

Check how access to OSM data works. As an example, we test only two countries from the list of codes, since overly frequent requests may be rejected by the server:

In [14]:
problem_urls = []
problem_codes = []
problem_domain = []

# test Morocco and Nigeria only
test_geofk_codes = ["MA", "NG"]

for cnt in test_geofk_codes:
    print(getContinentCountry(cnt))
    url = build_url(country_code=cnt, update=False, verify=False)
    print(url)
    time.sleep(5)

    # stream=True defers the download of the file body; only the status code is needed
    with requests.get(url, stream=True, verify=True) as r:
        if r.status_code == 200:
            print("URL '" + url + "' is working")
        else:
            problem_urls.append(url)
            problem_codes.append(cnt)
            problem_domain.append(getContinentCountry(cnt))

            if r.status_code == 429:
                print(
                    "Error code:"
                    + str(r.status_code)
                    + ". The pause between loads should be increased."
                )
            else:
                print(
                    "There some troubles with "
                    + url
                    + " Error code:"
                    + str(r.status_code)
                )

('Africa', 'morocco')
https://download.geofabrik.de/Africa/morocco-latest.osm.pbf
There some troubles with https://download.geofabrik.de/Africa/morocco-latest.osm.pbf Error code:404
('Africa', 'nigeria')
https://download.geofabrik.de/Africa/nigeria-latest.osm.pbf
There some troubles with https://download.geofabrik.de/Africa/nigeria-latest.osm.pbf Error code:404
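
When the server answers with status 429, the pause between requests should grow. The sketch below shows one way to retry with increasing pauses; it is not part of `download_osm_data.py`. The function `check_url()`, its parameters and the example URL are assumptions made for illustration, and the status lookup is passed in as a callable so that the logic can be tried without network access.

```python
import time


def check_url(url, get_status, pauses=(5, 15, 45), sleep=time.sleep):
    """Return the final HTTP status for `url`, retrying after each 429.

    `get_status` is any callable that maps a URL to an HTTP status code.
    """
    status = get_status(url)
    for pause in pauses:
        if status != 429:
            break
        sleep(pause)  # wait longer before each new attempt
        status = get_status(url)
    return status


# Offline demonstration: a fake server that rate-limits the first two calls
fake_responses = iter([429, 429, 200])
waited = []  # records the pauses instead of actually sleeping
status = check_url(
    "https://example.org/data.osm.pbf",
    get_status=lambda url: next(fake_responses),
    sleep=waited.append,
)
print(status, waited)
```

With `requests`, `get_status` could be, for instance, `lambda url: requests.head(url, allow_redirects=True).status_code`, which asks for the headers only and never downloads the file.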


Have a look at the results of our shortened check:

In [15]:
if len(problem_urls) > 0:
    print("There were troubles in reaching following urls:")
    print(problem_urls)
    print("Country codes to be checked:")
    print(problem_codes)
    print(problem_domain)
else:
    print("All requested urls are available")

There were troubles in reaching following urls:
['https://download.geofabrik.de/Africa/morocco-latest.osm.pbf', 'https://download.geofabrik.de/Africa/nigeria-latest.osm.pbf']
Country codes to be checked:
['MA', 'NG']
[('Africa', 'morocco'), ('Africa', 'nigeria')]
