# Load OpenStreetMap Data

This notebook demonstrates how we obtain spatial data on power transmission lines. Our main data source is OpenStreetMap (OSM). The `download_osm_data.py` script extracts OSM data for a region of the world requested by the user, and `config_osm_data.py` contains the configuration data needed for the extraction.

## Set working folder

In [1]:
import sys
sys.path.append('../')  # to import helpers

from scripts._helpers import _sets_path_to_root
_sets_path_to_root("pypsa-africa")

This is the repository path:  ./
Had to go 1 folder(s) up.


## Import necessary packages

Load Python packages and set visibility options:

In [2]:
import logging
import sys
import pandas as pd
import requests
import urllib3
import time

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 70)

logger = logging.getLogger(__name__)

Load local packages written to load OSM data:

In [3]:
from scripts.config_osm_data import continent_regions
from scripts.config_osm_data import continents
from scripts.config_osm_data import iso_to_geofk_dict
from scripts.config_osm_data import world_iso
from scripts.config_osm_data import world_geofk

## Management of geographical data

OSM data are organized by continents, macroregions and countries. Country codes supplied as input should follow the ISO standard; they are then transformed into a valid OSM data request.

The two-level Python dictionaries `world_geofk` and `world_iso` store this organization according to the OSM (GeoFabrik) and ISO conventions, respectively. We define two helper functions to work with these data structures. The first, `list_countries()`, transforms an input dictionary into a list of per-continent lists of country codes, while the second, `getContinentCountryIso()`, retrieves the continent and country names for a given country code.

In [4]:
def list_countries(w_dc):
    countries_list = []

    for continent in w_dc:
        country = w_dc[continent]
        countries_list.append(list(country.keys()))
        
    return countries_list 

def getContinentCountryIso(code):
    for continent in world_iso:
        country = world_iso[continent].get(code, 0)
        if country:
            return continent, country
    # code not found on any continent
    return None, None

list_word_iso_countries = list_countries(world_iso)
list_word_geofk_countries = list_countries(world_geofk)
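
To make the two-level layout concrete, here is a minimal sketch on a toy dictionary. The contents of `toy_world_iso` are made up for illustration and are only assumed to mirror the continent, code, name layout of `world_iso`; `list_codes_per_continent()` and `lookup_country()` are hypothetical stand-alone variants of the helpers above, not the project's own functions.

```python
# Toy two-level dictionary: continent -> {country code: country name}.
# Contents are illustrative only and far smaller than the real `world_iso`.
toy_world_iso = {
    "africa": {"AO": "angola", "NG": "nigeria"},
    "europe": {"DE": "germany"},
}


def list_codes_per_continent(w_dc):
    """Return one list of country codes per continent (a list of lists)."""
    return [list(countries.keys()) for countries in w_dc.values()]


def lookup_country(w_dc, code):
    """Return (continent, country name) for a code, or (None, None) if unknown."""
    for continent, countries in w_dc.items():
        if code in countries:
            return continent, countries[code]
    return None, None


codes_per_continent = list_codes_per_continent(toy_world_iso)
print(codes_per_continent)                  # [['AO', 'NG'], ['DE']]
print(lookup_country(toy_world_iso, "NG"))  # ('africa', 'nigeria')
print(lookup_country(toy_world_iso, "XX"))  # (None, None)
```

Note that the result of the first helper is nested, which is why the lists have to be flattened before they can be compared as sets.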

### Tackle ISO-OSM differences

Let us see how the ISO and OSM naming conventions differ. Flatten each list of countries with `sum(a_list, [])` and keep only the unique elements by converting the result to a `set()`. Set subtraction then gives the differences between the country codes used by ISO and OSM:

In [5]:
iso_set = set(sum(list_word_iso_countries, []))
geofk_set = set(sum(list_word_geofk_countries, []))

iso_not_in_geofk = iso_set - geofk_set
geofk_not_in_iso = geofk_set - iso_set
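
The flatten-and-subtract step can be tried out on toy data. The sketch below uses made-up code lists (not the real dictionary contents) and swaps in `itertools.chain.from_iterable` as an alternative way to flatten; it is not what the cell above uses, but it gives the same set.

```python
from itertools import chain

# Toy per-continent code lists; the values are illustrative only.
toy_iso_lists = [["AO", "SN", "GM"], ["DE", "SM"]]
toy_geofk_lists = [["AO", "SN-GM"], ["DE"]]

# chain.from_iterable flattens one nesting level in linear time,
# whereas sum(a_list, []) copies the accumulated list at every step.
toy_iso_set = set(chain.from_iterable(toy_iso_lists))
toy_geofk_set = set(chain.from_iterable(toy_geofk_lists))

# Set subtraction is not symmetric: each direction answers a different question.
toy_iso_only = sorted(toy_iso_set - toy_geofk_set)
toy_geofk_only = sorted(toy_geofk_set - toy_iso_set)
print(toy_iso_only)    # ['GM', 'SM', 'SN']
print(toy_geofk_only)  # ['SN-GM']
```

For the few hundred codes handled here either flattening approach is fast enough; the difference only matters for much longer lists.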

Translate the obtained two-letter codes into human-readable tuples to see for which **countries GeoFabrik naming differs from ISO**:

In [6]:
for cnt in list(iso_not_in_geofk):
    print(getContinentCountryIso(cnt))

('africa', 'western-sahara')
('africa', 'gambia')
('asia', 'brunei')
('asia', 'singapore')
('asia', 'malaysia')
('asia', 'palestine')
('europe', 'san-marino')
('asia', 'bahrain')
('asia', 'israel')
('africa', 'senegal')
('asia', 'kuwait')
('asia', 'macao')
('asia', 'united-arab-emirates')
('asia', 'saudi-arabia')
('asia', 'hong kong')
('asia', 'qatar')
('asia', 'oman')


These differences between ISO and OSM are handled by the `iso_to_geofk_dict` dictionary, which transforms ISO inputs into codes that are valid for the OSM server. Each ISO country code that is not directly accessible in OSM should therefore be included in `iso_to_geofk_dict`; otherwise the code would be lost during processing:

In [7]:
# ISO codes missing from GeoFabrik that the transformation dictionary does not cover
lost_codes = set(iso_not_in_geofk) - set(iso_to_geofk_dict.keys())

If everything works properly, the `lost_codes` set should be empty:

In [8]:
print("Any ISO codes not resolved by GeoFbk and a transform dictionary?")
if len(lost_codes) > 0:
    print(lost_codes)
    for cnt in list(lost_codes):
        print(getContinentCountryIso(cnt))
else:
    print("...everything seems to work properly")

Any ISO codes not resolved by GeoFbk and a transform dictionary?
...everything seems to work properly
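
The role of the transformation dictionary can be illustrated with a small sketch. Everything here is an assumption made for the example: the entries of `toy_iso_to_geofk`, the set of available extract codes and the helper `to_geofk_codes()` are hypothetical and do not reproduce the real `iso_to_geofk_dict` or the project's download logic.

```python
# Hypothetical miniature of a transformation dictionary: ISO code -> code of the
# extract that contains the country. The entries are assumptions for illustration.
toy_iso_to_geofk = {"SN": "SN-GM", "GM": "SN-GM"}
toy_geofk_codes = {"AO", "SN-GM"}


def to_geofk_codes(iso_codes, transform, available):
    """Map ISO codes to extract codes; collect codes that cannot be resolved."""
    resolved, unresolved = [], []
    for code in iso_codes:
        target = transform.get(code, code)  # codes without an entry map to themselves
        if target in available:
            if target not in resolved:      # two countries may share one extract
                resolved.append(target)
        else:
            unresolved.append(code)
    return resolved, unresolved


resolved, unresolved = to_geofk_codes(
    ["AO", "SN", "GM", "SM"], toy_iso_to_geofk, toy_geofk_codes
)
print(resolved)    # ['AO', 'SN-GM']
print(unresolved)  # ['SM']
```

In this toy setup `SM` plays the part of a lost code: it is neither available directly nor covered by the transformation dictionary.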


### Work with macroregions

The built-in `continent_regions` dictionary contains shortcuts for different regions of the world. To see how it works, let's unpack it and keep only the unique country codes:

In [9]:
macro_regions_list = list(continent_regions.values())
# flatten list and keep unique elements only
macro_reg_set = set(sum(macro_regions_list, []))

The macro-regions dictionary contains fewer countries than the full set of ISO world countries:

In [10]:
print(len(macro_reg_set))
print(len(iso_set))

167
169


The missing country codes can be translated into plain language with the `getContinentCountryIso()` function:

In [11]:
for cnt in list(iso_set - macro_reg_set):
    print(getContinentCountryIso(cnt))

('africa', 'guinea-bissau')
('africa', 'somalia')


Spatial data on Somalia, Guinea-Bissau and Guyane are not yet present in OSM.

### Work with continents codes

Note that there are three kinds of input for the regions:
1) a two-letter ISO country code from `world_iso`
2) a shortcut for a world region from `continent_regions`
3) the full name of a continent, which should correspond to one of the `world_iso` keys

In [12]:
print(world_iso.keys())

dict_keys(['africa', 'asia', 'australia', 'europe', 'north_america', 'latin_america', 'central_america'])


There are two-letter continent codes as well:

In [13]:
print(continents)

{'LA': 'latin_america', 'SA': 'south_america', 'CA': 'central_america', 'AS': 'asia', 'OC': 'australia', 'AF': 'africa', 'EU': 'europe'}


However, continent codes can't be used as geographical inputs because some of them coincide with country codes:

In [14]:
for cnt in (set(continents).intersection(iso_set)):
    print(cnt, getContinentCountryIso(cnt))     

SA ('asia', 'saudi-arabia')
AF ('asia', 'afghanistan')
LA ('asia', "lao-people's-democratic-republic")
CA ('north_america', 'canada')


# Check Availability of OSM data

The requested geographical code is used to construct a URL for requesting OSM data from the GeoFabrik server. The URL consists of the continent and country names defined according to the GeoFabrik conventions. OSM naming is kept in the `world_geofk` dictionary, which has the same two-level structure as `world_iso`. The `getContinentCountry()` function transforms a requested two-letter country code into a `(continent, country)` tuple according to the OSM naming rules. The `build_url()` function then forms a (hopefully) valid URL pointing to the required data chunk on the GeoFabrik server.

In [15]:
def getContinentCountry(code):
    for continent in world_geofk:
        country = world_geofk[continent].get(code, 0)
        if country:
            return continent, country
    # code not found on any continent
    return None, None

def build_url(country_code, update, verify):
    continent, country_name = getContinentCountry(country_code)
    geofabrik_filename = f"{country_name}-latest.osm.pbf"
    geofabrik_url = f"https://download.geofabrik.de/{continent}/{geofabrik_filename}"
    return geofabrik_url
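
The URL pattern can be checked offline on a toy dictionary. This is a minimal sketch, assuming the same continent, code, name layout as `world_geofk`; `toy_world_geofk` and `build_url_checked()` are hypothetical names, and raising on an unknown code is a design choice of the sketch rather than the behaviour of `build_url()`.

```python
# Toy stand-in for `world_geofk`; contents are illustrative.
toy_world_geofk = {"africa": {"AO": "angola"}, "europe": {"DE": "germany"}}


def build_url_checked(country_code, world=toy_world_geofk):
    """Build a GeoFabrik download URL, raising on an unknown country code."""
    for continent, countries in world.items():
        if country_code in countries:
            filename = f"{countries[country_code]}-latest.osm.pbf"
            return f"https://download.geofabrik.de/{continent}/{filename}"
    raise KeyError(f"No GeoFabrik entry for country code {country_code!r}")


print(build_url_checked("AO"))
# https://download.geofabrik.de/africa/angola-latest.osm.pbf
```

Failing early on an unknown code avoids sending a request for a URL that cannot exist.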


Check how access to OSM data works. As an example, we take only one country from the list of codes, because requests that are too frequent can cause trouble:

In [16]:
problem_urls = []
problem_codes = []
problem_domain = []

# flatten list
world_geofk_codes = sum(list_word_geofk_countries, [])

for cnt in world_geofk_codes[1:2]:
    print(getContinentCountry(cnt))
    url = build_url(country_code=cnt, update=False, verify=False)
    print(url)
    time.sleep(5)  # pause to avoid flooding the server with requests

    # stream=True defers downloading the response body, so only the status is checked;
    # the redundant extra HEAD request was dropped to halve the load on the server
    with requests.get(url, stream=True, verify=True) as r:
        if r.status_code == 200:
            print("URL '" + url + "' is working")
        else:
            problem_urls.append(url)
            problem_codes.append(cnt)
            problem_domain.append(getContinentCountry(cnt))

            if r.status_code == 429:
                print("Error code:" + str(r.status_code) + ". The pause between loads should be increased.")
            else:
                print("There are some troubles with " + url + " Error code:" + str(r.status_code))

('africa', 'angola')
https://download.geofabrik.de/africa/angola-latest.osm.pbf
Error code:429. The pause between loads should be increased.


Have a look at the results of our shortened check:

In [17]:
if len(problem_urls) > 0:              
    print("There were troubles in reaching following urls:") 
    print(problem_urls) 
    print("Country codes to be checked:")
    print(problem_codes) 
    print(problem_domain)
else:
    print("All requested urls are available")

There were troubles in reaching following urls:
['https://download.geofabrik.de/africa/angola-latest.osm.pbf']
Country codes to be checked:
['AO']
[('africa', 'angola')]
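
Since the check above ended with a 429 response, one possible remedy is to retry with a growing pause. The sketch below is an assumption about how this could be done, not part of the project scripts: `check_with_backoff()` is a hypothetical helper, and it takes the status-fetching function as a parameter so that it can be demonstrated offline with a fake server.

```python
import time


def check_with_backoff(get_status, url, max_tries=4, base_pause=5.0, sleep=time.sleep):
    """Retry while the server answers 429, doubling the pause after each attempt.

    `get_status` is any callable that takes a URL and returns an HTTP status code.
    """
    pause = base_pause
    status = None
    for attempt in range(max_tries):
        status = get_status(url)
        if status != 429:
            break
        if attempt < max_tries - 1:  # no need to wait after the final attempt
            sleep(pause)
            pause *= 2
    return status


# Offline demonstration with a fake server that rejects the first two requests.
fake_answers = iter([429, 429, 200])
pauses = []  # records the requested pauses instead of actually sleeping
status = check_with_backoff(
    lambda url: next(fake_answers), "https://example.org/file", sleep=pauses.append
)
print(status, pauses)  # 200 [5.0, 10.0]
```

With `requests`, the status function could be `lambda u: requests.head(u, allow_redirects=True).status_code`, which asks for the headers only and does not download the file.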
