# OpenStreetMap Data

This notebook extracts hiking route data from OpenStreetMap via the Overpass API (queries can be tried out interactively in [overpass turbo](https://overpass-turbo.eu/)).

First, we request hiking routes from the API using Overpass QL (short for "Overpass Query Language").
In OpenStreetMap, hiking routes are modeled as relations. We search for relations tagged `route=hiking` and `network=lwn` (local walking network) whose `osmc:symbol` tag matches one of a fixed set of trail markings,
within a bounding box slightly larger than Switzerland. Relations whose name contains "fixme" are excluded. With the output mode `out center`, the Overpass API returns a single center coordinate for each route (the center of the route's bounding box).
Since the `name` tag is often missing, we construct a fallback name by concatenating the start and end points (the `from` and `to` tags) of each hiking route.
Finally, we keep the ID, name, latitude, and longitude of each route, together with the timestamp of the API call.

The data is then converted into a pandas DataFrame and written to a table in a Microsoft SQL Server database (hosted on Microsoft Azure).
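
The name fallback described above can be sketched as a small pure function. This is a minimal illustration rather than the notebook's exact code: the helper name `build_route_name` is hypothetical, and it assumes the tags arrive as a plain dictionary with the OSM keys `name`, `from` and `to`.

```python
def build_route_name(tags):
    """Return the tagged route name, or 'from - to' if no name is present."""
    name = tags.get("name")
    if name:
        return name

    # Fall back to the start and end point only if both are tagged
    start, end = tags.get("from"), tags.get("to")
    if start and end:
        return f"{start} - {end}"

    # Neither a name nor a complete from/to pair is available
    return None


print(build_route_name({"name": "Wanderwege SG"}))
print(build_route_name({"from": "Folenweid", "to": "Baldern"}))
```

Routes for which the function returns `None` have no usable name and would be skipped.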

In [21]:
# Import required libraries (pyodbc and urllib were imported but never used)
import os
import json
import overpy
import pymssql
import pandas as pd
from sqlalchemy import Integer, String, Float, DATETIME, create_engine

In [22]:
# Initialize the Overpass API with a custom URL
api = overpy.Overpass(url="http://overpass.osm.ch/api/interpreter")

# Overpass query for hiking trails within Switzerland. Using 'center', we obtain the coordinates in the middle of a hiking trail
query = """
[out:json];
relation
["route"="hiking"]
["name"!~"fixme", i]
["network"="lwn"]
["osmc:symbol"~"yellow::yellow_diamond|red:white:red_bar|yellow:white:yellow_diamond|blue:white:blue_bar"]
(45.8899, 6.0872, 47.8085, 10.4921);
out center tags;
"""

# Execute the request
result = api.query(query)
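
If the bounding box or the set of trail markings needs to change, the query string could be assembled from parameters instead of being hard-coded. The following is a sketch under assumptions: the function name `build_hiking_query` and its parameters are hypothetical, and the generated text is meant to mirror the query above, not to replace it. The symbols are joined into a regular expression without escaping, so they must not contain regex metacharacters.

```python
def build_hiking_query(bbox, symbols, network="lwn"):
    """Assemble an Overpass QL query for hiking relations.

    bbox is (south, west, north, east) in decimal degrees, which is the
    order Overpass QL expects for a bounding box filter.
    """
    south, west, north, east = bbox
    # Alternatives in the osmc:symbol regular expression are separated by '|'
    symbol_regex = "|".join(symbols)
    return (
        "[out:json];\n"
        "relation\n"
        '["route"="hiking"]\n'
        '["name"!~"fixme", i]\n'
        f'["network"="{network}"]\n'
        f'["osmc:symbol"~"{symbol_regex}"]\n'
        f"({south}, {west}, {north}, {east});\n"
        "out center tags;\n"
    )


query = build_hiking_query(
    bbox=(45.8899, 6.0872, 47.8085, 10.4921),
    symbols=["yellow::yellow_diamond", "red:white:red_bar"],
)
print(query)
```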

In [23]:
# Record the date and time of the API call
timestamp_apicall = pd.Timestamp.now().strftime("%Y-%m-%d %H:%M:%S")

# List to store one dictionary per hiking route
# (named 'rows' to avoid shadowing the built-in 'list')
rows = []

# Iterate over all relations
for relation in result.relations:

    # Extract relevant tags
    org_name = relation.tags.get('name')
    org_to = relation.tags.get('to')
    org_from = relation.tags.get('from')

    # With 'out center', overpy exposes the center as two separate attributes
    lat = relation.center_lat
    lon = relation.center_lon

    # If the original name is not available, construct it from 'from' and 'to'
    if not org_name and org_from and org_to:
        fix_name = f"{org_from} - {org_to}"
    else:
        fix_name = org_name

    # Keep only routes that have a name and a valid center coordinate
    if fix_name and lat is not None and lon is not None and lat > 0 and lon > 0:
        # Each dictionary becomes one row of the DataFrame
        rows.append({
            'id': relation.id,
            'name': fix_name,
            'lat': lat,
            'lon': lon,
            'timestamp_apicall': timestamp_apicall,
        })

# Once all data is processed, create the DataFrame
df_wanderwege = pd.DataFrame(rows)

# Print the first rows of the DataFrame
print(df_wanderwege.head())

       id                                          name         lat  \
0   22614  Nationalpark Wanderroute 15 (Munt la Schera)  46.6501430   
1  103607                                 Wanderwege SG  47.4309774   
2  112830                Uetliberg - Uetliberg Uto Kulm  47.3511680   
3  112831                           Folenweid - Baldern  47.3291235   
4  112833                          Felsenegg - Balderen  47.3152439   

          lon    timestamp_apicall  
0  10.2301992  2024-09-24 14:57:08  
1   9.6201700  2024-09-24 14:57:08  
2   8.4897796  2024-09-24 14:57:08  
3   8.5007261  2024-09-24 14:57:08  
4   8.5050559  2024-09-24 14:57:08  
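
The check `lat > 0 and lon > 0` in the cell above only rules out missing or obviously wrong coordinates. A stricter variant, sketched here as an optional alternative that is not part of the original notebook, tests whether a center lies inside the bounding box used in the query. The helper name `in_bbox` is hypothetical, and the coordinates are assumed to be `Decimal` values, which is what the trailing zeros in the output above suggest.

```python
from decimal import Decimal

# Bounding box from the query: (south, west, north, east)
BBOX = (Decimal("45.8899"), Decimal("6.0872"), Decimal("47.8085"), Decimal("10.4921"))


def in_bbox(lat, lon, bbox=BBOX):
    """Return True if the coordinate lies inside the bounding box."""
    if lat is None or lon is None:
        return False
    south, west, north, east = bbox
    return south <= lat <= north and west <= lon <= east


# Center of the 'Uetliberg - Uetliberg Uto Kulm' route from the output above
print(in_bbox(Decimal("47.3511680"), Decimal("8.4897796")))
```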


In [24]:
# Convert lat and lon to numeric, timestamp to datetime
df_wanderwege['lat'] = pd.to_numeric(df_wanderwege['lat'], errors='coerce')
df_wanderwege['lon'] = pd.to_numeric(df_wanderwege['lon'], errors='coerce')
df_wanderwege['timestamp_apicall'] = pd.to_datetime(df_wanderwege['timestamp_apicall'], errors='coerce')
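
To illustrate what `errors='coerce'` does in the conversions above, here is a small self-contained example on made-up data (the values are not taken from the API). Entries that cannot be parsed become `NaN` or `NaT` instead of raising an exception, so it can be worth counting missing values afterwards.

```python
import pandas as pd

# Toy data: the second row contains values that cannot be parsed
df = pd.DataFrame({
    "lat": ["46.6501430", "not a number"],
    "timestamp_apicall": ["2024-09-24 14:57:08", "not a date"],
})

# Unparseable entries are replaced by NaN / NaT instead of raising an error
df["lat"] = pd.to_numeric(df["lat"], errors="coerce")
df["timestamp_apicall"] = pd.to_datetime(df["timestamp_apicall"], errors="coerce")

print(df.dtypes)
print(df.isna().sum())
```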

In [25]:
# Get current working directory
current_dir = os.getcwd()

# Load database access configuration 
with open('../config/db_config.json', 'r') as f:
    db_config = json.load(f)

# Define the server, database, user and password
server = db_config['server']
database = db_config['database']
db_user = db_config['db_user']
db_password = db_config['db_password']

# Connect to the database
conn = pymssql.connect(server, db_user, db_password, database)

# Create connection string for sqlalchemy
engine = create_engine(f"mssql+pymssql://{db_user}:{db_password}@{server}/{database}")

# Write the DataFrame to the MSSQL database, replacing the table if it already exists
df_wanderwege.to_sql(
    name='overpass',
    con=engine,
    if_exists='replace',
    index=False,
    dtype={
        'id': Integer,                  # OSM relation ID as integer
        'name': String(100),            # route name, at most 100 characters
        'lat': Float,                   # center latitude as float
        'lon': Float,                   # center longitude as float
        'timestamp_apicall': DATETIME,  # date and time of the API call
    }
)

# Close the connection
conn.close()

print("DataFrame successfully loaded into the MSSQL database!")

DataFrame successfully loaded into the MSSQL database!
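
One caveat when building the connection string with an f-string: if the user name or password contains characters such as `@`, `:` or `/`, the resulting URL becomes ambiguous. A common remedy is to percent-encode the credentials first. The sketch below uses only the standard library; the helper name `build_mssql_url` and the sample credentials are made up for illustration.

```python
from urllib.parse import quote_plus


def build_mssql_url(user, password, server, database):
    """Build a SQLAlchemy URL for pymssql with percent-encoded credentials."""
    return (
        f"mssql+pymssql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{server}/{database}"
    )


# Hypothetical credentials containing characters that need escaping
url = build_mssql_url("hiker", "p@ss:word/1", "example.database.windows.net", "hikingdb")
print(url)
```

The returned string could then be passed to `create_engine` in place of the f-string used above.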
