### Notebook purpose
- Connect to Overpass API
- Extract coordinates of all hiking trails within Switzerland
- Convert the data into a pandas DataFrame
- Create a table in a SQL database (hosted on Microsoft Azure)
- Store the coordinates in the SQL database

overpy is a Python library for querying and processing data from the Overpass API, a read-only interface to OpenStreetMap (OSM) data. This notebook uses the Overpass API to retrieve hiking trails from OSM.

The query searches a bounding box for hiking routes that are signposted with specific trail markings (the `osmc:symbol` tag).
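
As a minimal sketch of how such a query can be assembled, the helper below joins a list of `osmc:symbol` values into the regular expression used by the tag filter and appends the bounding box. The name `build_hiking_query` and its parameters are hypothetical and not part of overpy; the tag values and bounding box are the ones used in this notebook.

```python
def build_hiking_query(symbols, bbox):
    """Return an Overpass QL query for local hiking routes (hypothetical helper).

    symbols: osmc:symbol values to match, combined into one regex with '|'
    bbox:    (south, west, north, east) in decimal degrees
    """
    symbol_regex = "|".join(symbols)
    south, west, north, east = bbox
    return (
        "[out:json];\n"
        "relation\n"
        '["route"="hiking"]\n'
        '["name"!~"fixme", i]\n'
        '["network"="lwn"]\n'
        f'["osmc:symbol"~"{symbol_regex}"]\n'
        f"({south}, {west}, {north}, {east});\n"
        "out center tags;\n"
    )


# Trail markings and bounding box taken from this notebook
SYMBOLS = [
    "yellow::yellow_diamond",
    "red:white:red_bar",
    "yellow:white:yellow_diamond",
    "blue:white:blue_bar",
]
BBOX = (45.8899, 6.0872, 47.8085, 10.4921)

query = build_hiking_query(SYMBOLS, BBOX)
print(query)
```

The resulting string can be passed to `api.query(query)`. Keeping the symbols in a list makes it easier to add or remove a marking than editing the regular expression by hand.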

If a route has no name but its `from` and `to` tags exist, the name is composed from those two tags.
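
The fallback rule can be sketched as a small function. The name `resolve_name` is hypothetical, and the sample tag dictionaries are made up for illustration; only the rule itself comes from this notebook.

```python
def resolve_name(tags):
    """Return the route name, falling back to 'from - to' (hypothetical helper)."""
    name = tags.get("name")
    start = tags.get("from")
    end = tags.get("to")
    if not name and start and end:
        return f"{start} - {end}"
    return name  # may be None if there is no name and an endpoint is missing


# Illustrative tag dictionaries, not real OSM data
print(resolve_name({"name": "Wanderwege SG"}))
print(resolve_name({"from": "Folenweid", "to": "Baldern"}))
print(resolve_name({"from": "Folenweid"}))
```

Routes for which the function returns `None` have no usable name and are skipped when the rows are collected.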



sqlalchemy connects Python code to SQL databases and simplifies working with relational databases. It also gives direct access to the database, which keeps the workflow flexible.
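
sqlalchemy is given a connection URL of the form `mssql+pymssql://user:password@server/database`. If the password contains characters such as `@` or `/`, the URL is no longer parsed correctly unless the credentials are escaped. The sketch below shows one way to do that with the standard library. The helper name `build_mssql_url` and all sample values are assumptions for illustration.

```python
from urllib.parse import quote_plus


def build_mssql_url(user, password, server, database):
    """Build a 'mssql+pymssql' URL with escaped credentials (hypothetical helper)."""
    return (
        f"mssql+pymssql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{server}/{database}"
    )


# Placeholder values for illustration only
url = build_mssql_url("etl_user", "p@ss/word", "example.database.windows.net", "hiking")
print(url)
```

The returned string can be passed to `create_engine`. SQLAlchemy 1.4 and later also provide `sqlalchemy.engine.URL.create`, which takes the parts as separate arguments and handles the escaping itself.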

In [34]:
# Import required libraries
import json
import os

import overpy
import pandas as pd
import pymssql
from sqlalchemy import create_engine

In [35]:
# Initialize the Overpass API with a custom URL
api = overpy.Overpass(url="http://overpass.osm.ch/api/interpreter")

# Overpass query for hiking trails inside a bounding box around Switzerland.
# 'out center' returns one coordinate pair per relation: the center of its bounding box
query = """
[out:json];
relation
["route"="hiking"]
/*["name"]*/
["name"!~"fixme", i]
["network"="lwn"]
["osmc:symbol"~"yellow::yellow_diamond|red:white:red_bar|yellow:white:yellow_diamond|blue:white:blue_bar"]
/*(id: 1432463)*/
(45.8899, 6.0872, 47.8085, 10.4921);
out center tags;
"""

# Execute the request
result = api.query(query)

# List to store the extracted rows (named 'rows' to avoid shadowing the built-in 'list')
rows = []

# Iterate over all relations
for relation in result.relations:

    # Extract relevant tags
    org_name = relation.tags.get('name')
    org_to = relation.tags.get('to')
    org_from = relation.tags.get('from')

    # 'out center' exposes the center as two separate attributes
    lat = relation.center_lat
    lon = relation.center_lon

    # If the original name is not available, construct it from 'from' and 'to'
    if not org_name and org_from and org_to:
        fix_name = f"{org_from} - {org_to}"
    else:
        fix_name = org_name

    # Keep only relations that have a name and a valid center
    if fix_name and lat is not None and lon is not None and lat > 0 and lon > 0:
        # Each dictionary becomes one row of the DataFrame
        rows.append({
            'id': relation.id,
            'name': fix_name,
            'lat': lat,
            'lon': lon,
        })

# Once all data is processed, create the DataFrame
df_wanderwege = pd.DataFrame(rows)



In [36]:
# Add the date and time of the API call to the DataFrame
df_wanderwege["timestamp_apicall"] = pd.Timestamp.now().strftime("%Y-%m-%d %H:%M:%S")

# Change column order and print dataframe
col_order = ['timestamp_apicall', 'id', 'name', 'lat', 'lon']
df_wanderwege = df_wanderwege[col_order]

print(df_wanderwege.info())
print("-----------------------------------------")
print(df_wanderwege.head(5))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15196 entries, 0 to 15195
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   timestamp_apicall  15196 non-null  object
 1   id                 15196 non-null  int64 
 2   name               15196 non-null  object
 3   lat                15196 non-null  object
 4   lon                15196 non-null  object
dtypes: int64(1), object(4)
memory usage: 593.7+ KB
None
-----------------------------------------
     timestamp_apicall      id                                          name  \
0  2024-09-20 16:05:45   22614  Nationalpark Wanderroute 15 (Munt la Schera)   
1  2024-09-20 16:05:45  103607                                 Wanderwege SG   
2  2024-09-20 16:05:45  112830                Uetliberg - Uetliberg Uto Kulm   
3  2024-09-20 16:05:45  112831                           Folenweid - Baldern   
4  2024-09-20 16:05:45  112833                          Felseneg
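
The `info()` output above lists `lat` and `lon` as `object` columns. That is what one would expect if the coordinates arrive as `decimal.Decimal` objects, which is an assumption about overpy's return type and not verified here. Under that assumption, a minimal sketch of converting the values to plain floats before building the DataFrame could look like this. The helper name `coords_to_float` and the sample rows are hypothetical.

```python
from decimal import Decimal

# Sample rows shaped like the dictionaries built in the loop above;
# the names and coordinate values are made up for illustration.
rows = [
    {"id": 1, "name": "Route A", "lat": Decimal("46.6500"), "lon": Decimal("10.2000")},
    {"id": 2, "name": "Route B", "lat": Decimal("47.3500"), "lon": Decimal("8.4900")},
]


def coords_to_float(rows):
    """Return copies of the rows with 'lat' and 'lon' converted to float."""
    return [{**row, "lat": float(row["lat"]), "lon": float(row["lon"])} for row in rows]


converted = coords_to_float(rows)
print(converted[0])
```

Numeric columns are usually stored as floating-point columns in the database instead of text, which makes later distance or range queries simpler. On an existing DataFrame, `df_wanderwege[['lat', 'lon']].astype(float)` should have the same effect.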

In [37]:
# Get current working directory
current_dir = os.getcwd()
print(current_dir)

# c:\Users\etien\OneDrive\02_Progression\CAS_DataEngineering_ZHAW\03_Leistungsnachweis\Wanderwege\notebooks

/Users/tom/Git/Wanderwege/Wanderwege/notebooks


In [38]:
# Load database access configuration from config/db_config.json
with open('../config/db_config.json', 'r') as f:
    db_config = json.load(f)

# Access db credentials
server = db_config['server']
database = db_config['database']
db_user = db_config['db_user']
db_password = db_config['db_password']
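
A missing key in `db_config.json` only surfaces as a bare `KeyError` in the cell above. As a hedged sketch, the loader below checks for the four keys this notebook reads and reports all missing ones at once. The function name `load_db_config` and the constant `REQUIRED_KEYS` are hypothetical.

```python
import json

# The four keys this notebook reads from the config file
REQUIRED_KEYS = ("server", "database", "db_user", "db_password")


def load_db_config(path):
    """Load the JSON config and fail early if a required key is missing."""
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"Missing keys in {path}: {', '.join(missing)}")
    return config


# Usage in this notebook (path as used in the cell above):
# db_config = load_db_config('../config/db_config.json')
```

Keeping the credentials in a separate file that is excluded from version control avoids committing them together with the notebook.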

In [39]:
# Create the SQLAlchemy engine using the pymssql driver
engine = create_engine(f"mssql+pymssql://{db_user}:{db_password}@{server}/{database}")


In [40]:


# The engine created in the previous cell is reused here

# Write the DataFrame to the MSSQL table, replacing it if it already exists
df_wanderwege.to_sql(name='www_db', con=engine, if_exists='replace', index=False)

print("DataFrame successfully loaded into the MSSQL database!")

# Release the engine's pooled connections
engine.dispose()


DataFrame successfully loaded into the MSSQL database!
