## Extract Industrial and Park Area Data

This notebook queries the OpenStreetMap Overpass API for industrial areas and parks around a point of interest in Ulm, Germany, and visualizes them with the Folium library. The results are saved as CSV files, loaded into GeoDataFrames, and tested against two buffer zones around the location: 5 km for industrial areas and 1 km for parks. The final output is an interactive map, displayed inline in the notebook, that shows the point of interest, the nearby industrial areas and parks, and both buffer zones.

In [1]:
# !pip install requests geopandas shapely folium

In [2]:
import requests
import json
from datetime import datetime
import time
from bs4 import BeautifulSoup
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, Polygon, LineString
import matplotlib.pyplot as plt
import folium
from IPython.display import display, HTML
import ast
import sqlite3
import os

In [3]:
latitude = 48.4029558
longitude = 9.9559714

In [4]:
# Define radii for parks and industrial areas
radius_parks = 1000  # 1 km for parks
radius_industrial = 5000  # 5 km for industrial areas

# Define the Overpass API URL and query
overpass_url = "https://overpass-api.de/api/interpreter"
overpass_query = f"""
[out:json];
(
  // Industrial areas
  node["landuse"="industrial"](around:{radius_industrial},{latitude},{longitude});
  way["landuse"="industrial"](around:{radius_industrial},{latitude},{longitude});
  relation["landuse"="industrial"](around:{radius_industrial},{latitude},{longitude});
  // Green spaces
  node["leisure"="park"](around:{radius_parks},{latitude},{longitude});
  way["leisure"="park"](around:{radius_parks},{latitude},{longitude});
  relation["leisure"="park"](around:{radius_parks},{latitude},{longitude});
);
out body;
>;
out skel qt;
"""

# Send the request to the Overpass API and fail early on HTTP errors
response = requests.get(overpass_url, params={'data': overpass_query}, timeout=60)
response.raise_for_status()
data = response.json()

# Extract nodes with coordinates directly
def extract_nodes_with_coords(elements):
    return [
        {
            "type": element["type"],
            "id": element["id"],
            "lat": element.get("lat"),
            "lon": element.get("lon"),
            "tags": element.get("tags", {})
        }
        for element in elements if element["type"] == "node"
    ]

# Extract ways, using the coordinates of the first node as a simple stand-in for the centroid.
# Relations carry "members" instead of "nodes" in Overpass JSON, so they are skipped here.
def extract_ways_and_relations(elements):
    # Index nodes by id once so each lookup is O(1) instead of a scan over all elements
    nodes_by_id = {}
    for element in elements:
        if element["type"] == "node":
            nodes_by_id.setdefault(element["id"], element)

    ways_relations = []
    for element in elements:
        if element["type"] in ["way", "relation"]:
            nodes = element.get("nodes", [])
            if nodes:
                # Get coordinates of the first node as a simple approach
                node = nodes_by_id.get(nodes[0])
                if node:
                    ways_relations.append({
                        "type": element["type"],
                        "id": element["id"],
                        "lat": node.get("lat"),
                        "lon": node.get("lon"),
                        "tags": element.get("tags", {})
                    })
    return ways_relations

# Extract and combine the data
nodes_with_coords = extract_nodes_with_coords(data['elements'])
ways_and_relations = extract_ways_and_relations(data['elements'])
combined_data = nodes_with_coords + ways_and_relations

# Separate industrial areas and parks
industrial_areas = [element for element in combined_data if element['tags'].get('landuse') == 'industrial']
parks = [element for element in combined_data if element['tags'].get('leisure') == 'park']

# Convert to DataFrames
df_industrial_areas = pd.DataFrame(industrial_areas)
df_parks = pd.DataFrame(parks)

# Display the DataFrames
print("Industrial Areas DataFrame:")
display(df_industrial_areas.head())

print("Parks DataFrame:")
display(df_parks.head())

# Save the DataFrames to CSV files
df_industrial_areas.to_csv('../data/raw/industrial_areas.csv', index=False)
df_parks.to_csv('../data/raw/parks.csv', index=False)

print("Data has been saved to CSV files.")

Industrial Areas DataFrame:


Unnamed: 0,type,id,lat,lon,tags
0,way,15527184,48.393017,9.967492,"{'addr:housenumber': '85', 'addr:postcode': '8..."
1,way,22811953,48.35317,9.923839,"{'landuse': 'industrial', 'name': 'Industriege..."
2,way,28749351,48.429284,9.985314,"{'landuse': 'industrial', 'name': 'Industriege..."
3,way,60779922,48.374191,10.003569,"{'amenity': 'drugs_firm', 'landuse': 'industri..."
4,way,61940676,48.374296,9.965722,"{'landuse': 'industrial', 'name': 'Donaukraftw..."


Parks DataFrame:


Unnamed: 0,type,id,lat,lon,tags
0,way,4706683,48.39835,9.97332,"{'leisure': 'park', 'name': 'Blauinsel'}"
1,way,32891283,48.409391,9.966478,"{'leisure': 'park', 'name': 'Park Fort Unterer..."
2,way,363274190,48.410229,9.955586,{'leisure': 'park'}
3,way,910837410,48.398561,9.962432,{'leisure': 'park'}


Data has been saved to CSV files.
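
The cell above places each way at its first node, and relations drop out because Overpass lists their parts under `members` rather than `nodes`. One possible alternative, sketched here under assumptions, is to request a computed center with `out center;` and read the `center` field that Overpass then attaches to ways and relations. Note that this center is the middle of the element's bounding box, not a true centroid. The helper names `build_center_query` and `element_coords` are hypothetical and not part of the notebook, and the sample element is made up for illustration.

```python
def build_center_query(lat, lon, radius_industrial=5000, radius_parks=1000):
    """Build an Overpass QL query that returns a center point for ways and relations."""
    # `nwr` is Overpass shorthand for node, way, and relation
    return f"""
[out:json];
(
  nwr["landuse"="industrial"](around:{radius_industrial},{lat},{lon});
  nwr["leisure"="park"](around:{radius_parks},{lat},{lon});
);
out center;
"""


def element_coords(element):
    """Return (lat, lon) for a node, or the bounding-box center of a way/relation."""
    if element["type"] == "node":
        return element.get("lat"), element.get("lon")
    center = element.get("center", {})
    return center.get("lat"), center.get("lon")


# Made-up element in the shape Overpass returns with `out center;`
sample_way = {
    "type": "way",
    "id": 1,
    "center": {"lat": 48.40, "lon": 9.97},
    "tags": {"leisure": "park"},
}
print(element_coords(sample_way))
```

With this approach the follow-up `>; out skel qt;` step is no longer needed, since no member nodes have to be resolved.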


In [5]:
# Define coordinates
latitude = 48.4029558
longitude = 9.9559714

# Load the industrial areas and parks data saved in the previous cell
df_industrial_areas = pd.read_csv('../data/raw/industrial_areas.csv')
df_parks = pd.read_csv('../data/raw/parks.csv')

# Convert to GeoDataFrame
gdf_industrial_areas = gpd.GeoDataFrame(
    df_industrial_areas, geometry=gpd.points_from_xy(df_industrial_areas.lon, df_industrial_areas.lat))
gdf_parks = gpd.GeoDataFrame(
    df_parks, geometry=gpd.points_from_xy(df_parks.lon, df_parks.lat))

# Set the coordinate reference system (CRS)
gdf_industrial_areas.set_crs(epsg=4326, inplace=True)
gdf_parks.set_crs(epsg=4326, inplace=True)

# Define the point of interest
poi = Point(longitude, latitude)

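# Buffer radii converted from meters to degrees (1 degree of latitude is roughly 111,320 m).
# Approximation: a degree of longitude is shorter at this latitude, so the buffers
# cover less ground east-west than north-south.
buffer_distance_industrial = 5000 / 111320.0  # 5 km in degrees
buffer_distance_parks = 1000 / 111320.0  # 1 km in degrees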

# Create buffers around the point of interest
poi_buffer_industrial = poi.buffer(buffer_distance_industrial)
poi_buffer_parks = poi.buffer(buffer_distance_parks)

# Find industrial areas and parks within the buffers
industrial_areas_within_buffer = gdf_industrial_areas[gdf_industrial_areas.intersects(poi_buffer_industrial)]
parks_within_buffer = gdf_parks[gdf_parks.intersects(poi_buffer_parks)]

# Create a folium map centered around the point of interest
map_center = [latitude, longitude]
m = folium.Map(location=map_center, zoom_start=13)

# Add the point of interest to the map
folium.Marker(
    location=[latitude, longitude],
    popup='Point of Interest',
    icon=folium.Icon(color='red')
).add_to(m)

# Add industrial areas to the map
for idx, row in industrial_areas_within_buffer.iterrows():
    folium.Marker(
        location=[row['lat'], row['lon']],
        popup=f"Industrial Area: {row['id']}",
        icon=folium.Icon(color='blue', icon='industry', prefix='fa')
    ).add_to(m)

# Add parks to the map
for idx, row in parks_within_buffer.iterrows():
    folium.Marker(
        location=[row['lat'], row['lon']],
        popup=f"Park: {row['id']}",
        icon=folium.Icon(color='green', icon='tree', prefix='fa')
    ).add_to(m)

# Add buffers to the map
folium.GeoJson(
    poi_buffer_industrial,
    style_function=lambda x: {'color': 'blue', 'fillColor': 'blue', 'fillOpacity': 0.1},
    name='5 km Buffer (Industrial)'
).add_to(m)

folium.GeoJson(
    poi_buffer_parks,
    style_function=lambda x: {'color': 'green', 'fillColor': 'green', 'fillOpacity': 0.1},
    name='1 km Buffer (Parks)'
).add_to(m)

# Add layer control to the map
folium.LayerControl().add_to(m)

# Display the map inline in Jupyter Notebook
display(HTML(m._repr_html_()))
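
The buffers in the cell above are circles in degrees, so measured in meters they are narrower east-west than north-south near 48° N. A minimal sketch of a distance check in meters, using the haversine formula with a mean Earth radius of 6,371,000 m. The constant and the helper name `haversine_m` are assumptions for illustration; the two park coordinates are copied from the parks output earlier in the notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


poi_lat, poi_lon = 48.4029558, 9.9559714

# (name, lat, lon) rows taken from the parks DataFrame output
park_points = [
    ("Blauinsel", 48.39835, 9.97332),
    ("Park Fort Unterer Eselsberg", 48.409391, 9.966478),
]

# Distance of each stored park coordinate from the point of interest, in meters
distances = {
    name: haversine_m(poi_lat, poi_lon, lat, lon)
    for name, lat, lon in park_points
}
for name, meters in distances.items():
    print(f"{name}: {meters:.0f} m")
```

Filtering on `meters <= radius` gives a radius that is the same in every direction. Reprojecting the GeoDataFrames to a metric CRS before buffering would be another option.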

---

## Load and Clean Industrial Areas

In [11]:
def preprocess_tags(tags_str: str) -> dict:
    """
    Preprocess the tags column by converting the string representation of a dictionary into an actual dictionary.

    Parameters:
    tags_str (str): String representation of a dictionary.

    Returns:
    dict: Dictionary containing the tags.
    """
    try:
        tags = ast.literal_eval(tags_str)
    except (ValueError, SyntaxError):
        # NaN cells raise ValueError; truncated or malformed strings raise SyntaxError
        tags = {}
    return tags

# Load the industrial areas CSV file
file_path_industrial = '../data/raw/industrial_areas.csv'
df_industrial_areas = pd.read_csv(file_path_industrial)

# Apply preprocessing to the tags column
df_industrial_areas['tags'] = df_industrial_areas['tags'].apply(preprocess_tags)

# Expand tags dictionary into separate columns
tags_df_industrial = df_industrial_areas['tags'].apply(pd.Series)
df_processed_industrial = pd.concat([df_industrial_areas.drop(columns=['tags']), tags_df_industrial], axis=1)

# Display the processed dataframe
df_processed_industrial.head()


Unnamed: 0,type,id,lat,lon,addr:housenumber,addr:postcode,addr:street,landuse,name,website,...,old_name,contact:email,contact:fax,contact:phone,description,check_date,cargo,company,logistics,note
0,way,15527184,48.393017,9.967492,85.0,89077.0,Woerthstraße,industrial,Hensoldt/ Airbus - Defence & Space,http://airbusdefenceandspace.com/,...,,,,,,,,,,
1,way,22811953,48.35317,9.923839,,,,industrial,Industriegebiet Donautal,,...,,,,,,,,,,
2,way,28749351,48.429284,9.985314,,,,industrial,Industriegebiet Franzenhauserweg,,...,,,,,,,,,,
3,way,60779922,48.374191,10.003569,,,,industrial,Nuvisan GmbH,,...,,,,,,,,,,
4,way,61940676,48.374296,9.965722,,,,industrial,Donaukraftwerk Wiblingen,,...,,,,,,,,,,
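
Writing the DataFrame to CSV stores each `tags` dictionary as its string form, and `ast.literal_eval` turns that string back into a dictionary on load. A small standard-library sketch of the round trip and its failure cases follows. `parse_tags` is a hypothetical variant of `preprocess_tags` that also catches `SyntaxError` and rejects results that are not dictionaries; the input strings are illustrative.

```python
import ast


def parse_tags(value):
    """Parse a stringified tags dict; return {} for anything that is not one."""
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # ValueError: non-literal input such as a NaN cell
        # SyntaxError: truncated or otherwise malformed string
        return {}
    return parsed if isinstance(parsed, dict) else {}


good = parse_tags("{'leisure': 'park', 'name': 'Blauinsel'}")
truncated = parse_tags("{'leisure': 'park'")  # missing closing brace
missing = parse_tags(float("nan"))            # empty CSV cell read by pandas
not_a_dict = parse_tags("42")                 # valid literal, wrong type

print(good, truncated, missing, not_a_dict)
```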


In [12]:
# Print the number of NaN values per column
print("Number of NaN values per column in industrial areas dataframe:")
print(df_processed_industrial.isna().sum())

# Keep only the required columns
required_columns_industrial = ['id', 'lat', 'lon', 'name']
df_processed_industrial = df_processed_industrial[required_columns_industrial]

# Display the optimized dataframe
print("Optimized Industrial Areas DataFrame:")
df_processed_industrial.head()

Number of NaN values per column in industrial areas dataframe:
type                             0
id                               0
lat                              0
lon                              0
addr:housenumber                39
addr:postcode                   40
addr:street                     39
landuse                          0
name                            13
website                         35
amenity                         39
plant:output:electricity        38
plant:source                    38
power                           35
wikidata                        38
wikipedia                       39
comment                         40
source                          40
url                             38
craft                           39
industrial                      39
alt_name                        40
barrier                         40
building                        37
frequency                       40
operator                        33
plant:method               

Unnamed: 0,id,lat,lon,name
0,15527184,48.393017,9.967492,Hensoldt/ Airbus - Defence & Space
1,22811953,48.35317,9.923839,Industriegebiet Donautal
2,28749351,48.429284,9.985314,Industriegebiet Franzenhauserweg
3,60779922,48.374191,10.003569,Nuvisan GmbH
4,61940676,48.374296,9.965722,Donaukraftwerk Wiblingen


In [13]:
# Load the parks CSV file
file_path_parks = '../data/raw/parks.csv'
df_parks = pd.read_csv(file_path_parks)

# Apply preprocessing to the tags column
df_parks['tags'] = df_parks['tags'].apply(preprocess_tags)

# Expand tags dictionary into separate columns
tags_df_parks = df_parks['tags'].apply(pd.Series)
df_processed_parks = pd.concat([df_parks.drop(columns=['tags', 'type']), tags_df_parks], axis=1)

# Display the processed dataframe
df_processed_parks.head()

Unnamed: 0,id,lat,lon,leisure,name
0,4706683,48.39835,9.97332,park,Blauinsel
1,32891283,48.409391,9.966478,park,Park Fort Unterer Eselsberg
2,363274190,48.410229,9.955586,park,
3,910837410,48.398561,9.962432,park,


The industrial areas and parks data describe the surroundings of the measurement spot at Lupferbrücke (Ulm in der Wanne). Because these values are static, they are the same for every timestamp and carry no signal for the time series analysis or predictions. With only one sensor location, there is also no second site to compare against, so the static features cannot explain differences between locations. The datasets are therefore useful as background context, but they do not improve predictive models built on the dynamic pollutant data from the Lupferbrücke sensor.