## Pharmacies

This notebook describes the preprocessing of data on all pharmacies in Germany.
The temporary source is OpenStreetMap (via its `amenity=pharmacy` tag); official data from ABDA (Federal Union of German Associations of Pharmacists) is pending.

The goal is to extract name and location for each pharmacy and store the data in the main pharmalink module.

### Source:

Germany extract of the [OpenStreetMap](https://openstreetmap.org/copyright) planet file, provided by [Geofabrik](https://geofabrik.de).

[Website (Germany -> .osm.pbf)](https://download.geofabrik.de/europe.html) and [File](https://download.geofabrik.de/europe/germany-latest.osm.pbf)


In [1]:
import pathlib as path
import requests as req
from tqdm.auto import tqdm
import osmium as osm
import geopandas as gpd
from io import BytesIO
import lzma

In [2]:
# Establish the notebook path so that relative paths resolve against the notebook's directory
notebook_path = path.Path().resolve()

if notebook_path.stem != "pharmacies":
    raise RuntimeError(
        "The working directory must be the notebook's parent directory ('pharmacies'). Please fix and re-run."
    )

In [3]:
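# Ensure the OSM extract for Germany is available locally; download it once if missing
osm_germany = notebook_path.joinpath("germany.osm.pbf")

if not osm_germany.exists():
    file_url = "https://download.geofabrik.de/europe/germany-latest.osm.pbf"

    # Stream the response so the large file is never held in memory as a whole;
    # the context manager releases the connection when the download ends
    with req.get(file_url, stream=True, timeout=60) as response:
        response.raise_for_status()

        # The content-length header may be missing, in which case the total falls back to 0
        file_size = int(response.headers.get("content-length", 0))

        with tqdm.wrapattr(
            open(osm_germany, "wb"),
            "write",
            miniters=1,
            total=file_size,
            desc=f"Downloading {osm_germany.name}",
        ) as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)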
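
The download cell writes straight into `germany.osm.pbf`, so an interrupted download would leave a truncated file that the `exists()` check then treats as complete. One way to guard against this is to write to a temporary file first and rename it only after the last chunk has arrived. The following is a minimal sketch using only the standard library; the helper name `write_atomically` and the `.part` suffix are illustrative assumptions, not part of the notebook.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_atomically(chunks: Iterable[bytes], target: Path) -> int:
    """Write chunks to a temporary sibling file, then rename it to `target`.

    Returns the number of bytes written. If writing fails, the temporary
    file is removed and `target` is left untouched.
    """
    target = Path(target)
    # Create the temporary file next to the target so the final rename
    # stays on one filesystem
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            for chunk in chunks:
                written += tmp_file.write(chunk)
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the partial file on any failure, including interrupts
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return written
```

In the download cell, the chunk iterator returned by `iter_content(chunk_size=8192)` could be passed as `chunks` and `osm_germany` as `target`; the progress bar would then need to wrap the iterator instead of the file handle.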

In [4]:
# Keep all nodes and ways tagged "amenity=pharmacy" and extract their name and geometry
file_processor = (
    osm.FileProcessor(str(osm_germany))
    .with_areas()
    .with_filter(osm.filter.EmptyTagFilter())
    .with_filter(osm.filter.EntityFilter(osm.osm.NODE | osm.osm.WAY))
    .with_filter(osm.filter.TagFilter(("amenity", "pharmacy")))
    .with_filter(osm.filter.GeoInterfaceFilter(tags=["name"]))
)

pharmacies = gpd.GeoDataFrame.from_features(
    file_processor, crs="EPSG:4326", columns=["name", "geometry"]
)

# Ensure that all pharmacies are represented as points; non-point geometries are replaced by their centroid
pharmacies["geometry"] = pharmacies["geometry"].apply(
    lambda geom: geom if geom.geom_type == "Point" else geom.centroid
)
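
To make the centroid step more concrete, here is a minimal pure-Python sketch of a length-weighted centroid. It assumes the non-point geometry is a line (for example a building outline mapped as an OSM way); for polygon geometries the centroid would be weighted by area instead. The function `line_centroid` is a hypothetical illustration, not the routine the notebook calls, and it treats longitude and latitude as planar coordinates, which is a reasonable approximation only for small shapes such as single buildings.

```python
import math
from typing import Sequence, Tuple

Coordinate = Tuple[float, float]


def line_centroid(coords: Sequence[Coordinate]) -> Coordinate:
    """Length-weighted centroid of a polyline given as (x, y) pairs."""
    if len(coords) < 2:
        raise ValueError("a line needs at least two coordinates")

    total_length = 0.0
    sum_x = 0.0
    sum_y = 0.0
    # Each segment contributes its midpoint, weighted by the segment length
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        total_length += length
        sum_x += length * (x0 + x1) / 2
        sum_y += length * (y0 + y1) / 2

    if total_length == 0:
        raise ValueError("degenerate line with zero length")
    return sum_x / total_length, sum_y / total_length


# A closed square outline collapses to its centre
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (0.0, 0.0)]
print(line_centroid(square))  # (1.0, 1.0)
```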

In [5]:
len(pharmacies)  # expected: about 17,790 as of 2024-11
# ABDA reports 17,571 pharmacies for 2023.
# The difference likely stems from OSM data that is outdated or contains mis-tagged pharmacies.
# At a deviation of roughly 1%, the data is considered accurate enough for this project.

17793
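
The "about 1%" figure can be checked with a short calculation. This sketch uses only the two counts quoted in the cell above; the variable names are illustrative.

```python
osm_count = 17793   # value returned by len(pharmacies) above
abda_count = 17571  # ABDA figure for 2023 quoted in the comment above

# Relative difference of the OSM count with respect to the official figure
relative_difference = (osm_count - abda_count) / abda_count
print(f"{osm_count - abda_count} additional entries, {relative_difference:.2%}")
# prints "222 additional entries, 1.26%"
```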

In [6]:
# Save the pharmacies to a lzma compressed GeoPackage
output_file = notebook_path.joinpath("pharmacies.gpkg.xz")

# Buffering is needed because Pyogrio does not support writing to open file handles directly
with BytesIO() as buffer:
    pharmacies.to_file(buffer, layer="pharmacies", driver="GPKG")
    buffer.seek(0)

    with lzma.open(output_file, "wb", preset=9) as file:
        file.write(buffer.read())
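
Reading the compressed file back follows the reverse path: decompress into memory, then hand the buffer to GeoPandas. The sketch below covers the decompression step with the standard library only; the helper name `decompress_xz` is an illustrative assumption, and the commented-out call assumes that `gpd.read_file` accepts a file-like object together with a `layer` argument.

```python
import lzma
from io import BytesIO
from pathlib import Path


def decompress_xz(path: Path) -> BytesIO:
    """Return the decompressed content of an .xz file as an in-memory buffer."""
    with lzma.open(path, "rb") as compressed:
        return BytesIO(compressed.read())


# Hypothetical usage with the file written above (requires geopandas):
# import geopandas as gpd
# pharmacies = gpd.read_file(decompress_xz(Path("pharmacies.gpkg.xz")), layer="pharmacies")
```

The whole GeoPackage is held in memory during this step, which mirrors the buffering used when writing the file.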