## Population Grid - Preprocessing

This notebook documents how the Zensus 2022 population grid for Germany is preprocessed for the pharmalink project. \
The goal is to create a custom GeoPackage with one layer per Bundesland (state) containing all of its grid cells with a population > 0. \
The resulting GeoPackage is xz-compressed and included in the pharmalink package as an essential part of its internal data.

### Source: [Zensus 2022](https://www.zensus2022.de) 
The source is available as Open Data from the 2022 census conducted by the federal and state statistical offices. 

© Statistische Ämter des Bundes und der Länder, 2024, [Data license Germany – attribution – version 2.0](https://www.govdata.de/dl-de/by-2-0) (data modified)

The source was last accessed on 2024-09-10.

### Description:
Census data rendered as an INSPIRE-compliant 100 m × 100 m grid, with one integer value per cell giving the number of people living in that cell. \
For further information, see the "Datensatzbeschreibung_Bevoelkerungszahl_Gitterzellen.xlsx" file.

[Website](https://www.zensus2022.de/DE/Ergebnisse-des-Zensus/_inhalt.html#Gitterdaten2022) and [File](https://www.zensus2022.de/static/Zensus_Veroeffentlichung/Zensus2022_Bevoelkerungszahl.zip)
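
To illustrate how a grid cell relates to its midpoint, the sketch below derives the midpoint from an INSPIRE-style cell identifier. It assumes that identifiers follow the pattern `CRS3035RES100mN<northing>E<easting>` and that the two numbers give the lower-left corner of the cell in EPSG:3035 metres. Neither assumption is confirmed in this notebook, so check the dataset description before relying on it. The function name `midpoint_from_inspire_id` and the example identifier are hypothetical.

```python
import re

# Assumed ID layout: CRS3035RES100mN<northing>E<easting>,
# where both numbers describe the lower-left corner in metres.
_ID_PATTERN = re.compile(r"^CRS3035RES100mN(\d+)E(\d+)$")


def midpoint_from_inspire_id(cell_id: str, cell_size: int = 100) -> tuple[int, int]:
    """Return the (x, y) midpoint of a grid cell from its identifier."""
    match = _ID_PATTERN.match(cell_id)
    if match is None:
        raise ValueError(f"Unexpected cell id: {cell_id!r}")
    northing, easting = int(match.group(1)), int(match.group(2))
    half = cell_size // 2
    # The midpoint lies half a cell to the east and north of the corner
    return easting + half, northing + half


# Made-up identifier, used only for illustration
print(midpoint_from_inspire_id("CRS3035RES100mN2689100E4337000"))
# (4337050, 2689150)
```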

In [1]:
import pathlib as path
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon
import lzma

In [2]:
# Resolve the working directory so that the relative paths below are unambiguous
notebook_path = path.Path().resolve()

# The notebook expects to be run from its own directory, "population_grid"
if notebook_path.name != "population_grid":
    raise RuntimeError(
        "Notebook file root must be set to the parent directory of the notebook. Please resolve and re-run."
    )

In [3]:
# For source information, see above
pop_cells_file = notebook_path.joinpath(
    "Zensus2022_Bevoelkerungszahl", "Zensus2022_Bevoelkerungszahl_100m-Gitter.csv"
)

# Create a DataFrame from the population grid file
pop_cells = pd.read_csv(
    pop_cells_file,
    sep=";",
    header=0,
    names=["id", "x_mp", "y_mp", "population"],
    dtype={"id": str, "x_mp": int, "y_mp": int, "population": int},
    index_col="id",
)

# Build each cell's 100 m x 100 m square around its midpoint (x_mp, y_mp)
pop_cells["geometry"] = pop_cells.apply(
    lambda row: Polygon(
        [
            (row["x_mp"] - 50, row["y_mp"] - 50),
            (row["x_mp"] + 50, row["y_mp"] - 50),
            (row["x_mp"] + 50, row["y_mp"] + 50),
            (row["x_mp"] - 50, row["y_mp"] + 50),
        ]
    ),
    axis=1,
)

# Clean up the DataFrame
pop_cells = pop_cells.reset_index(drop=True)
pop_cells = pop_cells[["geometry", "population"]]

# Create a GeoDataFrame from the DataFrame
pop_cells = gpd.GeoDataFrame(pop_cells, crs="EPSG:3035")
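
The geometry step in the cell above amounts to offsetting each midpoint by half the cell size in both directions. A minimal plain-Python sketch of that arithmetic follows; the helper `cell_corners` and the midpoint values are hypothetical and not part of the notebook.

```python
def cell_corners(x_mp: int, y_mp: int, cell_size: int = 100) -> list[tuple[int, int]]:
    """Corners of the square cell around a midpoint, counter-clockwise from the lower left."""
    half = cell_size // 2
    return [
        (x_mp - half, y_mp - half),  # lower left
        (x_mp + half, y_mp - half),  # lower right
        (x_mp + half, y_mp + half),  # upper right
        (x_mp - half, y_mp + half),  # upper left
    ]


# Made-up midpoint in EPSG:3035 metres
corners = cell_corners(4337050, 2689150)
print(corners)
# [(4337000, 2689100), (4337100, 2689100), (4337100, 2689200), (4337000, 2689200)]
```

For the full grid, a vectorized constructor such as `shapely.box` (Shapely 2.0 and later) may be considerably faster than a row-wise `apply`, although this has not been measured for this notebook.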

In [4]:
# Import the Bundesländer areas from sources/admin_areas
admin_areas_path = notebook_path.parent.joinpath("admin_areas", "admin_areas.gpkg")

admin_areas = gpd.read_file(admin_areas_path, engine="pyogrio", use_arrow=True)

bundeslaender = admin_areas[admin_areas["level"] == "land"]

bundeslaender = bundeslaender.set_index("regkey")

bundeslaender = bundeslaender.to_crs("EPSG:3035")

In [7]:
# Filter the cells for each Bundesland and write them to a layer in the output GeoPackage

output_file = notebook_path.joinpath("population_grid.gpkg.xz")

if output_file.exists():
    output_file.unlink()

with lzma.open(output_file, "wb", preset=9) as archive:

    for _, bundesland in bundeslaender.iterrows():

        land_name = bundesland["geo_name"]
        land_geometry = bundesland["geometry"]

        # Note: cells that intersect a state border are selected for every state they touch
        land_cells = pop_cells[pop_cells.intersects(land_geometry)]

        land_cells.to_file(archive, layer=land_name, driver="GPKG")

        print(f"Wrote {len(land_cells)} cells for {land_name}")

Wrote 138894 cells for Schleswig-Holstein
Wrote 29195 cells for Hamburg
Wrote 416520 cells for Niedersachsen
Wrote 12730 cells for Bremen
Wrote 548405 cells for Nordrhein-Westfalen
Wrote 191956 cells for Hessen
Wrote 164209 cells for Rheinland-Pfalz
Wrote 346891 cells for Baden-Württemberg
Wrote 564846 cells for Bayern
Wrote 39650 cells for Saarland
Wrote 41053 cells for Berlin
Wrote 139081 cells for Brandenburg
Wrote 88185 cells for Mecklenburg-Vorpommern
Wrote 175398 cells for Sachsen
Wrote 101430 cells for Sachsen-Anhalt
Wrote 92199 cells for Thüringen
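
GeoPackage is an SQLite-based container, so readers generally need a regular file on disk to open it. One possible way to handle the `.xz` archive is to compress a finished file as a last step and to decompress it to a temporary location before reading. The sketch below uses only the standard library. The helper names `compress_xz` and `decompress_xz` and the file names are hypothetical, and this is not necessarily how pharmalink handles the archive.

```python
import lzma
import pathlib
import shutil
import tempfile


def compress_xz(src: pathlib.Path, dst: pathlib.Path, preset: int = 9) -> None:
    """Compress a finished file (for example a GeoPackage) into an .xz archive."""
    with open(src, "rb") as plain, lzma.open(dst, "wb", preset=preset) as archive:
        shutil.copyfileobj(plain, archive)


def decompress_xz(src: pathlib.Path, dst: pathlib.Path) -> None:
    """Restore the original file from an .xz archive."""
    with lzma.open(src, "rb") as archive, open(dst, "wb") as plain:
        shutil.copyfileobj(archive, plain)


# Round trip with placeholder bytes standing in for a real GeoPackage
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = pathlib.Path(tmp)
    original = tmp_dir / "example.gpkg"
    original.write_bytes(b"placeholder content" * 1000)

    compress_xz(original, tmp_dir / "example.gpkg.xz")
    decompress_xz(tmp_dir / "example.gpkg.xz", tmp_dir / "restored.gpkg")

    round_trip_ok = (tmp_dir / "restored.gpkg").read_bytes() == original.read_bytes()

print(round_trip_ok)
```

Compressing only after all layers have been written keeps the GeoPackage a single, complete file inside the archive, which makes it straightforward to verify the layer list after decompression.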
