# Population Density Variables Generation

In this notebook we use population density data from Geodata - Eurostat to generate an exogenous variable for our dataset that may improve the forecasts.

The data is available at the following link, filtered for Austria: https://ec.europa.eu/eurostat/web/gisco/geodata

#### Load libraries

In [None]:
import geopandas as gpd
import pandas as pd
import numpy as np
import h3

#### Load auxiliary data

In [12]:
# Load population density file without geometry, focusing on population and H3 ID
file_path = 'auxiliary_data/Population_density/kontur_population_AT_20231101.gpkg'
data = gpd.read_file(file_path, layer='population', ignore_geometry=True)

In [13]:
df_coord = pd.read_csv('auxiliary_data/gw_coordinates_df.csv')

### Preparing data

In [None]:
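# Function to convert from latitude and longitude to H3 cell.
# The resolution must match that of the cells in the population file for the merge to work.
# Note: geo_to_h3 is the h3-py v3 name; h3-py v4 renamed it to latlng_to_cell.
def latlon_to_h3(lat, lon, resolution=8):
    return h3.geo_to_h3(lat, lon, resolution)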

In [15]:
# Convert latitude and longitude in each row to H3 cell IDs and create a new column 'h3' in df_coord
df_coord['h3'] = df_coord.apply(lambda row: latlon_to_h3(row['latitude'], row['longitude']), axis=1)

# Merge population data with coordinates dataset
df_coord = df_coord.merge(data, on='h3', how='left')
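
Before imputing anything, it can help to measure how many rows the left merge left unmatched. The following is a minimal sketch on toy data, not part of the original pipeline: the frames `coords` and `population` and their cell IDs are made-up stand-ins for `df_coord` and `data`. It relies on the `indicator` and `validate` options of `DataFrame.merge`.

```python
import pandas as pd

# Toy stand-ins for the notebook's frames; the cell IDs and values are made up.
coords = pd.DataFrame({"h3": ["cell_a", "cell_b", "cell_c"]})
population = pd.DataFrame({"h3": ["cell_a", "cell_b"], "population": [120.0, 45.0]})

# indicator=True adds a '_merge' column telling where each row came from.
# validate="many_to_one" raises if an 'h3' value repeats in `population`,
# which would otherwise silently duplicate coordinate rows.
merged = coords.merge(
    population, on="h3", how="left", indicator=True, validate="many_to_one"
)

# Share of coordinates whose cell has no entry in the population table
unmatched_share = (merged["_merge"] == "left_only").mean()
print(f"{unmatched_share:.1%} of coordinates have no population value")
```

If this check is applied to the real frames, the `_merge` column should be dropped before the result is saved.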

The merge leaves a significant number of missing values, because some coordinates fall in H3 cells that have no entry in the population file. In the following code we therefore impute each missing value with the population of the nearest H3 cell that does have data, searching up to three rings of neighbors.

In [16]:
# Function to find the nearest H3 cell with population data
def find_nearest_h3(h3_cell, data, max_k=3):
    """
    Search for the nearest H3 cell with population density data within a k-ring.
    h3_cell: The original H3 cell.
    data: The DataFrame containing H3 cells and population data.
    max_k: Maximum distance of neighborhood to consider (k-ring).
    """
    # Check in increasing rings of neighbors around the original H3 cell
    for k in range(1, max_k + 1):
        # Get the H3 cells exactly at distance k (the k-th ring)
        neighboring_cells = h3.k_ring_distances(h3_cell, k)[k]
        
        # Check if any of the neighboring cells have population data
        for neighbor in neighboring_cells:
            if neighbor in data['h3'].values:
                # If a neighboring cell with population data is found, return it
                return data.loc[data['h3'] == neighbor, 'population'].values[0]
    
    # If no cell within max_k rings has population data, return NaN
    return np.nan

# Fill NaN population values by searching for the nearest H3 cell with data
df_coord['population'] = df_coord.apply(
    lambda row: row['population'] if not np.isnan(row['population']) else find_nearest_h3(row['h3'], data), 
    axis=1
)
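
The ring search above scans the whole `h3` column for every neighbor, which can become slow on large tables. One possible alternative, sketched here under stated assumptions, is to build a dictionary once and look cells up in constant time. The names `nearest_population`, `ring_fn`, and `toy_ring` are hypothetical. The ring function is passed in as a parameter so that the example runs without the `h3` package, and a toy one-dimensional "grid" stands in for real H3 cells.

```python
import math

def nearest_population(cell, population_by_cell, ring_fn, max_k=3):
    """Return the population of the first populated cell found in rings 1..max_k.

    population_by_cell: dict mapping cell ID -> population.
    ring_fn: callable (cell, k) -> iterable of cells exactly k steps away.
    """
    for k in range(1, max_k + 1):
        # Sorting makes the choice deterministic when a ring holds several hits
        for neighbor in sorted(ring_fn(cell, k)):
            if neighbor in population_by_cell:
                return population_by_cell[neighbor]
    # Nothing found within max_k rings
    return math.nan

# Toy stand-in for an H3 ring: integer cells on a line, ring k = {cell - k, cell + k}
def toy_ring(cell, k):
    return {cell - k, cell + k}

# Made-up population values keyed by toy cell ID
population_by_cell = {10: 500.0, 14: 120.0}

# Ring 1 of cell 12 is {11, 13} (no data); ring 2 is {10, 14}, and 10 sorts first
print(nearest_population(12, population_by_cell, toy_ring))  # 500.0
```

In the notebook, the dictionary could be built once with `dict(zip(data['h3'], data['population']))`, and `ring_fn` could wrap `h3.k_ring_distances(cell, k)[k]`. As in the original function, "nearest" is resolved only at ring level: all cells in the same ring are treated as equally close.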

### Saving the resulting file

In [17]:
df_coord.to_csv('population_density_full.csv', index=False)