# POI Category by Population

This example lets you select a POI category and compute a population-weighted score that indicates how dense that
category is in a given region relative to the number of people living there.

## 1. Set Parameters

1. Set the H3 resolution to aggregate the results on.

    To see the average size of a hexagon at a given resolution, see the
    [official H3 documentation](https://h3geo.org/docs/core-library/restable). The default resolution of 8 has an
    average edge length of about 0.46 km, which can be loosely interpreted as a radius.

In [None]:
resolution = 8

2. Choose a POI category. These are the current 25 high-level categories for POIs:

    ```text
    administration, airport, apartment, art_culture, automobile, beauty, cafe, drinks, education, entertainment, food, groceries, medical, misc, office, public_service, public_transportation, recreation, religious_building, service, shopping, social_service, sport, tourism, wholesaler
    ```

In [None]:
category = 'groceries'

3. Set the number of citizens the POI count is normalized by (for example, 1000 yields POIs per 1,000 citizens).

In [None]:
per_x_citizens = 1000

4. Optionally, provide a polygon as GeoJSON coordinates to select a subregion. Otherwise, data from the entire
database will be analyzed. (The default polygon is a rough outline of Lisbon, Portugal.)

In [None]:
polygon_coords = '[[[-9.092559814453125,38.794500078219826],[-9.164314270019531,38.793429729760994],[-9.217529296875,38.76666579487878],[-9.216842651367188,38.68792166352608],[-9.12139892578125,38.70399894245585],[-9.0911865234375,38.74551518488265],[-9.092559814453125,38.794500078219826]]]'
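Before sending any queries, it can help to sanity-check the four parameters above. The following is a minimal sketch, not part of Kuwala: `check_parameters` is a hypothetical helper, the category set is copied from the list in step 2, and the example polygon is a small made-up ring. It relies only on general facts: H3 resolutions range from 0 to 15, and a GeoJSON linear ring is closed, has at least four positions, and stores each position as `[longitude, latitude]`.

```python
import json

# Category names copied from the list in step 2.
POI_CATEGORIES = frozenset((
    'administration, airport, apartment, art_culture, automobile, beauty, cafe, '
    'drinks, education, entertainment, food, groceries, medical, misc, office, '
    'public_service, public_transportation, recreation, religious_building, '
    'service, shopping, social_service, sport, tourism, wholesaler'
).split(', '))


def check_parameters(resolution, category, per_x_citizens, polygon_coords=None):
    """Hypothetical helper (not part of Kuwala): fail early on malformed parameters."""
    if not 0 <= resolution <= 15:  # H3 defines resolutions 0 through 15
        raise ValueError(f'Invalid H3 resolution: {resolution}')
    if category not in POI_CATEGORIES:
        raise ValueError(f'Unknown POI category: {category}')
    if per_x_citizens <= 0:
        raise ValueError('per_x_citizens must be positive')
    if polygon_coords is None:
        return None

    outer_ring = json.loads(polygon_coords)[0]  # the first ring is the exterior

    # A GeoJSON linear ring needs at least four positions and must be closed.
    if len(outer_ring) < 4 or outer_ring[0] != outer_ring[-1]:
        raise ValueError('Polygon ring must be closed and have at least 4 positions')
    for lng, lat in outer_ring:  # GeoJSON positions are [longitude, latitude]
        if not (-180 <= lng <= 180 and -90 <= lat <= 90):
            raise ValueError(f'Coordinate out of range: {[lng, lat]}')

    return outer_ring


# Usage with a small made-up closed ring (illustrative values only).
example_polygon = '[[[-9.2, 38.7], [-9.1, 38.7], [-9.1, 38.8], [-9.2, 38.8], [-9.2, 38.7]]]'
outer_ring = check_parameters(8, 'groceries', 1000, example_polygon)
print(len(outer_ring))
```

In the notebook, the same call could take `resolution`, `category`, `per_x_citizens`, and `polygon_coords` as set above.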

## 2. Send Queries

#### Initialize dbt controller

In [None]:
from kuwala.modules.common import get_dbt_controller

dbt_controller = get_dbt_controller()

#### Get population per hexagon.

In [None]:
from kuwala.modules.population_controller import get_population_in_polygon

population = get_population_in_polygon(dbt_controller=dbt_controller, resolution=resolution, polygon_coords=polygon_coords)

population.head(10)

#### Get number of POIs belonging to a selected category per hexagon.

In [None]:
from kuwala.modules.poi_controller import get_pois_by_category_in_polygon

pois = get_pois_by_category_in_polygon(dbt_controller=dbt_controller, category=category, resolution=resolution, polygon_coords=polygon_coords)

pois.head(n=10)

## 3. Transform the Data

#### Create a Spark session.

In [None]:
from kuwala.modules.common import get_spark_session

sp = get_spark_session(memory_in_gb=16)

#### Calculate number of POIs per x citizens

In [None]:
from pyspark.sql.functions import col

number_of_pois_in_category = f'number_of_{category}'
category_per_x = f'{category}_per_{per_x_citizens}'
population_in_x = f'population_in_{per_x_citizens}'

pois = sp.createDataFrame(pois)
population = sp.createDataFrame(population)
pois = pois.withColumnRenamed('h3_index', 'join_h3_index').withColumnRenamed('count', number_of_pois_in_category)
population = population \
    .withColumn(population_in_x, col('total') / per_x_citizens) \
    .select('h3_index', population_in_x)
result = population \
    .join(pois, population.h3_index == pois.join_h3_index, 'left') \
    .drop('join_h3_index') \
    .fillna(0, subset=[number_of_pois_in_category]) \
    .withColumn(category_per_x, col(number_of_pois_in_category) / col(population_in_x)) \
    .fillna(0, subset=[category_per_x])

result.show(n=10)
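The arithmetic behind the Spark transformation above can be traced on a few toy rows in plain Python. This is only an illustrative sketch: the hexagon names and numbers are made up, and `pois_per_x` is a hypothetical helper that mirrors the left join and the two `fillna(0)` calls. It assumes Spark runs with ANSI mode disabled, where dividing by zero yields null instead of raising an error.

```python
per_x_citizens = 1000

# Made-up toy rows: hexagon -> (population, POI count or None if no POI row matched).
cells = {
    'hex_a': (2500, 5),     # populated hexagon with POIs
    'hex_b': (800, None),   # populated hexagon without a matching POI row
    'hex_c': (0, 3),        # hexagon with POIs but zero population
}


def pois_per_x(population, poi_count, per_x):
    """Hypothetical helper mirroring the Spark steps for a single hexagon."""
    count = poi_count if poi_count is not None else 0  # fillna(0) after the left join
    population_in_x = population / per_x
    if population_in_x == 0:
        # Division by zero gives null in Spark (ANSI mode off); the final fillna(0) turns it into 0.
        return 0.0
    return count / population_in_x


scores = {h3_index: pois_per_x(pop, count, per_x_citizens) for h3_index, (pop, count) in cells.items()}
print(scores)
```

Because the join starts from the population table, a hexagon that has POIs but no population row does not appear in the result at all.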

#### Normalize the score

In [None]:
from kuwala.modules.common import scale_spark_columns

result = scale_spark_columns(df=result, columns=[category_per_x]) \
    .select('h3_index', f'{category_per_x}_scaled', category_per_x, number_of_pois_in_category, population_in_x)

result.show(n=10)
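How `scale_spark_columns` normalizes the column is not shown here. As a rough illustration only, and assuming it performs min-max scaling to the range [0, 1] (an assumption, not confirmed by this notebook), the idea can be sketched in plain Python with a hypothetical `min_max_scale` function and made-up scores:

```python
def min_max_scale(values):
    """Hypothetical stand-in: map values linearly so the minimum becomes 0 and the maximum 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


# Made-up scores, e.g. POIs per 1,000 citizens for four hexagons.
raw_scores = [2.0, 0.0, 0.5, 4.0]
scaled_scores = min_max_scale(raw_scores)
print(scaled_scores)
```

Under that assumption, the hexagon with the highest density gets a scaled score of 1 and the one with the lowest gets 0, which makes scores comparable across categories.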

## 4. Visualize Results

#### Pandas Profiling Report

In [None]:
from pandas_profiling import ProfileReport

result_pd = result.toPandas()
profile = ProfileReport(result_pd, title="Pandas Profiling Report", explorative=True)

profile.to_notebook_iframe()

#### Map

In [None]:
from unfolded.map_sdk import UnfoldedMap
from sidecar import Sidecar
from uuid import uuid4

unfolded_map = UnfoldedMap()
sc = Sidecar(title=f'{category} by Population', anchor='split-right')

with sc:
    display(unfolded_map)

dataset_id = uuid4()

unfolded_map.add_dataset({
    'uuid': dataset_id,
    'label': f'{category} per {per_x_citizens} citizens',
    'data': result_pd
})