# Kuwala - Popularity Correlation

[![open_in_colab][colab_badge]][colab_notebook_link]
<!-- [![open_in_binder][binder_badge]][binder_notebook_link] -->

[colab_badge]: https://colab.research.google.com/assets/colab-badge.svg
[colab_notebook_link]: https://colab.research.google.com/github/foursquare/fsq-studio-sdk-examples/blob/master/python-notebooks/10%20-%20Kuwala%20Popularity%20Correlation.ipynb
<!-- [binder_badge]: https://mybinder.org/badge_logo.svg
[binder_notebook_link]: https://mybinder.org/v2/gh/foursquare/fsq-studio-sdk-examples/master?urlpath=lab/tree/python-notebooks/10%20-%20Kuwala%20Popularity%20Correlation.ipynb -->

With this notebook you can correlate any value associated with a geographic location with the Google popularity score.
You can upload your own data as a CSV file. The only requirements are a header row and columns for latitude and
longitude.

The value columns can be specific to your use case, e.g., scooter bookings, shop sales or crimes. The popularity score
is aggregated per week, so ideally the values you want to correlate are aggregated on a weekly timeframe as well.
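If your values are recorded daily and span several weeks, one option (an assumption, not something this notebook
prescribes) is to sum them per ISO calendar week and then average those weekly totals per location. A minimal
standard-library sketch; `average_weekly_totals` and the `(day, location_id, value)` record layout are hypothetical.

```python
from collections import defaultdict
from datetime import date


def average_weekly_totals(records):
    """Sum (day, location_id, value) records per ISO week, then average the weekly sums per location.

    Weeks without any record for a location are not counted in its average.
    """
    weekly = defaultdict(float)
    for day, location_id, value in records:
        iso_year, iso_week, _ = day.isocalendar()
        weekly[(location_id, iso_year, iso_week)] += value

    sums = defaultdict(float)
    counts = defaultdict(int)
    for (location_id, _, _), total in weekly.items():
        sums[location_id] += total
        counts[location_id] += 1
    return {location_id: sums[location_id] / counts[location_id] for location_id in sums}


daily = [
    (date(2021, 3, 1), "shop_a", 10),  # Monday of ISO week 9
    (date(2021, 3, 7), "shop_a", 5),   # Sunday of ISO week 9
    (date(2021, 3, 8), "shop_a", 7),   # Monday of ISO week 10
    (date(2021, 3, 2), "shop_b", 4),
]
print(average_weekly_totals(daily))  # {'shop_a': 11.0, 'shop_b': 4.0}
```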

As an example, we use an open dataset from Uber that gives the number of ride traversals through specific hexagons.
You can find the raw data on their [open data platform](https://movement.uber.com/?lang=en-US). We preprocessed the raw
data so that the traversals are already aggregated per week.

### Kuwala

Kuwala is an open-source tool for building data products fast by integrating and connecting data sources from various
domains. Besides the popularity score, its pipelines include scalable worldwide POI and demographics data. Check out
Kuwala on [GitHub](https://github.com/kuwala-io/kuwala) to access more data sources and functionalities.

## Dependencies

This notebook requires the following Python dependencies:

- `foursquare.map-sdk`: The Studio Map SDK
- `pandas`: DataFrame library
- `h3`: Hexagonal hierarchical geospatial indexing system (this notebook uses the v3 API, so install `h3<4`)
- `geojson`: Helpers for working with GeoJSON data
- `pandas-profiling`: Exploratory data analysis on pandas DataFrames

If running this notebook in Binder, these dependencies should already be installed. If running in Colab, the next cell will install these dependencies.

In [None]:
# If in Colab, install this notebook's required dependencies.
# h3 is pinned below v4 because this notebook uses the v3 API (geo_to_h3, polyfill).
import sys
if "google.colab" in sys.modules:
    !pip install 'foursquare.map_sdk>=3.0.1' pandas 'h3<4' geojson 'pandas-profiling>=3'

## Imports

In [None]:
import h3
import pandas as pd
from geojson import Polygon
from pandas_profiling import ProfileReport
import foursquare.map_sdk as map_sdk
from uuid import uuid4

## 1. Set Parameters

1. Set the path to your CSV file and its delimiter. The path can be a URL, as in the default below, or a local file. To
use your own file, place it under `notebooks/data` from within the Jupyter environment or on your local file system.

In [None]:
file_path = 'https://4sq-studio-public.s3.us-west-2.amazonaws.com/sdk/examples/sample-data/lisbon_uber_traversals.csv'
delimiter = ';'
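If you are unsure which delimiter your file uses, the standard-library `csv.Sniffer` can guess it from a sample of the
text. A minimal sketch; `sample` is a stand-in for the first few lines of your file.

```python
import csv

# Stand-in for the first lines of your CSV file.
sample = (
    "latitude;longitude;weekly_traversals\n"
    "38.7101;-9.1420;120\n"
    "38.7223;-9.1393;80\n"
)

# Restrict the candidates so the sniffer only chooses between ';' and ','.
dialect = csv.Sniffer().sniff(sample, delimiters=";,")
detected_delimiter = dialect.delimiter
print(detected_delimiter)  # ;
```

Sniffing is heuristic and raises `csv.Error` when it cannot decide, so setting `delimiter` explicitly remains the
reliable option.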

2. Set the H3 resolution to aggregate the results on. To see the average size of a hexagon at a given resolution, see
the [official H3 documentation](https://h3geo.org/docs/core-library/restable). At the default resolution 8, a hexagon
has an average edge length of about 0.53 km, which can loosely be interpreted as its radius. This example uses
precalculated aggregations of the popularity score at resolution 8, so changing the resolution here without also
providing popularity data at that resolution leaves the join without matching hexagons. To use a different resolution,
head over to [Kuwala's repository](https://github.com/kuwala-io/kuwala) to set up your data warehouse.
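Reading the edge length as a radius works because a regular hexagon's centre-to-corner distance equals its edge
length. H3 cells are not perfectly regular and vary in size across the globe, so the sketch below only gives a feel for
the scale. It assumes an edge length of 0.531 km, the average listed for resolution 8 in the current H3 resolution
table; `regular_hexagon_metrics` is a hypothetical helper.

```python
import math


def regular_hexagon_metrics(edge_km):
    """Geometry of a regular hexagon with the given edge length (in km)."""
    return {
        "circumradius_km": edge_km,                       # centre to corner equals the edge length
        "inradius_km": edge_km * math.sqrt(3) / 2,        # centre to the middle of an edge
        "area_km2": 3 * math.sqrt(3) / 2 * edge_km ** 2,  # six equilateral triangles
    }


metrics = regular_hexagon_metrics(0.531)
print({name: round(value, 3) for name, value in metrics.items()})
```

The computed area (about 0.73 km²) is close to the average cell area H3 reports for this resolution.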

In [None]:
resolution = 8

3. Set the column names for the coordinates and the columns of the file you want to correlate.

In [None]:
lat_column = 'latitude'
lng_column = 'longitude'
value_columns = ['weekly_traversals']
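To catch typos in these names before loading the full file, you can compare them against the CSV header row. A minimal
standard-library sketch; `missing_columns` is a hypothetical helper and `sample` stands in for the start of your file.

```python
import csv
import io


def missing_columns(csv_text, delimiter, required):
    """Return the names in `required` that do not appear in the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = [name.strip() for name in next(reader, [])]
    return [col for col in required if col not in header]


# Stand-in for the first lines of your file; replace with your own sample.
sample = "latitude;longitude;weekly_traversals\n38.7101;-9.1420;120\n"
print(missing_columns(sample, ";", ["latitude", "longitude", "weekly_traversals"]))  # []
print(missing_columns(sample, ";", ["latitude", "longitude", "weekly_sales"]))  # ['weekly_sales']
```

In the notebook you would pass `[lat_column, lng_column, *value_columns]` as `required`. With pandas,
`pd.read_csv(file_path, sep=delimiter, nrows=0).columns` reads only the header row and gives the same information.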

4. You can provide polygon coordinates as a GeoJSON-conforming array of `[longitude, latitude]` positions, with the
first and last position identical, to select a subregion. If you set `polygon_coords` to `None`, data from the entire
country of Portugal is analyzed. (The default coordinates are a rough outline of Lisbon, Portugal.)

In [None]:
polygon_coords = [[
    [-9.092559814453125, 38.794500078219826],
    [-9.164314270019531, 38.793429729760994],
    [-9.217529296875, 38.76666579487878],
    [-9.216842651367188, 38.68792166352608],
    [-9.12139892578125, 38.70399894245585],
    [-9.0911865234375, 38.74551518488265],
    [-9.092559814453125, 38.794500078219826],
]]
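A common mistake with hand-drawn polygons is swapping latitude and longitude or forgetting to close the ring. The
notebook itself selects hexagons with H3's polyfill, not with the test below; this standard-library sketch only lets you
check that a point you know should be inside (or outside) your subregion is classified as expected. `ring_is_closed`
and `point_in_ring` are hypothetical helpers, and `point_in_ring` uses the even-odd ray-casting rule.

```python
def ring_is_closed(ring):
    """GeoJSON linear rings need at least four positions and must end where they start."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def point_in_ring(lng, lat, ring):
    """Even-odd ray casting; ring positions are GeoJSON [longitude, latitude] pairs."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Count edges that a horizontal ray starting at the point and heading east crosses.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


# Outer ring copied from the default Lisbon polygon.
lisbon_ring = [
    [-9.092559814453125, 38.794500078219826],
    [-9.164314270019531, 38.793429729760994],
    [-9.217529296875, 38.76666579487878],
    [-9.216842651367188, 38.68792166352608],
    [-9.12139892578125, 38.70399894245585],
    [-9.0911865234375, 38.74551518488265],
    [-9.092559814453125, 38.794500078219826],
]
print(ring_is_closed(lisbon_ring))               # True
print(point_in_ring(-9.15, 38.74, lisbon_ring))  # True
print(point_in_ring(38.74, -9.15, lisbon_ring))  # False: latitude and longitude swapped
```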

## 2. Load Dataframes

#### Load the file to correlate with the popularity score

In [None]:
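def add_h3_index_column(row):
    # Index each point at the configured resolution (h3 v3 API: geo_to_h3(lat, lng, res)).
    return h3.geo_to_h3(float(row[lat_column]), float(row[lng_column]), resolution)


def polyfill_polygon(poly):
    # Return the H3 cells at the configured resolution whose centres lie inside the polygon.
    # geo_json_conformant=True tells h3 that positions are [lng, lat] pairs, as in GeoJSON.
    h3_indexes = h3.polyfill(
        dict(poly),
        resolution,
        geo_json_conformant=True
    )

    return h3_indexes

df_file = pd.read_csv(file_path, sep=delimiter)
df_file['h3_index'] = df_file.apply(add_h3_index_column, axis=1)

if polygon_coords:
    # Keep only rows whose hexagon lies inside the selected subregion.
    polygon = Polygon(polygon_coords)
    h3_index_in_polygon = list(polyfill_polygon(poly=polygon))
    df_file = df_file[df_file.h3_index.isin(h3_index_in_polygon)]

# Sum the value columns per hexagon; reset_index keeps h3_index as a regular column for the join and the map.
df_file = df_file[['h3_index', *value_columns]].groupby('h3_index').sum().reset_index()

df_file.head(10)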
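`h3.geo_to_h3` needs numeric coordinates, so a single unparseable value (an empty cell, "n/a", or a decimal comma) can
make the `apply` call above fail. If your file may contain such values, one option is to clean the coordinate columns
right after `pd.read_csv`, e.g. `df_file = clean_coordinates(df_file, lat_column, lng_column)`. A minimal sketch;
`clean_coordinates` is hypothetical, and accepting decimal commas is an assumption about your data.

```python
import pandas as pd


def clean_coordinates(df, lat_col, lng_col):
    """Coerce coordinate columns to floats and drop rows with missing or out-of-range coordinates."""
    out = df.copy()
    for col in (lat_col, lng_col):
        # Swap decimal commas such as "38,72" for dots; anything still unparseable becomes NaN.
        as_text = out[col].astype(str).str.replace(",", ".", regex=False)
        out[col] = pd.to_numeric(as_text, errors="coerce")
    # NaN fails both range checks, so rows with unparseable coordinates are dropped too.
    valid = out[lat_col].between(-90, 90) & out[lng_col].between(-180, 180)
    return out[valid]


raw = pd.DataFrame({
    "latitude": ["38.71", "38,72", "n/a", "95"],
    "longitude": [-9.14, -9.13, -9.12, -9.11],
    "weekly_traversals": [1, 2, 3, 4],
})
cleaned = clean_coordinates(raw, "latitude", "longitude")
print(cleaned)
```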

#### Get weekly popularity per hexagon

In [None]:
url = 'https://4sq-studio-public.s3.us-west-2.amazonaws.com/sdk/examples/sample-data/portugal_popularity.csv'
popularity = pd.read_csv(url, sep=';')

if polygon_coords:
    polygon = Polygon(polygon_coords)
    h3_index_in_polygon = list(polyfill_polygon(poly=polygon))
    bool_series = popularity.h3_index.isin(h3_index_in_polygon)
    popularity = popularity[bool_series]

popularity.head(10)
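The precomputed popularity file is at resolution 8, so if `resolution` differs, no hexagons will match in the join.
h3-py can report a cell's resolution (`h3.h3_get_resolution` in v3, `h3.get_resolution` in v4), but the value can also
be read directly from the index: an H3 cell index stores its resolution in bits 52 to 55. A minimal standard-library
sketch; `h3_resolution` and `resolutions_in` are hypothetical helpers, and the index strings are only examples.

```python
def h3_resolution(h3_index):
    """Read the 4-bit resolution field (bits 52-55) of a hexadecimal H3 cell index."""
    return (int(h3_index, 16) >> 52) & 0xF


def resolutions_in(indexes):
    """Set of resolutions present in an iterable of H3 index strings."""
    return {h3_resolution(idx) for idx in indexes}


print(h3_resolution("8928308280fffff"))  # 9
print(h3_resolution("8828308281fffff"))  # 8
```

In the notebook, `resolutions_in(popularity.h3_index) == {resolution}` should hold before you join.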

## 3. Join Dataframes

In [None]:
result = df_file.merge(popularity, on='h3_index', how='left')
# Hexagons without popularity data get a score of 0.
result['weekly_popularity'] = result['weekly_popularity'].fillna(0)

result.head(10)
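Because this is a left join, every hexagon from your file is kept, and hexagons without popularity data end up with a
score of 0 after `fillna`. That treats "no data" as "no popularity" and can affect the correlation, so it helps to know
how many hexagons it applies to. pandas' `indicator=True` option adds a `_merge` column for this. A minimal sketch on
toy data; the hexagon IDs `"a"`, `"b"` and `"c"` are placeholders.

```python
import pandas as pd

values = pd.DataFrame({"h3_index": ["a", "b", "c"], "weekly_traversals": [120, 80, 40]})
popularity_toy = pd.DataFrame({"h3_index": ["a", "b"], "weekly_popularity": [900, 300]})

# how='left' keeps every hexagon from the value file; indicator=True adds a '_merge' column
# that is 'left_only' where no popularity row matched.
joined = values.merge(popularity_toy, on="h3_index", how="left", indicator=True)
unmatched = int((joined["_merge"] == "left_only").sum())
joined["weekly_popularity"] = joined["weekly_popularity"].fillna(0)
print(f"{unmatched} of {len(joined)} hexagons had no popularity data")  # 1 of 3 hexagons had no popularity data
```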

## 4. Visualize Results

#### Pandas Profiling Report

In [None]:
profile = ProfileReport(result, title="Pandas Profiling Report", explorative=True)

profile.to_notebook_iframe()
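The profiling report includes correlation matrices among its output. To get the coefficients directly, pandas'
`Series.corr` is enough; in the notebook you could run `result[value_columns].corrwith(result['weekly_popularity'])`.
The toy frame below only illustrates the difference between Pearson and Spearman correlation; its numbers are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    "weekly_traversals": [10, 20, 30, 40],
    "weekly_popularity": [1, 3, 2, 5],
})

# Pearson measures linear association. Spearman is Pearson computed on ranks, so it
# captures any monotonic relationship and is less sensitive to outliers.
pearson = toy["weekly_traversals"].corr(toy["weekly_popularity"])
ranks = toy.rank()
spearman = ranks["weekly_traversals"].corr(ranks["weekly_popularity"])
print(round(pearson, 3), round(spearman, 3))  # 0.832 0.8
```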

#### Map

In [None]:
map = map_sdk.create_map(
    api_key="<your-api-key>"
)
map

In [None]:
dataset_id_combined = str(uuid4())

map.add_dataset(
    id=dataset_id_combined,
    label='Correlated values',
    data=result,
    auto_create_layers=False
)

map.add_layer({
    'id': 'traversals',
    'type': 'h3',
    'data_id': dataset_id_combined,
    'label': 'Traversals',
    'fields': {
        'hex_id': 'h3_index'
    },
    'config': {
        'visual_channels': {
            'colorField': {'name': 'weekly_traversals', 'type': 'real'},
            'colorScale': 'quantile'
        },
    }
})

map.set_view(longitude=-9.13314, latitude=38.70086, zoom=10)
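The map colours hexagons by traversals only. To compare the two variables visually, you could add a second layer
coloured by `weekly_popularity` from the same dataset and toggle between the layers. A sketch, assuming the layer spec
used for the traversals layer works unchanged with a different `colorField`; `h3_color_layer` is a hypothetical helper
that only builds the dictionary.

```python
def h3_color_layer(layer_id, data_id, label, color_field, hex_field="h3_index"):
    """Build an H3 layer spec with the same shape as the 'traversals' layer."""
    return {
        "id": layer_id,
        "type": "h3",
        "data_id": data_id,
        "label": label,
        "fields": {"hex_id": hex_field},
        "config": {
            "visual_channels": {
                "colorField": {"name": color_field, "type": "real"},
                "colorScale": "quantile",
            },
        },
    }


# "<dataset-id>" is a placeholder; in the notebook pass dataset_id_combined and call
# map.add_layer(popularity_layer) after the dataset has been added.
popularity_layer = h3_color_layer("popularity", "<dataset-id>", "Popularity", "weekly_popularity")
print(popularity_layer["config"]["visual_channels"]["colorField"])
```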