### Objective
In this project, we explore how geospatial data can be used to optimize out-of-home (OOH) marketing and advertising.
### Problem Statement
Overlay consumers' mobile device data with other data sources to answer the following three questions:
1. Which billboards have the highest potential reach, i.e., the largest number of unique audience members?
2. At what time of day are audiences most likely to see the advertisement?
3. Where should I build my next billboard?
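
One way to make question 1 concrete is to count the distinct `advertising_id`s observed within some radius of each billboard. The sketch below is a minimal illustration under stated assumptions, not the method used later in this notebook: the 100 m radius, the haversine great-circle distance, the tuple layouts, and the names `haversine_m` and `billboard_reach` are all hypothetical, and the brute-force loop is only practical for small samples.

```python
import math

# Hypothetical helpers for illustration; not part of the original notebook.
EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def billboard_reach(billboards, pings, radius_m=100.0):
    """Count unique advertising_ids seen within radius_m of each billboard.

    billboards: iterable of (billboard_object_id, latitude, longitude)
    pings: iterable of (advertising_id, latitude, longitude)
    radius_m: assumed visibility radius (an illustrative value, not from the source)
    """
    pings = list(pings)  # allow the pings to be scanned once per billboard
    reach = {}
    for bb_id, bb_lat, bb_lon in billboards:
        nearby_ids = {
            ad_id
            for ad_id, lat, lon in pings
            if haversine_m(bb_lat, bb_lon, lat, lon) <= radius_m
        }
        reach[bb_id] = len(nearby_ids)  # repeated pings from one device count once
    return reach


# Made-up pings around the first billboard coordinate shown later in this notebook
billboards = [("BB1", 3.213979, 101.638397)]
pings = [
    ("a", 3.214479, 101.638397),  # about 56 m north: inside the radius
    ("a", 3.214479, 101.638397),  # same device again: not double-counted
    ("b", 3.213979, 101.638397),  # at the billboard
    ("c", 3.223979, 101.638397),  # about 1.1 km away: outside the radius
]
print(billboard_reach(billboards, pings))  # {'BB1': 2}
```

Ranking billboards by this count answers "highest potential reach" under the chosen radius; at the scale of the full dataset, a spatial index or a Spark join would replace the nested loop.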

### Brief Overview of the Data
**Mobile Device Data:** Consumer mobile device data with the following features:
1. advertising_id
2. latitude
3. longitude
4. unix_timestamp
5. place_name
6. brand_name
7. category_name

**Note:** *The actual advertising_id values have been encrypted to protect Personally Identifiable Information (PII).*

**Note:** *The provided sample file has been partially preprocessed and excludes place_name, brand_name, and category_name.*
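
Question 2 depends on turning `unix_timestamp` into a local hour of day. A minimal sketch, assuming timestamps are seconds since the Unix epoch and that the data covers Kuala Lumpur (the input files are prefixed `kl_`), which uses UTC+8 year-round with no daylight saving time. The helper names `local_hour` and `unique_devices_per_hour` are hypothetical and the pings are made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: Kuala Lumpur local time, a fixed UTC+8 offset with no DST.
KL_TZ = timezone(timedelta(hours=8))


def local_hour(unix_timestamp):
    """Hour of day (0-23) in KL local time for a Unix timestamp in seconds."""
    return datetime.fromtimestamp(unix_timestamp, tz=KL_TZ).hour


def unique_devices_per_hour(pings):
    """Count distinct advertising_ids observed in each local hour of the day.

    pings: iterable of (advertising_id, unix_timestamp) pairs (hypothetical layout).
    """
    # De-duplicate so a device pinging many times within one hour counts once
    pairs = {(ad_id, local_hour(ts)) for ad_id, ts in pings}
    return Counter(hour for _, hour in pairs)


pings = [
    ("a", 1_600_000_000),  # 2020-09-13 12:26:40 UTC -> 20:26 local
    ("a", 1_600_000_060),  # same device, same hour
    ("b", 1_600_000_000),
    ("c", 1_600_003_600),  # one hour later -> 21:26 local
]
print(unique_devices_per_hour(pings).most_common(1))  # [(20, 2)]
```

In the PySpark pipeline, `hour()` evaluates timestamps in the session time zone (`spark.sql.session.timeZone`), so that setting should match the study area for the hourly counts to be meaningful.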

**Billboard Locations:** Billboard data with the following features:
1. billboard_object_id
2. latitude
3. longitude
4. type

**District Map:** Shapefile of the city map. To learn more about shapefiles, see [here](https://desktop.arcgis.com/en/arcmap/10.3/manage-data/shapefiles/what-is-a-shapefile.htm#:~:text=A%20shapefile%20is%20a%20simple,%2C%20or%20polygons%20(areas)).
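
A shapefile is really a set of files that share a base name. Esri's documentation lists `.shp`, `.shx`, and `.dbf` as mandatory, with `.prj` as the optional file that stores the coordinate system. Because the spatial join later in this notebook treats the district map as EPSG:4326 longitude/latitude, it can help to check which companion files are present first. A small sketch, assuming lowercase extensions; `shapefile_parts` is a hypothetical helper.

```python
from pathlib import Path

# Hypothetical helper, not part of the original notebook.
# Mandatory shapefile components per Esri; .prj (optional) stores the coordinate system.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def shapefile_parts(shp_path):
    """Return (names of missing mandatory files, whether a .prj file exists).

    Assumes the companion files sit next to the .shp file with lowercase extensions.
    """
    shp = Path(shp_path)
    missing = [
        shp.with_suffix(ext).name
        for ext in REQUIRED_EXTENSIONS
        if not shp.with_suffix(ext).exists()
    ]
    has_prj = shp.with_suffix(".prj").exists()
    return missing, has_prj


# Usage with the map file used in this project
missing, has_prj = shapefile_parts("kl_map/kl_map.shp")
print("Missing files:", missing, "| .prj present:", has_prj)
```

If `.prj` is absent, the coordinate system has to be assigned by hand, which is only correct if the coordinates really are WGS 84 longitude/latitude.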

```python
# Import modules
import json

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# To process mobile device data (> 3GB)
# Note: importing `round` from pyspark.sql.functions shadows Python's built-in round()
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, hour, minute, round, udf
from pyspark.sql.types import DateType, TimestampType

# To process the shapefile
import geopandas as gpd

# To perform the spatial join efficiently
import dask.distributed
from dask import dataframe as dd

# To visualize the data
import plotly.express as px
import plotly.graph_objects as go
```

```python
# Input files
# Forward slashes also work on Windows and avoid the invalid "\k" escape sequence
kl_shapefile = "kl_map/kl_map.shp"
device_id = "kl_test_device_id.csv"
billboard_location = "kl_billboard.csv"
```

### Joining Billboard Locations with District Map
Billboard locations will be joined with the district map to determine the district in which each billboard is located. This information helps media owners decide in which district to build their next billboard.
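
Conceptually, a spatial join with the "within" predicate finds, for each point, the polygon that contains it. The dependency-free sketch below shows that containment test using the even-odd ray-casting rule; it is an illustration only, since geopandas delegates this to Shapely/GEOS with a spatial index. The district names and square boundaries are hypothetical, longitude and latitude are treated as planar x and y, and holes and multi-part polygons are ignored.

```python
def point_in_polygon(lon, lat, ring):
    """Even-odd ray-casting test: is (lon, lat) inside the polygon ring?

    ring: list of (lon, lat) vertices; the closing vertex may be omitted.
    Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:  # the edge crosses the ray going east from the point
                inside = not inside
    return inside


def assign_district(lon, lat, districts):
    """Return the first district containing the point, else None (like a left-join miss)."""
    for name, ring in districts.items():
        if point_in_polygon(lon, lat, ring):
            return name
    return None


# Hypothetical square districts, for illustration only
districts = {
    "District A": [(101.60, 3.20), (101.64, 3.20), (101.64, 3.24), (101.60, 3.24)],
    "District B": [(101.64, 3.20), (101.68, 3.20), (101.68, 3.24), (101.64, 3.24)],
}
print(assign_district(101.638397, 3.213979, districts))  # District A
print(assign_district(101.643491, 3.228400, districts))  # District B
print(assign_district(101.700000, 3.300000, districts))  # None
```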

```python
import traceback


# Join point data with the KL district map and return the district name (NAMA_DM) per row.
# Expects a pandas DataFrame, e.g. a single dask partition passed via map_partitions.
def joinPolygonPoint(df, lat, lon, location):
    # `location` is currently unused
    # Load the KL shapefile and make sure it uses WGS 84 longitude/latitude (EPSG:4326)
    gdf = gpd.read_file(kl_shapefile)
    if gdf.crs is None:
        gdf = gdf.set_crs("EPSG:4326")
    else:
        gdf = gdf.to_crs("EPSG:4326")

    # Create a copy of the dataframe so the caller's data is not modified
    local_df = df.copy()

    # Convert the dataframe to a GeoDataFrame and perform the spatial join
    try:
        points = gpd.GeoDataFrame(
            local_df,
            crs="EPSG:4326",
            geometry=gpd.points_from_xy(local_df[lon], local_df[lat]),
        )
        joined_df = gpd.sjoin(points, gdf, how="left", predicate="within")
        return joined_df["NAMA_DM"]
    except ValueError:
        traceback.print_exc()
        raise
```

```python
# Load billboard data
billboard = dd.read_csv(billboard_location, encoding="unicode_escape")
print("There are {} partitions. ".format(billboard.npartitions))
print(billboard.head())
```

```
There are 1 partitions.
   latitude   longitude    type                                billboard_object_id
0  3.213979  101.638397  STATIC  BBFDEFB262A6E49AB168FCB59FD049733B95A49E7DCCDB...
1  3.214458  101.642018  STATIC  C678ECDE8AD8F4B9D37F16D2ACCB39E870E97231719934...
2  3.228400  101.643491  STATIC  00DCABF68CAE73603FBC342890CBC5670E720316F1CC4A...
3  3.227013  101.637439  STATIC  D8F7048E3F00D50AC0390FACF8A64BEBD9857A456AC337...
4  3.227240  101.637501  STATIC  287DB8C2BB57EE452DBBFFE282EFCBE2C05920B27A6A12...
```
