# Where are your predictions located?

This notebook is inspired by the answers I found in the discussion [How to create grid points automatically](https://www.kaggle.com/c/indoor-location-navigation/discussion/237776).

In this notebook, you can check where your predicted waypoints actually fall.
*Are your waypoints located within the bounds of the predicted floor?*
*Are your waypoints inside stores?*

Post-processing techniques such as snap-to-grid can be applied more selectively after a careful exploration of your predictions. This notebook aims to help you decide which waypoints should be post-processed and which ones should remain as they are.

Do not hesitate to comment if you spot anything wrong. For instance, I do not know why some geometries in the GeoJSON data are defined as MultiPolygons rather than Polygons. I treated both types the same way.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import shapely
import json

## Example - How to use Shapely to work with GeoJSON data

Let's take an example to show how to use Shapely to get the floor polygon, the store polygons and their difference.

We are going to use the GeoJSON data located in the metadata folder. Then, we will use Shapely to compute the shapes of our areas of interest.

In [None]:
site = '5a0546857ecc773753327266'
floorNo = 'B1' 
with open(f"../input/indoor-location-navigation/metadata/{site}/{floorNo}/geojson_map.json") as json_file:
    geofloor_data = json.load(json_file)

The first feature in the GeoJSON data always holds the **floor coordinates**. The remaining features describe the stores.
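
As a minimal sketch of that layout, the snippet below builds a tiny, made-up FeatureCollection (hypothetical values, not the competition data) and converts every feature with `shapely.geometry.shape`, which accepts both `Polygon` and `MultiPolygon` geometries. Using `shape` is only one possible way to parse the features; the cells in this notebook index the coordinates directly instead.

```python
from shapely.geometry import shape

# Toy FeatureCollection (hypothetical values): feature 0 is the floor,
# the remaining features are stores.
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "MultiPolygon",
                      "coordinates": [
                          [[[1, 1], [3, 1], [3, 3], [1, 3], [1, 1]]],
                          [[[6, 6], [8, 6], [8, 8], [6, 8], [6, 6]]],
                      ]}},
    ],
}

# shape() builds the matching Shapely geometry from a GeoJSON-like dict.
toy_floor = shape(toy_geojson["features"][0]["geometry"])
toy_stores = [shape(f["geometry"]) for f in toy_geojson["features"][1:]]

print(toy_floor.geom_type, toy_floor.area)
print(toy_stores[0].geom_type, toy_stores[0].area)
```

Note that `coordinates[0]` of a Polygon is its outer ring, whereas `coordinates[0]` of a MultiPolygon is a whole polygon (a list of rings), so the two types need different indexing when they are parsed by hand.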

In [None]:
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon
import shapely.ops as so

floor_polygon = Polygon(np.array(geofloor_data['features'][0]['geometry']['coordinates'][0]))  # Floor polygon is always the first value
store_polygons_l = [Polygon(features['geometry']['coordinates'][0]) for features in geofloor_data['features'][1:]]  # The rest are stores data

The floor is described by a single polygon, whereas the stores require several. We therefore merge the store polygons with Shapely's *unary_union* function.
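
To see what the union and the difference do, here is a minimal sketch on toy rectangles (all coordinates are made up for illustration). Two overlapping stores are merged into one shape, which is then cut out of a square floor.

```python
from shapely.geometry import Polygon
from shapely.ops import unary_union

# Toy shapes (hypothetical values): a 10 x 10 floor and two overlapping stores.
toy_floor = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
store_a = Polygon([(1, 1), (4, 1), (4, 4), (1, 4)])  # 3 x 3
store_b = Polygon([(3, 1), (6, 1), (6, 4), (3, 4)])  # 3 x 3, overlaps store_a on a 1 x 3 strip

# The union counts the overlapping strip once: 9 + 9 - 3 = 15.
toy_stores = unary_union([store_a, store_b])
# The difference keeps the part of the floor not covered by any store: 100 - 15 = 85.
toy_safe = toy_floor.difference(toy_stores)

print(toy_stores.area)
print(toy_safe.area)
```

Because the union removes overlaps first, the area of the safe zone is the floor area minus the area actually covered by stores, not minus the sum of the individual store areas.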

In [None]:
store_polygons = so.unary_union(store_polygons_l)
safe_area_polygons = floor_polygon.difference(store_polygons)

### Floor Polygon

In [None]:
floor_polygon

### Store Polygons

In [None]:
store_polygons

### Safe Zone (floor - store)

In [None]:
safe_area_polygons

## Check where your predictions are located

First, we will compute the polygons for each floor. Then we can use Shapely to detect where our predicted points are located.
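
The point-in-polygon test itself can be sketched on a toy floor (made-up coordinates). One detail worth knowing is that `contains` returns `False` for points lying exactly on the boundary, while `covers` accepts them.

```python
from shapely.geometry import Point, Polygon

# Toy 10 x 10 floor (hypothetical values).
toy_floor = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])

inside = Point(5, 5)
outside = Point(12, 5)
on_edge = Point(10, 5)

print(toy_floor.contains(inside))   # True
print(toy_floor.contains(outside))  # False
print(toy_floor.contains(on_edge))  # False: boundary points are not "contained"
print(toy_floor.covers(on_edge))    # True: covers() also accepts the boundary
```

This notebook uses `contains`, so a prediction that falls exactly on a wall is not counted as inside.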

In [None]:
def split_col(df):
    """
    Split submission site/path/timestamp into individual columns.
    """
    df = pd.concat(
        [
            df["site_path_timestamp"]
            .str.split("_", expand=True)
            .rename(columns={0: "site", 1: "path", 2: "timestamp"}),
            df,
        ],
        axis=1,
    ).copy()
    return df

sub = split_col(
    pd.read_csv("../input/multioutput-mlp-weighted-loss/submission.csv")
)
true_locs = pd.read_csv("../input/indoor-location-train-waypoints/train_waypoints.csv")
# Add the floor name (floorNo, e.g. 'B1') to the submission by matching on site and floor
sub = sub.merge(true_locs[["site", "floor", "floorNo"]].drop_duplicates(), on=["site", "floor"])

In [None]:
def add_predictions_location(args):
    """
    Flag the predictions of one (site, floorNo) group as inside the floor,
    inside a store, or inside the safe area (floor minus stores).
    """
    (site, floorNo), df_submission = args
    df_result = df_submission.copy()
    with open(f"../input/indoor-location-navigation/metadata/{site}/{floorNo}/geojson_map.json") as json_file:
        geofloor_data = json.load(json_file)
    with open(f"../input/indoor-location-navigation/metadata/{site}/{floorNo}/floor_info.json") as json_file:
        floor_info = json.load(json_file)
    # The first feature is the floor. For a MultiPolygon, only the outer ring of its first polygon is used.
    type_poly = geofloor_data['features'][0]['geometry']['type']
    if type_poly == 'Polygon':
        polygon = np.array(geofloor_data['features'][0]['geometry']['coordinates'][0])
    else:
        polygon = np.array(geofloor_data['features'][0]['geometry']['coordinates'][0][0])
    floor_polygons = Polygon(polygon)
    # The remaining features are stores: merge them, then cut them out of the floor.
    store_polygons_l = [Polygon(features['geometry']['coordinates'][0]) for features in geofloor_data['features'][1:]]
    store_polygons = so.unary_union(store_polygons_l)
    safe_area_polygons = floor_polygons.difference(store_polygons)
    # Rescale the predictions from the floor_info width/height range to the bounding box of the floor polygon.
    x_max, x_min = polygon[:, 0].max(), polygon[:, 0].min()
    y_max, y_min = polygon[:, 1].max(), polygon[:, 1].min()
    df_result['x_scaled'] = x_min + df_result['x'] * (x_max - x_min) / floor_info['map_info']['width']
    df_result['y_scaled'] = y_min + df_result['y'] * (y_max - y_min) / floor_info['map_info']['height']
    # Build each point once and test it against the three shapes.
    points = [Point(x, y) for x, y in zip(df_result['x_scaled'], df_result['y_scaled'])]
    df_result['InFloor'] = [floor_polygons.contains(p) for p in points]
    df_result['InStore'] = [store_polygons.contains(p) for p in points]
    df_result['InSafe'] = [safe_area_polygons.contains(p) for p in points]
    return df_result

In [None]:
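import multiprocessing
from tqdm import tqdm

# Spread the (site, floorNo) groups across worker processes
groups = sub.groupby(['site', 'floorNo'])
processes = multiprocessing.cpu_count()
with multiprocessing.Pool(processes=processes) as pool:
    dfs = pool.imap_unordered(add_predictions_location, groups)
    dfs = tqdm(dfs, total=groups.ngroups)
    dfs = list(dfs)
# imap_unordered yields the groups in arbitrary order, so sort to restore a stable order
sub = pd.concat(dfs).sort_values('site_path_timestamp')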

In [None]:
sub['InFloor'].value_counts(normalize=True)

In [None]:
sub['InStore'].value_counts(normalize=True)

In [None]:
sub['InSafe'].value_counts(normalize=True)

Thanks for reading! Hopefully this notebook helps you post-process your raw predictions more selectively.