# MultiOutput MLP - Weighted loss

## References

This code uses the dataset [Indoor Unified Wifi](https://www.kaggle.com/kokitanisaka/indoorunifiedwifids). Some code snippets are inspired by [MLP by Keras with Unified Wi-Fi Feats](https://www.kaggle.com/jerrymark611/mlp-by-keras-with-unified-wi-fi-feats). Thank you for your nice contributions.

## The code

The code proposes an architecture with shared layers followed by specialized layers for each output. The network is optimized with a custom weighted loss that combines the RMSE of the (x, y) predictions with the categorical cross-entropy of the floor predictions; floor accuracy is tracked as a metric.
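
Written out with the loss weights passed to `model.compile` in the training cell, the objective being minimized is roughly the following. Here $\mathrm{RMSE}_x$ and $\mathrm{RMSE}_y$ are the batch-level root mean squared errors of the two coordinate heads, and $\mathrm{CE}_f$ (notation introduced here, not in the original code) is the categorical cross-entropy of the floor head:

$$
\mathcal{L} = 0.5\,\mathrm{RMSE}_x + 0.5\,\mathrm{RMSE}_y + 15\,\mathrm{CE}_f
$$

The large weight on the floor term pushes the shared layers to get the floor right first, which matters because the competition metric penalizes floor errors heavily.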


### Libraries and custom regression metric

In the first cell we import the libraries and define a custom function that computes the root mean squared error of our predictions. It is used later as the loss for the x and y outputs.

In [None]:
import argparse
import pickle
import random
import os

import keras
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from tensorflow_addons.layers import WeightNormalization
import tensorflow as tf
import tensorflow.keras.layers as L
import tensorflow.keras.models as M
import tensorflow.keras.backend as K
from tensorflow.keras import metrics

def root_mean_squared_error(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true)))
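
To sanity-check what the function above computes, here is a minimal plain-Python sketch of the same formula on made-up values (the helper name `rmse` and the numbers are illustrative assumptions, not part of the notebook). Note that the Keras version averages over the whole batch, so during training the value is a per-batch RMSE.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error over two equal-length sequences."""
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy coordinates (made-up values, not taken from the dataset).
y_true = [10.0, 20.0, 30.0]
y_pred = [13.0, 16.0, 30.0]

# Squared errors are 9, 16 and 0, so the result is sqrt(25 / 3).
print(rmse(y_true, y_pred))
```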

### Data reading

In [None]:
feature_dir = "../input/indoorunifiedwifids"
data = pd.read_csv(feature_dir + '/train_all.csv')
df_submission = pd.read_csv('../input/simple-99-accurate-floor-model/submission.csv')
df_submission.set_index('site_path_timestamp', inplace=True)
df_test = pd.read_csv(feature_dir + '/test_all.csv')
df_test.rename({'site_path_timestamp': 'site_path_timestamp_2'}, axis=1, inplace=True)
df_test.set_index('site_path_timestamp_2', inplace=True)
df_test.loc[df_submission.index, 'floor'] = df_submission.floor

### Most basic parameters

Play with the number of epochs and the batch size in order to improve the results!

In [None]:
# Most basic parameters to tune
N_FOLDS = 5
EPOCHS = 400
NUM_FEATS = 100
BATCH_SIZE = 600

### Categorical encoding

This part of the code encodes the wifi BSSIDs, the sites and the floors (targets). In contrast to [MLP by Keras with Unified Wi-Fi Feats](https://www.kaggle.com/jerrymark611/mlp-by-keras-with-unified-wi-fi-feats), we do not scale the RSSI values.

In [None]:
BSSID_FEATS = [f'bssid_{i}' for i in range(NUM_FEATS)]
RSSI_FEATS = [f'rssi_{i}' for i in range(NUM_FEATS)]

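# Collect every BSSID seen in train and test so the encoder knows all of them.
# As in the original code, this assumes the first NUM_FEATS columns hold the BSSIDs.
wifi_bssids = []
for i in range(NUM_FEATS):
    wifi_bssids.extend(data.iloc[:, i].values.tolist())
wifi_bssids_test = []
for i in range(NUM_FEATS):
    wifi_bssids_test.extend(df_test.iloc[:, i].values.tolist())
wifi_bssids_test = list(set(wifi_bssids_test))
wifi_bssids = list(set(wifi_bssids))
wifi_bssids.extend(wifi_bssids_test)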

# Encoders
le = LabelEncoder()
le.fit(wifi_bssids)
le_site = LabelEncoder()
le_site.fit(data['site_id'])
floors_onehot = pd.get_dummies(data['floor'])
for i in BSSID_FEATS:
    data.loc[:, i] = le.transform(data.loc[:, i])
    df_test.loc[:, i] = le.transform(df_test.loc[:, i])
data.loc[:, 'site_id'] = le_site.transform(data.loc[:, 'site_id'])
df_test.loc[:, 'site_id'] = le_site.transform(df_test.loc[:, 'site_id'])
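
The encoder is fitted on the union of train and test BSSIDs because `LabelEncoder.transform` raises an error on labels it has not seen. The sketch below mimics that behaviour in plain Python on hypothetical BSSID strings; `fit_label_encoder` is an illustrative helper, not a scikit-learn function.

```python
def fit_label_encoder(values):
    """Map each distinct value to an integer, in sorted order of the values."""
    classes = sorted(set(values))
    return {value: index for index, value in enumerate(classes)}

# Hypothetical BSSID strings; the real ones are long hashes.
train_bssids = ["ap_c", "ap_a", "ap_c"]
test_bssids = ["ap_b", "ap_a"]

# Fit on the union so that test-only access points also get an index.
mapping = fit_label_encoder(train_bssids + test_bssids)
encoded_test = [mapping[b] for b in test_bssids]

print(mapping)       # {'ap_a': 0, 'ap_b': 1, 'ap_c': 2}
print(encoded_test)  # [1, 0]
```

The resulting integers are what the `Embedding` layer consumes, so its input dimension has to be at least the number of distinct BSSIDs.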

### Training and inference

For simplicity, the training and inference code is provided together. The contribution of this part of the code is the addition of shared and specialized layers to the architecture. A weighted loss is also provided in order to better target the competition metric.
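
The training cell splits the data with `GroupKFold` on the `path` column, so waypoints from one path never end up in both the training and the validation part of a fold. A minimal plain-Python sketch of that idea follows; `group_folds` is a hypothetical helper that uses simple round-robin assignment, whereas scikit-learn's `GroupKFold` additionally balances fold sizes.

```python
def group_folds(groups, n_folds):
    """Yield (train_idx, val_idx) so each group is in exactly one validation fold.

    Simplified round-robin assignment, not scikit-learn's actual algorithm.
    """
    unique_groups = sorted(set(groups))
    fold_of_group = {g: i % n_folds for i, g in enumerate(unique_groups)}
    for fold in range(n_folds):
        val_idx = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        yield train_idx, val_idx

# Hypothetical path ids for six waypoints.
paths = ["p1", "p1", "p2", "p3", "p3", "p3"]

splits = list(group_folds(paths, n_folds=3))
for train_idx, val_idx in splits:
    train_paths = {paths[i] for i in train_idx}
    val_paths = {paths[i] for i in val_idx}
    # No path is shared between the two sides of a fold.
    print(val_idx, train_paths.isdisjoint(val_paths))
```

Splitting by path gives a more honest validation score than a random split, since neighbouring waypoints on the same path tend to see nearly identical wifi signals.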

In [None]:
preds_x = np.zeros(df_test.shape[0])
preds_y = np.zeros(df_test.shape[0])
floor_preds = np.zeros((df_test.shape[0], 11))
kf = GroupKFold(n_splits=N_FOLDS)
for fold, (trn_idx, val_idx) in enumerate(kf.split(data, groups=data.loc[:, 'path'])):
    train, floors_train = data.iloc[trn_idx], floors_onehot.iloc[trn_idx]
    val, floors_val = data.iloc[val_idx], floors_onehot.iloc[val_idx]
    train_x, train_y, train_f = train['x'], train['y'], floors_train
    val_x, val_y, val_f = val['x'], val['y'], floors_val

    # NN definition

    # Embedding BSSIDS
    input_embd_layer = L.Input(shape=(NUM_FEATS,))
    x1 = L.Embedding(len(wifi_bssids), 64)(input_embd_layer)
    x1 = L.Flatten()(x1)

    # RSSI values
    input_layer = L.Input(shape=(NUM_FEATS,))

    x2 = L.BatchNormalization()(input_layer)
    x2 = L.Dense(NUM_FEATS * 64, activation='selu')(x2)
    x2 = L.LeakyReLU()(x2)

    # site
    input_site_layer = L.Input(shape=(1,))
    x3 = L.Embedding(len(data['site_id'].unique()), 8)(input_site_layer)
    x3 = L.Flatten()(x3)

    x = L.Concatenate(axis=1)([x1, x2, x3])

    x = L.BatchNormalization()(x)
    x = L.Dropout(0.3)(x)
    x = WeightNormalization(L.Dense(512, activation='selu'))(x)

    x = L.BatchNormalization()(x)
    x = L.Dropout(0.3)(x)
    x_shared = WeightNormalization(L.Dense(256, activation='selu'))(x)

    x = L.BatchNormalization()(x_shared)
    x = L.Dropout(0.3)(x)
    x = WeightNormalization(L.Dense(128, activation='selu'))(x)

    x = L.BatchNormalization()(x)
    x = L.Dropout(0.3)(x)
    x = WeightNormalization(L.Dense(64, activation='selu'))(x)

    y = L.BatchNormalization()(x_shared)
    y = L.Dropout(0.3)(y)
    y = WeightNormalization(L.Dense(128, activation='selu'))(y)

    y = L.BatchNormalization()(y)
    y = L.Dropout(0.3)(y)
    y = WeightNormalization(L.Dense(64, activation='selu'))(y)

    f = L.BatchNormalization()(x_shared)
    f = L.Dropout(0.3)(f)
    f = WeightNormalization(L.Dense(128, activation='selu'))(f)

    f = L.BatchNormalization()(f)
    f = L.Dropout(0.3)(f)
    f = WeightNormalization(L.Dense(64, activation='selu'))(f)

    output_layer_x = L.Dense(1, name='output_x')(x)
    output_layer_y = L.Dense(1, name='output_y')(y)
    output_layer_f = L.Dense(11, activation='softmax', name='output_f')(f)
    model = M.Model([input_embd_layer, input_layer, input_site_layer],
                    [output_layer_x, output_layer_y, output_layer_f])

    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.005),
                  loss={'output_x': root_mean_squared_error,
                        'output_y': root_mean_squared_error,
                        'output_f': 'categorical_crossentropy'},
                  loss_weights={'output_x': 0.5, 'output_y': 0.5, 'output_f': 15},
                  metrics={'output_x': metrics.RootMeanSquaredError(),
                           'output_y': metrics.RootMeanSquaredError(),
                           'output_f': metrics.CategoricalAccuracy()})
    callback = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=35)
    model.fit(x=[train[BSSID_FEATS], train[RSSI_FEATS], train['site_id']],
              y=[train_x, train_y, train_f], shuffle=True, use_multiprocessing=False,
              validation_data=([val[BSSID_FEATS], val[RSSI_FEATS], val['site_id']],
                               [val_x, val_y, val_f]), batch_size=BATCH_SIZE, epochs=EPOCHS,
              callbacks=[callback], verbose=True)
    preds = model.predict([df_test[BSSID_FEATS], df_test[RSSI_FEATS], df_test['site_id']])
    # Free the Keras global state only after this fold's model has produced its predictions
    K.clear_session()
    pr_x, pr_y = preds[0], preds[1]
    preds_x += pr_x.reshape(pr_x.shape[0]) / N_FOLDS
    preds_y += pr_y.reshape(pr_y.shape[0]) / N_FOLDS
    floor_preds += preds[2]
final_preds = np.argmax(floor_preds, axis=1) - 2
df_test['x'] = preds_x
df_test['y'] = preds_y
df_test['floor'] = final_preds
df_submission.update(df_test)
df_submission.to_csv('submission.csv')
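
The last lines of the cell above average the coordinate predictions over the folds and turn the summed floor probabilities into floor numbers. The sketch below reproduces that logic in plain Python on made-up numbers. The helper names are hypothetical, and the offset assumes that class index 0 corresponds to floor -2, as the `- 2` in the original code implies.

```python
def average_folds(fold_predictions):
    """Element-wise mean of per-fold prediction lists."""
    n_folds = len(fold_predictions)
    return [sum(values) / n_folds for values in zip(*fold_predictions)]

def decode_floors(summed_probs, lowest_floor=-2):
    """Pick the highest-scoring class per row and shift it to a floor number."""
    return [max(range(len(row)), key=row.__getitem__) + lowest_floor
            for row in summed_probs]

# Made-up x predictions from two folds for three test rows.
x_per_fold = [[10.0, 20.0, 30.0], [12.0, 18.0, 30.0]]
print(average_folds(x_per_fold))  # [11.0, 19.0, 30.0]

# Made-up summed class scores (4 classes here instead of 11).
summed = [[0.1, 1.7, 0.1, 0.1], [0.2, 0.2, 0.3, 1.3]]
print(decode_floors(summed))  # [-1, 1]
```

Summing the floor probabilities instead of averaging them does not change the argmax, which is why the original code skips the division for that output.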

### Results

As reported in the metrics, the floor accuracy on the validation set is >99.6% with no post-processing at all.

Public LB: 

I leave the post-processing to the reader.
Some great contributions that can significantly boost the results are:
* [Indoor Navigation - "Snap to Grid" Post Processing](https://www.kaggle.com/robikscube/indoor-navigation-snap-to-grid-post-processing)
* [Postprocessing based on leakage](https://www.kaggle.com/tomooinubushi/postprocessing-based-on-leakage)
* [indoor - Post-processing by Cost Minimization](https://www.kaggle.com/saitodevel01/indoor-post-processing-by-cost-minimization)
* [with magn - Cost Minimization](https://www.kaggle.com/museas/with-magn-cost-minimization)

Thank you for reading and good luck!
