<div>
    <h1 align="center">Cost Minimization & Floor - Part (1)</h1>
    <h2 align="center">Identify the position of a smartphone in a shopping mall</h2>
    <h3 align="center">By: Somayyeh Gholami & Mehran Kazeminia</h3>
</div>

<div class="alert alert-success">  
</div>

# Description:

### - In this notebook (No. 1), we use the following magic notebook for "Cost Minimization":

https://www.kaggle.com/saitodevel01/indoor-post-processing-by-cost-minimization

### - For "Fix the floor prediction", we use the following creative notebook:

https://www.kaggle.com/nigelhenry/simple-99-accurate-floor-model

### - In this notebook (No. 1), we improve the results of ten public notebooks with the above methods. In notebook No. 2 we will then apply "Ensembling" and the "Comparative Method", and finally the "Snap to Grid" notebook (No. 3) produces the final result. Thanks to everyone who shared their notebooks; the addresses of some of the notebooks used are as follows:

https://www.kaggle.com/robikscube/indoor-navigation-snap-to-grid-post-processing

https://www.kaggle.com/therocket290/lstm-unified-wi-fi-training-x-and-y-with-floor

https://www.kaggle.com/kokitanisaka/lstm-by-keras-with-unified-wi-fi-feats

https://www.kaggle.com/oxzplvifi/indoor-gbm-postprocessing-xy-prediction

https://www.kaggle.com/nigelhenry/simple-99-accurate-floor-model

https://www.kaggle.com/hiro5299834/wifi-features-with-lightgbm-kfold

https://www.kaggle.com/ebinan92/time-series-rnn-xy-prediction


### - As we explained in previous notebooks, if you improve the score of every notebook with the "Snap to Grid" method before "Ensembling" and then ensemble them, the errors add up and the result is poor. "Snap to Grid" should therefore be used only in the last step. "Cost Minimization", by contrast, can be applied before or after "Ensembling"; as you can see, we apply it to all results from the beginning.

### =======================================================

### For more information, you can refer to the following address:

https://www.kaggle.com/c/indoor-location-navigation/discussion/230153

## >>> Good Luck <<<


<div class="alert alert-success">  
</div>

# If you find this work useful, please don't forget to upvote :)

<div class="alert alert-success">  
</div>

# Import & Data Set

In [None]:
!git clone --depth 1 https://github.com/location-competition/indoor-location-competition-20 indoor_location_competition_20
    
!rm -rf indoor_location_competition_20/data

In [None]:
import numpy as np
import pandas as pd
import scipy.sparse
import scipy.sparse.linalg  # provides spsolve, used in correct_path
import scipy.interpolate

from tqdm import tqdm
import multiprocessing
import matplotlib.pyplot as plt

from indoor_location_competition_20.io_f import read_data_file
import indoor_location_competition_20.compute_f as compute_f

%matplotlib inline

In [None]:
INPUT_PATH = '../input/indoor-location-navigation'

In [None]:
# Kernels Data (Public Score & File Path)

# Kernel F ('indoor8073', row 5) also supplies the floor predictions used below
dfk = pd.DataFrame({
    'Kernel ID': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
    'Score':     [9.773, 9.530, 8.500, 8.418, 8.333, 8.073, 7.745, 7.661, 7.274, 7.285],
    'File Path': ['../input/indoor9773/Indoor9773.csv',
                  '../input/indoor9530/Indoor9530.csv',
                  '../input/indoor8500/Indoor8500.csv',
                  '../input/indoor8418/Indoor8418.csv',
                  '../input/indoor8333/Indoor8333.csv',
                  '../input/indoor8073/Indoor8073.csv',
                  '../input/indoornav7745sub/submission.csv',
                  '../input/indoor7661/Indoor7661.csv',
                  '../input/indoor-wifi-floor/submission.csv',
                  '../input/time-series-rnn-xy-prediction/submission.csv'],
})
    
dfk

<div class="alert alert-success">  
</div>

# Functions

The following description and code are copied from this notebook:

https://www.kaggle.com/saitodevel01/indoor-post-processing-by-cost-minimization


To combine machine learning predictions (from wifi features) with sensor data (acceleration and attitude heading),
I defined the cost function as follows:
$$
L(X_{1:N}) = \sum_{i=1}^{N} \alpha_i \| X_i - \hat{X}_i \|^2 + \sum_{i=1}^{N-1} \beta_i \| (X_{i+1} - X_{i}) - \Delta \hat{X}_i \|^2
$$
where $\hat{X}_i$ is the absolute position at waypoint $i$ predicted by machine learning, and $\Delta \hat{X}_i$ is the relative displacement from waypoint $i$ to waypoint $i+1$ estimated from the sensor data.

Since the cost function is quadratic, the optimal $X$ is the solution of the linear system $Q X = c$,
where $Q$ and $c$ are derived from the cost function above.
The matrix $Q$ is tridiagonal, so the system is cheap to solve, but its inverse is dense;
each machine learning prediction is therefore corrected by *all* machine learning predictions and sensor data, not only by its neighbours.
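Spelling out that step in matrix form (for each coordinate column separately), with $A = \operatorname{diag}(\alpha_1,\dots,\alpha_N)$, $B = \operatorname{diag}(\beta_1,\dots,\beta_{N-1})$ and $D$ the $(N-1)\times N$ first-difference matrix, $(DX)_i = X_{i+1} - X_i$, setting the gradient to zero gives $Q$ and $c$ explicitly. These correspond to the `A`, `B`, `D`, `Q` and `c` built in `correct_path`:

$$
L(X) = (X - \hat{X})^\top A \,(X - \hat{X}) + (DX - \Delta\hat{X})^\top B \,(DX - \Delta\hat{X})
$$
$$
\nabla L = 2A(X - \hat{X}) + 2D^\top B\,(DX - \Delta\hat{X}) = 0
\;\Longrightarrow\;
\underbrace{(A + D^\top B D)}_{Q}\, X = \underbrace{A\hat{X} + D^\top B\,\Delta\hat{X}}_{c}
$$

Because $A$ is diagonal and $D^\top B D$ is tridiagonal, $Q$ is tridiagonal.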

The optimal hyperparameters ($\alpha$ and $\beta$) can be estimated from the expected errors of the machine learning predictions and of the sensor data,
or simply tuned on the public score.
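One way to read the constants in `correct_path` is as inverse variances: `alpha = 8.1**-2` matches an assumed 8.1 m error for the wifi predictions, and `beta = (0.30 + 0.30e-3 * delta_t)**-2` a sensor-displacement error that grows with the gap `delta_t` (ms) between waypoints. This is our interpretation, not something stated in the source notebook. The minimal sketch below solves $QX = c$ on synthetic data; the ground-truth path, the 3 m noise, the 1 s spacing and the exact sensor moves are all assumptions for illustration, and `scipy.sparse.diags` replaces `spdiags`.

```python
import numpy as np
import scipy.sparse
import scipy.sparse.linalg


def build_system(xy_hat, delta_xy_hat, alpha, beta):
    """Assemble Q and c for the cost function (same algebra as correct_path)."""
    n = xy_hat.shape[0]
    A = scipy.sparse.diags(alpha)   # N x N weights on the absolute predictions
    B = scipy.sparse.diags(beta)    # (N-1) x (N-1) weights on the relative moves
    # First-difference matrix: (D @ X)[i] = X[i + 1] - X[i]
    D = scipy.sparse.diags([-np.ones(n - 1), np.ones(n - 1)], [0, 1], shape=(n - 1, n))
    Q = (A + D.T @ B @ D).tocsc()
    c = A @ xy_hat + D.T @ (B @ delta_xy_hat)
    return Q, c, A, B, D


def rmse(points, truth):
    """Root-mean-square Euclidean distance between two (N, 2) arrays."""
    return np.sqrt(np.mean(np.sum((points - truth) ** 2, axis=1)))


rng = np.random.default_rng(0)
n = 50
steps = np.arange(n)
truth = np.column_stack([0.7 * steps, 2.0 * np.sin(steps / 8.0)])  # synthetic ground truth (m)
xy_hat = truth + rng.normal(0.0, 3.0, size=(n, 2))                 # noisy "wifi" predictions
delta_xy_hat = np.diff(truth, axis=0)                               # sensor moves, assumed exact
delta_t = np.full(n - 1, 1000.0)                                    # 1 s between waypoints (ms)

alpha = (8.1) ** (-2) * np.ones(n)                # constants copied from correct_path
beta = (0.30 + 0.30 * 1e-3 * delta_t) ** (-2)

Q, c, A, B, D = build_system(xy_hat, delta_xy_hat, alpha, beta)
xy_star = scipy.sparse.linalg.spsolve(Q, c)

# At the optimum the gradient of the cost function vanishes.
grad = A @ (xy_star - xy_hat) + D.T @ (B @ (D @ xy_star - delta_xy_hat))
print(rmse(xy_hat, truth), rmse(xy_star, truth))  # the fused error is the smaller one
```

With exact sensor moves and a constant $\alpha$, the fused error is a smoothed version of the wifi noise and cannot exceed it; with real, noisy sensor data the gain depends on how well $\alpha$ and $\beta$ match the actual error levels.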

In [None]:
def compute_rel_positions(acce_datas, ahrs_datas):
    
    step_timestamps, step_indexs, step_acce_max_mins = compute_f.compute_steps(acce_datas)
    headings = compute_f.compute_headings(ahrs_datas)
    stride_lengths = compute_f.compute_stride_length(step_acce_max_mins)
    step_headings = compute_f.compute_step_heading(step_timestamps, headings)
    rel_positions = compute_f.compute_rel_positions(stride_lengths, step_headings)
    
    return rel_positions
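`compute_rel_positions` chains the repository's `compute_f` helpers: step detection, heading estimation, stride-length estimation and a per-step heading lookup, before turning strides and headings into displacements. That last stage can be sketched as follows, assuming headings in radians measured from the +y axis (dx = L·sin h, dy = L·cos h); `compute_f`'s own sign convention may differ, and `steps_to_rel_positions` is a hypothetical name.

```python
import numpy as np


def steps_to_rel_positions(step_timestamps, stride_lengths, step_headings):
    """Turn per-step stride lengths and headings into per-step displacements.

    Returns rows of [timestamp, dx, dy], the column layout correct_path reads
    (rel_positions[:, 0] is time, rel_positions[:, 1:3] is the move).
    Heading convention (assumed): radians, measured from the +y axis.
    """
    t = np.asarray(step_timestamps, dtype=float)
    length = np.asarray(stride_lengths, dtype=float)
    heading = np.asarray(step_headings, dtype=float)
    dx = length * np.sin(heading)
    dy = length * np.cos(heading)
    return np.column_stack([t, dx, dy])


# Four 0.7 m steps: two along +y (0 rad), then two along +x (pi/2 rad).
rel = steps_to_rel_positions([500, 1000, 1500, 2000], [0.7] * 4,
                             [0, 0, np.pi / 2, np.pi / 2])
track = np.cumsum(rel[:, 1:3], axis=0)  # dead-reckoned position after each step
print(track[-1])  # approximately [1.4, 1.4]
```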

In [None]:
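def correct_path(args):
    # args is one (path, path_df) pair produced by sub.groupby('path')
    path, path_df = args
    
    T_ref  = path_df['timestamp'].values   # waypoint timestamps (ms)
    xy_hat = path_df[['x', 'y']].values    # machine learning (wifi) predictions
    
    # Relative positions from the accelerometer and attitude (AHRS) sensors
    example = read_data_file(f'{INPUT_PATH}/test/{path}.txt')
    rel_positions = compute_rel_positions(example.acce, example.ahrs)
    # Anchor the track at t=0 and, if needed, extend it to the last waypoint
    # so that interpolation at every waypoint timestamp stays in range
    if T_ref[-1] > rel_positions[-1, 0]:
        rel_positions = [np.array([[0, 0, 0]]), rel_positions, np.array([[T_ref[-1], 0, 0]])]
    else:
        rel_positions = [np.array([[0, 0, 0]]), rel_positions]
    rel_positions = np.concatenate(rel_positions)
    
    # Sensor displacement between consecutive waypoints: cumulative track,
    # interpolated at the waypoint timestamps, then differenced
    T_rel = rel_positions[:, 0]
    delta_xy_hat = np.diff(scipy.interpolate.interp1d(T_rel, np.cumsum(rel_positions[:, 1:3], axis=0), axis=0)(T_ref), axis=0)

    # Weights: alpha for the absolute predictions, beta for the relative moves
    # (beta shrinks as the time gap between waypoints grows)
    N = xy_hat.shape[0]
    delta_t = np.diff(T_ref)
    alpha = (8.1)**(-2) * np.ones(N)
    beta  = (0.30 + 0.30 * 1e-3 * delta_t)**(-2)
    A = scipy.sparse.spdiags(alpha, [0], N, N)
    B = scipy.sparse.spdiags( beta, [0], N-1, N-1)
    # First-difference matrix: (D @ X)[i] = X[i+1] - X[i]
    D = scipy.sparse.spdiags(np.stack([-np.ones(N), np.ones(N)]), [0, 1], N-1, N)

    # Q X = c, obtained by setting the gradient of the cost function to zero
    Q = A + (D.T @ B @ D)
    c = (A @ xy_hat) + (D.T @ (B @ delta_xy_hat))
    xy_star = scipy.sparse.linalg.spsolve(Q, c)

    return pd.DataFrame({
        'site_path_timestamp' : path_df['site_path_timestamp'],
        'floor' : path_df['floor'],
        'x' : xy_star[:, 0],
        'y' : xy_star[:, 1],
    })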
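The `delta_xy_hat` line in `correct_path` packs three steps into one expression: accumulate the per-step moves into a track, interpolate that track at the waypoint timestamps, then difference it. A minimal standalone sketch of the same idea (`displacements_between_waypoints` is a hypothetical helper; the step data in the usage line are invented):

```python
import numpy as np
import scipy.interpolate


def displacements_between_waypoints(step_times, step_dxdy, waypoint_times):
    """Sensor-based displacement between consecutive waypoints.

    step_times: (S,) step timestamps (ms); step_dxdy: (S, 2) per-step moves;
    waypoint_times: (N,) waypoint timestamps (ms). Returns an (N-1, 2) array.
    """
    step_times = np.asarray(step_times, dtype=float)
    step_dxdy = np.asarray(step_dxdy, dtype=float)
    waypoint_times = np.asarray(waypoint_times, dtype=float)

    # Anchor the track at t = 0 with zero displacement, as correct_path does.
    times = np.concatenate([[0.0], step_times])
    moves = np.vstack([[0.0, 0.0], step_dxdy])
    # If the last waypoint comes after the last step, hold the final position.
    if waypoint_times[-1] > times[-1]:
        times = np.append(times, waypoint_times[-1])
        moves = np.vstack([moves, [0.0, 0.0]])

    track = np.cumsum(moves, axis=0)  # dead-reckoned position at each time
    track_at_waypoints = scipy.interpolate.interp1d(times, track, axis=0)(waypoint_times)
    return np.diff(track_at_waypoints, axis=0)  # move from waypoint i to i + 1


# Steps every 500 ms, each 0.7 m along +x; waypoints at 0, 1250 and 2000 ms.
d = displacements_between_waypoints([500, 1000, 1500, 2000], [[0.7, 0.0]] * 4, [0, 1250, 2000])
print(d)  # approximately [[1.75, 0.0], [1.05, 0.0]]
```

Linear interpolation lets waypoints fall between detected steps, as the 1250 ms waypoint does here.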


<div class="alert alert-success">  
</div>

# Kernel: A

https://www.kaggle.com/deepijongwonkim/wifi-features-neural-networks-starter

Public Score: 9.773

In [None]:
sub = pd.read_csv(dfk.iloc[0, 2])

tmp = sub['site_path_timestamp'].apply(lambda s : pd.Series(s.split('_')))
sub['site'] = tmp[0]
sub['path'] = tmp[1]
sub['timestamp'] = tmp[2].astype(float)

processes = multiprocessing.cpu_count()
with multiprocessing.Pool(processes=processes) as pool:
    dfs = pool.imap_unordered(correct_path, sub.groupby('path'))
    dfs = tqdm(dfs)
    dfs = list(dfs)    
sub = pd.concat(dfs).sort_values('site_path_timestamp')
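The cell above splits `site_path_timestamp` with `apply(lambda s : pd.Series(s.split('_')))`, which builds one Series per row. A vectorized equivalent using pandas' `Series.str.split` is sketched below; `split_site_path_timestamp` is a hypothetical helper, it assumes site and path IDs contain no underscores, and the demo IDs are invented.

```python
import pandas as pd


def split_site_path_timestamp(df):
    """Add 'site', 'path' and float 'timestamp' columns parsed from 'site_path_timestamp'.

    Splits on the first two underscores only (assumes the IDs contain none).
    """
    parts = df['site_path_timestamp'].str.split('_', n=2, expand=True)
    out = df.copy()
    out['site'] = parts[0]
    out['path'] = parts[1]
    out['timestamp'] = parts[2].astype(float)
    return out


# Toy rows in the submission's site_path_timestamp format (IDs are made up).
demo = pd.DataFrame({
    'site_path_timestamp': ['siteA_path1_0000000000009',
                            'siteA_path1_0000000009017',
                            'siteA_path2_0000000000015'],
    'floor': [0, 0, 1],
    'x': [10.0, 11.0, 50.0],
    'y': [20.0, 21.0, 60.0],
})
demo = split_site_path_timestamp(demo)

# groupby yields (path, path_df) pairs, the argument shape correct_path unpacks.
for path, path_df in demo.groupby('path'):
    print(path, path_df['timestamp'].tolist())
```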

### Fix the floor prediction

In [None]:
simple_accurate_99 = pd.read_csv(dfk.iloc[5, 2])

sub['floor'] = simple_accurate_99['floor'].values

sub.to_csv('a9773cm99.csv', index=False)

a9773cm99 = sub
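In the cell above, `.values` copies the floors by row position, so it relies on the floor file (Kernel F) listing its rows in the same order as `sub`, which was re-sorted by `site_path_timestamp`. A key-based alternative that does not depend on row order could look like this; `replace_floor_by_key` is a hypothetical helper, it assumes `site_path_timestamp` is unique in the floor file, and the demo rows are invented.

```python
import pandas as pd


def replace_floor_by_key(sub, floor_source):
    """Copy 'floor' from floor_source into sub, matched on site_path_timestamp."""
    floors = floor_source.set_index('site_path_timestamp')['floor']
    out = sub.copy()
    out['floor'] = out['site_path_timestamp'].map(floors)
    if out['floor'].isna().any():
        raise ValueError('some rows of sub have no floor in floor_source')
    return out


# The two frames list the same keys in opposite order.
sub_demo = pd.DataFrame({'site_path_timestamp': ['s_p_0000000000002', 's_p_0000000000001'],
                         'floor': [9, 9], 'x': [1.0, 2.0], 'y': [3.0, 4.0]})
floors_demo = pd.DataFrame({'site_path_timestamp': ['s_p_0000000000001', 's_p_0000000000002'],
                            'floor': [0, 1]})
fixed = replace_floor_by_key(sub_demo, floors_demo)
print(fixed['floor'].tolist())  # [1, 0]; positional .values would have given [0, 1]
```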

### After "Cost Minimization" & "Fix the floor prediction"

### a9773cm99.csv |  Public Score: 7.110

<div class="alert alert-success">  
</div>

# Kernel: B

https://www.kaggle.com/byfone/indoor-location-wi-fi-features-catboost-starter

Public Score: 9.530

In [None]:
sub = pd.read_csv(dfk.iloc[1, 2])

tmp = sub['site_path_timestamp'].apply(lambda s : pd.Series(s.split('_')))
sub['site'] = tmp[0]
sub['path'] = tmp[1]
sub['timestamp'] = tmp[2].astype(float)

processes = multiprocessing.cpu_count()
with multiprocessing.Pool(processes=processes) as pool:
    dfs = pool.imap_unordered(correct_path, sub.groupby('path'))
    dfs = tqdm(dfs)
    dfs = list(dfs)    
sub = pd.concat(dfs).sort_values('site_path_timestamp')

### Fix the floor prediction

In [None]:
simple_accurate_99 = pd.read_csv(dfk.iloc[5, 2])

sub['floor'] = simple_accurate_99['floor'].values

sub.to_csv('b9530cm99.csv', index=False)

b9530cm99 = sub

### After "Cost Minimization" & "Fix the floor prediction"

### b9530cm99.csv |  Public Score: 6.674

<div class="alert alert-success">  
</div>

# Kernel: C

https://www.kaggle.com/hiro5299834/wifi-features-with-lightgbm-groupkfold

Public Score: 8.500

In [None]:
sub = pd.read_csv(dfk.iloc[2, 2])

tmp = sub['site_path_timestamp'].apply(lambda s : pd.Series(s.split('_')))
sub['site'] = tmp[0]
sub['path'] = tmp[1]
sub['timestamp'] = tmp[2].astype(float)

processes = multiprocessing.cpu_count()
with multiprocessing.Pool(processes=processes) as pool:
    dfs = pool.imap_unordered(correct_path, sub.groupby('path'))
    dfs = tqdm(dfs)
    dfs = list(dfs)    
sub = pd.concat(dfs).sort_values('site_path_timestamp')

### Fix the floor prediction

In [None]:
simple_accurate_99 = pd.read_csv(dfk.iloc[5, 2])

sub['floor'] = simple_accurate_99['floor'].values

sub.to_csv('c8500cm99.csv', index=False)

c8500cm99 = sub

### After "Cost Minimization" & "Fix the floor prediction"

### c8500cm99.csv |  Public Score: 6.290


<div class="alert alert-success">  
</div>

# Kernel: D

https://www.kaggle.com/hiro5299834/wifi-features-with-lightgbm-and-xgboost-kfold

Public Score: 8.418

In [None]:
sub = pd.read_csv(dfk.iloc[3, 2])

tmp = sub['site_path_timestamp'].apply(lambda s : pd.Series(s.split('_')))
sub['site'] = tmp[0]
sub['path'] = tmp[1]
sub['timestamp'] = tmp[2].astype(float)

processes = multiprocessing.cpu_count()
with multiprocessing.Pool(processes=processes) as pool:
    dfs = pool.imap_unordered(correct_path, sub.groupby('path'))
    dfs = tqdm(dfs)
    dfs = list(dfs)    
sub = pd.concat(dfs).sort_values('site_path_timestamp')

### Fix the floor prediction

In [None]:
simple_accurate_99 = pd.read_csv(dfk.iloc[5, 2])

sub['floor'] = simple_accurate_99['floor'].values

sub.to_csv('d8418cm99.csv', index=False)

d8418cm99 = sub

### After "Cost Minimization" & "Fix the floor prediction"

### d8418cm99.csv |  Public Score: 6.189

<div class="alert alert-success">  
</div>

# Kernel: E

https://www.kaggle.com/hiro5299834/wifi-features-with-lightgbm-kfold

Public Score: 8.333

In [None]:
sub = pd.read_csv(dfk.iloc[4, 2])

tmp = sub['site_path_timestamp'].apply(lambda s : pd.Series(s.split('_')))
sub['site'] = tmp[0]
sub['path'] = tmp[1]
sub['timestamp'] = tmp[2].astype(float)

processes = multiprocessing.cpu_count()
with multiprocessing.Pool(processes=processes) as pool:
    dfs = pool.imap_unordered(correct_path, sub.groupby('path'))
    dfs = tqdm(dfs)
    dfs = list(dfs)    
sub = pd.concat(dfs).sort_values('site_path_timestamp')

### Fix the floor prediction

In [None]:
simple_accurate_99 = pd.read_csv(dfk.iloc[5, 2])

sub['floor'] = simple_accurate_99['floor'].values

sub.to_csv('e8333cm99.csv', index=False)

e8333cm99 = sub

### After "Cost Minimization" & "Fix the floor prediction"

### e8333cm99.csv |  Public Score: 6.077


<div class="alert alert-success">  
</div>

# Kernel: F

https://www.kaggle.com/nigelhenry/simple-99-accurate-floor-model

Public Score: 8.073

In [None]:
sub = pd.read_csv(dfk.iloc[5, 2])

tmp = sub['site_path_timestamp'].apply(lambda s : pd.Series(s.split('_')))
sub['site'] = tmp[0]
sub['path'] = tmp[1]
sub['timestamp'] = tmp[2].astype(float)

processes = multiprocessing.cpu_count()
with multiprocessing.Pool(processes=processes) as pool:
    dfs = pool.imap_unordered(correct_path, sub.groupby('path'))
    dfs = tqdm(dfs)
    dfs = list(dfs)    
sub = pd.concat(dfs).sort_values('site_path_timestamp')

### Fix the floor prediction

In [None]:
# Kernel F is the floor model itself, so its floor predictions are kept as they are.
# simple_accurate_99 = pd.read_csv(dfk.iloc[5, 2])

# sub['floor'] = simple_accurate_99['floor'].values

sub.to_csv('f8073cm99.csv', index=False)

f8073cm99 = sub

### After "Cost Minimization" (the floors already come from this floor model)

### f8073cm99.csv |  Public Score: 6.062

<div class="alert alert-success">  
</div>

# Kernel: G

https://www.kaggle.com/oxzplvifi/indoor-gbm-postprocessing-xy-prediction

Public Score: 7.745

In [None]:
sub = pd.read_csv(dfk.iloc[6, 2])

tmp = sub['site_path_timestamp'].apply(lambda s : pd.Series(s.split('_')))
sub['site'] = tmp[0]
sub['path'] = tmp[1]
sub['timestamp'] = tmp[2].astype(float)

processes = multiprocessing.cpu_count()
with multiprocessing.Pool(processes=processes) as pool:
    dfs = pool.imap_unordered(correct_path, sub.groupby('path'))
    dfs = tqdm(dfs)
    dfs = list(dfs)    
sub = pd.concat(dfs).sort_values('site_path_timestamp')

### Fix the floor prediction

In [None]:
simple_accurate_99 = pd.read_csv(dfk.iloc[5, 2])

sub['floor'] = simple_accurate_99['floor'].values

sub.to_csv('g7745cm99.csv', index=False)

g7745cm99 = sub

### After "Cost Minimization" & "Fix the floor prediction"

### g7745cm99.csv |  Public Score: 5.995

<div class="alert alert-success">  
</div>

# Kernel: H

https://www.kaggle.com/kokitanisaka/lstm-by-keras-with-unified-wi-fi-feats

Public Score: 7.661

In [None]:
sub = pd.read_csv(dfk.iloc[7, 2])

tmp = sub['site_path_timestamp'].apply(lambda s : pd.Series(s.split('_')))
sub['site'] = tmp[0]
sub['path'] = tmp[1]
sub['timestamp'] = tmp[2].astype(float)

processes = multiprocessing.cpu_count()
with multiprocessing.Pool(processes=processes) as pool:
    dfs = pool.imap_unordered(correct_path, sub.groupby('path'))
    dfs = tqdm(dfs)
    dfs = list(dfs)    
sub = pd.concat(dfs).sort_values('site_path_timestamp')

### Fix the floor prediction

In [None]:
simple_accurate_99 = pd.read_csv(dfk.iloc[5, 2])

sub['floor'] = simple_accurate_99['floor'].values

sub.to_csv('h7661cm99.csv', index=False)

h7661cm99 = sub

### After "Cost Minimization" & "Fix the floor prediction"

### h7661cm99.csv |  Public Score: 5.694


<div class="alert alert-success">  
</div>

# Kernel: I

https://www.kaggle.com/therocket290/lstm-unified-wi-fi-training-x-and-y-with-floor

Public Score: 7.274

In [None]:
sub = pd.read_csv(dfk.iloc[8, 2])

tmp = sub['site_path_timestamp'].apply(lambda s : pd.Series(s.split('_')))
sub['site'] = tmp[0]
sub['path'] = tmp[1]
sub['timestamp'] = tmp[2].astype(float)

processes = multiprocessing.cpu_count()
with multiprocessing.Pool(processes=processes) as pool:
    dfs = pool.imap_unordered(correct_path, sub.groupby('path'))
    dfs = tqdm(dfs)
    dfs = list(dfs)    
sub = pd.concat(dfs).sort_values('site_path_timestamp')

### Fix the floor prediction

In [None]:
simple_accurate_99 = pd.read_csv(dfk.iloc[5, 2])

sub['floor'] = simple_accurate_99['floor'].values

sub.to_csv('i7274cm99.csv', index=False)

i7274cm99 = sub

### After "Cost Minimization" & "Fix the floor prediction"

### i7274cm99.csv |  Public Score: 5.471

<div class="alert alert-success">  
</div>

# Kernel: J

https://www.kaggle.com/ebinan92/time-series-rnn-xy-prediction

Public Score: 7.285

In [None]:
sub = pd.read_csv(dfk.iloc[9, 2])

tmp = sub['site_path_timestamp'].apply(lambda s : pd.Series(s.split('_')))
sub['site'] = tmp[0]
sub['path'] = tmp[1]
sub['timestamp'] = tmp[2].astype(float)

processes = multiprocessing.cpu_count()
with multiprocessing.Pool(processes=processes) as pool:
    dfs = pool.imap_unordered(correct_path, sub.groupby('path'))
    dfs = tqdm(dfs)
    dfs = list(dfs)    
sub = pd.concat(dfs).sort_values('site_path_timestamp')

### Fix the floor prediction

In [None]:
simple_accurate_99 = pd.read_csv(dfk.iloc[5, 2])

sub['floor'] = simple_accurate_99['floor'].values

sub.to_csv('j7285cm99.csv', index=False)

j7285cm99 = sub

### After "Cost Minimization" & "Fix the floor prediction"

### j7285cm99.csv |  Public Score: 5.847

<div class="alert alert-success">  
</div>

# Results

In [None]:
gfk = pd.DataFrame({ 
    
    'Kernel ID'   : ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],  
    
    'Before Score': [9.773, 9.530, 8.500, 8.418, 8.333, 8.073, 7.745, 7.661, 7.274, 7.285], 
    
    'After Score' : [7.110, 6.674, 6.290, 6.189, 6.077, 6.062, 5.995, 5.694, 5.471, 5.847], 
    
    'File Name'   : ['a9773cm99.csv', 'b9530cm99.csv', 'c8500cm99.csv', 'd8418cm99.csv', 'e8333cm99.csv', 'f8073cm99.csv', 'g7745cm99.csv', 'h7661cm99.csv', 'i7274cm99.csv', 'j7285cm99.csv'] 
    
})    
    
gfk
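The gain from the two post-processing steps can be read off this table directly (lower scores are better). A small sketch using the same before/after scores; kernel I, which has the lowest score after correction, is the one written to `submission.csv` below.

```python
import pandas as pd

results = pd.DataFrame({
    'Kernel ID':    list('ABCDEFGHIJ'),
    'Before Score': [9.773, 9.530, 8.500, 8.418, 8.333, 8.073, 7.745, 7.661, 7.274, 7.285],
    'After Score':  [7.110, 6.674, 6.290, 6.189, 6.077, 6.062, 5.995, 5.694, 5.471, 5.847],
})
results['Gain'] = results['Before Score'] - results['After Score']
results['Gain %'] = 100 * results['Gain'] / results['Before Score']
print(results.round(3))

best = results.loc[results['After Score'].idxmin(), 'Kernel ID']
print(best)  # I
```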

<div class="alert alert-success">  
</div>

# Submission

In [None]:
sub = i7274cm99
sub.to_csv("submission.csv", index=False)

a9773cm99.to_csv("a9773cm99.csv", index=False)
b9530cm99.to_csv("b9530cm99.csv", index=False)
c8500cm99.to_csv("c8500cm99.csv", index=False)
d8418cm99.to_csv("d8418cm99.csv", index=False)
e8333cm99.to_csv("e8333cm99.csv", index=False)
f8073cm99.to_csv("f8073cm99.csv", index=False)
g7745cm99.to_csv("g7745cm99.csv", index=False)
h7661cm99.to_csv("h7661cm99.csv", index=False)
i7274cm99.to_csv("i7274cm99.csv", index=False)
j7285cm99.to_csv("j7285cm99.csv", index=False)

!ls

<div class="alert alert-success">  
</div>