# Indoor localization

An indoor positioning system (IPS) locates objects or people inside a building using radio waves, magnetic fields, acoustic signals, or other sensory information collected by mobile devices. Several commercial systems are on the market, but there is no standard for an IPS.

IPSes use a range of technologies, including distance measurement to nearby anchor nodes (nodes with known positions, e.g., Wi-Fi access points), magnetic positioning, and dead reckoning. They either actively locate mobile devices and tags, or provide ambient location or environmental context that devices can sense.

According to a MarketsandMarkets [report](https://www.marketsandmarkets.com/Market-Reports/indoor-positioning-navigation-ipin-market-989.html), the global indoor location market is expected to grow from USD 7.11 billion in 2017 to USD 40.99 billion by 2022, at a compound annual growth rate (CAGR) of 42.0% during the forecast period. Hassle-free navigation, improved decision-making, and increased adoption of connected devices are driving the growth of the indoor location market across the globe.
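The quoted growth rate can be checked from the two endpoint figures. The sketch below assumes the forecast period spans five years (2017 to 2022) and uses the standard compound-growth formula; the variable names are illustrative.

```python
# Market size figures quoted in the report (USD billion)
start_value = 7.11   # 2017
end_value = 40.99    # 2022

# Assumption: the forecast period is the five years between the two endpoints
years = 2022 - 2017

# CAGR = (end / start) ** (1 / years) - 1
cagr = (end_value / start_value) ** (1 / years) - 1
print("CAGR: %.1f%%" % (cagr * 100))
```

Under that assumption the result comes out at roughly 42%, consistent with the figure in the report.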

In this problem, you will use signal readings from seven different Wi-Fi access points to determine which room the user is in.

In [88]:
import pandas
import numpy as np
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV   # Performing grid search

Loading the training and cross-validation sets and separating the features from the labels.

In [89]:
train_set = pandas.read_csv('train_set.csv')
cv_set = pandas.read_csv('cv_set.csv')

train_data = train_set[['wifi'+str(i) for i in range(1, len(train_set.columns) - 1)]]
train_labels = train_set['room']
cv_data = cv_set[['wifi'+str(i) for i in range(1, len(cv_set.columns) - 1)]]
cv_labels = cv_set['room']
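The column selection above derives the number of `wifi` columns from the total column count, so it depends on how many other columns the CSV happens to contain. A minimal alternative sketch, assuming every feature column name starts with `wifi`, selects them by name instead. The miniature frame and its extra `Unnamed: 0` column are hypothetical stand-ins, not the actual contents of `train_set.csv`.

```python
import pandas as pd

# Hypothetical miniature data set: an index-like column, two features, one label
df = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "wifi1": [-68, -63],
    "wifi2": [-57, -60],
    "room": [1, 1],
})

# Pick feature columns by name prefix rather than by position or count
feature_cols = [c for c in df.columns if c.startswith("wifi")]
features = df[feature_cols]
labels = df["room"]

print(feature_cols)
```

Selecting by prefix keeps working if columns are added, removed, or reordered, at the cost of relying on the naming convention.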

In [90]:
print(train_data[:10])
print(train_labels[:10])


   wifi1  wifi2  wifi3  wifi4  wifi5  wifi6  wifi7
0    -68    -57    -61    -65    -71    -85    -85
1    -63    -60    -60    -67    -76    -85    -84
2    -61    -60    -68    -62    -77    -90    -80
3    -65    -61    -65    -67    -69    -87    -84
4    -61    -63    -58    -66    -74    -87    -82
5    -62    -60    -66    -68    -80    -86    -91
6    -65    -59    -61    -67    -72    -86    -81
7    -63    -57    -61    -65    -73    -84    -84
8    -66    -60    -65    -62    -70    -85    -83
9    -67    -60    -59    -61    -71    -86    -91
0    1
1    1
2    1
3    1
4    1
5    1
6    1
7    1
8    1
9    1
Name: room, dtype: int64


In [91]:
print(cv_data[:10])
print(cv_labels[:10])

   wifi1  wifi2  wifi3  wifi4  wifi5  wifi6  wifi7
0    -64    -56    -61    -66    -71    -82    -81
1    -63    -65    -60    -63    -77    -81    -87
2    -64    -55    -63    -66    -76    -88    -83
3    -65    -60    -59    -63    -76    -86    -82
4    -67    -61    -62    -67    -77    -83    -91
5    -61    -59    -65    -63    -74    -89    -87
6    -63    -56    -63    -65    -72    -82    -89
7    -66    -59    -64    -68    -68    -97    -83
8    -67    -57    -64    -71    -75    -89    -87
9    -63    -57    -59    -67    -71    -82    -93
0    1
1    1
2    1
3    1
4    1
5    1
6    1
7    1
8    1
9    1
Name: room, dtype: int64


In [92]:
print(len(train_labels))
print(len(cv_labels))

1603
397


### Training XGBoost classifier

In [102]:
import warnings
warnings.filterwarnings('ignore')

In [117]:
# fit an XGBoost classifier with default parameters to the training data

model = XGBClassifier()
model.fit(train_data, train_labels)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
       n_jobs=1, nthread=None, objective='multi:softprob', random_state=0,
       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
       silent=True, subsample=1)
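The room labels printed earlier start at 1. Some newer XGBoost releases expect class labels to be consecutive integers starting at 0 and raise an error otherwise; this is an assumption about versions later than the one used in this notebook, so check it against your installed release. A minimal sketch of a reversible remapping with NumPy, using made-up labels:

```python
import numpy as np

# Made-up room labels for illustration (not taken from the data set)
room_labels = np.array([1, 1, 2, 4, 3, 2])

# classes holds the sorted unique labels; encoded holds each label's index in it
classes, encoded = np.unique(room_labels, return_inverse=True)
print(classes)   # original label values
print(encoded)   # 0-based labels suitable for training

# Map 0-based predictions back to the original room numbers
decoded = classes[encoded]
```

Training on `encoded` and indexing `classes` with the predictions keeps the reported room numbers unchanged.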

In [118]:
def predict(model):
    # make predictions for CV data
    pred_labels = model.predict(cv_data)

    # evaluate predictions
    accuracy = accuracy_score(cv_labels, pred_labels)
    print("Accuracy: %.2f%%" % (accuracy * 100.0))

In [119]:
predict(model)

Accuracy: 98.24%
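Accuracy here is the fraction of cross-validation samples whose predicted room matches the true room. With 397 samples in the cross-validation set, the sketch below works out which whole number of correct predictions would print as 98.24%. The helper name is hypothetical, and the count is inferred from the reported figures rather than taken from the notebook's output.

```python
def accuracy(true_labels, pred_labels):
    """Fraction of positions where the prediction equals the true label."""
    matches = sum(t == p for t, p in zip(true_labels, pred_labels))
    return matches / len(true_labels)

n_samples = 397      # size of the cross-validation set printed above
reported = "98.24"   # accuracy as printed by the notebook

# Which counts of correct predictions would be formatted as the reported value?
candidates = [
    k for k in range(n_samples + 1)
    if "%.2f" % (100.0 * k / n_samples) == reported
]
print(candidates)
```

On a set this small, each misclassified sample moves the accuracy by about a quarter of a percentage point, which is worth keeping in mind when comparing models.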


### Tuning hyperparameters

In [174]:
# Here I experimented with the parameters. GridSearchCV cross-validates on the set it is given,
# whereas in this task we need to maximize accuracy on the separate, given CV data set.
# Still, GridSearchCV helped narrow down the region in which to look for optimal parameters.

param_test1 = {
 'learning_rate':np.arange(0.5,1.0,0.02)
}

param_test2 = {
 'subsample':np.arange(0.5,1.01,0.02)
}

param_test3 = {
 'max_depth':range(3,10,2),
 'min_child_weight':range(1,6,2)
}

param_test4 = {
 'gamma':[i/10.0 for i in range(0,5)]
}

In [175]:
gsearch = GridSearchCV(estimator = XGBClassifier(n_estimators=200, learning_rate=0.82, seed = 123, subsample = 1),
                      param_grid = param_test4,
                      scoring = 'accuracy')

gsearch.fit(train_data,train_labels)
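# Note: grid_scores_ was removed in scikit-learn 0.20; newer versions expose the same information via cv_results_
gsearch.grid_scores_, gsearch.best_params_, gsearch.best_score_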

([mean: 0.97442, std: 0.00781, params: {'gamma': 0.0},
  mean: 0.97567, std: 0.00608, params: {'gamma': 0.1},
  mean: 0.97193, std: 0.00662, params: {'gamma': 0.2},
  mean: 0.97442, std: 0.00533, params: {'gamma': 0.3},
  mean: 0.97380, std: 0.00547, params: {'gamma': 0.4}],
 {'gamma': 0.1},
 0.975670617592015)
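As the comment above notes, `GridSearchCV` cross-validates within the training set and does not score against the separate CV set. One way to make the search use a fixed validation set is scikit-learn's `PredefinedSplit`. The sketch below is an illustration under stated assumptions, not the approach used in this notebook: the data is synthetic, and `DecisionTreeClassifier` stands in for `XGBClassifier` to keep the example light.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def make_split(n):
    """Synthetic stand-in data: 7 signal-like features, rooms labelled 1..4."""
    rooms = rng.integers(1, 5, size=n)
    signals = -60 - 5 * rooms[:, None] * np.linspace(0.5, 1.5, 7)
    return signals + rng.normal(0, 2, (n, 7)), rooms

X_train, y_train = make_split(200)
X_val, y_val = make_split(50)

# Stack both sets; -1 marks rows used only for training, 0 marks the single validation fold
X_all = np.vstack([X_train, X_val])
y_all = np.concatenate([y_train, y_val])
test_fold = np.concatenate([
    np.full(len(X_train), -1),
    np.zeros(len(X_val), dtype=int),
])

search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),  # stand-in estimator
    param_grid={"max_depth": [2, 3, 5]},
    scoring="accuracy",
    cv=PredefinedSplit(test_fold),
)
search.fit(X_all, y_all)
print(search.best_params_, search.best_score_)
```

With this setup `best_score_` is the accuracy on the fixed validation rows. Note that with the default `refit=True` the final estimator is refit on all rows, including the validation ones, and that tuning against a single held-out set can overfit to it.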

In [186]:
# the final result after experimenting with the parameters

tuned_model =  XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.82, max_delta_step=0,
       max_depth=3, min_child_weight=1, missing=None, n_estimators=200,
       n_jobs=1, nthread=None, objective='multi:softprob', random_state=0,
       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=123,
       silent=True, subsample=1)
           
tuned_model.fit(train_data, train_labels)

predict(tuned_model)

Accuracy: 99.24%
