# Location Pinning Using Wi-Fi Signal Strength

We use the UjiIndoorLoc data set from Kaggle (https://www.kaggle.com/giantuji/UjiIndoorLoc).

The dataset consists of Wi-Fi signal strengths from 520 access points. They are used as inputs to a model that locates a mobile device in a building complex. The outputs of the model are longitude, latitude, building number and floor number.

The code below is also available in Train.py and Validate.py for command-line execution.

## Import Necessary Packages

In [1]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor as KNR
from sklearn.linear_model import LinearRegression as LR
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeClassifier as DTC
from sklearn.tree import DecisionTreeRegressor as DTR
from sklearn.ensemble import RandomForestClassifier as RFC
from sklearn.ensemble import RandomForestRegressor as RFR
from sklearn.ensemble import GradientBoostingClassifier as GBC
from sklearn.ensemble import GradientBoostingRegressor as GBR
from sklearn.ensemble import AdaBoostClassifier as ABC
from sklearn.ensemble import AdaBoostRegressor as ABR
from xgboost import XGBRegressor as XGR
from xgboost import XGBClassifier as XGC
from sklearn.neural_network import MLPRegressor as NNR
from sklearn.neural_network import MLPClassifier as NNC



## Load and Analyze the Data

In [2]:
df=pd.read_csv('trainingData.csv')

In [3]:
df.describe().iloc[0].unique()

array([19937.])

In [4]:
df.describe()

| | WAP001 | WAP002 | WAP003 | WAP004 | WAP005 | WAP006 | WAP007 | WAP008 | WAP009 | WAP010 | ... | WAP520 | LONGITUDE | LATITUDE | FLOOR | BUILDINGID | SPACEID | RELATIVEPOSITION | USERID | PHONEID | TIMESTAMP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 19937.0 | 19937.0 | 19937.0 | 19937.0 | 19937.0 | 19937.0 | 19937.0 | 19937.0 | 19937.0 | 19937.0 | ... | 19937.0 | 19937.0 | 19937.0 | 19937.0 | 19937.0 | 19937.0 | 19937.0 | 19937.0 | 19937.0 | 19937.0 |
| mean | 99.823644 | 99.820936 | 100.0 | 100.0 | 99.613733 | 97.130461 | 94.733661 | 93.820234 | 94.693936 | 99.163766 | ... | 100.0 | -7464.275947 | 4864871.0 | 1.674575 | 1.21282 | 148.429954 | 1.833024 | 9.068014 | 13.021869 | 1371421000.0 |
| std | 5.866842 | 5.798156 | 0.0 | 0.0 | 8.615657 | 22.93189 | 30.541335 | 33.010404 | 30.305084 | 12.634045 | ... | 0.0 | 123.40201 | 66.93318 | 1.223078 | 0.833139 | 58.342106 | 0.372964 | 4.98872 | 5.36241 | 557205.4 |
| min | -97.0 | -90.0 | 100.0 | 100.0 | -97.0 | -98.0 | -99.0 | -98.0 | -98.0 | -99.0 | ... | 100.0 | -7691.3384 | 4864746.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1369909000.0 |
| 25% | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | ... | 100.0 | -7594.737 | 4864821.0 | 1.0 | 0.0 | 110.0 | 2.0 | 5.0 | 8.0 | 1371056000.0 |
| 50% | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | ... | 100.0 | -7423.0609 | 4864852.0 | 2.0 | 1.0 | 129.0 | 2.0 | 11.0 | 13.0 | 1371716000.0 |
| 75% | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | ... | 100.0 | -7359.193 | 4864930.0 | 3.0 | 2.0 | 207.0 | 2.0 | 13.0 | 14.0 | 1371721000.0 |
| max | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | ... | 100.0 | -7300.81899 | 4865017.0 | 4.0 | 2.0 | 254.0 | 2.0 | 18.0 | 24.0 | 1371738000.0 |


In [5]:
df_ind=df.groupby(['LONGITUDE','LATITUDE','FLOOR','BUILDINGID','SPACEID']).size().reset_index()
df_ind=df_ind.reset_index().drop(0,axis=1)
df_ind.shape

(933, 6)

In [6]:
df.groupby(['LONGITUDE','LATITUDE','FLOOR']).size().shape

(933,)

In [7]:
df.groupby(['BUILDINGID','FLOOR','SPACEID']).size().shape

(735,)

Note that the number of distinct combinations of longitude, latitude and floor number (3D coordinates) is only 933. This is a consequence of how the data was collected. To stay realistic, we nevertheless build regression models for longitude and latitude, and classification models for building number and floor number.


We one-hot encode the building number. The floor number is kept as is, since it is an ordinal variable.

In [8]:
limiter = 0
df=pd.get_dummies(df, columns=['BUILDINGID'], prefix='BUILDING')
df['BUILDINGID']=df[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
df['BUILDINGID']=df['BUILDINGID'].apply(lambda x: x[-1])
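# Keep the signal columns as features. At this point the frame ends with 12 non-WAP
# columns (8 metadata columns, 3 dummies, BUILDINGID), so dropping the last 11 leaves
# LONGITUDE inside df_X; Data_Read further below drops 12 instead.
df_X=df.drop(df.columns[-11:], axis=1)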
for col in df_X.columns:
    df_X[col]=df_X[col].apply(lambda x:limiter if x==100 else float(x)/100+1)
df_y=df[['LONGITUDE','LATITUDE','FLOOR','BUILDING_0','BUILDING_1','BUILDING_2','BUILDINGID']]
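
To make the encoding step concrete, here is a small sketch on a toy frame (the four building IDs are made up). It one-hot encodes `BUILDINGID` and then recovers the label with `idxmax`, as the cell above does. `dtype=int` is passed only to keep the toy output numeric across pandas versions, and `.str[-1]` stands in for the `apply(lambda ...)` call.

```python
import pandas as pd

# Toy stand-in for the BUILDINGID column of the real data.
toy = pd.DataFrame({"BUILDINGID": [0, 2, 1, 2]})

# One indicator column per building: BUILDING_0, BUILDING_1, BUILDING_2.
onehot = pd.get_dummies(toy, columns=["BUILDINGID"], prefix="BUILDING", dtype=int)

# Recover the label from the indicator columns: the column with the largest
# value wins, and its last character is the building number (as a string).
dummy_cols = ["BUILDING_0", "BUILDING_1", "BUILDING_2"]
recovered = onehot[dummy_cols].idxmax(axis=1).str[-1]

print(onehot)
print(recovered.tolist())
```

Note that the recovered labels are strings (`'0'`, `'1'`, `'2'`), so later comparisons only match targets that went through the same conversion.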


We also transform the signal strength data. A reading of 100 means no signal, which can be treated as a very long distance from the AP, so we map it to `limiter` (0 in the code above). All other readings are negative and are shifted to the positive side by `x/100+1`, which scales the features to roughly [0, 1].
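
The per-column `apply` in the cell above can also be written as a single vectorized expression. The following is a minimal sketch on a hand-made frame; `scale_rssi` is a hypothetical helper name and the toy readings are invented.

```python
import pandas as pd

def scale_rssi(wap: pd.DataFrame, limiter: float = 0.0) -> pd.DataFrame:
    """Map the 'no signal' marker (100) to `limiter` and shift measured
    readings (zero or negative) to x / 100 + 1."""
    scaled = wap.astype(float) / 100 + 1
    # Keep the scaled value where a signal was measured, else use `limiter`.
    return scaled.where(wap != 100, limiter)

# Toy readings: 100 means the access point was not detected.
toy = pd.DataFrame({"WAP001": [100, -97, -50], "WAP002": [-30, 100, 0]})
scaled = scale_rssi(toy)
print(scaled)
```

On the full 520-column frame this avoids one Python-level function call per cell, which is usually noticeably faster than `apply` with a lambda.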

## Prepare for training

In [10]:
X_train, X_test, y_train, y_test = train_test_split(df_X, df_y, test_size=0.25, random_state=25)
y_test=y_test.reset_index()
y_train=y_train.reset_index()


The dataset also provides a validation set. On inspection, however, the validation data covers a greater range of longitude and latitude than the training data. We therefore hold out part of the training set as a test set for tuning the models. The out-of-range cases are partially addressed by models that can extrapolate.

In [11]:
df_val=pd.read_csv('ValidationData.csv')
df_val=pd.get_dummies(df_val, columns=['BUILDINGID'], prefix='BUILDING')
df_val['BUILDINGID']=df_val[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
df_val['BUILDINGID']=df_val['BUILDINGID'].apply(lambda x: x[-1])
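# Same slicing as for the training frame: dropping the last 11 of the 12 non-WAP
# columns leaves LONGITUDE inside df_val_X, matching the columns of df_X above.
df_val_X=df_val.drop(df_val.columns[-11:], axis=1)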
for col in df_val_X.columns:
    df_val_X[col]=df_val_X[col].apply(lambda x:limiter if x==100 else float(x)/100+1)
df_val_y=df_val[['LONGITUDE','LATITUDE','FLOOR','BUILDING_0','BUILDING_1','BUILDING_2','BUILDINGID']]

## Building the Models

The plan is to build several base models using various algorithms. After picking the well-performing ones, we build an ensemble model that takes the original data and the base models' predictions as input. The idea is that such a combination can learn the regions where each base model performs well and combine the strengths of the models in the final prediction.
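
The stacking idea described above can be sketched on synthetic data as follows. This is an illustration under stated assumptions, not the ensemble used in this notebook: the data is random, there is a single regression target, and the choice of a random forest as the final model is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 5))  # stand-in for the scaled signal strengths
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

# Step 1: fit the base models on the original features.
base_models = [KNeighborsRegressor(n_neighbors=3), Ridge()]
for model in base_models:
    model.fit(X_train, y_train)

def stack_features(X_part):
    """Original features plus one prediction column per base model."""
    preds = [model.predict(X_part).reshape(-1, 1) for model in base_models]
    return np.hstack([X_part] + preds)

# Step 2: fit the final model on the augmented features.
meta = RandomForestRegressor(n_estimators=50, random_state=0)
meta.fit(stack_features(X_train), y_train)
meta_pred = meta.predict(stack_features(X_test))
print(stack_features(X_test).shape, meta_pred.shape)
```

One design caveat: base-model predictions on their own training rows tend to be optimistic, so the final model may trust them more than it should. Producing the training-set predictions out of fold is a common way to reduce that effect.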

### Model 0, KNN

Due to the nature of the problem, similar signal profiles should result in similar location predictions. This is the reasoning behind our first model. We tried several sets of parameters to find the best value of K.

In [12]:
p=1
n_neighbors=3
model_0=KNR(n_neighbors=n_neighbors, weights='uniform',p=p)
model_0.fit(X_train, y_train[['LONGITUDE','LATITUDE','FLOOR','BUILDING_0','BUILDING_1','BUILDING_2']])

y_0=model_0.predict(X_train)
y_0=pd.DataFrame(y_0, columns=['LONGITUDE','LATITUDE','FLOOR','BUILDING_0','BUILDING_1','BUILDING_2'])
y_0['FLOOR_FLOAT']=y_0['FLOOR']
y_0['FLOOR']=y_0['FLOOR'].astype(int)
y_0['BUILDINGID']=y_0[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
y_0['BUILDINGID']=y_0['BUILDINGID'].apply(lambda x: x[-1])

y_prid_0=model_0.predict(X_test)
y_prid_0=pd.DataFrame(y_prid_0, columns=['LONGITUDE','LATITUDE','FLOOR','BUILDING_0','BUILDING_1','BUILDING_2'])
y_prid_0['FLOOR_FLOAT']=y_prid_0['FLOOR']
y_prid_0['FLOOR']=y_prid_0['FLOOR'].astype(int)
y_prid_0['BUILDINGID']=y_prid_0[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
y_prid_0['BUILDINGID']=y_prid_0['BUILDINGID'].apply(lambda x: x[-1])
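
The search over K mentioned for this model is not shown in the notebook. One way it could look is sketched below, assuming synthetic data with two coordinate targets and an arbitrary candidate list `(1, 3, 5, 9)`; the selection criterion (mean Euclidean error on the held-out part) is likewise an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.random((300, 8))
# Two synthetic "coordinates" derived from the features.
Y = np.column_stack([X[:, :4].sum(axis=1), X[:, 4:].sum(axis=1)])
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=25)

errors = {}
for k in (1, 3, 5, 9):
    knn = KNeighborsRegressor(n_neighbors=k, weights="uniform", p=1)
    knn.fit(X_tr, Y_tr)
    diff = knn.predict(X_te) - Y_te
    # Mean Euclidean distance between predicted and true positions.
    errors[k] = float(np.sqrt((diff ** 2).sum(axis=1)).mean())

best_k = min(errors, key=errors.get)
print(errors, best_k)
```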

We use the Scoring function to calculate the accuracy of the floor and building predictions on the training and test sets.

In [15]:
def Scoring(train, test):
    print('Total train:', len(train))
    print((train[['FLOOR','BUILDINGID']]==y_train[['FLOOR','BUILDINGID']]).sum())
    print((train[['FLOOR','BUILDINGID']]==y_train[['FLOOR','BUILDINGID']]).sum()/len(train))

    print('\n')

    print('Total test:', len(test))
    print((test[['FLOOR','BUILDINGID']]==y_test[['FLOOR','BUILDINGID']]).sum())
    print((test[['FLOOR','BUILDINGID']]==y_test[['FLOOR','BUILDINGID']]).sum()/len(test))
    
    return

In [16]:
Scoring(y_0, y_prid_0)

Total train: 14952
FLOOR         14927
BUILDINGID    14951
dtype: int64
FLOOR         0.998328
BUILDINGID    0.999933
dtype: float64


Total test: 4985
FLOOR         4973
BUILDINGID    4985
dtype: int64
FLOOR         0.997593
BUILDINGID    1.000000
dtype: float64


For the regression results, we track both the average Euclidean error (over longitude, latitude and floor) and the max error on the test set.

In [17]:
def Score_Reg(pred):
    total=0
    maxx=0
    for i in range(pred.shape[0]):
        temp=np.sqrt((pred.iloc[i][0]-y_test.iloc[i][1])**2+(pred.iloc[i][1]-y_test.iloc[i][2])**2+(pred.iloc[i][2]-y_test.iloc[i][3])**2)
        total+=temp
        maxx=max(maxx,temp)
    
    print('Average Error:',total/(i+1))
    print('Max Error:',maxx)
    
    return
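
The row-by-row loop in `Score_Reg` can be expressed with array operations. The sketch below selects columns by name instead of by position; `position_errors` is a hypothetical helper and the two toy rows are invented so the result is easy to check by hand.

```python
import numpy as np
import pandas as pd

def position_errors(pred: pd.DataFrame, truth: pd.DataFrame) -> np.ndarray:
    """Euclidean distance per row over longitude, latitude and floor."""
    cols = ["LONGITUDE", "LATITUDE", "FLOOR"]
    diff = pred[cols].to_numpy(dtype=float) - truth[cols].to_numpy(dtype=float)
    return np.sqrt((diff ** 2).sum(axis=1))

truth = pd.DataFrame({"LONGITUDE": [0.0, 10.0], "LATITUDE": [0.0, 0.0], "FLOOR": [1, 2]})
pred = pd.DataFrame({"LONGITUDE": [3.0, 10.0], "LATITUDE": [4.0, 0.0], "FLOOR": [1, 2]})

err = position_errors(pred, truth)
print("Average Error:", err.mean())
print("Max Error:", err.max())
```

Selecting by name also removes the dependence on the extra `index` column that `reset_index()` adds to `y_test`, which is why the loop above reads positions 1 to 3 from `y_test` but 0 to 2 from the predictions.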

In [18]:
Score_Reg(y_prid_0)

Average Error: 0.8856877959572254
Max Error: 38.08066098288826


In [21]:
y_val_prid_0=model_0.predict(df_val_X)
y_val_prid_0=pd.DataFrame(y_val_prid_0, columns=['LONGITUDE','LATITUDE','FLOOR','BUILDING_0','BUILDING_1','BUILDING_2'])
y_val_prid_0['FLOOR_FLOAT']=y_val_prid_0['FLOOR']
y_val_prid_0['FLOOR']=y_val_prid_0['FLOOR'].astype(int)
y_val_prid_0['BUILDINGID']=y_val_prid_0[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
y_val_prid_0['BUILDINGID']=y_val_prid_0['BUILDINGID'].apply(lambda x: x[-1])

In [22]:
def Score_Val(pred):
    print(len(pred))
    print((pred[['FLOOR','BUILDINGID']]==df_val_y[['FLOOR','BUILDINGID']]).sum())
    print((pred[['FLOOR','BUILDINGID']]==df_val_y[['FLOOR','BUILDINGID']]).sum()/len(pred))
    
    total=0
    maxx=0
    for i in range(pred.shape[0]):
        temp=np.sqrt((pred.iloc[i][0]-df_val_y.iloc[i][0])**2+(pred.iloc[i][1]-df_val_y.iloc[i][1])**2+(pred.iloc[i][2]-df_val_y.iloc[i][2])**2)
        total+=temp
        maxx=max(maxx,temp)
    
    print('\n')
    
    print('Average Error:', total/(i+1))
    print('Max Error:', maxx)
    
    return

In [23]:
Score_Val(y_val_prid_0)

1111
FLOOR          982
BUILDINGID    1111
dtype: int64
FLOOR         0.883888
BUILDINGID    1.000000
dtype: float64
Average Error: 7.483081868035265
Max Error: 100.77312128316551


### Model 1, Linear Regression

We consider linear regression for its ability to extrapolate. The code uses `Ridge`, a linear model with L2 regularization.
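
The extrapolation argument can be illustrated with a one-dimensional toy problem (invented data, not the Wi-Fi set): a linear model keeps following the trend outside the training range, while a decision tree can only return a value it has seen.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

# Training inputs cover [0, 1]; the target grows linearly up to 10.
X_train = np.linspace(0, 1, 50).reshape(-1, 1)
y_train = 10 * X_train.ravel()

# A query well outside the training range.
X_outside = np.array([[2.0]])

linear_pred = Ridge().fit(X_train, y_train).predict(X_outside)[0]
tree_pred = DecisionTreeRegressor(random_state=0).fit(X_train, y_train).predict(X_outside)[0]

print("Ridge:", linear_pred)  # continues the trend beyond 10
print("Tree: ", tree_pred)    # capped at the largest training target
```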

In [24]:
model_1_long = Ridge()
model_1_long.fit(X_train, y_train['LONGITUDE'])
y_prid_1_long=model_1_long.predict(X_test)
y_1_long=model_1_long.predict(X_train)

In [25]:
model_1_lat = Ridge()
model_1_lat.fit(X_train, y_train['LATITUDE'])
y_prid_1_lat=model_1_lat.predict(X_test)
y_1_lat=model_1_lat.predict(X_train)

In [26]:
model_1_floor = Ridge()
model_1_floor.fit(X_train, y_train['FLOOR'])
y_prid_1_floor=model_1_floor.predict(X_test)
y_1_floor=model_1_floor.predict(X_train)

In [27]:
model_1_buld0 = Ridge()
model_1_buld0.fit(X_train, y_train['BUILDING_0'])
y_prid_1_buld0=model_1_buld0.predict(X_test)
y_1_buld0=model_1_buld0.predict(X_train)

In [28]:
model_1_buld1 = Ridge()
model_1_buld1.fit(X_train, y_train['BUILDING_1'])
y_prid_1_buld1=model_1_buld1.predict(X_test)
y_1_buld1=model_1_buld1.predict(X_train)

In [29]:
model_1_buld2 = Ridge()
model_1_buld2.fit(X_train, y_train['BUILDING_2'])
y_prid_1_buld2=model_1_buld2.predict(X_test)
y_1_buld2=model_1_buld2.predict(X_train)

In [30]:
y_prid_1=pd.DataFrame({'LONGITUDE': y_prid_1_long,'LATITUDE': y_prid_1_lat,'FLOOR': y_prid_1_floor,'BUILDING_0': y_prid_1_buld0,'BUILDING_1': y_prid_1_buld1,'BUILDING_2': y_prid_1_buld2})

In [31]:
y_prid_1['BUILDINGID']=y_prid_1[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
y_prid_1['BUILDINGID']=y_prid_1['BUILDINGID'].apply(lambda x: x[-1])
y_prid_1['FLOOR_FLOAT']=y_prid_1['FLOOR']
y_prid_1['FLOOR']=y_prid_1['FLOOR'].astype(int)

In [32]:
y_1=pd.DataFrame({'LONGITUDE': y_1_long,'LATITUDE': y_1_lat,'FLOOR': y_1_floor,'BUILDING_0': y_1_buld0,'BUILDING_1': y_1_buld1,'BUILDING_2': y_1_buld2})

In [33]:
y_1['BUILDINGID']=y_1[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
y_1['BUILDINGID']=y_1['BUILDINGID'].apply(lambda x: x[-1])
y_1['FLOOR_FLOAT']=y_1['FLOOR']
y_1['FLOOR']=y_1['FLOOR'].astype(int)
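
One detail of the floor conversion is worth spelling out: `.astype(int)` truncates toward zero instead of rounding, so a regression output of 1.9 becomes floor 1. The sketch below contrasts truncation with rounding and clipping to the floor range 0 to 4 seen in `df.describe()`. The input values are made up, and whether rounding would change the scores reported here has not been measured.

```python
import pandas as pd

# Hypothetical continuous floor predictions from a regression model.
floor_float = pd.Series([0.4, 1.9, 2.6, -0.3, 4.7])

truncated = floor_float.astype(int)           # what the cells above do
rounded = floor_float.round().astype(int)     # nearest integer instead
clipped = rounded.clip(lower=0, upper=4)      # keep within floors 0..4

print(pd.DataFrame({"raw": floor_float, "truncated": truncated,
                    "rounded": rounded, "clipped": clipped}))
```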

In [34]:
Scoring(y_1, y_prid_1)

Total train: 14952
FLOOR          8009
BUILDINGID    14929
dtype: int64
FLOOR         0.535647
BUILDINGID    0.998462
dtype: float64


Total test: 4985
FLOOR         2602
BUILDINGID    4978
dtype: int64
FLOOR         0.521966
BUILDINGID    0.998596
dtype: float64


In [35]:
Score_Reg(y_prid_1)

Average Error: 8.925230319135412
Max Error: 55.26657246450463


In [36]:
y_val_prid_1_long=model_1_long.predict(df_val_X)
y_val_prid_1_lat=model_1_lat.predict(df_val_X)
y_val_prid_1_floor=model_1_floor.predict(df_val_X)
y_val_prid_1_buld0=model_1_buld0.predict(df_val_X)
y_val_prid_1_buld1=model_1_buld1.predict(df_val_X)
y_val_prid_1_buld2=model_1_buld2.predict(df_val_X)

y_val_prid_1=pd.DataFrame({'LONGITUDE': y_val_prid_1_long,'LATITUDE': y_val_prid_1_lat,'FLOOR': y_val_prid_1_floor,'BUILDING_0': y_val_prid_1_buld0,'BUILDING_1': y_val_prid_1_buld1,'BUILDING_2': y_val_prid_1_buld2})
y_val_prid_1['FLOOR_FLOAT']=y_val_prid_1['FLOOR']
y_val_prid_1['FLOOR']=y_val_prid_1['FLOOR'].astype(int)
y_val_prid_1['BUILDINGID']=y_val_prid_1[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
y_val_prid_1['BUILDINGID']=y_val_prid_1['BUILDINGID'].apply(lambda x: x[-1])

In [37]:
Score_Val(y_val_prid_1)

1111
FLOOR          632
BUILDINGID    1087
dtype: int64
FLOOR         0.568857
BUILDINGID    0.978398
dtype: float64
Average Error: 12.25702987028336
Max Error: 48.85371006733079


### Model 2, Decision Tree

In [39]:
max_depth=100

model_2_long = DTR(max_depth=max_depth)
model_2_long.fit(X_train, y_train['LONGITUDE'])
y_prid_2_long=model_2_long.predict(X_test)
y_2_long=model_2_long.predict(X_train)

model_2_lat = DTR(max_depth=max_depth)
model_2_lat.fit(X_train, y_train['LATITUDE'])
y_prid_2_lat=model_2_lat.predict(X_test)
y_2_lat=model_2_lat.predict(X_train)

model_2_floor = DTC(max_depth=max_depth)
model_2_floor.fit(X_train, y_train['FLOOR'])
y_prid_2_floor=model_2_floor.predict(X_test)
y_2_floor=model_2_floor.predict(X_train)

model_2_buld = DTC(max_depth=max_depth)
model_2_buld.fit(X_train, y_train['BUILDINGID'])
y_prid_2_buld=model_2_buld.predict(X_test)
y_2_buld=model_2_buld.predict(X_train)

y_prid_2=pd.DataFrame({'LONGITUDE': y_prid_2_long,'LATITUDE': y_prid_2_lat,'FLOOR': y_prid_2_floor,'BUILDINGID': y_prid_2_buld})
y_2=pd.DataFrame({'LONGITUDE': y_2_long,'LATITUDE': y_2_lat,'FLOOR': y_2_floor,'BUILDINGID': y_2_buld})


In [42]:
Scoring(y_2, y_prid_2)

Total train: 14952
FLOOR         14952
BUILDINGID    14952
dtype: int64
FLOOR         1.0
BUILDINGID    1.0
dtype: float64


Total test: 4985
FLOOR         4852
BUILDINGID    4980
dtype: int64
FLOOR         0.973320
BUILDINGID    0.998997
dtype: float64


In [43]:
Score_Reg(y_prid_2)

Average Error: 0.7149691609387997
Max Error: 105.2231519147811


In [44]:
y_val_prid_2_long=model_2_long.predict(df_val_X)
y_val_prid_2_lat=model_2_lat.predict(df_val_X)
y_val_prid_2_floor=model_2_floor.predict(df_val_X)
y_val_prid_2_buld=model_2_buld.predict(df_val_X)

y_val_prid_2=pd.DataFrame({'LONGITUDE': y_val_prid_2_long,'LATITUDE': y_val_prid_2_lat,'FLOOR': y_val_prid_2_floor,'BUILDINGID': y_val_prid_2_buld})
y_val_prid_2['FLOOR_FLOAT']=y_val_prid_2['FLOOR']
y_val_prid_2['FLOOR']=y_val_prid_2['FLOOR'].astype(int)


In [45]:
Score_Val(y_val_prid_2)

1111
FLOOR          885
BUILDINGID    1105
dtype: int64
FLOOR         0.796580
BUILDINGID    0.994599
dtype: float64
Average Error: 7.659473438738563
Max Error: 109.19574738874927


### Model 3, Random Forest

In [46]:
n_estimators=200
max_depth=None

model_3_long = RFR(n_estimators=n_estimators, max_depth=max_depth)
model_3_long.fit(X_train, y_train['LONGITUDE'])
y_prid_3_long=model_3_long.predict(X_test)
y_3_long=model_3_long.predict(X_train)

model_3_lat = RFR(n_estimators=n_estimators, max_depth=max_depth)
model_3_lat.fit(X_train, y_train['LATITUDE'])
y_prid_3_lat=model_3_lat.predict(X_test)
y_3_lat=model_3_lat.predict(X_train)

model_3_floor = RFC(n_estimators=n_estimators, max_depth=max_depth)
model_3_floor.fit(X_train, y_train['FLOOR'])
y_prid_3_floor=model_3_floor.predict(X_test)
y_3_floor=model_3_floor.predict(X_train)

model_3_buld = RFC(n_estimators=n_estimators, max_depth=max_depth)
model_3_buld.fit(X_train, y_train['BUILDINGID'])
y_prid_3_buld=model_3_buld.predict(X_test)
y_3_buld=model_3_buld.predict(X_train)

y_prid_3=pd.DataFrame({'LONGITUDE': y_prid_3_long,'LATITUDE': y_prid_3_lat,'FLOOR': y_prid_3_floor,'BUILDINGID': y_prid_3_buld})
y_3=pd.DataFrame({'LONGITUDE': y_3_long,'LATITUDE': y_3_lat,'FLOOR': y_3_floor,'BUILDINGID': y_3_buld})



In [47]:
Scoring(y_3, y_prid_3)

Total train: 14952
FLOOR         14952
BUILDINGID    14952
dtype: int64
FLOOR         1.0
BUILDINGID    1.0
dtype: float64


Total test: 4985
FLOOR         4981
BUILDINGID    4985
dtype: int64
FLOOR         0.999198
BUILDINGID    1.000000
dtype: float64


In [48]:
Score_Reg(y_prid_3)

Average Error: 0.7855965582503579
Max Error: 70.62044657834707


In [49]:
y_val_prid_3_long=model_3_long.predict(df_val_X)
y_val_prid_3_lat=model_3_lat.predict(df_val_X)
y_val_prid_3_floor=model_3_floor.predict(df_val_X)
y_val_prid_3_buld=model_3_buld.predict(df_val_X)

y_val_prid_3=pd.DataFrame({'LONGITUDE': y_val_prid_3_long,'LATITUDE': y_val_prid_3_lat,'FLOOR': y_val_prid_3_floor,'BUILDINGID': y_val_prid_3_buld})
y_val_prid_3['FLOOR_FLOAT']=y_val_prid_3['FLOOR']
y_val_prid_3['FLOOR']=y_val_prid_3['FLOOR'].astype(int)


In [50]:
Score_Val(y_val_prid_3)

1111
FLOOR         1012
BUILDINGID    1110
dtype: int64
FLOOR         0.910891
BUILDINGID    0.999100
dtype: float64
Average Error: 6.891515553941267
Max Error: 99.57908130611776


### Model 4, Gradient Boosting

In [51]:
n_estimators=200
learning_rate=0.2

model_4_long = GBR(n_estimators=n_estimators, learning_rate=learning_rate)
model_4_long.fit(X_train, y_train['LONGITUDE'])
y_prid_4_long=model_4_long.predict(X_test)
y_4_long=model_4_long.predict(X_train)

model_4_lat = GBR(n_estimators=n_estimators, learning_rate=learning_rate)
model_4_lat.fit(X_train, y_train['LATITUDE'])
y_prid_4_lat=model_4_lat.predict(X_test)
y_4_lat=model_4_lat.predict(X_train)

model_4_floor = GBC(n_estimators=n_estimators, learning_rate=learning_rate)
model_4_floor.fit(X_train, y_train['FLOOR'])
y_prid_4_floor=model_4_floor.predict(X_test)
y_4_floor=model_4_floor.predict(X_train)

model_4_buld = GBC(n_estimators=n_estimators, learning_rate=learning_rate)
model_4_buld.fit(X_train, y_train['BUILDINGID'])
y_prid_4_buld=model_4_buld.predict(X_test)
y_4_buld=model_4_buld.predict(X_train)

y_prid_4=pd.DataFrame({'LONGITUDE': y_prid_4_long,'LATITUDE': y_prid_4_lat,'FLOOR': y_prid_4_floor,'BUILDINGID': y_prid_4_buld})
y_4=pd.DataFrame({'LONGITUDE': y_4_long,'LATITUDE': y_4_lat,'FLOOR': y_4_floor,'BUILDINGID': y_4_buld})

In [52]:
Scoring(y_4, y_prid_4)

Total train: 14952
FLOOR         14950
BUILDINGID    14952
dtype: int64
FLOOR         0.999866
BUILDINGID    1.000000
dtype: float64


Total test: 4985
FLOOR         4975
BUILDINGID    4985
dtype: int64
FLOOR         0.997994
BUILDINGID    1.000000
dtype: float64


In [53]:
Score_Reg(y_prid_4)

Average Error: 4.993340239780464
Max Error: 46.485715109781644


In [54]:
y_val_prid_4_long=model_4_long.predict(df_val_X)
y_val_prid_4_lat=model_4_lat.predict(df_val_X)
y_val_prid_4_floor=model_4_floor.predict(df_val_X)
y_val_prid_4_buld=model_4_buld.predict(df_val_X)

y_val_prid_4=pd.DataFrame({'LONGITUDE': y_val_prid_4_long,'LATITUDE': y_val_prid_4_lat,'FLOOR': y_val_prid_4_floor,'BUILDINGID': y_val_prid_4_buld})
y_val_prid_4['FLOOR_FLOAT']=y_val_prid_4['FLOOR']
y_val_prid_4['FLOOR']=y_val_prid_4['FLOOR'].astype(int)

In [55]:
Score_Val(y_val_prid_4)

1111
FLOOR          998
BUILDINGID    1111
dtype: int64
FLOOR         0.89829
BUILDINGID    1.00000
dtype: float64
Average Error: 9.03088472637858
Max Error: 72.78401532409684


### Model 5, AdaBoost

In [56]:
n_estimators=1000
learning_rate=0.1

model_5_long = ABR(n_estimators=n_estimators, learning_rate=learning_rate)
model_5_long.fit(X_train, y_train['LONGITUDE'])
y_prid_5_long=model_5_long.predict(X_test)
y_5_long=model_5_long.predict(X_train)

model_5_lat = ABR(n_estimators=n_estimators, learning_rate=learning_rate)
model_5_lat.fit(X_train, y_train['LATITUDE'])
y_prid_5_lat=model_5_lat.predict(X_test)
y_5_lat=model_5_lat.predict(X_train)

model_5_floor = ABC(n_estimators=n_estimators, learning_rate=learning_rate)
model_5_floor.fit(X_train, y_train['FLOOR'])
y_prid_5_floor=model_5_floor.predict(X_test)
y_5_floor=model_5_floor.predict(X_train)

model_5_buld = ABC(n_estimators=n_estimators, learning_rate=learning_rate)
model_5_buld.fit(X_train, y_train['BUILDINGID'])
y_prid_5_buld=model_5_buld.predict(X_test)
y_5_buld=model_5_buld.predict(X_train)

y_prid_5=pd.DataFrame({'LONGITUDE': y_prid_5_long,'LATITUDE': y_prid_5_lat,'FLOOR': y_prid_5_floor,'BUILDINGID': y_prid_5_buld})
y_5=pd.DataFrame({'LONGITUDE': y_5_long,'LATITUDE': y_5_lat,'FLOOR': y_5_floor,'BUILDINGID': y_5_buld})

In [57]:
Scoring(y_5, y_prid_5)

Total train: 14952
FLOOR          8396
BUILDINGID    10629
dtype: int64
FLOOR         0.561530
BUILDINGID    0.710875
dtype: float64


Total test: 4985
FLOOR         2800
BUILDINGID    3547
dtype: int64
FLOOR         0.561685
BUILDINGID    0.711535
dtype: float64


In [58]:
Score_Reg(y_prid_5)

Average Error: 20.766527507204
Max Error: 51.13172168896582


In [59]:
y_val_prid_5_long=model_5_long.predict(df_val_X)
y_val_prid_5_lat=model_5_lat.predict(df_val_X)
y_val_prid_5_floor=model_5_floor.predict(df_val_X)
y_val_prid_5_buld=model_5_buld.predict(df_val_X)

y_val_prid_5=pd.DataFrame({'LONGITUDE': y_val_prid_5_long,'LATITUDE': y_val_prid_5_lat,'FLOOR': y_val_prid_5_floor,'BUILDINGID': y_val_prid_5_buld})
y_val_prid_5['FLOOR_FLOAT']=y_val_prid_5['FLOOR']
y_val_prid_5['FLOOR']=y_val_prid_5['FLOOR'].astype(int)

In [60]:
Score_Val(y_val_prid_5)

1111
FLOOR         542
BUILDINGID    971
dtype: int64
FLOOR         0.487849
BUILDINGID    0.873987
dtype: float64
Average Error: 20.067315884807726
Max Error: 56.586676198444756


### Model 6, XGBoost

In [61]:
n_estimators=200
learning_rate = 0.5

model_6_long = XGR(n_estimators=n_estimators, learning_rate=learning_rate)
model_6_long.fit(X_train, y_train['LONGITUDE'])
y_prid_6_long=model_6_long.predict(X_test)
y_6_long=model_6_long.predict(X_train)

model_6_lat = XGR(n_estimators=n_estimators, learning_rate=learning_rate)
model_6_lat.fit(X_train, y_train['LATITUDE'])
y_prid_6_lat=model_6_lat.predict(X_test)
y_6_lat=model_6_lat.predict(X_train)

model_6_floor = XGC(n_estimators=n_estimators, learning_rate=learning_rate)
model_6_floor.fit(X_train, y_train['FLOOR'])
y_prid_6_floor=model_6_floor.predict(X_test)
y_6_floor=model_6_floor.predict(X_train)

model_6_buld = XGC(n_estimators=n_estimators, learning_rate=learning_rate)
model_6_buld.fit(X_train, y_train['BUILDINGID'])
y_prid_6_buld=model_6_buld.predict(X_test)
y_6_buld=model_6_buld.predict(X_train)

y_prid_6=pd.DataFrame({'LONGITUDE': y_prid_6_long,'LATITUDE': y_prid_6_lat,'FLOOR': y_prid_6_floor,'BUILDINGID': y_prid_6_buld})
y_6=pd.DataFrame({'LONGITUDE': y_6_long,'LATITUDE': y_6_lat,'FLOOR': y_6_floor,'BUILDINGID': y_6_buld})

In [62]:
Scoring(y_6, y_prid_6)

Total train: 14952
FLOOR         14951
BUILDINGID    14952
dtype: int64
FLOOR         0.999933
BUILDINGID    1.000000
dtype: float64


Total test: 4985
FLOOR         4977
BUILDINGID    4985
dtype: int64
FLOOR         0.998395
BUILDINGID    1.000000
dtype: float64


In [63]:
Score_Reg(y_prid_6)

Average Error: 4.519621534788449
Max Error: 48.26143965331609


In [64]:
y_val_prid_6_long=model_6_long.predict(df_val_X)
y_val_prid_6_lat=model_6_lat.predict(df_val_X)
y_val_prid_6_floor=model_6_floor.predict(df_val_X)
y_val_prid_6_buld=model_6_buld.predict(df_val_X)

y_val_prid_6=pd.DataFrame({'LONGITUDE': y_val_prid_6_long,'LATITUDE': y_val_prid_6_lat,'FLOOR': y_val_prid_6_floor,'BUILDINGID': y_val_prid_6_buld})
y_val_prid_6['FLOOR_FLOAT']=y_val_prid_6['FLOOR']
y_val_prid_6['FLOOR']=y_val_prid_6['FLOOR'].astype(int)

In [65]:
Score_Val(y_val_prid_6)

1111
FLOOR         1004
BUILDINGID    1111
dtype: int64
FLOOR         0.90369
BUILDINGID    1.00000
dtype: float64
Average Error: 9.195721744393918
Max Error: 64.91842608935353


### Model 7, Neural Network

In [66]:
hidden_layer=(500,100,50)

model_7_long = NNR(solver='adam', hidden_layer_sizes=hidden_layer, activation='relu', alpha=0.001)
model_7_long.fit(X_train, y_train['LONGITUDE'])
y_prid_7_long=model_7_long.predict(X_test)
y_7_long=model_7_long.predict(X_train)

model_7_lat = NNR(solver='adam', hidden_layer_sizes=hidden_layer, activation='relu', alpha=0.01)
model_7_lat.fit(X_train, y_train['LATITUDE'])
y_prid_7_lat=model_7_lat.predict(X_test)
y_7_lat=model_7_lat.predict(X_train)

model_7_floor = NNC(solver='adam', hidden_layer_sizes=hidden_layer, activation='relu')
model_7_floor.fit(X_train, y_train['FLOOR'])
y_prid_7_floor=model_7_floor.predict(X_test)
y_7_floor=model_7_floor.predict(X_train)

model_7_buld = NNC(solver='adam', hidden_layer_sizes=hidden_layer, activation='relu')
model_7_buld.fit(X_train, y_train['BUILDINGID'])
y_prid_7_buld=model_7_buld.predict(X_test)
y_7_buld=model_7_buld.predict(X_train)

y_prid_7=pd.DataFrame({'LONGITUDE': y_prid_7_long,'LATITUDE': y_prid_7_lat,'FLOOR': y_prid_7_floor,'BUILDINGID': y_prid_7_buld})
y_7=pd.DataFrame({'LONGITUDE': y_7_long,'LATITUDE': y_7_lat,'FLOOR': y_7_floor,'BUILDINGID': y_7_buld})

In [67]:
Scoring(y_7, y_prid_7)

Total train: 14952
FLOOR         14912
BUILDINGID    14920
dtype: int64
FLOOR         0.997325
BUILDINGID    0.997860
dtype: float64


Total test: 4985
FLOOR         4958
BUILDINGID    4978
dtype: int64
FLOOR         0.994584
BUILDINGID    0.998596
dtype: float64


In [68]:
Score_Reg(y_prid_7)

Average Error: 13570.246575969204
Max Error: 98857.38551662456


## Model Ensembling

In this section, we build the ensemble model based on the previous predictions. Some of the code is rewritten into functions for cleaner presentation.

In [69]:
def Data_Read(filename='trainingData.csv', limiter=0):
    df=pd.read_csv(filename)
    df=pd.get_dummies(df, columns=['BUILDINGID'], prefix='BUILDING')
    df['BUILDINGID']=df[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
    df['BUILDINGID']=df['BUILDINGID'].apply(lambda x: x[-1])
    df_X=df.drop(df.columns[-12:], axis=1)
    for col in df_X.columns:
        df_X[col]=df_X[col].apply(lambda x:limiter if x==100 else float(x)/100+1)
    df_y=df[['LONGITUDE','LATITUDE','FLOOR','BUILDING_0','BUILDING_1','BUILDING_2','BUILDINGID']]
    
    return df_X, df_y

In [70]:
def Data_Split(df_X, df_y, test_size=0.25, random_state=None):
    X_train, X_test, y_train, y_test = train_test_split(df_X, df_y, test_size=test_size, random_state=random_state)

    # Reset to a clean 0..n-1 index so predictions and targets align row by row.
    X_test=X_test.reset_index(drop=True)
    X_train=X_train.reset_index(drop=True)
    y_test=y_test.reset_index(drop=True)
    y_train=y_train.reset_index(drop=True)
    
    return X_train, X_test, y_train, y_test

In [71]:
def Model_Tree_Work(X_train, X_test, X_val, y_train, model_reg=None, model_clf=None):
    pred_train=pd.DataFrame()
    pred_test=pd.DataFrame()
    pred_val=pd.DataFrame()
    
    for col in ['LONGITUDE','LATITUDE']:
        model_temp=model_reg
        model_temp.fit(X_train, y_train[col])
        pred_train[col] = model_temp.predict(X_train)
        pred_test[col] = model_temp.predict(X_test)
        pred_val[col] = model_temp.predict(X_val)

    for col in ['FLOOR','BUILDINGID']:
        model_temp=model_clf
        model_temp.fit(X_train, y_train[col])
        pred_train[col] = model_temp.predict(X_train)
        pred_test[col] = model_temp.predict(X_test)
        pred_val[col] = model_temp.predict(X_val)
        
    return pred_train, pred_test, pred_val
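
In `Model_Tree_Work`, `model_temp=model_reg` binds a second name to the same estimator object, so each `fit` call overwrites the previous fit. That is harmless there because predictions are taken before the next fit. If the fitted models need to be kept, `sklearn.base.clone` provides an unfitted copy per target. A sketch with synthetic data; `fit_per_target` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.tree import DecisionTreeRegressor

def fit_per_target(template, X, y, targets):
    """Fit an independent copy of `template` for every target column."""
    fitted = {}
    for col in targets:
        # clone() copies the hyperparameters but none of the fitted state.
        fitted[col] = clone(template).fit(X, y[col])
    return fitted

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((40, 3)), columns=["a", "b", "c"])
y = pd.DataFrame({"LONGITUDE": X["a"] * 2, "LATITUDE": X["b"] - X["c"]})

template = DecisionTreeRegressor(max_depth=5, random_state=0)
models = fit_per_target(template, X, y, ["LONGITUDE", "LATITUDE"])
print({col: m.get_depth() for col, m in models.items()})
```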


In [72]:
def KNN_Work(X_train, X_test, X_val, y_train, n_neighbor=1, p=1):
    model_0=KNR(n_neighbors=n_neighbor, weights='uniform',p=p)
    model_0.fit(X_train, y_train[['LONGITUDE','LATITUDE','FLOOR','BUILDING_0','BUILDING_1','BUILDING_2']])

    y_prid_0=model_0.predict(X_test)
    y_prid_0=pd.DataFrame(y_prid_0, columns=['LONGITUDE','LATITUDE','FLOOR','BUILDING_0','BUILDING_1','BUILDING_2'])
    y_prid_0['FLOOR']=y_prid_0['FLOOR'].astype(int)
    y_prid_0['BUILDINGID']=y_prid_0[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
    y_prid_0['BUILDINGID']=y_prid_0['BUILDINGID'].apply(lambda x: x[-1])
    y_prid_0=y_prid_0.drop(['BUILDING_0','BUILDING_1','BUILDING_2'], axis=1)


    y_0=model_0.predict(X_train)
    y_0=pd.DataFrame(y_0, columns=['LONGITUDE','LATITUDE','FLOOR','BUILDING_0','BUILDING_1','BUILDING_2'])
    y_0['FLOOR']=y_0['FLOOR'].astype(int)
    y_0['BUILDINGID']=y_0[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
    y_0['BUILDINGID']=y_0['BUILDINGID'].apply(lambda x: x[-1])
    y_0=y_0.drop(['BUILDING_0','BUILDING_1','BUILDING_2'], axis=1)

    y_val_prid_0=model_0.predict(X_val)
    y_val_prid_0=pd.DataFrame(y_val_prid_0, columns=['LONGITUDE','LATITUDE','FLOOR','BUILDING_0','BUILDING_1','BUILDING_2'])
    y_val_prid_0['FLOOR']=y_val_prid_0['FLOOR'].astype(int)
    y_val_prid_0['BUILDINGID']=y_val_prid_0[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
    y_val_prid_0['BUILDINGID']=y_val_prid_0['BUILDINGID'].apply(lambda x: x[-1])
    y_val_prid_0=y_val_prid_0.drop(['BUILDING_0','BUILDING_1','BUILDING_2'], axis=1)

    return y_0, y_prid_0, y_val_prid_0


In [73]:
def Linear_Reg_Work(X_train, X_test, X_val, y_train):
    pred_train_dic={}
    pred_test_dic={}
    pred_val_dic={}
    
    for col in ['LONGITUDE','LATITUDE','FLOOR','BUILDING_0','BUILDING_1','BUILDING_2']:
        model_temp = Ridge()
        model_temp.fit(X_train, y_train[col])
        pred_train_dic[col] = model_temp.predict(X_train)
        pred_test_dic[col] = model_temp.predict(X_test)
        pred_val_dic[col] = model_temp.predict(X_val)
    
    pred_train=pd.DataFrame(pred_train_dic)
    pred_test=pd.DataFrame(pred_test_dic)
    pred_val=pd.DataFrame(pred_val_dic)

    pred_train['FLOOR']=pred_train['FLOOR'].astype(int)
    pred_train['BUILDINGID']=pred_train[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
    pred_train['BUILDINGID']=pred_train['BUILDINGID'].apply(lambda x: x[-1])
    pred_train=pred_train.drop(['BUILDING_0','BUILDING_1','BUILDING_2'], axis=1)

    pred_test['FLOOR']=pred_test['FLOOR'].astype(int)
    pred_test['BUILDINGID']=pred_test[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
    pred_test['BUILDINGID']=pred_test['BUILDINGID'].apply(lambda x: x[-1])
    pred_test=pred_test.drop(['BUILDING_0','BUILDING_1','BUILDING_2'], axis=1)

    pred_val['FLOOR']=pred_val['FLOOR'].astype(int)
    pred_val['BUILDINGID']=pred_val[['BUILDING_0','BUILDING_1','BUILDING_2']].idxmax(axis=1)
    pred_val['BUILDINGID']=pred_val['BUILDINGID'].apply(lambda x: x[-1])
    pred_val=pred_val.drop(['BUILDING_0','BUILDING_1','BUILDING_2'], axis=1)
    
    return pred_train, pred_test, pred_val


In [74]:
def Score_Work(train, test, val, y_train, y_test, y_val):
    
    print()
    print('Total Trained:', len(y_train))
    print((train[['FLOOR','BUILDINGID']]==y_train[['FLOOR','BUILDINGID']]).sum()/len(y_train))
    total=0
    maxx=0
    for i in range(y_train.shape[0]):
        temp=np.sqrt((train.iloc[i][0]-y_train.iloc[i][0])**2+(train.iloc[i][1]-y_train.iloc[i][1])**2+(train.iloc[i][2]-y_train.iloc[i][2])**2)
        total+=temp
        maxx=max(maxx,temp)
    print('Average Error: %6f' %(total/(i+1)))
    print('Max Error: %6f' %maxx)
    
    print()
    print('Total Tested:', len(y_test))
    print((test[['FLOOR','BUILDINGID']]==y_test[['FLOOR','BUILDINGID']]).sum()/len(y_test))
    total=0
    maxx=0
    for i in range(y_test.shape[0]):
        temp=np.sqrt((test.iloc[i][0]-y_test.iloc[i][0])**2+(test.iloc[i][1]-y_test.iloc[i][1])**2+(test.iloc[i][2]-y_test.iloc[i][2])**2)
        total+=temp
        maxx=max(maxx,temp)
    print('Average Error: %6f' %(total/(i+1)))
    print('Max Error: %6f' %maxx)

    print()
    print('Total Validated:', len(y_val))
    print((val[['FLOOR','BUILDINGID']]==y_val[['FLOOR','BUILDINGID']]).sum()/len(y_val))
    total=0
    maxx=0
    for i in range(y_val.shape[0]):
        temp=np.sqrt((val.iloc[i][0]-y_val.iloc[i][0])**2+(val.iloc[i][1]-y_val.iloc[i][1])**2+(val.iloc[i][2]-y_val.iloc[i][2])**2)
        total+=temp
        maxx=max(maxx,temp)
    print('Average Error: %6f'  %(total/(i+1)))
    print('Max Error: %6f'  %maxx)

    return ((val[['FLOOR','BUILDINGID']]==y_val[['FLOOR','BUILDINGID']]).sum()/len(y_val)).tolist()+[total/(i+1),maxx]
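
The three loops in `Score_Work` compute the same quantity: a per-row Euclidean distance over the first three columns, which are LONGITUDE, LATITUDE and FLOOR in the order given by `score_col`. A vectorized version might look like the sketch below. `position_errors` is a hypothetical helper, not part of the notebook, and it assumes that predictions and targets share those column names and have the same row order.

```python
import numpy as np
import pandas as pd

def position_errors(pred, truth, cols=('LONGITUDE', 'LATITUDE', 'FLOOR')):
    """Per-row Euclidean distance over `cols` (hypothetical helper)."""
    cols = list(cols)
    # Work on plain arrays so that differing indexes cannot misalign rows.
    diff = pred[cols].to_numpy(dtype=float) - truth[cols].to_numpy(dtype=float)
    return np.sqrt((diff ** 2).sum(axis=1))

# Tiny made-up example: the second row is off by (3, 4, 0).
pred = pd.DataFrame({'LONGITUDE': [0.0, 3.0], 'LATITUDE': [0.0, 4.0], 'FLOOR': [1, 2]})
truth = pd.DataFrame({'LONGITUDE': [0.0, 0.0], 'LATITUDE': [0.0, 0.0], 'FLOOR': [1, 2]})

errors = position_errors(pred, truth)
print('Average Error: %6f' % errors.mean())
print('Max Error: %6f' % errors.max())
```

As in the loop version, the floor difference enters the distance as a third coordinate alongside longitude and latitude.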

In [75]:
def Rename_Col(df, text):
    for i in df.columns:
        df=df.rename(index=str, columns={i: i+text})
        
    return df
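
For reference, pandas has a built-in that performs the same column renaming in one call. The sketch below uses a made-up frame. Unlike `Rename_Col`, `add_suffix` leaves the index untouched, whereas `rename(index=str, ...)` also converts the index labels to strings.

```python
import pandas as pd

# Made-up frame standing in for one model's predictions.
df = pd.DataFrame({'LONGITUDE': [1.0], 'FLOOR': [2]})

# add_suffix returns a new frame; the original column names are unchanged.
renamed = df.add_suffix('_KNN')
print(list(renamed.columns))
```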

We run a few iterations to check the accuracy of the model. At the end of each run, we print the performance of each base model and of the ensemble. In these runs the ensemble model has the best score on each criterion compared with the base models, which is the purpose of building it.

In [78]:
def main_AD(df_X, df_y, X_val, y_val):
    import numpy as np
    import pandas as pd
    
    import time

    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsRegressor as KNR
    from sklearn.linear_model import Ridge
    from sklearn.tree import DecisionTreeClassifier as DTC
    from sklearn.tree import DecisionTreeRegressor as DTR
    from sklearn.ensemble import RandomForestClassifier as RFC
    from sklearn.ensemble import RandomForestRegressor as RFR
    from sklearn.ensemble import GradientBoostingClassifier as GBC
    from sklearn.ensemble import GradientBoostingRegressor as GBR
    from sklearn.ensemble import AdaBoostClassifier as ABC
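    from sklearn.ensemble import AdaBoostRegressor as ABR
    # XGR and XGC are used below; import them here as well so the function
    # does not depend on the notebook-level imports.
    from xgboost import XGBRegressor as XGR
    from xgboost import XGBClassifier as XGC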

    limiter=0
    n_neighbor=1
    p=1
    test_size=0.25
    max_depth_DT=100
    n_estimators=200
    max_depth=None
    learning_rate=0.2
    
    score=pd.DataFrame()
    
    score_col=['LONGITUDE','LATITUDE','FLOOR','BUILDINGID']
    
    start_time=time.time()
    
    X_train, X_test, y_train, y_test = Data_Split(df_X, df_y)

    train=X_train
    test=X_test
    val=X_val
    
    start_time=time.time()
    
    print('KNN Started.')
    a, b, c = KNN_Work(X_train, X_test, X_val, y_train, n_neighbor=n_neighbor, p=p)
    score['KNN'] = Score_Work(a, b, c, y_train[score_col], y_test[score_col], y_val[score_col])
    a=Rename_Col(a, '_KNN')
    b=Rename_Col(b, '_KNN')
    c=Rename_Col(c, '_KNN')
    train=train.join(a.reset_index()[a.columns])
    test=test.join(b.reset_index()[b.columns])
    val=val.join(c.reset_index()[c.columns])
    print()
    print('KNN Done. Time Spend: %.4f sec.' %(time.time()-start_time),'\n')

    start_time=time.time()
    
    print('Random Forest Started.')
    a, b, c = Model_Tree_Work(X_train, X_test, X_val, y_train, model_reg=RFR(n_estimators=n_estimators, max_depth=max_depth), model_clf=RFC(n_estimators=n_estimators, max_depth=max_depth))
    score['RF'] = Score_Work(a, b, c, y_train[score_col], y_test[score_col], y_val[score_col])
    a=Rename_Col(a, '_RF')
    b=Rename_Col(b, '_RF')
    c=Rename_Col(c, '_RF')
    train=train.join(a.reset_index()[a.columns])
    test=test.join(b.reset_index()[b.columns])
    val=val.join(c.reset_index()[c.columns])
    print()
    print('Random Forest done. Time Spend: %.4f sec.' %(time.time()-start_time),'\n')

    start_time=time.time()
    
    print('Gradient Boost Started.')
    a, b, c = Model_Tree_Work(X_train, X_test, X_val, y_train, model_reg=GBR(n_estimators=n_estimators, learning_rate=learning_rate), model_clf=GBC(n_estimators=n_estimators, learning_rate=learning_rate))
    score['GB'] = Score_Work(a, b, c, y_train[score_col], y_test[score_col], y_val[score_col])
    a=Rename_Col(a, '_GB')
    b=Rename_Col(b, '_GB')
    c=Rename_Col(c, '_GB')
    train=train.join(a.reset_index()[a.columns])
    test=test.join(b.reset_index()[b.columns])
    val=val.join(c.reset_index()[c.columns])
    print()
    print('Gradient Boost done. Time Spend: %.4f sec.' %(time.time()-start_time),'\n')
        
    start_time=time.time()
    
    print('XGBoost Started.')
    a, b, c = Model_Tree_Work(X_train, X_test, X_val, y_train, model_reg=XGR(n_estimators=n_estimators, learning_rate=learning_rate), model_clf=XGC(n_estimators=n_estimators, learning_rate=learning_rate))
    score['XG'] = Score_Work(a, b, c, y_train[score_col], y_test[score_col], y_val[score_col])
    a=Rename_Col(a, '_XG')
    b=Rename_Col(b, '_XG')
    c=Rename_Col(c, '_XG')
    train=train.join(a.reset_index()[a.columns])
    test=test.join(b.reset_index()[b.columns])
    val=val.join(c.reset_index()[c.columns])
    print()
    print('XGBoost done. Time Spend: %.4f sec.' %(time.time()-start_time),'\n')

    start_time=time.time()
    
    print('Final Ensemble Started.')
    a, b, c = Model_Tree_Work(train, test, val, y_train, 
                              model_reg=ABR(base_estimator=DTR(max_depth=max_depth_DT), n_estimators=n_estimators, learning_rate=learning_rate), 
                              model_clf=ABC(base_estimator=DTC(max_depth=max_depth_DT), n_estimators=n_estimators, learning_rate=learning_rate))
    score['Ensemble'] = Score_Work(a, b, c, y_train[score_col], y_test[score_col], y_val[score_col])
    train=train.join(a.reset_index()[a.columns])
    test=test.join(b.reset_index()[b.columns])
    val=val.join(c.reset_index()[c.columns])
    print()
    print('Final Ensemble done. Time Spend: %.4f sec.' %(time.time()-start_time),'\n')

    print(score)
    return score
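
The core idea of `main_AD` is that each base model's predictions are appended as extra feature columns, and a final model is trained on the widened table. Below is a minimal sketch of that idea on synthetic data. The data, the `TARGET` column names and the plain decision tree used as the final model are all assumptions made for illustration; the notebook itself uses AdaBoost over decision trees for the final step.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 rows, 5 features, one continuous target.
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f'f{i}' for i in range(5)])
y = 2.0 * X['f0'] + X['f1'] + rng.normal(scale=0.1, size=200)

X_train = X.iloc[:150].reset_index(drop=True)
X_test = X.iloc[150:].reset_index(drop=True)
y_train = y.iloc[:150].reset_index(drop=True)

base_models = {
    '_KNN': KNeighborsRegressor(n_neighbors=1, p=1),
    '_RF': RandomForestRegressor(n_estimators=20, random_state=0),
}

train_aug, test_aug = X_train.copy(), X_test.copy()
for suffix, model in base_models.items():
    model.fit(X_train, y_train)
    # Append each base model's prediction as an extra feature column.
    train_aug['TARGET' + suffix] = model.predict(X_train)
    test_aug['TARGET' + suffix] = model.predict(X_test)

# The final model sees the original features plus the base predictions.
final = DecisionTreeRegressor(max_depth=5, random_state=0).fit(train_aug, y_train)
final_pred = final.predict(test_aug)
print(final_pred[:3])
```

One caveat of this layout, which also applies to `main_AD`, is that the base predictions for the training rows come from models fitted on those same rows, so they look better than they would on unseen data. Out-of-fold predictions are a common alternative.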
    


In [79]:
scores=[]

df_X, df_y = Data_Read(filename='trainingData.csv', limiter=limiter)
X_val, y_val = Data_Read(filename='ValidationData.csv', limiter=limiter)

for i in range(2):
    print('Iteration %d Started' %(i+1), '\n')
    scores.append(main_AD(df_X, df_y, X_val, y_val))
    print('\n')

print(scores)

Iteration 1 Started 

KNN Started.

Total Trained: 14952
FLOOR         0.99806
BUILDINGID    0.99806
dtype: float64
Average Error: 0.956502
Max Error: 333.142710

Total Tested: 4985
FLOOR         0.996790
BUILDINGID    0.998195
dtype: float64
Average Error: 1.814181
Max Error: 224.144655

Total Validated: 1111
FLOOR         0.891089
BUILDINGID    0.983798
dtype: float64
Average Error: 12.186214
Max Error: 335.589136

KNN Done. Time Spend: 83.0644 sec. 

Random Forest Started.

Total Trained: 14952
FLOOR         0.99806
BUILDINGID    0.99806
dtype: float64
Average Error: 2.155462
Max Error: 229.320537

Total Tested: 4985
FLOOR         0.997593
BUILDINGID    0.998195
dtype: float64
Average Error: 4.123891
Max Error: 137.671841

Total Validated: 1111
FLOOR         0.909091
BUILDINGID    0.999100
dtype: float64
Average Error: 13.205934
Max Error: 128.706531

Random Forest done. Time Spend: 165.1995 sec. 

Gradient Boost Started.

Total Trained: 14952
FLOOR         0.99806
BUILDINGID    0.9
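
Each call to `main_AD` returns one score table with a column per model, so the list `scores` can be averaged across iterations. The sketch below shows one way to do that. The row labels are hypothetical and the numbers are placeholders, not results from the runs above.

```python
import pandas as pd

# Hypothetical row labels for the four values returned by Score_Work.
rows = ['FLOOR', 'BUILDINGID', 'AVG_ERROR', 'MAX_ERROR']

# Placeholder numbers for two iterations, not results from the notebook.
run1 = pd.DataFrame({'KNN': [0.90, 0.98, 12.0, 300.0],
                     'RF': [0.92, 0.99, 13.0, 130.0]}, index=rows)
run2 = pd.DataFrame({'KNN': [0.88, 0.98, 14.0, 320.0],
                     'RF': [0.90, 0.99, 11.0, 150.0]}, index=rows)
scores = [run1, run2]

# Stack the runs under an outer "iteration" level, then average per row label.
stacked = pd.concat(scores, keys=range(len(scores)))
mean_scores = stacked.groupby(level=1, sort=False).mean()
print(mean_scores)
```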

# Pickles

We pickle our models so that they can be loaded quickly for future predictions.

In [80]:
import pickle

In [81]:
# Use a context manager so the file is closed (and flushed) after writing.
with open('try_pickle.pkl', 'wb') as f:
    pickle.dump([model_4_long, model_4_lat], f)

In [82]:
with open('try_pickle.pkl', 'rb') as f:
    tt=pickle.load(f)

In [83]:
tt[1].predict(X_test)

array([4865008.41704583, 4864853.31149589, 4864810.75332708, ...,
       4865000.10990597, 4864943.08910673, 4864760.20718034])
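
A quick way to check that a pickled model survives the round trip is to compare predictions before and after loading. The sketch below does this with two small `Ridge` models on synthetic data. They only stand in for `model_4_long` and `model_4_lat`, and the temporary file path is an assumption made to keep the example self-contained.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y_long = X @ np.array([1.0, 2.0, 3.0])
y_lat = X @ np.array([-1.0, 0.5, 0.0])

# Stand-ins for the longitude and latitude models.
models = [Ridge().fit(X, y_long), Ridge().fit(X, y_lat)]

# Write to a temporary directory so the example leaves the working dir alone.
path = os.path.join(tempfile.mkdtemp(), 'models.pkl')
with open(path, 'wb') as f:
    pickle.dump(models, f)
with open(path, 'rb') as f:
    loaded = pickle.load(f)

# The loaded models should reproduce the original predictions.
same = all(np.allclose(m.predict(X), l.predict(X)) for m, l in zip(models, loaded))
print(same)
```

Pickle files should only be loaded from a trusted source, and scikit-learn models are best unpickled with the same library version that created them.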

### We further convert the code into two Python scripts, Train.py and Validate.py, for direct execution.