
In [1]:
# Hide all warning messages; this is done before the other imports
# so that warnings raised at import time are hidden as well
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns



<h1>1. Input Preprocessing</h1>
Read the training dataset. Then form a unique location identifier from the four location attributes of this dataset (FLOOR, BUILDINGID, SPACEID, RELATIVEPOSITION) by combining them into one identifier, and associate it with each fingerprint row (the RSS values from the wireless APs).

In [2]:
# Read the training data directly from its URL (the file is comma-separated, the read_csv default)
filename = 'http://www.cs.ucc.ie/~mh23/inputData/DoCalculus/trainingData.csv'
df1 = pd.read_csv(filename)

print(df1.dtypes)  # view the actual data type of each column

print('The number of columns in the data frame is: ', df1.shape[1])
print('The number of datapoints in the data frame is: ', df1.shape[0])

WAP001              int64
WAP002              int64
WAP003              int64
WAP004              int64
WAP005              int64
                    ...  
SPACEID             int64
RELATIVEPOSITION    int64
USERID              int64
PHONEID             int64
TIMESTAMP           int64
Length: 529, dtype: object
The number of columns in the data frame is:  529
The number of datapoints in the data frame is:  19937


In [3]:
# Column names for the APs follow the dataset's 1-based, zero-padded scheme:
# AP1 (index 0) -> 'WAP001', AP2 (index 1) -> 'WAP002', ..., AP520 (index 519) -> 'WAP520'
# i.e. the column of the AP at 0-based index k is formatWPString(k + 1)
def formatWPString(i):
  # zero-pad the AP number to three digits: 1 -> 'WAP001', 12 -> 'WAP012', 123 -> 'WAP123'
  return 'WAP%03d' % i

# work on a copy of the dataset without the USERID, PHONEID and TIMESTAMP columns
df2 = df1.drop(columns=['USERID', 'PHONEID', 'TIMESTAMP'])

# combine FLOOR, BUILDINGID, SPACEID and RELATIVEPOSITION into one
# indoor location identifier column, 'location_id'
location_cols = ['FLOOR', 'BUILDINGID', 'SPACEID', 'RELATIVEPOSITION']
df2['location_id'] = df2[location_cols].astype(str).apply(''.join, axis=1)

# keep only the AP columns and the indoor location identifier
df2.drop(columns=['LONGITUDE', 'LATITUDE'] + location_cols, inplace=True)
# show the number of unique indoor locations
print('The number of unique locations: ', df2['location_id'].nunique())
print('The dataset shape with only location identifier and AP columns: ', df2.shape)
print('There are %d records, and %d APs across the testbed' % (df2.shape[0], df2.shape[1]-1))

The number of unique locations:  905
The dataset shape with only location identifier and AP columns:  (19937, 521)
There are 19937 records, and 520 APs across the testbed


<h1>2. Probability Computation</h1>
<h2>2.1 First Approach</h2>
<ul>
<li>P(APy)</li>
<li>P(APy | APx)</li>
<li>P(location_id | APx)</li>
<li>P(APy | location_id, APx)</li>
</ul>
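The quantities in this list can be read as relative frequencies over the training records. The estimators below are a sketch under assumptions that the notebook does not state explicitly: $N(\cdot)$ counts training records, an AP counts as *seen* when its RSS value is not 100, $L$ denotes location_id, a bare $AP_x$ in a condition means "$AP_x$ is seen", and $AP_x = u$ fixes its RSS value.

$$
P(AP_y = v) = \frac{N(AP_y = v)}{N(AP_y \text{ seen})}
\qquad
P(AP_y = v \mid AP_x) = \frac{N(AP_y = v,\ AP_x \text{ seen})}{N(AP_y \text{ seen},\ AP_x \text{ seen})}
$$

$$
P(L = \ell \mid AP_x) = \frac{N(L = \ell,\ AP_x \text{ seen})}{N(AP_x \text{ seen})}
\qquad
P(AP_y = v \mid L = \ell,\ AP_x = u) = \frac{N(AP_y = v,\ L = \ell,\ AP_x = u)}{N(AP_y \text{ seen},\ L = \ell,\ AP_x = u)}
$$

With the toy histograms in the cells that follow, this gives $P(AP_1 = -60) = 4/12 = 1/3$ and $P(AP_2 = -60 \mid AP_1) = 2/3$.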

In [4]:
## suppose there are n APs
## they are identified by the 0-based indices 0, 1, ..., n-1 (AP1 -> 0, AP2 -> 1, ...)
## this is important to remember: every structure below is keyed by these indices
n = 10  # total number of APs (a toy value; the training data above has 520)
lst = list(range(n))

## for each AP, build a histogram of its RSS values, kept as a dictionary;
## e.g. if AP1 = -60 is seen 4 times and AP1 = -90 is seen 8 times, the histogram is the one below
## ignore values of 100: they mean the AP was not seen
AP1 = {-60: 4, -90: 8}        # AP1: prior (a priori) histogram
sumAP1 = sum(AP1.values())    # total number of times AP1 is seen

## a histogram is also required for every other AP, over the records where it is seen together with AP1,
## e.g. the histogram of AP2's values when it is seen together with AP1
AP21 = {-60: 2, -70: 1}       # AP2 given AP1: conditional (a posteriori) histogram
sumAP21 = sum(AP21.values())  # total number of times AP1 and AP2 are seen together

## resulting data structure
posteriori = dict()     # stores all the histograms computed above
posteriori[0] = dict()  # histograms over the records where AP1 is seen

# the prior of an AP is always stored at index [x][x]:
# [0][0] for AP1, [1][1] for AP2, etc.
# entries are lists rather than tuples so that they can be updated later
posteriori[0][0] = [sumAP1, AP1]
posteriori[0][1] = [sumAP21, AP21]
# ..... repeat for all the other APs that can be seen together with AP1

## this process has to be repeated for all APs,
## after which P(APx = -60) and P(APy = -60 | APx) can be computed from posteriori,
## e.g. P(AP1 = -60) = 4/12 and P(AP2 = -60 | AP1) = 2/3
print(posteriori)

{0: {0: [12, {-60: 4, -90: 8}], 1: [3, {-60: 2, -70: 1}]}}
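The cell above fills `posteriori` by hand for a toy case. A minimal sketch of filling it from data follows, assuming a DataFrame shaped like `df2` (one `WAPxxx` column per AP, with 100 meaning the AP was not seen). `build_posteriori`, `cond_prob` and the toy data are hypothetical illustrations, not part of the original notebook.

```python
import pandas as pd

NOT_SEEN = 100  # RSS value the dataset uses when an AP is not detected

def build_posteriori(df, ap_cols):
    """Hypothetical helper: posteriori[x][y] = [count, histogram of AP y's RSS values
    over the records where both AP x and AP y are seen]; [x][x] is AP x's prior."""
    seen = df[ap_cols] != NOT_SEEN  # True where the AP was detected
    posteriori = {}
    for x, col_x in enumerate(ap_cols):
        posteriori[x] = {}
        for y, col_y in enumerate(ap_cols):
            # AP y's values in the records where both APs are seen (x == y gives the prior)
            values = df.loc[seen[col_x] & seen[col_y], col_y]
            if values.empty:
                continue  # AP y is never seen together with AP x
            hist = {int(v): int(c) for v, c in values.value_counts().items()}
            posteriori[x][y] = [sum(hist.values()), hist]
    return posteriori

def cond_prob(posteriori, x, y, value):
    """P(AP y = value | AP x seen); with x == y this is the prior P(AP x = value).
    Returns 0.0 when AP x and AP y were never seen together."""
    count, hist = posteriori.get(x, {}).get(y, [0, {}])
    return hist.get(value, 0) / count if count else 0.0

# toy data: two APs, four records (100 = not seen)
toy = pd.DataFrame({
    'WAP001': [-60, -60, -90, 100],
    'WAP002': [-60, 100, -70, -60],
})
post = build_posteriori(toy, ['WAP001', 'WAP002'])
print(post)
print(cond_prob(post, 0, 0, -60))  # P(AP1 = -60)
```

On the real data the call would be along the lines of `build_posteriori(df2, [formatWPString(k + 1) for k in range(520)])`. That loops over 520 × 520 AP pairs, so it may take a while; the pairwise loop is kept for readability rather than speed.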


In [5]:
## suppose there are L locations
## for each AP, build a histogram over locations, from which P(Location_1|APx), P(Location_2|APx), ..., P(Location_L|APx) follow
AP1 = {'Location_1': 2, 'Location_2': 10}
sumAP1 = sum(AP1.values())  # total number of times AP1 is seen across all locations

## resulting data structure for locations
location = dict()  # stores the location histogram of each AP
location[0] = [sumAP1, AP1]  # a list rather than a tuple so that it can be updated later
# ..... AP1's histogram should cover every location where AP1 is seen; repeat for all the other APs

## after which P(Location_1|APx) can be computed from location, e.g. P(Location_1|AP1) = 2/12
print(location)

## this could even be merged into the posteriori variable above,
## but keeping it separate is simpler

{0: [12, {'Location_1': 2, 'Location_2': 10}]}
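The `location` structure can be filled from data in the same way. The sketch below assumes a DataFrame shaped like `df2`, with a `location_id` column and one `WAPxxx` column per AP (100 meaning not seen); `build_location`, `loc_prob` and the toy data are hypothetical names chosen for illustration.

```python
import pandas as pd

NOT_SEEN = 100  # RSS value meaning the AP was not detected

def build_location(df, ap_cols, loc_col='location_id'):
    """Hypothetical helper: location[x] = [count, {location_id: number of records
    at that location in which AP x is seen}]."""
    location = {}
    for x, col in enumerate(ap_cols):
        locs = df.loc[df[col] != NOT_SEEN, loc_col]  # locations of the records where AP x is seen
        if locs.empty:
            continue  # AP x is never seen
        hist = {str(k): int(v) for k, v in locs.value_counts().items()}
        location[x] = [sum(hist.values()), hist]
    return location

def loc_prob(location, x, loc):
    """P(location_id = loc | AP x seen); 0.0 if AP x is never seen."""
    count, hist = location.get(x, [0, {}])
    return hist.get(loc, 0) / count if count else 0.0

# toy data: two APs, four records (100 = not seen)
toy = pd.DataFrame({
    'WAP001': [-60, -70, -80, 100],
    'WAP002': [100, -50, 100, -40],
    'location_id': ['Location_1', 'Location_2', 'Location_2', 'Location_1'],
})
location_toy = build_location(toy, ['WAP001', 'WAP002'])
print(location_toy)
print(loc_prob(location_toy, 0, 'Location_2'))  # P(Location_2 | AP1)
```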


In [0]:
## P(APy = -60 | location_id = Location_1, APx = -70)-type probabilities still need to be computed,
## perhaps with a function such as:
## def findProb(location, apx_id, apx_value, apy_id, apy_value):
##    select the matching records (groupby, boolean masks, etc.)
##    return the probability given the five arguments
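The `findProb` placeholder above can be sketched with boolean masks in pandas. This is an illustration, not the author's implementation: it takes the DataFrame as an extra first argument `df` (assumed to look like `df2`, with a `location_id` column and `WAPxxx` RSS columns), and it assumes that records in which APy is not seen (RSS 100) are left out of the denominator, matching how the histograms ignore 100. It returns NaN when the conditioning event never occurs in the data.

```python
import math
import pandas as pd

NOT_SEEN = 100  # RSS value meaning the AP was not detected

def findProb(df, location, apx_id, apx_value, apy_id, apy_value, loc_col='location_id'):
    """Estimate P(APy = apy_value | location, APx = apx_value) by counting records.
    Assumption: records where APy is not seen are excluded from the denominator."""
    given = ((df[loc_col] == location)
             & (df[apx_id] == apx_value)
             & (df[apy_id] != NOT_SEEN))
    n_given = int(given.sum())
    if n_given == 0:
        return math.nan  # conditioning event never observed: probability undefined
    n_joint = int((given & (df[apy_id] == apy_value)).sum())
    return n_joint / n_given

# toy data: one location, two APs (100 = not seen)
toy = pd.DataFrame({
    'WAP001': [-70, -70, -70, -60, -70],
    'WAP002': [-60, -60, -80, -60, 100],
    'location_id': ['Location_1'] * 5,
})
p = findProb(toy, 'Location_1', 'WAP001', -70, 'WAP002', -60)
print(p)  # 2 of the 3 matching records have WAP002 = -60
```

For the whole distribution of APy at once, `df.loc[given, apy_id].value_counts(normalize=True)` over the same mask would give every value's probability in one call.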

<h2>2.2 Second Approach</h2>