# Architectural Decisions Document
## Location identification by using RSSI from beacons
### By: Luis Carlos Manrique Ruiz


## Objective
The goal of this project is to predict the location of a user by using a smartphone and recording the RSSI signals from different beacons.


## What is the RSSI?
*The Received Signal Strength Indicator (RSSI) is a measure of the power level at the receiver. When a device scans for Bluetooth devices, the Bluetooth radio inside the device provides a measurement of the RSSI for each seen device. It's measured in decibels, dBm, on a logarithmic scale and is negative. A more negative number indicates the device is further away. For example, a value of -20 to -30 dBm indicates the device is close while a value of -120 indicates the device is near the limit of detection*.[https://www.bluetoothle.wiki/rssi]

In our case, we expected an RSSI between -40 and -60 dBm in each room where the user was recording the experiments. Naturally, the RSSI received from beacons in more remote locations was weaker due to physical constraints.

## 1. Data source
I collected the data myself using BLE beacons placed in my own apartment. Initially, one beacon was placed in each of four areas: the study room, kitchen, main room, and living room. However, one beacon per area turned out not to be enough to predict the location of the user. For that reason, a second beacon was placed in each area, so that the RSSI was strong in the area where the user was recording.

## 2. Enterprise data
I recorded the data using a free Android app that reads the RSSI from different sources. The RSSI was read every 100 ms and exported as a CSV file.
The raw data was then stored in a Google Drive folder under structured file names that provide the experiment number, user location, and date.

Because the amount of data was small, there was no need to store it in a different way at this stage. However, in other experiments with this type of data, I stored it in a PostgreSQL database.

## 3. Streaming analytics
The Exploratory Data Analysis (EDA) is done in a Jupyter notebook, since streaming analytics are not required at the moment. If needed, the data could be streamed through a database and presented in dashboards.

## 4. Data Integration
Like the EDA, the data integration is performed in a Jupyter notebook. I merged the data frames and studied the distribution of the readings collected from a beacon to see what kind of behavior I could expect from the data.
No software other than Python and Google Drive was needed to store the data and start its transformation process.
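
The integration step described above can be sketched as follows. This is only an illustration under assumptions: the column names (`timestamp`, `beacon_a`, `location`) and the sample values are hypothetical, and `pd.concat` stands in for whatever merge the notebooks actually perform.

```python
import pandas as pd

# Hypothetical per-experiment frames, as if each had been loaded with pd.read_csv.
exp_kitchen = pd.DataFrame({"timestamp": [0, 100, 200], "beacon_a": [-48, -50, -47]})
exp_study = pd.DataFrame({"timestamp": [0, 100, 200], "beacon_a": [-79, -83, -81]})

# Tag each frame with the location where it was recorded, then stack them.
merged = pd.concat(
    [exp_kitchen.assign(location="kitchen"), exp_study.assign(location="study")],
    ignore_index=True,
)

# Summarize the RSSI distribution of one beacon per location.
summary = merged.groupby("location")["beacon_a"].describe()
print(summary[["count", "mean", "50%", "min", "max"]])
```

Comparing the per-location summaries gives a first impression of how well a single beacon separates the areas.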

## 5. Data Repository
The notebooks are placed in a GitHub repository under this structure:

**Repository:**
- Experiments_1_beacon
- Experiments_multiple_beacons

A deeper EDA, together with the data transformations (the LOESS smoothing process, among other methods), is located in the Experiments_1_beacon folder.


## 6. Discovery and exploration
### Technology guidelines

For discovery and exploration, the Jupyter notebooks are written in Python using scikit-learn, pandas, matplotlib, and seaborn, among other libraries.

###  Architectural decision guidelines
Python, pandas, scikit-learn, as well as Keras and TensorFlow libraries, were necessary to create the models.

Some questions can be clarified here:

#### Why have I chosen a specific method for data quality assessment?
The data quality assessment covered several aspects. First, I checked for missing values and replaced them with the median RSSI of the corresponding group, because the readings from the different beacons give clues about the user's location. The time resolution was also important: the data was recorded every 100 ms, but the location of the user does not need to be known at that resolution, only once per second. Outliers were considered because they affected the prediction of the user's location. For that reason, smoothing methods were applied, and a non-parametric one was chosen because of its simplicity.
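
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook code: the column names (`timestamp`, `location`, `beacon_a`, `beacon_b`) and the sample values are assumptions, and "group" is interpreted here as the recording location.

```python
import numpy as np
import pandas as pd

# Hypothetical 100 ms readings from two beacons in two rooms (invented values).
n = 40
df = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-01 10:00:00", periods=n, freq="100ms"),
    "location": ["kitchen"] * 20 + ["study"] * 20,
    "beacon_a": [-45.0] * 20 + [-80.0] * 20,
    "beacon_b": [-82.0] * 20 + [-50.0] * 20,
})
df.loc[[3, 25], "beacon_a"] = np.nan  # simulate missed scans

beacon_cols = ["beacon_a", "beacon_b"]

# Replace missing readings with the median RSSI of the same location group.
df[beacon_cols] = df.groupby("location")[beacon_cols].transform(
    lambda s: s.fillna(s.median())
)

# Downsample from 100 ms to 1 s, taking the median within each second.
per_second = (
    df.groupby(["location", pd.Grouper(key="timestamp", freq="1s")])[beacon_cols]
      .median()
      .reset_index()
)
print(per_second)
```

Taking the median per second, rather than the mean, keeps a single extreme reading from shifting the one-second value.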

#### Why have I chosen a specific method for feature engineering?
I chose methods such as LOESS smoothing because it is a non-parametric algorithm and was easy to tune for our expected data. With this method it was possible to remove outliers and clean the signal. In addition, the sampling interval was changed from 100 ms to one second, taking the median of each interval to preserve the shape and behavior of the signal.
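
To make the smoothing idea concrete, here is a simplified LOESS sketch written directly in NumPy: a local linear fit with tricube weights and no robustness iterations. It is an illustration under assumptions, not the code used in the notebooks; the function name `loess_smooth`, the `frac` value, and the sample signal are hypothetical. In practice a library implementation (for example the `lowess` function in statsmodels) would be the usual choice.

```python
import numpy as np

def loess_smooth(x, y, frac=0.3):
    """Simplified LOESS: local linear fits with tricube weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))  # neighbours per local fit
    smoothed = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]            # k nearest points to x[i]
        h = dist[idx].max()                   # local bandwidth
        if h == 0:                            # all neighbours share the same x
            smoothed[i] = y[idx].mean()
            continue
        w = (1 - (dist[idx] / h) ** 3) ** 3   # tricube weights
        # Weighted least squares for a local line y = b0 + b1 * x.
        A = np.column_stack([np.ones(len(idx)), x[idx]])
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
        smoothed[i] = beta[0] + beta[1] * x[i]
    return smoothed

# Hypothetical one-second RSSI series around -50 dBm with one outlier.
rng = np.random.default_rng(0)
t = np.arange(60.0)
rssi = -50 + rng.normal(0, 2, size=60)
rssi[30] = -95
smooth = loess_smooth(t, rssi, frac=0.3)
print(rssi[30], smooth[30])
```

The `frac` parameter controls the share of points used in each local fit: larger values give a smoother curve and damp outliers more strongly, at the cost of reacting more slowly when the user changes rooms.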

#### Why have I chosen a specific algorithm?
There are multiple algorithms for predicting a person's indoor location from RSSI. However, algorithms such as triangulation do not behave properly when no cleaning or smoothing methods are applied, in other words, when there is a high presence of noise.
For that reason, a baseline model was applied first: a decision tree classifier. Although it behaved well during training, it did not work well on the testing dataset. A deep neural network classifier was applied next, and it worked well with 2 beacons in each area. One beacon per area was not enough: even after the data was cleaned and the model was changed to improve accuracy and other metrics, it did not perform well. The results of these experiments are located at:
*Experiments_1_beacon*
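
A baseline of the kind described above could look like the following sketch. The data is synthetic and cleanly separable (two strong beacons per room, eight beacons in total), so it will not reproduce the weak test performance reported for the real recordings; the feature layout, `max_depth`, and noise levels are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
rooms = ["study", "kitchen", "main", "living"]
n_per_room = 200

X_parts, y_parts = [], []
for room_idx, _ in enumerate(rooms):
    # Beacons in other rooms: weak signal around -80 dBm.
    rssi = rng.normal(-80, 5, size=(n_per_room, 8))
    # The two beacons in the current room: strong signal around -50 dBm.
    rssi[:, 2 * room_idx: 2 * room_idx + 2] = rng.normal(-50, 5, size=(n_per_room, 2))
    X_parts.append(rssi)
    y_parts.append(np.full(n_per_room, room_idx))
X = np.vstack(X_parts)
y = np.concatenate(y_parts)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

Comparing the training and testing scores side by side is what exposes the kind of overfitting described above: a large gap between the two suggests the tree has memorized the training recordings.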

####  Why have I chosen a specific framework?
Keras was chosen as the framework because it makes it easy to create the different layers of a deep learning model. A `callbacks` argument was also introduced to stop the iterations when the model stopped improving its performance. Although the author of this experiment is more familiar with PyTorch, it was interesting to create the model, add layers, and continually improve performance on the testing dataset.

Different tests were conducted by changing the structure of the model and evaluating the results by calculating specific metrics.
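
As an illustration of stopping training through the `callbacks` argument, the sketch below builds a small Keras classifier and attaches an `EarlyStopping` callback. The layer sizes, `patience`, scaling, and synthetic data are assumptions made for the example and do not describe the actual model in the notebooks.

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-in data: 8 beacons, 4 rooms, two strong beacons per room.
rng = np.random.default_rng(0)
n_samples, n_beacons, n_rooms = 400, 8, 4
y = rng.integers(0, n_rooms, size=n_samples)
X = rng.normal(-80, 5, size=(n_samples, n_beacons))
for room in range(n_rooms):
    mask = y == room
    X[mask, 2 * room: 2 * room + 2] = rng.normal(-50, 5, size=(mask.sum(), 2))
X = (X + 100) / 100  # rough scaling of dBm values to a small positive range

model = keras.Sequential([
    keras.Input(shape=(n_beacons,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(n_rooms, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Stop when the validation loss has not improved for 5 epochs,
# and restore the weights from the best epoch.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)
history = model.fit(
    X, y,
    validation_split=0.2,
    epochs=50,
    batch_size=32,
    callbacks=[early_stop],
    verbose=0,
)
print("epochs run:", len(history.history["val_loss"]))
```

Monitoring the validation loss rather than the training loss ties the stopping decision to data the model has not been fitted on.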

#### Why have I chosen a specific model performance indicator?
Since this is a multi-class classification problem, I chose accuracy, F1, and balanced accuracy score to evaluate performance on the testing dataset.
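
The three metrics can be computed with scikit-learn as in the sketch below. The labels are invented to show an imbalanced case, and macro averaging for F1 is an assumption, since the averaging mode is not stated here.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Hypothetical test labels: "kitchen" dominates, the other rooms are rare.
y_true = ["kitchen"] * 6 + ["study"] * 2 + ["living"] * 2
y_pred = ["kitchen"] * 6 + ["study", "kitchen"] + ["living", "kitchen"]

acc = accuracy_score(y_true, y_pred)               # share of correct predictions
macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1
bal_acc = balanced_accuracy_score(y_true, y_pred)  # mean of per-class recall

print(f"accuracy={acc:.3f}  macro F1={macro_f1:.3f}  balanced accuracy={bal_acc:.3f}")
```

In this example accuracy is 0.8 while balanced accuracy is about 0.67, because half of the readings from the two rare rooms are misclassified. This is why plain accuracy alone can be misleading when some locations have fewer recordings than others.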

## 7. Applications and data products
As a data product, I exported the model as an h5 file. It can be used to predict the location of the user, provided the input has the same number of features and the same characteristics as the training data.

## 8. Security, information governance, and systems management
At this moment we are not sharing any personal or private information. However, if such data needs to be included in the future, it shall be encrypted and placed on specific servers to prevent any kind of leakage.

## 9. Summary
This project allowed me to identify the location of a user in a specific area by using BLE beacons and their respective RSSI. It also allowed me to learn more about the architectural decisions needed to create and conclude a project in the best way, taking all end-to-end aspects into account.


