This project focuses on developing a well-evaluated method of training decision trees to high accuracy on WiFi localization datasets (included in wifidb/).
Requirements are numpy and matplotlib (outlined in requirements.txt) and Python 3 (we recommend 3.8).
The script takes a single command-line argument: the path to the dataset to use.
The syntax is as follows:
python3 main.py path/to/dataset.txt

To run training & evaluation on both datasets, use:
python3 main.py wifidb/clean_dataset.txt
python3 main.py wifidb/noisy_dataset.txt

Note: on some machines the visualization may print an erroneous GDK assertion error. Please ignore it: the visualization uses the AGG backend, but matplotlib does not seem to recognize this correctly, so the message is still displayed. The visualization and script run consistently and correctly regardless.
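The single-argument interface described above could be parsed along the following lines. This is only a sketch: the actual argument handling in main.py is not shown here, and the names `parse_args` and `dataset_path` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse one positional argument: the path to the dataset file."""
    parser = argparse.ArgumentParser(
        description="Train and evaluate a decision tree on a WiFi dataset."
    )
    # Hypothetical argument name; main.py may read sys.argv differently.
    parser.add_argument("dataset_path", help="path to the dataset text file")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

Calling `parse_args(["wifidb/clean_dataset.txt"])` returns a namespace whose `dataset_path` attribute holds the given path; omitting the argument makes argparse print a usage message and exit.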
The training utility is provided in train.py.
This can be accessed programmatically (separately from the full pipeline) via the train function.
The train function is documented in train.py. In summary, it takes the dataset as a parsed numpy array and returns both the tree and its total depth.
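As general background, decision-tree learners commonly choose each split by maximizing information gain. The sketch below illustrates that idea only; it is not taken from train.py, which remains the reference for how this repository builds its trees. It assumes each row holds the signal features followed by the room label in the last column, and the names `entropy` and `best_split` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_split(rows):
    """Return (feature_index, threshold, gain) with the highest information gain.

    Assumption: each row is a sequence of features followed by the label.
    """
    labels = [row[-1] for row in rows]
    base = entropy(labels)
    best = (None, None, 0.0)
    n_features = len(rows[0]) - 1
    for f in range(n_features):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [row[-1] for row in rows if row[f] <= threshold]
            right = [row[-1] for row in rows if row[f] > threshold]
            # Weighted average entropy of the two partitions.
            remainder = (
                len(left) * entropy(left) + len(right) * entropy(right)
            ) / len(rows)
            gain = base - remainder
            if gain > best[2]:
                best = (f, threshold, gain)
    return best
```

A recursive trainer would typically call such a function on the current subset, partition the rows by the returned threshold, and recurse until a subset contains a single label.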
Example:
import numpy as np
from train import train

data = np.loadtxt('path/to/my/dataset.txt')
tree, depth = train(data)

The prediction utility is provided in evaluate.py.
This can be accessed programmatically via the predict function, which is described in evaluate.py. In summary, it takes the input features (WiFi signals) and a tree, and returns the predicted room.
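To illustrate what prediction with a decision tree involves, the sketch below walks a single sample from the root of a tree to a leaf. The nested-dictionary node layout (keys `feature`, `threshold`, `left`, `right`, `label`) is a hypothetical stand-in; the structure returned by train and the real predict function in evaluate.py may differ.

```python
def predict_one(sample, node):
    """Follow threshold tests from the root until a leaf is reached."""
    # Hypothetical layout: leaves carry "label"; internal nodes carry a
    # feature index, a threshold, and two child nodes.
    while "label" not in node:
        if sample[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]


# A toy two-level tree over two signal strengths (all values are made up).
toy_tree = {
    "feature": 0,
    "threshold": -50.0,
    "left": {"label": 1},
    "right": {
        "feature": 1,
        "threshold": -45.0,
        "left": {"label": 2},
        "right": {"label": 3},
    },
}
```

With this toy tree, a sample whose first signal is at or below -50 is assigned room 1; otherwise the second signal decides between rooms 2 and 3.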
Example:
from evaluate import predict
...
tree = ...
X = ...
Y = predict(X, tree)

The evaluation utility is provided in evaluate.py.
This can be accessed programmatically via the evaluate and calculate_measures functions.
These are described in evaluate.py. In summary, evaluate requires a testing dataset and the trained tree as input, and will output both the accuracy and confusion matrix. calculate_measures simply requires the confusion matrix, and will print out relevant metrics.
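For reference, accuracy and per-class precision, recall and F1 can be derived from a confusion matrix as sketched below. This is a minimal illustration rather than the calculate_measures implementation: it assumes rows are true classes and columns are predicted classes, and the function names are hypothetical. Plain lists are used so the snippet has no dependencies; the same arithmetic applies to a numpy array.

```python
def per_class_metrics(confusion):
    """Return a list of (precision, recall, f1) tuples, one per class.

    Assumption: confusion[i][j] counts samples of true class i predicted as j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])  # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


def accuracy_from_confusion(confusion):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0
```

Classes that are never predicted or never occur get a score of 0.0 here instead of raising a division error; other conventions are possible.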
Example:
from evaluate import evaluate, calculate_measures
...
tree = ...
test_data = ...
accuracy, confusion_matrix = evaluate(test_data, tree)
calculate_measures(confusion_matrix)  # print out relevant metrics, e.g. F1, recall, precision

This repository has only two requirements (numpy and matplotlib) and as a result will work in most environments. We've tested it on:
- macOS 10 and 11
- Windows 10
- Ubuntu 16 and 18
We highly recommend using a Linux environment such as Ubuntu when running this repository. The CI/CD pipeline configured with this repository utilizes an Ubuntu environment when linting and running main.py for training and evaluation on both datasets.