Decision Trees

This project focuses on developing a well-evaluated method of training decision trees to high accuracy on WiFi localization datasets (included in wifidb/).

Running Training & Evaluation

Requirements: Python 3 (we recommend 3.8), numpy, and matplotlib. The Python packages are listed in requirements.txt.

main.py takes a single command-line argument: the path to the dataset to use.

The syntax is as follows:

python3 main.py path/to/dataset.txt

To run training & evaluation on both datasets, use:

python3 main.py wifidb/clean_dataset.txt
python3 main.py wifidb/noisy_dataset.txt
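For orientation, here is a minimal sketch of what a command-line entry point of this shape could look like. It assumes the dataset is a whitespace-separated text file in which every column except the last is a WiFi signal strength and the last column is the room label. The names `load_dataset` and `main` are hypothetical and are not taken from main.py.

```python
import sys

import numpy as np


def load_dataset(path):
    """Load a whitespace-separated dataset and split it into features and labels.

    Assumption: every column except the last is a WiFi signal strength,
    and the last column is the room label.
    """
    data = np.loadtxt(path, ndmin=2)  # ndmin=2 keeps a 2-D shape even for one row
    X = data[:, :-1]
    y = data[:, -1].astype(int)
    return data, X, y


def main(argv):
    """Hypothetical entry point: expects argv == [script, dataset_path]."""
    if len(argv) != 2:
        print(f"usage: python3 {argv[0]} path/to/dataset.txt", file=sys.stderr)
        return 1
    data, X, y = load_dataset(argv[1])
    print(f"loaded {len(data)} samples with {X.shape[1]} features")
    return 0
```

Calling `sys.exit(main(sys.argv))` from a script's `if __name__ == "__main__":` guard would reproduce the `python3 main.py path/to/dataset.txt` invocation shown above.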

Note: on some machines the visualization may print a spurious GDK assertion error. This can be safely ignored. The visualization uses matplotlib's Agg backend, which matplotlib does not always seem to recognize correctly, so the message is still printed; the visualization and the script nevertheless run correctly and consistently.

Training (programmatic)

The training utility is provided in train.py.

This can be accessed programmatically (separately from the full pipeline) via the train function.

The train function is documented further in train.py. In summary, it takes only the dataset as a parsed NumPy array and returns both the tree and its total depth.

Example:

import numpy as np
from train import train

data = np.loadtxt('path/to/my/dataset.txt')
tree, depth = train(data)
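To make the training step concrete, here is a minimal sketch of a standard recursive decision-tree learner. It assumes the last column of each row is the label and chooses binary splits by information gain, using midpoints between consecutive distinct feature values as candidate thresholds (an ID3-style criterion). It is not the repository's implementation: `train_sketch`, `find_split`, `entropy`, `majority_leaf` and the dict-based node layout are hypothetical, and "depth" here means the depth of the deepest leaf with the root at depth 0, which may differ from how train.py counts it.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def find_split(data):
    """Return (feature, threshold, gain) for the best binary split, or None.

    Assumption: the last column of `data` is the label. Candidate thresholds
    are midpoints between consecutive distinct values of each feature.
    """
    X, y = data[:, :-1], data[:, -1]
    base = entropy(y)
    best = None
    for feature in range(X.shape[1]):
        values = np.unique(X[:, feature])  # sorted distinct values
        for lo, hi in zip(values[:-1], values[1:]):
            threshold = (lo + hi) / 2
            mask = X[:, feature] < threshold
            left, right = y[mask], y[~mask]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            gain = base - remainder
            if best is None or gain > best[2]:
                best = (feature, threshold, gain)
    return best


def majority_leaf(labels):
    """Leaf node predicting the most frequent label at this node."""
    values, counts = np.unique(labels, return_counts=True)
    return {"leaf": True, "label": values[np.argmax(counts)]}


def train_sketch(data, depth=0):
    """Hypothetical recursive learner; returns (tree, depth of deepest leaf)."""
    labels = data[:, -1]
    # Stop when every sample at this node has the same label.
    if len(np.unique(labels)) == 1:
        return majority_leaf(labels), depth
    split = find_split(data)
    # Stop when no feature takes more than one value (no split is possible).
    if split is None:
        return majority_leaf(labels), depth
    feature, threshold, _ = split
    mask = data[:, feature] < threshold
    left, left_depth = train_sketch(data[mask], depth + 1)
    right, right_depth = train_sketch(data[~mask], depth + 1)
    node = {"leaf": False, "feature": feature, "threshold": threshold,
            "left": left, "right": right}
    return node, max(left_depth, right_depth)
```

With the dataset loaded via np.loadtxt as in the example above, `tree, depth = train_sketch(data)` has the same call shape as train. Because each threshold lies strictly between two observed values, both children are always non-empty, so the recursion terminates.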

Prediction (programmatic)

The prediction utility is provided in evaluate.py.

It can be used programmatically via the predict function, which is documented in evaluate.py. In summary, it takes the input features (WiFi signals) and a tree, and returns the predicted room.

Example:

from evaluate import predict
...
tree = ...
X = ...
Y = predict(X, tree)
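Prediction with a trained tree amounts to walking from the root to a leaf for each sample. The sketch below assumes a hypothetical dict-based node layout (internal nodes with "feature", "threshold", "left" and "right"; leaves with "label"), with values below the threshold going left. `predict_sketch` and `predict_one` are illustrative names, not the API in evaluate.py.

```python
import numpy as np


def predict_one(x, node):
    """Walk from the root to a leaf for a single sample x."""
    while not node["leaf"]:
        # Assumed convention: values below the threshold go to the left child.
        if x[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]


def predict_sketch(X, tree):
    """Predict a room label for each row of X (a single 1-D sample also works)."""
    X = np.atleast_2d(X)
    return np.array([predict_one(x, tree) for x in X])


# Hypothetical hand-built tree: split on feature 0 at -55.
tree = {"leaf": False, "feature": 0, "threshold": -55.0,
        "left": {"leaf": True, "label": 1},
        "right": {"leaf": True, "label": 2}}
print(predict_sketch([[-64, -56], [-42, -53]], tree))  # [1 2]
```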

Evaluation (programmatic)

The evaluation utility is provided in evaluate.py.

This can be accessed programmatically via the evaluate and calculate_measures functions.

Both are documented in evaluate.py. In summary, evaluate takes a test dataset and the trained tree, and returns both the accuracy and the confusion matrix; calculate_measures takes only the confusion matrix and prints the relevant metrics.

Example:

from evaluate import evaluate, calculate_measures
...
tree = ...
test_data = ...
accuracy, confusion_matrix = evaluate(test_data, tree)
calculate_measures(confusion_matrix) # print out relevant metrics, e.g. F1, recall, precision
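The metrics mentioned above can be derived from a confusion matrix as follows. This is a minimal sketch assuming rows index the actual room and columns the predicted room, and reporting per-class (unaveraged) precision, recall and F1. `confusion_matrix_sketch` and `measures_sketch` are hypothetical names; the repository's evaluate and calculate_measures may lay out or average these differently.

```python
import numpy as np


def confusion_matrix_sketch(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes (assumed layout)."""
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm


def measures_sketch(cm):
    """Accuracy plus per-class precision, recall and F1 from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class actually occurs
    # Classes never predicted (or never present) get 0 instead of dividing by zero.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1
```

Macro-averaged scores, if wanted, are the means of the per-class arrays, e.g. `f1.mean()`.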

Compatibility

This repository has only two package requirements (numpy and matplotlib), so it should work in most environments. We have tested it on:

MacOS 10 and 11
Windows 10
Ubuntu 16 and 18

We highly recommend using a Linux environment such as Ubuntu when running this repository. The CI/CD pipeline configured for this repository uses an Ubuntu environment to lint the code and to run main.py (training and evaluation) on both datasets.

ossa-ma/decision-tree-learning

Folders and files

Latest commit

History

Repository files navigation

Decision Trees

Running Training & Evaluation

Training (programmatic)

Prediction (programmatic)

Evaluation (programmatic)

Compatibility

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages