Proximity Forest (Using only DTW measure)

An effective and scalable distance-based classifier for time series classification. This repository contains the source code for the time series classification algorithm Proximity Forest, published in the paper [https://arxiv.org/abs/1808.10594]

Check out the original project developed in Java: https://github.com/fpetitjean/ProximityForest

Proximity Forests in Python using dtai DTW distance measure

This is a Python implementation of the Proximity Forest algorithm for time series classification.
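
Since DTW is the only distance measure used here, a small illustration of what it computes may help. The project relies on the dtaidistance package for this; the function below is not that package's code but a minimal pure-Python sketch, assuming no warping window and a squared-difference local cost with a final square root. The name dtw_distance is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Illustrative DTW distance between two numeric sequences.

    Assumptions of this sketch: no warping window, squared-difference
    local cost, square root of the accumulated cost at the end.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return math.sqrt(cost[n][m])


# Two series with the same shape but shifted in time align perfectly.
print(dtw_distance([0, 0, 1], [0, 1, 1]))
```

Unlike a point-by-point Euclidean distance, the alignment may repeat elements of either series, which is why shifted but similarly shaped series end up close to each other.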

Requirements:

This project runs only on a UNIX-based OS (Linux or macOS).
You must have Python 3.7.
Give full permissions to your project directory and files.
We recommend placing the project in your $HOME directory.

Packages to install beforehand:

pip install dtaidistance or pip3 install dtaidistance
pip install -v --force-reinstall --no-deps --no-binary dtaidistance dtaidistance
pip install scipy or pip3 install scipy

How to run a simple execution:

Download the project
Give full permissions to your project
Go to the directory launchers
Execute the following command:
- sh TermLauncherParam.sh [DATASET_NAME], where [DATASET_NAME] is the name of a dataset stored in the project. The available names are:
  - ItalyPowerDemand
  - MoteStrain
  - Plane
  - GesturePebbleZ2
Try this example: sh TermLauncherParam.sh ItalyPowerDemand

Parameters of python execution:

Run the shell script application/TermLauncher.sh with the following parameters:
name: The name of the experiment
-train: The training dataset path. It must have a .ts or .arff extension.
-test: The testing dataset path. It must have a .ts or .arff extension.
The training and testing datasets can be found in the folder datasets/
-repeat: Number of times the experiment is repeated.
-trees: Number of trees in the Proximity Forest.
-candidates: Number of candidates per split.
-targetlast: Indicates whether the last column of the dataset holds the class of the series.
- For example, if a line is 0.32, 0.45, ..., -0.12, X_AXIS, setting -targetlast=True means that X_AXIS is the class of the series and that it is located in the last column.
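
To make the -targetlast option concrete, here is a minimal sketch of how one comma-separated record could be split into series values and a class label. It is not the project's FileReader; the function name split_series_and_label is hypothetical, and the sketch assumes that when the label is not in the last column it is in the first one.

```python
def split_series_and_label(line, target_last=True):
    """Split one comma-separated record into (series values, class label).

    Assumption of this sketch: if the label is not in the last column,
    it is in the first column.
    """
    fields = [field.strip() for field in line.strip().split(",")]
    if target_last:
        *values, label = fields   # label is the final field
    else:
        label, *values = fields   # label is the leading field
    return [float(v) for v in values], label


series, label = split_series_and_label("0.32, 0.45, -0.12, X_AXIS", target_last=True)
print(series, label)
```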

Comparison with Sktime.ProximityForest

sktime is a Python machine learning toolbox for time series with a unified interface for multiple learning tasks. You can find the package through this URL: https://sktime.org

If you want to compare this Proximity Forest Project to the sktime project:

Run the shell script launchers/launchersktime.sh with the following arguments:
- name: The name of the dataset. The available names are:
  - ItalyPowerDemand
  - MoteStrain
  - Plane
  - GesturePebbleZ2
  - Chinatown
  - ERing
  - FacesUCR
  - FreezerRegularTrain
  - SmoothSubspace
  - ElectricDevices
- trees: The number of trees for the algorithm
- candidates: Number of candidates per split
Example: sh launchersktime.sh Plane 20 3
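
As a rough illustration of what the candidates argument controls, the sketch below follows the split procedure described in the Proximity Forest paper: each candidate split picks one random exemplar per class, sends every series to its nearest exemplar, and the candidate with the lowest weighted Gini impurity is kept. This is neither this repository's nor sktime's code. All function names are hypothetical, and a plain squared Euclidean distance stands in for DTW to keep the example short.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def squared_euclidean(a, b):
    """Stand-in distance for this sketch; the project itself uses DTW."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def candidate_split(series, labels, distance, rng):
    """Build one candidate split and return (exemplars, weighted impurity)."""
    by_class = {}
    for s, y in zip(series, labels):
        by_class.setdefault(y, []).append(s)
    # One randomly chosen exemplar per class.
    exemplars = {y: rng.choice(members) for y, members in by_class.items()}
    # Each series goes to the branch of its nearest exemplar.
    branches = {y: [] for y in exemplars}
    for s, y in zip(series, labels):
        nearest = min(exemplars, key=lambda c: distance(s, exemplars[c]))
        branches[nearest].append(y)
    weighted = sum(len(b) / len(labels) * gini(b) for b in branches.values())
    return exemplars, weighted


def best_split(series, labels, n_candidates, distance=squared_euclidean, seed=0):
    """Evaluate n_candidates random splits and keep the purest one."""
    rng = random.Random(seed)
    candidates = [
        candidate_split(series, labels, distance, rng) for _ in range(n_candidates)
    ]
    return min(candidates, key=lambda c: c[1])


toy_series = [[0, 0], [0, 1], [10, 10], [10, 11]]
toy_labels = ["a", "a", "b", "b"]
exemplars, impurity = best_split(toy_series, toy_labels, n_candidates=3)
print(exemplars, impurity)
```

A larger number of candidates gives each node more chances to find a clean split, at the cost of more distance computations per node.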

Installing Package

You can install the package using the command:

pip install Pforests-dtw

The following packages must already be installed:

numpy
dtaidistance
pytest
scipy

Example

from trees import ProximityForest
from core import FileReader
import random
train_dataset = FileReader.FileReader.load_arff_data("/Users/moradisten/Projects/PForests/datasets/Plane/Plane_TRAIN.arff")
test_dataset = FileReader.FileReader.load_arff_data("/Users/moradisten/Projects/PForests/datasets/Plane/Plane_TEST.arff")
Pforest = ProximityForest.ProximityForest(1, n_trees=100, n_candidates=5)
Pforest.train(train_dataset)
results = Pforest.test(test_dataset)
print(results.accuracy)

Data Structure: ListDataset

Datasets in data science are usually handled with Pandas. For this project, following the original researchers, a dedicated data structure was developed instead: the ListDataset. It contains:

series_data: Contains a list of the time series (X).
classes: Contains a list of the labels (y).
class_counter: Dictionary which contains the count of each label -> <label, label count>
series_map: Map which indicates the number of series per label -> <label, number of series>
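
The fields above can be pictured with a small stand-in class. This is only a sketch based on the field descriptions, not the project's ListDataset: the class name ListDatasetSketch and the method add_series are hypothetical, and because series_map is described as a per-label number of series, the sketch simply derives it from class_counter.

```python
from collections import Counter


class ListDatasetSketch:
    """Hypothetical stand-in mirroring the fields described for ListDataset."""

    def __init__(self):
        self.series_data = []           # list of time series (X)
        self.classes = []               # list of labels (y), aligned with series_data
        self.class_counter = Counter()  # label -> label count

    def add_series(self, label, series):
        """Append one labelled series and update the per-label count."""
        self.series_data.append(list(series))
        self.classes.append(label)
        self.class_counter[label] += 1

    @property
    def series_map(self):
        """label -> number of series, as described above (assumed equal to the counts)."""
        return dict(self.class_counter)


dataset = ListDatasetSketch()
dataset.add_series("X_AXIS", [0.32, 0.45, -0.12])
dataset.add_series("Y_AXIS", [0.10, 0.20, 0.30])
dataset.add_series("X_AXIS", [0.30, 0.41, -0.10])
print(dataset.series_map)
```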

File Format

This project reads csv and arff files.

My Bachelor Thesis

If you speak Spanish, I invite you to read my bachelor's thesis on this topic:

https://figshare.com/articles/thesis/Distance-based_Time_Series_Classifiers/13269005

moradabaz/ProximityForests-python