mateus558/UFJF-Machine-Learning-Toolkit

UFJF - Machine Learning Toolkit

License: MIT. Project status: WIP (initial development is in progress, but there has not yet been a stable, usable release suitable for the public).

This project aims to provide researchers and developers with basic tools for manipulating datasets and for implementing and testing ML algorithms, along with some already implemented methods.
It is not intended to be just a collection of algorithms: it also aims to support and establish a pattern for future ML algorithm implementations through a set of interconnected modules that can be used in most ML projects.

Overview

Documentation

The documentation is available on the project page: UFJF-MLTK.
Examples and other information can be found on the wiki.

Installation

To be cross-platform and accessible to most users, the project supports both CMake and Meson, two widely used build systems. Either of the two install methods below can be used.

Requirements

  • meson or cmake
  • g++ >= 8
  • C++17 or newer
  • gnuplot >= 5 (only for visualization module)

CMake

mkdir build
cd build
cmake ..
make
sudo make install

Meson

meson build
meson compile -C build
meson install -C build

After that, the library is available system-wide and can be linked like any other library.

Code Example

The framework is intended to make machine learning algorithms easier to use in C++. The following example prints the 10-fold cross-validation accuracy of the kNN algorithm with 3 neighbors, which takes only a few lines of code.

main.cpp

#include <iostream>

#include <ufjfmltk/Core.hpp>
#include <ufjfmltk/Validation.hpp>
#include <ufjfmltk/Classifier.hpp>

int main(){
  // Load the dataset from file.
  mltk::Data<double> data("iris.data");
  // kNN classifier with 3 neighbors.
  mltk::classifier::KNNClassifier<double> knn(data, 3);

  std::cout << "Dataset size: " << data.size() << std::endl;
  std::cout << "Dataset dimension: " << data.dim() << std::endl;

  // 10-fold cross-validation accuracy of the classifier.
  std::cout << "KNN accuracy: ";
  std::cout << mltk::validation::kfold(data, knn, 10, 42, 0).accuracy
            << "%" << std::endl;
}

Compiling:

g++ -std=c++17 main.cpp -o main -lufjfmltk

This program outputs the following:

Dataset size: 150
Dataset dimension: 4
KNN accuracy: 100%

Modules status

  • Data manipulation
  • Artificial datasets
  • Data visualization
  • Classifiers (Primal and Dual)
  • Ensemble
  • Regression
  • Validation (K-Fold Cross-Validation)
  • Feature Selection
  • Documentation

Authors

Mateus Coutinho Marim (mateus.marim@ice.ufjf.br)
Saulo Moraes Villela (saulo.moraes@ufjf.edu.br)
Alessandreia Marta de Oliveira Julio (alessandreia.oliveira@ice.ufjf.br)

Universidade Federal de Juiz de Fora
Departamento de Ciência da Computação