First version of paper
firefly-cpp committed Dec 10, 2020
1 parent c3eaef3 commit 0b78925
Showing 3 changed files with 92 additions and 0 deletions.
Binary file added paper/niaamlFlow.png
51 changes: 51 additions & 0 deletions paper/paper.bib
@@ -0,0 +1,51 @@
@incollection{Fister2020Continuous,
title={Continuous Optimizers for Automatic Design and Evaluation of Classification Pipelines},
author={Fister Jr., Iztok and Zorman, Milan and Fister, Du{\v{s}}an and Fister, Iztok},
booktitle={Frontier Applications of Nature Inspired Computation},
pages={281--301},
year={2020}
}
@article{Vrbančič2018,
doi = {10.21105/joss.00613},
url = {https://doi.org/10.21105/joss.00613},
year = {2018},
publisher = {The Open Journal},
volume = {3},
number = {23},
pages = {613},
author = {Grega Vrbančič and Lucija Brezočnik and Uroš Mlakar and Dušan Fister and Iztok Fister},
title = {NiaPy: Python microframework for building nature-inspired algorithms},
journal = {Journal of Open Source Software}
}
@inproceedings{NIPS2015_11d0e628,
author = {Feurer, Matthias and Klein, Aaron and Eggensperger, Katharina and Springenberg, Jost and Blum, Manuel and Hutter, Frank},
booktitle = {Advances in Neural Information Processing Systems},
editor = {C. Cortes and N. Lawrence and D. Lee and M. Sugiyama and R. Garnett},
pages = {2962--2970},
publisher = {Curran Associates, Inc.},
title = {Efficient and Robust Automated Machine Learning},
url = {https://proceedings.neurips.cc/paper/2015/file/11d0e6287202fced83f79975ec59a3a6-Paper.pdf},
volume = {28},
year = {2015}
}
@INPROCEEDINGS{7280767,
author={I. {Guyon} and K. {Bennett} and G. {Cawley} and H. J. {Escalante} and S. {Escalera} and {Tin Kam Ho} and N. {Macià} and B. {Ray} and M. {Saeed} and A. {Statnikov} and E. {Viegas}},
booktitle={2015 International Joint Conference on Neural Networks (IJCNN)},
title={Design of the 2015 ChaLearn AutoML challenge},
year={2015},
pages={1--8},
doi={10.1109/IJCNN.2015.7280767}}
@inproceedings{tpot,
title={TPOT: A tree-based pipeline optimization tool for automating machine learning},
author={Olson, Randal S and Moore, Jason H},
booktitle={Workshop on automatic machine learning},
pages={66--74},
year={2016},
organization={PMLR}
}
41 changes: 41 additions & 0 deletions paper/paper.md
@@ -0,0 +1,41 @@
---
title: 'NiaAML: AutoML framework based on nature-inspired algorithms'
tags:
  - Python
  - AutoML
  - classification
  - hyperparameter optimization
  - nature-inspired algorithms
authors:
  - name: Luka Pečnik
    orcid:
    affiliation: "1, 2" # (Multiple affiliations must be quoted)
  - name: Iztok Fister Jr.
    orcid:
    affiliation: 1
affiliations:
  - name: University of Maribor, Faculty of Electrical Engineering and Computer Science
    index: 1
date: 20 December 2020
bibliography: paper.bib
---

# Summary

Finding the classification pipeline in Machine Learning (ML) that gives the best results for a particular classification task requires considerable domain-specific knowledge and a great deal of trial and error. For instance, the input data must be properly prepared and preprocessed, and an appropriate classifier must be selected together with its hyperparameters. Because of these demands, such tasks have mostly been handled by ML experts and data scientists.
Automated Machine Learning (AutoML) methods are intended to automate some phases of ML tasks [@Fister2020Continuous]. With the rise of AutoML methods, ML has also become accessible to non-experts from related research areas. Empirical evidence shows that, once the selection of the classifier, the feature preprocessing steps and their hyperparameters is automated, these tasks can be handled by non-experts [@NIPS2015_11d0e628; @7280767; @Fister2020Continuous].

AutoML can be modeled as a continuous optimization problem, to which several optimization methods can be applied. Nature-inspired algorithms are an efficient tool for such continuous optimization problems. We developed NiaAML, an AutoML framework written entirely in Python. The framework uses nature-inspired population-based algorithms to search for a good classification pipeline [@Fister2020Continuous]. NiaAML is easy to use, extensible, and customizable to users' needs.
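
To make the continuous formulation concrete, the following minimal sketch shows one way a real-valued solution vector could be decoded into a pipeline. It is an illustration under stated assumptions, not NiaAML's actual encoding: the component names, `gene_to_index` and `decode` are all hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical component names, chosen for illustration only.
FEATURE_SELECTION = ["none", "select_k_best", "variance_threshold"]
FEATURE_TRANSFORM = ["none", "standard_scaler", "normalizer"]
CLASSIFIERS = ["decision_tree", "random_forest", "mlp"]


def gene_to_index(gene: float, n_options: int) -> int:
    """Map a real value in [0, 1] to an integer index in [0, n_options - 1]."""
    clipped = min(max(gene, 0.0), 1.0)
    # A gene of exactly 1.0 would yield n_options, so cap at the last index.
    return min(int(clipped * n_options), n_options - 1)


def decode(solution: Sequence[float]) -> Tuple[str, str, str]:
    """Decode a 3-dimensional continuous vector into pipeline components."""
    fs_gene, ft_gene, clf_gene = solution
    return (
        FEATURE_SELECTION[gene_to_index(fs_gene, len(FEATURE_SELECTION))],
        FEATURE_TRANSFORM[gene_to_index(ft_gene, len(FEATURE_TRANSFORM))],
        CLASSIFIERS[gene_to_index(clf_gene, len(CLASSIFIERS))],
    )


if __name__ == "__main__":
    print(decode([0.0, 0.5, 1.0]))
```

With such a mapping, any optimizer that works on real-valued vectors can search the discrete space of component combinations, since each candidate vector decodes to exactly one pipeline.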

The framework has a layered architecture consisting of several components, i.e., Feature Selection algorithms, Feature Transformation algorithms and classifiers. Its task is to find a well-performing combination of components, together with suitable classifier hyperparameter settings, in order to build an efficient yet customizable classification pipeline with the help of NiaPy, a popular collection of nature-inspired algorithms [@Vrbančič2018]. NiaAML performs two optimization steps in sequence: (1) the first finds the optimal set of components for the pipeline, and (2) the second tunes the hyperparameters. Users can freely choose which ML components to include in the optimization process, and can select the fitness function used to evaluate candidate pipelines. Input data may contain numerical and categorical features as well as missing values. Pipelines can be exported and imported as binary files for post-hoc use. They can also be exported as user-friendly text files that contain all of the relevant information about the pipeline and its components.
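
The two optimization steps can be sketched as follows. This is a toy illustration only and does not use the NiaAML or NiaPy APIs. Plain random search stands in for a nature-inspired optimizer, and the classifier names, the base scores in `TOY_BASE_SCORE` and the fitness function `toy_score` are invented for the example.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical classifier names with invented base scores (not measured results).
TOY_BASE_SCORE = {"decision_tree": 0.70, "random_forest": 0.80, "mlp": 0.75}
CLASSIFIERS = list(TOY_BASE_SCORE)
DEFAULT_HYPER = 0.5


def toy_score(classifier: str, hyper: float) -> float:
    """Invented fitness: a base score minus a penalty that vanishes at hyper = 0.3."""
    return TOY_BASE_SCORE[classifier] - (hyper - 0.3) ** 2


def pick(gene: float, options: Sequence[str]) -> str:
    """Decode a real value in [0, 1] into one of the options."""
    return options[min(int(gene * len(options)), len(options) - 1)]


def random_search(
    fitness: Callable[[List[float]], float],
    dim: int,
    evaluations: int,
    rng: random.Random,
    start: Optional[Sequence[float]] = None,
) -> Tuple[List[float], float]:
    """Stand-in optimizer: uniform sampling in [0, 1]^dim, keeping the best point."""
    best_x = list(start) if start is not None else [rng.random() for _ in range(dim)]
    best_f = fitness(best_x)
    for _ in range(evaluations - 1):
        x = [rng.random() for _ in range(dim)]
        f = fitness(x)
        if f > best_f:
            best_x, best_f = x, f
    return best_x, best_f


def two_stage_search(seed: int = 0) -> Tuple[str, float, float, float]:
    rng = random.Random(seed)
    # Stage 1: choose the component, scoring it with a default hyperparameter.
    x1, stage1_score = random_search(
        lambda x: toy_score(pick(x[0], CLASSIFIERS), DEFAULT_HYPER), 1, 30, rng
    )
    classifier = pick(x1[0], CLASSIFIERS)
    # Stage 2: keep the component fixed and tune its hyperparameter,
    # starting from the default so the score cannot get worse.
    x2, stage2_score = random_search(
        lambda x: toy_score(classifier, x[0]), 1, 30, rng, start=[DEFAULT_HYPER]
    )
    return classifier, x2[0], stage1_score, stage2_score


if __name__ == "__main__":
    print(two_stage_search())
```

Separating the two stages keeps each search space small: the first stage only compares components under default settings, and the second stage spends its whole evaluation budget on the hyperparameters of the chosen components.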

![NiaAML flow.\label{fig:NiaAMLflow}](niaamlFlow.png)

Although similar Python frameworks for AutoML exist, such as TPOT [@tpot] and auto-sklearn [@NIPS2015_11d0e628], NiaAML differs in that it searches for a pipeline using nature-inspired population-based algorithms.

In conclusion, NiaAML is an AutoML framework based on nature-inspired population-based algorithms. It searches for well-performing classification pipelines built from preprocessing steps and classifiers while keeping the experience user-friendly. With the help of the NiaPy framework, working with ML also becomes easier, more adaptable, extensible and customizable. Last but not least, more ML components and functionality are planned for future versions of NiaAML.

# Statement of need

Many users have become interested in developing practical ML applications, solving industry-related problems with ML, or studying ML out of curiosity. Some of them may not be familiar with the background or characteristics of ML algorithms and the data science field, which leaves them dependent on expert consultants. This is where we see NiaAML's contribution to the ML community. Ease of use, extensibility, adaptability and customizability are among its main strengths. Its simplicity stems from the layered architecture, which is easy to understand and lets users add custom components with little effort. A Graphical User Interface, available as a separate application, makes NiaAML even easier to use.

# References
