Merge pull request #45 from adi3/master
Restructuring of paper and editorial changes
lukapecnik committed Jan 14, 2021
2 parents 61eef62 + e8ff31f commit b7472b4
Showing 1 changed file with 11 additions and 13 deletions.
24 changes: 11 additions & 13 deletions paper/paper.md
@@ -22,30 +22,28 @@ bibliography: paper.bib

# Summary

The growing interest and development of Machine Learning (ML) methods has led to the search for ways to use these methods in the simplest way, as in general it is considered that data preprocessing and selection of appropriate ML algorithms is a demanding and time-consuming task that requires a lot of domain-specific knowledge and the use of trial-and-error approaches. That is why Automated Machine Learning (AutoML) methods have been developed. Their purpose is to automate data preprocessing and search for ML algorithms together with their hyperparameters, in order to discover the best possible ML pipeline depending on the input data [@bookHutter; @Fister2020Continuous].
The field of Automated Machine Learning (AutoML) has been developed to automate data preprocessing and search for optimal algorithms together with their hyperparameters in order to discover the best possible ML pipeline for an input dataset [@bookHutter]. AutoML can be modeled as a continuous optimization problem with several potential optimization methods considered. Stochastic population-based nature-inspired algorithms [@yang2014; @engelbrecht2007computational] are a popular class of tools for dealing with such continuous optimization problems. These algorithms are inspired mainly by the biological behavior of various species living in nature [@fister2013brief]. Such algorithms are composed of a population of individuals that undergo different variation operations during the evolution process which results in new populations. The Python framework we have developed, NiaAML, incorporates these stochastic algorithms to search for the most suitable classification pipeline in a dataset [@Fister2020Continuous].

AutoML can be modeled as a continuous optimization problem with several potential optimization methods applied. Stochastic population-based nature-inspired algorithms [@yang2014; @engelbrecht2007computational] are a very popular tool for dealing with such continuous optimization problems. These algorithms are inspired mainly by the biological principles and phenomena of the behavior of various species living in nature [@fister2013brief]. Each stochastic population-based nature-inspired algorithm is composed of a population of individuals that undergo different variation operators during the evolution process that results in new populations. We developed the AutoML framework NiaAML, written fully in the Python programming language. NiaAML is based mainly on the method initially proposed in [@Fister2020Continuous]. The framework incorporates stochastic population-based nature-inspired algorithms to search for the best classification pipeline [@Fister2020Continuous].

The framework is developed in a layer style layout architecture, consisting of several components, i.e. Feature Selection algorithms, Feature Transformation algorithms and classifiers. Its task is to find a perfect combination of components with proper classifiers hyperparameters` settings to build an efficient, yet customizable classification pipeline with the help of a popular collection of nature-inspired algorithms, named NiaPy [@Vrbančič2018]. Two types of optimizations in the NiaAML follow: (1) The first to find the optimal set of components for the pipeline, and (2) The second to tune the hyperparameters. Users can choose the ML components to be included into the optimization process freely, as well as select suitable fitness functions to be used for evaluation of candidate pipelines. Input data can be in the form of numerical and categorical features, as well as missing attributes, while pipelines are exported and imported as binary files for post-hoc use. Further, they can be exported as user-friendly text files that contain all of the relevant information about the pipeline and its components. A graphical outline of the NiaAML method is presented in Fig.\autoref{fig:NiaAMLflow}.
The framework is developed in a layer style layout architecture consisting of several components, including feature selection algorithms, feature transformation algorithms and classifiers. Its task is to find the best combination of components with proper classifier hyperparameter settings to build an efficient, yet customizable classification pipeline with the help of a popular collection of nature-inspired algorithms, named NiaPy [@Vrbančič2018]. NiaAML incorporates two types of optimization: the first finds the optimal set of components for the pipeline, and the second tunes the hyperparameters. Users can freely choose the ML components to be included in the optimization process, as well as select suitable fitness functions to be used for evaluation of candidate pipelines. Input data can be in the form of numerical and categorical features, and may include missing attributes, while pipelines are exported and imported as binary files for post-hoc use. Further, they can be exported as user-friendly text files that contain all of the relevant information about the pipeline and its components. A graphical outline of the NiaAML method is presented in \autoref{fig:NiaAMLflow}.

![NiaAML flow.\label{fig:NiaAMLflow}](niaamlFlow.png)
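
The idea of treating pipeline construction as a continuous optimization problem can be illustrated with a minimal sketch. This is not NiaAML's actual API or encoding: the component names, the `decode` and `random_search` helpers, the single hyperparameter gene and the use of plain random search in place of a nature-inspired algorithm are all assumptions made for illustration. Each candidate is a real-valued vector in $[0, 1]^4$ that is decoded into component choices and a hyperparameter value.

```python
import random

# Hypothetical component catalogues; the names are illustrative only
# and are not taken from NiaAML.
FEATURE_SELECTORS = ["none", "select_k_best", "variance_threshold"]
TRANSFORMERS = ["none", "standard_scaler", "normalizer"]
CLASSIFIERS = ["decision_tree", "random_forest", "knn"]


def pick(options, gene):
    """Map a gene in [0, 1] to one entry of a list of options."""
    index = min(int(gene * len(options)), len(options) - 1)
    return options[index]


def decode(vector):
    """Decode a real-valued vector into a pipeline description."""
    return {
        "feature_selection": pick(FEATURE_SELECTORS, vector[0]),
        "feature_transform": pick(TRANSFORMERS, vector[1]),
        "classifier": pick(CLASSIFIERS, vector[2]),
        # Assumed single integer hyperparameter scaled to the range [1, 20].
        "classifier_param": 1 + int(round(vector[3] * 19)),
    }


def random_search(fitness, dimension=4, iterations=50, seed=0):
    """Stand-in optimizer: sample vectors uniformly and keep the best one.

    A population-based nature-inspired algorithm would replace this loop;
    only the vector-to-pipeline decoding is the point of the sketch.
    """
    rng = random.Random(seed)
    best_vector, best_score = None, float("-inf")
    for _ in range(iterations):
        candidate = [rng.random() for _ in range(dimension)]
        score = fitness(decode(candidate))
        if score > best_score:
            best_vector, best_score = candidate, score
    return decode(best_vector), best_score
```

Because the optimizer only ever sees a vector of real numbers and a fitness value, any continuous optimizer could be swapped in without changes to its internals. In practice the `fitness` callable would train and score the decoded pipeline on the input dataset.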

NiaAML is different: Compared to similar Python AutoML frameworks, such as TPOT [@tpot] and auto-sklearn [@NIPS2015_11d0e628], NiaAML [@Fister2020Continuous] comes with the following benefits:

- it is fully modeled as a continuous optimization problem, which means that arbitrary stochastic nature-inspired population-based algorithms [@tzanetos2020comprehensive] can be used for solving this task (without any special modifications of their internal mechanisms),
# Statement of need

Searching for an optimal classification pipeline in ML that provides the best results for a particular classification task involves a lot of domain-specific knowledge and numerous trial-and-error approaches. For instance, several conditions must be fulfilled to adequately apply an ML algorithm, such as proper preparation and preprocessing of input data, and selection of appropriate classifiers as well as their parameters. Due to this, mostly ML experts and data scientists have been able to handle such tasks in the past. However, empirical evidence shows that automation of the optimal classifier selection process, feature preprocessing steps and their hyperparameters, can be dealt with by non-experts as well [@NIPS2015_11d0e628; @7280767].

- the search for the optimal combination of ML components and proper classifiers` hyperparameters can be conducted concurrently,
With the rise of AutoML methods, dealing with ML has also become available to researchers from other fields. Compared to similar Python AutoML frameworks, such as TPOT [@tpot] and auto-sklearn [@NIPS2015_11d0e628], NiaAML offers the following benefits:

- its layer style architecture allows straightforward adding of new ML components,
1. It is fully modeled as a continuous optimization problem, which means that arbitrary stochastic nature-inspired population-based algorithms can be used for solving this task without any special modifications of their internal mechanisms.

- every pipeline in the output is feasible and functional in cases of the correct specified hyperparameters' domains,
2. The search for the optimal combination of ML components and proper classifier hyperparameters can be conducted concurrently.

- the included GUI simplifies the work for regular users.
3. Its layer style architecture allows the straightforward addition of new ML components.

In conclusion, NiaAML is an AutoML framework based on stochastic population-based nature-inspired algorithms. Thus, the NiaAML framework is able to find the optimal classification pipelines built on preprocessing steps and classifiers, and, simultaneously, assuring a user-friendly experience. Working with ML is also, with the help of the NiaPy framework, more easy, adaptable, expandable and customizable. Last but not least, we can assure that more of the ML components and functionalities to the NiaAML framework are yet to come.
4. Every pipeline in the output is feasible and functional, provided that the hyperparameter domains are specified correctly.

# Statement of need
5. A Graphical User Interface (GUI) further simplifies the work for users without expert domain knowledge.

Searching for the optimal classification pipeline in ML that provides the best results for a particular classification task involves a lot of domain-specific knowledge and numerous trial-and-error approaches. For instance, several conditions must be fulfilled for an ML method, such as proper preparation and preprocessing of input data, and selection of appropriate classifiers as well as their parameters. Due to the aforementioned specifics, mostly ML experts and data scientists were able to handle such tasks in the past. Empirical evidence shows that automation of the optimal classifier selection process, feature preprocessing steps and their hyperparameters, can be dealt with by non-experts [@NIPS2015_11d0e628; @7280767; @Fister2020Continuous]. With the rise of AutoML methods, dealing with the ML also became available to non-experts from other related research areas for the purpose of developing practical ML applications, solving industry-related problems with ML, or even studying out of curiosity. In here, we see the bits of NiaAML to the ML community. Simplicity to use, extendability, adaptability and customizability are just a sample of NiaAML recognitions. The reason for its simplicity lies within the layer style architecture, which is easy to understand and allows the user to add customizable components in an easy way. An available Graphical User Interface as a separate application makes the use of NiaAML even easier.

# References
