gAldeia/experiments-ITEA-paper

Experiments of the ITEA paper

Note: although this repository contains a standalone implementation of ITEA, it is outdated. It was written to serve the specific purpose of the paper. I highly recommend using the high-performing Haskell version (which comes with a Python wrapper) by @folivetti, or my most up-to-date version (the only one that I am maintaining).


In order to validate our proposed algorithm, several other methods were fine-tuned through a gridsearch process and then applied to the same set of problems.

We also performed some particular investigations, like the Marginal Effect of the expressions generated by the ITEA, SymTree, and FEAT algorithms.

Finally, a Bonferroni-adjusted Wilcoxon test was performed between ITEA and every other algorithm.
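
As a minimal sketch of the Bonferroni adjustment (not the code used in the paper, which lives under `src/analysis`), each raw p-value is multiplied by the number of comparisons and capped at 1. The raw p-values are assumed to come from a paired test such as `scipy.stats.wilcoxon`; the algorithm names and numbers below are hypothetical placeholders, not results from the paper.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1.0."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# Hypothetical raw p-values, one per "ITEA vs. other algorithm" comparison.
# These are placeholders for illustration, not values from the paper.
raw = {"alg_a": 0.004, "alg_b": 0.03, "alg_c": 0.6}
adjusted = bonferroni_adjust(raw)

alpha = 0.05  # assumed significance level
for name, p in adjusted.items():
    verdict = "significant" if p < alpha else "not significant"
    print(f"ITEA vs {name}: adjusted p = {p:.3f} ({verdict})")
```

With three comparisons, a raw p-value of 0.03 no longer passes the 0.05 threshold after adjustment, which is the point of correcting for multiple tests.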

To make the results more transparent and to share our methodology, this repository organizes the source code, data sets, and results used in the paper.

The next topics present the folder structure, followed by a detailed description of the main folders.

Citing us:
@misc{defranca2020interactiontransformation,
      title={Interaction-Transformation Evolutionary Algorithm for Symbolic Regression}, 
      author={Fabricio Olivetti de Franca and Guilherme Seidyo Imai Aldeia},
      year={2020},
      eprint={1902.03983},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Folder structure

.
├── datasets 
│   ├── commaSeparated
│   └── tabSeparated
├── docs
│   └── GSGP
│       ├── GSGP_documentation
│       ├── GSGP_examples
│       └── GSGP_original_code
├── results
│   ├── disentanglement
│   ├── gridsearch
│   ├── iteaMarginalEffect
│   └── rmse
└── src
    ├── analysis
    │   ├── RMSEs
    │   ├── disentanglement
    │   ├── hypothesysTesting
    │   └── marginalEffect
    └── gridsearch
        ├── GSGP-gridsearch
        └── itea-gridsearch

22 directories                                                                     
  • datasets: contains the data sets used in the paper, already split into a 5-fold configuration. This ensures that every algorithm is tested on the same train and test partitions, no matter how the random generator is set up. GSGP requires tab-separated input, so there are two folders holding the same data with different separators. This folder also contains a script to split a data set into the 5 fold files.
  • docs: the original documentation of the GSGP is in this folder.
  • results
    • disentanglement: csv files containing the expression and disentanglement measures for the ITEA, SymTree, and FEAT (full) algorithms;
    • gridsearch: some of the studied algorithms are slow, and performing a gridsearch makes them even more time consuming. To overcome this problem, the gridsearch of specific algorithms can be interrupted and resumed from checkpoints. This is achieved through a file that stores the RMSE of the different folds for the different configurations. The files here have no direct use, as they serve only as checkpoints.
    • iteaMarginalEffect: notebook to plot marginal effects for ITEA expressions.
    • rmse: files with the RMSE on the train and test partitions for every fold of every data set, obtained using the best configuration found in the gridsearch. These are the results reported in the paper.
  • src
    • analysis: source code of all analyses made in the paper (statistical tests, marginal effects analysis, disentanglement, etc.).
    • gridsearch: source code to perform the gridsearch and evaluate all algorithms, obtaining the results inside ./results/rmse.
      • GSGP is executed in a Jupyter notebook to facilitate debugging (Python is used mainly to run shell instructions), and is in a separate folder because the C++ implementation requires auxiliary files. There is also a Python script with the content of the notebook.
      • Lasso, LassoLars, Ridge, Tree, Forest, kNN, and elnet results are obtained with the GridSearchCV and regressor implementations provided by scikit-learn. The use of the script is pretty straightforward.
      • dcgp, feat, gplearn and itea are evolutionary algorithms and use our implementation of gridsearch. The implementation creates a file enumerating the possible configurations for the gridsearch as a reference.
        • gplearn, itea, and feat are designed to create checkpoints during the gridsearch, because they are slow to execute.
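
The checkpointing idea described above (a file storing the RMSE of each fold for each configuration, so an interrupted gridsearch can resume) can be sketched as follows. This is an illustrative sketch, not the implementation in `src/gridsearch`: the CSV layout, the function names, and the `popsize`/`gens` parameters are all assumptions, and `toy_evaluate` is a stand-in for actually training a regressor on a fold.

```python
import csv
import itertools
import os
import tempfile


def load_checkpoint(path):
    """Return {(config_id, fold): rmse} for every row already stored."""
    done = {}
    if os.path.exists(path):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                done[(int(row["config_id"]), int(row["fold"]))] = float(row["rmse"])
    return done


def run_gridsearch(grid, n_folds, evaluate, path):
    """Evaluate every (configuration, fold) pair, skipping stored ones."""
    keys = sorted(grid)
    # Enumerate all configurations in a fixed order so ids are stable across runs.
    configs = [dict(zip(keys, values))
               for values in itertools.product(*(grid[k] for k in keys))]
    done = load_checkpoint(path)
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["config_id", "fold", "rmse"])
        for config_id, config in enumerate(configs):
            for fold in range(n_folds):
                if (config_id, fold) in done:
                    continue  # already evaluated in a previous run
                rmse = evaluate(config, fold)
                writer.writerow([config_id, fold, rmse])
                f.flush()  # persist right away so an interruption loses little
                done[(config_id, fold)] = rmse

    def mean_rmse(i):
        return sum(done[(i, fold)] for fold in range(n_folds)) / n_folds

    # The best configuration is the one with the lowest mean RMSE across folds.
    best = min(range(len(configs)), key=mean_rmse)
    return configs[best], done


def toy_evaluate(config, fold):
    # Stand-in for fitting a model on one fold and measuring its test RMSE.
    return abs(config["popsize"] - 100) / 100 + 0.01 * fold


grid = {"popsize": [50, 100, 200], "gens": [10]}  # hypothetical parameters
checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.csv")
best, scores = run_gridsearch(grid, 5, toy_evaluate, checkpoint)
print(best)
```

Calling `run_gridsearch` again with the same checkpoint file skips every pair already stored, so a run interrupted halfway only pays for the evaluations that are still missing.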

Acknowledgments

This project is supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), grant number 2018/14173-8.
