majobasgall/big_data_reduction_recommender

FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems

FDR2-BD is a methodological data condensation approach for reducing tabular big datasets in classification problems. The key to our proposal is to analyze the data in a dual way (vertically and horizontally), combining feature selection, which generates dense clusters of data, with uniform sampling, which keeps only a few representative samples from each problem area. Its main advantage is that the model’s predictive quality is kept within a range determined by a user-defined threshold. Its robustness comes from a hyper-parametrization process in which all data are taken into consideration by following a k-fold procedure. It is also fast and scalable, since it relies on fully optimized parallel operations provided by Apache Spark.
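The threshold-driven search described above can be illustrated with a minimal, single-machine sketch. It is not the tool's actual implementation: it covers only the horizontal (uniform sampling) side, omits feature selection and Apache Spark, and all names (`recommend_reduction`, `uniform_sample`, `nearest_centroid_accuracy`) as well as the toy classifier are assumptions made for illustration.

```python
import random
from statistics import mean


def k_folds(n_rows, k, seed=0):
    """Split row indices into k disjoint folds after a seeded shuffle."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def uniform_sample(rows, fraction, seed=0):
    """Keep a uniformly random `fraction` of the rows (at least one row)."""
    keep = max(1, round(len(rows) * fraction))
    return random.Random(seed).sample(rows, keep)


def nearest_centroid_accuracy(train, test):
    """Toy quality measure (assumption): accuracy of a 1-D nearest-centroid rule."""
    by_label = {}
    for x, label in train:
        by_label.setdefault(label, []).append(x)
    centroids = {label: mean(xs) for label, xs in by_label.items()}
    hits = sum(
        1
        for x, label in test
        if min(centroids, key=lambda c: abs(x - centroids[c])) == label
    )
    return hits / len(test)


def recommend_reduction(rows, fractions, evaluate, threshold, k=5, seed=0):
    """Return (fraction, baseline): the smallest sampling fraction whose mean
    k-fold quality drops by at most `threshold` from the unreduced baseline.
    Falls back to 1.0 (no reduction) if no candidate qualifies."""
    folds = k_folds(len(rows), k, seed)

    def cv_quality(fraction):
        scores = []
        for i, test_idx in enumerate(folds):
            test = [rows[j] for j in test_idx]
            train = [rows[j] for f, fold in enumerate(folds) if f != i for j in fold]
            # Only the training part is reduced; the test fold stays intact.
            reduced = uniform_sample(train, fraction, seed + i)
            scores.append(evaluate(reduced, test))
        return mean(scores)

    baseline = cv_quality(1.0)
    for fraction in sorted(fractions):  # try the most aggressive reduction first
        if baseline - cv_quality(fraction) <= threshold:
            return fraction, baseline
    return 1.0, baseline
```

Calling `recommend_reduction(rows, [0.1, 0.25, 0.5], nearest_centroid_accuracy, threshold=0.02)` on a list of `(value, label)` pairs would return the smallest candidate fraction whose cross-validated accuracy loss stays within two percentage points, together with the baseline accuracy. Because every row appears in exactly one test fold, all data contribute to the decision.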

For further information, please refer to the related article: https://www.mdpi.com/2079-9292/10/15/1757

Please cite this software as:

Basgall, M.J.; Naiouf, M.; Fernández, A. FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems. Electronics 2021, 10, 1757. https://doi.org/10.3390/electronics10151757
