Using Imblearn To Tackle Imbalanced Data Sets
Imbalanced data is a frequently occurring feature of data sets in fields such as epidemiology, marketing, and fraud detection. Here I show examples of some methods for dealing with such data. The data came from the KEEL data set repository. I used a data set called 'yeast3', which has a class imbalance ratio of 1:8.1, meaning roughly 8.1 majority-class samples for every minority-class sample.
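To make the imbalance ratio quoted above concrete, here is a minimal sketch of how such a ratio can be computed from a list of class labels. The helper name `imbalance_ratio` and the label counts are my own illustrative assumptions, not the real yeast3 counts; the numbers are chosen only so that the result matches the 1:8.1 figure.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute a ratio")
    return max(counts.values()) / min(counts.values())


# Illustrative labels only (assumed counts, NOT the actual yeast3 data).
labels = ["negative"] * 81 + ["positive"] * 10

ratio = imbalance_ratio(labels)
print(f"imbalance ratio = 1:{ratio:.1f}")  # prints "imbalance ratio = 1:8.1"
```

With a real data set, the same function could be applied to the target column once it has been loaded.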
Resources used:
- Imbalanced-learn documentation.
http://contrib.scikit-learn.org/imbalanced-learn/index.html
- Data mining with imbalanced class distributions: concepts and methods (Prati et al., 2009).
http://conteudo.icmc.usp.br/pessoas/gbatista/files/iicai2009.pdf
- Resampling techniques and other strategies - Ajinkya More.
https://www.youtube.com/watch?v=-Z1PaqYKC1w
- KEEL data set repository.