
This repo demonstrates the resampling techniques needed to achieve better results on highly unbalanced (skewed) data, where 77% of the samples fall in one class and the rest in others.


epicure24/Classifier-for-highly-unbalanced-data


The notebook might not open because of its heavy visuals; you can also view it on Kaggle: https://www.kaggle.com/shweta2407/oversampling-vs-undersampling-techniques

How to classify highly unbalanced or skewed data?

An unbalanced (or skewed) dataset is one in which most of the samples fall in one class and only a few in the others.
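Before resampling, it helps to measure how skewed the labels are. The snippet below is a minimal sketch using only the standard library; the `class_distribution` helper and the toy labels (77 samples in one class, 23 split across two others) are illustrative assumptions, not data from the notebook.

```python
from collections import Counter

def class_distribution(labels):
    """Return {label: fraction of samples} for a sequence of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Hypothetical labels: 77 majority-class samples, 23 spread over two other classes.
labels = [0] * 77 + [1] * 15 + [2] * 8
dist = class_distribution(labels)
majority_label = max(dist, key=dist.get)
print(majority_label, dist[majority_label])  # 0 0.77
```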

unbalanced-data-graph

To classify this type of data, we need to first balance the data.

How to balance unbalanced data?

Apply resampling techniques to balance the data. There are two kinds: OVERSAMPLING, which adds samples to the minority class, and UNDERSAMPLING, which removes samples from the majority class.

balanced-data-graph

OVERSAMPLING Techniques

SMOTE - Synthetic Minority Oversampling Technique
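The core idea of SMOTE is to create new minority samples by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The following is a simplified pure-Python sketch of that idea, not the notebook's code; the `smote` function, its parameters, and the toy points are assumptions made for illustration. In practice a library such as imbalanced-learn would normally be used instead.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (needs >= 2 samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical 2-D minority samples
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points))  # 6
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies.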

UNDERSAMPLING Techniques

NearMiss Version 1, 2, 3

Tomek Links

Condensed Nearest Neighbor

Edited Nearest Neighbor
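Three of the undersampling techniques listed above can be sketched in a few lines each. This is a simplified pure-Python illustration under stated assumptions: the helper names, the Euclidean distance, the choice of `k`, and the toy dataset are all hypothetical, and library implementations (for example in imbalanced-learn) handle ties, multiple classes, and performance more carefully.

```python
import math
from collections import Counter

def nearest(i, X, k, candidates=None):
    """Indices of the k points nearest to X[i], optionally limited to candidates."""
    pool = range(len(X)) if candidates is None else candidates
    others = [j for j in pool if j != i]
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]

def near_miss_1(X, y, majority, n_keep, k=3):
    """NearMiss-1: keep the n_keep majority samples whose mean distance to
    their k closest minority samples is smallest."""
    minority_idx = [i for i, label in enumerate(y) if label != majority]
    majority_idx = [i for i, label in enumerate(y) if label == majority]

    def mean_dist(i):
        closest = nearest(i, X, k, minority_idx)
        return sum(math.dist(X[i], X[j]) for j in closest) / len(closest)

    kept = sorted(majority_idx, key=mean_dist)[:n_keep]
    return sorted(kept + minority_idx)

def tomek_links(X, y):
    """Pairs (i, j) of opposite-class samples that are each other's nearest neighbour."""
    nn = [nearest(i, X, 1)[0] for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]

def edited_nearest_neighbours(X, y, majority, k=3):
    """ENN: drop majority samples whose label disagrees with the most common
    label among their k nearest neighbours; return the kept indices."""
    keep = []
    for i, label in enumerate(y):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if label != majority or votes.most_common(1)[0][0] == label:
            keep.append(i)
    return keep

# Hypothetical data: four majority points near the origin, one noisy majority
# point (index 4) inside the minority cluster, and three minority points.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5.2, 5.2), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 0, 0, 1, 1, 1]

links = tomek_links(X, y)
enn_kept = edited_nearest_neighbours(X, y, majority=0)
nm_kept = near_miss_1(X, y, majority=0, n_keep=3)
print(links)     # [(4, 5)]
print(enn_kept)  # [0, 1, 2, 3, 5, 6, 7]
print(len(nm_kept))
```

Tomek Links and ENN both flag the noisy majority point at index 4, while NearMiss-1 deliberately keeps the majority points closest to the minority class, so it retains that point.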

Combination of Undersampling Techniques

One-Sided Selection (Tomek Links and Condensed Nearest Neighbor (CNN))

Neighborhood Cleaning Rule (Condensed Nearest Neighbor and Edited Nearest Neighbors)
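As an illustration of how two techniques can be chained, here is a rough sketch of One-Sided Selection: a single Condensed Nearest Neighbor pass discards redundant majority samples, then the majority member of each Tomek link is removed. All function names and the toy data are hypothetical, and seeding the store with the first majority sample is a simplifying assumption (implementations commonly pick the seed at random).

```python
import math

def nearest_in(point, X, pool):
    """Index in pool of the sample closest to point."""
    return min(pool, key=lambda j: math.dist(point, X[j]))

def condensed_nn(X, y, majority):
    """CNN, single pass: start from all minority samples plus one majority
    sample, then add each majority sample the store misclassifies with 1-NN."""
    store = [i for i, label in enumerate(y) if label != majority]
    rest = [i for i, label in enumerate(y) if label == majority]
    store.append(rest.pop(0))  # deterministic seed: an assumption for this sketch
    for i in rest:
        if y[nearest_in(X[i], X, store)] != y[i]:
            store.append(i)
    return sorted(store)

def remove_tomek_majority(X, y, idx, majority):
    """Drop the majority member of every Tomek link among the samples in idx."""
    def nn(i):
        return nearest_in(X[i], X, [j for j in idx if j != i])

    drop = set()
    for i in idx:
        j = nn(i)
        if y[i] != y[j] and nn(j) == i:  # mutual nearest neighbours, opposite classes
            drop.update(m for m in (i, j) if y[m] == majority)
    return [i for i in idx if i not in drop]

def one_sided_selection(X, y, majority):
    """CNN pass followed by Tomek-link removal; returns the kept indices."""
    condensed = condensed_nn(X, y, majority)
    return remove_tomek_majority(X, y, condensed, majority)

# Hypothetical data: index 4 is a majority point sitting inside the minority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5.2, 5.2), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 0, 0, 1, 1, 1]

condensed = condensed_nn(X, y, majority=0)
selected = one_sided_selection(X, y, majority=0)
print(condensed)  # [0, 4, 5, 6, 7]
print(selected)   # [0, 5, 6, 7]
```

On this toy data the CNN pass drops the redundant majority points near the origin, and the Tomek step then removes the noisy majority point at index 4.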
