AROSS: Area-based Representative points OverSampling with Shifting in Imbalance Learning

Area-based Representative Points Oversampling with Shifting (AROSS) is an algorithm for the class imbalance problem. It balances a dataset by generating synthetic minority-class instances in the safe and half-safe areas populated around representative points. This makes it effective at capturing disjoint subsets of the minority class while avoiding introducing class overlap into the dataset.

Cite AROSS

If you wish to cite our work, please use the following BibTeX citation:

Coming soon.

Installation

AROSS is implemented in Python 3.9 with the following dependencies:

scikit-learn (1.1.2)
pandas (1.4.2)
numpy (1.21.5)
pyclustering (0.10.1.2)
kneed (0.8.1)
scipy (1.8.1)
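Because the versions above are pinned, it can help to check what is actually installed before running AROSS. The snippet below is a minimal sketch using only the standard library (`importlib.metadata`); the helper name `check_versions` is hypothetical, and it assumes the pip distribution names match the names listed above (e.g. `scikit-learn`, not `sklearn`). The repository also ships a `requirements.txt`, which is presumably the simpler way to install these versions.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README; pip distribution names are assumed
# to match these keys (e.g. "scikit-learn" rather than "sklearn").
EXPECTED = {
    "scikit-learn": "1.1.2",
    "pandas": "1.4.2",
    "numpy": "1.21.5",
    "pyclustering": "0.10.1.2",
    "kneed": "0.8.1",
    "scipy": "1.8.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None  # package is not installed at all
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in check_versions().items():
        print(f"{name}: expected {want}, found {have or 'not installed'}")
```

An empty result means every listed dependency is installed at the pinned version.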

Basic usage

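```python
from AROSS import AROSS
from utils.utils import read_data
from utils.visualize import show_oversampled

# Load features X and labels y from one of the bundled sample datasets
X, y = read_data('Datasets/sampledata_new_3.csv')

# Agglomerative clustering with 5 clusters and Ward linkage;
# omit n_cluster / linkage to have them chosen automatically (BIC / CPCC)
ar = AROSS(n_cluster=5, linkage='ward')
X_oversampled, y_oversampled = ar.fit_sample(X, y)

# Plot the original and the oversampled data
show_oversampled(X, y, X_oversampled, y_oversampled)
```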
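To try AROSS without the bundled CSV, a small synthetic imbalanced dataset is enough. The sketch below uses NumPy only; `make_toy_imbalanced` and `class_counts` are hypothetical helpers, not part of the repository, and the label convention (minority = 1, majority = 0) is an assumption.

```python
import numpy as np


def make_toy_imbalanced(n_major=300, n_minor=30, seed=0):
    """Hypothetical helper: two 2-D Gaussian blobs with a 10:1 class ratio.
    Majority label is 0, minority label is 1."""
    rng = np.random.default_rng(seed)
    X_major = rng.normal(loc=0.0, scale=1.0, size=(n_major, 2))
    X_minor = rng.normal(loc=2.5, scale=0.5, size=(n_minor, 2))
    X = np.vstack([X_major, X_minor])
    y = np.concatenate([np.zeros(n_major, dtype=int), np.ones(n_minor, dtype=int)])
    return X, y


def class_counts(y):
    """Map each class label to its number of instances."""
    labels, counts = np.unique(y, return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))


X, y = make_toy_imbalanced()
print(class_counts(y))  # {0: 300, 1: 30}
```

After `ar.fit_sample(X, y)`, calling `class_counts(y_oversampled)` is a quick way to see how far the classes were rebalanced. Whether `fit_sample` accepts plain NumPy arrays (rather than whatever `read_data` returns) is not stated here, so a conversion may be needed.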

Output figure:

About AROSS

AROSS consists of four steps:

1. Clustering the input features using agglomerative clustering [1]
   - When n_cluster is not given, the algorithm determines it automatically using BIC [2]
   - When linkage is not given, the algorithm determines it automatically using CPCC [3]
2. Extracting the representative points from the clustering results [4]
3. Populating and classifying the areas surrounding the representative points
4. Generating synthetic instances using the Gaussian generator
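The four steps can be illustrated end to end with the following minimal, hypothetical sketch; it is not the AROSS implementation. It uses SciPy's hierarchical clustering in place of scikit-learn's, assumes a binary problem, clusters only the minority instances, picks the linkage by CPCC (BIC-based selection of n_cluster is not shown, so n_cluster is passed in), and extracts CURE-style well-scattered representative points. Step 3 uses a made-up rule: an area is "safe" if all k nearest neighbours of a representative point are minority instances and "half-safe" if at least half are. The isotropic `scale` of the Gaussian generator and every function name are likewise assumptions.

```python
# Hypothetical sketch of the four AROSS steps; NOT the AROSS implementation.
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import cdist, pdist


def pick_linkage(X, methods=("single", "complete", "average", "ward")):
    """Step 1 helper: pick the linkage whose dendrogram has the highest
    cophenetic correlation coefficient (CPCC) with the raw distances."""
    d = pdist(X)
    scores = {m: float(cophenet(linkage(X, method=m), d)[0]) for m in methods}
    return max(scores, key=scores.get), scores


def cluster_minority(X_min, n_cluster, method):
    """Step 1: agglomerative clustering cut into at most n_cluster flat clusters."""
    return fcluster(linkage(X_min, method=method), t=n_cluster, criterion="maxclust")


def representative_points(Xc, n_reps):
    """Step 2 (CURE-style): greedily pick well-scattered points, starting
    from the point farthest from the cluster centroid."""
    centroid = Xc.mean(axis=0)
    reps = [Xc[np.argmax(np.linalg.norm(Xc - centroid, axis=1))]]
    while len(reps) < min(n_reps, len(Xc)):
        # distance of every point to its nearest already-chosen rep
        d = cdist(Xc, np.array(reps)).min(axis=1)
        reps.append(Xc[np.argmax(d)])
    return np.array(reps)


def area_type(rep, X, y, minority, k=5):
    """Step 3 (hypothetical rule): classify the area around rep by the share of
    minority instances among its k nearest neighbours in the full dataset
    (rep is itself a minority instance, so it counts as one of them)."""
    nn = np.argsort(np.linalg.norm(X - rep, axis=1))[:k]
    share = np.mean(y[nn] == minority)
    if share == 1.0:
        return "safe"
    return "half-safe" if share >= 0.5 else "unsafe"


def gaussian_generate(rep, n_new, scale, rng):
    """Step 4: draw n_new points from an isotropic Gaussian centred on rep."""
    return rng.normal(loc=rep, scale=scale, size=(n_new, rep.shape[0]))


def aross_sketch(X, y, minority=1, n_cluster=2, n_reps=3, scale=0.1, seed=0):
    """Oversample the minority class of a binary problem up to the majority size."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    method, _ = pick_linkage(X_min)
    labels = cluster_minority(X_min, n_cluster, method)
    reps = np.vstack([representative_points(X_min[labels == c], n_reps)
                      for c in np.unique(labels)])
    # only safe and half-safe areas receive synthetic instances
    usable = [r for r in reps if area_type(r, X, y, minority) != "unsafe"]
    n_needed = int(np.sum(y != minority)) - len(X_min)
    if not usable or n_needed <= 0:
        return X, y
    # split the required count as evenly as possible across usable reps
    chunks = np.array_split(np.arange(n_needed), len(usable))
    X_new = np.vstack([gaussian_generate(r, len(c), scale, rng)
                       for r, c in zip(usable, chunks)])
    y_new = np.full(len(X_new), minority, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])
```

For example, `X_res, y_res = aross_sketch(X, y, minority=1)` returns the original data plus synthetic minority instances. Skipping "unsafe" areas is how the sketch mirrors the stated goal of not introducing class overlap.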

AROSS - shifting

Shifting is an additional operation performed after the representative points (reps) are extracted (step 2) whenever a nonzero alpha is given: the reps are shifted toward the centroid of their cluster. The greater alpha is, the further the reps are shifted toward the centroid.
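The exact shifting rule is not spelled out above. A natural reading, assumed here, is the CURE-style shrinking of [4], where each rep moves a fraction alpha of the way toward the centroid: `rep + alpha * (centroid - rep)`. The function name `shift_reps` and the restriction of alpha to [0, 1] are assumptions for this sketch.

```python
import numpy as np


def shift_reps(reps, centroid, alpha):
    """Move each representative point a fraction alpha of the way toward the
    cluster centroid (CURE-style shrinking, assumed); alpha = 0 changes nothing."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    reps = np.asarray(reps, dtype=float)
    return reps + alpha * (np.asarray(centroid, dtype=float) - reps)


reps = np.array([[0.0, 0.0], [4.0, 0.0]])
centroid = np.array([2.0, 2.0])
print(shift_reps(reps, centroid, 0.5))  # each rep moves halfway: [[1, 1], [3, 1]]
```

With alpha = 1 every rep collapses onto the centroid, so intermediate values trade off how well the reps trace the cluster's shape against how far they stay from the cluster boundary.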

Reference

[1] Fabian Pedregosa et al. “Scikit-learn: Machine learning in Python”. In: Journal of Machine Learning Research 12 (2011), pp. 2825–2830.

[2] Gideon Schwarz. “Estimating the dimension of a model”. In: The Annals of Statistics (1978), pp. 461–464.

[3] James S. Farris. “On the cophenetic correlation coefficient”. In: Systematic Zoology 18.3 (1969), pp. 279–285.

[4] Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim. “CURE: An efficient clustering algorithm for large databases”. In: ACM SIGMOD Record 27.2 (1998), pp. 73–84.
