## [Oversampling and Undersampling](https://towardsdatascience.com/oversampling-and-undersampling-explained-a-visual-guide-with-mini-2d-dataset-1155577d3091/)

> Artificially generating and deleting data for the greater good

#### Oversampling
Oversampling makes a dataset more balanced when one group has far fewer examples than the other. It works by adding examples to the smaller group, either by copying existing ones (random oversampling) or by generating synthetic ones (SMOTE, ADASYN). This helps the dataset represent both groups more equally.
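
As a rough illustration of the simplest variant, the sketch below duplicates randomly chosen rows of each smaller class until every class matches the largest one. It uses only the standard library and the toy dataset defined later in this notebook; the helper name `random_oversample` is made up for this example and is not part of `imbalanced-learn`.

```python
import random
from collections import Counter

# Toy data from this notebook: (Temperature, Humidity) -> Activity
X = [(1, 0), (0, 2), (1, 1), (3, 1), (2, 3), (3, 2), (1, 3), (3, 4), (4, 4)]
y = ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']

def random_oversample(X, y, seed=0):
    """Hypothetical helper: copy random rows of each smaller class
    until every class is as large as the biggest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        extra = rng.choices(rows, k=target - count)  # sample with replacement
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out

X_over, y_over = random_oversample(X, y)
print(Counter(y_over))
```

Every added row is an exact copy of an existing one, which is why random oversampling adds no new information to the smaller classes.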

#### Undersampling
Undersampling, by contrast, deletes examples from the bigger group until it is close in size to the smaller group. The resulting dataset is smaller, but both groups end up with a more similar number of examples.
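
A minimal sketch of random undersampling, under the same assumptions as before (standard library only, toy dataset from this notebook, hypothetical helper name `random_undersample`): each class is cut down to the size of the smallest class by keeping a random subset of its rows.

```python
import random
from collections import Counter

# Toy data from this notebook: (Temperature, Humidity) -> Activity
X = [(1, 0), (0, 2), (1, 1), (3, 1), (2, 3), (3, 2), (1, 3), (3, 4), (4, 4)]
y = ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']

def random_undersample(X, y, seed=0):
    """Hypothetical helper: keep a random subset of each class so that
    every class is as small as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))  # sample without replacement
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]

X_under, y_under = random_undersample(X, y)
print(Counter(y_under))
```

The rows that are dropped are chosen blindly, so useful examples from the bigger classes can be lost.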

#### Hybrid Sampling
Combining oversampling and undersampling is often called "hybrid sampling". It enlarges the smaller group by adding examples and shrinks the bigger group by removing some of its examples. The aim is a more balanced dataset that is neither too big nor too small.
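
One simple way to picture this is to pick a target size somewhere between the smallest and largest class, then copy rows in classes below it and drop rows in classes above it. The sketch below does exactly that with a hypothetical helper `hybrid_resample`; it is only an illustration of the idea. The `SMOTETomek` and `SMOTEENN` samplers used later in this notebook work differently: they oversample first and then clean up the result.

```python
import random
from collections import Counter

# Toy data from this notebook: (Temperature, Humidity) -> Activity
X = [(1, 0), (0, 2), (1, 1), (3, 1), (2, 3), (3, 2), (1, 3), (3, 4), (4, 4)]
y = ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']

def hybrid_resample(X, y, target, seed=0):
    """Hypothetical helper: bring every class to `target` rows by
    dropping rows from larger classes and copying rows in smaller ones."""
    rng = random.Random(seed)
    X_out, y_out = [], []
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        if len(rows) > target:
            rows = rng.sample(rows, target)  # undersample
        else:
            rows = rows + rng.choices(rows, k=target - len(rows))  # oversample
        X_out.extend(rows)
        y_out.extend([label] * len(rows))
    return X_out, y_out

# Assumed target: 3 rows per class, the size of the middle class (B)
X_hybrid, y_hybrid = hybrid_resample(X, y, target=3)
print(Counter(y_hybrid))
```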

In [None]:
!pip install -q pandas numpy scikit-learn matplotlib imbalanced-learn

In [None]:
import pandas as pd
import numpy as np

from imblearn.over_sampling import SMOTE, ADASYN, RandomOverSampler
from imblearn.under_sampling import TomekLinks, NearMiss, RandomUnderSampler, EditedNearestNeighbours
from imblearn.combine import SMOTETomek, SMOTEENN

import warnings
warnings.filterwarnings('ignore')

# Create a DataFrame from the dataset
data = {
    'Temperature': [1, 0, 1, 3, 2, 3, 1, 3, 4],
    'Humidity': [0, 2, 1, 1, 3, 2, 3, 4, 4],
    'Activity': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']
}
df = pd.DataFrame(data)

# Split the data into features (X) and target (y)
X, y = df[['Temperature', 'Humidity']], df['Activity'].astype('category')

df.head()

In [None]:
print("Features (X):\n", X)

print("Target (y):\n", y)
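
Before resampling, it can help to see how uneven the classes actually are. The snippet below counts the labels with the standard library; the list is copied from the `Activity` column above, and `imbalance_ratio` is just an illustrative name. With pandas, `y.value_counts()` gives the same counts.

```python
from collections import Counter

# Labels copied from the 'Activity' column of the toy dataset
activity = ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']

counts = Counter(activity)
# Ratio of the largest class to the smallest class
imbalance_ratio = max(counts.values()) / min(counts.values())

print(counts)
print(f"Imbalance ratio (largest / smallest): {imbalance_ratio:.1f}")
```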

In [None]:
# Random OverSampler for oversampling
sampler = RandomOverSampler()

# Apply the resampling method
X_resampled, y_resampled = sampler.fit_resample(X, y)

# Print the resampled dataset
print("Resampled dataset (Random OverSampler): ")
print(X_resampled)
print(y_resampled)

In [None]:
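# SMOTE for oversampling
# k_neighbors=1 because class A has only 2 samples; the default (5)
# needs more samples per class than this toy dataset has
sampler = SMOTE(k_neighbors=1)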

# Apply the resampling method
X_resampled, y_resampled = sampler.fit_resample(X, y)

# Print the resampled dataset
print("Resampled dataset (SMOTE): ")
print(X_resampled)
print(y_resampled)
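
SMOTE does not copy rows. It creates each synthetic point somewhere on the line segment between a minority sample and one of its nearest same-class neighbours. The sketch below shows that interpolation step for class A of the toy dataset. It is a simplified illustration, not the library's implementation, and `smote_sketch` is a hypothetical name.

```python
import math
import random

def smote_sketch(minority, n_new, k=1, seed=0):
    """Hypothetical helper: generate n_new synthetic points by
    interpolating between minority samples and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest same-class neighbours of the base point
        # (assumes no duplicate coordinates in `minority`)
        neighbours = sorted(
            (p for p in minority if p != base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + lam * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

class_a = [(1, 0), (0, 2)]  # the two class-A rows of the toy dataset
new_points = smote_sketch(class_a, n_new=2)
print(new_points)
```

Because class A has only two rows, every synthetic point lies on the segment between (1, 0) and (0, 2).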

In [None]:
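# ADASYN for oversampling
# n_neighbors=1 because the smallest class has only 2 samples,
# fewer than the default neighbourhood size requires
sampler = ADASYN(n_neighbors=1)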

# Apply the resampling method
X_resampled, y_resampled = sampler.fit_resample(X, y)

# Print the resampled dataset
print("Resampled dataset (ADASYN): ")
print(X_resampled)
print(y_resampled)

In [None]:
# Random UnderSampler for undersampling
sampler = RandomUnderSampler()

# Apply the resampling method
X_resampled, y_resampled = sampler.fit_resample(X, y)

# Print the resampled dataset
print("Resampled dataset (Random UnderSampler): ")
print(X_resampled)
print(y_resampled)

In [None]:
# Tomek Links for undersampling
sampler = TomekLinks()

# Apply the resampling method
X_resampled, y_resampled = sampler.fit_resample(X, y)

# Print the resampled dataset
print("Resampled dataset (TomekLinks): ")
print(X_resampled)
print(y_resampled)
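
A Tomek link is a pair of points from different classes that are each other's nearest neighbour. Such pairs sit right on a class boundary, and the undersampler removes members of these pairs (commonly the one from the larger class). The sketch below only finds the links in the toy dataset; the helper names are hypothetical and the code is a brute-force illustration rather than the library's implementation.

```python
import math

# Toy data from this notebook: (Temperature, Humidity) -> Activity
X = [(1, 0), (0, 2), (1, 1), (3, 1), (2, 3), (3, 2), (1, 3), (3, 4), (4, 4)]
y = ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']

def nearest_neighbour(i, X):
    """Index of the point closest to X[i], excluding i itself."""
    return min(
        (j for j in range(len(X)) if j != i),
        key=lambda j: math.dist(X[i], X[j]),
    )

def tomek_links(X, y):
    """Pairs (i, j) that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbour(i, X) for i in range(len(X))]
    return [
        (i, j) for i, j in enumerate(nn)
        if i < j and nn[j] == i and y[i] != y[j]
    ]

links = tomek_links(X, y)
for i, j in links:
    print(f"{X[i]} ({y[i]})  <->  {X[j]} ({y[j]})")
```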

In [None]:
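# NearMiss-1 for undersampling
# n_neighbors=2 because the minority class (A) has only 2 samples,
# fewer than the default of 3 neighbours
sampler = NearMiss(version=1, n_neighbors=2)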

# Apply the resampling method
X_resampled, y_resampled = sampler.fit_resample(X, y)

# Print the resampled dataset
print("Resampled dataset (NearMiss-1): ")
print(X_resampled)
print(y_resampled)
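
NearMiss-1 keeps the majority samples whose average distance to their k nearest minority samples is smallest, that is, the majority points sitting closest to the minority class. The sketch below applies that rule to class C (majority) against class A (minority) from the toy dataset. `nearmiss1` is a hypothetical helper written for this illustration, with k=2 assumed because class A has only two rows.

```python
import math

def nearmiss1(majority, minority, n_keep, k=2):
    """Hypothetical helper: keep the n_keep majority points with the
    smallest mean distance to their k nearest minority points."""
    def mean_dist_to_nearest(p):
        nearest = sorted(math.dist(p, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)
    return sorted(majority, key=mean_dist_to_nearest)[:n_keep]

class_a = [(1, 0), (0, 2)]                  # minority class rows
class_c = [(3, 2), (1, 3), (3, 4), (4, 4)]  # majority class rows

# Shrink class C to the size of class A
kept = nearmiss1(class_c, class_a, n_keep=len(class_a))
print(kept)
```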

In [None]:
# ENN for undersampling
sampler = EditedNearestNeighbours()

# Apply the resampling method
X_resampled, y_resampled = sampler.fit_resample(X, y)

# Print the resampled dataset
print("Resampled dataset (ENN): ")
print(X_resampled)
print(y_resampled)

In [None]:
# SMOTETomek for a combination of oversampling & undersampling
# The inner SMOTE uses k_neighbors=1 because class A has only 2 samples
sampler = SMOTETomek(smote=SMOTE(k_neighbors=1))

# Apply the resampling method
X_resampled, y_resampled = sampler.fit_resample(X, y)

# Print the resampled dataset
print("Resampled dataset (SMOTETomek): ")
print(X_resampled)
print(y_resampled)

In [None]:
# SMOTEENN for a combination of oversampling & undersampling
# The inner SMOTE uses k_neighbors=1 because class A has only 2 samples
sampler = SMOTEENN(smote=SMOTE(k_neighbors=1))

# Apply the resampling method
X_resampled, y_resampled = sampler.fit_resample(X, y)

# Print the resampled dataset
print("Resampled dataset (SMOTEENN): ")
print(X_resampled)
print(y_resampled)