## SMOTE 

SMOTE (Synthetic Minority Over-sampling Technique) is a widely used technique for addressing class imbalance in classification datasets. Rather than duplicating existing minority-class instances, it balances the dataset by generating new, synthetic minority-class samples.

**How SMOTE works:**

* Select a minority-class instance.
* Find its k nearest neighbors among the minority-class samples in feature space.
* Pick one of those neighbors at random and generate a synthetic sample at a random point on the line segment between the instance and that neighbor.

Repeating these steps yields a more balanced training set, which can improve a classifier's performance on the minority class.
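The steps above can be sketched in plain NumPy. This is a minimal illustration of the interpolation idea, not the implementation used by imbalanced-learn; the function name `smote_sketch` and its parameters are hypothetical, and it assumes all features are numeric and that at least two minority samples are available.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic points from minority samples X_min (illustrative only)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)

    # Pairwise Euclidean distances between minority samples
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)               # a point is not its own neighbor
    k = min(k, n - 1)
    neighbors = np.argsort(dists, axis=1)[:, :k]  # indices of the k nearest neighbors

    base = rng.integers(0, n, size=n_new)                   # select minority instances
    nbr = neighbors[base, rng.integers(0, k, size=n_new)]   # one random neighbor for each
    gap = rng.random((n_new, 1))                            # random position in [0, 1)

    # Interpolate along the segment from each instance to its chosen neighbor
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

# Hypothetical toy data: four minority points on the corners of the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_min, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region already spanned by the minority class.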

In [5]:
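# The `imblearn` module is provided by the imbalanced-learn package;
# the PyPI package named `imblearn` is only an alias for it, so it is not installed separately.
!pip install scikit-learn
!pip install imbalanced-learn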


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [7]:
import imblearn


In [8]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from collections import Counter
from sklearn.model_selection import train_test_split

# Generate an imbalanced classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2,
                            n_clusters_per_class=1, weights=[0.9, 0.1], flip_y=0, random_state=42)

# Check the original class distribution
print("Original class distribution:", Counter(y))

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Apply SMOTE to the training set only, so the test set keeps its original distribution
# 'auto' oversamples every class except the majority until it matches the majority count
smote = SMOTE(sampling_strategy='auto', random_state=42)
X_res, y_res = smote.fit_resample(X_train, y_train)

# Check the new class distribution after applying SMOTE
print("Resampled class distribution:", Counter(y_res))

# You can now proceed with training a classifier using the resampled data


Original class distribution: Counter({0: 900, 1: 100})
Resampled class distribution: Counter({0: 630, 1: 630})
