# SMOTE-N Analysis

<blockquote>

- First, we will calculate the distance between category values, and between observations, using the Value Difference Metric (VDM).

- Second, we will apply SMOTE-N using imbalanced-learn.</blockquote>

## Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import make_blobs
from sklearn.preprocessing import OrdinalEncoder

from imblearn.metrics.pairwise import ValueDifferenceMetric
from imblearn.over_sampling import SMOTEN

---

## Distance between values

In [2]:
# Create a toy dataset with a single categorical feature (colour)
# and a binary target

X = np.array(["green"] * 10 + ["red"] * 10 + ["blue"] * 10).reshape(-1, 1)
y = [1] * 8 + [0] * 5 + [1] * 7 + [0] * 9 + [1]

# The ValueDifferenceMetric class works only with
# encoded variables, so we first need to transform
# the strings into integers

encoder = OrdinalEncoder(dtype=np.int32)
X_enc = encoder.fit_transform(X)

# Now we can fit the metric to learn the distances.
# I set r=1 so that the results match the ones shown
# earlier in the slides, for comparison

vdm = ValueDifferenceMetric(r=1).fit(X_enc, y)

# proba_per_class_ holds one array per feature: each row is a
# category value and each column is the probability of a class
# given that value (so every row sums to 1)

vdm.proba_per_class_

[array([[0.9, 0.1],
        [0.2, 0.8],
        [0.3, 0.7]])]
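
To see where these numbers come from, the table can be rebuilt by simple counting. The following is a minimal sketch using only NumPy, under the assumption (consistent with the output above) that each row holds the class proportions among the samples that share a category value. The variable names `categories`, `classes` and `proba` are illustrative and are not part of imbalanced-learn.

```python
import numpy as np

# Same toy data as above, kept as a flat array of strings
X = np.array(["green"] * 10 + ["red"] * 10 + ["blue"] * 10)
y = np.array([1] * 8 + [0] * 5 + [1] * 7 + [0] * 9 + [1])

categories = np.unique(X)  # sorted alphabetically: blue, green, red
classes = np.unique(y)     # 0, 1

# For each category value, the fraction of its samples in each class
proba = np.array([
    [np.mean(y[X == cat] == cls) for cls in classes]
    for cat in categories
])

print(proba)
```

For example, 9 of the 10 blue samples belong to class 0, which gives the first row `[0.9, 0.1]`.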

In [3]:
# The category values, in the same order as the rows above,
# are stored in the categories_ attribute of the encoder

encoder.categories_

[array(['blue', 'green', 'red'], dtype='<U5')]
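
With the row order known (blue, green, red), the distance between any two colours can be worked out by hand from the probability table. The sketch below assumes the definition given in the imbalanced-learn documentation: for one feature, the distance between two values is the sum over classes of the absolute difference in class probabilities raised to `k`, and that sum is then raised to `r`. Here `k = 1` is assumed to be the library default and `r = 1` matches the setting used above. The table is typed in from the output rather than read from the fitted object, so the block depends only on NumPy.

```python
import numpy as np

categories = ["blue", "green", "red"]

# Rows: category values; columns: P(class 0 | value), P(class 1 | value)
proba = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.3, 0.7]])

k, r = 1, 1  # assumed exponents: k=1 (default), r=1 (as set above)

# Pairwise |p(c|a) - p(c|b)|^k summed over classes, then raised to r
delta = (np.abs(proba[:, None, :] - proba[None, :, :]) ** k).sum(axis=-1)
dist = delta ** r

for i, a in enumerate(categories):
    for j in range(i + 1, len(categories)):
        print(f"{a} vs {categories[j]}: {dist[i, j]:.1f}")
# blue vs green: 1.4
# blue vs red: 1.2
# green vs red: 0.2
```

Green and red have similar class distributions, so they end up close to each other, while blue, which is dominated by class 0, is far from both.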