# Naive Bayes algorithm

The naive Bayes assumption is that the $d$ features are conditionally independent given the class:
$$ p(x_1, ... , x_d | y) = \prod^d_{j=1} p(x_j|y) $$

so the joint likelihood of the $n$ training examples can be written as:

$$ L(\phi_y, \phi_{j|y}) = \prod^n_{i=1}p(x^{(i)}, y^{(i)}) $$

and the parameters that maximize this likelihood are given below, where $k$ is one of the $K$ classes and $p$ is one of the possible values of $x_j$.

Laplace smoothing is also applied: 1 is added to the numerator and $K$ (the number of classes) to the denominator of $p(x_j = p | y = k)$.


$$ p(y = k ) = \frac{\sum^n_{i=1} 1 \{ y^{(i)} = k \} }{n} $$


$$ p(x_j = p | y = k) = \frac{\sum^n_{i=1} 1 \{ y^{(i)} = k \wedge x_j^{(i)}=p \} + 1}{\sum^n_{i=1} 1 \{ y^{(i)} = k \} + K}$$
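The two estimates can be checked by hand on a tiny dataset. The following is a minimal sketch, not the notebook's implementation: the helper names `class_prior` and `smoothed_conditional` and the toy arrays are made up for illustration, and the smoothing adds the number of classes to the denominator, as in the text.

```python
import numpy as np


def class_prior(y, c):
    """Fraction of training examples whose label is c."""
    return np.mean(y == c)


def smoothed_conditional(x, y, j, value, c, num_classes):
    """Smoothed estimate of p(x_j = value | y = c)."""
    in_class = y == c
    # Count examples of class c whose j-th feature equals `value`.
    matches = np.count_nonzero(x[in_class, j] == value)
    return (matches + 1) / (np.count_nonzero(in_class) + num_classes)


# Hypothetical toy data: 4 examples, 2 discrete features, 2 classes.
x_toy = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
y_toy = np.array([0, 0, 1, 1])

prior_0 = class_prior(y_toy, 0)                             # 2 / 4
seen = smoothed_conditional(x_toy, y_toy, 0, 0, 0, 2)       # (2 + 1) / (2 + 2)
unseen = smoothed_conditional(x_toy, y_toy, 0, 1, 0, 2)     # (0 + 1) / (2 + 2)
print(prior_0, seen, unseen)
```

The value never observed together with class 0 still gets a non-zero probability, which is the point of the smoothing: a single unseen feature value no longer forces the whole product to zero.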

so, by Bayes' rule,
$$ p(y=k|x) = \frac{p(x|y=k)p(y=k)}{p(x)} $$
$p(x)$ does not depend on $k$, so when predicting it is enough to compare the numerators across classes.
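Since only the ordering of the numerators matters, they can also be compared in log space, which avoids numerical underflow when many small probabilities are multiplied. This is an optional variation, not something the notebook does; the function name `predict_class` and the example probabilities below are assumptions chosen for illustration.

```python
import numpy as np


def predict_class(log_prior, log_likelihoods):
    """Pick the class with the largest log-numerator.

    log_prior:       shape (k,), log p(y = c) for each class c
    log_likelihoods: shape (k, d), log p(x_j | y = c) for the observed x
    """
    # log of p(y = c) * prod_j p(x_j | y = c), one score per class
    scores = log_prior + log_likelihoods.sum(axis=1)
    return int(np.argmax(scores)), scores


# Hypothetical probabilities for 2 classes and 2 features.
prior = np.array([0.5, 0.5])
likelihoods = np.array([[0.75, 0.5],
                        [0.25, 0.5]])

best, scores = predict_class(np.log(prior), np.log(likelihoods))
print(best, np.exp(scores))
```

Because the logarithm is monotonic, the class chosen this way is the same one that maximizes the plain product.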

In [None]:
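import numpy as np


def predict(x_p, args, labels):
    """Return p(x_p | y=k) * p(y=k) (the unnormalized posterior) for each class k."""
    n, classes_cnt = labels.shape
    preds = np.empty(classes_cnt)
    for i in range(classes_cnt):
        in_class = labels[:, i] == 1
        cnt_y_i = np.count_nonzero(in_class)
        # Select the rows of class i directly. Zeroing out the other rows
        # would make them match any feature whose discretized value is 0.
        matches = np.count_nonzero(args[in_class] == x_p, axis=0)
        # Laplace-smoothed estimate of p(x_j | y = i) for every feature j.
        x_given_y_k = (matches + 1) / (cnt_y_i + classes_cnt)
        preds[i] = np.prod(x_given_y_k) * (cnt_y_i / n)
    return preds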


For presentation purposes, an additional prediction function for the $k=2$ case is added. It returns the label (0 or 1) of the more likely class.

In [None]:
def pred_0_1(x_p, args, labels):
    a = predict(x_p, args, labels)
    if a[0] < a[1]:
        return 1
    else:
        return 0

Since the example data contains continuous values, discretization is needed.
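As a minimal sketch of what this discretization does, floor division by a bin width maps each continuous value to an integer bin index. The helper name `discretize`, its `bin_width` parameter, and the sample values are hypothetical and only illustrate the idea.

```python
import numpy as np


def discretize(x, bin_width=1.0):
    """Map continuous values to integer bin indices via floor division."""
    return np.floor_divide(x, bin_width).astype(int)


# Hypothetical continuous features for two points.
x_cont = np.array([[0.2, 1.7],
                   [2.9, -0.4]])

unit_bins = discretize(x_cont)        # bins of width 1
half_bins = discretize(x_cont, 0.5)   # finer bins of width 0.5
print(unit_bins)
print(half_bins)
```

A smaller bin width keeps more detail but leaves fewer training points per bin, so more of the estimates end up relying on the smoothing term.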

In [None]:
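import matplotlib.pyplot as plt


def train(x, y, k):
    # Bin width used to map continuous features onto integer bins.
    discretization_constant = 1
    n = len(y)
    # One-hot encode the labels: labels[i, c] == 1 iff y[i] == c.
    labels = np.zeros((n, k))
    for i in range(n):
        labels[i, y[i]] = 1
    # Discretize every feature by floor division.
    args = x // discretization_constant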

Visualisation of the resulting split in the data:

(Correctly classified points are colored red or blue depending on class, and misclassified points are green.)

In [None]:
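    # Classify every training point and count the misclassified ones.
    err = 0
    for i in range(n):
        x_i = x[i, :].reshape(1, x.shape[1]) // discretization_constant
        a = pred_0_1(x_i, args, labels)
        if y[i] != a:
            color = '#01ff01'  # green: misclassified
            err += 1
        elif a == 1:
            color = '#ff2200'  # red: correctly classified as class 1
        else:
            color = '#1f77b4'  # blue: correctly classified as class 0
        plt.scatter(x[i, 0], x[i, 1], c=color)

    plt.show()
    print(err)  # number of misclassified training points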

!["plot"](plot.png)