## Description

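Naive Bayes is a classification algorithm that uses Bayes' theorem to make predictions from the conditional probability of an event A, given that another event B has occurred. It is called *naive* because it assumes that the features describing a sample are conditionally independent of each other given the class.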

#### Bayes Theorem

$$P(A|B) = \frac{P(B|A) P(A)}{P(B)}$$

where:

- $ P(A|B) $ - conditional probability of A given B
- $ P(B|A) $ - conditional probability of B given A
- $ P(A) $ - probability of A
- $ P(B) $ - probability of B
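As a quick illustration, the theorem translates directly into a small Python function. This is a minimal sketch; the name `bayes` is our own and not part of any library. `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) from P(B|A), P(A) and P(B) using Bayes' theorem."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b


# Example with P(B|A) = 1/2, P(A) = 1/3, P(B) = 1/4
print(bayes(Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)))  # 2/3
```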

---
## Example #1 - Coin Flip

#### Problem
Given two coins, what is the probability that the second coin lands heads, given that the first coin landed tails?

#### Let's write down the probabilities

Each coin has two possible states: heads and tails. Let:

- $P(A)$ - probability of the first coin landing heads
- $P(B)$ - probability of the second coin landing heads

These equal:

$P(A) = \frac{1}{2}$ (one favourable state, heads, out of two possible states)

$P(B) = \frac{1}{2}$ (one favourable state, heads, out of two possible states)

Similarly, let:

- $P(A')$ - probability of the first coin landing tails
- $P(B')$ - probability of the second coin landing tails

Since both coins are fair:

$P(A') = P(A) = \frac{1}{2}$

$P(B') = P(B) = \frac{1}{2}$

The two coins are independent of each other, so the result of the first flip does not change the probability of the second coin landing heads (and vice versa). Using the notation above:

- $P(A'|B)$ - probability of the first coin landing tails, given that the second coin landed heads
- $P(B|A')$ - probability of the second coin landing heads, given that the first coin landed tails

$P(A'|B) = \frac{1}{2}$

$P(B|A') = \frac{1}{2}$

Now let's check whether Bayes' theorem gives the same result:

$P(B|A') = \frac{P(A'|B) P(B)}{P(A')} = \frac{\frac{1}{2} \frac{1}{2}}{\frac{1}{2}} = \frac{1}{2}$
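The same result can be checked by brute force: list every equally likely outcome of two coin flips and count. This is a minimal sketch using only the standard library; the helper `prob` is our own and simply counts outcomes.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of flipping two coins: (first coin, second coin)
OUTCOMES = list(product("HT", repeat=2))


def prob(event, given=lambda outcome: True):
    """P(event | given), computed by counting equally likely outcomes."""
    pool = [o for o in OUTCOMES if given(o)]
    return Fraction(sum(1 for o in pool if event(o)), len(pool))


def first_tails(outcome):   # event A'
    return outcome[0] == "T"


def second_heads(outcome):  # event B
    return outcome[1] == "H"


direct = prob(second_heads, given=first_tails)
via_bayes = prob(first_tails, given=second_heads) * prob(second_heads) / prob(first_tails)
print(direct, via_bayes)  # 1/2 1/2
```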

---
## Example #2 - Dice

#### Problem
Given two dice, what is the probability that the sum of the eyes equals 9, given that the first die shows an even number?

#### Let's write down the probabilities

Each die has six possible states: 1, 2, 3, 4, 5, 6. All 36 combinations are listed below:

| Die 1 | Die 2 | Sum | Sum is 9 and Die 1 is even |
|------|------|------|------|
|1|1|2|No|
|2|1|3|No|
|3|1|4|No|
|4|1|5|No|
|5|1|6|No|
|6|1|7|No|
|1|2|3|No|
|2|2|4|No|
|3|2|5|No|
|4|2|6|No|
|5|2|7|No|
|6|2|8|No|
|1|3|4|No|
|2|3|5|No|
|3|3|6|No|
|4|3|7|No|
|5|3|8|No|
|6|3|9|Yes|
|1|4|5|No|
|2|4|6|No|
|3|4|7|No|
|4|4|8|No|
|5|4|9|No|
|6|4|10|No|
|1|5|6|No|
|2|5|7|No|
|3|5|8|No|
|4|5|9|Yes|
|5|5|10|No|
|6|5|11|No|
|1|6|7|No|
|2|6|8|No|
|3|6|9|No|
|4|6|10|No|
|5|6|11|No|
|6|6|12|No|

Consequently, let:

- $P(A)$ - probability of the first die showing an even number
- $P(B)$ - probability of the sum of both dice being 9
- $P(A|B)$ - probability of the first die showing an even number, given that the sum is 9
- $P(B|A)$ - probability of the sum being 9, given that the first die shows an even number

These equal:

$P(A) = \frac{3}{6} = \frac{1}{2}$ (3 of the 6 faces are even: 2, 4, 6)

$P(B) = \frac{4}{36} = \frac{1}{9}$ (4 of the 36 outcomes sum to 9: (6,3), (5,4), (4,5), (3,6))

$P(A|B) = \frac{2}{4} = \frac{1}{2}$ (of the 4 outcomes that sum to 9, only (6,3) and (4,5) have an even first die)

Note that $\frac{2}{36}$ would be the joint probability $P(A \cap B)$, not the conditional probability $P(A|B)$: conditioning on B restricts the count to the 4 outcomes where B holds.

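and finally:

$P(B|A) = \frac{P(A|B) P(B)}{P(A)} = \frac{\frac{1}{2}\cdot\frac{1}{9}}{\frac{1}{2}} = \frac{1}{9} \approx 0.111$

This agrees with a direct count: of the 18 outcomes with an even first die, 2 sum to 9, and $\frac{2}{18} = \frac{1}{9}$.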
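A brute-force count over all 36 outcomes is a useful cross-check here, since it computes each conditional probability by restricting to the outcomes where the condition holds. A minimal sketch; the helper names are our own:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes: (first die, second die)
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event, given=lambda outcome: True):
    """P(event | given), computed by counting equally likely outcomes."""
    pool = [o for o in OUTCOMES if given(o)]
    return Fraction(sum(1 for o in pool if event(o)), len(pool))


def first_even(outcome):   # event A
    return outcome[0] % 2 == 0


def sum_is_nine(outcome):  # event B
    return sum(outcome) == 9


p_a = prob(first_even)                             # 1/2
p_b = prob(sum_is_nine)                            # 1/9
p_a_given_b = prob(first_even, given=sum_is_nine)  # 1/2, over the 4 outcomes summing to 9
direct = prob(sum_is_nine, given=first_even)       # counted over the 18 even-first outcomes
via_bayes = p_a_given_b * p_b / p_a
print(direct, via_bayes)  # 1/9 1/9
```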

---
## Example #3 - Car Accident

What is the probability of a car accident, given that it is summer, it is not raining, it is night and the car is in an urban area?

#### Mock data:

| Season | Weather | Daytime | Area | Did Accident Occur? |
|------|------|------|------|------|
| Summer | No-Raining | Night | Urban | No |
| Summer | No-Raining | Day | Urban | No |
| Summer | Raining | Night | Rural | No |
| Summer | Raining | Night | Urban | Yes |
| Summer | Raining | Day | Urban | No |
| Summer | Raining | Night | Rural | No |
| Winter | Raining | Night | Urban | Yes |
| Winter | Raining | Night | Urban | Yes |
| Winter | Raining | Night | Rural | Yes |
| Winter | No-Raining | Night | Rural | No |
| Winter | No-Raining | Night | Urban | No |
| Winter | No-Raining | Day | Urban | Yes |
| Spring | No-Raining | Night | Rural | Yes |
| Spring | No-Raining | Day | Rural | Yes |
| Spring | Raining | Night | Urban | No |
| Spring | Raining | Day | Rural | No |
| Spring | No-Raining | Night | Urban | No |
| Autumn | Raining | Night | Urban | Yes |
| Autumn | Raining | Day | Rural | Yes |
| Autumn | No-Raining | Night | Urban | No |
| Autumn | No-Raining | Day | Rural | No |
| Autumn | No-Raining | Day | Urban | No |
| Autumn | Raining | Day | Rural | No |
| Autumn | Raining | Night | Rural | No |
| Autumn | No-Raining | Night | Rural | No |

To handle categorical data like this, we count how often each feature value occurs, separately for each class:

#### 0. Accident probability
$P(Accident) = \frac{9}{25} = 0.36$

$P(No-Accident) = \frac{16}{25} = 0.64$

#### 1. Season probability

Frequency table:

| Season | Accident | No Accident | Total |
|------|------|------|------|
| Spring | 2/9 | 3/16 | 5/25 |
| Summer | 1/9 | 5/16 | 6/25 |
| Autumn | 2/9 | 6/16 | 8/25 |
| Winter | 4/9 | 2/16 | 6/25 |
| |9/25|16/25| |

Probabilities based on table:
 
$P(Spring) = \frac{5}{25} = 0.20$

$P(Summer) = \frac{6}{25} = 0.24$

$P(Autumn) = \frac{8}{25} = 0.32$

$P(Winter) = \frac{6}{25} = 0.24$

$P(Spring | Accident) = \frac{2}{9} = 0.22$

$P(Summer | Accident) = \frac{1}{9} = 0.11$

$P(Autumn | Accident) = \frac{2}{9} = 0.22$

$P(Winter | Accident) = \frac{4}{9} = 0.44$

#### 2. Weather probability

Frequency table:

| Weather | Accident | No Accident | Total |
|------|------|------|------|
| Raining | 6/9 | 7/16 | 13/25 |
| No-Raining | 3/9 | 9/16 | 12/25 |
| | 9/25 | 16/25 | |

Probabilities based on table:

$P(Raining) = \frac{13}{25} = 0.52$

$P(No-Raining) = \frac{12}{25} = 0.48$

$P(Raining|Accident) = \frac{6}{9} = 0.667$

$P(No-Raining|Accident) = \frac{3}{9} = 0.333$


#### 3. Daytime probability

Frequency table:

| Daytime | Accident | No Accident | Total |
|------|------|------|------|
| Day | 3/9 | 6/16 | 9/25 |
| Night | 6/9 | 10/16 | 16/25 |
| | 9/25 | 16/25 | |

Probabilities based on table:

$P(Day) = \frac{9}{25} = 0.36$

$P(Night) = \frac{16}{25} = 0.64$

$P(Day|Accident) = \frac{3}{9} = 0.333$

$P(Night|Accident) = \frac{6}{9} = 0.667$

#### 4. Area probability

Frequency table:

| Area | Accident | No Accident | Total |
|------|------|------|------|
| Urban Area | 5/9 | 8/16 | 13/25 |
| Rural Area | 4/9 | 8/16 | 12/25 |
| | 9/25 | 16/25 | |

Probabilities based on table:

$P(Urban) = \frac{13}{25} = 0.52$

$P(Rural) = \frac{12}{25} = 0.48$

$P(Urban|Accident) = \frac{5}{9} = 0.556$

$P(Rural|Accident) = \frac{4}{9} = 0.444$
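The four frequency tables can be reproduced by counting rows of the mock data per class. This is a sketch, not the author's code. It assumes the Spring/Raining/Day row and the last three Autumn rows have Area = Rural, which is what the Area frequency table (8 of the 16 no-accident rows are rural) implies.

```python
from collections import Counter
from fractions import Fraction

FEATURES = ("Season", "Weather", "Daytime", "Area")

# (season, weather, daytime, area, accident) for the 25 rows of the mock data table.
# Assumption: rows 16, 23, 24 and 25 have Area = "Rural", as implied by the Area frequency table.
DATA = [
    ("Summer", "No-Raining", "Night", "Urban", "No"),
    ("Summer", "No-Raining", "Day", "Urban", "No"),
    ("Summer", "Raining", "Night", "Rural", "No"),
    ("Summer", "Raining", "Night", "Urban", "Yes"),
    ("Summer", "Raining", "Day", "Urban", "No"),
    ("Summer", "Raining", "Night", "Rural", "No"),
    ("Winter", "Raining", "Night", "Urban", "Yes"),
    ("Winter", "Raining", "Night", "Urban", "Yes"),
    ("Winter", "Raining", "Night", "Rural", "Yes"),
    ("Winter", "No-Raining", "Night", "Rural", "No"),
    ("Winter", "No-Raining", "Night", "Urban", "No"),
    ("Winter", "No-Raining", "Day", "Urban", "Yes"),
    ("Spring", "No-Raining", "Night", "Rural", "Yes"),
    ("Spring", "No-Raining", "Day", "Rural", "Yes"),
    ("Spring", "Raining", "Night", "Urban", "No"),
    ("Spring", "Raining", "Day", "Rural", "No"),
    ("Spring", "No-Raining", "Night", "Urban", "No"),
    ("Autumn", "Raining", "Night", "Urban", "Yes"),
    ("Autumn", "Raining", "Day", "Rural", "Yes"),
    ("Autumn", "No-Raining", "Night", "Urban", "No"),
    ("Autumn", "No-Raining", "Day", "Rural", "No"),
    ("Autumn", "No-Raining", "Day", "Urban", "No"),
    ("Autumn", "Raining", "Day", "Rural", "No"),
    ("Autumn", "Raining", "Night", "Rural", "No"),
    ("Autumn", "No-Raining", "Night", "Rural", "No"),
]


def conditional_table(feature_index):
    """Map each value of one feature to (P(value | Accident), P(value | No accident))."""
    counts = {label: Counter(row[feature_index] for row in DATA if row[-1] == label)
              for label in ("Yes", "No")}
    totals = {label: sum(counter.values()) for label, counter in counts.items()}
    values = sorted({row[feature_index] for row in DATA})
    return {value: (Fraction(counts["Yes"][value], totals["Yes"]),
                    Fraction(counts["No"][value], totals["No"]))
            for value in values}


prior_accident = Fraction(sum(row[-1] == "Yes" for row in DATA), len(DATA))
print("P(Accident) =", prior_accident)  # 9/25
for index, name in enumerate(FEATURES):
    for value, (p_accident, p_no_accident) in conditional_table(index).items():
        print(f"P({value} | Accident) = {p_accident}, P({value} | No accident) = {p_no_accident}")
```

`Fraction` reduces automatically, so for example 6/9 is printed as 2/3 and 8/16 as 1/2.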

#### Assemble:

We now calculate the probability of an accident occurring in summer, with no rain, at night, in an urban area.

Where B is:
- Season: Summer
- Weather: No-Raining
- Daytime: Night
- Area: Urban

and A is:
- Accident

Using Naive Bayes, which assumes that the features are conditionally independent given the class:

$P(A|B) = P(Accident | Season = Summer, Weather = No-Raining, Daytime = Night, Area = Urban)$

$P(A|B) = \frac{P(Summer|Accident)P(No-Raining|Accident)P(Night|Accident)P(Urban|Accident)P(Accident)}{P(Summer)P(No-Raining)P(Night)P(Urban)}$

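$P(A|B) = \frac{\frac{1}{9}\cdot\frac{3}{9}\cdot\frac{6}{9}\cdot\frac{5}{9}\cdot\frac{9}{25}}{\frac{6}{25}\cdot\frac{12}{25}\cdot\frac{16}{25}\cdot\frac{13}{25}} = \frac{0.111\cdot0.333\cdot0.667\cdot0.556\cdot0.36}{0.24\cdot0.48\cdot0.64\cdot0.52} = \frac{0.00494}{0.0383} \approx 0.13$

The denominator above approximates $P(B)$ by treating the four features as independent of each other overall, not only within each class. Computing $P(B)$ consistently with the naive assumption, as $P(B|Accident)P(Accident) + P(B|No-Accident)P(No-Accident)$, gives $P(A|B) \approx 0.12$ instead.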

*Luckily the data was made up :)*
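To make the arithmetic reproducible, the sketch below plugs the conditional frequencies from the tables into exact fractions. It computes both denominators: the product of the marginals $P(Summer)P(No-Raining)P(Night)P(Urban)$ used in the formula above, and the sum of the two class scores, which is the normalisation consistent with the naive assumption. Variable names are our own; `math.prod` needs Python 3.8+.

```python
import math
from fractions import Fraction as F

# Conditional frequencies for the query (Summer, No-Raining, Night, Urban), read from the tables above
likelihoods = {
    "Accident": [F(1, 9), F(3, 9), F(6, 9), F(5, 9)],
    "No-Accident": [F(5, 16), F(9, 16), F(10, 16), F(8, 16)],
}
priors = {"Accident": F(9, 25), "No-Accident": F(16, 25)}
# P(Summer), P(No-Raining), P(Night), P(Urban)
marginals = [F(6, 25), F(12, 25), F(16, 25), F(13, 25)]

# Unnormalised naive Bayes score per class: P(class) * product of P(feature value | class)
scores = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}

# Denominator as in the formula above: product of the marginals
with_marginals = scores["Accident"] / math.prod(marginals)
# Denominator consistent with the naive model: sum of both class scores
normalised = scores["Accident"] / sum(scores.values())

print(round(float(with_marginals), 3), round(float(normalised), 3))  # 0.129 0.123
```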

---
## Example #4 - Iris

To which type of Iris does the following sample belong?

|Sepal Length| Sepal Width | Petal Length | Petal Width | Iris Type |
|------|------|------|------|------|
|6.3|3.4|5.6|2.4|?|


Mock data:

|Sepal Length| Sepal Width | Petal Length | Petal Width | Iris Type |
|------|------|------|------|------|
|5.1|3.8|1.6|0.2|Setosa|
|4.6|3.2|1.4|0.2|Setosa|
|5.3|3.7|1.5|0.2|Setosa|
|5.0|3.3|1.4|0.2|Setosa|
|7.0|3.2|4.7|1.4|Versicolor|
|6.9|3.1|4.9|1.5|Versicolor|
|5.0|2.0|3.5|1.0|Versicolor|
|5.9|3.0|4.2|1.5|Versicolor|
|6.3|3.3|6.0|2.5|Virginica|
|4.9|2.5|4.5|1.7|Virginica|
|7.3|2.9|6.3|1.8|Virginica|
|6.7|2.5|5.8|1.8|Virginica|

In this case the features are continuous numerical values, so frequency counts cannot be used. Instead, Gaussian Naive Bayes is used: each class is represented by the **mean** and **standard deviation** of each of its features, and the **Gaussian Probability Density Function** is then used to calculate how likely the query sample's feature values are under each class. Note that the result is a probability *density* rather than a probability, so individual values can be greater than 1.

#### 1. Mean formula
$$\mu = \frac{1}{m} \sum_{i=1}^{m}x_i$$

where:

- $ \mu $ - mean
- $ m $ - number of samples
- $ i $ - sample index
- $ x $ - sample value
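The formula transcribes directly into code; for example, for the sepal lengths of the four Setosa samples in the mock data above (a minimal sketch):

```python
# Sepal lengths of the four Setosa samples in the mock data above
setosa_sepal_length = [5.1, 4.6, 5.3, 5.0]

m = len(setosa_sepal_length)         # number of samples
mu = sum(setosa_sepal_length) / m    # (1/m) * sum of x_i
print(round(mu, 3))  # 5.0
```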

#### 2. Standard Deviation Formula
$$\sigma = \sqrt{\frac{1}{m}\sum_{i=1}^{m}(x_i - \mu)^2} $$

where:

- $ \sigma $ - standard deviation
- $ \mu $ - mean
- $ m $ - number of samples
- $ i $ - sample index
- $ x $ - sample value
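Whether the sum is divided by $m$ or by $m - 1$ matters for small classes like the four-sample ones here. The sketch below computes both for the Setosa sepal lengths: dividing by $m$ reproduces the 0.255 used in the worked example below and matches NumPy's `np.std` default (`ddof=0`), while dividing by $m - 1$ matches `ddof=1`. Python's `statistics.pstdev` and `statistics.stdev` provide the same two variants.

```python
import math
import statistics

setosa_sepal_length = [5.1, 4.6, 5.3, 5.0]
m = len(setosa_sepal_length)
mu = sum(setosa_sepal_length) / m
squared_deviations = sum((x - mu) ** 2 for x in setosa_sepal_length)

population_std = math.sqrt(squared_deviations / m)    # divide by m
sample_std = math.sqrt(squared_deviations / (m - 1))  # divide by m - 1

print(round(population_std, 3), round(sample_std, 3))  # 0.255 0.294
# The standard library offers both: pstdev divides by m, stdev by m - 1
print(math.isclose(population_std, statistics.pstdev(setosa_sepal_length)),
      math.isclose(sample_std, statistics.stdev(setosa_sepal_length)))  # True True
```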

#### 3. Gaussian Probability Density Function
$$f(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

where:

- $ \sigma $ - standard deviation
- $ \mu $ - mean
- $ e $ - Euler's number (base of the natural logarithm)
- $ x $ - sample value
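A minimal sketch of the density function in plain Python, cross-checked against the standard library's `statistics.NormalDist` (Python 3.8+). It evaluates the Setosa sepal-width likelihood used in the worked example below; the result is above 1, which is fine for a density.

```python
import math
from statistics import NormalDist


def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density at x, for mean mu and standard deviation sigma > 0."""
    coefficient = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Setosa sepal width: x = 3.4, mu = 3.5, sigma = 0.255 (values from the worked example below)
density = gaussian_pdf(3.4, 3.5, 0.255)
reference = NormalDist(mu=3.5, sigma=0.255).pdf(3.4)
print(density, reference)  # both roughly 1.449, i.e. larger than 1
```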

---

### Problem Statement
Let's solve the problem with Gaussian Naive Bayes.

B is the query sample:
- Sepal Length: 6.3
- Sepal Width: 3.4
- Petal Length: 5.6
- Petal Width: 2.4

The goal is to find the largest posterior $P(A|B)$, where A is the **class** and can be one of:
- Setosa
- Versicolor
- Virginica

Under the naive independence assumption, the posterior of each class is proportional to:

$P(Class | Sepal Length=6.3, Sepal Width=3.4, Petal Length=5.6, Petal Width=2.4) \propto$ 
$P(Sepal Length=6.3 | Class)P(Sepal Width=3.4 | Class)P(Petal Length=5.6 | Class)P(Petal Width=2.4 | Class)P(Class)$

Compared with Bayes' theorem:

$P(A|B) = \frac{P(B|A) P(A)}{P(B)}$

this expression omits the scaling factor $P(B)$. Because $P(B)$ is the same for every class, it does not change which class has the largest score, so it can be omitted when only the predicted class is needed:

$P(A|B) \propto P(B|A)P(A)$

### Required Probabilities

- In order to find posterior $P(Setosa | Sepal Length=6.3, Sepal Width=3.4, Petal Length=5.6, Petal Width=2.4)$:

$P(Setosa) = \frac{4}{12} = 0.33$ (because there are 4 cases of Setosa out of 12 cases in table)

$ \mu_{setosa-sl} = 5.0 $

$ \mu_{setosa-sw} = 3.5 $

$ \mu_{setosa-pl} = 1.475 $

$ \mu_{setosa-pw} = 0.2 $

$ \sigma_{setosa-sl} = 0.255 $

$ \sigma_{setosa-sw} = 0.255 $

$ \sigma_{setosa-pl} = 0.083 $

$ \sigma_{setosa-pw} = 0.0 $

$P(Sepal Length=6.3 | Setosa) = \frac{1}{\sqrt{2\pi}\cdot0.255}e^{-\frac{(6.3 - 5.0)^2}{2\cdot0.255^2}} = 0.0000035373$

$P(Sepal Width=3.4 | Setosa) = \frac{1}{\sqrt{2\pi}\cdot0.255}e^{-\frac{(3.4 - 3.5)^2}{2\cdot0.255^2}} = 1.4489243033$

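$P(Petal Length=5.6 | Setosa) = \frac{1}{\sqrt{2\pi}\cdot0.083}e^{-\frac{(5.6 - 1.475)^2}{2\cdot0.083^2}} \approx 0.0$ (the exponent is about $-1235$, so the value underflows to zero in floating-point arithmetic)

$P(Petal Width=2.4 | Setosa) = 0.0$ (all four Setosa samples have a petal width of 0.2, so $\sigma_{setosa-pw} = 0$ and the formula would divide by zero; as $\sigma \to 0$ the density at any $x \neq \mu$ tends to 0, so 0 is used here)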
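The 0.0 for petal length is not an exact zero: the exponent is so negative that the result underflows in double-precision arithmetic. Practical implementations therefore often work with log-densities and add them instead of multiplying densities. A minimal sketch (the function name is our own). For the zero standard deviation case, scikit-learn's `GaussianNB` adds a small amount to every variance through its `var_smoothing` parameter.

```python
import math


def gaussian_log_pdf(x, mu, sigma):
    """Natural logarithm of the Gaussian density; stays finite where the density underflows."""
    return -math.log(math.sqrt(2.0 * math.pi) * sigma) - (x - mu) ** 2 / (2.0 * sigma ** 2)


# Setosa petal length: x = 5.6, mu = 1.475, sigma = 0.083 (values from the worked example above)
log_density = gaussian_log_pdf(5.6, 1.475, 0.083)
print(log_density)            # finite, roughly -1233
print(math.exp(log_density))  # 0.0, because exp(-1233) underflows in double precision
```

With log-densities, the class score becomes $\log P(Class) + \sum \log f(x_j)$, and the class with the largest sum wins.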

- In order to find posterior $P(Versicolor | Sepal Length=6.3, Sepal Width=3.4, Petal Length=5.6, Petal Width=2.4)$:

$P(Versicolor) = \frac{4}{12} = 0.33$ (because there are 4 cases of Versicolor out of 12 cases in table)

$ \mu_{versicolor-sl} = 6.2 $

$ \mu_{versicolor-sw} = 2.825 $

$ \mu_{versicolor-pl} = 4.325 $

$ \mu_{versicolor-pw} = 1.35 $

$ \sigma_{versicolor-sl} = 0.815 $

$ \sigma_{versicolor-sw} = 0.482 $

$ \sigma_{versicolor-pl} = 0.54 $

$ \sigma_{versicolor-pw} = 0.206 $

$P(Sepal Length=6.3 | Versicolor) = \frac{1}{\sqrt{2\pi}\cdot0.815}e^{-\frac{(6.3 - 6.2)^2}{2\cdot0.815^2}} = 0.4855496676$

$P(Sepal Width=3.4 | Versicolor) = \frac{1}{\sqrt{2\pi}\cdot0.482}e^{-\frac{(3.4 - 2.825)^2}{2\cdot0.482^2}} = 0.4061237321$

$P(Petal Length=5.6 | Versicolor) = \frac{1}{\sqrt{2\pi}\cdot0.54}e^{-\frac{(5.6 - 4.325)^2}{2\cdot0.54^2}} = 0.0455923085$

$P(Petal Width=2.4 | Versicolor) = \frac{1}{\sqrt{2\pi}\cdot0.206}e^{-\frac{(2.4 - 1.35)^2}{2\cdot0.206^2}} = 0.0000045053$

- In order to find posterior $P(Virginica | Sepal Length=6.3, Sepal Width=3.4, Petal Length=5.6, Petal Width=2.4)$:

$P(Virginica) = \frac{4}{12} = 0.33$ (because there are 4 cases of Virginica out of 12 cases in table)

$ \mu_{virginica-sl} = 6.3 $

$ \mu_{virginica-sw} = 2.8 $

$ \mu_{virginica-pl} = 5.65 $

$ \mu_{virginica-pw} = 1.95 $

$ \sigma_{virginica-sl} = 0.883 $

$ \sigma_{virginica-sw} = 0.332 $

$ \sigma_{virginica-pl} = 0.687 $

$ \sigma_{virginica-pw} = 0.320 $

$P(Sepal Length=6.3 | Virginica) = \frac{1}{\sqrt{2\pi}\cdot0.883}e^{-\frac{(6.3 - 6.3)^2}{2\cdot0.883^2}} = 0.4517129780$

$P(Sepal Width=3.4 | Virginica) = \frac{1}{\sqrt{2\pi}\cdot0.332}e^{-\frac{(3.4 - 2.8)^2}{2\cdot0.332^2}} = 0.2341815809$

$P(Petal Length=5.6 | Virginica) = \frac{1}{\sqrt{2\pi}\cdot0.687}e^{-\frac{(5.6 - 5.65)^2}{2\cdot0.687^2}} = 0.5788419280$

$P(Petal Width=2.4 | Virginica) = \frac{1}{\sqrt{2\pi}\cdot0.32}e^{-\frac{(2.4 - 1.95)^2}{2\cdot0.32^2}} = 0.4640357888$

#### Result

The unnormalised posteriors (class prior multiplied by the four likelihoods) for each class are:

$P(Setosa | Sepal Length=6.3, Sepal Width=3.4, Petal Length=5.6, Petal Width=2.4) \propto 0.0$

$P(Versicolor | Sepal Length=6.3, Sepal Width=3.4, Petal Length=5.6, Petal Width=2.4) \propto 0.0000000135$

$P(Virginica | Sepal Length=6.3, Sepal Width=3.4, Petal Length=5.6, Petal Width=2.4) \propto 0.0094712109$

Consequently the answer is: 

|Sepal Length| Sepal Width | Petal Length | Petal Width | Iris Type |
|------|------|------|------|------|
|6.3|3.4|5.6|2.4|Virginica|

as its posterior for the given data is the highest.
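The whole worked example can be reproduced with NumPy in a few lines. This is a sketch, separate from the raw implementation that follows: it uses the population variance (matching the worked values), adds a small variance floor of `1e-9` (our assumption) so that the zero-variance Setosa petal width does not divide by zero, and additionally normalises the scores so that they sum to 1.

```python
import numpy as np

# Mock data from the table above: sepal length, sepal width, petal length, petal width
X = np.array([
    [5.1, 3.8, 1.6, 0.2], [4.6, 3.2, 1.4, 0.2], [5.3, 3.7, 1.5, 0.2], [5.0, 3.3, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4], [6.9, 3.1, 4.9, 1.5], [5.0, 2.0, 3.5, 1.0], [5.9, 3.0, 4.2, 1.5],
    [6.3, 3.3, 6.0, 2.5], [4.9, 2.5, 4.5, 1.7], [7.3, 2.9, 6.3, 1.8], [6.7, 2.5, 5.8, 1.8],
])
y = np.array(["Setosa"] * 4 + ["Versicolor"] * 4 + ["Virginica"] * 4)
query = np.array([6.3, 3.4, 5.6, 2.4])

scores = {}
for c in np.unique(y):
    samples = X[y == c]
    mu = samples.mean(axis=0)
    # Population variance (divides by m), plus an assumed floor of 1e-9 so the
    # zero-variance Setosa petal width does not cause a division by zero
    var = samples.var(axis=0) + 1e-9
    densities = np.exp(-((query - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    prior = len(samples) / len(X)
    scores[str(c)] = prior * np.prod(densities)  # unnormalised posterior

total = sum(scores.values())
for c, score in scores.items():
    print(f"{c}: score = {score:.10f}, normalised posterior = {score / total:.6f}")
print("Prediction:", max(scores, key=scores.get))  # Virginica
```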

## Gaussian Naive Bayes - Raw Implementation

### Imports

```python
import numpy as np

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```

### Loading Data

```python
def load_data():
    """Function loads dictionary containing iris dataset from scikit-learn
    library and splits samples in a stratified way, in a 0.8/0.2 ratio, into
    train and test datasets.
    """
    iris = load_iris()
    samples, targets = iris["data"], iris["target"]
    return train_test_split(
        samples, targets,
        test_size=0.2,
        stratify=targets,
        shuffle=True,
        random_state=42
    )
```

### Algorithm Code

```python
class GaussianNaiveBayes:
    """Implementation of the Gaussian Naive Bayes algorithm."""

    def __init__(self):
        """Inits variables for storing class representations."""
        self.classes = None
        self.class_priors = None
        self.class_feature_mean = None
        self.class_feature_std = None

    @staticmethod
    def _gaussian_probability_density_function(value, mean, std):
        """Implementation of the Gaussian Probability Density Function. Used to tell how likely
        "value" is under the Gaussian Distribution described by "mean" and "std". The result
        is a density, not a probability, so it can exceed 1.
        Link to equation: https://en.wikipedia.org/wiki/Gaussian_function

        Parameters:
        -----------
        value: float or numpy.array
            Value for which the density under the distribution will be returned.
        mean: float or numpy.array
            Mean of modeled Gaussian Distribution.
        std: float or numpy.array
            Standard Deviation of modeled Gaussian Distribution.

        Returns:
        -----------
        probability: float or numpy.array
            Density of value under the distribution.
        """
        # "eps" keeps both denominators non-zero when a feature has zero standard deviation
        eps = 1e-4
        exponent = np.exp(-((value - mean)**2) / (2.0 * std**2 + eps))
        coefficient = 1.0 / (np.sqrt(2.0 * np.pi) * std + eps)
        probability = coefficient * exponent
        return probability

    def _set_prior_for_classes(self, y):
        """Sets prior (probability of occurrence) for each class.

        Parameters:
        -----------
        y: numpy.array
            Vector containing class id for each training sample in matrix "X".
        """
        self.classes, class_counts = np.unique(y, return_counts=True)
        self.class_priors = class_counts / np.sum(class_counts)

    def _set_distribution_parameters_for_classes(self, X, y):
        """Sets mean and standard deviation for each unique class in vector "y" based on its
        samples stored in matrix "X".

        Parameters:
        -----------
        X: numpy.array
            Matrix of training samples.
        y: numpy.array
            Vector containing class id for each training sample in matrix "X".
        """
        # Rows: classes (in the order of self.classes), columns: features
        self.class_feature_mean = np.array(
            [np.mean(X[y == c], axis=0) for c in self.classes])
        # np.std defaults to ddof=0, i.e. the population standard deviation (divides by m)
        self.class_feature_std = np.array(
            [np.std(X[y == c], axis=0) for c in self.classes])

    def _predict_for_sample(self, x):
        """Classification with usage of Bayes Rule: P(Y|X) = P(X|Y) * P(Y) / P(X)

        where:

            P(Y|X) - Posterior, a probability of sample x belonging to class y, given that the
                     features of sample x are independent and distributed according to the
                     distribution of class y, and its prior.
            P(X|Y) - Probability of sample x given class distribution y.
            P(Y)   - Prior, a probability of class y occurring in the whole dataset.
            P(X)   - Scales the posterior to make it a proper probability distribution. This term
                     is ignored as it doesn't affect prediction.

        Parameters:
        -----------
        x: numpy.array
            Vector of feature values representing sample "x".

        Returns:
        -----------
        predicted_class: object
            Returns class for which calculated posterior was the highest.
        """
        # x (n_features,) broadcasts against the (n_classes, n_features) parameter matrices
        class_probabilities = self._gaussian_probability_density_function(
            x, self.class_feature_mean, self.class_feature_std)
        # Naive assumption: multiply per-feature densities, then weight by the class prior
        class_posteriors = self.class_priors * np.prod(class_probabilities, axis=1)
        predicted_class = self.classes[np.argmax(class_posteriors)]
        return predicted_class

    def fit(self, X, y):
        """For each unique class of vector 'y' calculates a representation based on samples in
        matrix 'X'. Representation is a mean and standard deviation of each feature describing
        samples in matrix 'X'. Stores representations in variables.

        Parameters:
        -----------
        X: numpy.array
            Matrix of samples.
        y: numpy.array
            Vector of classes for each corresponding sample in matrix 'X'.
        """
        self._set_prior_for_classes(y)
        self._set_distribution_parameters_for_classes(X, y)

    def predict(self, X):
        """Applies '_predict_for_sample' function for each row in matrix 'X' and returns a
        numpy.array of predictions.

        Parameters:
        -----------
        X: numpy.array
            Matrix of samples.

        Returns:
        -----------
        predictions: numpy.array
            Predicted class for each row of matrix 'X'.
        """
        return np.apply_along_axis(self._predict_for_sample, 1, X)
```

### Usage

```python
X_train, X_test, y_train, y_test = load_data()
print("X_train matrix size: {}".format(X_train.shape))
print("y_train vector size: {}".format(y_train.shape))
print("X_test matrix size: {}".format(X_test.shape))
print("y_test vector size: {}".format(y_test.shape))
print("\n----\n")

# Creating Gaussian Naive Bayes model
nb = GaussianNaiveBayes()

# Training model
nb.fit(X_train, y_train)

# Making prediction
pred = nb.predict(X_test)
print("Prediction vector: {}".format(pred))
print("  Expected values: {}".format(y_test))
print("\n----\n")

# Evaluation (accuracy_score expects y_true first, then y_pred)
accuracy = accuracy_score(y_test, pred)
print("Prediction accuracy: {}%".format(accuracy * 100.0))
```

```
X_train matrix size: (120, 4)
y_train vector size: (120,)
X_test matrix size: (30, 4)
y_test vector size: (30,)

----

Prediction vector: [0 2 1 1 0 1 0 0 2 1 2 2 2 1 0 0 0 1 1 2 0 2 1 2 2 2 1 0 2 0]
  Expected values: [0 2 1 1 0 1 0 0 2 1 2 2 2 1 0 0 0 1 1 2 0 2 1 2 2 1 1 0 2 0]

----

Prediction accuracy: 96.66666666666667%
```


### Comparison to scikit-learn implementation

```python
from sklearn.naive_bayes import GaussianNB

X_train, X_test, y_train, y_test = load_data()
print("X_train matrix size: {}".format(X_train.shape))
print("y_train vector size: {}".format(y_train.shape))
print("X_test matrix size: {}".format(X_test.shape))
print("y_test vector size: {}".format(y_test.shape))
print("\n----\n")

# Creating Gaussian Naive Bayes model
scikit_nb = GaussianNB()

# Training model
scikit_nb.fit(X_train, y_train)

# Making prediction
pred = scikit_nb.predict(X_test)
print("Prediction vector: {}".format(pred))
print("  Expected values: {}".format(y_test))
print("\n----\n")

# Evaluation (accuracy_score expects y_true first, then y_pred)
accuracy = accuracy_score(y_test, pred)
print("Prediction accuracy: {}%".format(accuracy * 100.0))
```

```
X_train matrix size: (120, 4)
y_train vector size: (120,)
X_test matrix size: (30, 4)
y_test vector size: (30,)

----

Prediction vector: [0 2 1 1 0 1 0 0 2 1 2 2 2 1 0 0 0 1 1 2 0 2 1 2 2 2 1 0 2 0]
  Expected values: [0 2 1 1 0 1 0 0 2 1 2 2 2 1 0 0 0 1 1 2 0 2 1 2 2 1 1 0 2 0]

----

Prediction accuracy: 96.66666666666667%
```
