# Klasifikasi menggunakan Naive Bayes

## Definisi Naive Bayes

Naive Bayes merupakan metode yang digunakan untuk pengklasifikasian dengan tingkat keakuratan yang baik. Dalam algoritma ini, tidak memerlukan adnaya pemodelan maupun uji statistik dalam pengklasifikasian.


Naive Bayes memiliki pendekatan berdasarkan probabilitas sederhana yang dirancang agar dapat dipergunakan dengan asumsi antar variabel saling bebas atau independen. Keuntungan menggunakan metode ini yakni memiliki tingkat nilai eror yang didapat lebih rendah dalam suatu dataset yang besar, kecepatan akurasi lebih tinggi ketika diaplikasikan pada dataset yang jumlahnya besar.

## Tipe -Tipe Algortima Naive Bayes

- Bernauli Naive Bayes

  Dimana prediktornya adalah variabel boolean. Oleh karena itu satu-satunya yang ada hanya benar atau salah. Biasanya algoritma ini digunakan ketika data sesuai dengan distribusi bernoulli multivariat.

- Naive Bayes Multinominal

  Dimana algoritma ini sering digunakan untuk memecahkan masalah klasifikasi dokumen. Contohnya, jika ingin menentukan apakah suatu dokumen termasuk dalam suatu kategori dan algoritma ini digunakan untuk memilahnya. Naive bayes menggunakan frekuensi kata-kata sekarang sebagai fitur.

- Gaussian Naive Bayes
  Digunakan jika prediktor tidak diskrit namun memiliki nilai kontinu dan prediktor tersebut diasumsikan sebagai sampel dari distribusi gaussian.

## Langkah Langkah Menghitung Naive Bayes
## Langkah-langkah Klasifikasi dengan Naïve Bayes Gaussian

Naïve Bayes Gaussian digunakan ketika fitur yang digunakan dalam klasifikasi berupa **data numerik kontinu** dan diasumsikan mengikuti distribusi normal (Gaussian). Data numerik berarti nilai yang dapat diukur dan memiliki skala kontinu, seperti **tinggi badan, berat badan, suhu, atau usia**.

### 1. Persiapan Data

Misalkan kita memiliki dataset dengan fitur numerik **X** dan ingin mengklasifikasikan ke dalam kelas **Y**:

- **X**: Fitur numerik (misalnya, tinggi badan, berat badan)
- **Y**: Label kelas (misalnya, "Laki-laki" atau "Perempuan")

### 2. Hitung Prior Probability

Prior probability adalah probabilitas awal dari masing-masing kelas sebelum melihat fitur.

$$
P(Y) = \frac{N_Y}{N}
$$

Dimana:

- \(N_Y\) = Jumlah sampel dalam kelas \(Y\)
- \(N\) = Total jumlah sampel

### 3. Hitung Parameter Distribusi Gaussian

Untuk setiap fitur \(X\) dalam kelas \(Y\), hitung **mean (rata-rata)** dan **variance (variansi)**:

#### a. Mean (Rata-rata)

$$
\mu_Y = \frac{1}{N_Y} \sum_{i=1}^{N_Y} X_i
$$

Dimana:

- \(\mu_Y\) = Rata-rata fitur \(X\) dalam kelas \(Y\)

#### b. Variance (Variansi)

$$
\sigma_Y^2 = \frac{1}{N_Y} \sum_{i=1}^{N_Y} (X_i - \mu_Y)^2
$$

Dimana:

- \(\sigma_Y^2\) = Variansi fitur \(X\) dalam kelas \(Y\)

### 4. Hitung Likelihood dengan Distribusi Gaussian

Untuk setiap fitur \(X\), kita hitung probabilitasnya menggunakan fungsi **Gaussian**:

$$
P(X | Y) = \frac{1}{\sqrt{2 \pi \sigma_Y^2}} \exp \left( -\frac{(X - \mu_Y)^2}{2 \sigma_Y^2} \right)
$$

Dimana:

- \(\sigma_Y^2\) = Variansi fitur \(X\) dalam kelas \(Y\)
- \(\mu_Y\) = Mean fitur \(X\) dalam kelas \(Y\)

### 5. Hitung Posterior Probability

Menggunakan **Teorema Bayes**, kita menghitung probabilitas posterior:

$$
P(Y | X) \propto P(X | Y) P(Y)
$$

Karena **P(X)** sama untuk semua kelas, kita cukup membandingkan:

$$
P(Y | X) = P(X_1 | Y) P(X_2 | Y) ... P(X_n | Y) P(Y)
$$

### 6. Prediksi Kelas

Pilih kelas dengan probabilitas **posterior tertinggi**:

$$
Y^* = \arg\max_Y P(Y | X)
$$



## Implementasi Pada Data Iris

### Install library yang digunakan untuk menghubungkan ke Database Cloud Aiven.io

In [39]:
!pip install pymysql
!pip install psycopg2-binary



### Import library yang akan digunakan

In [40]:
import pandas as pd
from sqlalchemy import create_engine
import numpy as np
from tabulate import tabulate

### Koneksikan ke Database PostgreSQL di aiven.io

In [41]:
import psycopg2

# koneksi ke database mysql melalui aiven.io
DB_HOST = "pg-38d2eca9-irisposgresql.g.aivencloud.com"
DB_PORT = "23603"
DB_NAME = "defaultdb"
DB_USER = "avnadmin"
DB_PASS = "AVNS_cy1i2eXC9KoFrhmUO63"

connect_1 = psycopg2.connect(
    host=DB_HOST,
    port=DB_PORT,
    dbname=DB_NAME,
    user=DB_USER,
    password=DB_PASS,
    sslmode="require"
)

data_posgre = connect_1.cursor()

# akses ke data di database

data_posgre.execute("SELECT * FROM irisposgre.data_irisposgresql ORDER BY id ASC LIMIT 10;")
data_db = data_posgre.fetchall()

print("10 Data dalam tabel data_irisposgresql:")
for data in data_db:
    print(data)

10 Data dalam tabel data_irisposgresql:
(1, 'Iris-setosa', 20.1, 30.5)
(2, 'Iris-setosa', 4.9, 3.0)
(3, 'Iris-setosa', 4.7, 3.2)
(4, 'Iris-setosa', 4.6, 3.1)
(5, 'Iris-setosa', 5.0, 3.6)
(6, 'Iris-setosa', 5.4, 3.9)
(7, 'Iris-setosa', 4.6, 3.4)
(8, 'Iris-setosa', 5.0, 3.4)
(9, 'Iris-setosa', 4.4, 2.9)
(10, 'Iris-setosa', 4.9, 3.1)


### Koneksikan ke Database MySQL di aiven.io

In [42]:
import pymysql

# koneksi ke database mysql melalui aiven.io
DB_HOST = "mysql-385e0f60-irismysql.g.aivencloud.com"
DB_PORT = 23719
DB_NAME = "defaultdb"
DB_USER = "avnadmin"
DB_PASS = "AVNS_mhB8_mF1euq5hCL2xSt"

connect_2 = pymysql.connect(
    host=DB_HOST,
    port=DB_PORT,
    database=DB_NAME,
    user=DB_USER,
    password=DB_PASS,
    ssl={'ssl': {}}
)

data_mysql = connect_2.cursor()

# ambil data ke database
data_mysql.execute("SELECT * FROM irismysql.iris_databasesql LIMIT 10;")
data_db_sql = data_mysql.fetchall()


print("10 data dalam tabel iris_databasesql")
for data2 in data_db_sql:
  print(data2)

10 data dalam tabel iris_databasesql
(1, 'Iris-setosa', 86.4, 70.0)
(2, 'Iris-setosa', 1.4, 0.2)
(3, 'Iris-setosa', 1.3, 0.2)
(4, 'Iris-setosa', 1.5, 0.2)
(5, 'Iris-setosa', 1.4, 0.2)
(6, 'Iris-setosa', 1.7, 0.4)
(7, 'Iris-setosa', 1.4, 0.3)
(8, 'Iris-setosa', 1.5, 0.2)
(9, 'Iris-setosa', 1.4, 0.2)
(10, 'Iris-setosa', 1.5, 0.1)


### Menggabungkan Data dari Kedua Database

In [43]:
# koneksi ke PosgreSQL
posgre_url = create_engine("postgresql+psycopg2://avnadmin:AVNS_cy1i2eXC9KoFrhmUO63@pg-38d2eca9-irisposgresql.g.aivencloud.com:23603/defaultdb")
mysql_url = create_engine("mysql+pymysql://avnadmin:AVNS_mhB8_mF1euq5hCL2xSt@mysql-385e0f60-irismysql.g.aivencloud.com:23719/defaultdb")
# query ke MySQL
mysql_query = "SELECT id, `petal length`, `petal width` FROM irismysql.iris_databasesql ORDER BY id ASC;"
result_mysql = pd.read_sql(mysql_query, mysql_url)

#query ke PosgreSQL
posgre_query = "SELECT * FROM irisposgre.data_irisposgresql ORDER BY id ASC;"
result_posgre = pd.read_sql(posgre_query, posgre_url)

merged_db = pd.merge(result_posgre, result_mysql, on="id", how="left")
print(tabulate(merged_db, headers='keys', tablefmt='grid'))

+-----+------+-----------------+----------------+---------------+----------------+---------------+
|     |   id | Class           |   sepal length |   sepal width |   petal length |   petal width |
|   0 |    1 | Iris-setosa     |           20.1 |          30.5 |           86.4 |          70   |
+-----+------+-----------------+----------------+---------------+----------------+---------------+
|   1 |    2 | Iris-setosa     |            4.9 |           3   |            1.4 |           0.2 |
+-----+------+-----------------+----------------+---------------+----------------+---------------+
|   2 |    3 | Iris-setosa     |            4.7 |           3.2 |            1.3 |           0.2 |
+-----+------+-----------------+----------------+---------------+----------------+---------------+
|   3 |    4 | Iris-setosa     |            4.6 |           3.1 |            1.5 |           0.2 |
+-----+------+-----------------+----------------+---------------+----------------+---------------+
|   4 |   

### Class Naive Bayes

Class Naive Bayes ini berisi code untuk menghitung probabilitas setiap lagkah-langkahnya menggunakan metode naive bayes.

In [44]:
class NaiveBayes:
    def __init__(self, dataset):
        self.dataset = dataset
        self.separated = self.separate_by_class(dataset)
        self.summaries = self.mean_std_by_class(self.separated)
        self.priors = self.prior_probabilities(dataset)

    def separate_by_class(self, dataset):
        separated = {}
        for row in dataset:
            class_value = row[0]
            if class_value not in separated:
                separated[class_value] = []
            separated[class_value].append(row[1:])
        return {key: np.array(value) for key, value in separated.items()}

    def mean_std_by_class(self, separated):
        summaries = {}
        for class_value, instances in separated.items():
            summaries[class_value] = [(np.mean(col), np.std(col) if np.std(col) > 0 else 1e-6) for col in instances.T]
        return summaries

    def prior_probabilities(self, dataset):
        total_count = len(dataset)
        class_counts = {label: sum(dataset[:, 0] == label) for label in np.unique(dataset[:, 0])}
        return {label: class_counts[label] / total_count for label in class_counts}

    def gaussian_probability(self, x, mean, std):
        exponent = np.exp(-((x - mean) ** 2 / (2 * std ** 2)))
        return (1 / (np.sqrt(2 * np.pi) * std)) * exponent

    def calculate_likelihoods(self, input_features):
        likelihoods = {}
        for class_value, class_summaries in self.summaries.items():
            likelihoods[class_value] = []
            for i in range(len(class_summaries)):
                mean, std = class_summaries[i]
                prob = self.gaussian_probability(input_features[i], mean, std)
                likelihoods[class_value].append(prob)
        return likelihoods

    def calculate_posterior(self, input_features, likelihoods):
        probabilities = {}
        for class_value in self.summaries.keys():
            probabilities[class_value] = self.priors[class_value]
            for prob in likelihoods[class_value]:
                probabilities[class_value] *= prob
        return probabilities

    def predict(self, input_features):
        likelihoods = self.calculate_likelihoods(input_features)
        probabilities = self.calculate_posterior(input_features, likelihoods)
        return max(probabilities, key=probabilities.get)

    def get_predictions(self):
        return [self.predict(row[1:]) for row in self.dataset]

    def accuracy(self, predictions):
        correct = sum(1 for i in range(len(self.dataset)) if self.dataset[i][0] == predictions[i])
        return correct / len(self.dataset) * 100

### Implementasi pada Data yang Masih Memiliki Outlier

#### Buat objek menggunakan class Naive Bayes

In [45]:
var_merged = merged_db
feature_column = ['petal length', 'petal width', 'sepal length', 'sepal width']
X = var_merged[feature_column].to_numpy()
y = var_merged['Class'].to_numpy()
dataset = np.column_stack((y, X))

nb_model = NaiveBayes(dataset)
predictions = nb_model.get_predictions()
acc = nb_model.accuracy(predictions)

#### Probabilitas Prior

In [46]:
print("Prior Probabilities:", nb_model.priors)

Prior Probabilities: {'Iris-setosa': np.float64(0.3333333333333333), 'Iris-versicolor': np.float64(0.3333333333333333), 'Iris-virginica': np.float64(0.3333333333333333)}


#### Menghitung Nilai Mean dan Variansi dari Masing-Masing Class

In [47]:
for class_value, class_summaries in nb_model.summaries.items():
    print(f"Class {class_value}:")
    for i, (mean, std) in enumerate(class_summaries):
        print(f"  Feature {i}: Mean = {mean:.2f}, Variance = {std**2:.2f}")

Class Iris-setosa:
  Feature 0: Mean = 3.16, Variance = 141.42
  Feature 1: Mean = 1.64, Variance = 95.38
  Feature 2: Mean = 5.31, Variance = 4.59
  Feature 3: Mean = 3.96, Variance = 14.52
Class Iris-versicolor:
  Feature 0: Mean = 4.26, Variance = 0.22
  Feature 1: Mean = 1.33, Variance = 0.04
  Feature 2: Mean = 5.94, Variance = 0.26
  Feature 3: Mean = 2.77, Variance = 0.10
Class Iris-virginica:
  Feature 0: Mean = 5.55, Variance = 0.30
  Feature 1: Mean = 2.03, Variance = 0.07
  Feature 2: Mean = 6.59, Variance = 0.40
  Feature 3: Mean = 2.97, Variance = 0.10


#### Menghitung Likelihood Setiap Data pada Fitur dan Kelompokkan Berdasarkan Class

In [48]:
for i, row in enumerate(dataset):
    likelihoods = nb_model.calculate_likelihoods(row[1:])
    print(f"\nData {i+1}: class = {row[0]}")
    print(f"data column = {row[1:]}")
    print(f"Likelihoods: {likelihoods}")


Data 1: class = Iris-setosa
data column = [86.4 70.0 20.1 30.5]
Likelihoods: {'Iris-setosa': [np.float64(7.720589003307663e-13), np.float64(9.380324097056354e-13), np.float64(8.162727289983452e-12), np.float64(3.047107311330826e-12)], 'Iris-versicolor': [np.float64(0.0), np.float64(0.0), np.float64(1.1161488296226954e-167), np.float64(0.0)], 'Iris-virginica': [np.float64(0.0), np.float64(0.0), np.float64(5.6464261663330624e-101), np.float64(0.0)]}

Data 2: class = Iris-setosa
data column = [1.4 0.2 4.9 3.0]
Likelihoods: {'Iris-setosa': [np.float64(0.03317980663249961), np.float64(0.04040728274396853), np.float64(0.1829318350152797), np.float64(0.10144070532773083)], 'Iris-versicolor': [np.float64(5.314187023905371e-09), np.float64(1.3343656926498227e-07), np.float64(0.09997611963032066), np.float64(0.9763582874510215)], 'Iris-virginica': [np.float64(2.1013058551503751e-13), np.float64(2.3565950615316754e-10), np.float64(0.017397592617399125), np.float64(1.2454652951539944)]}

Data 3: 

### Menghitung Posterior Masing-Masing Data pada Setiap Class

In [49]:
for i, row in enumerate(dataset):
    posteriors = nb_model.calculate_posterior(row[1:], likelihoods)
    print(f"\nData {i+1}: class = {row[0]}")
    print(f"Posterior Probabilities: {posteriors}")


Data 1: class = Iris-setosa
Posterior Probabilities: {'Iris-setosa': np.float64(8.194106213492233e-06), 'Iris-versicolor': np.float64(0.0046269901681235594), 'Iris-virginica': np.float64(0.07799046150159311)}

Data 2: class = Iris-setosa
Posterior Probabilities: {'Iris-setosa': np.float64(8.194106213492233e-06), 'Iris-versicolor': np.float64(0.0046269901681235594), 'Iris-virginica': np.float64(0.07799046150159311)}

Data 3: class = Iris-setosa
Posterior Probabilities: {'Iris-setosa': np.float64(8.194106213492233e-06), 'Iris-versicolor': np.float64(0.0046269901681235594), 'Iris-virginica': np.float64(0.07799046150159311)}

Data 4: class = Iris-setosa
Posterior Probabilities: {'Iris-setosa': np.float64(8.194106213492233e-06), 'Iris-versicolor': np.float64(0.0046269901681235594), 'Iris-virginica': np.float64(0.07799046150159311)}

Data 5: class = Iris-setosa
Posterior Probabilities: {'Iris-setosa': np.float64(8.194106213492233e-06), 'Iris-versicolor': np.float64(0.0046269901681235594), '

#### Prediksi dari Hasil Perhitungan Setiap Data pada Class

In [50]:
for i, row in enumerate(dataset):
    likelihoods = nb_model.calculate_likelihoods(row[1:])
    print(f"\nData {i+1}: class = {row[0]}")
    print(f"data column = {row[1:]}")
    print(f"Predicted = {predictions[i]}")


Data 1: class = Iris-setosa
data column = [86.4 70.0 20.1 30.5]
Predicted = Iris-setosa

Data 2: class = Iris-setosa
data column = [1.4 0.2 4.9 3.0]
Predicted = Iris-setosa

Data 3: class = Iris-setosa
data column = [1.3 0.2 4.7 3.2]
Predicted = Iris-setosa

Data 4: class = Iris-setosa
data column = [1.5 0.2 4.6 3.1]
Predicted = Iris-setosa

Data 5: class = Iris-setosa
data column = [1.4 0.2 5.0 3.6]
Predicted = Iris-setosa

Data 6: class = Iris-setosa
data column = [1.7 0.4 5.4 3.9]
Predicted = Iris-setosa

Data 7: class = Iris-setosa
data column = [1.4 0.3 4.6 3.4]
Predicted = Iris-setosa

Data 8: class = Iris-setosa
data column = [1.5 0.2 5.0 3.4]
Predicted = Iris-setosa

Data 9: class = Iris-setosa
data column = [1.4 0.2 4.4 2.9]
Predicted = Iris-setosa

Data 10: class = Iris-setosa
data column = [1.5 0.1 4.9 3.1]
Predicted = Iris-setosa

Data 11: class = Iris-setosa
data column = [1.5 0.2 5.4 3.7]
Predicted = Iris-setosa

Data 12: class = Iris-setosa
data column = [1.6 0.2 4.8 3.

#### Nilai Akurasi Setiap Data pada Class

In [51]:
accuracy = acc
print(f"nilai akurasi = {accuracy:.2f}%")

nilai akurasi = 96.00%


In [52]:
print("="*1000)



### Implementasi pada Data Cleaning dari Outlier

#### Data Cleaning

Menghilangkan data outlier menggunakan metode IQR (Interquartil Range)

In [53]:
# data yang telah digabung dari dua database yang berbeda
def remove_outliers(df, feature_column):
    X = df[feature_column].to_numpy()
    Q1 = np.percentile(X, 25, axis=0)
    Q3 = np.percentile(X, 75, axis=0)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    mask = np.all((X >= lower_bound) & (X <= upper_bound), axis=1)
    return df[mask]

data_merged = merged_db
feature_column = ['petal length', 'petal width', 'sepal length', 'sepal width']
clean_var_merged = remove_outliers(var_merged, feature_column)

print(f"sum data after cleaning = {clean_var_merged.shape}")
print("="*100)
print(tabulate(clean_var_merged, headers='keys', tablefmt='grid'))

sum data after cleaning = (145, 6)
+-----+------+-----------------+----------------+---------------+----------------+---------------+
|     |   id | Class           |   sepal length |   sepal width |   petal length |   petal width |
|   1 |    2 | Iris-setosa     |            4.9 |           3   |            1.4 |           0.2 |
+-----+------+-----------------+----------------+---------------+----------------+---------------+
|   2 |    3 | Iris-setosa     |            4.7 |           3.2 |            1.3 |           0.2 |
+-----+------+-----------------+----------------+---------------+----------------+---------------+
|   3 |    4 | Iris-setosa     |            4.6 |           3.1 |            1.5 |           0.2 |
+-----+------+-----------------+----------------+---------------+----------------+---------------+
|   4 |    5 | Iris-setosa     |            5   |           3.6 |            1.4 |           0.2 |
+-----+------+-----------------+----------------+---------------+---------

#### Buat objek menggunakan class Naive Bayes

In [54]:
var_merged = clean_var_merged
feature_column = ['petal length', 'petal width', 'sepal length', 'sepal width']
X = var_merged[feature_column].to_numpy()
y = var_merged['Class'].to_numpy()
dataset = np.column_stack((y, X))

nb_model_2 = NaiveBayes(dataset)
predictions_2 = nb_model_2.get_predictions()
acc_2 = nb_model_2.accuracy(predictions_2)

#### Probabilitas Prior

In [55]:
print("Prior Probabilities:", nb_model_2.priors)

Prior Probabilities: {'Iris-setosa': np.float64(0.31724137931034485), 'Iris-versicolor': np.float64(0.33793103448275863), 'Iris-virginica': np.float64(0.3448275862068966)}


#### Menghitung Nilai Mean dan Variansi dari Masing-Masing Class

In [56]:
for class_value, class_summaries in nb_model_2.summaries.items():
    print(f"Class {class_value}:")
    for i, (mean, std) in enumerate(class_summaries):
        print(f"  Feature {i}: Mean = {mean:.2f}, Variance = {std**2:.2f}")

Class Iris-setosa:
  Feature 0: Mean = 1.47, Variance = 0.03
  Feature 1: Mean = 0.25, Variance = 0.01
  Feature 2: Mean = 4.97, Variance = 0.11
  Feature 3: Mean = 3.36, Variance = 0.11
Class Iris-versicolor:
  Feature 0: Mean = 4.28, Variance = 0.21
  Feature 1: Mean = 1.33, Variance = 0.04
  Feature 2: Mean = 5.96, Variance = 0.25
  Feature 3: Mean = 2.79, Variance = 0.09
Class Iris-virginica:
  Feature 0: Mean = 5.55, Variance = 0.30
  Feature 1: Mean = 2.03, Variance = 0.07
  Feature 2: Mean = 6.59, Variance = 0.40
  Feature 3: Mean = 2.97, Variance = 0.10


#### Menghitung Likelihood Setiap Data pada Fitur dan Kelompokkan Berdasarkan Class

In [57]:
for i, row in enumerate(dataset):
    likelihoods = nb_model_2.calculate_likelihoods(row[1:])
    print(f"\nData {i+1}: class = {row[0]}")
    print(f"data column = {row[1:]}")
    print(f"Likelihoods: {likelihoods}")


Data 1: class = Iris-setosa
data column = [1.4 0.2 4.9 3.0]
Likelihoods: {'Iris-setosa': [np.float64(2.0914800069536006), np.float64(3.437644964213319), np.float64(1.1510124326730646), np.float64(0.6588598793043308)], 'Iris-versicolor': [np.float64(2.1950804311531016e-09), np.float64(5.8403106627855077e-08), np.float64(0.08501833421378906), np.float64(1.0412962945903699)], 'Iris-virginica': [np.float64(2.1013058551503751e-13), np.float64(2.3565950615316754e-10), np.float64(0.017397592617399125), np.float64(1.2454652951539944)]}

Data 2: class = Iris-setosa
data column = [1.3 0.2 4.7 3.2]
Likelihoods: {'Iris-setosa': [np.float64(1.4563540489188522), np.float64(3.437644964213319), np.float64(0.8495584423882837), np.float64(1.0767092167349996)], 'Iris-versicolor': [np.float64(5.406580742988874e-10), np.float64(5.8403106627855077e-08), np.float64(0.03351655882003616), np.float64(0.5018813981829099)], 'Iris-virginica': [np.float64(5.141886439166322e-14), np.float64(2.3565950615316754e-10),

### Menghitung Posterior Masing-Masing Data pada Setiap Class

In [58]:
for i, row in enumerate(dataset):
    posteriors = nb_model_2.calculate_posterior(row[1:], likelihoods)
    print(f"\nData {i+1}: class = {row[0]}")
    print(f"Posterior Probabilities: {posteriors}")


Data 1: class = Iris-setosa
Posterior Probabilities: {'Iris-setosa': np.float64(4.28517587121273e-139), 'Iris-versicolor': np.float64(0.005166652015981789), 'Iris-virginica': np.float64(0.08067978776026877)}

Data 2: class = Iris-setosa
Posterior Probabilities: {'Iris-setosa': np.float64(4.28517587121273e-139), 'Iris-versicolor': np.float64(0.005166652015981789), 'Iris-virginica': np.float64(0.08067978776026877)}

Data 3: class = Iris-setosa
Posterior Probabilities: {'Iris-setosa': np.float64(4.28517587121273e-139), 'Iris-versicolor': np.float64(0.005166652015981789), 'Iris-virginica': np.float64(0.08067978776026877)}

Data 4: class = Iris-setosa
Posterior Probabilities: {'Iris-setosa': np.float64(4.28517587121273e-139), 'Iris-versicolor': np.float64(0.005166652015981789), 'Iris-virginica': np.float64(0.08067978776026877)}

Data 5: class = Iris-setosa
Posterior Probabilities: {'Iris-setosa': np.float64(4.28517587121273e-139), 'Iris-versicolor': np.float64(0.005166652015981789), 'Iris-

#### Prediksi dari Hasil Perhitungan Setiap Data pada Class

In [59]:
for i, row in enumerate(dataset):
    likelihoods = nb_model_2.calculate_likelihoods(row[1:])
    print(f"\nData {i+1}: class = {row[0]}")
    print(f"data column = {row[1:]}")
    print(f"Predicted = {predictions[i]}")


Data 1: class = Iris-setosa
data column = [1.4 0.2 4.9 3.0]
Predicted = Iris-setosa

Data 2: class = Iris-setosa
data column = [1.3 0.2 4.7 3.2]
Predicted = Iris-setosa

Data 3: class = Iris-setosa
data column = [1.5 0.2 4.6 3.1]
Predicted = Iris-setosa

Data 4: class = Iris-setosa
data column = [1.4 0.2 5.0 3.6]
Predicted = Iris-setosa

Data 5: class = Iris-setosa
data column = [1.7 0.4 5.4 3.9]
Predicted = Iris-setosa

Data 6: class = Iris-setosa
data column = [1.4 0.3 4.6 3.4]
Predicted = Iris-setosa

Data 7: class = Iris-setosa
data column = [1.5 0.2 5.0 3.4]
Predicted = Iris-setosa

Data 8: class = Iris-setosa
data column = [1.4 0.2 4.4 2.9]
Predicted = Iris-setosa

Data 9: class = Iris-setosa
data column = [1.5 0.1 4.9 3.1]
Predicted = Iris-setosa

Data 10: class = Iris-setosa
data column = [1.5 0.2 5.4 3.7]
Predicted = Iris-setosa

Data 11: class = Iris-setosa
data column = [1.6 0.2 4.8 3.4]
Predicted = Iris-setosa

Data 12: class = Iris-setosa
data column = [1.4 0.1 4.8 3.0]
P

#### Nilai Akurasi Data pada Class

In [60]:
accuracy = acc_2
print(f"nilai akurasi = {accuracy:.2f}%")

nilai akurasi = 95.86%


### Kesimpulan

Dapat disimpulkan bahwa nilai akurasi data yang memiliki outlier lebih besar yakni 96% dibandingkan dengan data yang sudah di cleaning yakni 95.86%. Hal ini terjadi kemungkinan disebabkan data oulier berperan besar dalam pengklasifikasian maka data yang sudah bersih akan cenderung lebih kurang dalam mengklasifikasi.