# Pattern Recognition Assignment - Bayes Decision Rule

--------------------
Name : Made Raharja Surya Mahadi

Student ID (NIM) : 23520022

---------------------
## Task Description
Given the following histogram data:
![image.png](https://github.com/share424/Bayesian-Conditional-Probability/raw/master/histogram.png)

The histogram above shows the number of salmon and seabass at each lightness value. From this data, determine whether a fish in a given lightness interval is a salmon or a seabass using the `Bayes Decision Rule based on Maximum Posterior Probability and Minimum Risk`.

----------------
## Converting the Histogram to a Table
The first step is to convert the histogram above into a table. For this conversion I use an interval width of `0.5` for the lightness.

In [1]:
import pandas as pd
import numpy as np

In [2]:
dataset = pd.read_csv('data.csv')
dataset

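|   | lightness | countSalmon | countSeabass |
|---|-----------|-------------|--------------|
| 0 | 0-0.5 | 0 | 0 |
| 1 | 0.5-1 | 2 | 0 |
| 2 | 1-1.5 | 8 | 0 |
| 3 | 1.5-2 | 4 | 0 |
| 4 | 2-2.5 | 7 | 0 |
| 5 | 2.5-3 | 10 | 0 |
| 6 | 3-3.5 | 12 | 0 |
| 7 | 3.5-4 | 8 | 1 |
| 8 | 4-4.5 | 6 | 0 |
| 9 | 4.5-5 | 8 | 3 |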


----------------
## Preprocessing
First, take the salmon and seabass counts for each lightness interval, then apply `add-one smoothing` (Laplace smoothing) so that no interval has a count of `0`, which avoids division by `0` later on.

In [3]:
count_salmon = np.array(dataset.countSalmon) + 1
count_seabass = np.array(dataset.countSeabass) + 1
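
To illustrate why the add-one step matters, here is a minimal sketch on the first four lightness intervals of the table above. The variable names (`raw_salmon`, `raw_seabass`, and so on) are chosen for this illustration only and are not part of the notebook.

```python
import numpy as np

# Raw counts for the first four lightness intervals, copied from the table above.
raw_salmon = np.array([0, 2, 8, 4])
raw_seabass = np.array([0, 0, 0, 0])

# Without smoothing, dividing by a zero count gives nan (0/0) or inf (n/0).
with np.errstate(divide="ignore", invalid="ignore"):
    unsmoothed_ratio = raw_salmon / raw_seabass

# Add-one smoothing makes every count at least 1, so the ratio is always finite.
smoothed_ratio = (raw_salmon + 1) / (raw_seabass + 1)

print("without smoothing:", unsmoothed_ratio)
print("with smoothing:   ", smoothed_ratio)
```

The smoothed ratios for these four intervals are finite, whereas the unsmoothed ones are not usable for comparison.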

---
## Bayes Decision Rule Based on Maximum Posterior Probability

### Computing the Prior Probability of Each Class
In this step, compute the prior probability of the salmon and seabass classes, where $N_{i}$ is the (smoothed) number of fish in class $\omega_{i}$:

$\large P(\omega_{i}) = \frac{N_{i}}{\sum_{j} N_{j}} $

In [4]:
# compute the prior probabilities of salmon and seabass
p_salmon = count_salmon.sum() / (count_salmon.sum() + count_seabass.sum())
p_seabass = count_seabass.sum() / (count_salmon.sum() + count_seabass.sum())

print('P(salmon):', p_salmon)
print('P(seabass):', p_seabass)

P(salmon): 0.5497076023391813
P(seabass): 0.4502923976608187
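
As a sanity check on the formula, the printed priors can be reproduced by hand. The class totals below (`94` and `77`) are not printed anywhere in the notebook; they are an assumption inferred from the printed priors and from the conditional probabilities shown later, so treat this as a sketch rather than the notebook's own computation.

```python
# Smoothed class totals, inferred from the printed outputs (assumption).
n_salmon = 94
n_seabass = 77
n_total = n_salmon + n_seabass

# P(class) = N_class / sum of all N
p_salmon = n_salmon / n_total
p_seabass = n_seabass / n_total

print("P(salmon):", p_salmon)
print("P(seabass):", p_seabass)
```

Because the totals are taken after smoothing, each class total includes one extra count per lightness interval.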


----
### Computing the Posterior Probability
Next, compute the posterior probability for the salmon and seabass classes. The evidence $P(x)$ is the same for both classes, so it is omitted and the posterior is computed only up to that constant factor:

$\large P(\omega_{i}|x) \propto P(x|\omega_{i})P(\omega_{i}) $

where

$\large P(x|\omega_{i}) = \frac{count(x, \omega_{i})}{N_{i}} $

In [5]:
# compute the class-conditional probabilities P(x | class)
cond_salmon = count_salmon / count_salmon.sum()
cond_seabass = count_seabass / count_seabass.sum()

# compute the unnormalized posteriors P(x | class) * P(class)
posterior_salmon = cond_salmon * p_salmon
posterior_seabass = cond_seabass * p_seabass

print('Posterior Probability Salmon:')
print(posterior_salmon)
print('Posterior Probability Seabass:')
print(posterior_seabass)

Posterior Probability Salmon:
[0.00584795 0.01754386 0.05263158 0.02923977 0.04678363 0.06432749
 0.07602339 0.05263158 0.04093567 0.05263158 0.02923977 0.02339181
 0.01169591 0.01169591 0.00584795 0.00584795 0.00584795 0.00584795
 0.00584795 0.00584795]
Posterior Probability Seabass:
[0.00584795 0.00584795 0.00584795 0.00584795 0.00584795 0.00584795
 0.00584795 0.01169591 0.00584795 0.02339181 0.02339181 0.02923977
 0.04093567 0.03508772 0.07602339 0.05847953 0.05847953 0.01754386
 0.01169591 0.01754386]
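
The values printed above are products $P(x|\omega_{i})P(\omega_{i})$, so the two values for one interval do not sum to 1. The sketch below normalizes them for a single interval (3.5-4, with smoothed counts 9 and 2). The class totals `94` and `77` are an assumption inferred from the printed priors, and all variable names are illustrative.

```python
import numpy as np

# Smoothed counts for the 3.5-4 interval: 8 + 1 salmon, 1 + 1 seabass.
counts = np.array([9, 2])
# Smoothed class totals, inferred from the printed priors (assumption).
class_totals = np.array([94, 77])

priors = class_totals / class_totals.sum()   # P(class)
likelihoods = counts / class_totals          # P(x | class)
joint = likelihoods * priors                 # P(x | class) * P(class)
evidence = joint.sum()                       # P(x)
posterior = joint / evidence                 # P(class | x), sums to 1

print("unnormalized:", joint)
print("normalized:  ", posterior)
```

Dividing by the evidence rescales both values by the same factor, so the class with the larger value is unchanged.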


---
### Classification
Next, classify each interval with the rule

$
\begin{equation*}
    f(P(salmon|x), P(seabass|x)) = \begin{cases}
               salmon               & P(salmon|x) > P(seabass|x)\\
               seabass               & P(salmon|x) \leq P(seabass|x)\\
           \end{cases}
\end{equation*}
$

In [6]:
labels = ['salmon' if psalmon > pseabass else 'seabass' for psalmon, pseabass in zip(posterior_salmon, posterior_seabass)]

In [7]:
output = pd.DataFrame({
    "lightness": dataset.lightness,
    "countSalmon": dataset.countSalmon,
    "countSeabass": dataset.countSeabass,
    "conditionalSalmon": cond_salmon,
    "conditionalSeabass": cond_seabass,
    "posteriorProbSalmon": posterior_salmon,
    "posteriorProbSeabass": posterior_seabass,
    "label": labels
})
output

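|   | lightness | countSalmon | countSeabass | conditionalSalmon | conditionalSeabass | posteriorProbSalmon | posteriorProbSeabass | label |
|---|-----------|-------------|--------------|-------------------|--------------------|---------------------|----------------------|-------|
| 0 | 0-0.5 | 0 | 0 | 0.010638 | 0.012987 | 0.005848 | 0.005848 | seabass |
| 1 | 0.5-1 | 2 | 0 | 0.031915 | 0.012987 | 0.017544 | 0.005848 | salmon |
| 2 | 1-1.5 | 8 | 0 | 0.095745 | 0.012987 | 0.052632 | 0.005848 | salmon |
| 3 | 1.5-2 | 4 | 0 | 0.053191 | 0.012987 | 0.02924 | 0.005848 | salmon |
| 4 | 2-2.5 | 7 | 0 | 0.085106 | 0.012987 | 0.046784 | 0.005848 | salmon |
| 5 | 2.5-3 | 10 | 0 | 0.117021 | 0.012987 | 0.064327 | 0.005848 | salmon |
| 6 | 3-3.5 | 12 | 0 | 0.138298 | 0.012987 | 0.076023 | 0.005848 | salmon |
| 7 | 3.5-4 | 8 | 1 | 0.095745 | 0.025974 | 0.052632 | 0.011696 | salmon |
| 8 | 4-4.5 | 6 | 0 | 0.074468 | 0.012987 | 0.040936 | 0.005848 | salmon |
| 9 | 4.5-5 | 8 | 3 | 0.095745 | 0.051948 | 0.052632 | 0.023392 | salmon |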
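
The same rule can also be written as a vectorized comparison. The sketch below applies `np.where` to the first four posterior values printed earlier. It is an alternative formulation for illustration, not the notebook's own cell.

```python
import numpy as np

# First four unnormalized posteriors, copied from the printed arrays.
posterior_salmon = np.array([0.00584795, 0.01754386, 0.05263158, 0.02923977])
posterior_seabass = np.array([0.00584795, 0.00584795, 0.00584795, 0.00584795])

# Strict ">" means a tie falls to seabass, matching the rule stated above.
labels = np.where(posterior_salmon > posterior_seabass, "salmon", "seabass")

print(labels.tolist())
```

The first interval is a tie, so it is labeled `seabass`, in agreement with row 0 of the table.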


---
## Bayes Decision Rule Based on Minimum Risk

---
### Loss Table
Next, define the loss table. There are 2 actions: $\alpha_{1}$ means choose `salmon` and $\alpha_{2}$ means choose `seabass`. The loss for each combination of chosen action (row) and true class (column) is defined in the table below

|   loss  | salmon   | seabass |
|-------- |----------| ------- | 
| salmon  |    0     |    1    |
| seabass |    1     |    0    |

In the table above, the loss is `0` when the chosen action matches the true class (both salmon or both seabass); otherwise the loss is `1`.

In [8]:
loss_table = np.array([[0, 1], [1, 0]])
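
To show how a loss table is used, the sketch below computes the conditional risk $R(\alpha_{i}|x) = \sum_{j} \lambda_{ij} P(\omega_{j}|x)$ for the 3.5-4 interval and picks the action with the smaller risk. It assumes rows of `loss_table` are actions and columns are true classes. The names `joint`, `risk`, and `best_action` are illustrative.

```python
import numpy as np

# Zero-one loss: rows are actions (salmon, seabass), columns are true classes (assumed).
loss_table = np.array([[0, 1], [1, 0]])

# Unnormalized posteriors for the 3.5-4 interval, taken from the earlier output table.
joint = np.array([0.052632, 0.011696])
posterior = joint / joint.sum()              # normalize so the two values sum to 1

# Conditional risk of each action: R(alpha_i | x) = sum_j lambda_ij * P(omega_j | x)
risk = loss_table @ posterior

actions = ["salmon", "seabass"]
best_action = actions[int(risk.argmin())]

print("risk per action:", risk)
print("minimum-risk action:", best_action)
```

With zero-one loss, the risk of an action is simply the posterior probability of the other class, so the minimum-risk action is the class with the larger posterior.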

---
### Computing the Threshold
Next, compute the threshold that will be used to decide the class of each interval. The threshold is given by

$\large threshold = \frac{(\lambda_{12} - \lambda_{22})P(seabass)}{(\lambda_{21} - \lambda_{11})P(salmon)}$

In [9]:
# compute the threshold
threshold = ((loss_table[0, 1] - loss_table[1, 1]) * p_seabass) / ((loss_table[1, 0] - loss_table[0, 0]) * p_salmon)

print('Threshold:', threshold)

Threshold: 0.8191489361702128
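
The threshold depends on both the priors and the loss values. The sketch below wraps the formula in a small helper, `risk_threshold`, a hypothetical name introduced here, and evaluates it for the zero-one loss above and for an asymmetric loss table whose values are made up purely for illustration.

```python
import numpy as np

def risk_threshold(loss, p_salmon, p_seabass):
    """Threshold on P(x|salmon) / P(x|seabass) above which salmon is chosen."""
    numerator = (loss[0, 1] - loss[1, 1]) * p_seabass
    denominator = (loss[1, 0] - loss[0, 0]) * p_salmon
    if denominator <= 0:
        raise ValueError("expected lambda_21 > lambda_11 and P(salmon) > 0")
    return numerator / denominator

# Priors as printed earlier in the notebook.
p_salmon, p_seabass = 0.5497076023391813, 0.4502923976608187

zero_one_loss = np.array([[0, 1], [1, 0]])
# Hypothetical loss: calling a seabass "salmon" costs twice as much.
asymmetric_loss = np.array([[0, 2], [1, 0]])

zero_one_threshold = risk_threshold(zero_one_loss, p_salmon, p_seabass)
asymmetric_threshold = risk_threshold(asymmetric_loss, p_salmon, p_seabass)

print("zero-one threshold:  ", zero_one_threshold)
print("asymmetric threshold:", asymmetric_threshold)
```

With zero-one loss the threshold reduces to `P(seabass) / P(salmon)`. Raising the cost of mislabeling a seabass raises the threshold, so more evidence is needed before choosing salmon.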


---
### Computing the Likelihood Ratio
The likelihood ratio is the value that will be compared with the threshold. It is computed as

$\large LR_{i} = \frac{P(x_{i}|salmon)}{P(x_{i}|seabass)}$

In [10]:
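# compute the ratio for each interval
# NOTE: this divides the unnormalized posteriors, so each value equals
# P(x|salmon) / P(x|seabass) multiplied by P(salmon) / P(seabass)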
lr = [psalmon / pseabass for psalmon, pseabass in zip(posterior_salmon, posterior_seabass)]

print('LR')
print(lr)

LR
[1.0, 3.0, 9.0, 5.0, 8.0, 11.0, 13.0, 4.5, 7.0, 2.25, 1.25, 0.8, 0.2857142857142857, 0.3333333333333333, 0.07692307692307693, 0.1, 0.1, 0.3333333333333333, 0.5, 0.3333333333333333]
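
The formula above is stated in terms of the class-conditional probabilities $P(x_{i}|\omega)$, while the values printed here are ratios of the posteriors. The sketch below computes the ratio directly from the conditional probabilities of the first four intervals (copied from the earlier output table) and shows how the two ratios relate. Variable names are illustrative.

```python
import numpy as np

# Class-conditional probabilities for the first four intervals (from the earlier table).
cond_salmon = np.array([0.010638, 0.031915, 0.095745, 0.053191])
cond_seabass = np.array([0.012987, 0.012987, 0.012987, 0.012987])

# Priors as printed earlier in the notebook.
p_salmon, p_seabass = 0.5497076023391813, 0.4502923976608187

# Likelihood ratio as defined by the formula: P(x|salmon) / P(x|seabass)
likelihood_ratio = cond_salmon / cond_seabass

# Ratio of posteriors = likelihood ratio * P(salmon) / P(seabass)
posterior_ratio = likelihood_ratio * (p_salmon / p_seabass)

print("likelihood ratio:", likelihood_ratio)
print("posterior ratio: ", posterior_ratio)
```

The two ratios differ by the constant factor `P(salmon) / P(seabass)`. Since the threshold already contains the priors, it is the likelihood ratio that should be compared against it. Comparing the posterior ratio would count the priors twice.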


---
### Klasifikasi
Finally, classify each interval with this rule: if $LR_{i}$ > `threshold`, the class of $x_{i}$ is `salmon`; otherwise it is `seabass`.

In [11]:
# classify each interval
labels = ["salmon" if a > threshold else "seabass" for a in lr]

In [12]:
output = pd.DataFrame({
    "lightness": dataset.lightness,
    "countSalmon": dataset.countSalmon,
    "countSeabass": dataset.countSeabass,
    "posteriorProbSalmon": posterior_salmon,
    "posteriorProbSeabass": posterior_seabass,
    "likelihoodRatio": lr,
    "threshold": [threshold] * len(lr),
    "label": labels
})

In [13]:
output

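|   | lightness | countSalmon | countSeabass | posteriorProbSalmon | posteriorProbSeabass | likelihoodRatio | threshold | label |
|---|-----------|-------------|--------------|---------------------|----------------------|-----------------|-----------|-------|
| 0 | 0-0.5 | 0 | 0 | 0.005848 | 0.005848 | 1.0 | 0.819149 | salmon |
| 1 | 0.5-1 | 2 | 0 | 0.017544 | 0.005848 | 3.0 | 0.819149 | salmon |
| 2 | 1-1.5 | 8 | 0 | 0.052632 | 0.005848 | 9.0 | 0.819149 | salmon |
| 3 | 1.5-2 | 4 | 0 | 0.02924 | 0.005848 | 5.0 | 0.819149 | salmon |
| 4 | 2-2.5 | 7 | 0 | 0.046784 | 0.005848 | 8.0 | 0.819149 | salmon |
| 5 | 2.5-3 | 10 | 0 | 0.064327 | 0.005848 | 11.0 | 0.819149 | salmon |
| 6 | 3-3.5 | 12 | 0 | 0.076023 | 0.005848 | 13.0 | 0.819149 | salmon |
| 7 | 3.5-4 | 8 | 1 | 0.052632 | 0.011696 | 4.5 | 0.819149 | salmon |
| 8 | 4-4.5 | 6 | 0 | 0.040936 | 0.005848 | 7.0 | 0.819149 | salmon |
| 9 | 4.5-5 | 8 | 3 | 0.052632 | 0.023392 | 2.25 | 0.819149 | salmon |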
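
With a zero-one loss table, the minimum-risk rule is expected to give the same decisions as the maximum-posterior rule. The sketch below checks this on five entries picked from the printed posterior arrays (positions 7, 9, 10, 11 and 12). The class-conditional probabilities are recovered by dividing each posterior by its prior, and all variable names are illustrative.

```python
import numpy as np

# Priors as printed earlier in the notebook.
p_salmon, p_seabass = 0.5497076023391813, 0.4502923976608187

# Unnormalized posteriors at positions 7, 9, 10, 11, 12 of the printed arrays.
post_salmon = np.array([0.05263158, 0.05263158, 0.02923977, 0.02339181, 0.01169591])
post_seabass = np.array([0.01169591, 0.02339181, 0.02339181, 0.02923977, 0.04093567])

# Maximum posterior rule: pick the class with the larger posterior.
map_labels = np.where(post_salmon > post_seabass, "salmon", "seabass")

# Minimum risk rule with zero-one loss: compare the likelihood ratio to the threshold.
loss_table = np.array([[0, 1], [1, 0]])
threshold = ((loss_table[0, 1] - loss_table[1, 1]) * p_seabass) / (
    (loss_table[1, 0] - loss_table[0, 0]) * p_salmon
)
likelihood_ratio = (post_salmon / p_salmon) / (post_seabass / p_seabass)
risk_labels = np.where(likelihood_ratio > threshold, "salmon", "seabass")

print("maximum posterior:", map_labels.tolist())
print("minimum risk:     ", risk_labels.tolist())
```

The two rules can only disagree when the loss table is not zero-one, since a different loss shifts the threshold away from `P(seabass) / P(salmon)`.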
