# DESIGN OF AN EXPERT SYSTEM APPLICATION FOR DENTAL DISEASE DIAGNOSIS USING THE K-NEAREST NEIGHBOR AND CERTAINTY FACTOR METHODS
The thesis with the title above tests two methods for diagnosing dental disease:

1. Certainty Factor
2. Certainty Factor and K-Nearest Neighbor (combined)

This Jupyter Notebook covers the second method: <b>the combined Certainty Factor and K-Nearest Neighbor method.</b>

A dataset of patient medical records is needed to carry out the calculations for this combined method. Because the thesis is limited to 6 dental diseases and 28 symptoms, 100 patient records were collected whose complaints and diseases match the predefined symptoms and diagnoses.

Each record holds the CF (certainty factor) value of every symptom for one patient, together with the diagnosed disease. The CF values are interpreted as follows:

1. 0.0 -> Does not occur
2. 0.25 -> Doubtful
3. 0.5 -> Possibly
4. 0.75 -> Most likely
5. 1.0 -> Certain
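
The scale above can be expressed as a lookup table that turns a patient's answers into the 28-value CF vector used as model input. This is only a minimal sketch: the names `CF_SCALE`, `SYMPTOMS`, and `encode_answers`, the English answer labels, and the example patient are hypothetical and are not taken from the thesis code.

```python
# Hypothetical lookup table for the five-level CF scale.
CF_SCALE = {
    "does not occur": 0.0,
    "doubtful": 0.25,
    "possibly": 0.5,
    "most likely": 0.75,
    "certain": 1.0,
}

# The 28 symptom codes GP01..GP28 used as column names in this notebook.
SYMPTOMS = [f"GP{i:02d}" for i in range(1, 29)]


def encode_answers(answers):
    """Turn {symptom code: answer label} into a 28-value CF vector.

    Symptoms that are not mentioned default to 0.0 (does not occur).
    """
    return [CF_SCALE[answers.get(code, "does not occur")] for code in SYMPTOMS]


# Made-up patient, for illustration only.
vector = encode_answers({"GP01": "possibly", "GP02": "most likely"})
print(vector[:3])
```

An unknown answer label raises a `KeyError`, which is a deliberate choice in this sketch so that typos in the input do not silently become 0.0.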

### Import Library and Data

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn import metrics

In [24]:
# Define the column names: 28 symptom columns plus the diagnosis column.
names = ['GP01','GP02','GP03','GP04','GP05', 'GP06','GP07', 'GP08', 'GP09', 'GP10', 'GP11', 'GP12', 'GP13', 'GP14', 'GP15', 'GP16', 'GP17', 'GP18', 'GP19', 'GP20', 'GP21', 'GP22', 'GP23', 'GP24', 'GP25','GP26','GP27','GP28', 'Diagnosis']

# Read the dataset into a pandas DataFrame
iris = pd.read_excel('rm1.xls', names = names)
iris

Unnamed: 0,GP01,GP02,GP03,GP04,GP05,GP06,GP07,GP08,GP09,GP10,...,GP20,GP21,GP22,GP23,GP24,GP25,GP26,GP27,GP28,Diagnosis
0,0.5,0.75,0.5,0.00,0.0,0.0,0.0,0.75,0.50,0.00,...,0.0,0.00,0.00,0.00,0.0,0.0,0.00,0.0,0.00,Gingivitis
1,0.0,0.00,0.0,0.75,0.0,1.0,0.0,0.50,0.75,0.00,...,0.0,0.00,0.00,0.00,0.0,0.0,0.00,0.0,0.00,Gingivitis
2,0.0,0.00,0.0,0.00,0.0,0.0,0.0,0.00,0.50,0.75,...,0.0,0.75,0.00,1.00,0.0,0.0,0.00,0.0,0.00,Karies Gigi
3,0.0,0.75,0.0,0.00,0.5,0.0,0.0,0.00,0.00,0.00,...,0.0,0.00,0.00,0.00,0.0,0.0,0.00,1.0,0.00,Periodontitis
4,0.0,0.00,0.0,0.00,0.0,0.0,0.0,1.00,0.25,0.00,...,0.0,0.00,0.00,0.75,0.0,0.0,0.00,0.0,0.00,Karies Gigi
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,0.0,0.00,0.0,0.00,0.0,0.0,0.0,0.00,0.00,0.00,...,0.0,0.00,0.25,0.00,0.0,0.0,0.00,0.0,0.00,Pulpitis
96,0.5,1.00,0.0,0.75,0.0,1.0,0.0,0.00,0.00,0.00,...,0.0,0.00,0.00,0.00,0.0,0.0,0.00,1.0,0.00,Periodontitis
97,0.0,0.00,0.0,0.75,1.0,1.0,0.0,0.00,0.00,0.00,...,0.0,0.00,0.00,0.00,0.5,0.0,0.25,0.0,0.00,Abses gigi
98,0.0,0.00,0.0,0.00,0.0,0.0,0.0,0.00,0.75,0.75,...,0.0,0.00,0.00,0.00,0.0,0.0,0.00,0.0,0.00,Karies Gigi
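
Before modelling, it can be worth confirming that every symptom column contains only values from the five-level CF scale. The sketch below runs on a tiny made-up frame, since `rm1.xls` is not reproduced here; `ALLOWED`, `df`, and `bad_rows` are hypothetical names, and the value 0.3 is planted on purpose to show what a violation looks like.

```python
import pandas as pd

# The only CF values the scale permits.
ALLOWED = [0.0, 0.25, 0.5, 0.75, 1.0]

# Tiny made-up stand-in for the real dataset; row 2 contains an invalid 0.3.
df = pd.DataFrame({
    "GP01": [0.5, 0.0, 0.3],
    "GP02": [0.75, 0.0, 1.0],
    "Diagnosis": ["Gingivitis", "Karies Gigi", "Pulpitis"],
})

symptoms = df.drop(columns="Diagnosis")

# True where a cell is NOT one of the allowed CF values.
invalid_cells = ~symptoms.isin(ALLOWED)

# Rows that contain at least one invalid cell.
bad_rows = df[invalid_cells.any(axis=1)]
print(bad_rows)
```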


### Data Visualization and Analysis

In [3]:
iris.shape

(100, 29)

In [4]:
iris['Diagnosis'].value_counts()

Gingivitis       17
Karies Gigi      17
Periodontitis    17
Abses gigi       17
Pulpitis         16
Stomatitis       16
Name: Diagnosis, dtype: int64

In [5]:
iris.columns

Index(['GP01', 'GP02', 'GP03', 'GP04', 'GP05', 'GP06', 'GP07', 'GP08', 'GP09',
       'GP10', 'GP11', 'GP12', 'GP13', 'GP14', 'GP15', 'GP16', 'GP17', 'GP18',
       'GP19', 'GP20', 'GP21', 'GP22', 'GP23', 'GP24', 'GP25', 'GP26', 'GP27',
       'GP28', 'Diagnosis'],
      dtype='object')

In [6]:
iris.values

array([[0.5, 0.75, 0.5, ..., 0.0, 0.0, 'Gingivitis'],
       [0.0, 0.0, 0.0, ..., 0.0, 0.0, 'Gingivitis'],
       [0.0, 0.0, 0.0, ..., 0.0, 0.0, 'Karies Gigi'],
       ...,
       [0.0, 0.0, 0.0, ..., 0.0, 0.0, 'Abses gigi'],
       [0.0, 0.0, 0.0, ..., 0.0, 0.0, 'Karies Gigi'],
       [0.0, 1.0, 0.0, ..., 0.0, 0.75, 'Gingivitis']], dtype=object)

In [7]:
iris.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 29 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   GP01       100 non-null    float64
 1   GP02       100 non-null    float64
 2   GP03       100 non-null    float64
 3   GP04       100 non-null    float64
 4   GP05       100 non-null    float64
 5   GP06       100 non-null    float64
 6   GP07       100 non-null    float64
 7   GP08       100 non-null    float64
 8   GP09       100 non-null    float64
 9   GP10       100 non-null    float64
 10  GP11       100 non-null    float64
 11  GP12       100 non-null    float64
 12  GP13       100 non-null    float64
 13  GP14       100 non-null    float64
 14  GP15       100 non-null    float64
 15  GP16       100 non-null    float64
 16  GP17       100 non-null    float64
 17  GP18       100 non-null    float64
 18  GP19       100 non-null    float64
 19  GP20       100 non-null    float64
 20  GP21       

In [8]:
iris.describe(include='all')

Unnamed: 0,GP01,GP02,GP03,GP04,GP05,GP06,GP07,GP08,GP09,GP10,...,GP20,GP21,GP22,GP23,GP24,GP25,GP26,GP27,GP28,Diagnosis
count,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,...,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100
unique,,,,,,,,,,,...,,,,,,,,,,6
top,,,,,,,,,,,...,,,,,,,,,,Gingivitis
freq,,,,,,,,,,,...,,,,,,,,,,17
mean,0.185,0.0875,0.0425,0.3,0.1525,0.155,0.035,0.05,0.095,0.07,...,0.0625,0.0725,0.1475,0.0825,0.105,0.0625,0.08,0.1625,0.03,
std,0.334506,0.244575,0.159129,0.351763,0.311592,0.323296,0.146594,0.177667,0.250706,0.222134,...,0.217234,0.233806,0.309967,0.222063,0.275653,0.211342,0.227192,0.335927,0.12949,
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
75%,0.25,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,


In [9]:
# Take all symptom values, separated from the diagnosis
X=iris.iloc[:,:28] # the first 28 columns (GP01 to GP28)
X.head()

Unnamed: 0,GP01,GP02,GP03,GP04,GP05,GP06,GP07,GP08,GP09,GP10,...,GP19,GP20,GP21,GP22,GP23,GP24,GP25,GP26,GP27,GP28
0,0.5,0.75,0.5,0.0,0.0,0.0,0.0,0.75,0.5,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.75,0.0,1.0,0.0,0.5,0.75,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.75,...,0.0,0.0,0.75,0.0,1.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.75,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.25,0.0,...,0.0,0.0,0.0,0.0,0.75,0.0,0.0,0.0,0.0,0.0


In [10]:
y=iris.iloc[:,-1] # take the diagnosis, which is the last column (Diagnosis)
y

0        Gingivitis
1        Gingivitis
2       Karies Gigi
3     Periodontitis
4       Karies Gigi
          ...      
95         Pulpitis
96    Periodontitis
97       Abses gigi
98      Karies Gigi
99       Gingivitis
Name: Diagnosis, Length: 100, dtype: object

### Data Normalization

In [25]:
X = preprocessing.MinMaxScaler().fit_transform(X) #Min Max Normalization
X

array([[0.5 , 0.75, 0.5 , ..., 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , ..., 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , ..., 0.  , 0.  , 0.  ],
       ...,
       [0.  , 0.  , 0.  , ..., 0.25, 0.  , 0.  ],
       [0.  , 0.  , 0.  , ..., 0.  , 0.  , 0.  ],
       [0.  , 1.  , 0.  , ..., 0.  , 0.  , 1.  ]])
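
Min-max normalization rescales each column with x' = (x - min) / (max - min). CF values already lie between 0 and 1, so a column only changes when its observed minimum is above 0 or its observed maximum is below 1. The following sketch uses made-up numbers to show both cases; `cf`, `scaled`, and `by_hand` are illustrative names.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up CF columns: the first spans 0..1, the second only reaches 0.75.
cf = np.array([
    [0.0, 0.0],
    [0.5, 0.25],
    [1.0, 0.75],
])

scaled = MinMaxScaler().fit_transform(cf)

# The same result by hand: x' = (x - min) / (max - min), column by column.
by_hand = (cf - cf.min(axis=0)) / (cf.max(axis=0) - cf.min(axis=0))

print(scaled)
```

The first column comes back unchanged, while the second is stretched so that 0.75 becomes 1.0. Note that the notebook fits the scaler on all 100 rows before splitting; fitting it on the training rows only and reusing it for the test rows is a common way to keep test-set statistics out of training.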

### Train Test Split

In [26]:
# Split the data into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state= 1) # test_size=0.2 holds out 20% of the data for testing
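
With only 20 test rows spread over 6 classes, a plain random split can leave some classes with very few test samples. One option, not used in this notebook, is the `stratify` parameter of `train_test_split`, which keeps class proportions roughly equal in both parts. The sketch below uses random stand-in features with the same class sizes as the dataset; `X_demo` and `y_demo` are hypothetical.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Made-up stand-in: 100 rows, 28 CF-like features, class sizes 17/17/17/17/16/16.
rng = np.random.default_rng(0)
X_demo = rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=(100, 28))
y_demo = np.array(
    ["Gingivitis"] * 17 + ["Karies Gigi"] * 17 + ["Periodontitis"] * 17
    + ["Abses gigi"] * 17 + ["Pulpitis"] * 16 + ["Stomatitis"] * 16
)

# stratify=y_demo asks for (approximately) the same class mix in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1, stratify=y_demo
)

test_counts = Counter(y_te)
print(test_counts)
```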

In [13]:
print(y_test)

80    Periodontitis
84    Periodontitis
33       Abses gigi
81    Periodontitis
93         Pulpitis
17       Abses gigi
36    Periodontitis
82    Periodontitis
69       Stomatitis
65       Stomatitis
92         Pulpitis
39       Stomatitis
56      Karies Gigi
52      Karies Gigi
51      Karies Gigi
32       Abses gigi
31       Stomatitis
44       Stomatitis
78    Periodontitis
10       Gingivitis
Name: Diagnosis, dtype: object


### Training and Predicting

In [14]:
knnmodel=KNeighborsClassifier(n_neighbors=6,metric='euclidean') # n_neighbors is the value of K for KNN
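
The notebook fixes K at 6 without showing how that value was chosen. One common way to compare candidate values is cross-validation; the sketch below illustrates the idea on made-up data and is not the procedure used in the thesis. `X_demo`, `y_demo`, `scores`, and `best_k` are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Made-up stand-in data: 3 classes whose feature means differ.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 4)) for c in range(3)])
y_demo = np.repeat(["A", "B", "C"], 30)

scores = {}
for k in range(1, 11):
    model = KNeighborsClassifier(n_neighbors=k, metric='euclidean')
    # Mean accuracy over 5 cross-validation folds.
    scores[k] = cross_val_score(model, X_demo, y_demo, cv=5).mean()

# The K with the highest mean accuracy (the first one wins on a tie).
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```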

In [15]:
knnmodel.fit(X_train,y_train) 

KNeighborsClassifier(metric='euclidean', n_neighbors=6)

In [16]:
y_predict1=knnmodel.predict(X_test)
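
To make the prediction step concrete, here is a bare-bones version of what a KNN classifier with Euclidean distance does for a single query: measure the distance to every training row, keep the K closest, and take a majority vote. It is a simplified sketch on a made-up two-symptom dataset, not a reimplementation of scikit-learn; `knn_predict`, `X_toy`, and `y_toy` are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k):
    """Predict the label of x by majority vote among its k nearest training rows."""
    # Euclidean distance from x to every training row.
    distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Count the labels of those neighbours and return the most frequent one.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Made-up two-symptom example with two diagnoses.
X_toy = np.array([[0.0, 0.0], [0.25, 0.0], [0.0, 0.25],
                  [1.0, 1.0], [0.75, 1.0], [1.0, 0.75]])
y_toy = np.array(["Gingivitis"] * 3 + ["Pulpitis"] * 3)

print(knn_predict(X_toy, y_toy, np.array([0.1, 0.1]), k=3))
```

With an even K such as 6 and six possible classes, votes can tie; an odd K or distance-weighted voting are common ways to make ties less likely.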

### Output Visualization

In [17]:
prediction_output=pd.DataFrame(data=[y_predict1,y_test.values],index=['Predicted Output','Actual Output'])

In [18]:
prediction_output.iloc[0,:].value_counts()

Periodontitis    6
Stomatitis       4
Karies Gigi      4
Pulpitis         3
Abses gigi       2
Gingivitis       1
Name: Predicted Output, dtype: int64

In [19]:
prediction_output.transpose()

Unnamed: 0,Predicted Output,Actual Output
0,Periodontitis,Periodontitis
1,Periodontitis,Periodontitis
2,Abses gigi,Abses gigi
3,Periodontitis,Periodontitis
4,Pulpitis,Pulpitis
5,Pulpitis,Abses gigi
6,Periodontitis,Periodontitis
7,Periodontitis,Periodontitis
8,Stomatitis,Stomatitis
9,Stomatitis,Stomatitis
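
Filtering the comparison table for rows where the two columns differ is a quick way to locate the misclassified cases. The sketch below rebuilds only the ten rows visible in the output above; the variable names `shown` and `mismatches` are illustrative.

```python
import pandas as pd

# The ten rows visible in the output above (the full test set has 20 rows).
shown = pd.DataFrame({
    "Predicted Output": ["Periodontitis", "Periodontitis", "Abses gigi", "Periodontitis",
                         "Pulpitis", "Pulpitis", "Periodontitis", "Periodontitis",
                         "Stomatitis", "Stomatitis"],
    "Actual Output": ["Periodontitis", "Periodontitis", "Abses gigi", "Periodontitis",
                      "Pulpitis", "Abses gigi", "Periodontitis", "Periodontitis",
                      "Stomatitis", "Stomatitis"],
})

# Keep only the rows where prediction and ground truth disagree.
mismatches = shown[shown["Predicted Output"] != shown["Actual Output"]]
print(mismatches)
```

Among these ten rows only row 5 differs: an actual Abses gigi case predicted as Pulpitis.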


### Accuracy

In [20]:
from sklearn.metrics import accuracy_score

In [21]:
acc=accuracy_score(y_test,y_predict1)
print(acc)

0.9
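
Accuracy is simply the fraction of test rows whose predicted label equals the actual label, so 0.9 on 20 test rows corresponds to 18 correct predictions. The sketch below checks that definition against `accuracy_score` on a small set of made-up labels; `y_true_demo` and `y_pred_demo` are hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels: 4 of the 5 predictions are correct.
y_true_demo = np.array(["Pulpitis", "Gingivitis", "Stomatitis", "Pulpitis", "Abses gigi"])
y_pred_demo = np.array(["Pulpitis", "Gingivitis", "Stomatitis", "Pulpitis", "Pulpitis"])

# Fraction of positions where prediction and truth agree.
manual = (y_true_demo == y_pred_demo).mean()

print(manual, accuracy_score(y_true_demo, y_pred_demo))
```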


### Confusion Matrix

In [22]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
# confusion_matrix orders classes by sorted label unless told otherwise,
# so the order is fixed explicitly and reused for the row and column names.
labels=['Abses gigi', 'Gingivitis', 'Karies Gigi', 'Periodontitis', 'Pulpitis', 'Stomatitis']
cm=confusion_matrix(y_test.values,y_predict1,labels=labels)
cr=classification_report(y_test.values,y_predict1)
cm1=pd.DataFrame(data=cm,index=labels,columns=labels) # rows: actual diagnosis, columns: predicted diagnosis
cm1

Unnamed: 0,Abses gigi,Gingivitis,Karies Gigi,Periodontitis,Pulpitis,Stomatitis
Abses gigi,2,0,0,0,1,0
Gingivitis,0,1,0,0,0,0
Karies Gigi,0,0,3,0,0,0
Periodontitis,0,0,0,6,0,0
Pulpitis,0,0,0,0,2,0
Stomatitis,0,0,1,0,0,4


In [23]:
print('\n\nclassification report: \n\n',cr)



classification report: 

                precision    recall  f1-score   support

   Abses gigi       1.00      0.67      0.80         3
   Gingivitis       1.00      1.00      1.00         1
  Karies Gigi       0.75      1.00      0.86         3
Periodontitis       1.00      1.00      1.00         6
     Pulpitis       0.67      1.00      0.80         2
   Stomatitis       1.00      0.80      0.89         5

     accuracy                           0.90        20
    macro avg       0.90      0.91      0.89        20
 weighted avg       0.93      0.90      0.90        20
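
The figures in the classification report can be recomputed directly from the confusion matrix counts. The sketch below assumes that the rows (actual) and columns (predicted) follow scikit-learn's sorted label order, which is what the per-class supports in the report (3, 1, 3, 6, 2, 5) imply; `labels`, `cm_counts`, and the derived arrays are illustrative names.

```python
import numpy as np

# Counts copied from the confusion matrix output; rows = actual, columns = predicted.
# Assumption: classes are in sorted label order, matching the supports in the report.
labels = ['Abses gigi', 'Gingivitis', 'Karies Gigi', 'Periodontitis', 'Pulpitis', 'Stomatitis']
cm_counts = np.array([
    [2, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 3, 0, 0, 0],
    [0, 0, 0, 6, 0, 0],
    [0, 0, 0, 0, 2, 0],
    [0, 0, 1, 0, 0, 4],
])

correct = np.diag(cm_counts)
precision = correct / cm_counts.sum(axis=0)  # correct / everything predicted as that class
recall = correct / cm_counts.sum(axis=1)     # correct / everything that actually is that class
accuracy = correct.sum() / cm_counts.sum()

for name, p, r in zip(labels, precision, recall):
    print(f"{name:<14} precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Read this way, the two errors are one Abses gigi case predicted as Pulpitis and one Stomatitis case predicted as Karies Gigi, which is why Pulpitis and Karies Gigi have precision below 1.00 while Abses gigi and Stomatitis have recall below 1.00.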

