# Diagnese Model Testing

## A. Summary & How to Use
This document shows the prediction results of the saved `h5` model when it is tested on **"Testing.csv"**.

## B. How to use H5 model?
### i. Data Processing
1. Load the **"Testing.csv"** dataset using `pandas`.
2. Separate the **feature** columns (x) from the **target/label** column (y). The features are the **symptoms** in columns 1-131, and the label is column 132, named **prognosis**, in **Testing.csv**.
3. Convert the labels to numeric values using `LabelEncoder` from **Scikit-Learn**.
4. Load **"Deskripsi_dan_Dokter.csv"** using `csv`.

### ii. Load hdf5 Model
Load the `h5` model from the **Model h5** folder.

### iii. Predict Model
Pass the 41 rows of the dataset to the loaded `h5` model, then loop over the rows to report each prediction.
 

## C. Data Processing
This step prepares **"Testing.csv"** as the input for prediction and **"Deskripsi_dan_Dokter.csv"** as the lookup table used to retrieve a description and a specialist for each predicted disease.
### i. Import Libraries
The following libraries are imported:

- TensorFlow
- Keras (`load_model`)
- Pandas
- NumPy
- csv
- Scikit-Learn (`LabelEncoder`)

In [18]:
import tensorflow as tf
import pandas as pd
import numpy as np
import csv

from sklearn.preprocessing import LabelEncoder

from keras.models import load_model

### ii. Load Dataset
Load **"Testing.csv"** from the **Dataset** folder using `pandas`. The file is semicolon-delimited, hence `delimiter=';'`. **"Deskripsi_dan_Dokter.csv"** is read later using `csv`.

In [25]:
test_df = pd.read_csv('Dataset/Testing.csv', header=0, delimiter=';')

### iii. Separate Feature and Label
Separate the features from the label using the `drop` method. The target label of the dataset is the **prognosis** column.

In [6]:
X_test = test_df.drop('prognosis', axis=1)  # Features
y_test = test_df['prognosis']  # Target variable
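
The split can be checked on a small sample before running it on the full file. The following is a minimal sketch, assuming a semicolon-delimited layout like the one used by **Testing.csv**; the three symptom columns and two rows in `sample_csv` are hypothetical and only stand in for the real 131 features.

```python
import io
import pandas as pd

# Hypothetical three-symptom sample in the same layout as Testing.csv:
# semicolon-delimited, one 0/1 column per symptom, label in the last column.
sample_csv = io.StringIO(
    "muntah;batuk;mual;prognosis\n"
    "1;0;1;Vertigo Posisional Paroksismal\n"
    "0;1;0;Asma Bronkial\n"
)

sample_df = pd.read_csv(sample_csv, header=0, delimiter=';')

# Split features and label the same way as in the cell above.
X_sample = sample_df.drop('prognosis', axis=1)
y_sample = sample_df['prognosis']

# Sanity checks: the label column is gone from the features,
# and every feature value is binary.
assert 'prognosis' not in X_sample.columns
assert X_sample.isin([0, 1]).all().all()
print(X_sample.shape, y_sample.shape)
```

On the real data, the same checks would be expected to report 131 feature columns and 41 rows, as described in section B.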

### iv. Label Encoder
Convert the label y into numeric values using `LabelEncoder` from **Scikit-Learn**.

The label y (prognosis) in **"Testing.csv"** is a string, so it needs to be converted to numeric form before the prediction step.

Then use the `classes_` attribute to list the encoded classes. A label's numeric code is its position in this list.

In [8]:
# Create the LabelEncoder object
label_encoder = LabelEncoder()

# Convert the string labels to integers
y_test_encoded = label_encoder.fit_transform(y_test)

In [9]:
# Show list of encoded
list(label_encoder.classes_)

['AIDS',
 'Alergi',
 'Artritis',
 'Asma Bronkial',
 'Cacar air',
 'Demam berdarah',
 'Diabetes',
 'GERD',
 'Gastroenteritis',
 'Hemoroid dimorfik (ambeien)',
 'Hepatitis A',
 'Hepatitis Alkoholik',
 'Hepatitis B',
 'Hepatitis C',
 'Hepatitis D',
 'Hepatitis E',
 'Hipertensi',
 'Hipertiroidisme',
 'Hipoglikemia',
 'Hipotiroidisme',
 'Impetigo',
 'Infeksi jamur',
 'Infeksi saluran kemih',
 'Jerawat',
 'Kolestasis kronis',
 'Kuning (penyakit kuning)',
 'Malaria',
 'Migraine',
 'Osteoartritis',
 'Paralisis (pendarahan otak)',
 'Penyakit ulkus peptikum',
 'Pilek biasa',
 'Pneumonia',
 'Psoriasis',
 'Reaksi obat',
 'Serangan jantung',
 'Spondilosis Serviks',
 'Tuberculosis',
 'Typus',
 'Varises',
 'Vertigo Posisional Paroksismal']
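
How the encoder assigns codes can be seen on a small example. This is a minimal sketch with a hypothetical four-item label list; it illustrates that `LabelEncoder` sorts the classes, so a code is simply a position in the sorted list. Note that using an encoder fitted on the test labels to decode model outputs assumes the model was trained with the same sorted set of classes.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical subset of the prognosis labels, deliberately unsorted.
labels = ['Malaria', 'AIDS', 'Jerawat', 'AIDS']

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ is sorted, so each integer code is a position in that sorted list.
print(encoder.classes_.tolist())   # ['AIDS', 'Jerawat', 'Malaria']
print(encoded.tolist())            # [2, 0, 1, 0]

# inverse_transform maps integer codes back to the original strings.
decoded = encoder.inverse_transform(encoded)
print(decoded.tolist())            # ['Malaria', 'AIDS', 'Jerawat', 'AIDS']
```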

### v. Summarize the features with value 1
This step lists, for each row and its label y, the features (symptoms) whose value is 1.

In [13]:
# List the features with value 1 for each row of the testing data
for index, row in X_test.iterrows():
    class_with_1 = ', '.join(row[row == 1].index)
    prognosis = y_test[index]
    print(f"{index + 1}.{prognosis} : \n    Classes with value 1 - {class_with_1}")

1.Vertigo Posisional Paroksismal : 
    Classes with value 1 - muntah, sakit_kepala, mual, kaku_saat_ingin_bergerak, sensasi_berputar, kehilangan_keseimbangan
2.Jerawat : 
    Classes with value 1 - ruam_kulit, jerawat_bernanah, komedo, menggaruk
3.AIDS : 
    Classes with value 1 - otot_mengecil, bercak_di_tenggorokan, demam_tinggi, rasa_lapar_berlebihan
4.Hepatitis Alkoholik : 
    Classes with value 1 - muntah, kulit_kekuningan, nyeri_perut, kelebihan_cairan, pendarahan_perut, perut_kembung, riwayat_konsumsi_alkohol
5.Alergi : 
    Classes with value 1 - bersin-bersin, menggigil, merinding, perubahan_warna_kulit_di_area_tertentu
6.Artritis : 
    Classes with value 1 - nyeri_sendi_panggul, otot_melemah, leher_kaku, pembengkakan_sendi, nyeri_saat_berjalan
7.Asma Bronkial : 
    Classes with value 1 - kelelahan, batuk, demam_tinggi, sesak_napas, air_kencing_berlebih, penyakit_keturunan
8.Spondilosis Serviks : 
    Classes with value 1 - sakit_punggung, nyeri_dada, iritasi_di_anus, nye
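
The same summary can also be collected into a dictionary instead of being printed, which makes it easier to inspect one disease at a time. This is a minimal sketch on a hypothetical two-row frame (`X_toy`, `y_toy`); the symptom names are borrowed from the output above, but the values are made up.

```python
import pandas as pd

# Hypothetical two-row feature frame and labels.
X_toy = pd.DataFrame(
    {'ruam_kulit': [0, 1], 'muntah': [1, 0], 'komedo': [0, 1]}
)
y_toy = pd.Series(['Vertigo Posisional Paroksismal', 'Jerawat'])

# For each row, keep the names of the columns whose value is 1.
# zip pairs rows and labels by position, so it does not depend on the index.
active = {
    label: row[row == 1].index.tolist()
    for (_, row), label in zip(X_toy.iterrows(), y_toy)
}
print(active)
```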

## D. Load HDF5 Model
The loaded `h5` model is the best model produced by **diagnese_model.ipynb**. It was saved in `hdf5`/`h5` format, so it can be loaded either for re-training or just for making predictions.

In [10]:
model = load_model('Model h5/model.h5')

## E. Prediction with HDF5 Model
This step is the main topic of this document: the loaded `h5` model is used to make predictions on the previously processed **"Testing.csv"** data.

### i. Making prediction 
Predict every row of the **Testing.csv** dataset, then loop over the rows with the `iterrows` method:
1. Find the index with the highest probability; this is the **prediction**.
2. Convert that index into a label using the classes of the `LabelEncoder`.
3. `print` the `prediction` next to the `actual` label to see how many predictions are correct and how many are incorrect.

On the 41 rows of **Testing.csv**, the model predicted 39 rows **correctly** and 2 rows **incorrectly**.

In [15]:
# Make predictions with the model
predictions_test_prob = model.predict(X_test)
for index, row in X_test.iterrows():
    input_data = row # Retrieving input data from each row
    prediction_prob = predictions_test_prob[index] # Retrieving prediction probabilities for the input data
    max_prob_index = np.argmax(prediction_prob) # Finding the index of the highest probability
    max_prob_label = label_encoder.classes_[max_prob_index] # Finding the label with the highest probability
    max_prob = prediction_prob[max_prob_index] # Retrieving the highest probability
    actual_label = y_test[index] # Retrieving the actual label from y_test
    print(f"Prediction: {max_prob_label} ({max_prob}), Actual Label: {actual_label}\n")
    #print(f"Input: {input_data}, Prediction: {max_prob_label} ({max_prob}), Actual Label: {actual_label}")
    #print(f"Input: {input_data}, Prediction: {max_prob_label}, Actual Label: {actual_label}")

Prediction: Vertigo Posisional Paroksismal (0.8745632767677307), Actual Label: Vertigo Posisional Paroksismal

Prediction: Jerawat (0.934771716594696), Actual Label: Jerawat

Prediction: AIDS (0.8159803152084351), Actual Label: AIDS

Prediction: Hepatitis Alkoholik (0.9026685357093811), Actual Label: Hepatitis Alkoholik

Prediction: Alergi (0.7802639603614807), Actual Label: Alergi

Prediction: Artritis (0.7720317244529724), Actual Label: Artritis

Prediction: Asma Bronkial (0.8591945171356201), Actual Label: Asma Bronkial

Prediction: Spondilosis Serviks (0.14554649591445923), Actual Label: Spondilosis Serviks

Prediction: Cacar air (0.9057180285453796), Actual Label: Cacar air

Prediction: Kolestasis kronis (0.8439196944236755), Actual Label: Kolestasis kronis

Prediction: Pilek biasa (0.9612761735916138), Actual Label: Pilek biasa

Prediction: Demam berdarah (0.9285697340965271), Actual Label: Demam berdarah

Prediction: Diabetes (0.6149367690086365), Actual Label: Diabetes

Predict
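
The count of correct and incorrect predictions does not have to be read off the printed lines. The following is a minimal sketch of computing it directly, assuming `probs` plays the role of `model.predict(X_test)` and `y_true` the role of `y_test_encoded`; both arrays here are hypothetical three-class stand-ins.

```python
import numpy as np

# Hypothetical stand-ins: in the notebook, `probs` would be model.predict(X_test)
# and `y_true` would be y_test_encoded.
probs = np.array([
    [0.80, 0.15, 0.05],
    [0.10, 0.70, 0.20],
    [0.40, 0.35, 0.25],
])
y_true = np.array([0, 1, 2])

# Highest-probability class for each row.
y_pred = np.argmax(probs, axis=1)

n_correct = int(np.sum(y_pred == y_true))
n_wrong = len(y_true) - n_correct
accuracy = n_correct / len(y_true)

# Row positions of the misclassified samples, for manual inspection.
wrong_rows = np.flatnonzero(y_pred != y_true)
print(f"correct={n_correct}, incorrect={n_wrong}, accuracy={accuracy:.3f}")
```

With the figures reported in this document (39 correct out of 41), the same calculation gives an accuracy of about 0.951.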

### ii. Get Prediction with Description and Specialist
Here the model predicts a single row of `X_test`, selected by index, and the predicted label is then used to look up the description and the specialist doctor in the **"Deskripsi_dan_Dokter.csv"** dataset.
1. Define a function that returns the disease description and the doctor's specialty for a given disease prediction.
2. To retrieve a row of **"Testing.csv"** by position, use the `iloc` indexer with the index of the row to predict.
3. Find the index with the highest probability as the prediction, convert it into a label using the classes of the `LabelEncoder`, and finally `print` the `prognosis` together with the matching `deskripsi` and `dokter_spesialis`.

In this example, the function returns the correct `deskripsi` and `dokter_spesialis` for the predicted `prognosis`.

In [26]:
def deskripsi_dan_dokter(prediksi):
    """
    Return the description and the doctor's specialty for a disease prediction.

    Parameters:
    - prediksi (str): Disease predicted by the model.

    Returns:
    - deskripsi (str): Description of the disease that matches the prediction.
    - dokter_spesialis (str): Specialist doctor that matches the prediction.

    If the prediction is found in the 'Prognosis' column of 'Deskripsi_dan_Dokter.csv',
    the function returns the 'Deskripsi' and 'Spesialis' values of that row.
    If it is not found, the function returns the messages "Penyakit tidak dikenali"
    ("Disease not recognized") and "Tidak ada spesialis yang sesuai"
    ("No corresponding specialist").
    """

    with open('Deskripsi_dan_Dokter.csv') as file:
        prognosis = csv.DictReader(file)
        for row in prognosis:
            if row['Prognosis'] == prediksi:
                return row['Deskripsi'], row['Spesialis']

    return "Penyakit tidak dikenali", "Tidak ada spesialis yang sesuai"


In [24]:
# Example input
input_data = X_test.iloc[30]

# Reshape the input into a batch of one row for prediction
input_data = np.array([input_data])

predictions_prob = model.predict(input_data)
max_prob_index = np.argmax(predictions_prob)  # Index of the highest probability
max_prob_label = label_encoder.classes_[max_prob_index]  # Class label with the highest probability
#print("Prognosis:", max_prob_label)

# Example usage
prediksi = max_prob_label
deskripsi, dokter_spesialis = deskripsi_dan_dokter(prediksi)

#print("Deskripsi:", deskripsi)
#print("Rekomendasi dokter: Dokter", dokter_spesialis)

print(f"Prognosis Anda adalah: {prediksi} \n{prediksi} adalah {deskripsi} \nAnda kami sarankan untuk mengunjungi {dokter_spesialis} terdekat")

Prognosis Anda adalah: Malaria 
Malaria adalah Penyakit yang disebabkan oleh parasit Plasmodium yang ditularkan melalui gigitan nyamuk Anopheles. 
Anda kami sarankan untuk mengunjungi Spesialis Penyakit Infeksi terdekat
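
When many predictions need a description, reopening the CSV for every call can be avoided by reading it once into a dictionary. This is a minimal sketch of that variant, not the notebook's method: `lookup`, `describe`, and `NOT_FOUND` are hypothetical names, the in-memory file assumes a comma-delimited layout with the `Prognosis`, `Deskripsi`, and `Spesialis` columns used above, and the description texts are placeholders.

```python
import csv
import io

# Hypothetical excerpt with the column names the function above reads;
# the description texts are placeholders, not the real file contents.
sample_file = io.StringIO(
    "Prognosis,Deskripsi,Spesialis\n"
    "Malaria,placeholder description A,Spesialis Penyakit Infeksi\n"
    "Jerawat,placeholder description B,placeholder specialist\n"
)

# Read the file once and index the rows by prognosis.
lookup = {
    row['Prognosis']: (row['Deskripsi'], row['Spesialis'])
    for row in csv.DictReader(sample_file)
}

# Same fallback messages as deskripsi_dan_dokter.
NOT_FOUND = ("Penyakit tidak dikenali", "Tidak ada spesialis yang sesuai")


def describe(prediksi):
    """Return (description, specialist), or the fallback pair if unknown."""
    return lookup.get(prediksi, NOT_FOUND)


print(describe('Malaria'))
print(describe('Unknown'))
```

With the real file, `sample_file` would be replaced by the opened **"Deskripsi_dan_Dokter.csv"**.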


## F. Conclusion
The `h5` model performs **well** on this test: it made only **2 incorrect predictions** out of 41 input rows (39/41, about 95% accuracy), and the lookup function returned the **correct** `Deskripsi` and `Dokter Spesialis` recommendation for the predicted disease. Based on these results, the existing h5 model is considered **ready for use**.