# **Project Overview**

## Background
A symptom-based disease recommendation system can give users preliminary information about the medical conditions they may be experiencing and the medications commonly associated with those conditions. This project builds a recommender that suggests diseases from the symptoms a user enters and then displays medication information for the recommended diseases.

Dataset source: [Kaggle](https://www.kaggle.com/datasets/noorsaeed/medicine-recommendation-system-dataset/data)

Reference:
Komal Kumar, N., & Vigneswari, D. (2019, December). A drug recommendation system for multi-disease in health care using machine learning. In International Conference on Advanced Communication and Computational Technology (pp. 1-12). Singapore: Springer Nature Singapore.

# **Business Understanding**

## Problem Statements
1.  How can diseases be recommended to a user based on the set of symptoms they are experiencing?
2.  How can relevant medication information be presented for the recommended diseases?
3.  Which algorithm or approach is suitable for building this recommendation system?

## Goals
1.  Build a system that accepts symptoms entered by the user and returns a list of recommended diseases.
2.  Display the medications commonly used for the recommended diseases.
3.  Apply *content-based filtering* to recommend diseases based on symptom similarity.

## Solution Approach
The main approach is **Content-Based Filtering** for disease recommendation, followed by a lookup of medication information.
1.  **Disease feature representation**: Each disease is represented by its symptom profile.
2.  **Medication feature representation**: A second dataset maps each disease to its medications.
3.  **User input**: The user enters one or more symptoms.
4.  **Disease recommendation mechanism**: The system computes the similarity between the input symptoms and each disease's symptom profile.
5.  **Displaying medications**: For each recommended disease, the system looks up and displays the associated medications.
6.  **Output**: The top-N recommended diseases together with their medications.
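
The six steps above can be sketched end to end in a few lines. This is a minimal illustration rather than the notebook's implementation: the toy `DISEASE_SYMPTOMS` and `DISEASE_MEDS` dictionaries and the `recommend` helper are hypothetical, and plain binary cosine similarity over symptom sets stands in for the TF-IDF weighting used later.

```python
import math

# Hypothetical toy data: a few symptoms and medications per disease.
DISEASE_SYMPTOMS = {
    "fungal infection": {"itching", "skin_rash", "nodal_skin_eruptions"},
    "allergy": {"continuous_sneezing", "shivering", "chills"},
    "gerd": {"stomach_pain", "acidity", "vomiting"},
}
DISEASE_MEDS = {
    "fungal infection": ["Antifungal Cream", "Fluconazole"],
    "allergy": ["Antihistamines"],
    "gerd": ["Proton Pump Inhibitors (PPIs)"],
}


def binary_cosine(a, b):
    """Cosine similarity of two sets viewed as 0/1 vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(user_symptoms, top_n=2):
    """Return up to top_n (disease, score, medications) tuples with score > 0."""
    query = {s.strip().lower() for s in user_symptoms if s.strip()}
    scored = [(d, binary_cosine(query, p)) for d, p in DISEASE_SYMPTOMS.items()]
    scored = [(d, s) for d, s in scored if s > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(d, round(s, 2), DISEASE_MEDS.get(d, [])) for d, s in scored[:top_n]]


results = recommend(["itching", " Skin_Rash "])
print(results)
```

Only diseases that share at least one symptom with the query are returned, so the result can contain fewer than `top_n` entries.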

In [65]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import ast
from google.colab import files
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

# Load Data

In [39]:
# Load dataset 1 (symptoms)
df_symptoms = pd.read_csv('symtoms_df.csv')

In [40]:
df_symptoms.head()

Unnamed: 0.1,Unnamed: 0,Disease,Symptom_1,Symptom_2,Symptom_3,Symptom_4
0,0,Fungal infection,itching,skin_rash,nodal_skin_eruptions,dischromic _patches
1,1,Fungal infection,skin_rash,nodal_skin_eruptions,dischromic _patches,
2,2,Fungal infection,itching,nodal_skin_eruptions,dischromic _patches,
3,3,Fungal infection,itching,skin_rash,dischromic _patches,
4,4,Fungal infection,itching,skin_rash,nodal_skin_eruptions,


In [41]:
df_symptoms.tail()

Unnamed: 0.1,Unnamed: 0,Disease,Symptom_1,Symptom_2,Symptom_3,Symptom_4
4915,4915,(vertigo) Paroymsal Positional Vertigo,vomiting,headache,nausea,spinning_movements
4916,4916,Acne,skin_rash,pus_filled_pimples,blackheads,scurring
4917,4917,Urinary tract infection,burning_micturition,bladder_discomfort,foul_smell_of urine,continuous_feel_of_urine
4918,4918,Psoriasis,skin_rash,joint_pain,skin_peeling,silver_like_dusting
4919,4919,Impetigo,skin_rash,high_fever,blister,red_sore_around_nose


In [42]:
df_symptoms.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4920 entries, 0 to 4919
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  4920 non-null   int64 
 1   Disease     4920 non-null   object
 2   Symptom_1   4920 non-null   object
 3   Symptom_2   4920 non-null   object
 4   Symptom_3   4920 non-null   object
 5   Symptom_4   4572 non-null   object
dtypes: int64(1), object(5)
memory usage: 230.8+ KB


In [43]:
df_symptoms.describe()

Unnamed: 0.1,Unnamed: 0
count,4920.0
mean,2459.5
std,1420.425992
min,0.0
25%,1229.75
50%,2459.5
75%,3689.25
max,4919.0


In [44]:
print(df_symptoms['Symptom_1'].unique())

['itching' ' skin_rash' ' continuous_sneezing' ' shivering'
 ' stomach_pain' ' acidity' ' vomiting' ' indigestion' ' muscle_wasting'
 ' patches_in_throat' ' fatigue' ' weight_loss' ' sunken_eyes' ' cough'
 ' headache' ' chest_pain' ' back_pain' ' weakness_in_limbs' ' chills'
 ' joint_pain' ' yellowish_skin' ' constipation'
 ' pain_during_bowel_movements' ' breathlessness' ' cramps' ' weight_gain'
 ' mood_swings' ' neck_pain' ' muscle_weakness' ' stiff_neck'
 ' pus_filled_pimples' ' burning_micturition' ' bladder_discomfort'
 ' high_fever']


In [45]:
df_symptoms.Symptom_2.unique()

array([' skin_rash', ' nodal_skin_eruptions', ' shivering', ' chills',
       ' acidity', ' ulcers_on_tongue', ' vomiting', ' yellowish_skin',
       ' stomach_pain', ' loss_of_appetite', ' indigestion',
       ' patches_in_throat', ' high_fever', ' weight_loss',
       ' restlessness', ' sunken_eyes', ' dehydration', ' cough',
       ' chest_pain', ' dizziness', ' headache', ' weakness_in_limbs',
       ' neck_pain', ' weakness_of_one_body_side', ' fatigue',
       ' joint_pain', ' lethargy', ' nausea', ' abdominal_pain',
       ' pain_during_bowel_movements', ' pain_in_anal_region',
       ' breathlessness', ' sweating', ' cramps', ' bruising',
       ' weight_gain', ' cold_hands_and_feets', ' mood_swings',
       ' anxiety', ' knee_pain', ' stiff_neck', ' swelling_joints',
       ' pus_filled_pimples', ' blackheads', ' bladder_discomfort',
       ' foul_smell_of urine', ' skin_peeling', ' blister'], dtype=object)

In [46]:
df_symptoms.Symptom_3.unique()

array([' nodal_skin_eruptions', ' dischromic _patches', ' chills',
       ' watering_from_eyes', ' ulcers_on_tongue', ' vomiting',
       ' yellowish_skin', ' nausea', ' stomach_pain',
       ' burning_micturition', ' abdominal_pain', ' loss_of_appetite',
       ' high_fever', ' extra_marital_contacts', ' restlessness',
       ' lethargy', ' dehydration', ' diarrhoea', ' breathlessness',
       ' dizziness', ' loss_of_balance', ' headache',
       ' blurred_and_distorted_vision', ' neck_pain',
       ' weakness_of_one_body_side', ' altered_sensorium', ' fatigue',
       ' weight_loss', ' sweating', ' joint_pain', ' dark_urine',
       ' swelling_of_stomach', ' cough', ' pain_in_anal_region',
       ' bloody_stool', ' chest_pain', ' bruising', ' obesity',
       ' cold_hands_and_feets', ' mood_swings', ' anxiety', ' knee_pain',
       ' hip_joint_pain', ' swelling_joints', ' movement_stiffness',
       ' spinning_movements', ' blackheads', ' scurring',
       ' foul_smell_of urine', ' c

In [47]:
df_symptoms.Symptom_4.unique()

array([' dischromic _patches', nan, ' watering_from_eyes', ' vomiting',
       ' cough', ' nausea', ' loss_of_appetite', ' burning_micturition',
       ' spotting_ urination', ' passage_of_gases', ' abdominal_pain',
       ' extra_marital_contacts', ' lethargy', ' irregular_sugar_level',
       ' diarrhoea', ' breathlessness', ' family_history',
       ' loss_of_balance', ' lack_of_concentration',
       ' blurred_and_distorted_vision', ' excessive_hunger', ' dizziness',
       ' altered_sensorium', ' weight_loss', ' high_fever', ' sweating',
       ' headache', ' fatigue', ' dark_urine', ' yellowish_skin',
       ' yellowing_of_eyes', ' swelling_of_stomach',
       ' distention_of_abdomen', ' bloody_stool', ' irritation_in_anus',
       ' chest_pain', ' obesity', ' swollen_legs', ' mood_swings',
       ' restlessness', ' hip_joint_pain', ' swelling_joints',
       ' movement_stiffness', ' painful_walking', ' spinning_movements',
       ' scurring', ' continuous_feel_of_urine', ' silve

In [48]:
print(f"Number of unique diseases: {df_symptoms['Disease'].nunique()}")

Number of unique diseases: 41


In [49]:
# Count the unique symptoms across all four symptom columns
symptoms = set()
for col in ['Symptom_1', 'Symptom_2', 'Symptom_3', 'Symptom_4']:
    # .str.strip() removes the leading space present in the raw values
    symptoms.update(df_symptoms[col].dropna().str.strip())
print(f"Number of unique symptoms: {len(symptoms)}")

Number of unique symptoms: 86


In [50]:
# View the unique symptoms
list(symptoms)

['pain_during_bowel_movements',
 'lethargy',
 'acidity',
 'passage_of_gases',
 'yellow_crust_ooze',
 'obesity',
 'cramps',
 'skin_rash',
 'dark_urine',
 'painful_walking',
 'blister',
 'loss_of_appetite',
 'blurred_and_distorted_vision',
 'stiff_neck',
 'continuous_sneezing',
 'weight_gain',
 'chills',
 'scurring',
 'extra_marital_contacts',
 'abdominal_pain',
 'patches_in_throat',
 'lack_of_concentration',
 'diarrhoea',
 'skin_peeling',
 'chest_pain',
 'foul_smell_of urine',
 'headache',
 'weight_loss',
 'bruising',
 'movement_stiffness',
 'shivering',
 'weakness_in_limbs',
 'family_history',
 'knee_pain',
 'small_dents_in_nails',
 'bladder_discomfort',
 'swelling_of_stomach',
 'distention_of_abdomen',
 'bloody_stool',
 'hip_joint_pain',
 'dischromic _patches',
 'irregular_sugar_level',
 'spinning_movements',
 'stomach_pain',
 'nausea',
 'blackheads',
 'breathlessness',
 'restlessness',
 'yellowish_skin',
 'indigestion',
 'itching',
 'anxiety',
 'muscle_wasting',
 'constipation',
 'sw

In [51]:
# Load dataset 2 (medications per disease)
df_medicine = pd.read_csv('medications.csv')
df_medicine.head()

Unnamed: 0,Disease,Medication
0,Fungal infection,"['Antifungal Cream', 'Fluconazole', 'Terbinafi..."
1,Allergy,"['Antihistamines', 'Decongestants', 'Epinephri..."
2,GERD,"['Proton Pump Inhibitors (PPIs)', 'H2 Blockers..."
3,Chronic cholestasis,"['Ursodeoxycholic acid', 'Cholestyramine', 'Me..."
4,Drug Reaction,"['Antihistamines', 'Epinephrine', 'Corticoster..."


In [52]:
df_medicine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41 entries, 0 to 40
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Disease     41 non-null     object
 1   Medication  41 non-null     object
dtypes: object(2)
memory usage: 788.0+ bytes


In [53]:
df_medicine.describe(include='all')

Unnamed: 0,Disease,Medication
count,41,41
unique,41,38
top,Fungal infection,"['Antiviral drugs', 'IV fluids', 'Blood transf..."
freq,1,3


## Initial Analysis
### Symptom-Disease Dataset

In [54]:
# Number of unique diseases
print(f"\nNumber of unique diseases in df_symptoms: {df_symptoms['Disease'].nunique()}")

# Number of unique symptoms (all symptom columns combined)
symptoms_set = set()
for col in ['Symptom_1', 'Symptom_2', 'Symptom_3', 'Symptom_4']:
    symptoms_set.update(df_symptoms[col].dropna().str.strip().str.lower())
print(f"Number of unique symptoms in df_symptoms: {len(symptoms_set)}")


Number of unique diseases in df_symptoms: 41
Number of unique symptoms in df_symptoms: 86


### Disease-Medication Dataset

In [55]:
# Number of unique diseases
print(f"\nNumber of unique diseases in df_medicine: {df_medicine['Disease'].nunique()}")
# Check for missing values in the Medication column
print(f"Missing values in the Medication column: {df_medicine['Medication'].isnull().sum()}")


Number of unique diseases in df_medicine: 41
Missing values in the Medication column: 0


# Data Preparation

## 4.1 Preparing the Symptom-Disease Dataset (df_symptoms)

In [56]:
# Drop the leftover index column 'Unnamed: 0'
if 'Unnamed: 0' in df_symptoms.columns:
    df_symptoms_processed = df_symptoms.drop(columns=['Unnamed: 0'])
else:
    df_symptoms_processed = df_symptoms.copy()

In [57]:
df_symptoms_processed.isnull().sum()

Unnamed: 0,0
Disease,0
Symptom_1,0
Symptom_2,0
Symptom_3,0
Symptom_4,348


In [58]:
# Replace missing values (NaN) in the symptom columns with '' and strip surrounding whitespace
symptom_cols = ['Symptom_1', 'Symptom_2', 'Symptom_3', 'Symptom_4']
for col in symptom_cols:
    df_symptoms_processed[col] = df_symptoms_processed[col].fillna('').str.strip().str.lower()

In [61]:
# Clean disease names (strip surrounding whitespace and lowercase for consistency)
df_symptoms_processed['Disease'] = df_symptoms_processed['Disease'].str.strip().str.lower()

In [62]:
# Build one symptom profile per unique disease
df_symptoms_processed['all_symptoms_list'] = df_symptoms_processed[symptom_cols].values.tolist()

disease_symptoms_map = {}
for index, row in df_symptoms_processed.iterrows():
    disease = row['Disease']
    symptoms_for_row = set(s for s in row['all_symptoms_list'] if s)  # Unique, non-empty symptoms in this row
    if disease not in disease_symptoms_map:
        disease_symptoms_map[disease] = set()
    disease_symptoms_map[disease].update(symptoms_for_row)

# Join each disease's symptoms into one space-separated string for vectorization
disease_profile_df = pd.DataFrame([
    {'Disease': disease, 'Symptoms_Profile': ' '.join(sorted(symptoms))}
    for disease, symptoms in disease_symptoms_map.items()
])

print("\n--- Disease Profiles by Symptom (disease_profile_df) ---")
disease_profile_df.head()


--- Disease Profiles by Symptom (disease_profile_df) ---


Unnamed: 0,Disease,Symptoms_Profile
0,fungal infection,dischromic _patches itching nodal_skin_eruptio...
1,allergy,chills continuous_sneezing shivering watering_...
2,gerd,acidity cough stomach_pain ulcers_on_tongue vo...
3,chronic cholestasis,itching loss_of_appetite nausea vomiting yello...
4,drug reaction,burning_micturition itching skin_rash spotting...


In [63]:
print(f"Number of unique disease profiles after grouping: {len(disease_profile_df)}")


Number of unique disease profiles after grouping: 41


## 4.2 Preparing the Disease-Medication Dataset (df_medicine)

In [66]:
# Drop the leftover index column if present (e.g. 'Unnamed: 0')
if 'Unnamed: 0' in df_medicine.columns:
    df_medicine_processed = df_medicine.drop(columns=['Unnamed: 0'])
else:
    df_medicine_processed = df_medicine.copy()

# Clean disease names (to stay consistent with df_symptoms_processed)
df_medicine_processed['Disease'] = df_medicine_processed['Disease'].str.strip().str.lower()

# Fill missing values in the Medication column (if any)
df_medicine_processed['Medication'] = df_medicine_processed['Medication'].fillna("['Information not available']")

# Parse the 'Medication' column, which stores each list as a string such as "['item1', 'item2']"
def parse_medication_string(med_str):
    try:
        parsed_list = ast.literal_eval(med_str)
        if isinstance(parsed_list, list):
            return [str(item).strip() for item in parsed_list]
        return [med_str]
    except (ValueError, SyntaxError):
        # Fall back to the raw string when it is not a valid Python literal
        return [med_str.strip()] if isinstance(med_str, str) else ["Information not available"]

df_medicine_processed['Medication_List'] = df_medicine_processed['Medication'].apply(parse_medication_string)

print("\n--- Medication Dataset After Processing (df_medicine_processed) ---")
df_medicine_processed.head()


--- Medication Dataset After Processing (df_medicine_processed) ---


Unnamed: 0,Disease,Medication,Medication_List
0,fungal infection,"['Antifungal Cream', 'Fluconazole', 'Terbinafi...","[Antifungal Cream, Fluconazole, Terbinafine, C..."
1,allergy,"['Antihistamines', 'Decongestants', 'Epinephri...","[Antihistamines, Decongestants, Epinephrine, C..."
2,gerd,"['Proton Pump Inhibitors (PPIs)', 'H2 Blockers...","[Proton Pump Inhibitors (PPIs), H2 Blockers, A..."
3,chronic cholestasis,"['Ursodeoxycholic acid', 'Cholestyramine', 'Me...","[Ursodeoxycholic acid, Cholestyramine, Methotr..."
4,drug reaction,"['Antihistamines', 'Epinephrine', 'Corticoster...","[Antihistamines, Epinephrine, Corticosteroids,..."


In [67]:
# Drop duplicate diseases if any, keeping the first occurrence
df_medicine_processed = df_medicine_processed.drop_duplicates(subset=['Disease'], keep='first')
print(f"Number of unique disease-medication entries: {len(df_medicine_processed)}")

Number of unique disease-medication entries: 41
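
Both tables have 41 diseases, but equal counts do not guarantee that every disease name in the symptom profiles has an exactly matching name in the medication table. The sketch below shows one way such a coverage check might look. The names `normalize_name` and `unmatched` and the sample lists are hypothetical and are not taken from the two CSV files. The assumption that mismatches come from case or whitespace differences would need to be verified against the real data.

```python
import re


def normalize_name(name):
    """Lowercase, trim, and collapse runs of whitespace to a single space."""
    return re.sub(r"\s+", " ", name).strip().lower()


def unmatched(left_names, right_names, key=str):
    """Names in left_names whose key has no counterpart in right_names."""
    right_keys = {key(n) for n in right_names}
    return sorted(n for n in left_names if key(n) not in right_keys)


# Hypothetical sample values, for illustration only.
profile_names = ["Fungal infection", "Drug  Reaction"]
medicine_names = ["fungal infection ", "Drug Reaction"]

exact_misses = unmatched(profile_names, medicine_names)
normalized_misses = unmatched(profile_names, medicine_names, key=normalize_name)

print("Unmatched with exact comparison:", exact_misses)
print("Unmatched after normalization:", normalized_misses)
```

Running a check like this on the real `Disease` columns before the lookup step would show which diseases, if any, have no medication entry.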


# Modeling (Content-Based Filtering)

## Using a TF-IDF Vectorizer for Symptoms

In [68]:
# Vectorize the symptom profiles with TF-IDF
tfidf_vectorizer = TfidfVectorizer()
disease_profile_df['Symptoms_Profile'] = disease_profile_df['Symptoms_Profile'].fillna('')  # Ensure there are no NaN values
tfidf_matrix_symptoms = tfidf_vectorizer.fit_transform(disease_profile_df['Symptoms_Profile'])

print(f"\nShape of the symptom TF-IDF matrix: {tfidf_matrix_symptoms.shape}")


Shape of the symptom TF-IDF matrix: (41, 89)
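
The matrix has 89 columns although only 86 unique symptoms were counted earlier. A likely explanation is tokenization: `TfidfVectorizer` splits text with the default pattern `(?u)\b\w\w+\b`, so a symptom name that contains a space becomes two tokens. Three names visible in the outputs above contain a space, which is consistent with the three extra columns. The sketch below reproduces that pattern with the standard `re` module. The `normalize_symptom` helper is a hypothetical fix and is not part of the notebook.

```python
import re

# Default token pattern documented for scikit-learn's text vectorizers.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def tokens(text):
    """Tokenize the way the default pattern would (after lowercasing)."""
    return re.findall(DEFAULT_TOKEN_PATTERN, text.lower())


def normalize_symptom(name):
    """Hypothetical cleanup: join whitespace/underscore runs with one underscore."""
    return re.sub(r"[\s_]+", "_", name.strip().lower())


# Symptom names with an embedded space, as they appear in the outputs above.
raw_names = ["dischromic _patches", "spotting_ urination", "foul_smell_of urine"]

for name in raw_names:
    print(f"{name!r}: {tokens(name)} -> {tokens(normalize_symptom(name))}")
```

Applying a cleanup like this to the symptom columns and to the user input before vectorization would keep one column per symptom.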


## Computing Cosine Similarity Between Diseases

In [69]:
cosine_sim_symptoms = cosine_similarity(tfidf_matrix_symptoms, tfidf_matrix_symptoms)
print(f"Shape of the symptom cosine similarity matrix: {cosine_sim_symptoms.shape}")

Shape of the symptom cosine similarity matrix: (41, 41)
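
To make the scoring concrete, the following sketch recomputes TF-IDF weights and cosine similarity in plain Python on three simplified toy profiles. It follows scikit-learn's default settings as documented (smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, and L2-normalized rows). The helpers `fit_idf`, `tfidf_vector`, and `cosine` are illustrative names, not library functions. It shows why a symptom that appears in few diseases contributes more to the score than a common one.

```python
import math
from collections import Counter


def fit_idf(profiles):
    """Smoothed inverse document frequency per term."""
    n = len(profiles)
    doc_freq = Counter(term for terms in profiles for term in set(terms))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in doc_freq.items()}


def tfidf_vector(terms, idf):
    """L2-normalized TF-IDF vector as a dict; unknown terms are ignored."""
    counts = Counter(t for t in terms if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(u, v):
    """Both vectors are unit length, so cosine reduces to a dot product."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


# Simplified toy profiles (assumption: not the full profiles from the dataset).
profiles = {
    "fungal infection": ["itching", "skin_rash", "nodal_skin_eruptions"],
    "drug reaction": ["itching", "skin_rash", "stomach_pain"],
    "gerd": ["stomach_pain", "vomiting"],
}
idf = fit_idf(list(profiles.values()))
matrix = {name: tfidf_vector(terms, idf) for name, terms in profiles.items()}

rare = cosine(tfidf_vector(["nodal_skin_eruptions"], idf), matrix["fungal infection"])
common = cosine(tfidf_vector(["itching"], idf), matrix["fungal infection"])
print(f"rare symptom: {rare:.3f}, common symptom: {common:.3f}")
```

A query symptom that is absent from the fitted vocabulary produces an empty vector and therefore a score of zero against every disease.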


# **Building the Recommendation System and Evaluation**

## Creating the Disease and Medication Recommendation Function

In [71]:
# Look up the medications for a disease
def get_medications_for_disease_lookup(disease_name, medicine_df):
    meds_row = medicine_df[medicine_df['Disease'].str.lower() == disease_name.lower()]
    if not meds_row.empty:
        med_list = meds_row['Medication_List'].iloc[0]
        return ', '.join(med_list) if isinstance(med_list, list) else med_list
    return "Medication information not available."

# Main recommendation function
def recommend_diseases_and_meds(user_symptoms_str, disease_profiles, medicine_data, tfidf_vec, tfidf_mat, top_n=5):
    # Normalize the input: split on commas, trim, lowercase, and deduplicate
    processed_user_symptoms = ' '.join(sorted(set(s.strip().lower() for s in user_symptoms_str.split(',') if s.strip())))

    if not processed_user_symptoms:
        print("\nPlease enter at least one symptom.")
        return None, []  # No input, so no recommendations

    # Score the input against every disease profile
    user_tfidf_vector = tfidf_vec.transform([processed_user_symptoms])
    cosine_similarities_user = cosine_similarity(user_tfidf_vector, tfidf_mat)
    similarity_scores = list(enumerate(cosine_similarities_user[0]))
    sorted_similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)

    print(f"\nSymptoms you entered: {user_symptoms_str}")
    print(f"Top {top_n} recommended diseases with possible medications:")

    recommended_diseases_list = []
    recommended_scores_list = []

    # Scores are sorted in descending order, so stop at the first zero score
    for disease_index, score in sorted_similarity_scores:
        if len(recommended_diseases_list) >= top_n or score <= 0:
            break

        disease_name = disease_profiles['Disease'].iloc[disease_index]
        medications = get_medications_for_disease_lookup(disease_name, medicine_data)
        print(f"{len(recommended_diseases_list)+1}. Disease: {disease_name.title()} (Symptom similarity score: {score:.2f})")
        print(f"   Common medications: {medications}")
        print("-" * 30)
        recommended_diseases_list.append(disease_name)
        recommended_scores_list.append(score)

    if not recommended_diseases_list:
        print("No disease matches the symptoms you entered.")
        return None, []  # No recommendations

    return pd.Series(recommended_diseases_list), recommended_scores_list

# Recommendation Testing & Evaluation

In [72]:
# Example usage of the recommendation system
user_input_1 = "itching, skin_rash, nodal_skin_eruptions"
recs1, scores1 = recommend_diseases_and_meds(user_input_1, disease_profile_df, df_medicine_processed, tfidf_vectorizer, tfidf_matrix_symptoms, top_n=3)

user_input_2 = "vomiting, headache, nausea"
recs2, scores2 = recommend_diseases_and_meds(user_input_2, disease_profile_df, df_medicine_processed, tfidf_vectorizer, tfidf_matrix_symptoms, top_n=3)

user_input_3 = "fever, cough, breathlessness"
recs3, scores3 = recommend_diseases_and_meds(user_input_3, disease_profile_df, df_medicine_processed, tfidf_vectorizer, tfidf_matrix_symptoms, top_n=3)

user_input_4 = "runny_nose"
recs4, scores4 = recommend_diseases_and_meds(user_input_4, disease_profile_df, df_medicine_processed, tfidf_vectorizer, tfidf_matrix_symptoms, top_n=3)


Symptoms you entered: itching, skin_rash, nodal_skin_eruptions
Top 3 recommended diseases with possible medications:
1. Disease: Fungal Infection (Symptom similarity score: 0.70)
   Common medications: Antifungal Cream, Fluconazole, Terbinafine, Clotrimazole, Ketoconazole
------------------------------
2. Disease: Chicken Pox (Symptom similarity score: 0.46)
   Common medications: Antiviral drugs, Pain relievers, IV fluids, Blood transfusions, Platelet transfusions
------------------------------
3. Disease: Drug Reaction (Symptom similarity score: 0.31)
   Common medications: Antihistamines, Epinephrine, Corticosteroids, Antibiotics, Antifungal Cream
------------------------------

Symptoms you entered: vomiting, headache, nausea
Top 3 recommended diseases with possible medications:
1. Disease: (Vertigo) Paroymsal  Positional Vertigo (Symptom similarity score: 0.65)
   Common medications: Medication information not available.
------------------------------
2. Disease: Chronic Cholestasis (Symptom similarity score: 0.47)

## Evaluation Metrics
Evaluation of this system is qualitative and case-based, because no separate *ground truth* is available for automated testing.

1.  **Disease relevance**: Judged from the usage examples.
2.  **Medication relevance**: The accuracy of the medication list depends on the quality of the `medications.csv` dataset.
3.  **Precision@k (manual/sampled)**: Sample rows from the dataset, use a few of each row's symptoms as input, and check whether the row's actual disease appears in the top k. Because each query has exactly one correct disease, the reported value is the fraction of queries whose true disease appears in the top k (a hit rate), not precision in the strict sense.
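
As a small worked example of the third metric, the sketch below computes the hit rate at k for a list of (true disease, ranked recommendations) pairs. The function name `hit_rate_at_k` is illustrative. The four cases are the first four complete results printed by this notebook's evaluation run, so the numbers describe only those four cases, not the full 10-sample evaluation.

```python
def hit_rate_at_k(cases, k):
    """Fraction of cases whose true label appears in the first k ranked labels."""
    if not cases:
        return 0.0
    hits = sum(1 for truth, ranked in cases if truth in ranked[:k])
    return hits / len(cases)


# (true disease, top-3 recommendations) for the first four evaluation cases.
cases = [
    ("alcoholic hepatitis", ["alcoholic hepatitis", "peptic ulcer diseae", "hepatitis e"]),
    ("paralysis (brain hemorrhage)", ["paralysis (brain hemorrhage)", "typhoid", "malaria"]),
    ("gastroenteritis", ["gastroenteritis", "typhoid", "hepatitis e"]),
    ("tuberculosis", ["typhoid", "dengue", "tuberculosis"]),
]

print(f"Hit rate@1: {hit_rate_at_k(cases, 1):.2f}")  # 0.75
print(f"Hit rate@3: {hit_rate_at_k(cases, 3):.2f}")  # 1.00
```

Tuberculosis is ranked third, so it counts as a hit at k=3 but not at k=1.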

In [75]:
# Simple Precision@k evaluation (manual)
test_samples_eval = df_symptoms_processed.drop_duplicates(subset=['Disease'], keep='first').sample(10, random_state=42)

hits_at_1_eval = 0
hits_at_3_eval = 0
k_eval = 3

print("\n\n--- SIMPLE PRECISION@K EVALUATION ---")
for index, row in test_samples_eval.iterrows():
    # Query with at most the first three non-empty symptoms of the sampled row
    input_symptoms_list = [s for s in [row['Symptom_1'], row['Symptom_2'], row['Symptom_3'], row['Symptom_4']] if s][:3]
    input_symptoms_str_eval = ','.join(input_symptoms_list)

    true_disease_eval = row['Disease']

    print(f"\nTest input symptoms: {input_symptoms_str_eval}")
    print(f"True disease: {true_disease_eval.title()}")

    _processed_user_symptoms_eval = ' '.join(sorted(set(s.strip().lower() for s in input_symptoms_str_eval.split(',') if s.strip())))
    if not _processed_user_symptoms_eval:
        print("No input symptoms to evaluate.")
        continue

    _user_tfidf_vector_eval = tfidf_vectorizer.transform([_processed_user_symptoms_eval])
    _cosine_similarities_user_eval = cosine_similarity(_user_tfidf_vector_eval, tfidf_matrix_symptoms)
    _similarity_scores_eval = list(enumerate(_cosine_similarities_user_eval[0]))
    _sorted_similarity_scores_eval = sorted(_similarity_scores_eval, key=lambda x: x[1], reverse=True)

    recommended_diseases_names_eval = []
    print(f"Top-{k_eval} recommendations:")

    # Scores are sorted in descending order, so stop at the first zero score
    for _disease_idx_eval, _score_eval in _sorted_similarity_scores_eval:
        if len(recommended_diseases_names_eval) >= k_eval or _score_eval <= 0:
            break

        _rec_disease_name_eval = disease_profile_df['Disease'].iloc[_disease_idx_eval]
        print(f"- {_rec_disease_name_eval.title()} (Score: {_score_eval:.2f})")
        recommended_diseases_names_eval.append(_rec_disease_name_eval)

    if not recommended_diseases_names_eval:
        print("No recommendations.")
        continue

    # Count a hit when the true disease is ranked first, or anywhere in the top k
    if true_disease_eval == recommended_diseases_names_eval[0]:
        hits_at_1_eval += 1
    if true_disease_eval in recommended_diseases_names_eval[:k_eval]:
        hits_at_3_eval += 1

precision_at_1_final = hits_at_1_eval / len(test_samples_eval) if len(test_samples_eval) > 0 else 0
precision_at_3_final = hits_at_3_eval / len(test_samples_eval) if len(test_samples_eval) > 0 else 0

print(f"\nPrecision@1 (from 10 samples): {precision_at_1_final:.2f}")
print(f"Precision@3 (from 10 samples): {precision_at_3_final:.2f}")



--- SIMPLE PRECISION@K EVALUATION ---

Test input symptoms: vomiting,yellowish_skin,abdominal_pain
True disease: Alcoholic Hepatitis
Top-3 recommendations:
- Alcoholic Hepatitis (Score: 0.65)
- Peptic Ulcer Diseae (Score: 0.45)
- Hepatitis E (Score: 0.41)

Test input symptoms: vomiting,headache,weakness_of_one_body_side
True disease: Paralysis (Brain Hemorrhage)
Top-3 recommendations:
- Paralysis (Brain Hemorrhage) (Score: 0.79)
- Typhoid (Score: 0.39)
- Malaria (Score: 0.35)

Test input symptoms: vomiting,sunken_eyes,dehydration
True disease: Gastroenteritis
Top-3 recommendations:
- Gastroenteritis (Score: 0.83)
- Typhoid (Score: 0.11)
- Hepatitis E (Score: 0.11)

Test input symptoms: chills,vomiting,fatigue
True disease: Tuberculosis
Top-3 recommendations:
- Typhoid (Score: 0.72)
- Dengue (Score: 0.70)
- Tuberculosis (Score: 0.66)

Test input symptoms: itching,skin_rash,stomach_pain
True disease: Drug Reaction
Top-3 recommendations:
- Drug Reaction (Score: 0.62)
- Chicken Pox (Score: 0.48)
- Fu