# Introduction: Predictive model for differential diagnosis

In this notebook, our goal is to develop a model that takes a patient's symptoms as input and returns the three most likely classes (diseases), each with a confidence value expressed as a probability.


## Library and Data import

In [1]:
#|include: false 

%pip install seaborn
%pip install fastkaggle
%pip install -Uqq fastbook
%pip install --upgrade pip
%pip install tqdm
#%pip install catboost
#%pip install optuna
#%pip install optuna_distributed
#%pip install openfe
#%pip install xgboost
#%pip install lightgbm
#%pip install h2o
#%pip install polars
#%pip install -q -U autogluon.tabular
#%pip install autogluon
#%pip install wandb
#%pip install sweetviz


In [2]:
#| code-fold: true
#| output: false
#| code-summary: "Library Import"

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#import fastbook
#fastbook.setup_book()
#from fastbook import *
from fastai.tabular.all import *
import numpy as np
from numpy import random
from tqdm import tqdm
from ipywidgets import interact
from fastai.imports import *
np.set_printoptions(linewidth=130)
from fastai.text.all import *
from pathlib import Path
import os
import warnings
import gc
import pickle
from joblib import dump, load

# ULMFiT approach

The pre-trained model used above was originally trained on Wikipedia to predict the next word. We then fine-tuned it for our disease classification task based on symptoms.

However, Wikipedia English can differ from medical jargon and from the way patients describe symptoms. To improve the model further, we first fine-tune the pre-trained language model on our symptom corpus and then use the resulting encoder as the base for our classifier.

In [3]:
!ls /kaggle/input/symptoms-disease-no-id

symptom_disease_no_id_col.csv  symptom_no_id.csv


In [4]:
path = Path('/kaggle/input/symptoms-disease-no-id')
path

Path('/kaggle/input/symptoms-disease-no-id')

In [5]:
#symptom_df = pd.read_csv(path_lm/'symptom_synth.csv',index_col=0)
symptom_df = pd.read_csv(path/'symptom_no_id.csv')
sd_df = pd.read_csv(path/'symptom_disease_no_id_col.csv')
symptom_df.head()

Unnamed: 0,text
0,"I have been experiencing a skin rash on my arms, legs, and torso for the past few weeks. It is red, itchy, and covered in dry, scaly patches."
1,"My skin has been peeling, especially on my knees, elbows, and scalp. This peeling is often accompanied by a burning or stinging sensation."
2,"I have been experiencing joint pain in my fingers, wrists, and knees. The pain is often achy and throbbing, and it gets worse when I move my joints."
3,"There is a silver like dusting on my skin, especially on my lower back and scalp. This dusting is made up of small scales that flake off easily when I scratch them."
4,"My nails have small dents or pits in them, and they often feel inflammatory and tender to the touch. Even there are minor rashes on my arms."


In [6]:
symptom_df['text'].nunique(),sd_df['text'].nunique()

(1153, 1153)

In [7]:
#dls_lm = TextDataLoaders.from_df(symptom_df, path=path, is_lm=True, valid_pct=0.2)
dls_lm = TextDataLoaders.from_df(symptom_df, path=path, is_lm=True,text_col='text', valid_pct=0.2)
#dls_lm = TextDataLoaders.from_folder(path=path_lm, is_lm=True, valid_pct=0.1)

In [8]:
dls_lm.show_batch(max_n=5)

Unnamed: 0,text,text_
0,"xxbos a few days ago , i experienced a tiny rash around my nose . xxmaj the rash is now accompanied by a burning feeling and skin redness and discharge of fluid . i believe it is some xxunk of infection . xxbos xxmaj my eyes are usually red and inflamed , and i have the xxunk that something is xxunk my sinuses . xxmaj i 've been coughing up a lot","a few days ago , i experienced a tiny rash around my nose . xxmaj the rash is now accompanied by a burning feeling and skin redness and discharge of fluid . i believe it is some xxunk of infection . xxbos xxmaj my eyes are usually red and inflamed , and i have the xxunk that something is xxunk my sinuses . xxmaj i 've been coughing up a lot of"
1,"which i can not sleep all night . xxbos xxmaj enlarged lymph nodes are giving me a great deal of pain . i have rashes all over my body and because of which i can not sleep all night . xxbos i am exhausted and have lost my appetite . i feel vomiting and ca n't eat anything . xxmaj in addition , little red spots are beginning to appear on my","i can not sleep all night . xxbos xxmaj enlarged lymph nodes are giving me a great deal of pain . i have rashes all over my body and because of which i can not sleep all night . xxbos i am exhausted and have lost my appetite . i feel vomiting and ca n't eat anything . xxmaj in addition , little red spots are beginning to appear on my skin"
2,"weakness . xxmaj i 've been feeling disoriented and weak on my feet , and my neck hurts . xxbos i have noticed that the blood vessels in my legs are getting more noticeable than usual . xxmaj it is a little concerning to me . xxmaj moreover , i am experiencing cramps every day . xxbos xxmaj my desire for sex has xxunk , and xxmaj i 'm having trouble having",". xxmaj i 've been feeling disoriented and weak on my feet , and my neck hurts . xxbos i have noticed that the blood vessels in my legs are getting more noticeable than usual . xxmaj it is a little concerning to me . xxmaj moreover , i am experiencing cramps every day . xxbos xxmaj my desire for sex has xxunk , and xxmaj i 'm having trouble having sex"
3,"my nails . xxbos i experience skin irritations and rashes , especially in my skin 's xxunk . xxmaj any wounds and bruises i have on my skin also heal quite slowly . xxbos xxmaj i 'm suffering from intense itching , chills , vomiting , and a high fever . xxmaj i 've also been sweating a lot and have a headache . xxmaj nausea and muscle pain have also been","nails . xxbos i experience skin irritations and rashes , especially in my skin 's xxunk . xxmaj any wounds and bruises i have on my skin also heal quite slowly . xxbos xxmaj i 'm suffering from intense itching , chills , vomiting , and a high fever . xxmaj i 've also been sweating a lot and have a headache . xxmaj nausea and muscle pain have also been bothering"
4,"my muscles recently and my neck has been truly solid . xxmaj swollen joints make it difficult for me to move around . xxmaj walking has also been difficult . xxbos xxmaj back pain , a coughing cough , and numbness in my arms and legs have been plaguing me . xxmaj in addition , my neck hurts , and xxmaj i 've having trouble staying balanced and without getting woozy .","muscles recently and my neck has been truly solid . xxmaj swollen joints make it difficult for me to move around . xxmaj walking has also been difficult . xxbos xxmaj back pain , a coughing cough , and numbness in my arms and legs have been plaguing me . xxmaj in addition , my neck hurts , and xxmaj i 've having trouble staying balanced and without getting woozy . xxbos"


In [9]:

learn = language_model_learner(dls_lm, AWD_LSTM, metrics=[accuracy, Perplexity()], path=path, wd=0.1).to_fp16()



In [10]:
#| error: false
learn.fit_one_cycle(1, 1e-2)



epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.345206,3.619923,0.339641,37.334679,00:02


In [11]:
#| code-fold: show

# Create a directory to save the model
os.makedirs('/kaggle/working/models', exist_ok=True)

# Set the model directory for the learner
learn.model_dir = '/kaggle/working/models'

# Now save the model
learn.save('1epoch')

Path('/kaggle/working/models/1epoch.pth')

In [12]:
#| error: false
learn = learn.load('1epoch')



In [13]:
#| error: false
learn.unfreeze()
learn.fit_one_cycle(5, 1e-3)



epoch,train_loss,valid_loss,accuracy,perplexity,time
0,3.582769,2.951274,0.400897,19.130304,00:02
1,3.262762,2.605996,0.430874,13.544711,00:02
2,3.022963,2.381455,0.478921,10.820632,00:02
3,2.843581,2.281945,0.488238,9.79571,00:02
4,2.728844,2.265132,0.491623,9.632401,00:02
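
The `perplexity` column in the tables above is the exponential of the validation loss (a cross-entropy measured in nats), so the two columns carry the same information. A minimal sketch using only the standard library checks this against the logged values; the helper name `perplexity_from_loss` is made up for illustration:

```python
import math

def perplexity_from_loss(valid_loss: float) -> float:
    """Perplexity is exp(cross-entropy loss), with the loss measured in nats."""
    return math.exp(valid_loss)

# Validation losses copied from the training logs above
after_frozen_epoch = perplexity_from_loss(3.619923)     # single epoch, frozen body
after_unfrozen_epochs = perplexity_from_loss(2.265132)  # last of the five unfrozen epochs

print(f"{after_frozen_epoch:.2f}")     # 37.33
print(f"{after_unfrozen_epochs:.2f}")  # 9.63
```

Both results agree with the logged perplexities (37.33 and 9.63). Roughly speaking, after fine-tuning the language model is about as uncertain as if it were choosing uniformly among 10 tokens at each step.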


In [14]:
#| code-fold: true
#| output: false
#| code-summary: "Save the model"
# Save only the fine-tuned encoder so the classifier can load it later
learn.save_encoder('finetuned')

In [15]:
#| output: false
#| error: false
TEXT = "I have running nose, stomach and joint pains"
N_WORDS = 40
N_SENTENCES = 2
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75) 
         for _ in range(N_SENTENCES)]



In [16]:
print("\n".join(preds))

i have running nose , stomach and joint pains all over my body . i have been experiencing vomiting and weakness in my neck since then , and my chest hurts . i am also concerned about this . i have had trouble breathing and have difficulty breathing
i have running nose , stomach and joint pains . i have a high fever , a headache , lots of irritability , and coughing up a lot of thick , cough - filled foods . i have been experiencing stomach ache , a chronic cough , and stomach
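
The `temperature=0.75` argument controls how adventurous the sampling is. The sketch below assumes that temperature is applied by raising each next-token probability to the power `1/temperature` and renormalising before sampling; that is our reading of fastai's behaviour, not something verified in this notebook. The function name and the toy distribution are made up for illustration:

```python
def apply_temperature(probs, temperature):
    """Raise each probability to 1/temperature and renormalise to sum to 1."""
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

# Toy next-token distribution (made-up numbers, not model output)
next_token_probs = [0.5, 0.3, 0.2]

sharper = apply_temperature(next_token_probs, 0.75)  # value used in the cell above
flatter = apply_temperature(next_token_probs, 1.5)   # a higher value, for contrast

print([round(p, 3) for p in sharper])
print([round(p, 3) for p in flatter])
```

A temperature below 1 shifts probability mass towards the most likely token, which gives more conservative text; a temperature above 1 flattens the distribution and gives more varied text.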


In [17]:
#symptom_df = pd.read_csv(path_lm/'symptom_synth.csv',index_col=0)
#sd_df = pd.read_csv(path_lm/'symptom_disease_no_id_col.csv')
sd_df.head()

Unnamed: 0,label,text
0,Psoriasis,"I have been experiencing a skin rash on my arms, legs, and torso for the past few weeks. It is red, itchy, and covered in dry, scaly patches."
1,Psoriasis,"My skin has been peeling, especially on my knees, elbows, and scalp. This peeling is often accompanied by a burning or stinging sensation."
2,Psoriasis,"I have been experiencing joint pain in my fingers, wrists, and knees. The pain is often achy and throbbing, and it gets worse when I move my joints."
3,Psoriasis,"There is a silver like dusting on my skin, especially on my lower back and scalp. This dusting is made up of small scales that flake off easily when I scratch them."
4,Psoriasis,"My nails have small dents or pits in them, and they often feel inflammatory and tender to the touch. Even there are minor rashes on my arms."


In [18]:
# Check for NaN values in the label column
print(sd_df['label'].isna().sum())

# If there are NaNs, you can drop those rows
#sd_df = sd_df.dropna(subset=['label'])

0


In [19]:
#| output: false
#| error: false
#dls_clas = TextDataLoaders.from_df(sd_df, path=path,valid='test', text_vocab=dls_lm.vocab)
dls_clas = TextDataLoaders.from_df(sd_df, path=path,valid='test',text_col='text',label_col='label', text_vocab=dls_lm.vocab)
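
Passing `text_vocab=dls_lm.vocab` matters because the fine-tuned encoder's embedding rows are indexed by token id. If the classifier built its own vocabulary, the same token could map to a different id and therefore to the wrong embedding row. The toy sketch below illustrates the mismatch; the two vocabularies and the `numericalize` helper are invented for illustration and are not fastai objects:

```python
# Two toy vocabularies containing the same tokens in a different order
lm_vocab = ["xxunk", "xxpad", "xxbos", "fever", "rash", "cough"]
other_vocab = ["xxunk", "xxpad", "xxbos", "cough", "fever", "rash"]

def numericalize(tokens, vocab):
    """Map tokens to integer ids; unknown tokens fall back to id 0 (xxunk)."""
    index = {tok: i for i, tok in enumerate(vocab)}
    return [index.get(tok, 0) for tok in tokens]

tokens = ["xxbos", "fever", "cough", "headache"]
ids_lm = numericalize(tokens, lm_vocab)
ids_other = numericalize(tokens, other_vocab)

print(ids_lm)     # [2, 3, 5, 0]
print(ids_other)  # [2, 4, 3, 0]
```

With the language model's vocabulary, "fever" is id 3; with the other vocabulary it is id 4, which would look up the embedding the encoder learned for "rash".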

In [20]:
#| error: false
learn = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5, metrics=accuracy)



In [21]:
from pathlib import Path
learn.path = Path('/kaggle/working')

In [22]:
#| error: false
learn = learn.load_encoder('finetuned')



In [23]:
len(dls_lm.vocab)

944

In [24]:
#| error: false
learn.fit_one_cycle(1, 2e-2)

epoch,train_loss,valid_loss,accuracy,time
0,2.264708,2.392303,0.5125,00:01


In [25]:
#| error: false
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(1e-2/(2.6**4),1e-2))

epoch,train_loss,valid_loss,accuracy,time
0,1.476579,1.601696,0.683333,00:01
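
The `slice(1e-2/(2.6**4), 1e-2)` argument requests discriminative learning rates: the earliest layer group trains with the smallest rate, the classifier head with the largest, and the groups in between are spaced by a constant factor. The sketch below reproduces that spacing in plain Python. It assumes the AWD_LSTM classifier is split into 5 layer groups, which would explain the exponent 4; the function name is made up for illustration:

```python
def layer_group_lrs(lowest, highest, n_groups):
    """Learning rates spaced by a constant factor from lowest to highest."""
    if n_groups == 1:
        return [highest]
    factor = (highest / lowest) ** (1.0 / (n_groups - 1))
    return [lowest * factor ** i for i in range(n_groups)]

# Assumption: 5 layer groups (embedding, three LSTM layers, classifier head)
lrs = layer_group_lrs(1e-2 / (2.6 ** 4), 1e-2, n_groups=5)

for group, lr in enumerate(lrs):
    print(f"group {group}: lr = {lr:.6f}")
```

Under that assumption each group trains 2.6 times faster than the one before it, so the general-purpose early layers change slowly while the task-specific head adapts quickly.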


In [26]:
learn.unfreeze()
learn.fit_one_cycle(2, slice(1e-3/(2.6**4),1e-3))

epoch,train_loss,valid_loss,accuracy,time
0,1.015998,1.070644,0.8125,00:01
1,0.933672,0.829735,0.833333,00:01


In [27]:
learn.predict("I am having a running stomach, fever, general body weakness and have been getting bitten by mosquitoes often")

('Malaria',
 tensor(12),
 tensor([0.0095, 0.0191, 0.1334, 0.0744, 0.0090, 0.0108, 0.0350, 0.0114, 0.0178,
         0.1524, 0.0047, 0.0629, 0.1591, 0.0250, 0.0115, 0.0098, 0.0778, 0.0085,
         0.0086, 0.0388, 0.0281, 0.0217, 0.0482, 0.0226]))
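
`learn.predict` returns the single most likely class, its index, and the full probability tensor, whereas the goal stated in the introduction is a top-3 list with probabilities. A minimal sketch of that last step, using the probabilities printed above as a plain Python list. Only index 12 ('Malaria') is known from the output; the other label names are placeholders, and `top_k` is a hypothetical helper:

```python
# Probabilities copied from the prediction output above (24 classes)
probs = [0.0095, 0.0191, 0.1334, 0.0744, 0.0090, 0.0108, 0.0350, 0.0114, 0.0178,
         0.1524, 0.0047, 0.0629, 0.1591, 0.0250, 0.0115, 0.0098, 0.0778, 0.0085,
         0.0086, 0.0388, 0.0281, 0.0217, 0.0482, 0.0226]

# Placeholder labels: only index 12 is known from the output above
labels = [f"class_{i}" for i in range(len(probs))]
labels[12] = "Malaria"

def top_k(probs, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]

top3 = top_k(probs, labels)
for name, p in top3:
    print(f"{name}: {p:.4f}")
```

In the notebook itself, the real class names would come from the learner's category vocabulary (we assume `learn.dls.vocab[1]` for a fastai text classifier), and the tensor could be ranked with `torch.topk` instead of `sorted`. Note that the top three probabilities here are close (0.1591, 0.1524, 0.1334), so the model is far from certain about this prediction.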

In [28]:
#| code-fold: true
#| code-summary: "Click to see full code in one cell"
#| error: false
path = Path('/kaggle/input/symptoms-disease-no-id')
#symptom_df = pd.read_csv(path_lm/'symptom_synth.csv',index_col=0)
symptom_df = pd.read_csv(path/'symptom_no_id.csv')
sd_df = pd.read_csv(path/'symptom_disease_no_id_col.csv')
dls_lm = TextDataLoaders.from_df(symptom_df, path=path,text_col='text', is_lm=True, valid_pct=0.2)
learn = language_model_learner(dls_lm, AWD_LSTM, metrics=[accuracy, Perplexity()], path=path, wd=0.1).to_fp16()
learn.fit_one_cycle(1, 1e-2)
# Create a directory to save the model
os.makedirs('/kaggle/working/models', exist_ok=True)
# Set the model directory for the learner
learn.model_dir = '/kaggle/working/models'
# Now save the model
learn.save('1epoch')
learn = learn.load('1epoch')
learn.unfreeze()
learn.fit_one_cycle(5, 1e-3)
# Save only the fine-tuned encoder so the classifier can load it later
learn.save_encoder('finetuned')


# Fine-tune the classifier
# Build the classification DataLoaders first, because the learner needs them
dls_clas = TextDataLoaders.from_df(sd_df, path=path,text_col='text',label_col='label', text_vocab=dls_lm.vocab)
learn = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
# Point the learner at the writable directory where the encoder was saved
learn.path = Path('/kaggle/working')
learn = learn.load_encoder('finetuned')
learn.fit_one_cycle(1, 2e-2)
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(1e-2/(2.6**4),1e-2))
learn.unfreeze()
learn.fit_one_cycle(2, slice(1e-3/(2.6**4),1e-3))
learn.predict("I am having a running stomach, fever, general body weakness and have been getting bitten by mosquitoes often")



epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.302639,3.686153,0.296875,39.891106,00:01




epoch,train_loss,valid_loss,accuracy,perplexity,time
0,3.610258,3.102001,0.348524,22.242418,00:02
1,3.277711,2.710642,0.411169,15.038921,00:02
2,3.039388,2.502887,0.443793,12.217712,00:02
3,2.871224,2.425918,0.464337,11.312613,00:02
4,2.760429,2.403721,0.469763,11.064268,00:02




epoch,train_loss,valid_loss,accuracy,time
0,2.234374,2.362932,0.4125,00:01


epoch,train_loss,valid_loss,accuracy,time
0,1.408947,1.452858,0.729167,00:01


epoch,train_loss,valid_loss,accuracy,time
0,0.973775,1.00662,0.795833,00:01
1,0.90817,0.828894,0.8125,00:01


('Malaria',
 tensor(12),
 tensor([0.0070, 0.0168, 0.0382, 0.0947, 0.0049, 0.0124, 0.0142, 0.0096, 0.0066,
         0.0884, 0.0054, 0.1012, 0.2662, 0.0422, 0.0538, 0.0053, 0.1228, 0.0074,
         0.0121, 0.0363, 0.0154, 0.0056, 0.0128, 0.0207]))