# Introduction: Predictive model for differential diagnosis

In this notebook, our goal is to develop a model that takes a patient's symptoms as input and returns the top 3 candidate classes (diseases), each with a confidence value expressed as a probability.
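To make the target output concrete, here is a minimal sketch of turning a vector of class probabilities into a top-3 list. The helper name `top_k_classes`, the class names, and the probabilities are hypothetical and only illustrate the shape of the result; they are not outputs of the model trained below.

```python
import numpy as np

def top_k_classes(probs, class_names, k=3):
    """Return the k most probable (class, probability) pairs, best first."""
    probs = np.asarray(probs, dtype=float)
    # argsort is ascending, so reverse it and keep the first k indices
    top_idx = np.argsort(probs)[::-1][:k]
    return [(class_names[i], float(probs[i])) for i in top_idx]

# Hypothetical class names and probabilities, for illustration only
classes = ["Psoriasis", "Malaria", "Typhoid", "Common Cold"]
probs = [0.05, 0.60, 0.25, 0.10]

top3 = top_k_classes(probs, classes)
print(top3)
```

With a trained fastai classifier, the probability vector would presumably come from the tensor of class probabilities returned by `learn.predict(...)`, and the class names from the DataLoaders' category vocab. Both are assumptions to check against the fastai version in use.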


## Library and Data import

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/s2d-synth/s2d_synth.csv
/kaggle/input/symptom2disease/Symptom2Disease.csv
/kaggle/input/symptoms-disease-no-id/symptom_disease_no_id_col.csv
/kaggle/input/symptoms-disease-no-id/symptom_no_id.csv
/kaggle/input/medical-corpus/dx_datav1.csv.csv
/kaggle/input/symptom-dataset-synthetic/symptom_synth.csv


In [2]:
%%time
#%pip install catboost
#%pip install optuna
#%pip install optuna_distributed
#%pip install openfe
%pip install seaborn
#%pip install xgboost
#%pip install lightgbm
%pip install fastkaggle
#%pip install h2o
%pip install -Uqq fastbook
#%pip install polars
#%pip install -q -U autogluon.tabular
#%pip install autogluon
%pip install --upgrade pip
%pip install tqdm
#%pip install wandb
#%pip install sweetviz

Note: you may need to restart the kernel to use updated packages.
Collecting fastkaggle
  Downloading fastkaggle-0.0.8-py3-none-any.whl.metadata (4.3 kB)
Downloading fastkaggle-0.0.8-py3-none-any.whl (11 kB)
Installing collected packages: fastkaggle
Successfully installed fastkaggle-0.0.8
Collecting pip
  Downloading pip-25.0.1-py3-none-any.whl.metadata (3.7 kB)
Downloading pip-25.0.1-py3-none-any.whl (1.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 33.3 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.0
    Uninstalling pip-24.0:
      Successfully uninstalled pip-24.0
Successfully installed pip-25.0.1

In [3]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#import fastbook
#fastbook.setup_book()
#from fastbook import *
from fastai.tabular.all import *
import numpy as np
from numpy import random
from tqdm import tqdm
from ipywidgets import interact
from fastai.imports import *
np.set_printoptions(linewidth=130)
from fastai.text.all import *
from pathlib import Path
import os
import warnings
import gc
import pickle
from joblib import dump, load

# ULMFiT approach

The pre-trained model used above was originally trained on Wikipedia to predict the next word. We then fine-tuned it for our disease classification task based on symptoms.

But Wikipedia English might differ from medical jargon. To improve our model further, we can first fine-tune the pre-trained language model on a medical corpus and then use it as the base for our classifier.

In [5]:
!ls /kaggle/input/symptoms-disease-no-id

symptom_disease_no_id_col.csv  symptom_no_id.csv


In [7]:
path = Path('/kaggle/input/symptoms-disease-no-id')
path

Path('/kaggle/input/symptoms-disease-no-id')

In [None]:
path_lm
/kaggle/input/symptoms-disease-no-id/symptom_disease_no_id_col.csv
/kaggle/input/symptoms-disease-no-id/symptom_no_id.csv

In [22]:
#symptom_df = pd.read_csv(path_lm/'symptom_synth.csv',index_col=0)
symptom_df = pd.read_csv(path/'symptom_no_id.csv')
sd_df = pd.read_csv(path/'symptom_disease_no_id_col.csv')
symptom_df.head()

Unnamed: 0,text
0,"I have been experiencing a skin rash on my arms, legs, and torso for the past few weeks. It is red, itchy, and covered in dry, scaly patches."
1,"My skin has been peeling, especially on my knees, elbows, and scalp. This peeling is often accompanied by a burning or stinging sensation."
2,"I have been experiencing joint pain in my fingers, wrists, and knees. The pain is often achy and throbbing, and it gets worse when I move my joints."
3,"There is a silver like dusting on my skin, especially on my lower back and scalp. This dusting is made up of small scales that flake off easily when I scratch them."
4,"My nails have small dents or pits in them, and they often feel inflammatory and tender to the touch. Even there are minor rashes on my arms."


In [27]:
symptom_df['text'].nunique(),sd_df['text'].nunique()

(1153, 1153)

In [10]:
dls_lm = TextDataLoaders.from_df(symptom_df, path=path, is_lm=True, valid_pct=0.2)
#dls_lm = TextDataLoaders.from_folder(path=path_lm, is_lm=True, valid_pct=0.1)

In [11]:
dls_lm.show_batch(max_n=5)

Unnamed: 0,text,text_
0,"xxbos i get wheezing and breathing difficulties , which are asthma symptoms . i frequently have headaches and fever . xxmaj i 'm continuously exhausted . xxbos xxmaj my muscles have n't been very strong , and xxmaj i 've been experiencing back ache . xxmaj i 've been feeling lightheaded and wobbly on my feet , and my neck hurts . xxbos xxmaj periodically , the tingling in my throat ,","i get wheezing and breathing difficulties , which are asthma symptoms . i frequently have headaches and fever . xxmaj i 'm continuously exhausted . xxbos xxmaj my muscles have n't been very strong , and xxmaj i 've been experiencing back ache . xxmaj i 've been feeling lightheaded and wobbly on my feet , and my neck hurts . xxbos xxmaj periodically , the tingling in my throat , poor"
1,"dents , which is really xxunk . xxmaj moreover , my joints pain everyday and i have no idea what is causing it . xxbos xxmaj i 've been suffering from symptoms including a headache , chest pain , dizziness , losing my balance , and trouble concentrating . xxbos i have a stomach ache that xxunk me from falling or staying asleep . xxmaj after using the restroom , i feel",", which is really xxunk . xxmaj moreover , my joints pain everyday and i have no idea what is causing it . xxbos xxmaj i 've been suffering from symptoms including a headache , chest pain , dizziness , losing my balance , and trouble concentrating . xxbos i have a stomach ache that xxunk me from falling or staying asleep . xxmaj after using the restroom , i feel worn"
2,". xxmaj they are not painful but are concerning to me . xxbos i have been dealing with back pain , a cough that wo n't go away , and weakness in my arms and legs . xxmaj my neck hurts and i have had problems with dizziness and maintaining my balance . xxbos xxmaj for days , xxmaj i 've had a nasty cough and cold . xxmaj my sinuses are","xxmaj they are not painful but are concerning to me . xxbos i have been dealing with back pain , a cough that wo n't go away , and weakness in my arms and legs . xxmaj my neck hurts and i have had problems with dizziness and maintaining my balance . xxbos xxmaj for days , xxmaj i 've had a nasty cough and cold . xxmaj my sinuses are clogged"
3,"developing sores on my face and nose area . i am not sure what is causing this . xxmaj the sores on my face are swollen and tender to the touch , and i have a burning sensation and redness of the skin . xxbos xxmaj my bowel motions have been really difficult for me recently . xxmaj going is difficult , and it aches when i do . xxmaj when i","sores on my face and nose area . i am not sure what is causing this . xxmaj the sores on my face are swollen and tender to the touch , and i have a burning sensation and redness of the skin . xxbos xxmaj my bowel motions have been really difficult for me recently . xxmaj going is difficult , and it aches when i do . xxmaj when i go"
4,"my monthly cycle has changed , and xxmaj i 've had an unexpected vaginal discharge . i frequently experience mood swings and experience xxunk xxunk . xxbos i have headaches and migraines , and i have been having difficulties sleeping . xxmaj my entire body has been shaking and twitching . xxmaj sometimes i become lightheaded . xxbos xxmaj my urine is frequently black , red , and has a really strange","monthly cycle has changed , and xxmaj i 've had an unexpected vaginal discharge . i frequently experience mood swings and experience xxunk xxunk . xxbos i have headaches and migraines , and i have been having difficulties sleeping . xxmaj my entire body has been shaking and twitching . xxmaj sometimes i become lightheaded . xxbos xxmaj my urine is frequently black , red , and has a really strange odour"


In [12]:
learn = language_model_learner(dls_lm, AWD_LSTM, metrics=[accuracy, Perplexity()], path=path, wd=0.1).to_fp16()

  wgts = torch.load(wgts_fname, map_location = lambda storage,loc: storage)


In [13]:
learn.fit_one_cycle(1, 1e-2)

  self.autocast,self.learn.scaler,self.scales = autocast(dtype=dtype),GradScaler(**self.kwargs),L()


epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.301454,3.542589,0.342159,34.556274,00:02


In [15]:
import os

# Create a directory to save the model
os.makedirs('/kaggle/working/models', exist_ok=True)

# Set the model directory for the learner
learn.model_dir = '/kaggle/working/models'

# Now save the model
learn.save('1epoch')

Path('/kaggle/working/models/1epoch.pth')

In [16]:
learn = learn.load('1epoch')

  state = torch.load(file, map_location=device)


In [17]:
learn.unfreeze()
learn.fit_one_cycle(5, 1e-3)

  self.autocast,self.learn.scaler,self.scales = autocast(dtype=dtype),GradScaler(**self.kwargs),L()


epoch,train_loss,valid_loss,accuracy,perplexity,time
0,3.592164,2.991858,0.397063,19.922655,00:02
1,3.283616,2.637313,0.429253,13.975606,00:02
2,3.0539,2.458659,0.464265,11.689125,00:02
3,2.884304,2.379019,0.474609,10.794313,00:02
4,2.764172,2.36129,0.479601,10.604625,00:02
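The `perplexity` column in the logs above is the exponential of `valid_loss` (the mean cross-entropy per token), so the two columns carry the same information on different scales. A short check using values copied from the logs:

```python
import math

# (valid_loss, perplexity) pairs copied from the training logs in this notebook
logged = [
    (3.542589, 34.556274),  # frozen epoch
    (2.991858, 19.922655),  # first unfrozen epoch
    (2.361290, 10.604625),  # last unfrozen epoch
]

for valid_loss, perplexity in logged:
    recomputed = math.exp(valid_loss)
    print(f"exp({valid_loss:.6f}) = {recomputed:.4f}  (logged: {perplexity:.4f})")
```

A perplexity of about 10.6 can be read as the model being, on average, as uncertain as if it were choosing uniformly among roughly 10 to 11 tokens at each step.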


In [18]:
# Now save the model
learn.save_encoder('finetuned')

In [19]:
#learn.save_encoder('finetuned')

In [20]:
TEXT = "I have running nose, stomach and joint pains"
N_WORDS = 40
N_SENTENCES = 2
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75) 
         for _ in range(N_SENTENCES)]

  self.autocast,self.learn.scaler,self.scales = autocast(dtype=dtype),GradScaler(**self.kwargs),L()


In [21]:
print("\n".join(preds))

i have running nose , stomach and joint pains . I asthma a lot of saliva and my mouth has swollen . i also have a rash on my cheeks . My eyes are all yellow . i have been experiencing infections and have a high
i have running nose , stomach and joint pains . The sore throat is causing me a lot of discomfort and i have an extreme cough . When i perform , i feel really sick and exhausted . i have been having trouble breathing and do
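The two continuations differ because each next word is sampled rather than chosen greedily, and `temperature=0.75` makes that sampling less random than the default. The sketch below shows the general idea of temperature scaling on a softmax; the logits are hypothetical, and fastai's internal implementation may differ in detail.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature; temperature < 1 sharpens the distribution."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

# Hypothetical next-token scores, for illustration only
logits = np.array([2.0, 1.0, 0.5, 0.1])

p_default = softmax_with_temperature(logits, temperature=1.0)
p_sharp = softmax_with_temperature(logits, temperature=0.75)
print("T=1.00:", np.round(p_default, 3))
print("T=0.75:", np.round(p_sharp, 3))

# Sample one token index from the sharpened distribution
rng = np.random.default_rng(0)
next_token = rng.choice(len(logits), p=p_sharp)
```

Lower temperatures push probability mass toward the most likely token, which tends to give more coherent but less varied text.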


In [23]:
#symptom_df = pd.read_csv(path_lm/'symptom_synth.csv',index_col=0)
#sd_df = pd.read_csv(path_lm/'symptom_disease_no_id_col.csv')
sd_df.head()

Unnamed: 0,label,text
0,Psoriasis,"I have been experiencing a skin rash on my arms, legs, and torso for the past few weeks. It is red, itchy, and covered in dry, scaly patches."
1,Psoriasis,"My skin has been peeling, especially on my knees, elbows, and scalp. This peeling is often accompanied by a burning or stinging sensation."
2,Psoriasis,"I have been experiencing joint pain in my fingers, wrists, and knees. The pain is often achy and throbbing, and it gets worse when I move my joints."
3,Psoriasis,"There is a silver like dusting on my skin, especially on my lower back and scalp. This dusting is made up of small scales that flake off easily when I scratch them."
4,Psoriasis,"My nails have small dents or pits in them, and they often feel inflammatory and tender to the touch. Even there are minor rashes on my arms."


In [24]:
# Check for NaN values in the label column
print(sd_df['label'].isna().sum())

# If there are NaNs, you can drop those rows
#df = df.dropna(subset=['label'])

0


In [34]:
dls_clas = TextDataLoaders.from_df(sd_df, path=path,valid='test', text_vocab=dls_lm.vocab)

  o = r[c] if isinstance(c, int) or not c in getattr(r, '_fields', []) else getattr(r, c)


In [35]:
learn = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5, metrics=accuracy)

  wgts = torch.load(wgts_fname, map_location = lambda storage,loc: storage)


In [36]:
from pathlib import Path
learn.path = Path('/kaggle/working')

In [37]:
learn = learn.load_encoder('finetuned')

  wgts = torch.load(join_path_file(file,self.path/self.model_dir, ext='.pth'), map_location=device)


In [38]:
len(dls_lm.vocab)

944

In [39]:
learn.fit_one_cycle(1, 2e-2)

epoch,train_loss,valid_loss,accuracy,time


  o = r[c] if isinstance(c, int) or not c in getattr(r, '_fields', []) else getattr(r, c)


KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/transforms.py", line 263, in encodes
    return TensorCategory(self.vocab.o2i[o])
KeyError: 'I have a cough that has continued for days, and I feel really weak and tired. My fever is high, and my breath has become strained. When I cough, I also generate a lot of mucus.'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 42, in fetch
    data = next(self.dataset_iter)
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/load.py", line 140, in create_batches
    yield from map(self.do_batch, self.chunkify(res))
  File "/opt/conda/lib/python3.10/site-packages/fastcore/basics.py", line 245, in chunked
    res = list(itertools.islice(it, chunk_sz))
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/load.py", line 170, in do_item
    try: return self.after_item(self.create_item(s))
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/load.py", line 177, in create_item
    if self.indexed: return self.dataset[s or 0]
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/core.py", line 449, in __getitem__
    res = tuple([tl[it] for tl in self.tls])
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/core.py", line 449, in <listcomp>
    res = tuple([tl[it] for tl in self.tls])
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/core.py", line 408, in __getitem__
    return self._after_item(res) if is_indexer(idx) else res.map(self._after_item)
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/core.py", line 368, in _after_item
    def _after_item(self, o): return self.tfms(o)
  File "/opt/conda/lib/python3.10/site-packages/fastcore/transform.py", line 210, in __call__
    def __call__(self, o): return compose_tfms(o, tfms=self.fs, split_idx=self.split_idx)
  File "/opt/conda/lib/python3.10/site-packages/fastcore/transform.py", line 160, in compose_tfms
    x = f(x, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/fastcore/transform.py", line 83, in __call__
    def __call__(self, x, **kwargs): return self._call('encodes', x, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/fastcore/transform.py", line 93, in _call
    return self._do_call(getattr(self, fn), x, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/fastcore/transform.py", line 99, in _do_call
    return retain_type(f(x, **kwargs), x, ret)
  File "/opt/conda/lib/python3.10/site-packages/fastcore/dispatch.py", line 122, in __call__
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/transforms.py", line 265, in encodes
    raise KeyError(f"Label '{o}' was not included in the training dataset") from e
KeyError: "Label 'I have a cough that has continued for days, and I feel really weak and tired. My fever is high, and my breath has become strained. When I cough, I also generate a lot of mucus.' was not included in the training dataset"
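The KeyError reports an entire symptom description as a label that "was not included in the training dataset", which suggests the text column was being read as the label. One plausible cause (an assumption, not something the traceback confirms): the `from_df` call in cell 34 names no `text_col` or `label_col`, so fastai falls back to positional defaults (column 0 as text and column 1 as label, as far as we can tell), while `sd_df` stores `label` first and `text` second. The pandas-only sketch below shows a quick sanity check for this kind of swap; the toy frame and the helper `looks_like_label` are hypothetical.

```python
import pandas as pd

# Toy frame with the same column order as sd_df (label first, text second)
toy_df = pd.DataFrame({
    "label": ["Psoriasis", "Psoriasis", "Pneumonia", "Pneumonia"],
    "text": [
        "itchy scaly rash",
        "peeling skin on elbows",
        "cough with high fever",
        "strained breath and mucus",
    ],
})

def looks_like_label(series, max_unique_ratio=0.5):
    """Heuristic: a label column repeats values, free text rarely does."""
    return series.nunique() / len(series) <= max_unique_ratio

# Positional defaults assumed here: column 0 as text, column 1 as label
text_col, label_col = toy_df.columns[0], toy_df.columns[1]
swapped = looks_like_label(toy_df[text_col]) and not looks_like_label(toy_df[label_col])
print(f"text_col={text_col!r}, label_col={label_col!r}, swapped={swapped}")
```

Passing `text_col='text'` and `label_col='label'` explicitly, as the combined cell at the end of this notebook does, avoids relying on column order.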


In [33]:
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(1e-2/(2.6**4),1e-2))

epoch,train_loss,valid_loss,accuracy,time


  o = r[c] if isinstance(c, int) or not c in getattr(r, '_fields', []) else getattr(r, c)


KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/transforms.py", line 263, in encodes
    return TensorCategory(self.vocab.o2i[o])
KeyError: "My bowel motions are giving me a lot of problems right now. Going is difficult, and going hurts when I do it. When I go, my anus bleeds and is really uncomfortable. I'm in a lot of discomfort and it hurts extremely bad."

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 42, in fetch
    data = next(self.dataset_iter)
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/load.py", line 140, in create_batches
    yield from map(self.do_batch, self.chunkify(res))
  File "/opt/conda/lib/python3.10/site-packages/fastcore/basics.py", line 245, in chunked
    res = list(itertools.islice(it, chunk_sz))
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/load.py", line 170, in do_item
    try: return self.after_item(self.create_item(s))
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/load.py", line 177, in create_item
    if self.indexed: return self.dataset[s or 0]
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/core.py", line 449, in __getitem__
    res = tuple([tl[it] for tl in self.tls])
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/core.py", line 449, in <listcomp>
    res = tuple([tl[it] for tl in self.tls])
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/core.py", line 408, in __getitem__
    return self._after_item(res) if is_indexer(idx) else res.map(self._after_item)
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/core.py", line 368, in _after_item
    def _after_item(self, o): return self.tfms(o)
  File "/opt/conda/lib/python3.10/site-packages/fastcore/transform.py", line 210, in __call__
    def __call__(self, o): return compose_tfms(o, tfms=self.fs, split_idx=self.split_idx)
  File "/opt/conda/lib/python3.10/site-packages/fastcore/transform.py", line 160, in compose_tfms
    x = f(x, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/fastcore/transform.py", line 83, in __call__
    def __call__(self, x, **kwargs): return self._call('encodes', x, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/fastcore/transform.py", line 93, in _call
    return self._do_call(getattr(self, fn), x, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/fastcore/transform.py", line 99, in _do_call
    return retain_type(f(x, **kwargs), x, ret)
  File "/opt/conda/lib/python3.10/site-packages/fastcore/dispatch.py", line 122, in __call__
    return f(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/fastai/data/transforms.py", line 265, in encodes
    raise KeyError(f"Label '{o}' was not included in the training dataset") from e
KeyError: "Label 'My bowel motions are giving me a lot of problems right now. Going is difficult, and going hurts when I do it. When I go, my anus bleeds and is really uncomfortable. I'm in a lot of discomfort and it hurts extremely bad.' was not included in the training dataset"


In [None]:
learn.unfreeze()
learn.fit_one_cycle(2, slice(1e-3/(2.6**4),1e-3))
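The `slice(lr/(2.6**4), lr)` pattern follows the ULMFiT idea of discriminative learning rates: earlier layer groups get smaller learning rates than the classifier head, with a factor of 2.6 between adjacent groups. The sketch below computes the resulting per-group rates, assuming the classifier is split into five layer groups and that the rates are spaced geometrically between the two ends of the slice. Both points are assumptions about fastai's behavior, and `discriminative_lrs` is a hypothetical helper.

```python
import numpy as np

def discriminative_lrs(lo, hi, n_groups):
    """Geometrically spaced learning rates, lowest for the earliest layer group."""
    return np.geomspace(lo, hi, n_groups)

hi = 1e-3              # learning rate for the last group (the classifier head)
lo = hi / (2.6 ** 4)   # learning rate for the first group (the embeddings)

lrs = discriminative_lrs(lo, hi, n_groups=5)  # five groups is an assumption
for i, lr in enumerate(lrs):
    print(f"group {i}: lr = {lr:.2e}")
```

With five groups, the exponent 4 in `2.6**4` is what makes each group's rate exactly 2.6 times that of the group before it.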

In [None]:
learn.predict("I am having a running stomach, fever, general body weakness and have been getting bitten by mosquitoes often")

In [41]:
# For language model
dls_lm = TextDataLoaders.from_df(
    symptom_df,
    text_col='text',
    is_lm=True,
    valid_pct=0.2
)

# Create and train language model
learn = language_model_learner(dls_lm, AWD_LSTM, metrics=[accuracy, Perplexity()], 
                              path=Path('/kaggle/working'), wd=0.1)
learn.fit_one_cycle(1, 1e-2)
learn.save_encoder('finetuned')

# For classifier
dls_clas = TextDataLoaders.from_df(
    sd_df,  # Your labeled dataset
    text_col='text',
    label_col='label',
    valid_pct=0.2,
    text_vocab=dls_lm.vocab
)

# Create classifier model
learn = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5, 
                               metrics=accuracy, 
                               path=Path('/kaggle/working'))

# Load the fine-tuned encoder
learn.load_encoder('finetuned')

# Train classifier
learn.fit_one_cycle(1, 2e-2)

  wgts = torch.load(wgts_fname, map_location = lambda storage,loc: storage)


epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.342138,3.592784,0.3682,36.335079,00:01


  wgts = torch.load(wgts_fname, map_location = lambda storage,loc: storage)
  wgts = torch.load(join_path_file(file,self.path/self.model_dir, ext='.pth'), map_location=device)


epoch,train_loss,valid_loss,accuracy,time
0,2.461782,2.590309,0.304167,00:01
