<a href="https://colab.research.google.com/github/moin1306/BreastCancer_ML/blob/main/Medichat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**MediChat: AI Health Assistant Chatbot**

This project builds an AI-powered medical chatbot that answers health-related questions. A language model (`deepseek-ai/deepseek-llm-7b-chat`) is fine-tuned with LoRA on a specialized medical reasoning dataset so that its answers follow a step-by-step clinical reasoning style. A streamlined Gradio interface lets users of all skill levels enter questions and read clear, practical responses. The project is written in Python and focuses on efficiency and flexibility, with the goal of making healthcare information easier to access.



---



**Installing Unsloth**

Installs Unsloth from PyPI, then force-reinstalls the latest version from GitHub. Unsloth is used for model loading and fine-tuning.

In [None]:
!pip install unsloth  # install the released version of Unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git  # then upgrade to the latest version from GitHub

Collecting git+https://github.com/unslothai/unsloth.git
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-req-build-3q79ol3b
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-req-build-3q79ol3b
  Resolved https://github.com/unslothai/unsloth.git to commit 6c234d5a66adb76b9b93fb0f2445648199d88e66
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: unsloth
  Building wheel for unsloth (pyproject.toml) ... done
  Created wheel for unsloth: filename=unsloth-2025.3.19-py3-none-any.whl size=192661 sha256=eff58aebb7254c6c69872a15831e85c740c76936b3754f2a68ab5b589f72397f
  Stored in directory: /tmp/pip-ephem-wheel-cache-mwemf9cj/wheels/d1/17/05/850ab10c33284a4763b0595cd8ea9d01fce6e221cac24b3c01
Successfully built unsloth
Installing collected packages: unsloth


**Import Libraries**  
Imports libraries for model loading, fine-tuning, and dataset handling.

In [None]:
# Step3: Import necessary libraries
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from unsloth import is_bfloat16_supported
from huggingface_hub import login
from transformers import TrainingArguments
from datasets import load_dataset
import wandb

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth: Failed to patch Gemma3ForConditionalGeneration.
🦥 Unsloth Zoo will now patch everything to make training faster!


**Check Hugging Face Token**   
Authenticates with Hugging Face using a token stored in Colab secrets.

In [None]:
# Step4: Check HF token
from google.colab import userdata
hf_token = userdata.get('HF_TOKEN')
login(hf_token)
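
`google.colab.userdata` can only be imported inside a Colab runtime, so the cell above fails elsewhere. A minimal sketch of a fallback, assuming the token is also exported as an environment variable with the same name; `get_secret` is a hypothetical helper written for illustration, not part of any library:

```python
import os


def get_secret(name):
    """Return a secret from Colab's store if available, else from the environment."""
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        # Assumption: outside Colab the secret is exported as an environment variable.
        return os.environ.get(name)
    return userdata.get(name)
```

With this helper, `hf_token = get_secret("HF_TOKEN")` could replace the `userdata.get` call; it returns `None` outside Colab when the variable is not set.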

**Check GPU Availability**

Verifies CUDA and GPU availability on Colab.

In [None]:
# Test if CUDA is available
import torch
print("CUDA available:", torch.cuda.is_available())
print("GPU device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "No GPU")

CUDA available: True
GPU device: Tesla T4


**Load Pretrained Model**

Loads the deepseek-ai/deepseek-llm-7b-chat model with 4-bit quantization (7B Parameters).

In [None]:
# Step5: Set up the pretrained DeepSeek LLM 7B Chat model

model_name = "deepseek-ai/deepseek-llm-7b-chat"
max_sequence_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_sequence_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hf_token
)

==((====))==  Unsloth 2025.3.19: Fast Llama patching. Transformers: 4.51.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

KeyboardInterrupt: 

**Define System Prompt**

Sets up a prompt template for medical question-answering.

In [None]:
# Step6: Setup system prompt
prompt_style = """
Below is a task description along with additional context provided in the input section. Your goal is to provide a well-reasoned response that effectively addresses the request.

Before crafting your answer, take a moment to carefully analyze the question. Develop a clear, step-by-step thought process to ensure your response is both logical and accurate.

### Task:
You are a medical expert specializing in clinical reasoning, diagnostics, and treatment planning. Answer the medical question below using your advanced knowledge.

### Query:
{}

### Answer:
<think>{}
"""

**Run Inference (Pre-Fine-Tuning)**

Tests the pretrained model with a medical question about cystometry.

In [None]:
# Step7: Run Inference on the model

# Define a test question
question = """A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or
              sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings,
              what would cystometry most likely reveal about her residual volume and detrusor contractions?"""

FastLanguageModel.for_inference(model)

# Tokenize the input
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generate a response
outputs = model.generate(
    input_ids = inputs.input_ids,
    attention_mask = inputs.attention_mask,
    max_new_tokens = 1200,
    use_cache = True
)

# Decode the response tokens back to text
response = tokenizer.batch_decode(outputs)


print(response)


["<｜begin▁of▁sentence｜>\nBelow is a task description along with additional context provided in the input section. Your goal is to provide a well-reasoned response that effectively addresses the request.\n\nBefore crafting your answer, take a moment to carefully analyze the question. Develop a clear, step-by-step thought process to ensure your response is both logical and accurate.\n\n### Task:\nYou are a medical expert specializing in clinical reasoning, diagnostics, and treatment planning. Answer the medical question below using your advanced knowledge.\n\n### Query:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or\n              sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings,\n              what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Answer:\n<think>\n1. Understand the patient's symptoms: The patient has involuntary uri

In [None]:
print(response[0].split("### Answer:")[1])


<think>
1. Understand the patient's symptoms: The patient has involuntary urine loss during activities like coughing or sneezing but no leakage at night.
2. Consider the gynecological exam and Q-tip test: These tests are used to evaluate the bladder's function, specifically the residual volume and detrusor contractions.
3. Analyze the results: Based on the patient's symptoms and the results of the gynecological exam and Q-tip test, cystometry will likely reveal a condition called stress urinary incontinence (SUI).
4. Explain the findings: Stress urinary incontinence is a common condition in which the pelvic floor muscles are weakened or weakened, leading to urine leakage during physical activities like coughing, sneezing, or lifting. The patient's history of urine loss during these activities and the absence of leakage at night supports this diagnosis.
5. Provide treatment options: Treatment options for stress urinary incontinence include behavioral interventions, such as Kegel exerci
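
`response[0].split("### Answer:")[1]` raises an `IndexError` when the marker is missing and leaves special tokens in the text. A slightly more defensive sketch, assuming the end-of-sequence string is the one this tokenizer reports (`<｜end▁of▁sentence｜>`); `extract_answer` is a hypothetical helper:

```python
def extract_answer(decoded, marker="### Answer:", eos="<｜end▁of▁sentence｜>"):
    """Return the text after the first answer marker, without the EOS token."""
    _, found, tail = decoded.partition(marker)
    if not found:
        # Marker missing: fall back to the whole decoded string.
        return decoded.strip()
    return tail.replace(eos, "").strip()


sample = (
    "<｜begin▁of▁sentence｜>\n### Query:\nQ\n\n"
    "### Answer:\n<think>\nstep 1<｜end▁of▁sentence｜>"
)
print(extract_answer(sample))
```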

**Load Medical Dataset**

Loads a medical dataset for fine-tuning.

In [None]:
# Step8: Setup fine-tuning

# Load Dataset
medical_dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split = "train[:500]", trust_remote_code = True)

README.md:   0%|          | 0.00/1.97k [00:00<?, ?B/s]

medical_o1_sft.json:   0%|          | 0.00/58.2M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/19704 [00:00<?, ? examples/s]

In [None]:
medical_dataset[1]

{'Question': 'A 33-year-old woman is brought to the emergency department 15 minutes after being stabbed in the chest with a screwdriver. Given her vital signs of pulse 110/min, respirations 22/min, and blood pressure 90/65 mm Hg, along with the presence of a 5-cm deep stab wound at the upper border of the 8th rib in the left midaxillary line, which anatomical structure in her chest is most likely to be injured?',
 'Complex_CoT': "Okay, let's figure out what's going on here. A woman comes in with a stab wound from a screwdriver. It's in her chest, upper border of the 8th rib, left side, kind of around the midaxillary line. First thought, that's pretty close to where the lung sits, right?\n\nLet's talk about location first. This spot is along the left side of her body. Above the 8th rib, like that, is where a lot of important stuff lives, like the bottom part of the left lung, possibly the diaphragm too, especially considering how deep the screwdriver went.\n\nThe wound is 5 cm deep. Tha

**Define EOS Token**

Retrieves the model’s end-of-sequence token.

In [None]:
EOS_TOKEN = tokenizer.eos_token  # Define EOS_TOKEN which tells the model when to stop generating text during training
EOS_TOKEN

'<｜end▁of▁sentence｜>'

**Define Training Prompt**

Sets up a prompt template for fine-tuning.

In [None]:
### Finetuning
# Updated training prompt style to add </think> tag
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""



**Preprocess Dataset**

Formats each example into the training prompt (question, chain of thought, response) and appends the EOS token.

In [None]:
# Prepare the data for fine-tuning

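def preprocess_input_data(examples):
    # With batched=True, each column arrives as a list of values
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    responses = examples["Response"]

    texts = []

    # Fill the template for each example and append EOS so the model learns where to stop
    for question, cot, response in zip(questions, cots, responses):
        text = train_prompt_style.format(question, cot, response) + EOS_TOKEN
        texts.append(text)

    return {
        "texts": texts,
    }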

In [None]:
finetune_dataset = medical_dataset.map(preprocess_input_data, batched = True)

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

In [None]:
finetune_dataset["texts"][0]

"Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.\nPlease answer the following medical question.\n\n### Question:\nGiven the symptoms of sudden weakness in the left arm and leg, recent long-distance travel, and the presence of swollen and tender right lower leg, what specific cardiac abnormality is most likely to be found upon further evaluation that could explain these findings?\n\n### Response:\n<think>\nOkay, let's see what's going on here. We've got sudden weakness in the person's left arm and leg - and that screams something neuro-related, maybe a stroke?\n\nBut wait, there's more. The right lower leg is sw

**Apply LoRA**

Configures LoRA (Low-Rank Adaptation) for efficient fine-tuning.

In [None]:
# Step9: Setup/Apply LoRA finetuning to the model

model_lora = FastLanguageModel.get_peft_model(
    model = model,
    r = 16,
    target_modules = [
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj"
    ],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3047,
    use_rslora = False,
    loftq_config = None
)

Unsloth 2025.3.19 patched 30 layers with 30 QKV layers, 30 O layers and 30 MLP layers.
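
LoRA adds two low-rank matrices to each targeted linear layer, so a layer with `d_in` inputs and `d_out` outputs contributes `r * (d_in + d_out)` trainable parameters. A back-of-the-envelope sketch for `r = 16` across the 30 patched layers, assuming a hidden size of 4096, an MLP intermediate size of 11008, and full-size key/value projections (these dimensions are assumptions about the model configuration, not values printed in this notebook):

```python
r = 16
num_layers = 30       # from the Unsloth log: 30 layers patched
hidden = 4096         # assumption: model hidden size
intermediate = 11008  # assumption: MLP intermediate size

# (in_features, out_features) for each LoRA target module in one layer
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, hidden),
    "v_proj": (hidden, hidden),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}


def lora_params(d_in, d_out, rank):
    # A is (rank x d_in) and B is (d_out x rank)
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes.values())
total = per_layer * num_layers
print(f"{total:,}")
```

Under these assumptions the total comes to 37,478,400, which agrees with the trainable-parameter count that Unsloth reports when training starts.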


**Clear Model Attribute**

Removes a potentially conflicting attribute from the model, left over from the earlier inference call, before the trainer is created.

In [None]:
# Add this before creating the trainer
if hasattr(model, '_unwrapped_old_generate'):
    del model._unwrapped_old_generate

**Set Up Fine-Tuning Trainer**

Configures the SFTTrainer for fine-tuning.

In [None]:
trainer = SFTTrainer(
    model = model_lora,
    tokenizer = tokenizer,
    train_dataset = finetune_dataset,
    dataset_text_field = "texts",
    max_seq_length = max_sequence_length,
    dataset_num_proc = 1,

    # Define training args
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        num_train_epochs = 1,
        warmup_steps = 5,
        max_steps = 60,  # a positive max_steps takes precedence over num_train_epochs
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir = "outputs",
    ),
)

Unsloth: Tokenizing ["texts"]:   0%|          | 0/500 [00:00<?, ? examples/s]
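
With `lr_scheduler_type="linear"`, `warmup_steps = 5`, and `max_steps = 60`, the learning rate ramps up linearly to `2e-4` over the first 5 steps and then decays linearly to zero at step 60. A minimal sketch of that shape in plain Python; `linear_lr` is a hypothetical helper written for illustration, not the scheduler object the trainer builds:

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=60):
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    if step < warmup_steps:
        # Warmup: scale up from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale down from base_lr to 0 at total_steps
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


for step in (0, 1, 5, 30, 60):
    print(step, linear_lr(step))
```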

**Set Up Weights & Biases (W&B)**

Configures W&B for training monitoring.

In [None]:
# Set up Weights & Biases
from google.colab import userdata
wnb_token = userdata.get("WANDB_TO_TOKEN")
# Log in to W&B (wandb was imported with the other libraries above)
wandb.login(key=wnb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-on-Medical-CoT-Dataset',
    job_type="training",
    anonymous="allow"
)


wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: khanmoin1306 (khanmoin1306-lnmiit) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


**Start Fine-Tuning**

Runs the fine-tuning process.

In [None]:
# Start the fine-tuning process
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 500 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 37,478,400/7,000,000,000 (0.54% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
10,1.7896
20,1.4021
30,1.3251
40,1.2938
50,1.3301
60,1.3051
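
The training banner above reports a total batch size of 8 and 60 total steps. A short worked calculation of what that means for the 500-example dataset, using only the values from the training arguments:

```python
num_examples = 500
per_device_batch = 2
grad_accum_steps = 4
num_gpus = 1
max_steps = 60

# Examples consumed per optimizer step
effective_batch = per_device_batch * grad_accum_steps * num_gpus

# Examples consumed over the whole run
examples_seen = effective_batch * max_steps

# Share of the dataset covered by the run
fraction_of_dataset = examples_seen / num_examples

print(effective_batch, examples_seen, fraction_of_dataset)
```

The run therefore processes 480 examples, slightly less than one full pass over the 500-example subset.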


**Finish W&B Run**

Closes the W&B session.

In [None]:
wandb.finish()

**Test Fine-Tuned Model (Cystometry)**

Tests the fine-tuned model with the cystometry question.

In [None]:
# Step10: Testing after fine-tuning
question = """A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing
              but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings,
              what would cystometry most likely reveal about her residual volume and detrusor contractions?"""

FastLanguageModel.for_inference(model_lora)

# Tokenize the input
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generate a response
outputs = model_lora.generate(
    input_ids = inputs.input_ids,
    attention_mask = inputs.attention_mask,
    max_new_tokens = 1200,
    use_cache = True
)

# Decode the response tokens back to text
response = tokenizer.batch_decode(outputs)

print(response)

["<｜begin▁of▁sentence｜>\nBelow is a task description along with additional context provided in the input section. Your goal is to provide a well-reasoned response that effectively addresses the request.\n\nBefore crafting your answer, take a moment to carefully analyze the question. Develop a clear, step-by-step thought process to ensure your response is both logical and accurate.\n\n### Task:\nYou are a medical expert specializing in clinical reasoning, diagnostics, and treatment planning. Answer the medical question below using your advanced knowledge.\n\n### Query:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing\n              but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings,\n              what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Answer:\n<think>\nOkay, so we're dealing with a 61-year-old woman who's been dealing wi

In [None]:
print(response[0].split("### Answer:")[1])



**Test Fine-Tuned Model (Aortic Valve)**

Tests the fine-tuned model with a new question about aortic valve vegetation.

In [None]:
question = """A 59-year-old man presents with a fever, chills, night sweats, and generalized fatigue,
              and is found to have a 12 mm vegetation on the aortic valve. Blood cultures indicate gram-positive, catalase-negative,
              gamma-hemolytic cocci in chains that do not grow in a 6.5% NaCl medium.
              What is the most likely predisposing factor for this patient's condition?"""

FastLanguageModel.for_inference(model_lora)

# Tokenize the input
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generate a response
outputs = model_lora.generate(
    input_ids = inputs.input_ids,
    attention_mask = inputs.attention_mask,
    max_new_tokens = 1200,
    use_cache = True
)

# Decode the response tokens back to text
response = tokenizer.batch_decode(outputs)

print(response[0].split("### Answer:")[1])


<think>
Alright, let's think about this. We've got a 59-year-old guy who's got a fever, chills, night sweats, and he's just feeling really wiped out. Plus, he's got a 12 mm vegetation on his aortic valve. That's not good.

Now, let's look at the bacteria in his blood. They're gram-positive, catalase-negative, and they're gamma-hemolytic. That's a pretty specific profile. Those kinds of bugs usually come from the Lancefield group B streptococci.

What's interesting is that they don't grow in a 6.5% NaCl medium. That's usually what bacteria grow in, especially in blood cultures. So, if they don't grow in NaCl, that means they're resistant to it.

Now, what's common in patients who have these types of bacteria? It's often a connection to some kind of immunocompromise, like AIDS, or some other condition that weakens their immune system.

So, if we're thinking about this in terms of what's common for people with these types of bacteria and these types of symptoms, it's really pointing towa

**User Interface**

Builds a Gradio interface where non-technical users can enter a health question and read the response. As written, the cell below returns a fixed placeholder message instead of calling the fine-tuned model.

In [None]:

# Step 1: Install Gradio (skip if already installed)
print("Installing Gradio...")
!pip install gradio --quiet
print("Gradio installed.")

# Step 2: Import Gradio
try:
    import gradio as gr
except ImportError as e:
    raise ImportError(f"Failed to import Gradio: {e}. Please ensure Gradio is installed.")

# Step 3: Placeholder chatbot function (no model loading)
def ai_doctor_chatbot(question):
    if not question.strip():
        return "Please enter a medical question."

    # Placeholder response (replace with model inference after fixing notebook)
    placeholder_response = (
        "This is a placeholder response because the fine-tuned model is not available. "
        "To enable AI responses, please fix AI_Doctor_3.ipynb by updating Cell 7 with a valid model "
        "(e.g., 'mistralai/Mixtral-8x7B-Instruct-v0.1'), re-run the notebook to generate the 'outputs' directory, "
        "and then use the full Gradio script with model loading."
    )

    return f"**Disclaimer**: This is an AI-generated response. Consult a doctor for professional medical advice.\n\n**Response**:\n{placeholder_response}"

# Step 4: Create minimal Gradio interface
interface = gr.Interface(
    fn=ai_doctor_chatbot,
    inputs=gr.Textbox(
        label="Ask a Medical Question",
        placeholder="E.g., What would cystometry reveal for stress urinary incontinence?"
    ),
    outputs=gr.Textbox(label="Medichat Response"),
    title=" MediChat: AI Health Assistant Chatbot",
    description="Get instant, reliable answers to your health-related questions. MediChat uses AI to provide supportive information based on real medical conversations.",
    theme="soft",  # Clean, Grok-like aesthetic
    allow_flagging="never"  # Disable flagging for simplicity
)

# Step 5: Launch the interface
print("Launching Gradio interface...")
interface.launch()


Installing Gradio...
Gradio installed.




Launching Gradio interface...
It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://71b198f93a0447cbc3.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
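
The interface above returns a placeholder. One way to connect it to a model is to pass the generation step in as a callable, so the Gradio function stays testable without a GPU. A minimal sketch under that assumption; `build_chat_fn`, `generate_text`, and `echo_generator` are hypothetical names, and the short template is a stand-in for `prompt_style`:

```python
DISCLAIMER = (
    "**Disclaimer**: This is an AI-generated response. "
    "Consult a doctor for professional medical advice."
)


def build_chat_fn(generate_text, prompt_template, answer_marker="### Answer:"):
    """Wrap a text-generation callable as a function usable by gr.Interface.

    generate_text: hypothetical callable that maps a prompt string to the
    full decoded output string (prompt plus completion).
    """
    def chat(question):
        if not question or not question.strip():
            return "Please enter a medical question."
        prompt = prompt_template.format(question.strip(), "")
        decoded = generate_text(prompt)
        # Keep only the text after the answer marker, if present.
        _, found, tail = decoded.partition(answer_marker)
        answer = tail.strip() if found else decoded.strip()
        return f"{DISCLAIMER}\n\n**Response**:\n{answer}"

    return chat


# Stub generator so the sketch runs without a model.
def echo_generator(prompt):
    return prompt + "stub answer"


demo_chat = build_chat_fn(echo_generator, "### Query:\n{}\n\n### Answer:\n<think>{}\n")
print(demo_chat("What is stress urinary incontinence?"))
```

In the notebook, `generate_text` would wrap the tokenize, `model_lora.generate`, and `batch_decode` steps shown earlier, and the returned function would be passed as `fn` to `gr.Interface`.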


