<a href="https://colab.research.google.com/github/moin1306/ai_chatbot/blob/main/copy_of_medichat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**MediChat : AI Health Assistant Chatbot**

This project builds an AI-powered medical chatbot that answers health-related questions. It fine-tunes the `deepseek-ai/deepseek-llm-7b-chat` language model with LoRA on a specialized medical reasoning dataset (`FreedomIntelligence/medical-o1-reasoning-SFT`), so that answers follow a step-by-step clinical reasoning style tailored to each query. A Gradio interface lets users of all skill levels type a question and read a clear, practical response. Built in Python, the project uses 4-bit quantization and LoRA to keep fine-tuning efficient enough for a single Colab GPU, with the aim of making health information more accessible.



---



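**Installing Unsloth**

Installs Unsloth from PyPI (which also installs its dependencies), then force-reinstalls the latest version from GitHub without touching those dependencies. Unsloth is used for model loading and fine-tuning.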

In [None]:
!pip install unsloth # install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git # Also get the latest version Unsloth!

**Import Libraries**  
Imports libraries for model loading, fine-tuning, and dataset handling.

In [None]:
# Step3: Import necessary libraries
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from unsloth import is_bfloat16_supported
from huggingface_hub import login
from transformers import TrainingArguments
from datasets import load_dataset
import wandb

**Check Hugging Face Token**   
Authenticates with Hugging Face using a token stored in Colab secrets.

In [None]:
# Step4: Check HF token
from google.colab import userdata
hf_token = userdata.get('HF_TOKEN')
login(hf_token)
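The cell above only works inside Colab, because `google.colab.userdata` does not exist elsewhere. A minimal sketch, assuming you also want the notebook to run outside Colab: the hypothetical `get_secret` helper tries Colab's `userdata` first and falls back to an environment variable with the same name (for example `HF_TOKEN`).

```python
import os


def get_secret(name):
    """Return a secret from Colab's userdata store, else from an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab

        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not running in Colab, or the secret is missing / not shared with this notebook
        pass
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} not found in Colab userdata or the environment")
    return value
```

Usage in the notebook would be `hf_token = get_secret("HF_TOKEN")` followed by `login(hf_token)`. Failing loudly with a clear message is easier to debug than passing `None` to `login`.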

**Check GPU Availability**

Verifies CUDA and GPU availability on Colab.

In [None]:
# Test if CUDA is available
import torch
print("CUDA available:", torch.cuda.is_available())
print("GPU device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "No GPU")
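Whether the GPU supports bfloat16 decides the `fp16`/`bf16` flags passed to the trainer later. A simplified sketch, assuming the common rule of thumb that native bfloat16 needs CUDA compute capability 8.0 or newer (Ampere and later GPUs); Colab's T4 reports 7.5. The capability can be read with `torch.cuda.get_device_capability(0)`. The helpers `supports_bf16` and `precision_flags` are hypothetical; the notebook itself relies on Unsloth's `is_bfloat16_supported()`.

```python
def supports_bf16(major, minor):
    """Rule of thumb: native bfloat16 needs compute capability 8.0+ (Ampere or newer)."""
    return (major, minor) >= (8, 0)


def precision_flags(bf16_ok):
    """Mirror the fp16/bf16 switch used in TrainingArguments later in this notebook."""
    return {"fp16": not bf16_ok, "bf16": bf16_ok}


# Example capabilities: T4 (7, 5) and A100 (8, 0)
for name, (major, minor) in {"T4": (7, 5), "A100": (8, 0)}.items():
    print(name, precision_flags(supports_bf16(major, minor)))
```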

**Load Pretrained Model**

Loads the `deepseek-ai/deepseek-llm-7b-chat` model (7B parameters) with 4-bit quantization.

In [None]:
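# Step5: Set up the pretrained DeepSeek LLM 7B Chat model

model_name = "deepseek-ai/deepseek-llm-7b-chat"
max_sequence_length = 2048  # maximum number of tokens per sequence
dtype = None                # None lets Unsloth auto-detect float16 or bfloat16 for this GPU
load_in_4bit = True         # 4-bit quantization to reduce GPU memory use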

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_sequence_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hf_token
)
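Why 4-bit quantization matters here can be estimated with simple arithmetic: weight memory is roughly parameters × bits per parameter ÷ 8 bytes. This sketch uses the nominal 7B count from the model name and deliberately ignores quantization constants, layers that may stay in higher precision, activations, and the KV cache, so real usage is higher than these numbers.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 7e9  # nominal "7B" from the model name; the exact count differs slightly
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(n_params, bits):.1f} GB")
```

At 16 bits the weights alone need about 14 GB, close to the limit of a 16 GB GPU; at 4 bits they need about 3.5 GB, which leaves room for activations during fine-tuning.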

**Define System Prompt**

Sets up a prompt template for medical question-answering.

In [None]:
# Step6: Setup system prompt
prompt_style = """
Below is a task description along with additional context provided in the input section. Your goal is to provide a well-reasoned response that effectively addresses the request.

Before crafting your answer, take a moment to carefully analyze the question. Develop a clear, step-by-step thought process to ensure your response is both logical and accurate.

### Task:
You are a medical expert specializing in clinical reasoning, diagnostics, and treatment planning. Answer the medical question below using your advanced knowledge.

### Query:
{}

### Answer:
<think>{}
"""
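The template has two positional `{}` slots: the first receives the question and the second any pre-filled reasoning. Passing an empty string for the second slot leaves the prompt open right after `<think>`, so generation starts inside the reasoning block. A small sketch with an abbreviated copy of the template (the task text is shortened; `prompt_style_demo` and `question_demo` are illustrative names):

```python
# Abbreviated copy of prompt_style; the section structure is the same
prompt_style_demo = """
### Task:
You are a medical expert. Answer the medical question below.

### Query:
{}

### Answer:
<think>{}
"""

question_demo = "What does a positive Q-tip test suggest?"
prompt = prompt_style_demo.format(question_demo, "")
print(prompt)
```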

**Run Inference (Pre-Fine-Tuning)**

Tests the pretrained model with a medical question about cystometry.

In [None]:
# Step7: Run Inference on the model

# Define a test question
# (adjacent string literals avoid sending stray indentation whitespace to the model)
question = (
    "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or "
    "sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, "
    "what would cystometry most likely reveal about her residual volume and detrusor contractions?"
)

# Switch Unsloth to its faster inference mode
FastLanguageModel.for_inference(model)

# Tokenize the prompt (question in the first slot, empty reasoning in the second)
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generate a response
outputs = model.generate(
    input_ids = inputs.input_ids,
    attention_mask = inputs.attention_mask,
    max_new_tokens = 1200,
    use_cache = True
)

# Decode the response tokens back to text
response = tokenizer.batch_decode(outputs)

print(response)


In [None]:
print(response[0].split("### Answer:")[1])
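`response[0].split("### Answer:")[1]` raises `IndexError` whenever the marker is missing from the decoded text, for example if a different prompt template is used. A small, hypothetical helper based on `str.partition` avoids that and can also strip special tokens. Passing `(tokenizer.eos_token,)` as `stop_tokens`, or decoding with `tokenizer.batch_decode(outputs, skip_special_tokens=True)`, are two ways to drop the end-of-sequence marker; `<eos>` below is only a stand-in.

```python
def extract_answer(decoded, marker="### Answer:", stop_tokens=()):
    """Return the text after `marker`, with any stop/special tokens removed.

    Unlike `decoded.split(marker)[1]`, this does not raise IndexError when the
    marker is missing; it falls back to the whole decoded string instead.
    """
    _, found, after = decoded.partition(marker)
    text = after if found else decoded
    for token in stop_tokens:
        text = text.replace(token, "")
    return text.strip()


print(extract_answer("prompt ### Answer:\n<think>x</think> final<eos>", stop_tokens=("<eos>",)))
```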

**Load Medical Dataset**

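Loads the first 500 training examples (`train[:500]`) from the English (`en`) configuration of the `FreedomIntelligence/medical-o1-reasoning-SFT` dataset for fine-tuning.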

In [None]:
# Step8: Setup fine-tuning

# Load Dataset
medical_dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split = "train[:500]", trust_remote_code = True)

In [None]:
medical_dataset[1]

**Define EOS Token**

Retrieves the model’s end-of-sequence token.

In [None]:
EOS_TOKEN = tokenizer.eos_token  # Define EOS_TOKEN which tells the model when to stop generating text during training
EOS_TOKEN

**Define Training Prompt**
Sets up a prompt template for fine-tuning.

In [None]:
### Finetuning
# Updated training prompt style to add </think> tag
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""
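Note that `prompt_style` above uses different section headers (`### Query:` / `### Answer:`) than this training template (`### Question:` / `### Response:`). After fine-tuning, the model is best queried with a prompt that is an exact prefix of what it saw in training. One way to build it, sketched here with an abbreviated copy of the template (the `_demo` names are illustrative), is to cut the template right after `<think>\n` so the model generates the reasoning, `</think>`, and the final answer itself:

```python
# Abbreviated copy of train_prompt_style with the same three slots:
# question, chain of thought, final response
train_prompt_demo = """### Question:
{}

### Response:
<think>
{}
</think>
{}"""

# Keep everything before "<think>", then re-add the opening tag and newline
inference_prompt_demo = train_prompt_demo.split("<think>")[0] + "<think>\n"

question_demo = "Which organism is most likely?"
train_text = train_prompt_demo.format(question_demo, "Reasoning steps...", "Final answer.")
prompt = inference_prompt_demo.format(question_demo)

# The inference prompt is an exact prefix of a training example
assert train_text.startswith(prompt)
print(prompt)
```

Cutting after `<think>\n` (rather than filling the reasoning slot with an empty string) avoids an extra blank line that never appears in the training data.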



**Preprocess Dataset**

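Formats each example's question, chain of thought (`Complex_CoT`), and final response into a single training string using `train_prompt_style`, and appends the EOS token.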

In [None]:
# Prepare the data for fine-tuning

def preprocess_input_data(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]

    texts = []

    # Fill the training template and append EOS so the model learns where to stop
    for question, cot, answer in zip(inputs, cots, outputs):
        text = train_prompt_style.format(question, cot, answer) + EOS_TOKEN
        texts.append(text)

    return {
        "texts": texts,
    }

In [None]:
finetune_dataset = medical_dataset.map(preprocess_input_data, batched = True)

In [None]:
finetune_dataset["texts"][0]
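With `map(..., batched=True)`, the datasets library passes the function a dictionary mapping each column name to a list of values for a whole batch, and adds the returned `"texts"` column to the dataset. This offline sketch runs the same logic on a toy two-row batch; the template is abbreviated, `EOS_DEMO` is a hypothetical stand-in for `tokenizer.eos_token`, and the rows are made-up placeholders rather than real dataset records.

```python
EOS_DEMO = "<eos>"  # hypothetical stand-in for tokenizer.eos_token

template_demo = "Q: {}\n<think>\n{}\n</think>\n{}"  # abbreviated train_prompt_style


def preprocess_demo(examples):
    # With batched=True, `examples` is a dict of column name -> list of values
    texts = [
        template_demo.format(q, cot, resp) + EOS_DEMO
        for q, cot, resp in zip(examples["Question"], examples["Complex_CoT"], examples["Response"])
    ]
    return {"texts": texts}


batch = {
    "Question": ["q1", "q2"],
    "Complex_CoT": ["reasoning 1", "reasoning 2"],
    "Response": ["answer 1", "answer 2"],
}
out = preprocess_demo(batch)
print(out["texts"][0])
```

Every training string ends with the EOS token, which is how the model learns to stop after the final answer instead of continuing to generate text.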

**Apply LoRA**

Configures LoRA (Low-Rank Adaptation) for efficient fine-tuning.

In [None]:
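# Step9: Set up/apply LoRA fine-tuning to the model

model_lora = FastLanguageModel.get_peft_model(
    model = model,
    r = 16,                      # rank of the low-rank update matrices
    target_modules = [           # attention and MLP projections that receive LoRA adapters
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj"
    ],
    lora_alpha = 16,             # the update is scaled by lora_alpha / r (here 1.0)
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",  # Unsloth's memory-saving gradient checkpointing
    random_state = 3047,
    use_rslora = False,          # rank-stabilized LoRA disabled
    loftq_config = None
)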
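To get a feel for how small the LoRA update is, count its parameters: for a frozen weight with input size `d_in` and output size `d_out`, LoRA adds matrices of shape `r × d_in` and `d_out × r`, i.e. `r × (d_in + d_out)` trainable parameters. This sketch assumes dimensions for `deepseek-llm-7b-chat` (hidden size 4096, intermediate size 11008, 30 layers, and full multi-head attention so `k_proj`/`v_proj` are as large as `q_proj`); verify them against `model.config` and compare the total with `model_lora.print_trainable_parameters()`.

```python
# Assumed dimensions for deepseek-llm-7b-chat (verify with model.config)
HIDDEN = 4096         # hidden_size
INTERMEDIATE = 11008  # intermediate_size
LAYERS = 30           # num_hidden_layers
R = 16                # LoRA rank (r)
ALPHA = 16            # lora_alpha


def lora_params(d_in, d_out, r):
    # A (r x d_in) plus B (d_out x r) added next to a frozen d_out x d_in weight
    return r * (d_in + d_out)


attention = 4 * lora_params(HIDDEN, HIDDEN, R)  # q_proj, k_proj, v_proj, o_proj
mlp = 2 * lora_params(HIDDEN, INTERMEDIATE, R) + lora_params(INTERMEDIATE, HIDDEN, R)  # gate, up, down
total = LAYERS * (attention + mlp)

print(f"trainable LoRA parameters: {total:,}")  # 37,478,400 under these assumptions
print("scaling (alpha / r):", ALPHA / R)  # 1.0
print("rsLoRA scaling (alpha / sqrt(r)):", ALPHA / R ** 0.5)  # 4.0
```

Under these assumptions only about 37.5 million parameters (roughly half a percent of 7B) are trained, which is why LoRA fits alongside a 4-bit base model on one GPU. The last line shows how `use_rslora = True` would change the scaling factor.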

**Clear Model Attribute**

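Workaround: removes the `_unwrapped_old_generate` attribute from the model, if present. The attribute is associated with Unsloth's inference mode (enabled earlier with `FastLanguageModel.for_inference`) and may conflict with trainer setup.

In [None]:
# Run this before creating the trainer
if hasattr(model, '_unwrapped_old_generate'):
    del model._unwrapped_old_generate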

**Set Up Fine-Tuning Trainer**

Configures the SFTTrainer for fine-tuning.

In [None]:
trainer = SFTTrainer(
    model = model_lora,
    tokenizer = tokenizer,
    train_dataset = finetune_dataset,
    dataset_text_field = "texts",          # column created by preprocess_input_data
    max_seq_length = max_sequence_length,
    dataset_num_proc = 1,

    # Define training args
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,     # effective batch size 2 x 4 = 8 per GPU
        num_train_epochs = 1,                # ignored, because max_steps > 0 overrides it
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),  # fp16 on GPUs without bfloat16 (e.g. T4)
        bf16 = is_bfloat16_supported(),
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "wandb",                 # send training logs to Weights & Biases
    ),
)
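The step arithmetic behind these settings, as a small sketch assuming a single GPU and one training example per dataset row (no packing): each optimizer step sees `per_device_train_batch_size × gradient_accumulation_steps` examples, and because `max_steps` is positive it overrides `num_train_epochs`. The `linear_lr` function reproduces the shape of a linear schedule with warmup: the learning rate rises to its peak over the warmup steps, then decays linearly to zero at the last step.

```python
per_device_batch = 2
grad_accum = 4
max_steps = 60
warmup_steps = 5
base_lr = 2e-4
dataset_size = 500  # train[:500]

effective_batch = per_device_batch * grad_accum  # examples per optimizer step (one GPU)
examples_seen = effective_batch * max_steps
epochs_covered = examples_seen / dataset_size


def linear_lr(step):
    """Linear warmup to base_lr, then linear decay to 0 at max_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))


print(effective_batch, examples_seen, epochs_covered)  # 8 480 0.96
print([linear_lr(s) for s in (0, 5, 30, 60)])
```

So the 60 steps cover 480 of the 500 examples, just under one full pass, and with `logging_steps = 10` the loss is logged six times.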

**Set Up Weights & Biases (W&B)**

Configures W&B for training monitoring.

In [None]:
# Setup WANDB
from google.colab import userdata
wnb_token = userdata.get("WANDB_TO_TOKEN")
# Log in to W&B
wandb.login(key=wnb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-on-Medical-CoT-Dataset',
    job_type="training",
    anonymous="allow"
)


**Start Fine-Tuning**

Runs the fine-tuning process.

In [None]:
# Start the fine-tuning process
trainer_stats = trainer.train()

**Finish W&B Run**

Closes the W&B session.

In [None]:
wandb.finish()

**Test Fine-Tuned Model (Cystometry)**

Tests the fine-tuned model with the cystometry question.

In [None]:
# Step10: Testing after fine-tuning
question = (
    "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing "
    "but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, "
    "what would cystometry most likely reveal about her residual volume and detrusor contractions?"
)

FastLanguageModel.for_inference(model_lora)

# Query with the template the model was fine-tuned on (### Question / ### Response),
# cut right after "<think>\n" so the model writes its reasoning and answer itself
inference_prompt = train_prompt_style.split("<think>")[0] + "<think>\n"

# Tokenize the input
inputs = tokenizer([inference_prompt.format(question)], return_tensors="pt").to("cuda")

# Generate a response
outputs = model_lora.generate(
    input_ids = inputs.input_ids,
    attention_mask = inputs.attention_mask,
    max_new_tokens = 1200,
    use_cache = True
)

# Decode the response tokens back to text
response = tokenizer.batch_decode(outputs)

print(response)

In [None]:
print(response[0].split("### Response:")[1])

**Test Fine-Tuned Model (Aortic Valve)**

Tests the fine-tuned model with a new question about aortic valve vegetation.

In [None]:
question = (
    "A 59-year-old man presents with a fever, chills, night sweats, and generalized fatigue, "
    "and is found to have a 12 mm vegetation on the aortic valve. Blood cultures indicate gram-positive, catalase-negative, "
    "gamma-hemolytic cocci in chains that do not grow in a 6.5% NaCl medium. "
    "What is the most likely predisposing factor for this patient's condition?"
)

FastLanguageModel.for_inference(model_lora)

# Use the fine-tuning template (### Question / ### Response), cut right after "<think>\n"
inference_prompt = train_prompt_style.split("<think>")[0] + "<think>\n"

# Tokenize the input
inputs = tokenizer([inference_prompt.format(question)], return_tensors="pt").to("cuda")

# Generate a response
outputs = model_lora.generate(
    input_ids = inputs.input_ids,
    attention_mask = inputs.attention_mask,
    max_new_tokens = 1200,
    use_cache = True
)

# Decode the response tokens back to text
response = tokenizer.batch_decode(outputs)

print(response[0].split("### Response:")[1])

**User Interface**

Builds a user-friendly Gradio interface where users type a health question and read the response, making the chatbot accessible to non-technical users. As written, the cell below returns a placeholder message rather than calling the fine-tuned model.

In [None]:

# Step 1: Install Gradio (skip if already installed)
print("Installing Gradio...")
!pip install gradio --quiet
print("Gradio installed.")

# Step 2: Import Gradio
try:
    import gradio as gr
except ImportError as e:
    raise ImportError(f"Failed to import Gradio: {e}. Please ensure Gradio is installed.")

# Step 3: Placeholder chatbot function (no model loading)
def ai_doctor_chatbot(question):
    if not question.strip():
        return "Please enter a medical question."

    # Placeholder response (replace with inference using the fine-tuned model)
    placeholder_response = (
        "This is a placeholder response because the fine-tuned model is not connected to this interface. "
        "To enable AI responses, replace this placeholder with inference using the fine-tuned model "
        "(model_lora) trained earlier in this notebook."
    )

    return f"**Disclaimer**: This is an AI-generated response. Consult a doctor for professional medical advice.\n\n**Response**:\n{placeholder_response}"

# Step 4: Create minimal Gradio interface
interface = gr.Interface(
    fn=ai_doctor_chatbot,
    inputs=gr.Textbox(
        label="Ask a Medical Question",
        placeholder="E.g., What would cystometry reveal for stress urinary incontinence?"
    ),
    outputs=gr.Textbox(label="Medichat Response"),
    title="MediChat: AI Health Assistant Chatbot",
    description="Ask health-related questions and get instant, AI-generated answers. MediChat uses a language model fine-tuned on medical reasoning data to provide supportive information.",
    theme="soft",  # Gradio's built-in "soft" theme
    flagging_mode="never"  # disable flagging for simplicity (replaces the deprecated allow_flagging)
)

# Step 5: Launch the interface
print("Launching Gradio interface...")
interface.launch()
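To replace the placeholder with real answers, the handler needs to build a prompt, run generation, and keep only the text after the response marker. A minimal sketch, assuming a hypothetical `generate_fn(prompt)` that returns the decoded text (prompt plus continuation); in the notebook it would wrap tokenization, `model_lora.generate`, and `tokenizer.batch_decode`. Injecting the generator keeps the handler testable offline with a fake, as shown at the end. `make_chatbot`, `DEMO_TEMPLATE`, and `fake_generate` are illustrative names; the disclaimer text is taken from the cell above.

```python
DISCLAIMER = (
    "**Disclaimer**: This is an AI-generated response. "
    "Consult a doctor for professional medical advice."
)

# Hypothetical default: an abbreviated prompt that ends right after "<think>\n"
DEMO_TEMPLATE = "### Question:\n{}\n\n### Response:\n<think>\n"


def make_chatbot(generate_fn, prompt_template=DEMO_TEMPLATE, marker="### Response:"):
    """Wrap a text-generation callable in a Gradio-compatible handler.

    generate_fn(prompt) -> decoded text is a hypothetical stand-in for
    tokenizing, calling model_lora.generate, and batch-decoding.
    """

    def chatbot(question):
        if not question or not question.strip():
            return "Please enter a medical question."
        decoded = generate_fn(prompt_template.format(question.strip()))
        # Keep only the text after the response marker, if it is present
        _, found, answer = decoded.partition(marker)
        answer = (answer if found else decoded).strip()
        return f"{DISCLAIMER}\n\n**Response**:\n{answer}"

    return chatbot


def fake_generate(prompt):
    # Offline stand-in for the model: echoes the prompt plus a canned continuation
    return prompt + "Reasoning...\n</think>\nExample answer."


bot = make_chatbot(fake_generate)
print(bot("What does a positive Q-tip test suggest?"))
```

In the notebook, the interface could then be created with `fn=make_chatbot(my_generate, inference_prompt)`, where `my_generate` is a hypothetical wrapper around `model_lora.generate` and `inference_prompt` is a prompt cut from `train_prompt_style`.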
