
In [None]:
# Install dependencies
!pip install sentence-transformers scikit-learn pandas

import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans
import numpy as np

# ---- Load dataset ----
df = pd.read_csv("/content/cardiology-qa-training-data.csv")  # <-- Update path if needed

# ---- Initialize embedding model ----
model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode all questions
question_embeddings = model.encode(df["question"].tolist(), convert_to_tensor=False)

# ---- Create Experts by Clustering ----
num_experts = 5
kmeans = KMeans(n_clusters=num_experts, random_state=42)
clusters = kmeans.fit_predict(question_embeddings)

df["expert_id"] = clusters

# Store experts (grouped data)
experts = {i: df[df["expert_id"] == i] for i in range(num_experts)}

# Precompute embeddings per expert by slicing the embeddings computed above
# (avoids encoding every question a second time; row order matches experts[i])
expert_embeddings = {i: question_embeddings[clusters == i] for i in experts}

print(f"✅ Created {num_experts} experts from dataset")

# ---- Router Function ----
def route_to_expert(user_query):
    query_emb = model.encode([user_query], convert_to_tensor=False)
    expert_scores = cosine_similarity(query_emb, kmeans.cluster_centers_)[0]
    chosen_expert = expert_scores.argmax()
    confidence = expert_scores[chosen_expert]
    return chosen_expert, query_emb, confidence

# ---- Dynamic MoE Answer Function ----
def get_precise_moe_answer(user_query, top_k=1):
    chosen_expert, query_emb, expert_conf = route_to_expert(user_query)

    expert_df = experts[chosen_expert]
    expert_embs = expert_embeddings[chosen_expert]

    sims = cosine_similarity(query_emb, expert_embs)[0]
    top_idx = sims.argsort()[-top_k:][::-1]

    # Always ask clarifying question for precision
    clarifying_question = "To provide a more precise answer, can you give more details? For example, duration, severity, or associated symptoms."

    results = []
    for idx in top_idx:
        row = expert_df.iloc[idx]
        results.append({
            "expert_id": chosen_expert,
            "matched_question": row["question"],
            "answer": row["answer"],
            "similarity": round(sims[idx], 3),
            "expert_confidence": round(expert_conf, 3),
            "clarifying_needed": True,
            "clarifying_question": clarifying_question
        })

    return results





✅ Created 5 experts from dataset
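
The routing step in `route_to_expert` can be illustrated without loading the model: the query vector is compared with every cluster centroid by cosine similarity, and the best-scoring cluster becomes the expert. The following is a minimal sketch that assumes hand-written 3-dimensional vectors in place of real `all-MiniLM-L6-v2` embeddings; `cosine` and `route` are illustrative names, not functions from this notebook.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def route(query_vec, centroids):
    """Return (expert_id, score) for the centroid most similar to the query."""
    scores = [cosine(query_vec, c) for c in centroids]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Toy centroids and query (assumed values, for illustration only)
centroids = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
expert_id, score = route([0.1, 0.9, 0.2], centroids)
print(expert_id, round(score, 3))
```

The returned score is a similarity to a cluster centre, not a probability, so a cut-off such as 0.85 is a heuristic that likely needs tuning per dataset.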


In [None]:
# ---- Step 1: Ask initial query ----
user_query = input("Enter your query: ")

# ---- Step 2: Get initial MoE response ----
initial_response = get_precise_moe_answer(user_query)[0]

# Display initial answer and clarifying question
print("\nExpert Matched Question:", initial_response["matched_question"])
print("Initial Answer:", initial_response["answer"])
print("Similarity:", initial_response["similarity"])
print("Expert Confidence:", initial_response["expert_confidence"])
print("\nClarifying Question:", initial_response["clarifying_question"])

# ---- Step 3: Ask user for clarification ----
clarification = input("\nYour response to the clarifying question: ")

# ---- Step 4: Combine original query + clarification for refinement ----
refined_query = user_query + " Additional info: " + clarification

# ---- Step 5: Get refined answer ----
refined_response = get_precise_moe_answer(refined_query)[0]

# Display refined result
print("\n--- Refined Response ---")
print("Expert Matched Question:", refined_response["matched_question"])
print("Answer:", refined_response["answer"])
print("Similarity:", refined_response["similarity"])
print("Expert Confidence:", refined_response["expert_confidence"])


Enter your query: "Lately I get really tired even after small activities, sometimes my chest feels heavy, and I get short of breath easily. My heart sometimes feels like it’s beating strangely. What could be wrong with me?"

Expert Matched Question: For the past 3 weeks i have felt like my heart stops then i loose my breath it is a very odd feeling in my chest. This is becoming a more common occuring up to 8 times per day at rest or whilst engaged in activity. I am 37 with no previous conditions. I am starting to feel dizzy - could this be related and should i see a doctor?
Initial Answer: hi their have read your query and i want to suggest you that you need to get an ecg done and check your serum electrolytes also and consult a cardiologist in person. if anything doesn't show up on ecg i would like to do a 24hr ecg monitoring (holder monitoring) to rule out if any rhythm abnormality going on and an echocardiography also to see for lv function and structural integrity of the heart. goo
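
The retrieval step in `get_precise_moe_answer` ranks the chosen expert's questions by similarity and keeps the `top_k` best. A small sketch of that ranking in plain Python, assuming a made-up list of similarity values; `top_k_indices` is a hypothetical helper that mirrors what `sims.argsort()[-top_k:][::-1]` is used for in the cell above.

```python
def top_k_indices(scores, k=1):
    """Indices of the k largest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Assumed similarity values, for illustration only
sims = [0.12, 0.87, 0.45, 0.91, 0.30]
best_three = top_k_indices(sims, k=3)
print(best_three)
```

If `k` exceeds the number of candidates, the helper simply returns all indices in ranked order.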

In [None]:
# --- Step 1: Initial user query ---
user_query = input("Patient: ")

max_turns = 5  # upper bound on clarification rounds so the loop always terminates

for _ in range(max_turns):
    # --- Step 2: Get MoE response (short) ---
    response = get_precise_moe_answer(user_query)[0]

    # Doctor gives concise answer
    print(f"\nDoctor: {response['answer'].split('.')[0]}.")  # take first sentence for conciseness

    # Doctor asks a clarifying question
    print(f"Doctor: {response['clarifying_question']}")

    # Patient responds
    user_clarification = input("Patient: ")

    # Update query with clarification for next iteration
    user_query += " Additional info: " + user_clarification

    # Stop when the router confidence for this turn's response exceeds the threshold
    if response['expert_confidence'] > 0.85:  # you can tune this threshold
        print("\nDoctor: Based on your responses, the likely concern is a cardiac issue. I recommend seeing a cardiologist for ECG, echocardiography, and possibly Holter monitoring.")
        break
    else:
        print("\n--- Continuing clarification ---")


Patient: Sometimes I feel a fluttering or skipping in my chest, especially when I’m resting or lying down. It only lasts a few seconds, but it happens several times a day. I also feel lightheaded occasionally. Could this be a problem with my heart?"

Doctor: if you have had the symptoms evaluated and found no underlying heart disease or dangerous rhythm, the primary concern is to relieve your symptoms and anxiety.
Doctor: To provide a more precise answer, can you give more details? For example, duration, severity, or associated symptoms.
Patient: "These palpitations have been happening for about 3 weeks. They usually last 10–20 seconds and happen a few times a day. I sometimes feel dizzy or a little short of breath when it happens, but it goes back to normal quickly."

--- Continuing clarification ---

Doctor: if you have had the symptoms evaluated and found no underlying heart disease or dangerous rhythm, the primary concern is to relieve your symptoms and anxiety.
Doctor: To provide 

KeyboardInterrupt: Interrupted by user
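
The interrupted run above suggests the dialogue needs an explicit exit when confidence stays below the threshold. A minimal sketch of a bounded clarification loop follows. It assumes the scoring step is passed in as a callable so the example runs without the embedding model; `clarify_loop`, `score_fn`, `replies`, and `toy_score` are hypothetical names, and the toy scorer is not a real confidence measure.

```python
def clarify_loop(query, score_fn, replies, threshold=0.85, max_turns=3):
    """Append clarifications to the query until score_fn reaches the
    threshold or max_turns clarifications have been used.

    Returns (final_query, final_score, turns_used).
    """
    replies = iter(replies)
    turns = 0
    score = score_fn(query)
    while score < threshold and turns < max_turns:
        reply = next(replies, None)
        if reply is None:  # no more input available
            break
        query += " Additional info: " + reply
        turns += 1
        score = score_fn(query)  # re-score after every clarification
    return query, score, turns

# Hypothetical scorer: the score grows with the number of words in the query
def toy_score(text):
    return min(1.0, len(text.split()) / 20)

final_query, final_score, turns = clarify_loop(
    "my heart flutters at rest",
    toy_score,
    replies=[
        "it lasts about ten seconds",
        "it happens three times a day",
        "no chest pain",
    ],
)
print(turns, round(final_score, 2))
```

Because the score is recomputed after each reply, the stop decision always reflects the latest clarification, and `max_turns` guarantees termination even when the score never reaches the threshold.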

In [None]:
# Install dependencies
!pip install sentence-transformers scikit-learn pandas

import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans

# -------------------- Load dataset --------------------
df = pd.read_csv("/content/cardiology-qa-training-data.csv")  # replace with your path

# -------------------- Initialize embedding model --------------------
model = SentenceTransformer("all-MiniLM-L6-v2")
question_embeddings = model.encode(df["question"].tolist(), convert_to_tensor=False)

# -------------------- Create Experts --------------------
num_experts = 5
kmeans = KMeans(n_clusters=num_experts, random_state=42)
clusters = kmeans.fit_predict(question_embeddings)
df["expert_id"] = clusters
experts = {i: df[df["expert_id"] == i] for i in range(num_experts)}
expert_embeddings = {i: model.encode(experts[i]["question"].tolist(), convert_to_tensor=False) for i in experts}

# -------------------- Router function --------------------
def route_to_expert(user_query):
    query_emb = model.encode([user_query], convert_to_tensor=False)
    expert_scores = cosine_similarity(query_emb, kmeans.cluster_centers_)[0]
    chosen_expert = expert_scores.argmax()
    return chosen_expert, query_emb

# -------------------- MoE iterative refinement --------------------
def refine_answer_to_store(user_query, max_clarifications=3, confidence_threshold=0.85):
    full_query = user_query

    # One retrieval per round; the final round re-scores the query after the
    # last clarification, so the returned answer reflects everything the patient said
    for round_idx in range(max_clarifications + 1):
        chosen_expert, query_emb = route_to_expert(full_query)
        expert_df = experts[chosen_expert]
        expert_embs = expert_embeddings[chosen_expert]

        sims = cosine_similarity(query_emb, expert_embs)[0]
        top_idx = sims.argmax()  # top 1 match

        matched_question = expert_df.iloc[top_idx]["question"]
        answer = expert_df.iloc[top_idx]["answer"]
        confidence = sims[top_idx]

        # Stop if confidence is high enough or the clarification budget is spent
        if confidence >= confidence_threshold or round_idx == max_clarifications:
            break

        # Ask clarifying question
        clarifying_question = "Doctor: To provide a more precise answer, please give duration, severity, and any associated symptoms."
        print(f"\n{clarifying_question}")
        additional_info = input("Patient: ")
        full_query += " Additional info: " + additional_info

    # Return final refined answer and metadata
    return {
        "user_query": user_query,
        "refined_query": full_query,
        "expert_id": chosen_expert,
        "matched_question": matched_question,
        "final_answer": answer,
        "similarity": round(confidence, 3)
    }

# -------------------- Main interaction --------------------
patient_query = input("Patient: ")
refined_response = refine_answer_to_store(patient_query)

# Store in dataframe and CSV
df_results = pd.DataFrame([refined_response])
df_results.to_csv("/content/refined_cardiology_qa.csv", index=False)

# -------------------- Display final response --------------------
print("\n✅ Final refined answer stored in dataframe and CSV file.\n")
print("User Query:", refined_response["user_query"])
print("Refined Query:", refined_response["refined_query"])
print("Expert Matched Question:", refined_response["matched_question"])
print("Final Answer:", refined_response["final_answer"])
print("Similarity / Confidence:", refined_response["similarity"])


Patient: "Doctor, sometimes I feel like my heart skips a beat or flutters, especially when I’m sitting quietly or lying down. It only lasts a few seconds, but it happens several times a day. Sometimes I feel a little dizzy or lightheaded. Could this be a problem with my heart?"

Doctor: To provide a more precise answer, please give duration, severity, and any associated symptoms.
Patient: hese fluttering episodes have been happening for about 2 weeks. Each one lasts around 10–15 seconds. I feel a little dizzy sometimes, but there’s no chest pain or fainting. They happen a few times a day, mostly when I’m resting."

Doctor: To provide a more precise answer, please give duration, severity, and any associated symptoms.
Patient: "The palpitations happen a few times a day for the past 3 weeks, each lasting 10–20 seconds, sometimes causing slight dizziness, but no chest pain or fainting."

Doctor: To provide a more precise answer, please give duration, severity, and any associated symptoms.


In [None]:
# Install dependencies
!pip install sentence-transformers scikit-learn pandas

import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans

# -------------------- Load dataset --------------------
df = pd.read_csv("/content/cardiology-qa-training-data.csv")  # replace with your path

# -------------------- Initialize embedding model --------------------
model = SentenceTransformer("all-MiniLM-L6-v2")
question_embeddings = model.encode(df["question"].tolist(), convert_to_tensor=False)

# -------------------- Create Experts --------------------
num_experts = 5
kmeans = KMeans(n_clusters=num_experts, random_state=42)
clusters = kmeans.fit_predict(question_embeddings)
df["expert_id"] = clusters
experts = {i: df[df["expert_id"] == i] for i in range(num_experts)}
expert_embeddings = {i: model.encode(experts[i]["question"].tolist(), convert_to_tensor=False) for i in experts}

# -------------------- Router function --------------------
def route_to_expert(user_query):
    query_emb = model.encode([user_query], convert_to_tensor=False)
    expert_scores = cosine_similarity(query_emb, kmeans.cluster_centers_)[0]
    chosen_expert = expert_scores.argmax()
    return chosen_expert, query_emb

# -------------------- Iterative doctor-patient MoE --------------------
def doctor_patient_session(user_query, max_iterations=5):
    full_query = user_query
    iterations = 0
    patient_info = {}

    # Fixed list of clarifying questions, asked in order (defined once, outside the loop)
    clarifying_questions = [
        "Do you feel chest pain during these episodes? (yes/no)",
        "How long does each episode last? (seconds/minutes)",
        "How many times per day do these episodes occur?",
        "Do you feel dizzy, faint, or short of breath?",
        "Have you noticed swelling in feet/ankles or other symptoms?"
    ]

    print(f"\nPatient: {user_query}")

    while iterations < max_iterations:
        chosen_expert, query_emb = route_to_expert(full_query)
        expert_df = experts[chosen_expert]
        expert_embs = expert_embeddings[chosen_expert]

        sims = cosine_similarity(query_emb, expert_embs)[0]
        top_idx = sims.argmax()  # top 1 match

        matched_question = expert_df.iloc[top_idx]["question"]
        answer = expert_df.iloc[top_idx]["answer"]
        confidence = sims[top_idx]

        # Stop asking if confidence is high enough
        if confidence >= 0.85 or iterations == max_iterations-1:
            print(f"\nDoctor (Final Advice): Based on your information, {answer}")
            break

        # Ask next clarifying question
        clarifying_question = clarifying_questions[iterations % len(clarifying_questions)]
        print(f"\nDoctor: {clarifying_question}")
        patient_response = input("Patient: ")
        patient_info[clarifying_question] = patient_response

        # Update full_query with patient's latest response
        full_query += " " + patient_response
        iterations += 1

    # Store results in a dataframe
    refined_response = {
        "user_query": user_query,
        "refined_query": full_query,
        "expert_id": chosen_expert,
        "matched_question": matched_question,
        "final_answer": answer,
        "similarity": round(confidence, 3),
        "patient_responses": patient_info
    }

    df_results = pd.DataFrame([refined_response])
    df_results.to_csv("/content/refined_cardiology_qa.csv", index=False)
    print("\n✅ Session completed. Final answer stored in CSV.")

# -------------------- Start interactive session --------------------
user_query = input("Enter your query: ")
doctor_patient_session(user_query)


Enter your query: Patient: "Doctor, sometimes my heart feels like it’s skipping a beat or fluttering, especially when I’m resting or lying down. Each episode lasts only a few seconds, but it happens a few times a day. I sometimes feel a little dizzy or lightheaded. I haven’t had chest pain or fainting. Could this be a problem with my heart?"

Patient: Patient: "Doctor, sometimes my heart feels like it’s skipping a beat or fluttering, especially when I’m resting or lying down. Each episode lasts only a few seconds, but it happens a few times a day. I sometimes feel a little dizzy or lightheaded. I haven’t had chest pain or fainting. Could this be a problem with my heart?"

Doctor: Do you feel chest pain during these episodes? (yes/no)
Patient: yes

Doctor: How long does each episode last? (seconds/minutes)
Patient: 30 sec

Doctor: How many times per day do these episodes occur?
Patient: 3-4 times while i rest

Doctor: Do you feel dizzy, faint, or short of breath?
Patient: yes

Doctor (F

In [None]:
# Install dependencies
!pip install sentence-transformers scikit-learn pandas nltk

import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans
import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize

# -------------------- Load dataset --------------------
df = pd.read_csv("/content/cardiology-qa-training-data.csv")  # replace with your path

# -------------------- Initialize embedding model --------------------
model = SentenceTransformer("all-MiniLM-L6-v2")
question_embeddings = model.encode(df["question"].tolist(), convert_to_tensor=False)

# -------------------- Create Experts --------------------
num_experts = 5
kmeans = KMeans(n_clusters=num_experts, random_state=42)
clusters = kmeans.fit_predict(question_embeddings)
df["expert_id"] = clusters
experts = {i: df[df["expert_id"] == i] for i in range(num_experts)}
expert_embeddings = {i: model.encode(experts[i]["question"].tolist(), convert_to_tensor=False) for i in experts}

# -------------------- Router function --------------------
def route_to_expert(user_query):
    query_emb = model.encode([user_query], convert_to_tensor=False)
    expert_scores = cosine_similarity(query_emb, kmeans.cluster_centers_)[0]
    chosen_expert = expert_scores.argmax()
    return chosen_expert, query_emb

# -------------------- Dynamic doctor-patient session --------------------
def doctor_patient_session(user_query, max_iterations=5):
    full_query = user_query
    iterations = 0
    patient_info = {}

    print(f"\nPatient: {user_query}")

    clarifying_questions = [
        "Do you feel chest pain or pressure during these episodes? (yes/no)",
        "How long does each episode last? (seconds/minutes)",
        "How many times per day do these episodes occur?",
        "Do you feel dizzy, faint, or short of breath?",
        "Have you noticed swelling in feet/ankles or other symptoms?"
    ]

    urgent_symptoms = ["chest pain", "fainting", "severe shortness of breath"]

    while iterations < max_iterations:
        chosen_expert, query_emb = route_to_expert(full_query)
        expert_df = experts[chosen_expert]
        expert_embs = expert_embeddings[chosen_expert]
        sims = cosine_similarity(query_emb, expert_embs)[0]
        top_idx = sims.argsort()[-1:][::-1][0]

        matched_question = expert_df.iloc[top_idx]["question"]
        answer = expert_df.iloc[top_idx]["answer"]
        confidence = sims[top_idx]

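        # If dangerous symptom detected, provide urgent advice immediately.
        # NOTE: this is a plain substring match, so a negated mention such as
        # "I don't feel chest pain" also triggers the urgent branch.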
        if any(symptom in full_query.lower() for symptom in urgent_symptoms):
            print("\nDoctor (Urgent Advice): Your symptoms may indicate a serious heart problem. Please visit a cardiologist immediately or call emergency services.")
            break

        # Stop iterating if similarity is very high
        if confidence >= 0.85 or iterations == max_iterations-1:
            print(f"\nDoctor (Final Advice): {answer}")
            break

        # Ask dynamic clarifying question
        question = clarifying_questions[iterations % len(clarifying_questions)]
        print(f"\nDoctor: {question}")
        patient_response = input("Patient: ")
        patient_info[question] = patient_response

        # Update query with new patient info for next iteration
        full_query += " " + patient_response
        iterations += 1

    # Save session to CSV
    refined_response = {
        "user_query": user_query,
        "refined_query": full_query,
        "expert_id": chosen_expert,
        "matched_question": matched_question,
        "final_answer": answer,
        "similarity": round(confidence, 3),
        "patient_responses": patient_info
    }
    df_results = pd.DataFrame([refined_response])
    df_results.to_csv("/content/refined_cardiology_qa.csv", index=False)
    print("\n✅ Session completed. Final answer stored in CSV.")

# -------------------- Start session --------------------
user_query = input("Enter your query: ")
doctor_patient_session(user_query)






Enter your query: "Doctor, I get very tired even after walking a short distance, and sometimes I feel short of breath when lying down. My feet and ankles swell a bit by evening. I also notice my heart sometimes feels like it’s beating faster than usual. I don’t have any known heart problems. Could something be wrong with my heart?"

Patient: "Doctor, I get very tired even after walking a short distance, and sometimes I feel short of breath when lying down. My feet and ankles swell a bit by evening. I also notice my heart sometimes feels like it’s beating faster than usual. I don’t have any known heart problems. Could something be wrong with my heart?"

Doctor: Do you feel chest pain or pressure during these episodes? (yes/no)
Patient: "No, I don’t feel chest pain."

Doctor (Urgent Advice): Your symptoms may indicate a serious heart problem. Please visit a cardiologist immediately or call emergency services.

✅ Session completed. Final answer stored in CSV.
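
In the session above, the reply "No, I don’t feel chest pain." still triggered the urgent branch, because the check looks for the substring "chest pain" anywhere in the accumulated query. One possible refinement is sketched below: split the text into rough clauses and ignore any clause that contains a negation cue. This is a heuristic under stated assumptions, not the notebook's method; `urgent_mentions`, `NEGATION_CUES`, and the cue list are illustrative, and clause-level negation handling of this kind will miss many phrasings.

```python
import re

URGENT_SYMPTOMS = ["chest pain", "fainting", "severe shortness of breath"]

# Assumed, deliberately small set of negation cues (straight and curly apostrophes)
NEGATION_CUES = re.compile(r"\b(?:no|not|never|without)\b|n['’]t\b")

def urgent_mentions(text):
    """Return urgent symptoms found in clauses that contain no negation cue."""
    found = []
    # Rough clause split on sentence punctuation, commas, and the word "but"
    clauses = re.split(r"[.!?;,]|\bbut\b", text.lower())
    for clause in clauses:
        if NEGATION_CUES.search(clause):
            continue  # treat every symptom in a negated clause as denied
        for symptom in URGENT_SYMPTOMS:
            if symptom in clause and symptom not in found:
                found.append(symptom)
    return found

print(urgent_mentions("No, I don’t feel chest pain."))
print(urgent_mentions("I had chest pain and fainting yesterday."))
```

Inside `doctor_patient_session`, the condition `any(symptom in full_query.lower() ...)` could then be swapped for `urgent_mentions(full_query)`, at the cost of missing symptoms that share a clause with an unrelated negation.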


In [None]:
# Install dependencies
!pip install pandas sentence-transformers scikit-learn

import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# -------------------------
# 1. Load Pre-trained Database (MOE-style)
# -------------------------
# CSV should have columns: question, answer, clarifying_question, follow_up, disease_type, medication, precautions
df = pd.read_csv("/content/cardiology-qa-training-data.csv")

# -------------------------
# 2. Initialize Embedding Model
# -------------------------
model = SentenceTransformer("all-MiniLM-L6-v2")
question_embeddings = model.encode(df['question'].tolist(), convert_to_tensor=False)

# -------------------------
# 3. Retrieve Relevant Expert Response
# -------------------------
def get_expert_response(user_query):
    query_emb = model.encode([user_query], convert_to_tensor=False)
    sims = cosine_similarity(query_emb, question_embeddings)[0]
    top_idx = sims.argmax()
    row = df.iloc[top_idx]

    def optional(column):
        # Optional columns may be missing from the CSV or hold NaN; map both to None
        value = row.get(column)
        return value if pd.notna(value) else None

    response = {
        "answer": row['answer'],
        "clarifying_question": optional('clarifying_question'),
        "follow_up": optional('follow_up'),
        "disease_type": optional('disease_type'),
        "medication": optional('medication'),
        "precautions": optional('precautions')
    }
    return response

# -------------------------
# 4. Interactive Doctor-Patient Session (Iterative until satisfied)
# -------------------------
def doctor_chatbot():
    print("💓 Welcome to the Doctor Chatbot 💓")
    user_query = input("Patient: ")
    session_active = True

    while session_active:
        expert_response = get_expert_response(user_query)

        # Doctor's main response
        print("\nDoctor:", expert_response['answer'])

        # Iteratively ask clarifying questions
        while expert_response['clarifying_question'] and pd.notna(expert_response['clarifying_question']):
            clar_reply = input("\nDoctor (Clarifying Question): " + expert_response['clarifying_question'] + "\nPatient: ")
            user_query += " Additional info: " + clar_reply
            expert_response = get_expert_response(user_query)
            print("\nDoctor:", expert_response['answer'])

        # Ask for medication, precautions, or follow-up if patient asks
        user_next = input("\nPatient (ask about medication/precautions/doubts or type 'done' to finish): ").strip().lower()
        if 'medication' in user_next:
            if expert_response['medication']:
                print("\nDoctor (Medication):", expert_response['medication'])
            else:
                print("\nDoctor: Medication advice not available. Please consult your doctor.")
        elif 'precaution' in user_next or 'care' in user_next:
            if expert_response['precautions']:
                print("\nDoctor (Precautions):", expert_response['precautions'])
            else:
                print("\nDoctor: General precautions not available. Follow standard healthy lifestyle advice.")
        elif user_next == 'done':
            session_active = False
            print("\n✅ Session completed. Stay healthy!")
        else:
            # If the patient asks a new question, continue the loop
            user_query += " " + user_next

# -------------------------
# Run Chatbot
# -------------------------
doctor_chatbot()


💓 Welcome to the Doctor Chatbot 💓
Patient: user_query = "Doctor, sometimes my heart feels like it’s skipping a beat or fluttering, especially when I’m resting or lying down. Each episode lasts only a few seconds, but it happens a few times a day. I sometimes feel a little dizzy or lightheaded. I haven’t had chest pain or fainting. Could this be a problem with my heart?"

Doctor: hi, don't be too much anxious. try to avoid stress during your work. avoid heavy alcohol chatbot. for seeing your heart rhythm ecg or echocardiography done. then if any abnormality comes treatment is done. you also measure yr blood pressure because you mentioned stressful life so. if blood pressure mild elevated low salt food and green veg taken more. to relieve stress do yoga early in morning. do regular exercise.

Patient (ask about medication/precautions/doubts or type 'done' to finish): done

✅ Session completed. Stay healthy!
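
The chatbot cell reads several optional columns (`clarifying_question`, `medication`, `precautions`, and so on) that the CSV may not contain, or may contain as empty cells. A short sketch of normalising such values to `None` follows, using a plain dict in place of a DataFrame row so it runs without pandas; `optional_field` and the sample row are hypothetical.

```python
import math

def optional_field(row, name):
    """Return row[name], or None when the key is absent or the value is NaN or blank."""
    value = row.get(name)
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, str) and not value.strip():
        return None
    return value

# Assumed sample row: one real value, one NaN cell, one missing column
row = {"answer": "Get an ECG done.", "medication": float("nan")}
print(optional_field(row, "answer"))
print(optional_field(row, "medication"))
print(optional_field(row, "precautions"))
```

Normalising to `None` matters because NaN is truthy in Python, so a bare `if value:` check would treat an empty cell as present.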
