# **MediChat AI** – A blend of "Medical" and "Chat" for AI-driven medical conversations
**Aim:**

To develop a Medical AI Chatbot that enables healthcare professionals to retrieve relevant patient records based on symptoms, medical history, and diagnoses using Retrieval-Augmented Generation (RAG) with ChromaDB and OpenAI’s GPT model.

**Objectives:**

1. Efficient Medical Record Retrieval → Store and search structured patient data using ChromaDB with OpenAI embeddings.
2. Conversational AI Assistance → Enable a chatbot-like interface where users can ask medical queries and get AI-generated responses.
3. Query Moderation & Compliance → Integrate OpenAI’s Moderation API to ensure ethical and legal compliance of user queries.
4. Multi-Turn Conversations → Allow up to 3 follow-up interactions to provide better context in conversations.
5. Scalability & Production Readiness → Implement the solution in a modular way, making it scalable and ready for deployment in a real-world healthcare setting.

## Installing Required Libraries

In [1]:
!pip install openai chromadb langchain gradio


In [None]:
import openai
import chromadb
import json
import os
import re
import uuid
from openai import OpenAI
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
import gradio as gr

## Generate Patient Data Using OpenAI

In [1]:
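# 🔑 Set the OpenAI API key (placeholder value; avoid committing a real key to the notebook)
os.environ["OPENAI_API_KEY"] = "sk-project-key"

# Initialize the OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
# Also set the key for module-level calls such as openai.moderations.create
openai.api_key = os.getenv("OPENAI_API_KEY")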

# Define the prompt (sent as the user message) that asks OpenAI to generate a patient record
generation_prompt = """
You are a medical AI system. Generate a detailed patient history in structured JSON format.

### Format Example:
{
    "patient_name": "John Doe",
    "age": 45,
    "gender": "Male",
    "symptoms": ["fever", "headache", "muscle pain"],
    "past_medical_history": ["Hypertension", "Type 2 Diabetes"],
    "current_diagnosis": "Influenza",
    "medications": ["Metformin", "Lisinopril", "Ibuprofen", "Paracetamol"],
    "treatment_plan": "Rest, hydration, and medication for 5 days"
}

### Task:
Generate a unique and realistic patient history in the same JSON format.
"""
def generate_patient_record():
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a medical AI assistant."},
            {"role": "user", "content": generation_prompt}
        ],
        temperature=0.5,
        max_tokens=1000
    )

    # Extract the text of the first completion
    content = response.choices[0].message.content

    # Parse the text as JSON (raises json.JSONDecodeError if the model adds any extra text)
    return json.loads(content)

**Fixing the raw structure of the generated output**

In [2]:
def extract_json(text):
    """Extracts a JSON object from a text response; returns None on failure."""
    if not text:
        return None  # Nothing to parse
    match = re.search(r'\{.*\}', text, re.DOTALL)  # Greedy match from the first "{" to the last "}"
    if match:
        try:
            return json.loads(match.group())  # Convert to Python dictionary
        except json.JSONDecodeError:
            # Return None rather than raw text so callers always get a dict or None
            print("Warning: Invalid JSON. Skipping this response.")
            return None
    return None  # Return None if no JSON is found

def generate_patient_record():
    """Generates a structured medical record using OpenAI API."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a medical AI assistant."},
            {"role": "user", "content": generation_prompt}
        ],
        temperature=0.5,
        max_tokens=500
    )

    content = response.choices[0].message.content  # Extract response text
    return extract_json(content)  # Ensure the response is structured JSON

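# Generate 10 Patient Records, keeping only responses that parsed into a dictionary
generated = [generate_patient_record() for _ in range(10)]
patient_data = [record for record in generated if isinstance(record, dict)]

print(f"✅ {len(patient_data)} Patient Records Generated Successfully!")

# Print first patient record for verification
if patient_data:
    print(json.dumps(patient_data[0], indent=2))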

✅ 10 Patient Records Generated Successfully!
{
  "patient_name": "Sarah Johnson",
  "age": 32,
  "gender": "Female",
  "symptoms": [
    "cough",
    "shortness of breath",
    "fever"
  ],
  "past_medical_history": [
    "Asthma",
    "Allergic rhinitis"
  ],
  "current_diagnosis": "Pneumonia",
  "medications": [
    "Albuterol inhaler",
    "Prednisone",
    "Azithromycin",
    "Acetaminophen"
  ],
  "treatment_plan": "Antibiotics, inhaler use every 4 hours, rest, and follow-up chest X-ray in 1 week"
}
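The regex in `extract_json` matches greedily from the first `{` to the last `}`, so a reply that contains braces in its surrounding prose can fail to parse. A minimal alternative sketch, assuming only the standard library, uses `json.JSONDecoder.raw_decode` to try each `{` in turn. The name `extract_first_json_object` and the sample reply are hypothetical and are not part of the notebook.

```python
import json

def extract_first_json_object(text):
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)  # Try the next opening brace
    return None

# Hypothetical model reply with prose around the JSON payload
reply = 'Here is the record:\n{"patient_name": "Jane Roe", "age": 51}\nLet me know if you need more.'
record = extract_first_json_object(reply)
print(record)
```

Returning `None` when nothing parses lets the caller drop bad generations before they reach the vector store.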


## Store Generated Patient Data in ChromaDB Using OpenAI Embeddings

In [3]:
# 🔹 Initialize ChromaDB
chroma_client = chromadb.PersistentClient(path="/content/drive/MyDrive/UpGrad/Gen AI/Project/chroma_db")
embedding_function = OpenAIEmbeddingFunction(api_key=os.getenv("OPENAI_API_KEY"))

# 🔹 Create/Open Collection with OpenAI Embeddings
collection = chroma_client.get_or_create_collection(
    name="medical_summaries",
    embedding_function=embedding_function  # ✅ Ensure OpenAI embeddings are used
)

# 🔹 Store Patient Records in ChromaDB with Unique IDs
for patient in patient_data:
    patient_text = json.dumps(patient)  # Convert JSON to string format
    unique_id = f"{patient['patient_name']}_{uuid.uuid4().hex[:8]}"  # Unique ID

    collection.add(
        ids=[unique_id],  # ✅ Ensure IDs are unique
        documents=[patient_text]  # ✅ Store JSON as document
    )

print("✅ 10 Unique Patient Records Stored Successfully in ChromaDB!")

✅ 10 Unique Patient Records Stored Successfully in ChromaDB!
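The cell above embeds each record as a raw JSON string. One possible variation, shown here only as a sketch, is to render the record as a short plain-text summary before embedding. Whether this improves retrieval for this dataset is untested. The helper name `record_to_text` and the sample record are assumptions; the field names follow the format example in the generation prompt.

```python
def record_to_text(patient):
    """Render a patient record dict as a single plain-text summary."""
    def listed(key):
        # Join list fields; fall back to a marker when the list is empty or missing
        values = patient.get(key) or []
        return ", ".join(values) if values else "none recorded"

    return (
        f"Patient: {patient.get('patient_name', 'unknown')} "
        f"({patient.get('age', '?')}, {patient.get('gender', '?')}). "
        f"Symptoms: {listed('symptoms')}. "
        f"History: {listed('past_medical_history')}. "
        f"Diagnosis: {patient.get('current_diagnosis', 'unknown')}. "
        f"Medications: {listed('medications')}. "
        f"Plan: {patient.get('treatment_plan', 'unknown')}."
    )

# Hypothetical record used only to illustrate the output shape
sample = {
    "patient_name": "John Doe",
    "age": 45,
    "gender": "Male",
    "symptoms": ["fever", "headache"],
    "past_medical_history": ["Hypertension"],
    "current_diagnosis": "Influenza",
    "medications": [],
    "treatment_plan": "Rest",
}
summary = record_to_text(sample)
print(summary)
```

The summary string could be passed as the document in `collection.add`, with the original JSON kept separately if the structured form is still needed.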


**Test whether the data was stored properly in ChromaDB**

In [4]:
# 🔹 Query Function (Ensures OpenAI Embeddings Are Used)
def search_similar_cases(query_text, top_k=3):
    results = collection.query(
        query_texts=[query_text],  # ✅ Uses OpenAI embeddings for queries
        n_results=top_k
    )
    return results["documents"]

# 🔹 Test Query
query = "A diabetic patient with high blood pressure"
similar_cases = search_similar_cases(query)

if similar_cases and similar_cases[0]:
    print("\n🔍 **Similar Cases Found:**")
    for i, case in enumerate(similar_cases[0], 1):
        print(f"{i}. {case}\n")
else:
    print("❌ No matching records found.")


🔍 **Similar Cases Found:**
1. {"patient_name": "Sarah Johnson", "age": 32, "gender": "Female", "symptoms": ["cough", "shortness of breath", "fatigue"], "past_medical_history": ["Asthma", "Seasonal allergies"], "current_diagnosis": "Pneumonia", "medications": ["Albuterol inhaler", "Prednisone", "Azithromycin"], "treatment_plan": "Antibiotics for 10 days, inhaler as needed, rest, and follow-up chest x-ray in 2 weeks"}

2. {"patient_name": "Emily Johnson", "age": 32, "gender": "Female", "symptoms": ["cough", "shortness of breath", "fatigue"], "past_medical_history": ["Asthma"], "current_diagnosis": "Pneumonia", "medications": ["Albuterol inhaler", "Azithromycin", "Acetaminophen"], "treatment_plan": "Antibiotics for 10 days, use of inhaler as needed, rest, and plenty of fluids"}

3. {"patient_name": "Emily Johnson", "age": 32, "gender": "Female", "symptoms": ["fatigue", "abdominal pain", "nausea", "loss of appetite"], "past_medical_history": ["Irritable Bowel Syndrome", "Anemia"], "curren
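`collection.query` returns one inner list per query text, which is why the test above indexes `similar_cases[0]`. The sketch below pairs each retrieved document with its ID and distance. It assumes the result dictionary carries `ids`, `documents`, and (when included) `distances` as parallel lists of lists. The helper name `top_matches` and the hand-built `mock_results` are hypothetical and stand in for a real query result.

```python
def top_matches(results, query_index=0):
    """Return (id, document, distance) tuples for one query in a Chroma-style result dict."""
    ids = results["ids"][query_index]
    docs = results["documents"][query_index]
    distances = results.get("distances")
    # Distances may be absent if they were not requested; fill with None in that case
    dists = distances[query_index] if distances else [None] * len(ids)
    return list(zip(ids, docs, dists))

# Hand-built stand-in for a query result with a single query text
mock_results = {
    "ids": [["a1", "b2"]],
    "documents": [["record one", "record two"]],
    "distances": [[0.21, 0.35]],
}
for record_id, doc, dist in top_matches(mock_results):
    print(record_id, dist, doc)
```

Inspecting distances alongside the documents makes it easier to judge whether a returned case is actually close to the query.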

## Build a Chatbot for Retrieval

In [8]:
# 🔹 Ensure your OpenAI API key is set
#client = openai.OpenAI(api_key="*****************************")

# 🔹 Initialize conversation history
chat_history = [
    {"role": "system", "content": "You are a medical AI assistant helping users find similar cases from medical records. You should provide helpful insights based on retrieved patient records and previous messages."}
]

# Function to check moderation (uses the same client as the chat calls)
def is_safe_query(user_query):
    moderation_response = client.moderations.create(input=user_query)
    return not moderation_response.results[0].flagged  # Returns True if safe

# Function to search for similar cases in ChromaDB
def search_similar_cases(query_text, top_k=3):
    results = collection.query(query_texts=[query_text], n_results=top_k)
    return results["documents"]

# Function to chat with OpenAI API (handles multi-turn conversation)
def chat_with_ai(user_query, retrieved_cases=""):
    # Add user input to chat history
    chat_history.append({"role": "user", "content": user_query})

    # If relevant records are found, add them to chat history
    if retrieved_cases:
        case_summary = "\n".join([str(case) for case in retrieved_cases[0]])
        chat_history.append({"role": "system", "content": f"Relevant medical records found:\n{case_summary}"})

    # Get AI response
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=chat_history,
        temperature=0.3,
        max_tokens=2000
    )

    ai_response = response.choices[0].message.content
    chat_history.append({"role": "assistant", "content": ai_response})  # Save response in history

    return ai_response
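`chat_with_ai` appends to `chat_history` without a limit, while the objectives mention up to 3 follow-up interactions. A minimal sketch of one way to cap the history is shown below, keeping the leading system prompt plus the most recent user turns. The helper name `trim_history` and the default of 4 user turns (one initial question plus 3 follow-ups) are assumptions and are not part of the notebook.

```python
def trim_history(history, max_user_turns=4):
    """Keep the leading system prompt plus the most recent user turns and their replies."""
    if not history:
        return []
    # Preserve the first message only if it is the system prompt
    head = history[:1] if history[0]["role"] == "system" else []
    body = history[len(head):]
    user_positions = [i for i, msg in enumerate(body) if msg["role"] == "user"]
    if len(user_positions) <= max_user_turns:
        return head + body
    start = user_positions[-max_user_turns]  # Index of the oldest user turn to keep
    return head + body[start:]

# Build a toy history: system prompt, then six turns of user / retrieved records / assistant
demo_history = [{"role": "system", "content": "You are a medical AI assistant."}]
for n in range(6):
    demo_history.append({"role": "user", "content": f"q{n}"})
    demo_history.append({"role": "system", "content": f"records for q{n}"})
    demo_history.append({"role": "assistant", "content": f"a{n}"})

trimmed = trim_history(demo_history)
print(len(demo_history), "->", len(trimmed))
```

Passing `trim_history(chat_history)` as `messages` would bound the prompt size without changing how messages are recorded.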

**Building Chatbot using gradio interface**

In [9]:
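# Function to process user input and return chatbot response
def chat_interface(user_query, history=None):
    # `history` is supplied by gr.ChatInterface; conversation state is kept in chat_history
    if not user_query.strip():
        return "Please enter a valid query."

    # End the conversation and reset the stored history, keeping the system prompt
    if user_query.strip().lower() == "exit":
        del chat_history[1:]
        return "Conversation ended. Enter a new query to start again."

    # 🔍 Moderation Check
    if not is_safe_query(user_query):
        return "⚠️ Your query was flagged for moderation. Please ask something else."

    # 🔹 Search for Similar Cases
    similar_cases = search_similar_cases(user_query)

    # 🤖 Get AI Response
    ai_response = chat_with_ai(user_query, similar_cases)

    return ai_response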

# Create Gradio Chatbot Interface
chatbot_ui = gr.ChatInterface(
    fn=chat_interface,  # Function that processes user input
    title="Medical Query Chatbot Designed by Subham",
    description="Ask about medical conditions and check for similar cases. Type 'Exit' to end the conversation.",
    theme="default"
)

# Launch Gradio UI
chatbot_ui.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://9e9d329bdf7b911ef8.gradio.live
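The request flow in `chat_interface` (validate, moderate, retrieve, respond) can be exercised without network access by passing the three steps in as callables. The sketch below illustrates that idea. The function name `handle_query` and the stub functions are hypothetical; in the notebook the real steps are `is_safe_query`, `search_similar_cases`, and `chat_with_ai`.

```python
def handle_query(user_query, is_safe, search, respond):
    """Run the validate -> moderate -> retrieve -> respond flow with injected steps."""
    if not user_query or not user_query.strip():
        return "Please enter a valid query."
    if not is_safe(user_query):
        return "Your query was flagged for moderation. Please ask something else."
    cases = search(user_query)
    return respond(user_query, cases)

# Stub steps that record calls instead of contacting OpenAI or ChromaDB
search_calls = []

def fake_search(query):
    search_calls.append(query)
    return [["record A"]]  # Same nested-list shape the notebook's search function returns

def fake_respond(query, cases):
    return f"{len(cases[0])} record(s) for: {query}"

reply = handle_query("chest pain", is_safe=lambda q: True, search=fake_search, respond=fake_respond)
blocked = handle_query("bad input", is_safe=lambda q: False, search=fake_search, respond=fake_respond)
print(reply)
print(blocked)
```

Because the flagged query returns before retrieval, `fake_search` is called only once here, which is the ordering the moderation objective relies on.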



