The goal is to build an AI-powered medical explanation system that:

1. Extracts medical terms using NLP.
2. Retrieves medically accurate definitions using RAG.
3. Ensures source faithfulness (no hallucination).
4. Incorporates expert validation (human review).
5. Computes a final confidence score using weighted scoring.

User Input  
→ NLP Medical Term Extraction  
→ Vector Database Retrieval (RAG)  
→ Explanation Generation  
→ Source Faithfulness Check  
→ Human Validation Layer  
→ Final Confidence Score Calculation  


Final Confidence Score:

C_final = (w1 × S_retrieval)  
          + (w2 × S_source)  
          + (w3 × S_human)

Where:

- **S_retrieval** → Cosine similarity score from vector database  
- **S_source** → Faithfulness score (LLM answer vs original source)  
- **S_human** → Expert validation score (1.0 if verified)  
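
The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the helper names `cosine_similarity` and `final_confidence`, the default weights `(0.4, 0.4, 0.2)`, the requirement that the weights sum to 1, and the clamping of each score to the range 0 to 1 are all assumptions made here so that C_final stays between 0 and 1.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (basis for S_retrieval)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat a zero vector as "no similarity"
    return dot / (norm_a * norm_b)


def final_confidence(s_retrieval, s_source, s_human, weights=(0.4, 0.4, 0.2)):
    """C_final = w1 * S_retrieval + w2 * S_source + w3 * S_human.

    The default weights are illustrative assumptions, not values from the project.
    """
    w1, w2, w3 = weights
    if not math.isclose(w1 + w2 + w3, 1.0):
        raise ValueError("weights are assumed to sum to 1")
    # Assumption: clamp each score to [0, 1], since cosine similarity can be negative.
    s_retrieval, s_source, s_human = (
        max(0.0, min(1.0, s)) for s in (s_retrieval, s_source, s_human)
    )
    return w1 * s_retrieval + w2 * s_source + w3 * s_human


# Example: strong retrieval match, faithful answer, expert-verified entry.
score = final_confidence(s_retrieval=0.82, s_source=0.90, s_human=1.0)
print(f"C_final = {score:.3f}")
```

Keeping the weights as a parameter makes it easy to compare weightings later, for example giving expert validation more influence once enough terms have been reviewed.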

---


In [10]:
import sqlite3
import pandas as pd

# Connect to database
conn = sqlite3.connect("medical_jargon.db")

# See available tables
tables = pd.read_sql_query(
    "SELECT name FROM sqlite_master WHERE type='table';", conn
)

tables


Unnamed: 0,name
0,medical_terms
1,medical_fts
2,medical_fts_data
3,medical_fts_idx
4,medical_fts_content
5,medical_fts_docsize
6,medical_fts_config


In [11]:
df = pd.read_sql_query("SELECT * FROM medical_terms;", conn)

df.head()


Unnamed: 0,term,content,__index_level_0__,term_lower,content_length,extracted_date,summary
0,Paracetamol poisoning,"Paracetamol poisoning, also known as acetamino...",0,paracetamol poisoning,23666,2026-01-23,"Paracetamol poisoning, also known as acetamino..."
1,Acromegaly,Acromegaly is a disorder that results from exc...,1,acromegaly,21318,2026-01-23,Acromegaly is a disorder that results from exc...
2,Actinic keratosis,"Actinic keratosis (AK), sometimes called solar...",2,actinic keratosis,33330,2026-01-23,"Actinic keratosis (AK), sometimes called solar..."
3,Congenital adrenal hyperplasia,Congenital adrenal hyperplasia (CAH) is a grou...,3,congenital adrenal hyperplasia,19416,2026-01-23,Congenital adrenal hyperplasia (CAH) is a grou...
4,Adrenocortical carcinoma,Adrenocortical carcinoma (ACC) is an aggressi...,4,adrenocortical carcinoma,8252,2026-01-23,Adrenocortical carcinoma (ACC) is an aggressi...


In [12]:
df.columns


Index(['term', 'content', '__index_level_0__', 'term_lower', 'content_length',
       'extracted_date', 'summary'],
      dtype='str')

In [14]:
conn = sqlite3.connect("medical_jargon.db")

df = pd.read_sql_query("SELECT * FROM medical_terms LIMIT 10", conn)
print("First 10 rows:")
display(df)

print("\nColumn types:")
print(df.dtypes)

print("\nMissing values:")
print(df.isna().sum())

First 10 rows:


Unnamed: 0,term,content,__index_level_0__,term_lower,content_length,extracted_date,summary
0,Paracetamol poisoning,"Paracetamol poisoning, also known as acetamino...",0,paracetamol poisoning,23666,2026-01-23,"Paracetamol poisoning, also known as acetamino..."
1,Acromegaly,Acromegaly is a disorder that results from exc...,1,acromegaly,21318,2026-01-23,Acromegaly is a disorder that results from exc...
2,Actinic keratosis,"Actinic keratosis (AK), sometimes called solar...",2,actinic keratosis,33330,2026-01-23,"Actinic keratosis (AK), sometimes called solar..."
3,Congenital adrenal hyperplasia,Congenital adrenal hyperplasia (CAH) is a grou...,3,congenital adrenal hyperplasia,19416,2026-01-23,Congenital adrenal hyperplasia (CAH) is a grou...
4,Adrenocortical carcinoma,Adrenocortical carcinoma (ACC) is an aggressi...,4,adrenocortical carcinoma,8252,2026-01-23,Adrenocortical carcinoma (ACC) is an aggressi...
5,Alcohol withdrawal syndrome,Alcohol withdrawal syndrome (AWS) is a set of ...,5,alcohol withdrawal syndrome,16646,2026-01-23,Alcohol withdrawal syndrome (AWS) is a set of ...
6,Alopecia areata,"Alopecia areata, also known as spot baldness, ...",6,alopecia areata,11883,2026-01-23,"Alopecia areata, also known as spot baldness, ..."
7,Altitude sickness,"Altitude sickness, the mildest form being acut...",7,altitude sickness,20260,2026-01-23,"Altitude sickness, the mildest form being acut..."
8,Amblyopia,"Amblyopia, also called lazy eye, is a disorder...",8,amblyopia,12923,2026-01-23,"Amblyopia, also called lazy eye, is a disorder..."
9,Amoebiasis,"Amoebiasis, or amoebic dysentery, is an infect...",9,amoebiasis,15410,2026-01-23,"Amoebiasis, or amoebic dysentery, is an infect..."



Column types:
term                   str
content                str
__index_level_0__    int64
term_lower             str
content_length       int64
extracted_date         str
summary                str
dtype: object

Missing values:
term                 0
content              0
__index_level_0__    0
term_lower           0
content_length       0
extracted_date       0
summary              0
dtype: int64


In [18]:
# These imports were missing, which caused "NameError: name 'spacy' is not defined"
import spacy
from sentence_transformers import SentenceTransformer

# Biomedical NLP pipeline from scispaCy (the en_core_sci_sm model must be installed)
print("Loading scispaCy model...")
nlp = spacy.load("en_core_sci_sm")

# Optional: add UMLS linker if you installed the full scispacy linker
# (also requires `from scispacy.linking import EntityLinker` to register the pipe)
# nlp.add_pipe("scispacy_linker", config={"linker_name": "umls"})

# Embedding model (fast & good quality)
print("Loading sentence transformer...")
embedder = SentenceTransformer('all-MiniLM-L6-v2')          # fast & general
# Alternative medical-focused: 'pritamdeka/S-PubMedBert-MS-MARCO'
# embedder = SentenceTransformer('pritamdeka/S-PubMedBert-MS-MARCO')

print("Models loaded.")

In [None]:
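from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

# `split_docs` is not defined in the cells shown. Assumption: chunk the `content`
# column of `df`; the chunk sizes below are illustrative, not the author's.
docs = [Document(page_content=c, metadata={"term": t}) for t, c in zip(df["term"], df["content"])]
split_docs = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Embed each chunk and index it in an in-memory FAISS store
vectorstore = FAISS.from_documents(split_docs, embedding_model)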
