# **Medical Q&A Chatbot project (using local TF-IDF + LangChain + FAISS + Gradio)**

This notebook answers queries based on the content of a medical PDF you upload.  
Built using: `LangChain`, `Gradio`, `FAISS`, and `TF-IDF`.

In [34]:
!pip install -U langchain langchain-community faiss-cpu gradio pypdf scikit-learn --quiet


In [35]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter

loader = PyPDFLoader("medical_article.pdf")  # Make sure you uploaded or downloaded it
docs = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = text_splitter.split_documents(docs)
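
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size character windows with overlap. `split_with_overlap` is a hypothetical helper written for illustration, not LangChain's implementation: as far as I understand it, `CharacterTextSplitter` first splits on a separator (a blank line by default) and then merges the pieces, so its chunk boundaries will generally differ from these.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Cut text into character windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces


# Illustrative input: 1200 characters of repeating digits
sample = "".join(str(i % 10) for i in range(1200))
pieces = split_with_overlap(sample)
print([len(p) for p in pieces])           # [500, 500, 400]
print(pieces[0][-100:] == pieces[1][:100])  # True: the windows share 100 characters
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.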


In [36]:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import FakeEmbeddings  # placeholder embeddings for wiring up FAISS

# FakeEmbeddings returns random vectors of the given size, so this index
# exercises the FAISS plumbing but does not rank chunks by relevance.
embedding_model = FakeEmbeddings(size=1536)  # dimension is arbitrary

# Build the vector store; swap in a real embedding model for meaningful retrieval
vectorstore = FAISS.from_documents(chunks, embedding_model)
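
`FakeEmbeddings` produces random vectors, so nearest-neighbour lookups in the index are unrelated to chunk content. As a rough sketch of a deterministic placeholder, the hypothetical `HashedBagOfWords` class below (not part of LangChain) hashes tokens into a fixed-size count vector. This technique is usually called feature hashing; it is not TF-IDF. The class exposes `embed_documents` and `embed_query`, the two method names LangChain embedding classes provide. Whether FAISS accepts it directly is an assumption to verify; subclassing LangChain's `Embeddings` base class may be required.

```python
import math
import re
import zlib


class HashedBagOfWords:
    """Hypothetical stand-in: maps text to a fixed-size, L2-normalised term-count vector."""

    def __init__(self, size: int = 256):
        self.size = size

    def _embed(self, text: str) -> list[float]:
        vec = [0.0] * self.size
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            # crc32 is stable across runs, unlike the built-in hash() for strings
            vec[zlib.crc32(token.encode("utf-8")) % self.size] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        return self._embed(text)


embedder = HashedBagOfWords(size=256)
a = embedder.embed_query("insulin lowers blood glucose")
b = embedder.embed_query("insulin lowers blood glucose")
print(a == b)  # True: the same text always maps to the same vector
```

Different tokens can collide in the same bucket, so a larger `size` trades memory for fewer collisions.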


In [37]:
from langchain.chains import RetrievalQA
from langchain_community.llms.fake import FakeListLLM  # Dummy LLM that replays canned responses, for testing

llm = FakeListLLM(responses=["This is a mock answer. Replace with real LLM later."])

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    chain_type="stuff"
)
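
The `"stuff"` chain type places all retrieved documents into a single prompt and sends that prompt to the LLM once. The sketch below illustrates the idea in plain Python. `build_stuff_prompt` and the prompt wording are assumptions made for illustration, not LangChain's actual template.

```python
def build_stuff_prompt(question: str, documents: list[str]) -> str:
    """Concatenate every retrieved document into one context block, then append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Illustrative inputs only
prompt = build_stuff_prompt(
    "What does the article say about dosage?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved chunks fit inside the model's context window.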


In [40]:
!pip install websockets==10.4 --quiet
!pip install uvicorn==0.20.0 --quiet
!pip install gradio==3.41.2 --quiet


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-genai 1.5.0 requires websockets<15.0dev,>=13.0, but you have websockets 10.4 which is incompatible.

In [None]:
import os
os.kill(os.getpid(), 9)  # Force-restart the runtime so the pinned versions load; all variables are lost


In [1]:
import gradio as gr
gr.close_all()  # 👈 closes any hidden interfaces hogging the port


In [2]:
import gradio as gr

def medical_chatbot(query):
    # qa_chain is the RetrievalQA chain built earlier; rebuild it after a runtime restart
    return qa_chain.invoke({"query": query})["result"]

gr.Interface(
    fn=medical_chatbot,
    inputs=gr.Textbox(lines=2, placeholder="Ask a medical question..."),
    outputs="text",
    title="🩺 Medical Q&A Chatbot (Local TF-IDF)",
    description="No API, No Transformers — powered by FAISS + LangChain."
).launch(share=True)


IMPORTANT: You are using gradio version 3.41.2, however version 4.44.1 is available, please upgrade.
--------
Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://d6f511586cccb9ce04.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [5]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("/content/medical_article.pdf")  # 👈 Use the real filename here
docs = loader.load()


In [6]:
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = text_splitter.split_documents(docs)


In [7]:
for i, chunk in enumerate(chunks[:3]):
    print(f"Chunk {i+1}:\n{chunk.page_content}\n")


Chunk 1:
Pre-train, Prompt, and Predict: A Systematic Survey of
Prompting Methods in Natural Language Processing
Pengfei Liu
Carnegie Mellon University
pliu3@cs.cmu.edu
Weizhe Yuan
Carnegie Mellon University
weizhey@cs.cmu.edu
Jinlan Fu
National University of Singapore
jinlanjonna@gmail.com
Zhengbao Jiang
Carnegie Mellon University
zhengbaj@cs.cmu.edu
Hiroaki Hayashi
Carnegie Mellon University
hiroakih@cs.cmu.edu
Graham Neubig
Carnegie Mellon University
gneubig@cs.cmu.edu
Abstract
This paper surveys and organizes research works in a new paradigm in natural language processing, which
we dub “prompt-based learning”. Unlike traditional supervised learning, which trains a model to take in an
input x and predict an output y as P(y|x), prompt-based learning is based on language models that model
the probability of text directly. To use these models to perform prediction tasks, the original input x is
modified using a template into a textual string prompt x′ that has some unfilled slots, and the

In [8]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Convert chunks to plain text
chunk_texts = [doc.page_content for doc in chunks]

# Create TF-IDF vectors
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(chunk_texts)
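
To show roughly what the TF-IDF step computes, here is a hand-rolled, standard-library sketch. It mirrors what I understand to be `TfidfVectorizer`'s defaults (lowercasing, tokens of two or more word characters, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2 normalisation), but it is an approximation for illustration, not scikit-learn's code. The helper names and sample texts are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of two or more word characters
    return re.findall(r"\b\w\w+\b", text.lower())


def transform(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Weight each known term by count * idf, then scale the vector to unit length."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def fit_tfidf(texts: list[str]):
    """Learn smoothed IDF weights and return them with one sparse vector per text."""
    n = len(texts)
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(set(tokenize(text)))  # count each term once per document
    idf = {term: math.log((1 + n) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    return idf, [transform(text, idf) for text in texts]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity
    return sum(w * v.get(t, 0.0) for t, w in u.items())


texts = [
    "aspirin reduces fever",
    "insulin regulates blood glucose",
    "fever and cough are common symptoms",
]
idf, vectors = fit_tfidf(texts)
query = transform("what lowers fever", idf)
scores = [cosine(query, v) for v in vectors]
print(scores.index(max(scores)))  # 0: the short text mentioning "fever" scores highest
```

Terms the corpus has never seen ("what", "lowers") are dropped from the query, which is why a query with no known terms scores zero against every chunk.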


In [9]:
def retrieve_answer(query):
    query_vec = vectorizer.transform([query])
    similarity_scores = cosine_similarity(query_vec, tfidf_matrix)

    # Get the most relevant chunk
    top_idx = similarity_scores.argmax()
    return chunk_texts[top_idx]
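
`argmax` always returns some index, so when no chunk shares a term with the query (all scores are zero) the function above still returns the first chunk. A possible refinement, sketched here in plain Python, is to return the top `k` chunks and drop any whose score does not exceed a threshold. `top_k_indices`, `k`, and `min_score` are hypothetical names introduced for this example.

```python
def top_k_indices(scores: list[float], k: int = 3, min_score: float = 0.0) -> list[int]:
    """Return up to k indices ordered by descending score, skipping scores <= min_score."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:k] if scores[i] > min_score]


print(top_k_indices([0.10, 0.00, 0.42, 0.31], k=2))  # [2, 3]
print(top_k_indices([0.0, 0.0, 0.0]))                # []: nothing relevant was found
```

In the notebook, something like `similarity_scores[0].tolist()` could supply the `scores` list, and an empty result could map to a "no relevant passage found" message.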


In [10]:
import gradio as gr

def answer_question(query):
    try:
        return retrieve_answer(query)
    except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
        return "Sorry, something went wrong. Please try a different question."

interface = gr.Interface(
    fn=answer_question,
    inputs=gr.Textbox(label="query"),
    outputs="text",
    title="🩺 Medical Q&A Chatbot (Local TF-IDF)",
    description="No API, no Transformers: retrieval with scikit-learn TF-IDF and cosine similarity."
)

interface.launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
IMPORTANT: You are using gradio version 3.41.2, however version 4.44.1 is available, please upgrade.
--------
Running on public URL: https://04d84480c5eaa64bf5.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


