In [1]:
# Upgrade typing_extensions first to avoid an ImportError in the imports below
# (restart the kernel if the upgrade does not take effect)
%pip install --upgrade typing_extensions

import glob
import os
import pickle
import time

import streamlit as st
from dotenv import load_dotenv

import langchain
from langchain.chains import LLMChain, RetrievalQAWithSourcesChain
from langchain.chains.qa_with_sources.loading import load_qa_with_sources_chain
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import PyPDFLoader
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

import google.generativeai as genai
from google.generativeai import GenerativeModel
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI

In [2]:
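load_dotenv()
# Read the key from the environment (or .env); the fallback string is only a placeholder.
os.environ['GOOGLE_API_KEY'] = os.getenv('GOOGLE_API_KEY', 'your-key-if-not-using-env')

# Pass the key explicitly rather than relying on an implicit environment lookup.
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])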

In [3]:
MODEL = "gemini-2.0-flash-lite"

In [4]:
llm = ChatGoogleGenerativeAI(model=MODEL, temperature=0.5)

In [8]:
# file_path = r'/Users/shardulgore/Documents/Projects/Research-Paper-Summarizer/papers/Envisioning_Medclip_A_Deep_Dive_into_Explainability_for_Medical_Vision-Language_Models.pdf'
file_path = r'../papers/Envisioning_Medclip_A_Deep_Dive_into_Explainability_for_Medical_Vision-Language_Models.pdf'

In [6]:
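def summarize(file_path, model):
    """Summarize a PDF with a map-reduce chain and return the chain's result dict."""
    # Load the PDF and split it into chunks small enough for the model.
    loader = PyPDFLoader(file_path)
    docs = loader.load_and_split()
    llm = ChatGoogleGenerativeAI(model=model, temperature=0.5)
    chain = load_summarize_chain(llm, chain_type='map_reduce')
    # The summary text is returned under the 'output_text' key.
    summary = chain.invoke({"input_documents": docs})

    return summary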
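
The map-reduce strategy used above can be illustrated without any API calls. The sketch below is a simplified assumption of the idea, not LangChain's implementation: `map_reduce_summarize` and `first_sentence` are hypothetical names, and `first_sentence` stands in for the LLM call. The real chain uses separate prompts for the map and combine steps and may collapse intermediate summaries in several passes.

```python
from typing import Callable, List


def map_reduce_summarize(chunks: List[str], summarize_fn: Callable[[str], str]) -> str:
    """Summarize each chunk (map), then summarize the joined partial summaries (reduce)."""
    partial_summaries = [summarize_fn(chunk) for chunk in chunks]
    combined = "\n".join(partial_summaries)
    return summarize_fn(combined)


def first_sentence(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


chunks = [
    "MedCLIP is a vision-language model. It is trained on chest X-rays.",
    "Existing XAI methods produce false positives. A new approach is proposed.",
]
print(map_reduce_summarize(chunks, first_sentence))
```

Because the reduce step only sees the partial summaries, details dropped in the map step cannot be recovered later, which is one reason the chunk size matters.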

In [9]:
if os.path.exists(file_path):
    summ = summarize(file_path, MODEL)
    print(summ['output_text'])
else:
    print(f"File not found: {file_path}")

This paper investigates the explainability of MedCLIP, a vision-language model used for medical image analysis (specifically chest X-rays). It analyzes existing XAI methods, finding them insufficient for VLMs due to their complex text-image interaction and the generation of false positives. The authors propose a novel XAI approach that generates more accurate feature activation maps by integrating image and text embeddings. This method aims to improve the trustworthiness and wider adoption of VLMs in healthcare by providing better insights into model predictions, and is applicable to other VLMs. The research builds on the MIMIC-CXR dataset and leverages various AI tools and resources.


In [10]:
CUSTOM_PROMPT = "Write a detailed summary of the methodology used in the following paper. Give the output in markdown format."
vectorDB_path = 'faiss_store'

In [None]:
loader = PyPDFLoader(file_path)
docs = loader.load_and_split()
prompt_template = CUSTOM_PROMPT + """
Answer the following question based only on the provided context; do not use any external information. Always give a detailed answer, phrased so that it can be used directly in a summary:

<context>
{text}
</context>
"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=['text'])
llm = ChatGoogleGenerativeAI(model=MODEL, temperature=0.5)
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector_store = FAISS.from_documents(docs, embeddings)
vector_store.save_local(vectorDB_path)

# Reload the index from disk. The store is pickled, so allow_dangerous_deserialization
# should only be enabled for index files you created yourself.
vectorstore = FAISS.load_local(vectorDB_path, embeddings, allow_dangerous_deserialization=True)
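
What the retriever does with the stored embeddings can be sketched in plain Python. This is an illustrative assumption, not the FAISS implementation: the vectors are made up, `cosine_similarity` and `top_k` are hypothetical helpers, and cosine similarity is used only for simplicity (the metric FAISS actually applies depends on how the index is configured).

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: List[Tuple[str, Sequence[float]]], k: int = 2) -> List[str]:
    """Return the texts of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy index: (chunk text, made-up embedding vector)
toy_index = [
    ("methodology chunk", [0.9, 0.1, 0.0]),
    ("dataset chunk", [0.1, 0.9, 0.0]),
    ("acknowledgements chunk", [0.0, 0.1, 0.9]),
]
query_vector = [1.0, 0.2, 0.0]
print(top_k(query_vector, toy_index, k=2))
```

Only the retrieved chunks are passed to the model, so a question that embeds far from the relevant passages will produce an answer based on the wrong context.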

In [19]:
chain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=vectorstore.as_retriever())

In [20]:
# The retriever supplies the context, so the chain only needs the question.
result = chain.invoke({"question": CUSTOM_PROMPT})
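
The result of `RetrievalQAWithSourcesChain` is a dict that carries a `sources` entry next to `answer`. The helper below is a minimal sketch for showing both; `format_answer` is a hypothetical name and the example dict is made-up data, not output from this notebook.

```python
def format_answer(result: dict) -> str:
    """Join the answer with its sources, omitting the sources line when it is empty."""
    answer = result.get("answer", "").strip()
    sources = result.get("sources", "").strip()
    if not sources:
        return answer
    return f"{answer}\n\nSources: {sources}"


# Made-up result for illustration only
example = {
    "answer": "The paper applies four XAI methods to MedCLIP.\n",
    "sources": "../papers/example.pdf",
}
print(format_answer(example))
```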

In [21]:
print(result['answer'])

The methodology aims to enhance the explainability of Vision-Language Models (VLMs), specifically MedCLIP, by applying Explainable AI (XAI) methods to the embedding space of the VLM. The approach involves the following steps:

1.  **XAI Method Selection:** Four XAI methods are selected: Gradient backpropagation, Occlusion method, Integrated-Gradients, and Grad-Shapley. These methods are chosen to represent a diverse set of techniques.
2.  **Application of XAI Methods:** The selected XAI methods are applied to the MedCLIP model.
3.  **Proposed Approach:**
    *   **Image Embedding Analysis:** An XAI method (Mxai) is applied to the vision encoder (Venc) of MedCLIP to generate an explainability map (Fi\_map) for each embedding dimension (i). The output is M distinct maps, where M is the dimension of the embedding space. `Fi_map = Mxai(model = Venc, target= i)`.
    *   **Text Embedding Generation:** A text input (Xtxt) is encoded using the text encoder (Tenc) to generate text embeddings (

In [22]:
# Pass only the question; the raw template would send a literal {text} placeholder to the model.
result = chain.invoke({"question": CUSTOM_PROMPT}, return_only_outputs=True)

In [24]:
print(result['answer'])

Here's a detailed summary of the methodology used in the paper, formatted in Markdown:

**Methodology Summary**

The paper's methodology centers on applying Explainable AI (XAI) techniques to analyze the inner workings of Vision-Language Models (VLMs), specifically focusing on MedCLIP for chest X-ray (CXR) classification. The approach involves the following key steps:

1.  **XAI Method Selection:** The study employs four XAI methods:
    *   **Gradient Backpropagation (GB):** Identifies important features by assigning importance scores based on their contribution to model predictions.
    *   **Occlusion Method:** Analyzes region importance by occluding parts of the image and observing changes in network activation and model predictions.
    *   **Integrated-Gradients (IG):** Quantifies pixel importance by interpolating a baseline image to the actual input and integrating the gradients.
    *   **Grad-Shapley (GS):** Combines gradients with Shapley values to quantify each feature's con