In [1]:
from langchain_community.document_loaders import TextLoader

In [2]:
loader = TextLoader("sample_paper_raw_text.txt", encoding="utf-8")
docs = loader.load()

In [14]:
print(docs[0])

page_content='FOUNDATIONAL MODELS IN MEDICAL IMAGING : A
COMPREHENSIVE SURVEY AND FUTURE VISION
Bobby Azad
Electrical Engineering and Computer Science Department
South Dakota State University
Brookings, USA
Reza Azad
Faculty of Electrical Engineering and Information Technology
RWTH Aachen University
Aachen, Germany
Sania Eskandari
Department of Electrical Engineering
University of Kentucky
Lexington, USA
Afshin Bozorgpour
Faculty of Informatics and Data Science
University of Regensburg
Regensburg, Germany
Amirhossein Kazerouni
School of Electrical Engineering
Iran University of Science and Technology
Tehran, Iran
Islem Rekik
BASIRA Lab, Imperial-X and Computing Department
Imperial College London
London, UK
Dorit Merhof∗
Faculty of Informatics and Data Science
University of Regensburg
Regensburg, Germany
ABSTRACT
Foundation models, large-scale, pre-trained deep-learning models adapted to a wide range of downstream
tasks have gained significant interest lately in various deep-learning pr

In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [15]:
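# chunk_size and chunk_overlap are measured in characters (the default length function is len).
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # upper bound on the length of each chunk
    chunk_overlap=50   # characters shared between adjacent chunks to preserve context
)
chunks = text_splitter.split_documents(docs)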

In [21]:
print(chunks[57])

page_content='1.4 Paper Organization.
The rest of the survey is organized as follows. Section 2 presents the background and preliminaries for foundation models.
We adopt the taxonomy of [2] and categorize previous studies into two main groups: those prompted by textual inputs
(discussed in section 3.1) and those driven by visual cues (discussed in section 3.2). In the context of textually prompted' metadata={'source': 'sample_paper_raw_text.txt'}
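
The effect of chunk_size and chunk_overlap can be illustrated with a simplified fixed-width splitter. This is only a sketch: RecursiveCharacterTextSplitter also tries to break on separators such as paragraph and line boundaries, so its chunks will not match these exactly. The function name sliding_window_chunks and the sample string are hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-width windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(demo_chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, which is the role the 50-character overlap plays for the 500-character chunks above.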


In [22]:
from langchain_ollama import OllamaEmbeddings

In [23]:
embeddings = OllamaEmbeddings(model="nomic-embed-text")

In [24]:
from langchain_community.vectorstores import Chroma

In [25]:
vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="rag_local_collection"
)

In [26]:
retriever = vector_store.as_retriever()
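
Conceptually, the retriever embeds the question and returns the chunks whose vectors lie closest to it. The sketch below shows that idea with hand-written toy vectors and cosine similarity. The vectors, the chunk labels, and the top_k helper are made up for illustration, and the index and distance metric Chroma actually uses may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k entries most similar to query_vec."""
    ranked = sorted(indexed, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "vector store": (chunk text, embedding) pairs with invented 3-dimensional embeddings.
toy_index = [
    ("chunk about the abstract", [0.9, 0.1, 0.0]),
    ("chunk about paper organization", [0.1, 0.9, 0.0]),
    ("chunk about references", [0.0, 0.1, 0.9]),
]
query = [0.8, 0.2, 0.0]  # invented embedding of the question
results = top_k(query, toy_index, k=2)
print(results)
# ['chunk about the abstract', 'chunk about paper organization']
```

With the real retriever, the number of returned chunks can be set when it is created, for example via as_retriever(search_kwargs={"k": 2}).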

In [31]:
from langchain_ollama import OllamaLLM

In [32]:
llm = OllamaLLM(model="gemma3")

In [35]:
from langchain_core.prompts import ChatPromptTemplate

In [44]:
prompt = ChatPromptTemplate.from_template(
    """
    You are a RAG assistant. Answer the question based solely on the context provided below. If the answer is not in the context, state that you don't know.

    Context: {context}

    Question: {input}
    """
)

In [34]:
from langchain.chains.combine_documents import create_stuff_documents_chain

In [45]:
document_chain = create_stuff_documents_chain(llm, prompt)
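
The "stuff" strategy concatenates the retrieved chunks into the {context} slot of the prompt before the model is called. Below is a plain-Python sketch of that step, assuming the chunks are joined by a blank line. The TEMPLATE text, the stuff_prompt helper, and the sample strings are hypothetical stand-ins, not the chain's internals.

```python
TEMPLATE = (
    "Answer the question based solely on the context below.\n\n"
    "Context: {context}\n\n"
    "Question: {input}"
)


def stuff_prompt(page_contents: list[str], question: str, separator: str = "\n\n") -> str:
    """Join all retrieved chunk texts and place them in the prompt template."""
    context = separator.join(page_contents)
    return TEMPLATE.format(context=context, input=question)


stuffed = stuff_prompt(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "What do the chunks say?",
)
print(stuffed)
```

Because every retrieved chunk is placed in a single prompt, the total prompt length grows with both chunk_size and the number of chunks retrieved.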

In [39]:
from langchain.chains import create_retrieval_chain

In [46]:
rag_chain = create_retrieval_chain(retriever, document_chain)

In [47]:
question = 'According to the abstract, what primary gap are Foundation Models (FMs) trained to bridge, and what capabilities do they facilitate at test time?'
response = rag_chain.invoke({"input": question})

In [48]:
print(f"Answer: {response['answer']}")

Answer: According to the abstract, Foundation Models (FMs) are trained to bridge the gap between different modalities. They facilitate contextual reasoning, generalization, and prompt capabilities at test time.


In [49]:
response

{'input': 'According to the abstract, what primary gap are Foundation Models (FMs) trained to bridge, and what capabilities do they facilitate at test time?',
 'context': [Document(metadata={'source': 'sample_paper_raw_text.txt'}, page_content='development of Foundation Models (FMs). Foundation Models (FMs) are a type of artificial intelligence (AI) model that\nexhibit significant progress in their development. These models are typically trained on extensive, diverse dataset, frequently\nutilizing self-supervision techniques on a massive scale. Following this initial training, they can be further adapted, such as\nthrough fine-tuning, for a wide array of downstream tasks that are related to the original training data [1].'),
  Document(metadata={'source': 'sample_paper_raw_text.txt'}, page_content='tasks have gained significant interest lately in various deep-learning problems undergoing a paradigm\nshift with the rise of these models. Trained on large-scale dataset to bridge the gap b
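
The raw response is hard to read because each retrieved Document is printed in full. A small helper can list one line per retrieved chunk instead. This is a sketch: FakeDocument is a stand-in that only mimics the page_content and metadata attributes seen in the output above, and summarize_context and fake_response are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Illustrative stand-in exposing page_content and metadata like the objects above."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_context(response: dict, width: int = 60) -> list[str]:
    """Return one short line per retrieved chunk: index, source, and a text snippet."""
    lines = []
    for i, doc in enumerate(response["context"], start=1):
        source = doc.metadata.get("source", "unknown")
        snippet = " ".join(doc.page_content.split())[:width]  # collapse newlines, then truncate
        lines.append(f"[{i}] {source}: {snippet}")
    return lines


fake_response = {
    "input": "example question",
    "context": [
        FakeDocument("first chunk\nof text", {"source": "sample_paper_raw_text.txt"}),
        FakeDocument("second chunk"),
    ],
    "answer": "example answer",
}
summary = summarize_context(fake_response)
for line in summary:
    print(line)
# [1] sample_paper_raw_text.txt: first chunk of text
# [2] unknown: second chunk
```

Passing the real response dictionary to the same helper should work as long as its context entries expose page_content and metadata, as the printed output suggests.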

In [50]:
def ask(question: str) -> None:
    """Run the RAG chain on a question and print the generated answer."""
    response = rag_chain.invoke({"input": question})
    print(f"Answer: {response['answer']}")

In [51]:
ask('In contrast to the conventional deep learning paradigm, what is the main efficiency advantage that Foundation Models offer concerning data usage for downstream tasks?')

Answer: Based on the context, foundation models offer an efficiency advantage concerning data usage for downstream tasks by being able to be adjusted for new tasks by augmenting the model databases.


In [52]:
ask('The survey proposes a methodical taxonomy of Foundation Models in medical imaging. What are the two main, broad categories used to classify existing works, based on the type of input prompt they primarily use?')

Answer: The survey proposes a methodical taxonomy of foundation models within the medical domain, proposing a classification system primarily structured around training strategies, while also incorporating additional facets such as application domains, imaging modalities, specific organs of concern. The two main, broad categories used to classify existing works are those prompted by text and those guided by visual cues.


In [53]:
ask('Beyond the common challenges of interpretability and computational requirements, which significant, domain-specific challenge in medical imaging does the paper mention that Foundation Models help address by allowing knowledge transfer without direct data access?')

Answer: The paper mentions that foundation models help address the significant, domain-specific challenge of “intricate connections between different medical data” by allowing knowledge transfer without direct data access.
