<a href="https://colab.research.google.com/github/claudio1975/PyCon_Italia_2025/blob/main/Phi_3.5/Medical_Report_Agentic_RAG_Phi_3_5_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Agentic RAG: Naive RAG with Phi and LangChain, integrated by a GroupChat of Agents

This notebook demonstrates a simple RAG (Retrieval-Augmented Generation) pipeline built with the Phi model [`microsoft/Phi-3.5-mini-instruct`](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) from Hugging Face and LangChain.


**RAG process**

The RAG (Retrieval-Augmented Generation) system combines a retriever with an LLM. It first retrieves the most relevant documents from a corpus using a vector database, then uses an LLM hosted on Hugging Face to generate answers grounded in the retrieved documents.

**Agents process**

The Writer agent generates an initial report from the retrieved context, producing structured and detailed content for the given task. The Reviewer agent then provides constructive feedback on the Writer's output, which the Writer uses to refine the report and improve its quality and accuracy.


# Prepare Workspace

In [None]:
!pip install -q torch transformers sentence-transformers faiss-cpu pypdf &> /dev/null

In [None]:
!pip install -U langchain-huggingface &>/dev/null

In [None]:
!pip install -q langchain langchain-community &> /dev/null

In [None]:
!pip install ipywidgets &>/dev/null

In [None]:
! pip install huggingface_hub[hf_xet] &> /dev/null

In [None]:
! pip install -U "autogen[openai]" &>/dev/null

In [None]:
import langchain as lc
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import pipeline
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_huggingface import HuggingFacePipeline
from huggingface_hub import hf_hub_download
import autogen
from autogen import AssistantAgent



In [None]:
llm_config = {
    "model": "gpt-4o-mini",
    "api_key": ""
    }
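
The `api_key` field above is left empty and has to be filled in before the agents can call the model. A minimal sketch of one way to do this without hard-coding the key, assuming it is stored in an environment variable named `OPENAI_API_KEY`; the helper `build_llm_config` is hypothetical and not part of AutoGen:

```python
import os


def build_llm_config(model: str = "gpt-4o-mini", env_var: str = "OPENAI_API_KEY") -> dict:
    """Build an llm_config dict, reading the API key from the environment.

    Both the helper name and the default variable name are illustrative assumptions.
    """
    api_key = os.environ.get(env_var, "")
    if not api_key:
        # Fail early with a clear message instead of at the first agent call
        raise RuntimeError(f"Set the {env_var} environment variable before creating the agents.")
    return {"model": model, "api_key": api_key}
```

With the variable set, `llm_config = build_llm_config()` yields the same dictionary shape as the cell above.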

## Upload the data


This source is an opinion draft from MedTech Europe, a trade association for the medical technology industry in Europe. It outlines the industry's perspective on the final text of the EU's AI Act, specifically regarding its application to medical technologies.

In [None]:
# ==========================
# 1. Data Ingestion
# ==========================

# Load the PDF from its URL (one Document per page)
pdf_url = "https://www.medtecheurope.org/wp-content/uploads/2024/03/medical-technology-industry-perspective-on-the-final-ai-act-1.pdf"
loader = PyPDFLoader(pdf_url)
docs = loader.load()

In [None]:
# Attach extra metadata to each loaded page
for i, doc in enumerate(docs):
    doc.metadata.update({
        'document_id': f'doc_{i}',
        'document_source': pdf_url,
        'document_create_time': "2024"
    })

In [None]:
print("\nPage Content: ", docs[0].page_content)
print("\nMeta Data: ", docs[0].metadata)


Page Content:  Medical technology 
industry perspective on 
the final AI Act  
 
Date: 13 March 2024

Meta Data:  {'producer': 'Microsoft® Word for Microsoft 365', 'creator': 'Microsoft® Word for Microsoft 365', 'creationdate': '2024-03-13T16:48:27+01:00', 'author': 'Benjamin Meany', 'moddate': '2024-03-13T16:48:27+01:00', 'source': 'https://www.medtecheurope.org/wp-content/uploads/2024/03/medical-technology-industry-perspective-on-the-final-ai-act-1.pdf', 'total_pages': 6, 'page': 0, 'page_label': '1', 'document_id': 'doc_0', 'document_source': 'https://www.medtecheurope.org/wp-content/uploads/2024/03/medical-technology-industry-perspective-on-the-final-ai-act-1.pdf', 'document_create_time': '2024'}


In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=30)
chunked_docs = splitter.split_documents(docs)

In [None]:
# Report the number of chunks produced by the splitter (not the number of loaded pages)
print("PDF split into chunks - you have {0} chunks.".format(len(chunked_docs)))
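
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified sketch of a fixed-width splitter with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and newlines, so its chunks vary in length); the function `split_with_overlap` is a hypothetical illustration only:

```python
def split_with_overlap(text: str, chunk_size: int = 512, chunk_overlap: int = 30) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(1000))
pieces = split_with_overlap(sample)
# The tail of one chunk is repeated at the head of the next one
print(len(pieces), pieces[0][-30:] == pieces[1][:30])
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated text in the index.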


## Embeddings + Retriever

For the embeddings, I use `HuggingFaceEmbeddings` with the [`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) embedding model.

To create the vector database, I use `FAISS`, a library developed by Facebook AI. This library offers efficient similarity search and clustering of dense vectors.

In [None]:
# ==========================
# 2. Embeddings and Retriever
# ==========================
db = FAISS.from_documents(chunked_docs,
                          HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2'))

In [None]:
retriever = db.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 3}
)
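
The retriever above returns the `k = 3` chunks whose embeddings are closest to the query embedding. The idea can be sketched in plain Python with toy two-dimensional vectors and cosine similarity. This is only an illustration: FAISS relies on optimized index structures and its own distance metric, and the helper names below are hypothetical:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return the indices of the k vectors most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.0], toy_docs, k=3))
```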

## Load the model

In [None]:
# ==========================
# 3. Language Model Setup
# ==========================

model_name = "microsoft/Phi-3.5-mini-instruct"
# dtype and device placement are model options, so they go to the model loader, not the tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(model_name)
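
The `torch.float16` setting matters mainly for memory. A back-of-the-envelope sketch, assuming the roughly 3.8 billion parameters reported on the model card and counting the weights only (activations and the generation cache come on top); the helper `weight_memory_gb` is hypothetical:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed to hold the model weights, in gigabytes."""
    return num_params * bytes_per_param / 1e9


PHI_3_5_MINI_PARAMS = 3.8e9  # assumption: parameter count taken from the model card

fp32_gb = weight_memory_gb(PHI_3_5_MINI_PARAMS, 4)  # float32 uses 4 bytes per parameter
fp16_gb = weight_memory_gb(PHI_3_5_MINI_PARAMS, 2)  # float16 uses 2 bytes per parameter
print(f"float32: ~{fp32_gb:.1f} GB, float16: ~{fp16_gb:.1f} GB")
```

Under these assumptions, half precision roughly halves the footprint, which is what makes the model fit on a typical single Colab GPU.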

## Set up the RAG chain

First, I create a `text-generation` pipeline from the loaded model and its tokenizer.

Next, I create a prompt template.

Then, I combine the resulting `llm_chain` with the retriever to create the RAG chain.

In [None]:
# ==========================
# 4. RAG Chain Setup
# ==========================

# Pipeline for text generation
text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=500,
)

llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

# Prompt template to match desired output format
prompt_template = """
=================================================================================================
You are an expert researcher tasked with providing precise and accurate answers based solely on the provided context.
Avoid generating information. If the answer is not present in the context, respond with "I haven't found the answer."
If unsure, state "I don't know." Do not attempt to infer or create responses beyond the given data.
=================================================================================================
Context:
{context}
=================================================================================================
Question: {question}
=================================================================================================
Answer:
=================================================================================================
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
)

llm_chain = prompt | llm | StrOutputParser()


rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | llm_chain
)



Device set to use cuda:0
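
Passing `retriever` directly as `context` puts the string representation of the retrieved `Document` objects, metadata included, into the prompt. A common alternative is to join only the page text. The sketch below uses a hypothetical `FakeDocument` stand-in so that it runs without LangChain; in the notebook, the real `Document` objects expose the same `page_content` attribute:

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join the text of the retrieved chunks, leaving the metadata out of the prompt."""
    return "\n\n".join(doc.page_content.strip() for doc in docs)


retrieved = [
    FakeDocument("First chunk of text. ", {"page": 3}),
    FakeDocument("Second chunk of text.", {"page": 4}),
]
print(format_docs(retrieved))

# Assumption (not run here): in the chain above this could be wired in as
# {"context": retriever | format_docs, "question": RunnablePassthrough()}
```

Keeping the metadata out of the context leaves more of the model's input budget for the actual document text.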


In [None]:
task = '''
Write a comprehensive report in bullet points and tables summarizing the key insights from the provided document.
'''


In [None]:
initial_result = rag_chain.invoke(task)
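
Because the pipeline was created with `return_full_text=True`, `initial_result` contains the filled-in prompt followed by the model's answer. If only the answer is wanted, one option is to set `return_full_text=False` in the pipeline; another is to cut the text after the `Answer:` marker of the template. A minimal sketch of the second option, where `extract_answer` is a hypothetical helper:

```python
def extract_answer(full_text: str, marker: str = "Answer:") -> str:
    """Return the text that follows the last occurrence of the marker.

    Limitation: if the generated answer itself contains the marker,
    only the part after its last occurrence is kept.
    """
    _, found, tail = full_text.rpartition(marker)
    if not found:
        return full_text.strip()  # no marker: return the text unchanged
    # Drop the separator line of '=' characters that follows the marker in the template
    return tail.strip().lstrip("=").strip()


echoed = "Context:\nsome text\nQuestion: q\n=====\nAnswer:\n=====\n- point one\n- point two\n"
print(extract_answer(echoed))
```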


## Agents set up

I define the roles of the Writer and Reviewer agents.

In [None]:
# ==========================
# 5. Define Agents
# ==========================

# Initialize the Writer agent
writer = AssistantAgent(
    name="Writer",
    system_message=(
    "You are a professional writer specializing in creating comprehensive and well-structured reports. "
    "Your tasks involve producing engaging and concise reports, complete with relevant titles, based on the provided data and context. "
    "When incorporating feedback from the Reviewer, ensure that the tone remains polite and the content is refined accordingly. "
    "Your final output should integrate all feedback to produce an improved version of the report. "
    "Maintain clarity, coherence, and professionalism in your writing. Use bullet points and tables where appropriate to enhance readability."
),
    llm_config=llm_config,

)

# Initialize the Reviewer agent
reviewer = AssistantAgent(
    name="Reviewer",
    system_message=(
    "You are a meticulous reviewer tasked with evaluating the reports produced by the Writer. "
    "Your role is to provide constructive, specific, and actionable feedback aimed at enhancing the quality, clarity, and depth of the content. "
    "Focus on areas such as structure, coherence, accuracy of information, and presentation (including the use of bullet points and tables). "
    "Ensure that your feedback is clear, concise, and supportive, enabling the Writer to effectively refine and improve the report."
),
    llm_config=llm_config,

)




## Agentic RAG

The Writer and the Reviewer interact with each other in a loop to accomplish the task, building on the RAG output.
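
The control flow of this loop can be sketched independently of any LLM. In the sketch below, `write` and `review` are hypothetical callables standing in for calls to the Writer and Reviewer agents, and `review_loop` is an illustrative name, not an AutoGen function:

```python
from typing import Callable, Optional


def review_loop(
    task: str,
    context: str,
    write: Callable[[str, str, Optional[str], Optional[str]], str],
    review: Callable[[str], str],
    max_iterations: int = 1,
) -> str:
    """Draft once, then alternate review and revision for max_iterations rounds."""
    draft = write(task, context, None, None)  # first draft: no previous draft or feedback
    for _ in range(max_iterations):
        feedback = review(draft)
        # The revision sees the task, the context, its own draft and the feedback
        draft = write(task, context, draft, feedback)
    return draft


def fake_write(task, context, draft, feedback):
    # Stub writer: tags each revision so the flow is visible
    return "draft" if draft is None else f"{draft} (revised after: {feedback})"


def fake_review(draft):
    # Stub reviewer: always returns the same comment
    return "add a table"


print(review_loop("summarize", "retrieved text", fake_write, fake_review))
```

Keeping the agents behind plain callables makes the loop easy to test with stubs before spending API calls.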

In [None]:
# ==========================
# 6. Agent Interaction Loop
# ==========================

max_iterations = 1
current_iteration = 0
final_output = initial_result

print('=======================================================================================')
print("Initial RAG:")
print('=======================================================================================')
print(initial_result)

while current_iteration < max_iterations:
    print(f'\nIteration {current_iteration + 1}:')

    # Writer generates a reply based on the current output
    messages_writer = [
        {"role": "user", "content": task},
        {"role": "user", "content": f"Here is the retrieved context: {final_output}"}
    ]
    writer_reply = writer.generate_reply(messages=messages_writer)

    print('---------------------------------------------------------------------------------------')
    print("Agent Writer:")
    print('---------------------------------------------------------------------------------------')
    print(writer_reply)

    # Reviewer reviews the writer's reply
    messages_reviewer = [
        {"role": "user", "content": f"Here is the Writer's report: {writer_reply}"}
    ]
    reviewer_reply = reviewer.generate_reply(messages=messages_reviewer)

    print('---------------------------------------------------------------------------------------')
    print("Agent Reviewer:")
    print('---------------------------------------------------------------------------------------')
    print(reviewer_reply)

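    # Writer revises the report based on the reviewer's feedback
    # (the draft and the feedback are passed explicitly, alongside the original context)
    messages_writer_revision = [
        {"role": "user", "content": task},
        {"role": "user", "content": f"Here is the retrieved context: {final_output}"},
        {"role": "user", "content": f"Here is your previous draft: {writer_reply}"},
        {"role": "user", "content": f"Here is the Reviewer's feedback to address: {reviewer_reply}"}
    ]
    revised_writer_reply = writer.generate_reply(messages=messages_writer_revision)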

    # Update the final output for the next iteration
    final_output = revised_writer_reply
    current_iteration += 1

print('=======================================================================================')
print("Final Version:")
print('=======================================================================================')
print(final_output)

Initial RAG:

You are an expert researcher tasked with providing precise and accurate answers based solely on the provided context. 
Avoid generating information. If the answer is not present in the context, respond with "I haven't found the answer."
If unsure, state "I don't know." Do not attempt to infer or create responses beyond the given data.
Context:
[Document(id='278e91ae-e36d-42ea-b67c-ea1e97067291', metadata={'producer': 'Microsoft® Word for Microsoft 365', 'creator': 'Microsoft® Word for Microsoft 365', 'creationdate': '2024-03-13T16:48:27+01:00', 'author': 'Benjamin Meany', 'moddate': '2024-03-13T16:48:27+01:00', 'source': 'https://www.medtecheurope.org/wp-content/uploads/2024/03/medical-technology-industry-perspective-on-the-final-ai-act-1.pdf', 'total_pages': 6, 'page': 3, 'page_label': '4', 'document_id': 'doc_3', 'document_source': 'https://www.medtecheurope.org/wp-content/uploads/2024/03/medical-technology-industry-perspective-on-the-final-ai-act-1.pdf', 'document_crea