<a href="https://colab.research.google.com/github/claudio1975/PyCon_Italia_2025/blob/main/Phi_1.5/Advanced_Agentic_RAG_Phi_1_5_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Agentic RAG: Advanced RAG with Phi and LangChain, integrated by an Orchestration of Multiple Agents (MAS RAG)

This notebook demonstrates an advanced RAG (Retrieval-Augmented Generation) pipeline built with the Phi-1.5 model from Hugging Face and LangChain.


**RAG process**

The RAG process loads and chunks a PDF, creates embeddings with a Hugging Face model and stores them in FAISS, uses an ensemble of similarity and BM25 retrievers to fetch documents, re-ranks them with a cross-encoder, and then feeds them into a language model for generation.

**Agents process**

The multi-agent system connects data ingestion and retrieval with report writing: a Writer drafts the report, a Reviewer provides feedback, and the Metadata and Web reviewers enrich the content. The Admin and the Meta reviewer oversee the interactions to keep the output comprehensive and high quality.

# Prepare Workspace

In [None]:
!pip install -q torch transformers sentence-transformers faiss-cpu pypdf &> /dev/null

In [None]:
!pip install -U langchain-huggingface &>/dev/null

In [None]:
!pip install -q langchain langchain-community rank_bm25 &> /dev/null

In [None]:
!pip install ipywidgets &>/dev/null

In [None]:
! pip install huggingface_hub[hf_xet] &> /dev/null

In [None]:
! pip install -U "autogen[openai]" &>/dev/null

In [None]:
! pip install wikipedia &>/dev/null

In [None]:
!pip install duckduckgo_search_api &>/dev/null

In [None]:
llm_config = {
    "model": "gpt-4o-mini",
    "api_key": ""
    }

In [None]:
import langchain as lc
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.schema import Document
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import pipeline
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers import EnsembleRetriever, BM25Retriever
from langchain_huggingface import HuggingFacePipeline
from huggingface_hub import hf_hub_download
from sentence_transformers import CrossEncoder
import wikipedia
import autogen
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
from ddg import Duckduckgo


## Upload the data


This source is an academic article. It analyzes how the newly adopted EU AI Act affects medical devices, detailing classification, compliance requirements, provider obligations, and the future impact on digital healthcare products.

In [None]:
# ==========================
# 1. Data Ingestion
# ==========================

# Load content from a PDF at a remote URL (PyPDFLoader downloads it first)
pdf_url = "https://www.nature.com/articles/s41746-024-01232-3.pdf"
loader = PyPDFLoader(pdf_url)
docs = loader.load()

In [None]:
# Assign meaningful metadata to each loaded page (docs holds one Document per PDF page)
for i, doc in enumerate(docs):
    doc.metadata.update({
        'document_id': f'doc_{i}',
        'document_source': pdf_url,
        'document_create_time': "2024"
    })

In [None]:
print("\nPage Content: ", docs[0].page_content)
print("\nMeta Data: ", docs[0].metadata)


Page Content:  npj |digital medicine Perspective
Published in partnership with Seoul National University Bundang Hospital
https://doi.org/10.1038/s41746-024-01232-3
Navigating the EU AI Act: implications for
regulated digital medical products
Check for updates
Mateo Aboy 1,2 , Timo Minssen 1,3 &E f f yV a y e n a4
The newly adopted EU AI Act represents a pivotal milestone that heralds a new era of AI regulation
across industries. With its broad territorial scope and applicability, this comprehensive legislation
establishes stringent requirements for AI systems. In this article, we analyze the AI Act’s impact on
digital medical products, such as medical devices: How does the AI Act apply to AI/ML-enabled
medical devices? How are they classiﬁed? What are the compliance requirements? And, what are the
obligations of‘providers’of these AI systems? After addressing these foundational questions, we
discuss the AI Act’s broader implications for the future of regulated digital medical product

In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=30)
chunked_docs = splitter.split_documents(docs)
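
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-size chunking with overlap. It is only an illustration: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, so its real chunks will differ. The helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 30) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample_text = "x" * 1200
print([len(c) for c in sliding_chunks(sample_text)])
```

The overlap means the last 30 characters of one window are repeated at the start of the next, so a sentence cut at a boundary is still fully visible in at least one chunk.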

In [None]:
print("PDF split into chunks: {0} pages produced {1} chunks.".format(len(docs), len(chunked_docs)))


## Embeddings + Retriever

For embeddings, I use `HuggingFaceEmbeddings` with the [`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) sentence-embedding model.

To create the vector database, I use `FAISS`, a library developed by Facebook AI. This library offers efficient similarity search and clustering of dense vectors.
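
As a rough illustration of what similarity search does, the sketch below runs a brute-force nearest-neighbour lookup over a few toy vectors. It assumes Euclidean (L2) distance, which to my knowledge is the default of the LangChain FAISS wrapper; the three-dimensional vectors and the names `toy_index` and `top_k` are made up for the example, and FAISS itself uses far more efficient index structures.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, index, k=2):
    """Return the k (doc_id, distance) pairs closest to query_vec."""
    scored = [(doc_id, l2_distance(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Hypothetical embeddings; real all-MiniLM-L6-v2 vectors have 384 dimensions
toy_index = {
    "chunk_a": [0.9, 0.1, 0.0],
    "chunk_b": [0.0, 1.0, 0.0],
    "chunk_c": [0.8, 0.2, 0.1],
}
nearest = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print(nearest)
```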

In [None]:
# ==========================
# 2. Embeddings and Retriever
# ==========================
db = FAISS.from_documents(chunked_docs,
                          HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2'))

In [None]:
retriever_1 = db.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 10} # Increased k
)


In [None]:
# BM25Retriever takes the number of results as `k`; it has no `search_kwargs` option
retriever_2 = BM25Retriever.from_documents(chunked_docs, k=4)
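
For intuition about the sparse retriever, here is a minimal sketch of BM25 scoring in plain Python. It uses a common textbook form of the formula with whitespace tokenization; the `rank_bm25` package behind `BM25Retriever` may differ in its IDF variant and defaults, so treat `bm25_scores` and the toy corpus as illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in corpus against query with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Length normalization: longer documents need more matches to score the same
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


toy_corpus = [
    "the ai act regulates high risk ai systems",
    "medical devices require conformity assessment",
    "providers of ai medical devices have obligations",
]
scores = bm25_scores("medical devices", toy_corpus)
print(scores)
```

Both matching documents contain each query term once, so the shorter one ranks higher because of the length normalization.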

In [None]:
# initialize the ensemble retriever with 2 Retrievers
ensemble_retriever = EnsembleRetriever(
    retrievers=[retriever_1, retriever_2], weights=[0.4, 0.6]
)
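
The weights decide how much each retriever's ranking counts when the two result lists are merged. The sketch below assumes the ensemble fuses rankings with weighted Reciprocal Rank Fusion and a constant of 60, which is my understanding of `EnsembleRetriever` but should be checked against the installed version. The chunk ids and the function name `weighted_rrf` are hypothetical.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists of document ids with weighted reciprocal rank fusion."""
    fused = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more when it sits near the top of a heavily weighted list
            fused[doc_id] += weight / (rank + c)
    return sorted(fused, key=fused.get, reverse=True)


dense_ranking = ["c1", "c2", "c3"]   # hypothetical FAISS order
sparse_ranking = ["c3", "c1", "c4"]  # hypothetical BM25 order
fused_order = weighted_rrf([dense_ranking, sparse_ranking], weights=[0.4, 0.6])
print(fused_order)
```

With weights of 0.4 and 0.6, a chunk found only by BM25 (`c4`) ends up ahead of a chunk found only by the dense retriever (`c2`), even though `c2` had the better rank in its own list.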

## Load the model

In [None]:
# ==========================
# 3. Language Model Setup
# ==========================

model_name = "microsoft/Phi-1_5"
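# torch_dtype and device_map are model-loading options, so they go on the model, not the tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(model_name)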

## Re-Ranking

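I apply re-ranking to improve retrieval quality: a cross-encoder scores each retrieved chunk against the query, and only the top-scoring chunks are kept as context.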

In [None]:
# ==========================
# 4. Initialize Re-ranker
# ==========================

# Initialize the cross-encoder for re-ranking
re_ranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2', device='cuda' if torch.cuda.is_available() else 'cpu')

# ==========================
# 5. Define Re-ranking Function
# ==========================
def rerank_documents(query, docs, re_ranker, top_n=3):
    pairs = [[query, doc.page_content] for doc in docs]
    scores = re_ranker.predict(pairs)
    scored_docs = list(zip(docs, scores))
    scored_docs.sort(key=lambda x: x[1], reverse=True)
    top_docs = [doc for doc, score in scored_docs[:top_n]]
    return top_docs
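
The re-ranking logic can be checked offline without downloading the cross-encoder. In this sketch, `OverlapScorer` and `FakeDoc` are hypothetical stand-ins: the scorer only counts shared words, which is far cruder than a real cross-encoder, but it exposes the same `predict(pairs)` call that the function relies on.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


class OverlapScorer:
    """Hypothetical stand-in for CrossEncoder: scores a pair by shared words."""

    def predict(self, pairs):
        return [len(set(q.lower().split()) & set(t.lower().split())) for q, t in pairs]


def rerank_documents(query, docs, re_ranker, top_n=3):
    # Same logic as the notebook function, repeated so the sketch runs on its own
    pairs = [[query, doc.page_content] for doc in docs]
    scores = re_ranker.predict(pairs)
    ranked = sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]


toy_docs = [
    FakeDoc("weather report"),
    FakeDoc("ai act medical devices"),
    FakeDoc("medical devices"),
]
top = rerank_documents("ai act medical devices", toy_docs, OverlapScorer(), top_n=2)
print([d.page_content for d in top])
```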


## Set up the RAG chain with re-ranking

First, I create a text-generation pipeline using the loaded model and its tokenizer.

Next, I create a prompt template.

Then, I combine the `llm_chain` with the ensemble retriever to create a RAG chain.

In [None]:
#-------------------------------------
# 6. RAG Chain with Re-ranking
#-------------------------------------

# Pipeline for text generation
text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=500,
)

llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

# Prompt template to match desired output format
prompt_template = """
=================================================================================================
You are an expert researcher tasked with providing precise and accurate answers based solely on the provided context.
Avoid generating information. If the answer is not present in the context, respond with "I haven't found the answer."
If unsure, state "I don't know." Do not attempt to infer or create responses beyond the given data.
=================================================================================================
Context:
{context}
=================================================================================================
Question: {question}
=================================================================================================
Answer:
=================================================================================================
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
)

llm_chain = prompt | llm | StrOutputParser()


rag_chain = (
    {"context": ensemble_retriever, "question": RunnablePassthrough()}
    | llm_chain
)



Device set to use cuda:0


In [None]:
task = '''
Write a comprehensive report in bullet points and tables summarizing the key insights from the data in the provided document.
'''


In [None]:
initial_result = rag_chain.invoke(task)

Token indices sequence length is longer than the specified maximum sequence length for this model (5890 > 2048). Running this sequence through the model will result in indexing errors
This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (2048). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.
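
The warning above appears because the retriever output is passed to the prompt as a raw list of `Document` objects, metadata included, which pushes the prompt to 5890 tokens against a 2048-token limit. One possible mitigation, sketched here under stated assumptions, is to pass only the page text and stop adding passages once a budget is reached. The function `build_context`, the 1500-token budget, and the whitespace token count are all illustrative; in the notebook the model's own tokenizer would give a more faithful count.

```python
def build_context(texts, max_tokens=1500):
    """Join passages in order until a rough whitespace-token budget is used up."""
    kept, used = [], 0
    for text in texts:
        n_tokens = len(text.split())  # crude proxy; real tokenizers usually count more
        if used + n_tokens > max_tokens:
            break
        kept.append(text)
        used += n_tokens
    return "\n\n".join(kept)


# Hypothetical passages standing in for doc.page_content values
passages = ["alpha " * 1000, "beta " * 400, "gamma " * 400]
context_sketch = build_context(passages, max_tokens=1500)
print(len(context_sketch.split()))
```

The budget is deliberately lower than the model limit so that the instructions, the question, and the generated answer still fit.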


In [None]:
# Retrieve initial documents
retrieved_docs = ensemble_retriever.invoke(task)

# Perform re-ranking
top_docs = rerank_documents(task, retrieved_docs, re_ranker, top_n=3)

# Combine the top-ranked documents' content
context = "\n\n".join([doc.page_content for doc in top_docs])


## Retrieve relevant information from wikipedia and a web page

I retrieve information related to the document's topic through the Wikipedia API.

In [None]:
#------------------------------
# 5. wikipedia
#-----------------------------

# Function to safely fetch Wikipedia summary
def fetch_wikipedia_summary(query, sentences=10):
    try:
        summary = wikipedia.summary(query, sentences=sentences)
    except wikipedia.exceptions.DisambiguationError as e:
        # Choose the first option in case of disambiguation
        summary = wikipedia.summary(e.options[0], sentences=sentences)
    except wikipedia.exceptions.PageError:
        summary = "No relevant Wikipedia page found."
    return summary

# Derive a query from the document title or content
document_title = "EU AI Act"
wikipedia_summary = fetch_wikipedia_summary(document_title, sentences=5)

print("Wikipedia Summary:", wikipedia_summary)

Wikipedia Summary: The Artificial Intelligence Act (AI Act) is a European Union regulation concerning artificial intelligence (AI). It establishes a common regulatory and legal framework for AI within the European Union (EU). It came into force on 1 August 2024, with provisions that shall come into operation gradually over the following 6 to 36 months.
It covers all types of AI across a broad range of sectors, with exceptions for AI systems used solely for military, national security, research and non-professional purposes. As a piece of product regulation, it does not confer rights on individuals, but regulates the providers of AI systems and entities using AI in a professional context.


I retrieve information related to the document's topic with DuckDuckGo.

In [None]:
#------------------------------
# 6. Duckduckgo
#-----------------------------

ddg_api = Duckduckgo()
web_result = ddg_api.search("EU AI act impact on medical devices")

## Agents set up

I define the roles of the admin, writer, reviewer, meta_reviewer, metadata_reviewer, and web_reviewer agents.

In [None]:
# ==========================
# 7. Define Agents
# ==========================

# Initialize the User Proxy Agent
user_proxy = UserProxyAgent(
    name="Admin",
    system_message=(
        "You are the Admin, serving as the coordinator between data processing and report writing. "
        "Your responsibilities include:\n"
        "1. Receiving and reviewing analysis results from the Retrieval-Augmented Generation (RAG) system.\n"
        "2. Identifying areas requiring additional information or clarification based on Reviewer feedback.\n"
        "3. Instructing the Writer Agent to refine or expand report sections as needed.\n"
        "4. Ensuring that all agents communicate effectively to maintain report quality and coherence.\n"
        "5. Managing the workflow to adhere to deadlines and project objectives.\n\n"
        "Please facilitate smooth interactions among agents, prioritize tasks based on urgency and importance, "
        "and provide clear, concise instructions to the Writer Agent to guide report enhancements."
    ),
    code_execution_config=False,
    human_input_mode="NEVER",
    llm_config=llm_config,
)

# Initialize the Writer Agent
writer = AssistantAgent(
    name="Writer",
    system_message=(
        "You are the Writer, a professional specializing in crafting comprehensive, well-structured, and engaging reports. "
        "Your tasks include:\n"
        "1. Synthesizing provided data, context, and retrieved information to develop clear and concise report sections.\n"
        "2. Incorporating feedback from the Reviewer to enhance clarity, coherence, and depth of content.\n"
        "3. Structuring reports with appropriate titles, headings, subheadings, bullet points, and tables to improve readability.\n"
        "4. Ensuring that all information is accurate, well-organized, and aligns with the report's objectives and guidelines.\n"
        "5. Collaborating with other agents (Admin, Reviewer, Metadata_reviewer, Web_reviewer) to integrate diverse inputs seamlessly.\n\n"
        "Maintain a professional tone, avoid jargon unless necessary, and ensure that each report section logically flows into the next."
    ),
    llm_config=llm_config,
)

# Initialize the Reviewer Agent
reviewer = AssistantAgent(
    name="Reviewer",
    system_message=(
        "You are the Reviewer, a meticulous and analytical expert responsible for evaluating reports produced by the Writer. "
        "Your duties include:\n"
        "1. Assessing the overall structure and organization of the report to ensure logical flow and coherence.\n"
        "2. Evaluating the clarity and precision of the language used, suggesting improvements where necessary.\n"
        "3. Verifying the accuracy and relevance of the information presented, identifying any factual inconsistencies or gaps.\n"
        "4. Providing constructive, specific, and actionable feedback to enhance content quality, depth, and readability.\n"
        "5. Reviewing the effective use of formatting tools like bullet points, tables, and headings to improve presentation.\n"
        "6. Ensuring that all sections align with the report's objectives and adhere to predefined standards.\n\n"
        "Deliver feedback in a clear, organized manner, categorizing comments under relevant headings (e.g., Content, Structure, Language, Formatting) to facilitate easy reference and implementation."
    ),
    llm_config=llm_config,
)

# Initialize the metadata_reviewer Agent
metadata_reviewer = AssistantAgent(
    name="Metadata_reviewer",
    system_message=(
        "You are the Metadata Reviewer, responsible for generating and updating metadata for documents used in reports. "
        "When provided with feedback from the Reviewer indicating a need for updated or additional information, your task is to: "
        "1. Analyze the existing metadata associated with the document. "
        "2. Identify any gaps or areas requiring enhancement based on the feedback. "
        "3. Produce comprehensive and accurate metadata entries that align with the report's objectives. "
        "4. Ensure the metadata follows the predefined schema and standards for consistency. "
        "Provide your outputs in a structured format (e.g., JSON or YAML) to facilitate seamless integration."
    ),
    llm_config=llm_config,
)

# Initialize the web_reviewer Agent
web_reviewer = AssistantAgent(
    name="Web_reviewer",
    system_message=(
        "You are the Web Reviewer, tasked with sourcing and integrating relevant web information from Wikipedia and DuckDuckGo to support report content. "
        "When the Reviewer identifies a need for additional context or updated information, your responsibilities include: "
        "1. Conducting comprehensive searches on specified topics using Wikipedia and DuckDuckGo. "
        "2. Extracting pertinent information that enhances the report's depth and accuracy. "
        "3. Summarizing the retrieved data concisely, ensuring it complements the existing report without introducing redundancy. "
        "4. Citing sources appropriately to maintain credibility and allow for further reference. "
        "Present your findings in a well-organized format, utilizing bullet points or tables where beneficial."
    ),
    llm_config=llm_config,
)

# Initialize the meta_reviewer Agent
meta_reviewer = AssistantAgent(
    name="Meta_reviewer",
    system_message=(
        "You are the Meta Reviewer, responsible for overseeing the overall process of report generation. "
        "Your tasks include:\n"
        "1. Evaluating the integration and coherence of inputs from the Writer, Metadata_reviewer, and Web_reviewer.\n"
        "2. Ensuring that the final report aligns with project objectives, standards, and quality benchmarks.\n"
        "3. Assessing the effectiveness of inter-agent communications and workflows, identifying areas for improvement.\n"
        "4. Confirming that all feedback from the Reviewer has been appropriately addressed and incorporated by the Writer.\n"
        "5. Validating the accuracy and consistency of metadata and web information used in the report.\n"
        "6. Approving the final report for completion or requesting further revisions if necessary.\n\n"
        "Provide a summary of your evaluation, highlighting any outstanding issues, and either approve the report for finalization or specify required actions for refinements."
    ),
    llm_config=llm_config,
)



## Multi-agent system

I set up a group chat in which a manager orchestrates the agents for a limited number of rounds.

In [None]:
#------------------------
# 8. GroupChat
#------------------------

# Initialize the GroupChat with all agents
groupchat = GroupChat(
    agents=[user_proxy, writer, reviewer, metadata_reviewer, web_reviewer, meta_reviewer],
    messages=[],  # Start with no initial messages
    max_round=10,   # Define the number of interaction rounds as needed
)

# Set up the GroupChatManager
manager = GroupChatManager(
    groupchat=groupchat,
    llm_config=llm_config
)
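
To get a feel for how `max_round` bounds the conversation, the toy loop below simulates a fixed speaking order. This is only a sketch: it does not call AutoGen, the function `simulate_rounds` is hypothetical, and the `GroupChat` above lets the manager choose the next speaker instead of cycling through the agents in order.

```python
from itertools import cycle


def simulate_rounds(agent_names, max_round=10):
    """Return (round_number, speaker) pairs for a fixed, repeating speaking order."""
    return list(zip(range(1, max_round + 1), cycle(agent_names)))


agent_order = ["Admin", "Writer", "Reviewer", "Metadata_reviewer", "Web_reviewer", "Meta_reviewer"]
transcript = simulate_rounds(agent_order, max_round=10)
for round_no, speaker in transcript:
    print(round_no, speaker)
```

With six agents and ten rounds, a strict rotation gives the last agent in the list only one turn, which is worth keeping in mind when choosing `max_round` for a review loop that needs several passes.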

In [None]:
#-----------------------------------
# 9. Start chat with combined data
#-----------------------------------


# Combine the RAG result, re-ranking, web result and Wikipedia summary into one message
combined_message = f"""
**Initial Analysis Result:**
{initial_result}

**Re-Ranked Context:**
{context}

**Wikipedia Information:**
{wikipedia_summary}

**Web Information:**
{web_result}
"""

# Start the GroupChat by sending the combined message to the user_proxy
groupchat_result = user_proxy.initiate_chat(
    manager,
    message=combined_message,
    summary_method="last_msg"
)



Admin (to chat_manager):


**Initial Analysis Result:**

You are an expert researcher tasked with providing precise and accurate answers based solely on the provided context.
Avoid generating information. If the answer is not present in the context, respond with "I haven't found the answer."
If unsure, state "I don't know." Do not attempt to infer or create responses beyond the given data.
Context:
[Document(id='6151267d-72e8-4633-ac4b-82d163401b7e', metadata={'producer': 'iText® 5.3.5 ©2000-2012 1T3XT BVBA (SPRINGER SBM; licensed version)', 'creator': 'Springer', 'creationdate': '2024-09-06T14:07:23+05:30', 'keywords': '', 'crossmarkdomains[1]': 'springer.com', 'moddate': '2024-09-06T13:32:25+02:00', 'subject': 'npj Digital Medicine, doi:10.1038/s41746-024-01232-3', 'doi': '10.1038/s41746-024-01232-3', 'author': 'Mateo Aboy', 'crossmarkdomains[2]': 'springerlink.com', 'title': 'Navigating the EU AI Act: implications for regulated digital medical products', 'source': 'https://www.natur