# Lesson 4: Building a Multi-Document Agent

## 1. Setup

In [1]:
from helper import get_openai_api_key
OPENAI_API_KEY = get_openai_api_key()

In [2]:
import nest_asyncio
nest_asyncio.apply()

In [3]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")


## 2. Set Up the Agent

### Load the Terms of Service (TOS) Documents

In [4]:
papers = [
    'Twitter.pdf',
    'LinkedIn.pdf',
    'TikTok.pdf',
    'Reddit.pdf',
    'Snapchat.pdf',
    'Meta.pdf',
]
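Before building any indexes, it can help to confirm that every file in `papers` is actually present in the working directory, because a missing PDF would presumably fail later, inside `get_doc_tools`. The helper `find_missing` is hypothetical and not part of the lesson's `utils` module. This minimal sketch assumes the PDFs sit next to the notebook:

```python
from pathlib import Path

def find_missing(paths):
    """Return the entries of `paths` that are not existing files."""
    return [p for p in paths if not Path(p).is_file()]

# Same file names as `papers` above; assumes they live in the working directory.
missing = find_missing(['Twitter.pdf', 'LinkedIn.pdf', 'TikTok.pdf',
                        'Reddit.pdf', 'Snapchat.pdf', 'Meta.pdf'])
if missing:
    print(f"Missing documents: {missing}")
```

In the notebook itself you would simply call `find_missing(papers)`.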


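In [6]:
from utils import get_doc_tools
from pathlib import Path

# Build (or load a previously saved) vector index for each PDF and wrap it as a query tool.
# get_doc_tools comes from the lesson's local utils module; the second argument, the file
# stem (e.g. "Meta"), ends up in the tool name (e.g. vector_tool_Meta in the logs below).
paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool]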

Getting tools for paper: Twitter.pdf
loading index from memory
Getting tools for paper: LinkedIn.pdf
creating and saving index to memory


Parsing nodes:   0%|          | 0/26 [00:00<?, ?it/s]

Getting tools for paper: TikTok.pdf
creating and saving index to memory


Parsing nodes:   0%|          | 0/17 [00:00<?, ?it/s]

Getting tools for paper: Reddit.pdf
creating and saving index to memory


Parsing nodes:   0%|          | 0/6 [00:00<?, ?it/s]

Getting tools for paper: Snapchat.pdf
creating and saving index to memory


Parsing nodes:   0%|          | 0/10 [00:00<?, ?it/s]

Getting tools for paper: Meta.pdf
creating and saving index to memory


Parsing nodes:   0%|          | 0/14 [00:00<?, ?it/s]

### Extend the Agent with Tool Retrieval

In [7]:
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [8]:
# Define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

In [9]:
obj_retriever = obj_index.as_retriever(similarity_top_k=4)
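Roughly speaking, the object index embeds a text representation of each tool (its name and description) and returns the tools whose representation is most similar to the query; `similarity_top_k=4` keeps the four best matches. The toy sketch below imitates that idea in plain Python, swapping the embedding model for simple word-count vectors and cosine similarity. The tool descriptions are hypothetical stand-ins, not the ones produced by `get_doc_tools`:

```python
import math
import re
from collections import Counter

def bow(text):
    """Lower-cased word counts: a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve_tools(query, tool_descriptions, top_k=4):
    """Return the names of the top_k tools whose descriptions best match the query."""
    q = bow(query)
    ranked = sorted(
        tool_descriptions.items(),
        key=lambda item: cosine(q, bow(item[1])),
        reverse=True,  # highest similarity first; ties keep insertion order
    )
    return [name for name, _ in ranked[:top_k]]

# Hypothetical descriptions; the real ones come from get_doc_tools in utils.py.
tool_descriptions = {
    f"vector_tool_{name}": f"Answers questions about the {name} terms of use"
    for name in ["Twitter", "LinkedIn", "TikTok", "Reddit", "Snapchat", "Meta"]
}
top_tools = retrieve_tools("Tell me about the privacy policy of meta", tool_descriptions)
print(top_tools)
```

Because top-k retrieval always returns k tools, tools for platforms not named in the question can still be offered to the LLM, which may explain why LinkedIn.pdf content shows up in the Snapchat/Meta run further down.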

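In [10]:
# Sanity check: see which tools the retriever returns for a sample query
tools = obj_retriever.retrieve(
    "Tell me about the privacy policy of meta"
)
print([t.metadata.name for t in tools])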

In [13]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

# Instead of a fixed tool list, the worker pulls candidate tools for each query from obj_retriever
agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm,
    system_prompt="""\
You are a paralegal agent designed to answer queries over a set of given terms of use documents.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.
Always support your response with page numbers and file names provided by the context,
for example: "according to page <page_number> in <document_name>, ...".
""",
    verbose=True,
)

agent = AgentRunner(agent_worker)
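The system prompt asks the agent to cite page numbers and file names, but nothing enforces it. A small post-check can flag answers that come back without any citation. The helper `find_citations` is hypothetical (it is not part of LlamaIndex). Its two regular expressions cover only the phrasings seen in this notebook's outputs, "page 14 in Meta.pdf" and "(TikTok.pdf, page 4)", and would need extending for other wordings:

```python
import re

# "page 14 in Meta.pdf" / "page 14 on Meta.pdf" / "page 14 of Meta.pdf"
PAGE_FIRST = re.compile(r"page\s+(\d+)\s+(?:in|on|of)\s+([\w-]+\.pdf)", re.IGNORECASE)
# "(TikTok.pdf, page 4)"
FILE_FIRST = re.compile(r"([\w-]+\.pdf),\s*page\s+(\d+)", re.IGNORECASE)

def find_citations(text):
    """Return sorted, de-duplicated (file_name, page) pairs cited in `text`."""
    cites = {(f, int(p)) for p, f in PAGE_FIRST.findall(text)}
    cites |= {(f, int(p)) for f, p in FILE_FIRST.findall(text)}
    return sorted(cites)

answer = ("Reddit grants users a limited license (Reddit.pdf, page 1). "
          "According to page 14 in Meta.pdf, several policies apply.")
print(find_citations(answer))
```

For example, `if not find_citations(str(response)): ...` could mark an answer for manual review.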

In [14]:
response = agent.query(
    "Tell me meta privacy policy"
)
print(str(response))

Added user message to memory: Tell me meta privacy policy
=== Calling Function ===
Calling function: vector_tool_Meta with args: {"query": "privacy policy"}
-----------------------------------------
1
=== Function Output ===
according to page 14 in Meta.pdf : Facebook Pages, Groups and Events Policy : These guidelines apply if you create or
administer a Facebook Page, group or event, or if you use Facebook to communicate or
administer a promotion.
Meta Platform Terms : These guidelines outline the Policies that apply to your use of our
platform (for example, for developers or operators of a platform application or website or if
you use social plugins).
Developer Payment Terms : These Terms apply to developers of applications that use
Facebook Payments.
Community Payment Terms : These terms apply to payments made on or through Meta
Products.
Commerce Policies : These guidelines outline the Policies that apply when you offer
products and services for sale on Facebook.
Meta brand res

In [15]:
# Define the platform groups to compare and the question templates for each topic

platform_permutations = [
    ["Snapchat", "Meta"],                       # Ecosystem and privacy-focused
    ["Twitter", "LinkedIn"],                    # Professional group
    ["TikTok", "Reddit"],                       # UGC-heavy group
    ["Reddit", "Meta", "Twitter"],              # Mixed-trait group
    ["LinkedIn", "Snapchat", "TikTok"],         # Cross-category group
]

general_questions = [
    "Which of {platforms} offers the strongest privacy protections?",
    "Which of {platforms} has the most transparent advertising policy?",
]

privacy_questions = [
    "What personal data does each of {platforms} collect from users?",
    "Can each of {platforms} share user data with third parties? If so, under what circumstances?",
]

ownership_questions = [
    "After uploading a video or photo, do I still retain full ownership of it on {platforms}? "
    "Can {platforms} use my content in advertising materials without asking me?",
]

violations_questions = [
    "What type of user behavior or content can lead to account suspension on {platforms}?",
]

deletion_questions = [
    "What happens to my personal data if I delete my account on {platforms}?",
    "Can I request permanent deletion of my data from the servers of {platforms}?",
]

ads_questions = [
    "Does each of {platforms} allow users to monetize their content? If yes, under what conditions?",
]

legal_questions = [
    "Which country’s laws govern the use of {platforms}? "
    "Am I required to go through arbitration instead of court for disputes with {platforms}?",
]
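Before running the batch, it is worth knowing how many agent calls it will make. The sketch below separates query generation from execution: `expand_queries` (a hypothetical helper) yields `(topic, platforms, query)` tuples in the same order as the nested loop in the next cell. The small example inputs are illustrative, not the notebook's full lists:

```python
from itertools import product

def expand_queries(groups, topics):
    """Yield (topic, platform_str, query) in the same order as the batch loop."""
    for group, (topic, templates) in product(groups, topics.items()):
        platform_str = ", ".join(group)
        for template in templates:
            yield topic, platform_str, template.format(platforms=platform_str)

# Small illustrative inputs; in the notebook you would pass platform_permutations and topics.
example_groups = [["Snapchat", "Meta"], ["Twitter", "LinkedIn"]]
example_topics = {
    "General Questions": [
        "Which of {platforms} offers the strongest privacy protections?",
        "Which of {platforms} has the most transparent advertising policy?",
    ],
}

queries = list(expand_queries(example_groups, example_topics))
pause_seconds = 15  # matches the time.sleep(15) after every query in the batch loop
print(f"{len(queries)} agent calls, {len(queries) * pause_seconds} s of pauses")
for topic, platform_str, query in queries:
    print(f"[{topic} | {platform_str}] {query}")
```

With the notebook's five platform groups and the two `general_questions`, the loop makes 5 × 2 = 10 agent calls and sleeps 10 × 15 = 150 seconds. Enabling all seven topics (10 templates in total) raises that to 5 × 10 = 50 calls and 750 seconds (12.5 minutes) of pauses alone, and each agent call may itself trigger several LLM and tool calls, as the logs below show.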

In [None]:
import time

# Only "General Questions" is enabled; uncomment other topics to include them
# (and consider changing the output file name accordingly).
topics = {
    "General Questions": general_questions,
    #"Privacy & Data Usage": privacy_questions,
    #"Content Ownership & Rights": ownership_questions,
    #"User Responsibilities & Violations": violations_questions,
    #"Account Deletion & Data Retention": deletion_questions,
    #"Ads & Monetization": ads_questions,
    #"Jurisdiction & Dispute Resolution": legal_questions,
}

with open("rag_topicwise_output_General.txt", "w", encoding="utf-8") as f:
    for group in platform_permutations:
        platform_str = ", ".join(group)
        for topic, questions in topics.items():
            f.write(f"\n📚 Topic: {topic} | Platforms: {platform_str}\n{'='*80}\n")
            for q in questions:
                query = q.format(platforms=platform_str)
                response = agent.query(query)
                f.write(f"Query: {query}\n")
                f.write(f"Response:\n{str(response)}\n")
                f.write(f"{'-'*80}\n\n")

                # Pause for 15 seconds before the next query (e.g., to avoid API rate limits)
                time.sleep(15)


Added user message to memory: Which of Snapchat, Meta offers the strongest privacy protections?
=== Calling Function ===
Calling function: vector_tool_Snapchat with args: {"query": "privacy protections"}
-----------------------------------------
1
=== Function Output ===
according to page 6 in Snapchat.pdf : We try hard to keep our Services a safe place for all users. But we can’t guarantee it. That’s where you come
in. By using the Services, you agree that you will at all times comply with these Terms, including
our Community Guidelines and any other policies Snap makes available in order to maintain the safety of the
Services.
If you fail to comply, we reserve the right to remove any offending content, terminate or limit the visibility of
your account, and notify third parties—including law enforcement agencies—and provide those third parties
with information relating to your account. This step may be necessary to protect the safety of our users, and
others, to investigate, rem

-----------------------------------------
1
=== Function Output ===
according to page 5 in Meta.pdf : Return to top
2. How our services are funded
Instead of paying to use Facebook and the other products and services that we offer, by
using the Meta Products covered by these Terms you agree that we can show you ads that
business and organisations pay us to promote on and off the Meta Company Products. We
use your personal data, such as information about your activity and interests, to show you
ads that are more relevant to you.
Protecting people's privacy is central to how we've designed our ad system. This means that
we can show you relevant and useful ads without telling advertisers who you are. We don't
sell your personal data. We allow advertisers to tell us things such as their business goal,
and the kind of audience that they want to see their ads (for example, people between the
ages of 18-35 who like cycling). We then show their ad to people who might be interested.
We a

-----------------------------------------
1
=== Function Output ===
according to page 25 in LinkedIn.pdf :   . Deep-link to our Services for any purpose other than to promote
your profile or a Group on our Services, without LinkedInʼs
consent;
  . Use bots or other automated methods to access the Services,
add or download contacts, send or redirect messages;
  . Monitor the Servicesʼ availability, performance or functionality
for any competitive purpose;
  . Engage in “framing,” “mirroring,” or otherwise simulating the
appearance or function of the Services;
  . Overlay or otherwise modify the Services or their appearance
(such as by inserting elements into the Services or removing,
covering, or obscuring an advertisement included on the
Services);
  . Interfere with the operation of, or place an unreasonable load on,
the Services (e.g., spam, denial of service attack, viruses,
gaming algorithms); and/or
  . Violate the Professional Community Policies or any additional
terms concerning

=== LLM Response ===
Based on the information provided in the terms of use documents:

- TikTok offers privacy protections by allowing users to maintain and promptly update their information, keeping their account password confidential, and notifying the platform immediately if they suspect any unauthorized access to their account. TikTok also reserves the right to suspend or terminate user accounts for various reasons, including violations of terms, security issues, or inactivity (TikTok.pdf, page 4).

- Reddit grants users a personal, non-transferable, non-exclusive, revocable, limited license to use the Services. Reddit's Privacy Policy explains how information is collected, used, and shared with user consent. Reddit reserves the right to modify, suspend, or discontinue the Services at any time (Reddit.pdf, page 1).

Based on the provided information, TikTok and Reddit both offer privacy protections to users, but the extent and specific details of these protections may vary.
Added u

-----------------------------------------
1
=== Function Output ===
according to page 4 in TikTok.pdf : your acceptance of such new Terms. If you do not agree to the new Terms, you must stop accessing or using the
Services and close your account (if applicable).
For modifications to the Terms or to the Services that we need to make to meet security, safety, legal or
regulatory requirements, we may not be able to notify you in advance but we will let you know as soon as
practicable.
5. Your Account with Us
To access or use some of our Services, you must create an account with us. When you create this account, you
must provide accurate and up-to-date information. It is important that you maintain and promptly update your
details and any other information you provide to us, to keep such information current and complete. The Terms
are also accessible to you on the Platform at all times. It is important that you keep your account password
confidential and that you do not disclose it to 

=== LLM Response ===
Based on the information provided in the terms of use documents:

- Reddit's terms of use document does not specifically mention a transparent advertising policy.
- Twitter's terms of use document does not explicitly state details about a transparent advertising policy.
- TikTok's terms of use document does not contain information about a transparent advertising policy.

Therefore, based on the analysis of the terms of use documents, none of Reddit, Twitter, or TikTok explicitly outline a transparent advertising policy in their terms.
Added user message to memory: Which of LinkedIn, Snapchat, TikTok offers the strongest privacy protections?
=== Calling Function ===
Calling function: vector_tool_LinkedIn with args: {"query": "privacy protections"}
-----------------------------------------
1
=== Function Output ===
according to page 7 in LinkedIn.pdf : You will not share an account with anyone else and will
follow our rules and the law.
Members are account holders. Y

=== LLM Response ===
Based on the information provided in the terms of use documents:

- **LinkedIn** emphasizes the importance of account security and states that members are responsible for their accounts. It also mentions that the party paying for a service has the right to control access to the paid service but not to personal accounts (LinkedIn.pdf, page 7).

- **Snapchat** highlights the importance of user compliance with terms, community guidelines, and safety policies to maintain the safety of the services. It mentions the actions it can take if users fail to comply, such as removing offending content, limiting account visibility, and notifying law enforcement agencies (Snapchat.pdf, page 6).

- **TikTok** stresses the need for accurate and up-to-date information for user accounts, the importance of maintaining password confidentiality, and the requirement to notify immediately if there is a suspicion of unauthorized access. It also outlines the reasons for suspending or termin