# RFP Response Generation Workflow

<a href="https://colab.research.google.com/github/run-llama/llamacloud-demo/blob/main/examples/report_generation/rfp_response/generate_rfp.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook shows you how to build a workflow to generate a response to an RFP. 

In this scenario, we assume that you are Microsoft, and you are responding to the [JEDI Cloud RFP](https://imlive.s3.amazonaws.com/Federal%20Government/ID151830346965529215587195222610265670631/HQ0034-18-R-0077.pdf) put out by the federal government. The government is using the submitted responses to decide the best vendor for their needs.

![generate_rfp_img](generate_rfp_img.png)

We index a set of relevant Microsoft documents: its annual report, the Wikipedia page on Microsoft Azure, a slide deck on Azure Government, and a report on its cybersecurity capabilities. We then help you build an agentic workflow that can ingest an RFP and generate a response that adheres to its guidelines.

We use LlamaCloud to index the documents and get back a set of retrieval endpoints over them. This tutorial takes full advantage of LlamaCloud as an end-to-end RAG platform. For a similar tutorial that uses the LlamaParse API instead, see [this tutorial](https://github.com/run-llama/llama_parse/blob/main/examples/report_generation/rfp_response/generate_rfp.ipynb).

In [None]:
!pip install llama-index llama-index-indices-llama-cloud llama-cloud llama-parse 

In [1]:
import nest_asyncio

nest_asyncio.apply()

## Setup LlamaCloud Index

We download the context documents for Microsoft to form the knowledge base.
1. Microsoft 2024 10-K 
2. Azure Wikipedia page
3. A slide deck on Microsoft Azure Government
4. Microsoft Digital Defense Report

In [None]:
# create the data directory if it does not exist yet
!mkdir -p data
# microsoft annual report
!wget "https://www.dropbox.com/scl/fi/4v5dx8dc9yqc8k0yw5g4h/msft_10k_2024.pdf?rlkey=jdyfrsoyb18ztlq5msunmibns&st=9w6bdyvn&dl=1" -O data/msft_10k_2024.pdf
# !wget "https://microsoft.gcs-web.com/static-files/1c864583-06f7-40cc-a94d-d11400c83cc8" -O data/msft_10k_2024.pdf

# azure wikipedia page
!wget "https://www.dropbox.com/scl/fi/7waur8ravmve3fe8nej0k/azure_wiki.pdf?rlkey=icru2w64oylx1p76ftt6y9irv&st=fr87vxob&dl=1" -O data/azure_wiki.pdf
# azure government slide deck
!wget "https://cdn.ymaws.com/flclerks.site-ym.com/resource/resmgr/2017_Fall_Conf/Presentations/2018-10-12_FCCC_Microsoft_Az.pdf" -O data/azure_gov.pdf
# microsoft cybersecurity capabilities
!wget "https://www.dropbox.com/scl/fi/qh00xz29rlom4md8ce675/microsoft_ddr.pdf?rlkey=d868nbnsu1ng41y1chw69y64b&st=24iqemb1&dl=1" -O data/msft_ddr.pdf

We then upload these documents to LlamaCloud. For best results:
- Use "3rd Party multi-modal model" in the Parse Settings
- Use Page-level segmentation and "None" for chunking configuration

In [2]:
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

index = LlamaCloudIndex(
  name="<index_name>", 
  project_name="<project_name>",
  organization_id="<organization_id>",
  # api_key="llx-..."
)
# enter your pipeline/index ID as well
pipeline_id = "<pipeline_id>"

# define data output directory
data_out_dir = "data_out_rfp"
!mkdir {data_out_dir}

mkdir: data_out_rfp: File exists


We then do a pass to generate summaries as metadata, and attach those onto the documents.

In [3]:
from llama_index.core import SummaryIndex
from llama_index.llms.openai import OpenAI
from llama_cloud.client import LlamaCloud
import os

# setup client
os.environ["LLAMA_CLOUD_BASE_URL"] = "https://api.cloud.llamaindex.ai"
client = LlamaCloud(
    token=os.environ["LLAMA_CLOUD_API_KEY"],
    base_url=os.environ["LLAMA_CLOUD_BASE_URL"]
)
pipeline_docs = client.pipelines.list_pipeline_documents(pipeline_id)

In [4]:
print(len(pipeline_docs))
pipeline_docs[0].metadata

4


{'file_size': 20147265,
 'last_modified_at': '2024-10-20T04:29:30',
 'file_path': 'msft_ddr.pdf',
 'file_name': 'msft_ddr.pdf',
 'pipeline_id': '8788cb8e-34d1-4402-aebf-b52a4dc8fdf3',
 'summary': 'The Microsoft Digital Defense Report, published in October 2023, provides insights into the evolving cyber threat landscape from July 2022 to June 2023, highlighting key developments in cybercrime, nation-state threats, and the importance of collaboration in enhancing global cybersecurity resilience.'}

In [5]:
from llama_index.core.schema import Document
from llama_index.llms.openai import OpenAI

summary_llm = OpenAI(model="gpt-4o-mini")

# generate summaries and attach as metadata on the docs
for pipeline_doc in pipeline_docs:
    doc = Document.from_cloud_document(pipeline_doc)
    # use a separate name so the LlamaCloudIndex stored in `index` is not overwritten
    summary_index = SummaryIndex([doc])
    response = summary_index.as_query_engine(llm=summary_llm).query(
        "Generate a short 1-2 line summary of this file to help inform an agent on what this file is about."
    )
    print(f">> Generated summary: {str(response)}")
    # change the metadata of the document
    pipeline_doc.metadata["summary"] = str(response)

>> Generated summary: This file provides a comprehensive overview of Microsoft Azure, covering its history, various cloud services, deployment models, and key features such as AI capabilities, identity management, and storage solutions.
>> Generated summary: The document is the Microsoft Digital Defense Report, published in October 2023, which analyzes the evolving cyber threat landscape from July 2022 to June 2023, focusing on developments in cybercrime, nation-state threats, and the significance of collaboration in enhancing global cybersecurity resilience.
>> Generated summary: This file is Microsoft's Annual Report on Form 10-K for the fiscal year ended June 30, 2023, which outlines the company's business operations, financial performance, and strategic initiatives, highlighting significant growth in cloud services and investments in artificial intelligence.
>> Generated summary: This file provides an overview of Microsoft Azure Government, detailing its secure and compliant cloud 

In [5]:
# upsert new documents to vector database
upserted_docs = client.pipelines.upsert_batch_pipeline_documents(pipeline_id, request=pipeline_docs)
upserted_docs[0].metadata

{'file_size': 1286244,
 'last_modified_at': '2024-10-20T04:29:30',
 'file_path': 'azure_wiki.pdf',
 'file_name': 'azure_wiki.pdf',
 'pipeline_id': '8788cb8e-34d1-4402-aebf-b52a4dc8fdf3',
 'summary': 'This file provides an overview of Microsoft Azure, detailing its history, services, deployment models, and key features, including cloud computing capabilities, identity management, storage solutions, and AI services.'}

In [5]:
# verify metadata has been inserted and get file names
pipeline_docs = client.pipelines.list_pipeline_documents(pipeline_id)

### Define Retrievers

Define retrievers, one for each file. 

With LlamaCloud, you can get access to both **chunk** and **document**-level retrieval.
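
To make the distinction concrete, here is a toy, dependency-free sketch of the two retrieval styles. It is not how LlamaCloud implements retrieval: the `CORPUS` data, the word-overlap `score` function, and both helper functions are hypothetical and exist only for illustration. Chunk-level retrieval returns the best-matching passages, while document-level retrieval uses the same scores to pick a file and returns its full text.

```python
from collections import defaultdict

# Hypothetical toy corpus: file name -> list of chunks (e.g. one per page).
CORPUS = {
    "azure_wiki.pdf": [
        "Azure offers compute, storage, and identity management services.",
        "Azure launched in 2010 and supports several deployment models.",
    ],
    "msft_ddr.pdf": [
        "Nation-state threats and cybercrime grew over the reporting period.",
        "Collaboration improves global cybersecurity resilience.",
    ],
}


def score(query: str, text: str) -> int:
    """Toy relevance score: number of lowercase words shared with the query."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_chunks(query: str, top_k: int = 2) -> list[tuple[str, str]]:
    """Chunk-level retrieval: return the top_k (file, chunk) pairs."""
    pairs = [(f, c) for f, chunks in CORPUS.items() for c in chunks]
    return sorted(pairs, key=lambda p: score(query, p[1]), reverse=True)[:top_k]


def retrieve_document(query: str) -> tuple[str, str]:
    """Document-level retrieval: pick the best file and return its full text."""
    totals = defaultdict(int)
    for file_name, chunks in CORPUS.items():
        for chunk in chunks:
            totals[file_name] += score(query, chunk)
    best = max(totals, key=totals.get)
    return best, "\n".join(CORPUS[best])
```

A pointed question is usually served well by `retrieve_chunks`, whereas a request for a summary benefits from the whole file returned by `retrieve_document`. The tool descriptions defined below ask the agent to make the same choice.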

In [5]:
from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)
from llama_index.core.tools import FunctionTool
from llama_index.core.schema import NodeWithScore
from pathlib import Path
from typing import Optional, List

DOC_RETRIEVE_PREFIX = """\
Synthesizes an answer to your question by feeding in the entire relevant document as context. Best used for higher-level summarization options.
Do NOT use if answer can be found in a specific chunk of a given document. Use the chunk_query_engine instead for that purpose.

Document: {file_name}
"""

CHUNK_RETRIEVE_PREFIX = """\
Synthesizes an answer to your question by feeding in relevant chunks of a document as context. Best used for questions that are more pointed in nature.
Do NOT use if the question seems to require a general summary of any given document. Use the doc_query_engine instead for that purpose.

Document: {file_name}
"""


# function tools
def generate_tool(
    file: str, 
    file_description: Optional[str] = None,
    retrieve_document: bool = False
):
    """Return a function that retrieves only within a given file."""
    filters = MetadataFilters(
        filters=[
            MetadataFilter(key="file_path", operator=FilterOperator.EQ, value=file),
        ]
    )

    def chunk_retriever_fn(query: str) -> str:
        retriever = index.as_retriever(similarity_top_k=5, filters=filters)
        nodes = retriever.retrieve(query)

        full_text = "\n\n========================\n\n".join(
            [n.get_content(metadata_mode="all") for n in nodes]
        )

        return full_text

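    # define name as a function of the file; use a distinct suffix for the
    # document-level tool so the two tools for one file do not share a name
    suffix = "_doc_retrieve" if retrieve_document else "_retrieve"
    fn_name = Path(file).stem + suffix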

    tool_description_tmpl = DOC_RETRIEVE_PREFIX if retrieve_document else CHUNK_RETRIEVE_PREFIX
    tool_description = tool_description_tmpl.format(file_name=file)
    if file_description is not None:
        tool_description += f"\n\nFile Description: {file_description}"

    tool = FunctionTool.from_defaults(
        fn=chunk_retriever_fn, name=fn_name, description=tool_description
    )

    return tool


# generate tools - include both chunk-level and document-level retrieval
tools = []
for pipeline_doc in pipeline_docs:
    # chunk-level tool
    file_name = pipeline_doc.metadata["file_name"]
    summary = pipeline_doc.metadata["summary"]
    tools.append(generate_tool(file_name, file_description=summary))
    # document-level tool
    tools.append(
        generate_tool(
            file_name, 
            file_description=summary,
            retrieve_document=True
        )
    )

In [6]:
# validate an existing function
tools[0].metadata

ToolMetadata(description='Synthesizes an answer to your question by feeding in relevant chunks of a document as context. Best used for questions that are more pointed in nature.\nDo NOT use if the question asks seems to require a general summary of any given document. Use the doc_query_engine instead for that purpose.\n\nDocument: msft_ddr.pdf\n\n\nFile Description: The Microsoft Digital Defense Report, published in October 2023, provides insights into the evolving cyber threat landscape from July 2022 to June 2023, highlighting key developments in cybercrime, nation-state threats, and the importance of collaboration in enhancing global cybersecurity resilience.', name='msft_ddr_retrieve', fn_schema=<class 'llama_index.core.tools.utils.msft_ddr_retrieve'>, return_direct=False)

## Build Workflow

Let's build a workflow that can iterate through the extracted keys/questions from the RFP, and fill them out! 

The user specifies an RFP document as input. The workflow then goes through the following steps:
1. We parse the RFP template using LlamaParse
2. We then extract out the relevant questions we'd want to ask the knowledge base given the instructions in the RFP
3. For each question, we query the knowledge base using a specialized agent to generate a response. The agent is equipped with a set of retrieval tools over the data.
4. We concatenate the questions/answers into a list of dictionaries.
5. Given the question/answer pairs, we feed them along with the source RFP template into a prompt to generate the final report.
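
The control flow of these steps can be sketched without any LlamaIndex dependencies. The snippet below is a simplified illustration using plain `asyncio`, not the workflow implementation itself. The functions `extract_questions`, `answer_question`, and `generate_report` are hypothetical stand-ins for the LLM and agent calls.

```python
import asyncio
import json


async def extract_questions(rfp_text: str) -> list[str]:
    """Stand-in for step 2: here, every non-empty line becomes a question."""
    return [line.strip() for line in rfp_text.splitlines() if line.strip()]


async def answer_question(question: str) -> dict:
    """Stand-in for step 3: a real agent would call retrieval tools here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"question": question, "answer": f"(answer to: {question})"}


def generate_report(rfp_text: str, qa_pairs: list[dict]) -> str:
    """Stand-in for step 5: a real implementation would prompt an LLM."""
    body = "\n".join(json.dumps(pair) for pair in qa_pairs)
    return f"# RFP Response\n\n{body}"


async def run_pipeline(rfp_text: str) -> str:
    questions = await extract_questions(rfp_text)
    # Fan out: answer all questions concurrently; gather keeps input order.
    qa_pairs = await asyncio.gather(*(answer_question(q) for q in questions))
    # Fan in: combine the question/answer pairs into one report.
    return generate_report(rfp_text, list(qa_pairs))


report = asyncio.run(run_pipeline("Describe security.\nDescribe pricing."))
print(report)
```

The workflow below follows the same fan-out/fan-in shape, but expresses it with events: one event is emitted per question, and a collecting step waits until every answer has arrived.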



We download the [JEDI RFP template](https://imlive.s3.amazonaws.com/Federal%20Government/ID151830346965529215587195222610265670631/HQ0034-18-R-0077.pdf).

In [35]:
# download JEDI Cloud RFP Template
!mkdir -p data
!wget "https://imlive.s3.amazonaws.com/Federal%20Government/ID151830346965529215587195222610265670631/HQ0034-18-R-0077.pdf" -O data/jedi_cloud_rfp.pdf

--2024-10-19 22:21:53--  https://imlive.s3.amazonaws.com/Federal%20Government/ID151830346965529215587195222610265670631/HQ0034-18-R-0077.pdf
Resolving imlive.s3.amazonaws.com (imlive.s3.amazonaws.com)... 52.219.193.121, 52.219.220.161, 52.219.220.73, ...
Connecting to imlive.s3.amazonaws.com (imlive.s3.amazonaws.com)|52.219.193.121|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 864798 (845K) [application/pdf]
Saving to: ‘data/jedi_cloud_rfp.pdf’


2024-10-19 22:21:53 (18.9 MB/s) - ‘data/jedi_cloud_rfp.pdf’ saved [864798/864798]



We set up LlamaParse (accurate/markdown mode) to parse the RFP.

In [15]:
from llama_parse import LlamaParse

# parse the RFP into markdown
parser = LlamaParse(result_type="markdown")

In [11]:
from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Context,
    Workflow,
    step,
)
from llama_index.core.llms import LLM
from typing import List, Optional
from pathlib import Path
from pydantic import BaseModel
from llama_index.core.schema import Document
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.prompts import PromptTemplate
from llama_index.core.llms import ChatMessage, MessageRole
import logging
import json
import os

_logger = logging.getLogger(__name__)
_logger.setLevel(logging.INFO)


# this is the research agent's system prompt, tasked with answering a specific question
AGENT_SYSTEM_PROMPT = """\
You are a research agent tasked with filling out a specific form key/question with the appropriate value, given a bank of context.
You are given a specific form key/question. Think step-by-step and use the existing set of tools to help answer the question.

You MUST always use at least one tool to answer each question. Only after you've determined that existing tools do not \
answer the question should you try to reason from first principles and prior knowledge to answer the question.

You MUST try to answer the question instead of only saying 'I dont know'.

"""

# This is the prompt tasked with extracting information from an RFP file.
EXTRACT_KEYS_PROMPT = """\
You are provided an entire RFP document, or a large subsection from it. 

We wish to generate a response to the RFP in a way that adheres to the instructions within the RFP, \
including the specific sections that an RFP response should contain, and the content that would need to go \
into each section.

Your task is to extract out a list of "questions", where each question corresponds to a specific section that is required in the RFP response.
Put another way, after we extract out the questions we will go through each question and answer each one \
with our downstream research assistant, and the combined
question:answer pairs will constitute the full RFP response.

You must TRY to extract out questions that can be answered by the provided knowledge base. We provide the list of file metadata below. 

Additional requirements:
- Try to make the questions SPECIFIC given your knowledge of the RFP and the knowledge base. Instead of asking a question like \
"How do we ensure security" ask a question that actually addresses a security requirement in the RFP and can be addressed by the knowledge base.
- Make sure the questions are comprehensive and address all the RFP requirements.
- Make sure each question is descriptive - this gives our downstream assistant context to fill out the value for that question 
- Extract out all the questions as a list of strings.


Knowledge Base Files:
{file_metadata}

RFP Full Template:
{rfp_text}

"""

# this is the prompt that generates the final RFP response given the original template text and question-answer pairs.
GENERATE_OUTPUT_PROMPT = """\
You are an expert analyst.
Your task is to generate an RFP response according to the given RFP and question/answer pairs.

You are given the following RFP and qa pairs:

<rfp_document>
{output_template}
</rfp_document>

<question_answer_pairs>
{answers}
</question_answer_pairs>

Not every question has an appropriate answer. This is because the agent tasked with answering the question did not have the right context to answer it.
If this is the case, you MUST come up with an answer that is reasonable. You CANNOT say that you are unsure in any area of the RFP response.


Please generate the output according to the template and the answers, in markdown format.
Directly output the generated markdown content, do not add any additional text, such as "```markdown" or "Here is the output:".
Follow the original format of the template as closely as possible, and fill in the answers into the appropriate sections.
"""


class OutputQuestions(BaseModel):
    """List of keys that make up the sections of the RFP response."""

    questions: List[str]


class OutputTemplateEvent(Event):
    docs: List[Document]


class QuestionsExtractedEvent(Event):
    questions: List[str]


class HandleQuestionEvent(Event):
    question: str


class QuestionAnsweredEvent(Event):
    question: str
    answer: str


class CollectedAnswersEvent(Event):
    combined_answers: str


class LogEvent(Event):
    msg: str
    delta: bool = False
    # clear_previous: bool = False


class RFPWorkflow(Workflow):
    """RFP workflow."""

    def __init__(
        self,
        tools,
        parser: LlamaParse,
        llm: LLM | None = None,
        similarity_top_k: int = 20,
        output_dir: str = data_out_dir,
        agent_system_prompt: str = AGENT_SYSTEM_PROMPT,
        generate_output_prompt: str = GENERATE_OUTPUT_PROMPT,
        extract_keys_prompt: str = EXTRACT_KEYS_PROMPT,
        **kwargs,
    ) -> None:
        """Init params."""
        super().__init__(**kwargs)
        self.tools = tools

        self.parser = parser

        self.llm = llm or OpenAI(model="gpt-4o-mini")
        self.similarity_top_k = similarity_top_k

        self.output_dir = output_dir

        self.agent_system_prompt = agent_system_prompt
        self.extract_keys_prompt = extract_keys_prompt

        # if not exists, create
        out_path = Path(self.output_dir) / "workflow_output"
        if not out_path.exists():
            out_path.mkdir(parents=True, exist_ok=True)
            os.chmod(str(out_path), 0o0777)

        self.generate_output_prompt = PromptTemplate(generate_output_prompt)

    @step
    async def parse_output_template(
        self, ctx: Context, ev: StartEvent
    ) -> OutputTemplateEvent:
        # load output template file
        out_template_path = Path(
            f"{self.output_dir}/workflow_output/output_template.jsonl"
        )
        if out_template_path.exists():
            with open(out_template_path, "r") as f:
                docs = [Document.model_validate_json(line) for line in f]
        else:
            docs = await self.parser.aload_data(ev.rfp_template_path)
            # save output template to file
            with open(out_template_path, "w") as f:
                for doc in docs:
                    f.write(doc.model_dump_json())
                    f.write("\n")

        await ctx.set("output_template", docs)
        return OutputTemplateEvent(docs=docs)

    @step
    async def extract_questions(
        self, ctx: Context, ev: OutputTemplateEvent
    ) -> HandleQuestionEvent:
        docs = ev.docs

        # load cached questions if present; otherwise extract them and save to file
        out_keys_path = Path(f"{self.output_dir}/workflow_output/all_keys.txt")
        if out_keys_path.exists():
            with open(out_keys_path, "r") as f:
                output_qs = [q.strip() for q in f.readlines()]
        else:
            # try stuffing all text into the prompt
            all_text = "\n\n".join([d.get_content(metadata_mode="all") for d in docs])
            prompt = PromptTemplate(template=self.extract_keys_prompt)

            file_metadata = "\n\n".join([f"Name:{t.metadata.name}\nDescription:{t.metadata.description}" for t in self.tools])
            try:
                if self._verbose:
                    ctx.write_event_to_stream(LogEvent(msg=">> Extracting questions from LLM"))
                
                output_qs = self.llm.structured_predict(
                    OutputQuestions, 
                    prompt, 
                    file_metadata=file_metadata,
                    rfp_text=all_text,
                ).questions

                if self._verbose:
                    qs_text = "\n".join([f"* {q}" for q in output_qs])
                    ctx.write_event_to_stream(LogEvent(msg=f">> Questions:\n{qs_text}"))
            
            except Exception as e:
                _logger.error(f"Error extracting questions from page: {all_text}")
                _logger.error(e)
                # re-raise: output_qs is undefined here, so we cannot continue
                raise

            with open(out_keys_path, "w") as f:
                f.write("\n".join(output_qs))

        await ctx.set("num_to_collect", len(output_qs))

        for question in output_qs:
            ctx.send_event(HandleQuestionEvent(question=question))

        return None

    @step
    async def handle_question(
        self, ctx: Context, ev: HandleQuestionEvent
    ) -> QuestionAnsweredEvent:
        question = ev.question

        # initialize a function-calling "research" agent that, given a task, pulls responses from relevant tools and synthesizes over them
        # (use the tools and LLM passed to the workflow rather than notebook globals)
        research_agent = FunctionCallingAgentWorker.from_tools(
            self.tools, llm=self.llm, verbose=False, system_prompt=self.agent_system_prompt
        ).as_agent()

        # a fresh agent is created for each question, so no memory carries over
        response = await research_agent.aquery(question)

        if self._verbose:
            # instead of printing the message directly, write the event to stream!
            msg = f">> Asked question: {question}\n>> Got response: {str(response)}"
            ctx.write_event_to_stream(LogEvent(msg=msg))

        return QuestionAnsweredEvent(question=question, answer=str(response))

    @step
    async def combine_answers(
        self, ctx: Context, ev: QuestionAnsweredEvent
    ) -> CollectedAnswersEvent:
        num_to_collect = await ctx.get("num_to_collect")
        results = ctx.collect_events(ev, [QuestionAnsweredEvent] * num_to_collect)
        if results is None:
            return None

        combined_answers = "\n".join([result.model_dump_json() for result in results])
        # save combined_answers to file
        with open(
            f"{self.output_dir}/workflow_output/combined_answers.jsonl", "w"
        ) as f:
            f.write(combined_answers)

        return CollectedAnswersEvent(combined_answers=combined_answers)

    @step
    async def generate_output(
        self, ctx: Context, ev: CollectedAnswersEvent
    ) -> StopEvent:
        output_template = await ctx.get("output_template")
        output_template = "\n".join(
            [doc.get_content("none") for doc in output_template]
        )

        if self._verbose:
            ctx.write_event_to_stream(LogEvent(msg=">> GENERATING FINAL OUTPUT"))

        resp = await self.llm.astream(
            self.generate_output_prompt,
            output_template=output_template,
            answers=ev.combined_answers,
        )

        final_output = ""
        async for r in resp:
            ctx.write_event_to_stream(LogEvent(msg=r, delta=True))
            final_output += r

        # save final_output to file
        with open(f"{self.output_dir}/workflow_output/final_output.md", "w") as f:
            f.write(final_output)

        return StopEvent(result=final_output)

In [12]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o")
workflow = RFPWorkflow(
    tools,
    parser=parser,
    llm=llm,
    verbose=True,
    timeout=None,  # disable the timeout so the long-running workflow can complete
)

#### Visualize the workflow

In [13]:
from llama_index.utils.workflow import draw_all_possible_flows

draw_all_possible_flows(RFPWorkflow, filename="rfp_workflow.html")

rfp_workflow.html


## Run the Workflow

Let's run the full workflow and generate the output! 

This will take 5-20 minutes to run and complete. You can inspect the intermediate verbose outputs below as the intermediate questions/answers are generated. The response is streamed back to the user at the end - the response itself is quite long so will take a while to complete! You can also integrate with an observability provider like LlamaTrace/Arize Phoenix in order to view the results.
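
The streaming loop in the next cell distinguishes whole log messages from incremental token deltas. The dependency-free sketch below shows that pattern in isolation. The `Log` dataclass and the `fake_stream` generator are hypothetical stand-ins for `LogEvent` and `handler.stream_events()`, and the tokens are made up.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Log:
    """Hypothetical stand-in for the notebook's LogEvent."""

    msg: str
    delta: bool = False


async def fake_stream():
    """Hypothetical event source: one full message, then token deltas."""
    yield Log(">> GENERATING FINAL OUTPUT")
    for token in ["# Response", " to", " RFP"]:
        yield Log(token, delta=True)


async def render() -> str:
    """Join events the way the print loop does: deltas get no newline."""
    out = []
    async for event in fake_stream():
        out.append(event.msg if event.delta else event.msg + "\n")
    return "".join(out)


rendered = asyncio.run(render())
print(rendered)
```

Full messages end with a newline, while deltas are appended directly so the streamed tokens read as continuous text.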

In [14]:
from IPython.display import clear_output

handler = workflow.run(rfp_template_path="data/jedi_cloud_rfp.pdf")
async for event in handler.stream_events():
    if isinstance(event, LogEvent):

        if event.delta:
            print(event.msg, end="")
        else:
            print(event.msg)

response = await handler
print(str(response))

Running step parse_output_template
Started parsing the file under job_id 7d9930d7-a038-4ee6-a22d-d3d585dfc8fd
Step parse_output_template produced event OutputTemplateEvent
Running step extract_questions
Step extract_questions produced no event
Running step handle_question
>> Extracting questions from LLM
>> Questions: * What is the proposed approach for achieving secure data transfer using a Transfer Cross Domain Solution consistent with the 2018 Raise the Bar Cross Domain Solution Design and Implementation Requirements?
* How will the proposed Transfer Cross Domain Solution address secure one-way data transfer between logical enclaves within JEDI Cloud, to external destinations, and across classification levels?
* What is the proposed logical isolation architecture and implementation for unclassified and classified offerings, specifically regarding encryption of data at rest and in transit?
* How does the proposed solution ensure logical separation with cryptographic certainty of proc