# Chat with PDF - test, evaluation and experimentation


This walkthrough shows how to use the prompt flow Python SDK to test, evaluate, and experiment with the "Chat with PDF" flow.

## 0. Install dependencies

In [None]:
%pip install -r requirements.txt

## 1. Create connections
A connection in prompt flow stores the settings your application needs to talk to an external service (Azure OpenAI, for example), such as its endpoint and API key.

In [2]:
import promptflow

pf = promptflow.PFClient()

# List all the available connections
for c in pf.connections.list():
    print(c.name + " (" + c.type + ")")



sweden-aoai (AzureOpenAI)
doc-intelligence-connection (Custom)
acs-connection (CognitiveSearch)
aoaisweden505 (AzureOpenAI)
open_ai_connection (AzureOpenAI)


You will need to have a connection named "open_ai_connection" to run the chat_with_pdf flow.

In [3]:
# create needed connection
from promptflow.entities import AzureOpenAIConnection, OpenAIConnection

try:
    conn_name = "open_ai_connection"
    conn = pf.connections.get(name=conn_name)
    print("using existing connection")
except Exception:  # e.g. the connection does not exist yet
    # Follow https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal to create an Azure OpenAI resource.
    connection = AzureOpenAIConnection(
        name=conn_name,
        api_key="<user-input>",
        api_base="<test_base>",
        api_type="azure",
        api_version="<test_version>",
    )

    # use this if you have an existing OpenAI account
    # connection = OpenAIConnection(
    #     name=conn_name,
    #     api_key="<user-input>",
    # )
    conn = pf.connections.create_or_update(connection)
    print("successfully created connection")

print(conn)

using existing connection
auth_mode: key
name: open_ai_connection
module: promptflow.connections
type: azure_open_ai
api_key: '******'
api_base: https://aoai-sweden-505.openai.azure.com/
api_type: azure
api_version: '2024-02-01'
resource_id: 
  /subscriptions/f804f2da-c27b-45ac-bf80-16d4d331776d/resourceGroups/re-aoai-505/providers/Microsoft.CognitiveServices/accounts/aoai-sweden-505
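
Rather than typing the key into the notebook in place of `<user-input>`, the connection fields can be collected from environment variables. The following is a minimal sketch under that assumption: the variable names `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_VERSION` are hypothetical choices for illustration and are not defined by this sample, and the helper `connection_settings_from_env` is not part of prompt flow.

```python
import os


def connection_settings_from_env(name, env=None):
    """Collect keyword arguments for AzureOpenAIConnection from environment variables.

    The variable names below are assumptions made for this sketch; prompt flow
    does not read them on its own.
    """
    env = os.environ if env is None else env
    required = {
        "api_key": "AZURE_OPENAI_API_KEY",
        "api_base": "AZURE_OPENAI_ENDPOINT",
        "api_version": "AZURE_OPENAI_API_VERSION",
    }
    # Fail early with a readable message instead of creating a broken connection.
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    settings = {field: env[var] for field, var in required.items()}
    settings.update(name=name, api_type="azure")
    return settings
```

The returned dictionary uses the same keyword names as the `AzureOpenAIConnection(...)` call in the cell above, so it could be unpacked as `AzureOpenAIConnection(**connection_settings_from_env("open_ai_connection"))`.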



## 2. Test the flow

**Note**: this sample uses predownloaded PDFs and a prebuilt FAISS index to speed up execution.
To start a fresh run, remove the `.pdfs` and `.index` folders named in the comments of the next cell.
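
If you prefer to clear the cache from Python, something like the following could work. It is a small sketch that assumes the folder layout given in the comments of the next cell (`./chat_with_pdf/.pdfs/` and `./chat_with_pdf/.index/`); the helper name `clear_cached_artifacts` is made up for this example.

```python
import shutil
from pathlib import Path


def clear_cached_artifacts(flow_dir="./chat_with_pdf", folders=(".pdfs", ".index")):
    """Delete the cached PDF and index folders so the next run rebuilds them.

    Returns the list of folders that were actually removed.
    """
    removed = []
    for folder in folders:
        path = Path(flow_dir) / folder
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(str(path))
    return removed
```

Calling it when the folders are already gone is harmless: it simply returns an empty list.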

In [4]:
# ./chat_with_pdf/.pdfs/ stores predownloaded PDFs
# ./chat_with_pdf/.index/ stores prebuilt index files

output = pf.flows.test(
    ".",
    inputs={
        "chat_history": [],
        "pdf_url": "https://arxiv.org/pdf/1810.04805.pdf",
        "question": "what is BERT?",
    },
)
print(output)

Prompt flow service has started...
2025-02-27 08:51:39 +0000   73187 execution.flow     INFO     Start executing nodes in thread pool mode.
2025-02-27 08:51:39 +0000   73187 execution.flow     INFO     Start to run 6 nodes with concurrency level 16.
2025-02-27 08:51:39 +0000   73187 execution.flow     INFO     Executing node setup_env. node run id: bc945476-ea24-4898-ab0f-28ca6922ff98_setup_env_0
2025-02-27 08:51:39 +0000   73187 execution.flow     INFO     Node setup_env completes.
2025-02-27 08:51:39 +0000   73187 execution.flow     INFO     Executing node download_tool. node run id: bc945476-ea24-4898-ab0f-28ca6922ff98_download_tool_0
2025-02-27 08:51:39 +0000   73187 execution.flow     INFO     Executing node rewrite_question_tool. node run id: bc945476-ea24-4898-ab0f-28ca6922ff98_rewrite_question_tool_0
2025-02-27 08:51:39 +0000   73187 execution.flow     INFO     [download_tool in line 0 (index starts from 0)] stdout> Pdf already exists in /home/azureuser/promptflow-demo/chat-wit

You can view the trace detail from the following URL:
http://127.0.0.1:23333/v1.0/ui/traces/?#collection=flow&uiTraceId=0x4ca7d7fb900f1409a3d18e5865c3bed8

## 3. Run the flow with a data file

In [5]:
flow_path = "."
data_path = "./data/bert-paper-qna-3-line.jsonl"

config_2k_context = {
    "EMBEDDING_MODEL_DEPLOYMENT_NAME": "text-embedding-ada-002",
    "CHAT_MODEL_DEPLOYMENT_NAME": "gpt-4",  # change this to the name of your deployment if you're using Azure OpenAI
    "PROMPT_TOKEN_LIMIT": 2000,
    "MAX_COMPLETION_TOKENS": 256,
    "VERBOSE": True,
    "CHUNK_SIZE": 1024,
    "CHUNK_OVERLAP": 64,
}

column_mapping = {
    "question": "${data.question}",
    "pdf_url": "${data.pdf_url}",
    "chat_history": "${data.chat_history}",
    "config": config_2k_context,
}
run_2k_context = pf.run(flow=flow_path, data=data_path, column_mapping=column_mapping)
pf.stream(run_2k_context)

print(run_2k_context)

[2025-02-27 08:51:50 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run flow_variant_0_20250227_085150_740749, log path: /home/azureuser/.promptflow/.runs/flow_variant_0_20250227_085150_740749/logs.txt


Prompt flow service has started...
You can view the traces in local from http://127.0.0.1:23333/v1.0/ui/traces/?#run=flow_variant_0_20250227_085150_740749
2025-02-27 08:52:11 +0000   76430 execution.bulk     INFO     Process 76506 terminated.
2025-02-27 08:51:51 +0000   73187 execution.bulk     INFO     Current thread is not main thread, skip signal handler registration in BatchEngine.
2025-02-27 08:51:53 +0000   73187 execution.bulk     INFO     Set process count to 3 by taking the minimum value among the factors of {'default_worker_count': 4, 'row_count': 3}.
2025-02-27 08:51:55 +0000   73187 execution.bulk     INFO     Process name(ForkProcess-2:1)-Process id(76498)-Line number(0) start execution.
2025-02-27 08:51:55 +0000   73187 execution.bulk     INFO     Process name(ForkProcess-2:2)-Process id(76506)-Line number(1) start execution.
2025-02-27 08:51:55 +0000   73187 execution.bulk     INFO     Process name(ForkProcess-2:3)-Process id(76513)-Line number(2) start execution.
2025-0
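
To make the `column_mapping` above more concrete, here is a simplified illustration of how a `${data.<column>}` reference can be turned into a flow input for one row of the data file. This is not prompt flow's actual implementation, only a sketch of the idea: string values of the form `${data.x}` are looked up in the row, and everything else (such as the `config` dictionary) is passed through unchanged. The function name `resolve_column_mapping` is hypothetical.

```python
import re

# Matches strings that consist solely of a "${data.<column>}" reference.
_REFERENCE = re.compile(r"^\$\{data\.(\w+)\}$")


def resolve_column_mapping(column_mapping, row):
    """Build the inputs for one flow execution from a single data row."""
    inputs = {}
    for input_name, value in column_mapping.items():
        match = _REFERENCE.match(value) if isinstance(value, str) else None
        # References are replaced by the row's value; constants are kept as-is.
        inputs[input_name] = row[match.group(1)] if match else value
    return inputs


row = {
    "question": "what is BERT?",
    "pdf_url": "https://arxiv.org/pdf/1810.04805.pdf",
    "chat_history": [],
}
mapping = {
    "question": "${data.question}",
    "pdf_url": "${data.pdf_url}",
    "chat_history": "${data.chat_history}",
    "config": {"PROMPT_TOKEN_LIMIT": 2000},
}
print(resolve_column_mapping(mapping, row))
```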

In [6]:
pf.get_details(run_2k_context)

| | inputs.question | inputs.pdf_url | inputs.chat_history | inputs.config | inputs.line_number | outputs.answer | outputs.context |
|---|---|---|---|---|---|---|---|
| 0 | What is the main difference between BERT and p... | https://arxiv.org/pdf/1810.04805.pdf | [] | {'EMBEDDING_MODEL_DEPLOYMENT_NAME': 'text-embe... | 0 | Las diferencias clave entre BERT y los modelos... | [.\n• We show that pre-trained representations... |
| 1 | What is the size of the vocabulary used by BERT? | https://arxiv.org/pdf/1810.04805.pdf | [] | {'EMBEDDING_MODEL_DEPLOYMENT_NAME': 'text-embe... | 1 | Based on the provided context, there is no inf... | [E (L=12, H=768, A=12, Total Param-\neters=110... |
| 2 | 论文写作中论文引言有什么注意事项？ | https://grs.pku.edu.cn/docs/2018-03/2018030108... | [] | {'EMBEDDING_MODEL_DEPLOYMENT_NAME': 'text-embe... | 2 | 撰写学术论文的引言部分时，应注意以下要点：\n\n1. 引言内容应简明扼要，切合主题，引出论... | [须言之成理，论据可靠 ，严格遵循本学科国际通行的学术规范。 内容包括：第 一\n章引言（或... |


## 4. Evaluate the "groundedness"
The `eval-groundedness` flow uses a ChatGPT/GPT-4 model to grade how well each answer generated by the chat-with-pdf flow is grounded in the context retrieved for it.

In [8]:
eval_groundedness_flow_path = "../../evaluation/eval-groundedness/"
eval_groundedness_2k_context = pf.run(
    flow=eval_groundedness_flow_path,
    run=run_2k_context,
    column_mapping={
        "question": "${run.inputs.question}",
        "answer": "${run.outputs.answer}",
        "context": "${run.outputs.context}",
    },
    display_name="eval_groundedness_2k_context",
)
pf.stream(eval_groundedness_2k_context)

print(eval_groundedness_2k_context)

[2025-02-27 08:54:35 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run eval_groundedness_variant_0_20250227_085435_742659, log path: /home/azureuser/.promptflow/.runs/eval_groundedness_variant_0_20250227_085435_742659/logs.txt


Prompt flow service has started...
You can view the traces in local from http://127.0.0.1:23333/v1.0/ui/traces/?#run=eval_groundedness_variant_0_20250227_085435_742659
2025-02-27 08:54:41 +0000   78545 execution.bulk     INFO     Process 78597 terminated.
2025-02-27 08:54:37 +0000   73187 execution.bulk     INFO     Current thread is not main thread, skip signal handler registration in BatchEngine.
2025-02-27 08:54:37 +0000   73187 execution.bulk     INFO     Set process count to 3 by taking the minimum value among the factors of {'default_worker_count': 4, 'row_count': 3}.
2025-02-27 08:54:39 +0000   73187 execution.bulk     INFO     Process name(ForkProcess-4:1)-Process id(78591)-Line number(0) start execution.
2025-02-27 08:54:39 +0000   73187 execution.bulk     INFO     Process name(ForkProcess-4:2)-Process id(78597)-Line number(1) start execution.
2025-02-27 08:54:39 +0000   73187 execution.bulk     INFO     Process name(ForkProcess-4:3)-Process id(78605)-Line number(2) start exec

In [9]:
pf.get_details(eval_groundedness_2k_context)

| | inputs.question | inputs.answer | inputs.context | inputs.line_number | outputs.groundedness |
|---|---|---|---|---|---|
| 0 | What is the main difference between BERT and p... | Las diferencias clave entre BERT y los modelos... | ['.\n• We show that pre-trained representation... | 0 | 9 |
| 1 | What is the size of the vocabulary used by BERT? | Based on the provided context, there is no inf... | ['E (L=12, H=768, A=12, Total Param-\neters=11... | 1 | 1 |
| 2 | 论文写作中论文引言有什么注意事项？ | 撰写学术论文的引言部分时，应注意以下要点：\n\n1. 引言内容应简明扼要，切合主题，引出论... | ['须言之成理，论据可靠 ，严格遵循本学科国际通行的学术规范。 内容包括：第 一\n章引言（... | 2 | 7 |


In [10]:
pf.get_metrics(eval_groundedness_2k_context)

{'groundedness': 5.666666666666667}
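
The reported value is consistent with a plain average of the three per-row scores in the details table, (9 + 1 + 7) / 3. The sketch below assumes that this is how the metric is aggregated here, and additionally lists the rows that fall below a threshold so they can be inspected first. The threshold of 5 and the helper name `summarize` are arbitrary choices for illustration.

```python
from statistics import mean

# Per-row groundedness scores copied from the details table above,
# keyed by inputs.line_number.
scores = {0: 9, 1: 1, 2: 7}


def summarize(scores, threshold=5):
    """Return the average score and the line numbers scoring below the threshold."""
    average = mean(list(scores.values()))
    low = sorted(line for line, score in scores.items() if score < threshold)
    return average, low


average, low_lines = summarize(scores)
print(average, low_lines)
```

With only three rows, a single low score (line 1 here) pulls the average down considerably, which is worth keeping in mind when reading the aggregate metric.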

In [11]:
pf.visualize(eval_groundedness_2k_context)



The HTML file is generated at '/tmp/pf-visualize-detail-u89sujog.html'.
Trying to view the result in a web browser...
Successfully visualized from the web browser.


You will see a web page like the one below. It shows how each row was graded and how the evaluation run was executed:
![pf-visualize-screenshot](./media/chat-with-pdf/pf-visualize-screenshot.png)

## 5. Try a different configuration and evaluate again - experimentation

NOTE: this example uses only 3 lines of test data, and LLM output is non-deterministic. Don't be surprised if both configurations produce exactly the same metrics, or if your numbers differ from the ones shown here.

In [12]:
config_3k_context = {
    "EMBEDDING_MODEL_DEPLOYMENT_NAME": "text-embedding-ada-002",
    "CHAT_MODEL_DEPLOYMENT_NAME": "gpt-4",  # change this to the name of your deployment if you're using Azure OpenAI
    "PROMPT_TOKEN_LIMIT": 3000,
    "MAX_COMPLETION_TOKENS": 256,
    "VERBOSE": True,
    "CHUNK_SIZE": 1024,
    "CHUNK_OVERLAP": 64,
}

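# Point the mapping at the new config; otherwise this run would reuse config_2k_context
column_mapping["config"] = config_3k_context
run_3k_context = pf.run(flow=flow_path, data=data_path, column_mapping=column_mapping)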
pf.stream(run_3k_context)

print(run_3k_context)

[2025-02-27 08:56:10 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run flow_variant_0_20250227_085610_947108, log path: /home/azureuser/.promptflow/.runs/flow_variant_0_20250227_085610_947108/logs.txt


Prompt flow service has started...
You can view the traces in local from http://127.0.0.1:23333/v1.0/ui/traces/?#run=flow_variant_0_20250227_085610_947108
2025-02-27 08:56:27 +0000   79837 execution.bulk     INFO     Process 79895 terminated.
2025-02-27 08:56:12 +0000   73187 execution.bulk     INFO     Current thread is not main thread, skip signal handler registration in BatchEngine.
2025-02-27 08:56:12 +0000   73187 execution.bulk     INFO     Set process count to 3 by taking the minimum value among the factors of {'default_worker_count': 4, 'row_count': 3}.
2025-02-27 08:56:14 +0000   73187 execution.bulk     INFO     Process name(ForkProcess-8:1)-Process id(79880)-Line number(0) start execution.
2025-02-27 08:56:14 +0000   73187 execution.bulk     INFO     Process name(ForkProcess-8:3)-Process id(79895)-Line number(1) start execution.
2025-02-27 08:56:14 +0000   73187 execution.bulk     INFO     Process name(ForkProcess-8:2)-Process id(79887)-Line number(2) start execution.
2025-0
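
When comparing more than two configurations, it can help to derive each variant from a shared base instead of editing one mapping in place. The following is a sketch of that idea, not something this sample ships: `make_variant` is a hypothetical helper, and the base config is shortened to two of the keys used above.

```python
def make_variant(base_mapping, **config_overrides):
    """Return a new column mapping whose "config" has the given overrides applied.

    The base mapping and its config dictionary are left untouched.
    """
    variant = dict(base_mapping)
    variant["config"] = {**base_mapping["config"], **config_overrides}
    return variant


base_mapping = {
    "question": "${data.question}",
    "pdf_url": "${data.pdf_url}",
    "chat_history": "${data.chat_history}",
    "config": {"PROMPT_TOKEN_LIMIT": 2000, "MAX_COMPLETION_TOKENS": 256},
}

# One mapping per prompt token limit, keyed by a readable label.
variants = {
    f"{limit // 1000}k_context": make_variant(base_mapping, PROMPT_TOKEN_LIMIT=limit)
    for limit in (2000, 3000)
}
print(sorted(variants))
```

Each entry could then be passed as `column_mapping` to its own `pf.run(...)` call, so every run records exactly the configuration it was given.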

In [13]:
eval_groundedness_3k_context = pf.run(
    flow=eval_groundedness_flow_path,
    run=run_3k_context,
    column_mapping={
        "question": "${run.inputs.question}",
        "answer": "${run.outputs.answer}",
        "context": "${run.outputs.context}",
    },
    display_name="eval_groundedness_3k_context",
)
pf.stream(eval_groundedness_3k_context)

print(eval_groundedness_3k_context)

[2025-02-27 08:56:29 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run eval_groundedness_variant_0_20250227_085629_061524, log path: /home/azureuser/.promptflow/.runs/eval_groundedness_variant_0_20250227_085629_061524/logs.txt


Prompt flow service has started...
You can view the traces in local from http://127.0.0.1:23333/v1.0/ui/traces/?#run=eval_groundedness_variant_0_20250227_085629_061524
2025-02-27 08:56:34 +0000   80186 execution.bulk     INFO     Process 80237 terminated.
2025-02-27 08:56:30 +0000   73187 execution.bulk     INFO     Current thread is not main thread, skip signal handler registration in BatchEngine.
2025-02-27 08:56:30 +0000   73187 execution.bulk     INFO     Set process count to 3 by taking the minimum value among the factors of {'default_worker_count': 4, 'row_count': 3}.
2025-02-27 08:56:32 +0000   73187 execution.bulk     INFO     Process name(ForkProcess-10:1)-Process id(80230)-Line number(0) start execution.
2025-02-27 08:56:32 +0000   73187 execution.bulk     INFO     Process name(ForkProcess-10:3)-Process id(80245)-Line number(1) start execution.
2025-02-27 08:56:32 +0000   73187 execution.bulk     INFO     Process name(ForkProcess-10:2)-Process id(80237)-Line number(2) start e

In [14]:
pf.get_details(eval_groundedness_3k_context)

| | inputs.question | inputs.answer | inputs.context | inputs.line_number | outputs.groundedness |
|---|---|---|---|---|---|
| 0 | What is the main difference between BERT and p... | BERT differs from earlier language representat... | ['BERT: Pre-training of Deep Bidirectional Tra... | 0 | 9 |
| 1 | What is the size of the vocabulary used by BERT? | The provided contexts do not specify the exact... | ['E (L=12, H=768, A=12, Total Param-\neters=11... | 1 | 1 |
| 2 | 论文写作中论文引言有什么注意事项？ | 撰写学术论文时，引言部分应包括第一章引言（或绪论、序言、导论等），内容要有逻辑性，并且严格遵... | ['须言之成理，论据可靠 ，严格遵循本学科国际通行的学术规范。 内容包括：第 一\n章引言（... | 2 | 7 |


In [16]:
pf.get_metrics(eval_groundedness_3k_context)

{'groundedness': 5.666666666666667}

In [15]:
pf.visualize([eval_groundedness_2k_context, eval_groundedness_3k_context])

The HTML file is generated at '/tmp/pf-visualize-detail-g_y063kw.html'.
Trying to view the result in a web browser...
Successfully visualized from the web browser.
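
Besides the visual comparison, the two metric dictionaries can be compared directly. Below is a minimal sketch using the values printed by `pf.get_metrics` above; `compare_metrics` is a hypothetical helper written for this example.

```python
def compare_metrics(baseline, candidate):
    """Return candidate minus baseline for every metric present in both runs."""
    return {
        name: candidate[name] - baseline[name]
        for name in baseline
        if name in candidate
    }


# Values copied from the pf.get_metrics outputs of the two evaluation runs.
metrics_2k = {"groundedness": 5.666666666666667}
metrics_3k = {"groundedness": 5.666666666666667}
print(compare_metrics(metrics_2k, metrics_3k))
```

A difference of zero on three rows says little about which configuration is better; a larger data file would be needed to draw a conclusion.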
