<a href="https://colab.research.google.com/github/kynthesis/HaystackResearch/blob/main/7_Agent_QA_Pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **How to Build a Multi-Hop Reasoning QA Pipeline with an Agent**



# 1. Check the GPU runtime

In [1]:
%%bash

nvidia-smi

Sun Jul  2 12:13:04 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   31C    P0    43W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# 2. Install Haystack

In [None]:
%%bash

pip install --upgrade pip
pip install "farm-haystack[colab,inference]"
pip install datasets

# 3. Enable logging

In [3]:
import logging

logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.WARNING)
logging.getLogger("haystack").setLevel(logging.INFO)
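The configuration above keeps the root logger at `WARNING` while letting everything under the `haystack` namespace through at `INFO`. The following is a minimal standard-library sketch of how that level inheritance behaves. The logger name `some_other_library` is hypothetical and only stands in for any non-Haystack package.

```python
import logging

# Root logger: only WARNING and above pass.
logging.getLogger().setLevel(logging.WARNING)
# The "haystack" logger and all of its children: INFO and above pass.
logging.getLogger("haystack").setLevel(logging.INFO)

# A child logger has no level of its own, so it inherits INFO from "haystack".
haystack_child = logging.getLogger("haystack.nodes")
# Hypothetical third-party logger: it inherits WARNING from the root logger.
other = logging.getLogger("some_other_library")

haystack_info_enabled = haystack_child.isEnabledFor(logging.INFO)
other_info_enabled = other.isEnabledFor(logging.INFO)
print(haystack_info_enabled, other_info_enabled)
```

This is why the cell outputs in this notebook show `INFO` lines from `haystack.*` loggers only.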

# 3. Initialize the DocumentStore

In [4]:
from haystack.document_stores import InMemoryDocumentStore

document_store = InMemoryDocumentStore(use_bm25=True)

INFO:haystack.telemetry:Haystack sends anonymous usage data to understand the actual usage and steer dev efforts towards features that are most meaningful to users. You can opt-out at anytime by manually setting the environment variable HAYSTACK_TELEMETRY_ENABLED as described for different operating systems in the [documentation page](https://docs.haystack.deepset.ai/docs/telemetry#how-can-i-opt-out). More information at [Telemetry](https://docs.haystack.deepset.ai/docs/telemetry).
INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1


# 4. Prepare the document files

In [5]:
from haystack.utils import fetch_archive_from_http

doc_dir = "data/witcher"

fetch_archive_from_http(
    url="https://github.com/kynthesis/HaystackResearch/raw/main/witcher.zip",
    output_dir=doc_dir,
)

INFO:haystack.utils.import_utils:Fetching from https://github.com/kynthesis/HaystackResearch/raw/main/witcher.zip to 'data/witcher'


True

# 5. Index the document files into the DocumentStore

In [None]:
import os
from haystack.pipelines.standard_pipelines import TextIndexingPipeline

files_to_index = [os.path.join(doc_dir, f) for f in os.listdir(doc_dir)]
indexing_pipeline = TextIndexingPipeline(document_store)
indexing_pipeline.run_batch(file_paths=files_to_index)
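`os.listdir` returns every entry in the folder, including subdirectories and stray non-text files. If that becomes a problem, one possible approach is to filter the listing before indexing. The sketch below assumes the archive contains `.txt` files, which the notebook does not state. The helper name `list_text_files` is hypothetical and not part of Haystack.

```python
import tempfile
from pathlib import Path
from typing import List


def list_text_files(directory: str, suffix: str = ".txt") -> List[str]:
    """Return the sorted paths of regular files in `directory` that end with `suffix`."""
    folder = Path(directory)
    return sorted(str(p) for p in folder.iterdir() if p.is_file() and p.suffix == suffix)


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "b.txt").write_text("second")
    (Path(tmp) / "a.txt").write_text("first")
    (Path(tmp) / "notes.md").write_text("skipped: wrong suffix")
    (Path(tmp) / "nested").mkdir()  # skipped: not a regular file
    demo_names = [Path(p).name for p in list_text_files(tmp)]

print(demo_names)
```

The returned list could then be passed as `file_paths` in place of `files_to_index`.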

# 6. Create a QA pipeline with a Retriever and a Reader

In [23]:
from haystack.nodes import EmbeddingRetriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

retriever = EmbeddingRetriever(
    document_store=document_store, embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1", use_gpu=True
)
document_store.update_embeddings(retriever=retriever)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
witcher_qa = ExtractiveQAPipeline(reader=reader, retriever=retriever)

INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
INFO:haystack.nodes.retriever.dense:Init retriever using embeddings of model sentence-transformers/multi-qa-mpnet-base-dot-v1
INFO:haystack.document_stores.memory:Updating embeddings for 0 docs ...


Updating Embedding:   0%|          | 0/4307 [00:00<?, ? docs/s]

Batches:   0%|          | 0/135 [00:00<?, ?it/s]

INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
INFO:haystack.modeling.model.language_model: * LOADING MODEL: 'deepset/roberta-base-squad2' (Roberta)
INFO:haystack.modeling.model.language_model:Auto-detected model language: english
INFO:haystack.modeling.model.language_model:Loaded 'deepset/roberta-base-squad2' (Roberta model) from model hub.
INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
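The retriever embeds the query and every document as vectors and returns the documents whose vectors score highest against the query. As its name suggests, the `multi-qa-mpnet-base-dot-v1` model is intended for dot-product scoring. The following is a toy sketch of that ranking step in plain Python, not Haystack's implementation. The three-dimensional vectors and the names `dot` and `top_k` are made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the k highest scores."""
    scored = [(doc_id, dot(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy embeddings; real sentence embeddings have hundreds of dimensions.
toy_docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.5, 0.5, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
ranking = top_k([2.0, 1.0, 0.0], toy_docs, k=2)
print(ranking)
```

The reader then extracts answer spans only from the documents that survive this ranking, so a poor retrieval step limits what the reader can find.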


# 7. Test the QA pipeline

In [35]:
from haystack.utils import print_answers

result = witcher_qa.run("What is the hair color of Geralt's adoptive daughter?")

print_answers(result, "minimum")

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

"Query: What is the hair color of Geralt's adoptive daughter?"
'Answers:'
[   {   'answer': 'celadon-green',
        'context': ' raised, the young mermaid stretched charmingly, shaking '
                   'out her wet celadon-green hair, and began to sing '
                   'melodiously.\n'
                   '"What?"The duke leaned over th'},
    {   'answer': 'honey',
        'context': 'rtly, her lips thinning.\n'
                   'Geralt shrugged his shoulders and turned to the '
                   'honey-haired one. She seemed to him to be the youngest of '
                   'the three, but he c'},
    {   'answer': 'straw',
        'context': 'eyes the colour of the sea.\n'
                   '\n'
                   'She was almost as tall as him. She wore her straw-coloured '
                   'hair unevenly cut, just below the\n'
                   'ears. She stood with one han'},
    {   'answer': 'Nut-brown',
        'context': ' deep breath, bowed her head, and stare

# 8. Use the OpenAI API

In [14]:
from getpass import getpass

api_key_prompt = "Enter OpenAI API key:"
api_key = getpass(api_key_prompt)

Enter OpenAI API key:··········
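When the notebook is re-run often, typing the key each time gets tedious. A possible variation is to read it from an environment variable first and fall back to the prompt only when the variable is missing. The variable name `OPENAI_API_KEY` and the helper `load_api_key` are assumptions for this sketch and do not appear in the original notebook.

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def load_api_key(
    env: Mapping[str, str] = os.environ,
    ask: Callable[[str], str] = getpass,
    var_name: str = "OPENAI_API_KEY",  # assumed variable name
) -> str:
    """Return the key from the environment if set, otherwise prompt for it."""
    key = env.get(var_name, "").strip()
    if not key:
        key = ask("Enter OpenAI API key:").strip()
    if not key:
        raise ValueError("No API key provided")
    return key
```

In the notebook this would be used as `api_key = load_api_key()`. The `env` and `ask` parameters exist only so the function can be exercised without a real environment or terminal.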


# 9. Initialize the Agent

In [15]:
from haystack.agents import Agent
from haystack.nodes import PromptNode

prompt_node = PromptNode(model_name_or_path="text-davinci-003", api_key=api_key, stop_words=["Observation:"])
agent = Agent(prompt_node=prompt_node)
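The `stop_words=["Observation:"]` argument matters because the observation has to come from a real tool call, not from the language model. Without a stop sequence the model may keep writing and invent an observation itself. The sketch below illustrates the effect locally with plain string handling. It is only an illustration: hosted APIs typically apply stop sequences on the server side, and `truncate_at_stop_words` is a hypothetical helper.

```python
from typing import Sequence


def truncate_at_stop_words(text: str, stop_words: Sequence[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop word."""
    cut = len(text)
    for word in stop_words:
        index = text.find(word)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# What a model might produce if it were allowed to keep generating.
raw = "Tool: Witcher_QA\nTool Input: Who is Ciri?\nObservation: (made up by the model)"
trimmed = truncate_at_stop_words(raw, ["Observation:"])
print(repr(trimmed))
```

Everything from `Observation:` onward is discarded, leaving only the tool name and tool input for the agent to act on.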



# 10. Use the pipeline as a tool for the Agent

In [25]:
from haystack.agents import Tool

search_tool = Tool(
    name="Witcher_QA",
    pipeline_or_node=witcher_qa,
    description="useful for when you need to answer questions related to the Witcher.",
    output_variable="answers",
)
agent.add_tool(search_tool)



# 11. Test the QA pipeline using the Agent

In [36]:
result = agent.run("What is the hair color of Geralt's adoptive daughter?")

print(result["transcript"].split("---")[0])


Agent zero-shot-react started with {'query': "What is the hair color of Geralt's adoptive daughter?", 'params': None}
find out who Geralt's adoptive daughter is.
Tool: Witcher_QA
Tool Input: Who is Geralt's adoptive daughter?

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

Observation: Ciri
Thought: Now that I know who she is, let's try to find out her hair color.
Tool: Witcher_QA
Tool Input: What is Ciri's hair color?

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

Observation: ashen
Thought: Yes, that is the correct answer.
Final Answer: ashen

find out who Geralt's adoptive daughter is.
Tool: Witcher_QA 
Tool Input: Who is Geralt's adoptive daughter?

Observation: Ciri
Thought: Now that I know who she is, let's try to find out her hair color. 
Tool: Witcher_QA 
Tool Input: What is Ciri's hair color?


Observation: ashen
Thought: Yes, that is the correct answer. 
Final Answer: ashen
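The transcript above follows a Thought / Tool / Tool Input / Observation cycle: the model proposes a tool call, the agent runs the tool, appends the result as an observation, and asks the model to continue until it emits a final answer. The following is a simplified sketch of such a loop, not Haystack's actual `Agent` implementation. The language model and the QA tool are replaced by scripted stand-ins (`fake_llm`, `fake_witcher_qa`) that replay the steps from the transcript, and all function names are hypothetical.

```python
import re
from typing import Callable, Dict, List

# The two things the model can emit after a Thought.
TOOL_PATTERN = re.compile(r"Tool:\s*(\w+)\s*\nTool Input:\s*(.*)")
FINAL_PATTERN = re.compile(r"Final Answer:\s*(.*)")


def run_react_loop(
    query: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate between model completions and tool calls until a final answer appears."""
    transcript = f"Question: {query}\nThought:"
    for _ in range(max_steps):
        # Generation is assumed to stop before the model writes "Observation:".
        completion = llm(transcript)
        transcript += completion
        final = FINAL_PATTERN.search(completion)
        if final:
            return final.group(1).strip()
        call = TOOL_PATTERN.search(completion)
        if call is None:
            raise ValueError("Completion contained neither a tool call nor a final answer")
        tool_name, tool_input = call.group(1), call.group(2).strip()
        observation = tools[tool_name](tool_input)
        # The loop, not the model, writes the observation into the transcript.
        transcript += f"\nObservation: {observation}\nThought:"
    raise RuntimeError("No final answer within max_steps")


# Scripted stand-in for the language model: replays the completions from the transcript.
scripted = iter([
    " find out who Geralt's adoptive daughter is.\n"
    "Tool: Witcher_QA\nTool Input: Who is Geralt's adoptive daughter?",
    " Now that I know who she is, let's try to find out her hair color.\n"
    "Tool: Witcher_QA\nTool Input: What is Ciri's hair color?",
    " Yes, that is the correct answer.\nFinal Answer: ashen",
])


def fake_llm(prompt: str) -> str:
    return next(scripted)


# Stand-in for the extractive QA pipeline: a lookup table that records each call.
tool_calls: List[str] = []
facts = {
    "Who is Geralt's adoptive daughter?": "Ciri",
    "What is Ciri's hair color?": "ashen",
}


def fake_witcher_qa(question: str) -> str:
    tool_calls.append(question)
    return facts[question]


answer = run_react_loop(
    "What is the hair color of Geralt's adoptive daughter?",
    fake_llm,
    {"Witcher_QA": fake_witcher_qa},
)
print(answer, tool_calls)
```

Splitting the question into two single-hop lookups is what lets the agent succeed where the plain extractive pipeline in section 7 returned unrelated hair colors.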
