## Load the Haystack pipeline from Amazon S3 and run inferences - Part 2
---

In this notebook, we load the Haystack pipeline that is stored in Amazon S3 as a `yml` file. Once the pipeline is loaded, we run a series of questions against it and measure metrics such as latency and answer accuracy.

In [None]:
# import libraries
import os
import json
import boto3
import pickle
import logging
import tempfile
import pandas as pd
import globals as g
from typing import List

In [None]:
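# Haystack 2.x is published on PyPI as haystack-ai; chroma-haystack provides ChromaDocumentStore
!pip install -Uq haystack-ai chroma-haystack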

In [None]:
# Set a logger
logging.basicConfig(format='[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)

Now, we load the pipeline from the S3 bucket where it was saved as a `yml` file and inspect its contents. We then run a series of inference requests against the pipeline and measure latency and accuracy metrics.
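
The download step can also be wrapped in a small helper, sketched below. `download_to_temp` is a hypothetical name that is not part of boto3 or Haystack; it assumes only that the client exposes a `download_fileobj(bucket, key, fileobj)` method, as the boto3 S3 client does. The temporary file is closed before its path is returned, so the contents are flushed to disk before anything reads them.

```python
import tempfile


def download_to_temp(s3_client, bucket: str, key: str, suffix: str = "") -> str:
    """Download s3://bucket/key into a named temporary file and return its path.

    Hypothetical helper: `s3_client` is any object with a boto3-style
    `download_fileobj(bucket, key, fileobj)` method.
    """
    with tempfile.NamedTemporaryFile(mode="wb", suffix=suffix, delete=False) as tmp_file:
        s3_client.download_fileobj(bucket, key, tmp_file)
        path = tmp_file.name
    # The file is closed (and flushed) here; the caller deletes it when done.
    return path
```

With the client used in this notebook, the call might look like `download_to_temp(s3_client, g.HAYSTACK_PIPELINE_BUCKET, g.HAYSTACK_PIPELINE_KEY, suffix=".yml")`.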

In [None]:
# Initialize the S3 client
s3_client = boto3.client('s3')

# Create temporary file and download pipeline
with tempfile.NamedTemporaryFile(mode='wb', suffix='.yml', delete=False) as tmp_file:
    # Download from the same location where we uploaded
    s3_client.download_fileobj(
        g.HAYSTACK_PIPELINE_BUCKET, 
        g.HAYSTACK_PIPELINE_KEY, 
        tmp_file
    )
    logger.info(f"Downloaded the haystack pipeline from {g.HAYSTACK_PIPELINE_BUCKET}/{g.HAYSTACK_PIPELINE_KEY} to {tmp_file.name}")
    tmp_file_path = tmp_file.name

In [None]:
# Download the serialized documents (JSON) used to rebuild the document store
doc_store_key = "pipelines/document_store/documents.json"
with tempfile.NamedTemporaryFile(mode='wb', suffix='.json', delete=False) as tmp_file:
    s3_client.download_fileobj(
        g.HAYSTACK_PIPELINE_BUCKET, 
        doc_store_key, 
        tmp_file
    )
    print(f"Downloaded the document store from {g.HAYSTACK_PIPELINE_BUCKET}/{doc_store_key} to {tmp_file.name}")
    docstore_tmp_path = tmp_file.name

In [None]:
from haystack import Pipeline
# Now we will load the pipeline from the temporary file path
with open(tmp_file_path, 'r') as file:
    loaded_pipeline = Pipeline.load(file)
    print(f"Loaded the haystack pipeline from {tmp_file_path}")

In [None]:
# Now we will clean up the temporary file path and then see the contents of the pipeline
# that we had saved to s3
os.remove(tmp_file_path)
logger.info("Loaded Pipeline Structure:")
loaded_pipeline.show()

In [None]:
# Load the documents stored earlier and write them to a new document store
import numpy as np
from haystack import Document
from haystack_integrations.document_stores.chroma import ChromaDocumentStore

# Load the documents from the JSON file
with open(docstore_tmp_path, 'r') as f:
    documents_dicts = json.load(f)

# Convert embeddings back to NumPy arrays
for doc_dict in documents_dicts:
    if 'embedding' in doc_dict and isinstance(doc_dict['embedding'], list):
        doc_dict['embedding'] = np.array(doc_dict['embedding'])

# Reconstruct Document objects
documents = [Document.from_dict(doc_dict) for doc_dict in documents_dicts]
document_store = ChromaDocumentStore()
# Write the documents to the document store
document_store.write_documents(documents)


In [None]:
# Integrate the document store into your pipeline
retriever = loaded_pipeline.get_component('retriever')
retriever.document_store = document_store

### Run inferences against the Haystack pipeline using Amazon Bedrock
---

Now that we have loaded the pipeline from Amazon S3, we can run inferences against this RAG pipeline. As we run inferences, we measure latency, and we measure semantic similarity using Haystack's `SASEvaluator` component.
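
One way to capture latency is to time each pipeline call with a wall-clock timer. The sketch below is a minimal illustration rather than part of Haystack: `timed_call` and `summarize_latencies` are hypothetical helper names, and the summary fields (mean, min, max) are an assumption about which statistics are of interest.

```python
import time
from statistics import mean
from typing import Any, Callable, Dict, List, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def summarize_latencies(latencies: List[float]) -> Dict[str, float]:
    """Return the mean, min, and max of a non-empty list of latencies in seconds."""
    return {"mean_s": mean(latencies), "min_s": min(latencies), "max_s": max(latencies)}
```

In this notebook, a call such as `response, latency = timed_call(loaded_pipeline.run, {"retriever": {"query": question}, "prompt_builder": {"question": question}})` would return the usual response together with the time it took, and the collected latencies could then be passed to `summarize_latencies`.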

In [None]:
# Load the dataset that we processed in the first notebook
df = pd.read_csv(os.path.join(g.DATA_DIR, g.PUBMED_QA_CSV_FNAME)) 

# We will select the first 20 rows of the dataset for testing
test_data = df.head(20)
test_data

In [None]:
question: str = "What is a neurodegenerative disease? Give me examples"
retriever = loaded_pipeline.get_component('retriever')
# Run the retriever on its own to inspect which documents the pipeline would retrieve
retrieved_docs = retriever.run(query=question)  # Use the question as the query

# Print the retrieved documents' content or metadata
print(f"Retrieved Documents: {retrieved_docs}")

# Now proceed with the full pipeline run to generate the final response
response = loaded_pipeline.run({
    "retriever": {"query": question},
    "prompt_builder": {"question": question}
})
print("LLM Response:")
print(response["llm"]["replies"][0])


### Measure the semantic answer similarity (SAS) score

Now, we use the loaded pipeline to generate answers to the first 20 questions of the dataset. Haystack's `SASEvaluator` then scores each predicted answer by its semantic similarity to the ground truth answer provided in the dataset.
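
To give an intuition for the score, the sketch below computes a cosine similarity per answer pair and averages the results. It assumes the evaluator compares embedding vectors of the two answers, which is how an embedding (bi-encoder) model would be scored; it is not the `SASEvaluator` implementation. The function names are hypothetical and the 3-dimensional vectors are made up for illustration, whereas real embeddings come from a language model.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_similarity(truth_vecs: List[Sequence[float]],
                    predicted_vecs: List[Sequence[float]]) -> float:
    """Average the pairwise cosine similarities of two aligned vector lists."""
    scores = [cosine_similarity(t, p) for t, p in zip(truth_vecs, predicted_vecs)]
    return sum(scores) / len(scores)


# Made-up "embeddings": the first pair is identical, the second is orthogonal.
truth = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
predicted = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
overall = mean_similarity(truth, predicted)
```

Under this assumption, a pair of answers with the same meaning scores close to 1 and an unrelated pair scores close to 0, so the overall score is the average closeness across all questions.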

In [None]:
from haystack.components.evaluators import SASEvaluator

# Extract the questions and ground truth answers
test_queries = test_data['instruction'].tolist()
ground_truth_answers = test_data['response'].tolist()

# Initialize the SAS evaluator
sas_evaluator = SASEvaluator()
sas_evaluator.warm_up()

# Running the pipeline to get predictions
results = []
for question in test_queries:
    # Run the pipeline to get the predicted answer
    response = loaded_pipeline.run({
        "retriever": {"query": question},
        "prompt_builder": {"question": question}
    })
    predicted_answer = response["llm"]["replies"][0]
    results.append(predicted_answer)

# Evaluate the results using SAS evaluator
sas_evaluation = sas_evaluator.run(ground_truth_answers=ground_truth_answers, predicted_answers=results)

# Print the evaluation results
print("Semantic Answer Similarity Evaluation:")
print("Individual scores:", sas_evaluation["individual_scores"])
print("Overall mean SAS score:", sas_evaluation["score"])

# Optionally, print detailed results
print("\nDetailed Results:")
for i, (query, ground_truth, predicted) in enumerate(zip(test_queries, ground_truth_answers, results)):
    print(f"\nQuery {i+1}: {query}")
    print(f"Ground Truth: {ground_truth}")
    print(f"Predicted: {predicted}")
    print(f"SAS Score: {sas_evaluation['individual_scores'][i]}")
