### Retrieval Augmented Generation using Haystack and Aana SDK

This notebook demonstrates how to use Haystack and Aana SDK to build an application that answers users' queries about videos with Retrieval Augmented Generation (RAG).

The application works as follows:
- A Whisper model is used to transcribe the video.
- The transcribed text is split into chunks and an embedding is generated for each chunk.
- Chunks and their embeddings are indexed in a datastore.
- When a user asks a question, the question is used to retrieve relevant chunks from the datastore.
- The retrieved chunks are used to generate a prompt.
- The prompt is passed to an LLM to generate an answer.
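
To make the retrieval steps above concrete, here is a minimal, dependency-free sketch of embedding, indexing, and retrieving chunks. It is only an illustration: the bag-of-words `embed` function is an assumed stand-in for a neural embedder, the in-memory list stands in for the datastore, and the helper names and sample chunks are hypothetical rather than part of Haystack or Aana SDK.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a neural embedder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Index step: store each chunk next to its embedding."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, top_k: int = 2) -> list[str]:
    """Query step: rank chunks by similarity to the question embedding."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Illustrative chunks, loosely based on the kind of sentences a transcript contains.
chunks = [
    "Green screen is the primary way to layer images.",
    "The sodium vapor process uses a beam splitter prism.",
    "Spill of the color often ruins footage.",
]
index = build_index(chunks)
print(retrieve(index, "What is the sodium vapor process?", top_k=1))
```

In the notebook itself, the embedding is produced by a sentence-transformers model and the index lives in Qdrant; the ranking idea is the same.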

In [1]:
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

Create an Aana SDK instance and connect to the cluster.

In [2]:
from aana.sdk import AanaSDK

aana_app = AanaSDK().connect()

  from .autonotebook import tqdm as notebook_tqdm
2024-06-26 10:02:05,158	INFO util.py:154 -- Missing packages: ['ipywidgets']. Run `pip install -U ipywidgets`, then restart the notebook server for rich notebook output.

2024-06-26 10:02:10,748	INFO worker.py:1740 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265


We need to deploy a Whisper model and an LLM. We will use the predefined `WhisperDeployment` and `HfTextGenerationDeployment` classes to deploy them.

In [3]:
from aana.deployments.whisper_deployment import (
    WhisperComputeType,
    WhisperConfig,
    WhisperDeployment,
    WhisperModelSize,
)

asr_deployment = WhisperDeployment.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 0.25},
    user_config=WhisperConfig(
        model_size=WhisperModelSize.MEDIUM,
        compute_type=WhisperComputeType.FLOAT16,
    ).model_dump(mode="json"),
)

aana_app.register_deployment(
    name="asr_deployment",
    instance=asr_deployment,
    deploy=True,
)

  torchaudio.set_audio_backend("soundfile")
The new client HTTP config differs from the existing one in the following fields: ['location']. The new HTTP config is ignored.
2024-06-26 10:02:25,266	INFO handle.py:126 -- Created DeploymentHandle '6283397g' for Deployment(name='WhisperDeployment', app='asr_deployment').
2024-06-26 10:02:25,268	INFO handle.py:126 -- Created DeploymentHandle '52rx2zc8' for Deployment(name='WhisperDeployment', app='asr_deployment').
2024-06-26 10:02:40,422	INFO handle.py:126 -- Created DeploymentHandle '00dcwn9z' for Deployment(name='WhisperDeployment', app='asr_deployment').
2024-06-26 10:02:40,425	INFO api.py:584 -- Deployed app 'asr_deployment' successfully.


In [4]:
from aana.deployments.hf_text_generation_deployment import (
    HfTextGenerationConfig,
    HfTextGenerationDeployment,
)

hf_text_generation_deployment = HfTextGenerationDeployment.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 0.5},
    user_config=HfTextGenerationConfig(
        model_id="microsoft/Phi-3-mini-4k-instruct",
        model_kwargs={
            "trust_remote_code": True,
        },
    ).model_dump(mode="json"),
)
aana_app.register_deployment(
    name="llm_deployment",
    instance=hf_text_generation_deployment,
    deploy=True,
)

The new client HTTP config differs from the existing one in the following fields: ['location']. The new HTTP config is ignored.
2024-06-26 10:02:40,509	INFO handle.py:126 -- Created DeploymentHandle 'tp3nze3w' for Deployment(name='HfTextGenerationDeployment', app='llm_deployment').
2024-06-26 10:02:40,511	INFO handle.py:126 -- Created DeploymentHandle 'hsfoe5jo' for Deployment(name='HfTextGenerationDeployment', app='llm_deployment').


2024-06-26 10:03:09,807	INFO handle.py:126 -- Created DeploymentHandle '2f015cnh' for Deployment(name='HfTextGenerationDeployment', app='llm_deployment').
2024-06-26 10:03:09,808	INFO api.py:584 -- Deployed app 'llm_deployment' successfully.


Aana SDK provides a deployment class for Haystack components, `HaystackComponentDeployment`, which lets you deploy a Haystack component as a separate deployment. This is particularly useful for components that wrap deep learning models, and it has two advantages:
- The model is deployed only once and can be reused from multiple Haystack pipelines, which makes more efficient use of resources such as GPU memory.
- Haystack pipelines can be scaled to a cluster of machines with minimal effort.

We will deploy the text embedder and the document embedder that we will use to build the Haystack pipelines.

In [5]:
from aana.deployments.haystack_component_deployment import (
    HaystackComponentDeployment,
    HaystackComponentDeploymentConfig,
)

text_embedder_deployment = HaystackComponentDeployment.options(
    num_replicas=1,
    ray_actor_options={"num_gpus": 0.1},
    user_config=HaystackComponentDeploymentConfig(
        component="haystack.components.embedders.SentenceTransformersTextEmbedder",
        params={"model": "sentence-transformers/all-mpnet-base-v2"},
    ).model_dump(),
)
aana_app.register_deployment(
    name="text_embedder_deployment",
    instance=text_embedder_deployment,
    deploy=True,
)

The new client HTTP config differs from the existing one in the following fields: ['location']. The new HTTP config is ignored.
2024-06-26 10:03:10,040	INFO handle.py:126 -- Created DeploymentHandle '0e3kkek3' for Deployment(name='HaystackComponentDeployment', app='text_embedder_deployment').
2024-06-26 10:03:10,041	INFO handle.py:126 -- Created DeploymentHandle 'v853vgjg' for Deployment(name='HaystackComponentDeployment', app='text_embedder_deployment').
2024-06-26 10:03:21,107	INFO handle.py:126 -- Created DeploymentHandle '92o499mo' for Deployment(name='HaystackComponentDeployment', app='text_embedder_deployment').
2024-06-26 10:03:21,108	INFO api.py:584 -- Deployed app 'text_embedder_deployment' successfully.


In [6]:
document_embedder_deployment = HaystackComponentDeployment.options(
    num_replicas=1,
    max_concurrent_queries=1000,
    ray_actor_options={"num_gpus": 0.1},
    user_config=HaystackComponentDeploymentConfig(
        component="haystack.components.embedders.SentenceTransformersDocumentEmbedder",
        params={"model": "sentence-transformers/all-mpnet-base-v2"},
    ).model_dump(),
)
aana_app.register_deployment(
    name="document_embedder_deployment",
    instance=document_embedder_deployment,
    deploy=True,
)

The new client HTTP config differs from the existing one in the following fields: ['location']. The new HTTP config is ignored.
2024-06-26 10:03:21,131	INFO handle.py:126 -- Created DeploymentHandle '04jrkkcw' for Deployment(name='HaystackComponentDeployment', app='document_embedder_deployment').
2024-06-26 10:03:21,131	INFO handle.py:126 -- Created DeploymentHandle '2m5s1s16' for Deployment(name='HaystackComponentDeployment', app='document_embedder_deployment').
2024-06-26 10:03:33,231	INFO handle.py:126 -- Created DeploymentHandle 'tqe6usks' for Deployment(name='HaystackComponentDeployment', app='document_embedder_deployment').
2024-06-26 10:03:33,232	INFO api.py:584 -- Deployed app 'document_embedder_deployment' successfully.


Now that we have deployed all the necessary models, we can build two Haystack pipelines:
- Indexing pipeline: This pipeline splits the transcribed text into chunks and indexes them in a datastore. The transcription step is done outside the pipeline; if you want it to be part of the pipeline, you can create custom Haystack components for it.
- Query pipeline: This pipeline retrieves relevant chunks from the datastore and generates an answer using the LLM.

First, we need to transcribe the video. We will download the video and extract the audio from it.

In [7]:
from aana.core.models.video import VideoInput

video_input = VideoInput(url="https://www.youtube.com/watch?v=UQuIVsNzqDk")

In [8]:
from aana.integrations.external.yt_dlp import download_video
from aana.processors.video import extract_audio

video = download_video(video_input=video_input)
audio = extract_audio(video=video)

[youtube] Extracting URL: https://www.youtube.com/watch?v=UQuIVsNzqDk
[youtube] UQuIVsNzqDk: Downloading webpage
[youtube] UQuIVsNzqDk: Downloading ios player API JSON
[youtube] UQuIVsNzqDk: Downloading android player API JSON


We already deployed the Whisper model for ASR. Now we need to create a handle that we can use to interact with the model.

In [9]:
from aana.deployments.aana_deployment_handle import AanaDeploymentHandle

asr_handle = await AanaDeploymentHandle.create("asr_deployment")

2024-06-26 10:03:42,922	INFO handle.py:126 -- Created DeploymentHandle '5at1j4b1' for Deployment(name='WhisperDeployment', app='asr_deployment').
2024-06-26 10:03:42,937	INFO pow_2_scheduler.py:260 -- Got updated replicas for Deployment(name='WhisperDeployment', app='asr_deployment'): {'9ueivx35'}.


Send the audio to the Whisper model to transcribe it.

In [10]:
transcription_result = await asr_handle.transcribe(audio=audio)
transcription = transcription_result["transcription"].text
transcription

2024-06-26 10:03:42,966	INFO handle.py:126 -- Created DeploymentHandle 'ubtapa51' for Deployment(name='WhisperDeployment', app='asr_deployment').


" Do you ever feel like visual effects in old movies were better? What if I told you that wasn't just nostalgia speaking? Back in the 1960s, Disney invented a technology that was in many ways superior to the green screen. But that tech has long since been forgotten. And what if I told you that we found a way to recreate it? Being able to layer one moving image over another is the fundamental building block of visual effects. Every single crazy effect shot from every movie you love relies on this basic core technique. And the primary way we do that is with green screen. Or blue screen. But there are lots of problems with green screen. Even in this modern era, you can't film blurry or transparent things. You can't wear clothes that are the same color as the screen. And the spill of the color oftentimes ruins footage. If I wanted to make a movie about a clown wearing all the colors of the rainbow getting married on Mars, I can't. And that bothers me. If I could get my hands on an inventio

Now, let's create a Haystack pipeline to index the transcribed text. 

We will use Qdrant as the datastore to store the chunks and their embeddings. You need to set up Qdrant before running this notebook. You can find the instructions to install Qdrant [here](https://qdrant.tech/documentation/guides/installation/). Alternatively, you can use [Qdrant Cloud](https://cloud.qdrant.io) or use the following one-liner to run Qdrant locally (not recommended for production use, only for testing purposes):

```bash
curl -L https://github.com/qdrant/qdrant/releases/download/v1.9.7/qdrant-x86_64-unknown-linux-gnu.tar.gz | tar xz && ./qdrant
```

The first step in creating a Haystack pipeline is to define the components that will be used in it. We will use the following components:
- `DocumentCleaner`: This component is used to clean the text before splitting it into chunks.
- `DocumentSplitter`: This component is used to split the text into chunks.
- `DocumentEmbedder`: This component is used to generate embeddings for the chunks. We will use the document embedder that we deployed earlier. To do that, we use `RemoteHaystackComponent` and pass the name of the deployment (`document_embedder_deployment`). Make sure to call `warm_up()` on the component before building the pipeline to initialize it.
- `DocumentWriter`: This component is used to write the chunks and their embeddings to the datastore.
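
As a rough mental model of the clean, split, and write stages listed above, the following sketch chains three plain functions. It is not how Haystack implements these components: the whitespace cleanup, the regex sentence splitter, and the content-hash duplicate check are simplifying assumptions, and the dictionary `store` is a hypothetical stand-in for the datastore.

```python
import hashlib
import re


def clean(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list[str]:
    """Split after '.', '?' or '!' followed by whitespace (a rough heuristic)."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text) if s]


def write(store: dict[str, str], chunks: list[str]) -> int:
    """Add chunks keyed by a content hash, skipping ones already stored."""
    written = 0
    for chunk in chunks:
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in store:  # skip duplicates instead of overwriting
            store[key] = chunk
            written += 1
    return written


store: dict[str, str] = {}
text = "  Or blue screen.   Yeah, science.  Or blue screen. "
chunks = split_sentences(clean(text))
print(chunks)
print(write(store, chunks))
```

Because duplicates are skipped, running the write step a second time on the same chunks adds nothing to the store.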

In [11]:
from haystack import Document, Pipeline
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

from aana.deployments.haystack_component_deployment import RemoteHaystackComponent

cleaner = DocumentCleaner()
splitter = DocumentSplitter(split_by="sentence", split_length=1)
document_store = QdrantDocumentStore(
    url="http://localhost:6333", index="video_transcriptions"
)
document_embedder = RemoteHaystackComponent("document_embedder_deployment")
writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP)

document_embedder.warm_up()

2024-06-26 10:04:37,883	INFO handle.py:126 -- Created DeploymentHandle '9w7e9514' for Deployment(name='HaystackComponentDeployment', app='document_embedder_deployment').
2024-06-26 10:04:37,932	INFO handle.py:126 -- Created DeploymentHandle 'y8pedui7' for Deployment(name='HaystackComponentDeployment', app='document_embedder_deployment').


2024-06-26 10:04:37,909	INFO pow_2_scheduler.py:260 -- Got updated replicas for Deployment(name='HaystackComponentDeployment', app='document_embedder_deployment'): {'01s99fez'}.
2024-06-26 10:04:39,593	INFO pow_2_scheduler.py:260 -- Got updated replicas for Deployment(name='HaystackComponentDeployment', app='text_embedder_deployment'): {'v5912u0o'}.
2024-06-26 10:04:40,083	INFO pow_2_scheduler.py:260 -- Got updated replicas for Deployment(name='HfTextGenerationDeployment', app='llm_deployment'): {'tnji8e4l'}.


Now we can create a pipeline using these components.

In [12]:
indexing_pipeline = Pipeline()

indexing_pipeline.add_component("cleaner", cleaner)
indexing_pipeline.add_component("splitter", splitter)
indexing_pipeline.add_component("document_embedder", document_embedder)
indexing_pipeline.add_component("writer", writer)

indexing_pipeline.connect("cleaner.documents", "splitter.documents")
indexing_pipeline.connect("splitter.documents", "document_embedder.documents")
indexing_pipeline.connect("document_embedder.documents", "writer.documents")

<haystack.core.pipeline.pipeline.Pipeline object at 0x7fb31863b880>
🚅 Components
  - cleaner: DocumentCleaner
  - splitter: DocumentSplitter
  - document_embedder: RemoteHaystackComponent
  - writer: DocumentWriter
🛤️ Connections
  - cleaner.documents -> splitter.documents (List[Document])
  - splitter.documents -> document_embedder.documents (List[Document])
  - document_embedder.documents -> writer.documents (List[Document])

Let's run the pipeline to index the transcribed text we got from the video.

In [13]:
transcription_doc = Document(content=transcription)
result = indexing_pipeline.run({"cleaner": {"documents": [transcription_doc]}})
result

2024-06-26 10:04:38,117	INFO handle.py:126 -- Created DeploymentHandle '10r9q4i8' for Deployment(name='HaystackComponentDeployment', app='document_embedder_deployment').


300it [00:00, 395.35it/s]                         


{'writer': {'documents_written': 253}}

The result of the indexing pipeline reports how many documents were written to the datastore (`documents_written`). A non-zero count means the indexing was successful.

Now we can create a query pipeline to answer users' questions.

We will use the following components in the query pipeline:
- `TextEmbedder`: This component is used to generate embeddings for the question. We will use the text embedder that we deployed earlier. For that we need to use `RemoteHaystackComponent` and pass the name of the deployment that we created earlier. Make sure to call `warm_up()` on the component before building the pipeline to initialize the component.
- `QdrantEmbeddingRetriever`: This component is used to retrieve relevant chunks from the datastore given the embedding of the question.
- `PromptBuilder`: This component is used to generate a prompt from the retrieved documents based on the provided template.
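
The prompt-building step can be pictured as plain string assembly. The sketch below is a simplified illustration only: `build_prompt` is a hypothetical helper that does not use Haystack or Jinja, and its wording and layout are assumptions for the example.

```python
def build_prompt(documents: list[str], question: str) -> str:
    """Render retrieved chunks and the question into one prompt string."""
    lines = ["Given these documents, answer the question.", "Documents:"]
    lines.extend(f"    {doc}" for doc in documents)  # one indented line per chunk
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


prompt_sketch = build_prompt(
    ["The sodium vapor process.", "Sodium vapor mats."],
    "What is a sodium vapour process?",
)
print(prompt_sketch)
```

Ending the prompt with `Answer:` leaves the model to continue from that point.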

In [14]:
from haystack.components.builders.prompt_builder import PromptBuilder

prompt_template = """
Given these documents, answer the question.
Documents:
{% for doc in documents %}
    {{ doc.content }}
{% endfor %}
Question: {{question}}
Answer:
"""

text_embedder = RemoteHaystackComponent("text_embedder_deployment")
retriever = QdrantEmbeddingRetriever(document_store=document_store)
prompt_builder = PromptBuilder(template=prompt_template)

text_embedder.warm_up()

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", text_embedder)
query_pipeline.add_component("retriever", retriever)
query_pipeline.add_component("prompt_builder", prompt_builder)

query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever.documents", "prompt_builder.documents")

2024-06-26 10:04:39,583	INFO handle.py:126 -- Created DeploymentHandle 'p84nhckr' for Deployment(name='HaystackComponentDeployment', app='text_embedder_deployment').


2024-06-26 10:04:39,653	INFO handle.py:126 -- Created DeploymentHandle '8myuo6ey' for Deployment(name='HaystackComponentDeployment', app='text_embedder_deployment').


<haystack.core.pipeline.pipeline.Pipeline object at 0x7fb30d243460>
🚅 Components
  - text_embedder: RemoteHaystackComponent
  - retriever: QdrantEmbeddingRetriever
  - prompt_builder: PromptBuilder
🛤️ Connections
  - text_embedder.embedding -> retriever.query_embedding (List[float])
  - retriever.documents -> prompt_builder.documents (List[Document])

Notice that we didn't use the LLM model directly in the pipeline. Instead, we used `PromptBuilder` to generate a prompt and then we will use the LLM model to generate an answer based on the prompt. This is not the only way to do it and we will show you how to use the LLM model directly in the pipeline later.

Now let's run the query pipeline to get the prompt.

In [15]:
question = "What is a sodium vapour process?"

result = query_pipeline.run(
    {"text_embedder": {"text": question}, "prompt_builder": {"question": question}}
)
prompt = result["prompt_builder"]["prompt"]
print(prompt)

2024-06-26 10:04:39,683	INFO handle.py:126 -- Created DeploymentHandle 'epha0907' for Deployment(name='HaystackComponentDeployment', app='text_embedder_deployment').



Given these documents, answer the question.
Documents:

     The sodium vapor process.

     So sodium vapor is another one of those essential steps in this progress towards having perfect transparency for compositing and visual effects then.

     So we all know that sodium vapor should give scientifically better results.

     All right, so it's time to try the sodium vapor process.

     If the sodium vapor process is superior, it won't have any of these issues.

     Sodium vapor mats.

     This will be the first test of the sodium vapor process in over 30 years.

     See that tiny little blip of sodium vapor? Yeah, science.

     And hopefully the sodium vapor process lets us do something.

     So the magic of the sodium vapor process is they used a beam splitter prism so that the light that comes through the lens gets split onto two strips of film at the same time.

Question: What is a sodium vapour process?
Answer:


We have a prompt now. To send the prompt to the LLM model, we need to create a handle that we can use to interact with the model.

In [16]:
llm_handle = await AanaDeploymentHandle.create("llm_deployment")

2024-06-26 10:04:40,068	INFO handle.py:126 -- Created DeploymentHandle '3omfzzor' for Deployment(name='HfTextGenerationDeployment', app='llm_deployment').


The LLM deployment expects `ChatDialog` as an input. We can use the `ChatDialog.from_prompt` method to create a `ChatDialog` object from the prompt and then send it to the LLM model.

In [17]:
from aana.core.models.chat import ChatDialog

await llm_handle.chat(dialog=ChatDialog.from_prompt(prompt))

2024-06-26 10:04:40,124	INFO handle.py:126 -- Created DeploymentHandle '3qg8qsuh' for Deployment(name='HfTextGenerationDeployment', app='llm_deployment').


{'message': ChatMessage(content='The sodium vapor process is a technique used in compositing and visual effects that involves using sodium vapor to achieve scientifically better results in creating transparent images. It utilizes a beam splitter prism to split light that comes through a lens onto two strips of film simultaneously, allowing for precise and accurate compositing. This process has been considered superior to other methods and is being revisited after a long period of not being used, with the hope that it can contribute to advancements in the field. The sodium vapor process is a photographic technique that was historically used in the film industry to create seamless composites and visual effects. It involves the use of sodium vapor lamps to illuminate a scene, which then emits a distinct yellow light. This light is captured on film, which can be used to isolate and manipulate elements within a scene with high precision. The process is known for its ability to produce clear

We got the answer from the LLM, but we got it as a single response. We can use the `chat_stream` method to stream tokens from the LLM and receive the answer in a more interactive way.

In [18]:
async for chunk in llm_handle.chat_stream(dialog=ChatDialog.from_prompt(prompt)):
    print(chunk["text"], end="")

2024-06-26 10:04:53,589	INFO handle.py:126 -- Created DeploymentHandle 'qenercv9' for Deployment(name='HfTextGenerationDeployment', app='llm_deployment').


The sodium vapor process is a technique used in compositing and visual effects that involves using sodium vapor to achieve scientifically better results in creating transparent images. It utilizes a beam splitter prism to split light that comes through a lens onto two strips of film simultaneously, allowing for precise and accurate compositing. This process has been considered superior to other methods and is being revisited after a long period of not being used, with the hope that it can contribute to advancements in the field. The sodium vapor process is a photographic technique that was historically used in the film industry to create seamless composites and visual effects. It involves the use of sodium vapor lamps to illuminate a scene, which then emits a distinct yellow light. This light is captured on film, which can be used to isolate and manipulate elements within a scene with high precision. The process is known for its ability to produce clear and accurate results, which is w

That's it! Now we have two pipelines: one for indexing the transcribed text and one for answering users' questions. We used the Whisper model for ASR, the document and text embedders for generating embeddings, and the LLM for generating answers. We also used Qdrant as the datastore for the chunks and their embeddings. We used `PromptBuilder` to generate a prompt and then used the LLM to generate an answer based on that prompt.

Now you can package these pipelines into Aana Endpoints to create an Aana Application. See the [tutorial](/docs/pages/tutorial.md) for more details on how to create an Aana Application.

As promised earlier, we will now show how to use the LLM directly in the pipeline to generate an answer. For that, we can use `AanaDeploymentComponent`, which wraps any Aana deployment into a Haystack component. We will use it to wrap the LLM deployment and use it in the pipeline.

In [19]:
from aana.integrations.haystack.deployment_component import AanaDeploymentComponent

llm_component = AanaDeploymentComponent(llm_handle, "chat")
llm_component.run(dialog=ChatDialog.from_prompt(prompt))

2024-06-26 10:05:07,224	INFO handle.py:126 -- Created DeploymentHandle 'vt8fzelm' for Deployment(name='HfTextGenerationDeployment', app='llm_deployment').


{'message': ChatMessage(content='The sodium vapor process is a technique used in compositing and visual effects that involves using sodium vapor to achieve scientifically better results in creating transparent images. It utilizes a beam splitter prism to split light that comes through a lens onto two strips of film simultaneously, allowing for precise and accurate compositing. This process has been considered superior to other methods and is being revisited after a long period of not being used, with the hope that it can contribute to advancements in the field. The sodium vapor process is a photographic technique that was historically used in the film industry to create seamless composites and visual effects. It involves the use of sodium vapor lamps to illuminate a scene, which then emits a distinct yellow light. This light is captured on film, which can be used to isolate and manipulate elements within a scene with high precision. The process is known for its ability to produce clear

That gives us a component for the LLM. However, our LLM deployment expects a `ChatDialog` as input, while `PromptBuilder` produces a string. We can create a custom component that takes the prompt and generates a `ChatDialog` object from it. See [Creating Custom Components](https://docs.haystack.deepset.ai/docs/custom-components) for more details on how to create custom components in Haystack.

In [20]:
from haystack import component


@component
class ChatDialogGenerator:
    """A component generating a chat dialog from a given prompt."""

    @component.output_types(dialog=ChatDialog, note=str)
    def run(self, prompt: str):
        """Generate a chat dialog from a given prompt."""
        dialog = ChatDialog.from_prompt(prompt)
        return {"dialog": dialog, "note": "chat dialog is generated from the prompt"}

Now we can update the query pipeline to use the LLM model directly to generate an answer.

In [21]:
text_embedder = RemoteHaystackComponent("text_embedder_deployment")
retriever = QdrantEmbeddingRetriever(document_store=document_store)
prompt_builder = PromptBuilder(template=prompt_template)
chat_dialog_generator = ChatDialogGenerator()
llm_component = AanaDeploymentComponent(llm_handle, "chat")

text_embedder.warm_up()

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", text_embedder)
query_pipeline.add_component("retriever", retriever)
query_pipeline.add_component("prompt_builder", prompt_builder)
query_pipeline.add_component("chat_dialog_generator", chat_dialog_generator)
query_pipeline.add_component("llm", llm_component)

query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever.documents", "prompt_builder.documents")
query_pipeline.connect("prompt_builder.prompt", "chat_dialog_generator.prompt")
query_pipeline.connect("chat_dialog_generator.dialog", "llm.dialog")

2024-06-26 10:05:20,456	INFO handle.py:126 -- Created DeploymentHandle 'xfvlwe9u' for Deployment(name='HaystackComponentDeployment', app='text_embedder_deployment').
2024-06-26 10:05:20,469	INFO handle.py:126 -- Created DeploymentHandle 'n1o4brzn' for Deployment(name='HaystackComponentDeployment', app='text_embedder_deployment').


<haystack.core.pipeline.pipeline.Pipeline object at 0x7fb30d242500>
🚅 Components
  - text_embedder: RemoteHaystackComponent
  - retriever: QdrantEmbeddingRetriever
  - prompt_builder: PromptBuilder
  - chat_dialog_generator: ChatDialogGenerator
  - llm: AanaDeploymentComponent
🛤️ Connections
  - text_embedder.embedding -> retriever.query_embedding (List[float])
  - retriever.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> chat_dialog_generator.prompt (str)
  - chat_dialog_generator.dialog -> llm.dialog (ChatDialog)

Let's run the query pipeline to get the answer.

In [22]:
from aana.core.models.sampling import SamplingParams

question = "What is a sodium vapour process?"

result = query_pipeline.run(
    {
        "text_embedder": {"text": question},
        "prompt_builder": {"question": question},
        "llm": {"sampling_params": SamplingParams()},
    }
)
result["llm"]["message"]

2024-06-26 10:05:20,498	INFO handle.py:126 -- Created DeploymentHandle 'waala4g5' for Deployment(name='HaystackComponentDeployment', app='text_embedder_deployment').
2024-06-26 10:05:20,547	INFO handle.py:126 -- Created DeploymentHandle 'yrakqgl0' for Deployment(name='HfTextGenerationDeployment', app='llm_deployment').


ChatMessage(content='The sodium vapor process is a technique used in compositing and visual effects that involves using sodium vapor to achieve scientifically better results in creating transparent images. It utilizes a beam splitter prism to split light that comes through a lens onto two strips of film simultaneously, allowing for precise and accurate compositing. This process has been considered superior to other methods and is being revisited after a long period of not being used, with the hope that it can contribute to advancements in the field. The sodium vapor process is a photographic technique that was historically used in the film industry to create seamless composites and visual effects. It involves the use of sodium vapor lamps to illuminate a scene, which then emits a distinct yellow light. This light is captured on film, which can be used to isolate and manipulate elements within a scene with high precision. The process is known for its ability to produce clear and accurat

It works, but a Haystack Pipeline doesn't support streaming. That's why we recommend using `PromptBuilder` to generate a prompt and then calling the LLM in streaming mode.
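
To illustrate the streaming pattern in isolation, here is a small sketch that consumes an async generator chunk by chunk. `fake_chat_stream` is a hypothetical stand-in for a streaming LLM call, not an Aana SDK function; the only detail borrowed from the notebook is that each chunk is a dictionary with a `"text"` key.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_chat_stream(answer: str) -> AsyncIterator[dict[str, str]]:
    """Hypothetical stand-in for a streaming LLM call: yields one word at a time."""
    for word in answer.split(" "):
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield {"text": word + " "}


async def stream_answer(answer: str) -> str:
    """Print chunks as they arrive and return the assembled text."""
    parts = []
    async for chunk in fake_chat_stream(answer):
        print(chunk["text"], end="", flush=True)
        parts.append(chunk["text"])
    print()
    return "".join(parts).strip()


full_text = asyncio.run(
    stream_answer("The sodium vapor process uses a beam splitter prism.")
)
```

Inside a notebook an event loop is already running, so you would `await stream_answer(...)` directly instead of calling `asyncio.run`.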