# Advent of Haystack: Day 2

In this challenge, your mission is to help a couple of fictional elves in the film "A Very Weaviate Christmas":
1. Find out what's happening in the film "A Very Weaviate Christmas".
2. This will lead you to a clue that lets you discover which Weaviate Collection to peek into.
3. When submitting the challenge, tell us what you find there!


### Components to use:
1. [`OpenAITextEmbedder`](https://docs.haystack.deepset.ai/docs/openaitextembedder)
2. [`OpenAIGenerator`](https://docs.haystack.deepset.ai/docs/openaigenerator)
3. [`PromptBuilder`](https://docs.haystack.deepset.ai/docs/promptbuilder)
4. [`WeaviateDocumentStore`](https://docs.haystack.deepset.ai/docs/weaviatedocumentstore)
5. [`WeaviateEmbeddingRetriever`](https://docs.haystack.deepset.ai/reference/integrations-weaviate#weaviateembeddingretriever)


🎄 **Your task is to complete steps 3 and 4**. Make sure you run the preceding code cells first, and that you understand what each prior step is doing.

## 1) Setup and Installation

In [None]:
!pip install haystack-ai weaviate-haystack
!pip install -q --upgrade openai # not to get the OpenAI proxies error: https://community.openai.com/t/error-with-openai-1-56-0-client-init-got-an-unexpected-keyword-argument-proxies/1040332/2

To get started, first provide your API keys below. We're providing you with a read-only API Key for Weaviate.
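
One way to supply the keys without hard-coding them is to read each one from the environment and prompt only when it is missing. The sketch below is a minimal illustration of that pattern; the helper name `resolve_key` and the demo values are hypothetical and not part of the challenge code.

```python
import os
from getpass import getpass


def resolve_key(name, env=os.environ, prompt_fn=getpass):
    """Return env[name], prompting once and caching the value if it is missing."""
    value = env.get(name)
    if not value:
        value = prompt_fn(f"Enter {name}: ")
        env[name] = value
    return value


# Demonstration with a plain dict and stub prompts, so nothing is read interactively.
demo_env = {"WEAVIATE_API_KEY": "read-only-key"}
found = resolve_key("WEAVIATE_API_KEY", env=demo_env, prompt_fn=lambda _: "unused")
prompted = resolve_key("OPENAI_API_KEY", env=demo_env, prompt_fn=lambda _: "typed-in-key")
```

In the notebook itself, a call such as `resolve_key("OPENAI_API_KEY")` would fall back to `getpass` when the variable is not set.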

For this challenge, we've prepared a Weaviate Collection for you which contains lots of movies and their overviews.

In [4]:
%load_ext dotenv
%dotenv

import os
from getpass import getpass

os.environ["WEAVIATE_API_KEY"] = "b3jhGwa4NkLGjaq3v1V1vh1pTrlKjePZSt91"

huggingface_api_key = os.getenv("HUGGINGFACE_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")
# Uncomment to enter the key interactively when it is not set in the environment:
# if "OPENAI_API_KEY" not in os.environ:
#     os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")



## 2) Weaviate Setup

Next, you can connect to the right `WeaviateDocumentStore` (we've already added the right code for you below with the client URL in place).

This document store contains many movies, along with their titles and overviews.

In [3]:
from haystack_integrations.document_stores.weaviate import WeaviateDocumentStore, AuthApiKey
from haystack import Document
import os


auth_client_secret = AuthApiKey()

document_store = WeaviateDocumentStore(url="https://zgvjwlycsr6p5j1ziuyea.c0.europe-west3.gcp.weaviate.cloud",
                                       auth_client_secret=auth_client_secret)



## 3) The RAG Pipeline

Now, you're on your own. Complete the code blocks below.

First, create a RAG pipeline that can answer questions based on the overviews of the movies in your `document_store`.

⭐️ You should then be able to run the pipeline and answer the question "What happens in the film 'A Very Weaviate Christmas'?"

**💚 Hint 1:** The embedding model that was used to populate the vectors was `text-embedding-3-small` by OpenAI.

**💙 Hint 2:** We've added an import to the OpenAIGenerator but feel free to use something else!

In [5]:
from haystack import Pipeline
from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.components.builders import PromptBuilder
from haystack_integrations.components.retrievers.weaviate import WeaviateEmbeddingRetriever
from haystack.utils import Secret

text_embedder = OpenAITextEmbedder(api_key=Secret.from_token(openai_api_key), model="text-embedding-3-small")
template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{question}}
Answer:
"""
prompt_builder = PromptBuilder(template=template)
generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
                                    api_params={"model": "mistralai/Mistral-Nemo-Instruct-2407"},  # alternative: "Qwen/QwQ-32B-Preview"
                                    token=Secret.from_token(huggingface_api_key))
embedding_retriever = WeaviateEmbeddingRetriever(document_store=document_store)

rag = Pipeline()
rag.add_component("text_embedder", text_embedder)
rag.add_component("retriever", embedding_retriever)
rag.add_component("prompt_builder", prompt_builder)
rag.add_component("llm", generator)

In [8]:
rag.connect("text_embedder.embedding", "retriever.query_embedding")
rag.connect("retriever", "prompt_builder")
rag.connect("prompt_builder.prompt", "llm.prompt")

<haystack.core.pipeline.pipeline.Pipeline object at 0x17d401160>
🚅 Components
  - text_embedder: OpenAITextEmbedder
  - retriever: WeaviateEmbeddingRetriever
  - prompt_builder: PromptBuilder
  - llm: HuggingFaceAPIGenerator
🛤️ Connections
  - text_embedder.embedding -> retriever.query_embedding (List[float])
  - retriever.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.prompt (str)
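
To build intuition for the `text_embedder.embedding -> retriever.query_embedding` connection, here is a toy sketch of embedding retrieval in plain Python. It assumes cosine similarity as the ranking metric (the metric actually used depends on how the collection is configured), and the `retrieve` helper, the film titles other than "A Very Weaviate Christmas", and the 3-dimensional vectors are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, store, top_k=2):
    """Return the top_k (title, score) pairs most similar to the query vector."""
    scored = [
        (title, cosine_similarity(query_embedding, vector))
        for title, vector in store.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical 3-dimensional "embeddings"; real ones have many more dimensions.
toy_store = {
    "A Very Weaviate Christmas": [0.9, 0.1, 0.0],
    "Some Other Film": [0.0, 1.0, 0.0],
    "A Third Film": [0.1, 0.0, 0.9],
}
toy_query = [1.0, 0.0, 0.0]
top_hits = retrieve(toy_query, toy_store)
print(top_hits)
```

The documents that score highest are the ones passed on to the prompt builder in the real pipeline.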

In [10]:
query = "What happens in the film 'A Very Weaviate Christmas'?"
reply = rag.run({"text_embedder": {"text": query}, "prompt_builder": {"question": query}})

print(reply["llm"]["replies"][0])

 In 'A Very Weaviate Christmas', two of Santa's elves, Daniel and Philip, are on a mission to recover stolen vectors hidden in an unknown Collection and return them to 'Santas_Grotto' before Christmas Day.
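
To see roughly what the generator receives, the sketch below approximates this notebook's prompt template with plain string formatting. It is not how `PromptBuilder` is implemented (that uses Jinja, and whitespace will differ slightly); `build_prompt` and the example overviews are hypothetical stand-ins for the retrieved `document.content` values.

```python
def build_prompt(documents, question):
    """Approximate the notebook's prompt template with plain string formatting."""
    context = "\n".join(f"    {text}" for text in documents)
    return (
        "Given the following information, answer the question.\n\n"
        "Context:\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical overviews standing in for the retrieved documents.
example_documents = [
    "Overview of the first retrieved film.",
    "Overview of the second retrieved film.",
]
example_question = "What happens in the film 'A Very Weaviate Christmas'?"
example_prompt = build_prompt(example_documents, example_question)
print(example_prompt)
```

Because the question appears only after the retrieved context, the model is nudged to answer from the movie overviews rather than from its own knowledge.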


## 4) Solve the Mystery

By this point, you should know what's happening: there is a Collection where everything has been hidden.

Complete the code cell below by providing the right Collection name, and tell us the following:

1. Who is the culprit? Watch out, because there may be `decoys`.
2. What have they stolen?

**💚 Hint:** Once you've connected to the right collection, take a look at all the Objects in there. Then, you may be able to use filters to avoid the decoys!

- [Weaviate Documentation: Read all Objects](https://weaviate.io/developers/weaviate/manage-data/read-all-objects)
- [Weaviate Documentation: Filters](https://weaviate.io/developers/weaviate/search/filters)

In [11]:
import weaviate

from weaviate.classes.init import Auth

headers = {"X-OpenAI-Api-Key": openai_api_key}
client = weaviate.connect_to_weaviate_cloud(cluster_url="https://zgvjwlycsr6p5j1ziuyea.c0.europe-west3.gcp.weaviate.cloud",
                                            auth_credentials=Auth.api_key(os.getenv("WEAVIATE_API_KEY")),
                                            headers=headers)

In [14]:
schema = client.collections.list_all()

In [15]:
schema

{'Default': _CollectionConfigSimple(name='Default', description=None, generative_config=None, properties=[_Property(name='_original_id', description=None, data_type=<DataType.TEXT: 'text'>, index_filterable=True, index_range_filters=False, index_searchable=True, nested_properties=None, tokenization=<Tokenization.WORD: 'word'>, vectorizer_config=None, vectorizer='none'), _Property(name='content', description=None, data_type=<DataType.TEXT: 'text'>, index_filterable=True, index_range_filters=False, index_searchable=True, nested_properties=None, tokenization=<Tokenization.WORD: 'word'>, vectorizer_config=None, vectorizer='none'), _Property(name='dataframe', description=None, data_type=<DataType.TEXT: 'text'>, index_filterable=True, index_range_filters=False, index_searchable=True, nested_properties=None, tokenization=<Tokenization.WORD: 'word'>, vectorizer_config=None, vectorizer='none'), _Property(name='blob_data', description=None, data_type=<DataType.BLOB: 'blob'>, index_filterable=Tru

In [16]:
# Provide the name of the collection in client.collections.get() below 👇
plot = client.collections.get("Santas_Grotto")

In [20]:
for item in plot.iterator(
    include_vector=True  # If using named vectors, you can specify ones to include e.g. ['title', 'body'], or True to include all
):
    print(item.properties)

{'plot': 'Tuana is here with not just all the vectors but also all the presents that are supposed to be delivered around the World!', 'decoy': False}
{'plot': "Sebastian is here, but he seems unsure what's going on", 'decoy': True}
{'plot': "JP is here, looks like he's feasting on cookies", 'decoy': True}


In [19]:
from weaviate.classes.query import Filter

filtered_response = plot.query.fetch_objects(
    filters=Filter.by_property("decoy").equal(False)
)

for o in filtered_response.objects:
    print(o.properties)

client.close()  # release the connection once you are done querying

{'plot': 'Tuana is here with not just all the vectors but also all the presents that are supposed to be delivered around the World!', 'decoy': False}
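
For comparison, the same selection can be reproduced locally on the objects printed earlier. This is only a sketch of what the `decoy == False` filter expresses; in practice the server-side filter is preferable because it avoids fetching the decoys at all. The helper name `without_decoys` is hypothetical.

```python
# Properties copied from the objects printed in this notebook.
objects = [
    {"plot": "Tuana is here with not just all the vectors but also all the presents that are supposed to be delivered around the World!", "decoy": False},
    {"plot": "Sebastian is here, but he seems unsure what's going on", "decoy": True},
    {"plot": "JP is here, looks like he's feasting on cookies", "decoy": True},
]


def without_decoys(items):
    """Keep only the objects whose `decoy` property is False."""
    return [item for item in items if item["decoy"] is False]


genuine = without_decoys(objects)
for item in genuine:
    print(item["plot"])
```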
