# Retrievers & retrievers.py
To retrieve relevant text chunks from Weaviate, we first vectorize the query with the same embedding model we used to create the text chunk embeddings, because vectors produced by different models are not comparable. We then pass that query vector to Weaviate, which runs a cosine similarity search and returns the most similar text chunks.
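
To make the similarity search concrete, here is a minimal sketch of a brute-force cosine similarity lookup in plain Python. It is not how Weaviate is implemented (a vector database typically uses an approximate nearest-neighbour index instead of scanning every vector), and the names `cosine_similarity`, `top_k_chunks`, `toy_index`, and `toy_query_vector`, as well as the 3-dimensional vectors, are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query.

    indexed_chunks is a list of (text, vector) pairs: a hypothetical
    stand-in for what the indexer stored in the vector database.
    """
    ranked = sorted(
        indexed_chunks,
        key=lambda pair: cosine_similarity(query_vector, pair[1]),
        reverse=True,  # highest similarity first
    )
    return [text for text, _ in ranked[:k]]


# Toy vectors, invented for this example (real embeddings have hundreds of dimensions).
toy_index = [
    ("chunk about Naive RAG", [0.9, 0.1, 0.0]),
    ("chunk about cooking", [0.0, 0.2, 0.9]),
    ("chunk about Advanced RAG", [0.8, 0.3, 0.1]),
]
toy_query_vector = [1.0, 0.2, 0.0]

print(top_k_chunks(toy_query_vector, toy_index, k=2))
```

The two RAG chunks point in nearly the same direction as the query vector, so they rank ahead of the unrelated chunk. The `k` argument plays the same role as `k=2` in the `retrieve()` call used in this notebook.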

If you've already run your indexer, the code below should work:

```python
from cheat_code.common_components.vectorizers import Vectorizer
from cheat_code.common_components.wcs_client_adapter import WcsClientAdapter

vectorizer = Vectorizer()
wcs_client_adapter = WcsClientAdapter()

# Embed the query with the same model that embedded the indexed chunks
query = "How is Naive RAG different from Advanced RAG?"
query_vector = vectorizer.vectorize_query(query)

# Ask Weaviate for the k=2 most similar chunks
retrieved_chunks_list = wcs_client_adapter.retrieve(query_vector, k=2)

# Print the first 1,000 characters of each retrieved chunk
print(retrieved_chunks_list[0][:1000])
print("\n====================================\n")
print(retrieved_chunks_list[1][:1000])
```

as showed in Figure 3. Despite RAG method are cost-effective and surpass the performance of the native LLM, they also exhibit several limitations. The development of Advanced RAG and Modular RAG is a response to these specific shortcomings in Naive RAG. The Naive RAG research paradigm represents the earliest methodology, which gained prominence shortly after the widespread adoption of ChatGPT. The Naive RAG follows a traditional process that includes indexing, retrieval, and generation, which is also characterized as a “Retrieve-Read” framework [7]. Indexing starts with the cleaning and extraction of raw data in diverse formats like PDF, HTML, Word, and Markdown, which is then converted into a uniform plain text format. To accommodate the context limitations of language models, text is segmented into smaller, digestible chunks. Chunks are then encoded into vector representations using an embedding model and stored in vector database. This step is crucial for enabling efficient similari

## Retriever task #1: implement `vectorize_query()` in `workshop_code/common_components/vectorizers.py`
This should be easy because it's essentially a simpler version of `vectorize_text_chunks()`, which you've already completed.

```python
from workshop_code.common_components.vectorizers import Vectorizer
from workshop_code.common_components.wcs_client_adapter import WcsClientAdapter

vectorizer = Vectorizer()
wcs_client_adapter = WcsClientAdapter()

# Embed the query using your own vectorize_query() implementation
query = "How is Naive RAG different from Advanced RAG?"
query_vector = vectorizer.vectorize_query(query)

# Ask Weaviate for the k=2 most similar chunks
retrieved_chunks_list = wcs_client_adapter.retrieve(query_vector, k=2)

# Print the first 1,000 characters of each retrieved chunk
print(retrieved_chunks_list[0][:1000])
print("\n====================================\n")
print(retrieved_chunks_list[1][:1000])
```

as showed in Figure 3. Despite RAG method are cost-effective and surpass the performance of the native LLM, they also exhibit several limitations. The development of Advanced RAG and Modular RAG is a response to these specific shortcomings in Naive RAG. The Naive RAG research paradigm represents the earliest methodology, which gained prominence shortly after the widespread adoption of ChatGPT. The Naive RAG follows a traditional process that includes indexing, retrieval, and generation, which is also characterized as a “Retrieve-Read” framework [7]. Indexing starts with the cleaning and extraction of raw data in diverse formats like PDF, HTML, Word, and Markdown, which is then converted into a uniform plain text format. To accommodate the context limitations of language models, text is segmented into smaller, digestible chunks. Chunks are then encoded into vector representations using an embedding model and stored in vector database. This step is crucial for enabling efficient similari
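
As a hint for task #1, here is a minimal sketch of how a single-query method can reuse the batch logic. `ToyVectorizer` and `_embed_batch()` are hypothetical names, and the vowel-count "embedding" is only a placeholder. The workshop's `Vectorizer` calls a real embedding model whose API is not shown here, so treat this as an illustration of the shape of the solution, not the solution itself.

```python
class ToyVectorizer:
    """Hypothetical stand-in for the workshop's Vectorizer."""

    def _embed_batch(self, texts):
        # Placeholder "embedding": how often each vowel occurs in the text.
        # A real implementation would call the embedding model here.
        return [[float(text.lower().count(v)) for v in "aeiou"] for text in texts]

    def vectorize_text_chunks(self, text_chunks):
        # Batch case: one vector per chunk.
        return self._embed_batch(text_chunks)

    def vectorize_query(self, query):
        # Single-query case: embed a batch of one and unwrap the only vector.
        return self._embed_batch([query])[0]


toy_vectorizer = ToyVectorizer()
toy_vector = toy_vectorizer.vectorize_query("Naive RAG")
print(toy_vector)
```

Because both methods go through the same embedding call, the query vector ends up in the same vector space as the chunk vectors, which is what the similarity search relies on.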

## Retriever task #2: understand `retrievers.py`
Look over the code of `retrievers.py`. If something doesn't make sense, ask one of us.