# **Haystack**

We talked about LangChain’s features and how to use them to build language applications. LangChain supports many NLP use cases; here we look at another open-source tool, Haystack, which is used to build large-scale search systems. Information retrieval is Haystack’s main area of focus, and it is also where Haystack overlaps with LangChain. Haystack also supports prompting for tasks such as summarization, question answering, and translation.


# **What is Haystack?**

Haystack is a versatile open-source Python framework that provides developers with a toolkit to create powerful search systems that can efficiently handle large document collections. Whether you’re building a search engine for a web application, an e-commerce platform, or a knowledge management system, Haystack makes it easy to integrate advanced search capabilities into your project.


In [None]:
%%bash

pip install haystack-ai
pip install "datasets>=2.6.1"
pip install "sentence-transformers>=2.2.0"


In [None]:
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()


In [None]:
from datasets import load_dataset
from haystack import Document

In [None]:
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")

In [None]:
dataset

In [None]:
for doc in dataset:
    print(doc["content"])

In [None]:
docs = [Document(content=doc["content"], meta=doc["meta"]) for doc in dataset]

In [None]:
docs

In [None]:
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()


In [None]:
docs_with_embeddings = doc_embedder.run(docs)

In [None]:
document_store.write_documents(docs_with_embeddings["documents"])

In [None]:
from haystack.components.embedders import SentenceTransformersTextEmbedder

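# Use the same model as the document embedder so that query and document
# vectors live in the same embedding space.
text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")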

In [None]:
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

retriever = InMemoryEmbeddingRetriever(document_store)
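
Conceptually, an embedding retriever ranks the stored documents by how similar their vectors are to the query vector. The sketch here illustrates that idea in plain Python with made-up three-dimensional vectors. The helper names (`cosine_similarity`, `top_k`) and the toy data are assumptions for illustration only; this is not Haystack's implementation, and the similarity function Haystack actually applies depends on how the document store is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_embedding, documents, k=2):
    """Return the k (score, text) pairs most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), text)
        for text, embedding in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy (text, embedding) pairs; real embeddings from all-MiniLM-L6-v2
# have far more dimensions than these hand-written vectors.
toy_documents = [
    ("The Colossus of Rhodes was a bronze statue.", [0.9, 0.1, 0.0]),
    ("The Great Pyramid of Giza is a tomb.", [0.1, 0.9, 0.0]),
    ("The Hanging Gardens were terraced gardens.", [0.0, 0.2, 0.9]),
]

# Hypothetical query vector that points mostly along the first axis.
query_embedding = [0.8, 0.2, 0.1]

results = top_k(query_embedding, toy_documents, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The document whose vector points in nearly the same direction as the query ranks first, which is the behaviour the retriever relies on when it selects context for the prompt.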

In [None]:
from haystack.components.builders import PromptBuilder

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{question}}
Answer:
"""

prompt_builder = PromptBuilder(template=template)
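
To see what the language model will eventually receive, it can help to render the same template by hand. `PromptBuilder` uses Jinja2 template syntax, so the sketch here calls Jinja2 directly with two made-up documents. The names `demo_template`, `demo_documents`, and `demo_prompt` are assumptions for illustration, and `SimpleNamespace` merely stands in for Haystack's `Document` objects.

```python
from types import SimpleNamespace

from jinja2 import Template

# Same template text as in the cell above, under a separate name so the
# notebook's own `template` variable is left untouched.
demo_template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{question}}
Answer:
"""

# Stand-ins for retrieved documents; only the `content` attribute is used.
demo_documents = [
    SimpleNamespace(content="The Colossus of Rhodes was a bronze statue."),
    SimpleNamespace(content="It stood near the harbour of Rhodes."),
]

demo_prompt = Template(demo_template).render(
    documents=demo_documents,
    question="What does Rhodes Statue look like?",
)
print(demo_prompt)
```

Each retrieved document contributes one line to the `Context:` section, followed by the question, so the quality of the answer depends directly on what the retriever returns.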

In [None]:
import os
from getpass import getpass
from haystack.components.generators import OpenAIGenerator

In [None]:
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")

In [None]:
generator = OpenAIGenerator(model="gpt-3.5-turbo")

In [None]:
from haystack import Pipeline

basic_rag_pipeline = Pipeline()

In [None]:
# Add components to your pipeline
basic_rag_pipeline.add_component("text_embedder", text_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", generator)

In [None]:
# Now, connect the components to each other
basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")


In [None]:
question = "What does Rhodes Statue look like?"

In [None]:
response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})

# Show the first generated reply for this question
print(response["llm"]["replies"][0])

In [None]:
examples = [
    "Where is Gardens of Babylon?",
    "Why did people build Great Pyramid of Giza?",
    "What does Rhodes Statue look like?",
    "Why did people visit the Temple of Artemis?",
    "What is the importance of Colossus of Rhodes?",
    "What happened to the Tomb of Mausolus?",
    "How did Colossus of Rhodes collapse?",
]

In [None]:
question = "Why did people visit the Temple of Artemis?"

In [None]:
response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})

In [None]:
response["llm"]["replies"][0]
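
Because every question needs the same input dictionary and the same lookup into the response, the two steps can be wrapped in a small helper. The function `ask` and the `EchoPipeline` stand-in are hypothetical names introduced for this sketch; the stand-in only mimics the shape of the pipeline's output so that the example runs without an API key.

```python
def ask(pipeline, question):
    """Run the pipeline for one question and return the first reply."""
    result = pipeline.run(
        {
            "text_embedder": {"text": question},
            "prompt_builder": {"question": question},
        }
    )
    return result["llm"]["replies"][0]


class EchoPipeline:
    """Hypothetical stand-in that returns a canned reply (no model call)."""

    def run(self, data):
        question = data["prompt_builder"]["question"]
        return {"llm": {"replies": [f"(stub reply to: {question})"]}}


stub_reply = ask(EchoPipeline(), "Where is Gardens of Babylon?")
print(stub_reply)
```

In the notebook itself, `ask(basic_rag_pipeline, q)` could be called for each `q` in `examples`; keep in mind that every call sends a request to the OpenAI API.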