# Lecture 10-3

# Messing around with LLMs with Haystack

Step 0: Get OpenAI API Key

To get an OpenAI API key, you need to sign up for an OpenAI account and subscribe to their API service. 

- Go to the OpenAI developer platform: https://platform.openai.com/
- Click "Sign Up" to create a new account, or "Log In" if you already have one.
- Click the Settings gear.
- Navigate to "API Keys".
- You may need to set up a billing plan first. If you stick with `gpt-3.5-turbo`, usage is cheap: $5 of credits is more than enough for experimentation.
- Once billing is set up, return to the API keys section of your OpenAI account.
- Click "Create new secret key" to generate a new API key.
- Copy the generated API key.
- Create a `.env` file in your project directory.
- In the `.env` file, enter:

```
OPENAI_API_KEY=text_of_super_secret_api_key
```

- You won't be able to see the API key again once you close the dialog. If you lose it before storing it, you'll need to create a new one.
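To make the role of the `.env` file concrete, here is a minimal sketch of how a `KEY=VALUE` line becomes a setting that Python can read. The `read_env_file` helper is hypothetical and only approximates what `python-dotenv` does (the real library handles quoting, escapes, and variable expansion more carefully); it is shown on a throwaway file so no real key is involved.

```python
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines; a simplified stand-in for python-dotenv."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demonstrate on a temporary file holding the placeholder value from above.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("OPENAI_API_KEY=text_of_super_secret_api_key\n", encoding="utf-8")
    settings = read_env_file(env_path)

# Report only whether the key was found; avoid printing the key itself.
print("OPENAI_API_KEY" in settings)
```

In the notebook itself, `load_dotenv()` does this job and places the values in `os.environ`.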

Step 1: Create a New Virtual Environment

- Open your terminal
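- Create a new virtual environment and activate it (the activation command below is for Windows; on macOS or Linux, run `source haystackai/bin/activate` instead):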

```
python -m venv haystackai
haystackai\Scripts\activate
```

Remember to deactivate your virtual environment when you are finished:

```
deactivate
```
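While the environment is still active, one way to confirm that Python is really running from it is to compare `sys.prefix` with `sys.base_prefix`. This is a small sketch that assumes the environment was created with `python -m venv` as above; the function name `in_virtual_environment` is made up for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python install.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_environment())
print("interpreter:", sys.executable)
```

Running the same two lines in a notebook cell shows which interpreter the Jupyter kernel is using.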


The notebook below follows this Haystack tutorial: https://haystack.deepset.ai/tutorials/27_first_rag_pipeline

Once you've activated the `haystackai` environment, install Jupyter, Haystack, and the other necessary packages into it:

```
pip install jupyter
pip install openai python-dotenv
pip install torch
pip install haystack-ai
pip install "datasets>=2.6.1"
pip install "sentence-transformers>=2.2.0"
```
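After installing, a quick check like the following can show whether the current interpreter can find each package. It is a sketch, not part of the tutorial: `missing_modules` is a hypothetical helper, and the list uses import names, which differ from some of the pip package names.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names that the current interpreter cannot find."""
    return [name for name in module_names if find_spec(name) is None]


# Import names, not pip names:
# haystack-ai -> haystack, python-dotenv -> dotenv,
# sentence-transformers -> sentence_transformers.
required = ["haystack", "openai", "dotenv", "torch", "datasets", "sentence_transformers"]
print("missing:", missing_modules(required))
```

An empty list means every package is visible to this interpreter; any name that appears still needs to be installed in the active environment.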

Open the notebook

In [1]:
# cell 1: lets Haystack know you are running a tutorial
from haystack.telemetry import tutorial_running

tutorial_running(27)


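If this cell fails with `ModuleNotFoundError: No module named 'haystack'`, the notebook kernel is most likely not running in the virtual environment where the packages were installed.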

In [None]:
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()


In [None]:
from datasets import load_dataset
from haystack import Document

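# Download the Seven Wonders dataset from the Hugging Face Hub
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
# Wrap each row in a Haystack Document
docs = [Document(content=doc["content"], meta=doc["meta"]) for doc in dataset]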


In [None]:
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
# Load the embedding model (downloaded on first use)
doc_embedder.warm_up()


In [None]:
# Embed every document, then write the embedded documents to the store
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])


In [None]:
from haystack.components.embedders import SentenceTransformersTextEmbedder

# Use the same model as the document embedder so query and document vectors are comparable
text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")


In [None]:
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

retriever = InMemoryEmbeddingRetriever(document_store)
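To give a feel for what embedding retrieval does, here is a toy sketch that ranks documents by cosine similarity to a query vector. Everything in it is an assumption for illustration: the three-dimensional vectors are made up (real embeddings have hundreds of dimensions), `top_k` is a hypothetical helper, and cosine similarity is one common scoring choice rather than necessarily the one the in-memory store applies by default.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the k document ids whose vectors score highest against the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Made-up vectors standing in for document embeddings.
toy_docs = {
    "colossus": [0.9, 0.1, 0.0],
    "pyramid": [0.1, 0.9, 0.0],
    "gardens": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]

print(top_k(toy_query, toy_docs, k=2))  # ['colossus', 'pyramid']
```

The query vector points mostly in the same direction as the "colossus" vector, so that document ranks first.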


In [None]:
from haystack.components.builders import PromptBuilder

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{question}}
Answer:
"""

prompt_builder = PromptBuilder(template=template)
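The template is filled in with the retrieved documents and the question before it reaches the model. The sketch below imitates that result with plain Python string formatting instead of Jinja, so the exact whitespace may differ from what `PromptBuilder` produces; `render_prompt` and the two sample passages are hypothetical.

```python
def render_prompt(contents, question):
    """Mimic the template's output with plain string formatting (no Jinja)."""
    context = "\n".join(f"    {text}" for text in contents)
    return (
        "Given the following information, answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up passages standing in for the document.content values.
sample_contents = [
    "The Colossus of Rhodes was a statue of the Greek sun-god Helios.",
    "It was erected in the city of Rhodes.",
]
prompt = render_prompt(sample_contents, "What does Rhodes Statue look like?")
print(prompt)
```

The prompt ends with `Answer:` so that the model continues the text with its answer.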


In [None]:
import os
from getpass import getpass

from dotenv import load_dotenv
from haystack.components.generators import OpenAIGenerator

# Load environment variables from the .env file
load_dotenv()

# Fall back to an interactive prompt if the key was not found
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")
generator = OpenAIGenerator(model="gpt-3.5-turbo")


In [None]:
from haystack import Pipeline

basic_rag_pipeline = Pipeline()
# Add components to your pipeline
basic_rag_pipeline.add_component("text_embedder", text_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", generator)

# Now, connect the components to each other
basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
basic_rag_pipeline.connect("prompt_builder", "llm")
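The `connect` calls describe how each component's output feeds the next component's input. As a rough mental model only, the same flow can be written as plain function calls. Every function below is a toy stand-in invented for this sketch: the "embedding" just counts letters, and the "LLM" makes no API call.

```python
def embed_text(text):
    # Toy "embedding": counts of three letters; a real embedder uses a neural model.
    return {"embedding": [text.lower().count(ch) for ch in "aeo"]}


def retrieve(query_embedding, store):
    # Return the stored document with the largest dot product against the query.
    def score(doc):
        return sum(q * d for q, d in zip(query_embedding, doc["embedding"]))

    return {"documents": [max(store, key=score)]}


def build_prompt(documents, question):
    context = " ".join(doc["content"] for doc in documents)
    return {"prompt": f"Context: {context}\nQuestion: {question}\nAnswer:"}


def fake_llm(prompt):
    # Stand-in for the generator: it only reports the prompt length.
    return {"replies": [f"(a model would answer a {len(prompt)}-character prompt here)"]}


def run_chain(question, store):
    embedded = embed_text(question)
    # text_embedder.embedding -> retriever.query_embedding
    retrieved = retrieve(embedded["embedding"], store)
    # retriever -> prompt_builder.documents
    built = build_prompt(retrieved["documents"], question)
    # prompt_builder -> llm
    return fake_llm(built["prompt"])


toy_store = [
    {"content": text, "embedding": embed_text(text)["embedding"]}
    for text in ["Colossus of Rhodes", "Great Pyramid of Giza"]
]
result = run_chain("What does Rhodes Statue look like?", toy_store)
print(result["replies"][0])
```

Note that the question enters the chain twice, once for embedding and once for the prompt, which is why the real pipeline is run with inputs for both `text_embedder` and `prompt_builder`.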


In [None]:
question = "What does Rhodes Statue look like?"

response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})

print(response["llm"]["replies"][0])


In [None]:
question = "When did construction for the Rhodes statue begin?"

response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})

print(response["llm"]["replies"][0])


In [None]:
examples = [
    "Where is Gardens of Babylon?",
    "Why did people build Great Pyramid of Giza?",
    "What is UCLA?"
]


In [None]:
for question in examples:
    print(question)
    response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})
    print(response["llm"]["replies"][0])
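Since every cell above repeats the same `run` call, a small helper can tidy things up. This is an optional sketch: `ask` is a hypothetical name, and `EchoPipeline` is a made-up stand-in with the same `run()` input and output shape, included only so the example runs without an API key.

```python
def ask(pipeline, question):
    """Run one question through a RAG pipeline and return the first reply."""
    result = pipeline.run(
        {"text_embedder": {"text": question}, "prompt_builder": {"question": question}}
    )
    return result["llm"]["replies"][0]


class EchoPipeline:
    """Made-up stand-in that mirrors the run() shape used above."""

    def run(self, data):
        question = data["prompt_builder"]["question"]
        return {"llm": {"replies": [f"echo: {question}"]}}


print(ask(EchoPipeline(), "What is UCLA?"))  # echo: What is UCLA?
```

In the notebook, the same helper would be called as `ask(basic_rag_pipeline, question)`.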
