# Build with Gemma and Haystack 2.x

<img src="https://huggingface.co/blog/assets/gemma/Gemma-logo-small.png" width="200" style="display:inline;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<img src="https://haystack.deepset.ai/images/haystack-ogimage.png" width="430" style="display:inline;">



We will see what we can build with the new [Google Gemma open models](https://blog.google/technology/developers/gemma-open-models/) and the [Haystack LLM framework](https://haystack.deepset.ai/).

## Installation

In [40]:
%pip install haystack-ai google-ai-haystack wikipedia rich python-dotenv huggingface_hub






## Authorization

- You need a Hugging Face account.
- You need to accept Google's conditions at https://huggingface.co/google/gemma-7b-it and wait for the authorization.

In [None]:
from huggingface_hub import login
login()

from dotenv import load_dotenv  # provided by the python-dotenv package
load_dotenv()
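
The `load_dotenv()` call above only helps if the key is actually present afterwards. Here is a minimal sketch of a fail-fast check, assuming the Google AI key lives in an environment variable named `GOOGLE_API_KEY` (adjust the name if your setup differs). `require_env` is a hypothetical helper written for this notebook, not part of Haystack or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of any library.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in the shell."
        )
    return value
```

Calling `require_env("GOOGLE_API_KEY")` right after `load_dotenv()` surfaces a missing key immediately instead of at the first model call.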

## Chat with Gemma (travel assistant) 🛩

```bash
curl \
  -H 'Content-Type: application/json' \
  -d '{"contents":[{"parts":[{"text":"Explain how AI works"}]}]}' \
  -X POST 'https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key=YOUR_API_KEY'
```


In [None]:
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack_integrations.components.generators.google_ai import GoogleAIGeminiGenerator, GoogleAIGeminiChatGenerator

from dotenv import load_dotenv
load_dotenv()

generator = GoogleAIGeminiChatGenerator(model="gemini-1.5-flash-latest")
generator

<haystack_integrations.components.generators.google_ai.chat.gemini.GoogleAIGeminiChatGenerator object at 0x0000024B02AE78F0>
Inputs:
  - messages: List[ChatMessage]
  - streaming_callback: Optional[Callable[]]
Outputs:
  - replies: List[ChatMessage]

In [34]:
res = generator.run(messages = [ChatMessage.from_user("What is the most interesting thing you know?")])
for answer in res["replies"]:
    print(answer.content)
    print("--")

As a large language model, I don't have personal interests or opinions. I'm designed to process information and respond to queries in a helpful and informative way. So, I can't really say what the "most interesting" thing I know is, because that would be subjective.

However, I can share some fascinating facts that I've learned:

* **The human brain contains more connections than there are stars in the Milky Way galaxy.** That's a lot of information processing power!
* **The Earth's atmosphere is constantly being bombarded by meteoroids.** Most are tiny, but some can be quite large.
* **There are more trees on Earth than stars in the Milky Way.** That's a lot of trees!
* **The universe is expanding at an accelerating rate.** This means that galaxies are moving further apart from each other over time.
* **There is a planet out there that is made entirely of diamonds.** It's called 55 Cancri e and it's twice the size of Earth.

These are just a few examples of the many interesting things

In [36]:
messages = [ChatMessage.from_system("You are a travel agent and I am a customer looking for a vacation. Can you help me?")]

while True:
  msg = input("Enter your message or Q to exit\n🧑 ")
  if msg=="Q":
    break
  messages.append(ChatMessage.from_user(msg))
  response = generator.run(messages=messages)
  assistant_resp = response['replies'][0]
  print("🤖 "+assistant_resp.content)
  messages.append(assistant_resp)

🤖 That sounds amazing!  To help me find the perfect Hawaii vacation for you, tell me a little more about what you're looking for. 

* **What kind of holiday experience are you hoping for?** Relaxing on the beach, exploring nature, trying new foods, or something else entirely? 
* **Who are you traveling with?**  A partner, friends, family, or solo?
* **What's your budget like?**  This will help me recommend the best options for your trip.
* **What kind of accommodation are you looking for?**  Luxury resort, cozy condo, or something else?
* **Are there any specific islands you're interested in?** Or are you open to exploring all of them?

The more information you give me, the better I can tailor your dream Hawaii vacation! 

🤖 Okay, so you're looking for a mix of relaxation and nightlife! That's a great combination.  To help narrow down the options, do you have any preference on the following:

* **Island:**  Do you have a specific island in mind, or are you open to exploring different o
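
The chat loop above appends every turn to `messages` and sends the whole list on each call, so the prompt keeps growing. A minimal sketch of one way to cap it: keep the system message and only the most recent turns. To stay self-contained it uses plain `(role, text)` tuples as a stand-in for `ChatMessage`; `trim_history` and `max_turns` are hypothetical names, not Haystack APIs.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text); a stand-in for Haystack's ChatMessage


def trim_history(messages: List[Message], max_turns: int = 6) -> List[Message]:
    """Keep all system messages plus the last `max_turns` other messages."""
    system = [m for m in messages if m[0] == "system"]
    others = [m for m in messages if m[0] != "system"]
    if max_turns <= 0:
        # others[-0:] would return everything, so handle this case explicitly.
        return system
    return system + others[-max_turns:]
```

In the loop, the same idea would be applied to the list just before `generator.run(...)`. Dropping old turns trades memory of the early conversation for a bounded prompt size.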

## RAG with Gemma (about Rock music) 🎸

### Load data from Wikipedia

In [38]:
favourite_bands="""Audioslave
Blink-182
Dire Straits
Evanescence
Green Day
Muse (band)
Nirvana (band)
Sum 41
The Cure
The Smiths""".split("\n")

In [41]:
from IPython.display import Image
from pprint import pprint
import rich
import random

In [None]:
import wikipedia
from haystack.dataclasses import Document

raw_docs=[]

for title in favourite_bands:
    page = wikipedia.page(title=title, auto_suggest=False)
    doc = Document(content=page.content, meta={"title": page.title, "url":page.url})
    raw_docs.append(doc)

### Indexing Pipeline

In [43]:
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy

In [44]:
document_store = InMemoryDocumentStore()

In [45]:
indexing = Pipeline()
indexing.add_component("cleaner", DocumentCleaner())
indexing.add_component("splitter", DocumentSplitter(split_by='sentence', split_length=2))
indexing.add_component("writer", DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE))

indexing.connect("cleaner", "splitter")
indexing.connect("splitter", "writer")

<haystack.core.pipeline.pipeline.Pipeline object at 0x0000024B1F0A7530>
🚅 Components
  - cleaner: DocumentCleaner
  - splitter: DocumentSplitter
  - writer: DocumentWriter
🛤️ Connections
  - cleaner.documents -> splitter.documents (List[Document])
  - splitter.documents -> writer.documents (List[Document])

In [46]:
indexing.run({"cleaner":{"documents":raw_docs}})

{'writer': {'documents_written': 1610}}

In [47]:
document_store.filter_documents()[0].meta

{'title': 'Audioslave',
 'url': 'https://en.wikipedia.org/wiki/Audioslave',
 'source_id': 'cf53c7ec310b6c605f6528b4edb9698b78896db7725e19e65c86ee6a871d5e10',
 'page_number': 1,
 'split_id': 0,
 'split_idx_start': 0}
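
The splitter is configured with `split_by='sentence'` and `split_length=2`, which is why ten Wikipedia articles turn into 1610 small documents. The sketch below illustrates the grouping idea only. It assumes a naive rule (a sentence ends at `.`, `!` or `?` followed by whitespace), which is not how Haystack detects sentences; `split_into_chunks` is a hypothetical name.

```python
import re
from typing import List


def split_into_chunks(text: str, split_length: int = 2) -> List[str]:
    """Group sentences into chunks of `split_length` sentences each.

    Naive illustration: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i : i + split_length])
        for i in range(0, len(sentences), split_length)
    ]
```

Small chunks keep each retrieved passage focused, at the cost of losing context that spans more than two sentences.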

### RAG Pipeline

In [None]:
from haystack.components.builders import PromptBuilder

prompt_template = """
<start_of_turn>user
Using the information contained in the context, give a comprehensive answer to the question.
If the answer is contained in the context, also report the source URL.
If the answer cannot be deduced from the context, do not give an answer.

Context:
  {% for doc in documents %}
  {{ doc.content }} URL:{{ doc.meta['url'] }}
  {% endfor %};
  Question: {{query}}<end_of_turn>

<start_of_turn>model
"""
prompt_builder = PromptBuilder(template=prompt_template)

In [110]:
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

generator = GoogleAIGeminiGenerator(model="gemini-1.5-flash-latest")

rag = Pipeline()
rag.add_component("retriever", InMemoryBM25Retriever(document_store=document_store, top_k=5))
rag.add_component("prompt_builder", prompt_builder)
rag.add_component("llm", generator)

rag.connect("retriever.documents", "prompt_builder.documents")
rag.connect("prompt_builder", "llm")

<haystack.core.pipeline.pipeline.Pipeline object at 0x0000024B02F0F3E0>
🚅 Components
  - retriever: InMemoryBM25Retriever
  - prompt_builder: PromptBuilder
  - llm: GoogleAIGeminiGenerator
🛤️ Connections
  - retriever.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.parts (str)
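
The retriever in this pipeline ranks documents by BM25, a keyword-based score. To give a feel for what that means, here is a minimal sketch of Okapi BM25 over whitespace tokens. It is an illustration under simple assumptions (lowercased whitespace tokenization, `k1=1.5`, `b=0.75`, an IDF variant with `+1` inside the log); the tokenization and defaults used by `InMemoryBM25Retriever` may differ. `bm25_scores` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with an Okapi BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            # The +1 inside the log keeps the IDF non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Because the score depends only on shared words, a query with no overlapping rare terms still returns the top 5 documents, just with weak matches. That is worth keeping in mind when reading the answers below.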

### Let's ask some questions!

In [139]:
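def get_generative_answer(query, rag_model=rag):
  """Run the RAG pipeline for `query`, print the first reply and return it."""
  # The same query feeds both the retriever and the prompt template.
  results = rag_model.run({
      "retriever": {"query": query},
      "prompt_builder": {"query": query}
    }
  )

  answer = results["llm"]["replies"][0]
  rich.print(answer)
  return answer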

In [140]:
get_generative_answer("Audioslave was formed by members of two iconic bands. Can you name the bands and discuss the sound of Audioslave in comparison?")

"Audioslave was formed by members of Soundgarden and Rage Against the Machine. \n\nAudioslave's music drew heavily from the musical influences of its members' previous bands, incorporating the grunge sound of Soundgarden and the funk metal sound of Rage Against the Machine.  The band also drew inspiration from 1970s hard rock and heavy metal bands like Led Zeppelin and Black Sabbath. \n\n[Source: https://en.wikipedia.org/wiki/Audioslave] \n"

In [113]:
nice_questions_to_try="""What was the original name of Sum 41?
What was the title of Nirvana's breakthrough album released in 1991?
Green Day's "American Idiot" is a rock opera. What's the story it tells?
Audioslave was formed by members of two iconic bands. Can you name the bands and discuss the sound of Audioslave in comparison?
Evanescence's "Bring Me to Life" features a male vocalist. Who is he, and how does his voice complement Amy Lee's in the song?
What is Sum 41's debut studio album called?
Who was the lead singer of Audioslave?
When was Nirvana's first studio album, "Bleach," released?
Were the Smiths an influential band?
What is the name of Evanescence's debut album?
Which band was Morrissey the lead singer of before he formed The Smiths?
Dire Straits' hit song "Money for Nothing" features a guest vocal by a famous artist. Who is this artist?
Who played the song "Like a stone"?""".split('\n')

In [114]:
q=random.choice(nice_questions_to_try)
print(q)
get_generative_answer(q)

What is Sum 41's debut studio album called?


'Sum 41\'s debut studio album is called **All Killer No Filler**. \n\nThis information is found in the context: "The band released its debut album, All Killer No Filler, in 2001." - [https://en.wikipedia.org/wiki/Sum_41](https://en.wikipedia.org/wiki/Sum_41) \n'

In [115]:
get_generative_answer("What type of music does Coldplay play?")

"The provided context does not contain information about Coldplay's musical style. \n"

In [104]:
get_generative_answer("What is the most interesting thing you know?")

'I\'m sorry, but the context does not contain any information about what is considered the "most interesting" thing. It only provides details about specific bands and their music. \n'

In [81]:
critic_prompt_template = """
<start_of_turn>user
Decide if the following answer is consistent with the corresponding sources. Note that 
consistency means all information in the answer is supported by the sources.

Sources: [
  {% for doc in documents %}
  {{ doc.content }} URL:{{ doc.meta['url'] }}
  {% endfor %};
]
Answer: [{{answer}}]

Explain your reasoning step by step then answer (yes or no) the question.<end_of_turn>

<start_of_turn>model
"""
critic_prompt_builder = PromptBuilder(template=critic_prompt_template)

In [None]:
generator = GoogleAIGeminiGenerator(model="gemini-1.5-flash-latest")

critic_rag = Pipeline()
critic_rag.add_component("retriever", InMemoryBM25Retriever(document_store=document_store, top_k=5))
critic_rag.add_component("prompt_builder", critic_prompt_builder)
critic_rag.add_component("llm", generator)

critic_rag.connect("retriever.documents", "prompt_builder.documents")
critic_rag.connect("prompt_builder", "llm")

<haystack.core.pipeline.pipeline.Pipeline object at 0x0000024B20358140>
🚅 Components
  - retriever: InMemoryBM25Retriever
  - prompt_builder: PromptBuilder
  - llm: GoogleAIGeminiGenerator
🛤️ Connections
  - retriever.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.parts (str)

In [103]:
retriever = InMemoryBM25Retriever(document_store=document_store, top_k=5)
for doc in retriever.run(query="What is the most interesting thing you know?")['documents']:
    print(f"{doc.meta['url']}\t{doc.content}")


https://en.wikipedia.org/wiki/Green_Day	 You won't know what hit you. American Idiot knows no limits—it's a global knockout.
https://en.wikipedia.org/wiki/Evanescence	 The label flew them to New York, and told them that they loved their different sound and thought they had potential, but "we don't really totally know what to do with you", Lee recalled. They were then told, "if you were this good while distracted by school and all this other stuff, how good will you be if we put you in an environment where you have nothing to do but write and be influenced by your surroundings, like in Los Angeles.
https://en.wikipedia.org/wiki/The_Smiths	 In October, Marr said on BBC Radio 5 Live: "Stranger things have happened so, you know, who knows? ..
https://en.wikipedia.org/wiki/Sum_41	 The episode included an interview with program host Nic Harcourt.
"Baby You Don't Wanna Know" was released as the album's second single.


In [141]:
def get_critic_answer(query, rag_model=rag, critic_model=critic_rag):
  """Answer `query` with `rag_model`, then ask `critic_model` to check the answer."""
  rich.print("Model answer: ")
  answer = get_generative_answer(query, rag_model)

  # The critic retrieves with the same query, so it sees the same BM25 sources.
  results = critic_model.run({
      "retriever": {"query": query},
      "prompt_builder": {"answer": answer}
    }
  )
  rich.print("Critic answer: ")
  verdict = results["llm"]["replies"][0]
  rich.print(verdict)


q=random.choice(nice_questions_to_try)
print(q)
get_critic_answer(q)


Which band was Morrissey the lead singer of before he formed The Smiths?
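
The critic prompt asks the model to reason step by step and then answer yes or no, so the reply is free text. If you want to use the verdict programmatically, it has to be extracted. A minimal sketch, assuming the verdict is the last standalone "yes" or "no" in the reply; `parse_verdict` is a hypothetical helper and the heuristic can be fooled by replies that mention "no" after the verdict.

```python
import re
from typing import Optional


def parse_verdict(reply: str) -> Optional[bool]:
    """Return True for 'yes', False for 'no', None if no verdict is found.

    The last standalone yes/no wins, because the prompt asks for
    reasoning first and the verdict afterwards.
    """
    matches = re.findall(r"\b(yes|no)\b", reply.lower())
    if not matches:
        return None
    return matches[-1] == "yes"
```

Returning `None` rather than guessing makes it easy to flag replies that need a manual look.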


In [142]:
get_critic_answer("What is the most interesting thing you know?")

### Unsafe prompt

In [148]:
from haystack.components.builders import PromptBuilder

unsafe_prompt_template = """
<start_of_turn>user
Using the information contained in the context, give a comprehensive answer to the question.
If the answer is contained in the context, also report the source URL.
If you don't know the answer, tell me a joke instead.


Context:
  {% for doc in documents %}
  {{ doc.content }} URL:{{ doc.meta['url'] }}
  {% endfor %};
  Question: {{query}}<end_of_turn>

<start_of_turn>model
"""
unsafe_prompt_builder = PromptBuilder(template=unsafe_prompt_template)
unsafe_generator = GoogleAIGeminiGenerator(model="gemini-1.5-flash-latest")

unsafe_rag = Pipeline()
unsafe_rag.add_component("retriever", InMemoryBM25Retriever(document_store=document_store, top_k=5))
unsafe_rag.add_component("prompt_builder", unsafe_prompt_builder)
unsafe_rag.add_component("llm", unsafe_generator)

unsafe_rag.connect("retriever.documents", "prompt_builder.documents")
unsafe_rag.connect("prompt_builder", "llm")

<haystack.core.pipeline.pipeline.Pipeline object at 0x0000024B208ED430>
🚅 Components
  - retriever: InMemoryBM25Retriever
  - prompt_builder: PromptBuilder
  - llm: GoogleAIGeminiGenerator
🛤️ Connections
  - retriever.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.parts (str)

In [149]:
get_generative_answer("What is the most interesting thing you know?", rag_model=unsafe_rag)

'Why don\'t scientists trust atoms? Because they make up everything! \n\nI can\'t answer your question based on the provided context. The information focuses on specific bands and their music, but doesn\'t contain anything about "the most interesting thing" in a general sense. \n'

In [150]:
get_critic_answer("What is the most interesting thing you know?", rag_model=unsafe_rag, critic_model=critic_rag)