##### Copyright 2024 Google LLC.

In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Using Gemma with LangChain
This notebook demonstrates how to use the Gemma (2B) model with the LangChain library.
<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Gemma/Using_Gemma_with_LangChain.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

## Setup

### Select the Colab runtime
To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the Gemma model. In this case, you can use a T4 GPU:

1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.
2. Select **Change runtime type**.
3. Under **Hardware accelerator**, select **T4 GPU**.

### Gemma setup

To complete this tutorial, you'll first need to complete the setup instructions at [Gemma setup](https://ai.google.dev/gemma/docs/setup). The Gemma setup instructions show you how to do the following:

* Get access to Gemma on kaggle.com.
* Select a Colab runtime with sufficient resources to run
  the Gemma 2B model.
* Generate and configure a Kaggle username and an API key as Colab secrets.

After you've completed the Gemma setup, move on to the next section, where you'll set environment variables for your Colab environment.


### Configure your credentials

Add your Kaggle credentials to the Colab Secrets manager to store them securely.

1. Open your Google Colab notebook and click on the 🔑 Secrets tab in the left panel. <img src="https://storage.googleapis.com/generativeai-downloads/images/secrets.jpg" alt="The Secrets tab is found on the left panel." width=50%>
2. Create new secrets: `KAGGLE_USERNAME` and `KAGGLE_KEY`
3. Copy/paste your username into `KAGGLE_USERNAME`
4. Copy/paste your key into `KAGGLE_KEY`
5. Toggle the buttons on the left to allow notebook access to the secrets.


In [None]:
import os
from google.colab import userdata

# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env
# vars as appropriate for your system.
os.environ["KAGGLE_USERNAME"] = userdata.get("KAGGLE_USERNAME")
os.environ["KAGGLE_KEY"] = userdata.get("KAGGLE_KEY")
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.0"
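
Outside Colab, `userdata.get` is not available. One possible alternative is to read the credentials from a Kaggle API token file. The sketch below assumes the token is a JSON file with `username` and `key` fields, as in the `kaggle.json` file downloaded from the Kaggle account settings. The helper name `load_kaggle_credentials` and the default path are illustrative, not part of any library.

```python
import json
import os
from pathlib import Path


def load_kaggle_credentials(path="~/.kaggle/kaggle.json", environ=None):
    """Copy Kaggle credentials from a token file into environment variables.

    Hypothetical helper: assumes a JSON file with "username" and "key" fields.
    Pass a plain dict as `environ` to avoid touching the real environment.
    """
    environ = os.environ if environ is None else environ
    token = json.loads(Path(path).expanduser().read_text())
    environ["KAGGLE_USERNAME"] = token["username"]
    environ["KAGGLE_KEY"] = token["key"]
    return environ
```

Calling `load_kaggle_credentials()` with no arguments would populate `os.environ` from the default path, provided that file exists.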

### Install dependencies
Run the cell below to install all the required dependencies.

In [None]:
!pip install -q -U tensorflow
!pip install -q -U keras keras-nlp

### Gemma

**About Gemma**

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.

**Prompt formatting**

Instruction-tuned (IT) models are trained with a specific formatter that annotates all instruction tuning examples with extra information, both at training and inference time. The formatter has two purposes:

* Indicating roles in a conversation, such as the system, user, or assistant roles.
* Delineating turns in a conversation, especially in a multi-turn conversation.

Below are the control tokens used by Gemma and their use cases. Note that the control tokens are reserved in and specific to the Gemma tokenizer.

* Token to indicate a user turn: `user`
* Token to indicate a model turn: `model`
* Token to indicate the beginning of dialogue turn: `<start_of_turn>`
* Token to indicate the end of dialogue turn: `<end_of_turn>`

Here's the [official documentation](https://ai.google.dev/gemma/docs/formatting) regarding prompting instruction-tuned models.
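
To make the token layout concrete, here is a minimal sketch that renders a conversation with the control tokens listed above. The function name `format_gemma_prompt` is hypothetical. In this notebook the formatting is handled by the LangChain wrapper, so the code is only an illustration of the resulting string.

```python
START_OF_TURN = "<start_of_turn>"
END_OF_TURN = "<end_of_turn>"


def format_gemma_prompt(turns):
    """Render (role, text) pairs with Gemma's control tokens.

    The result ends with an open model turn so that the model
    generates the next reply.
    """
    parts = []
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"{START_OF_TURN}{role}\n{text}{END_OF_TURN}\n")
    # Leave the final model turn open for generation
    parts.append(f"{START_OF_TURN}model\n")
    return "".join(parts)


print(format_gemma_prompt([("user", "What is LangChain?")]))
```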

In [None]:
!pip install -q langchain langchain-google-vertexai
!pip install -q langchainhub langchain-chroma langchain_community langchain-huggingface
!pip install -q sentence-transformers==2.2.2
!pip install -q -U huggingface_hub

In [None]:
# Load Gemma using LangChain library
from langchain_google_vertexai import GemmaChatLocalKaggle

keras_backend: str = "jax"
model_name = "gemma2_instruct_2b_en"
llm = GemmaChatLocalKaggle(
    model_name=model_name,
    model=model_name,
    keras_backend=keras_backend,
    max_tokens=1024,
)

# QA with RAG

Retrieval-Augmented Generation (RAG) is a key advancement for Large Language Models (LLMs) for a couple of reasons:

- Boosts Factual Accuracy: LLMs are trained on massive amounts of text data, but this data can be outdated or incomplete. RAG tackles this by allowing the LLM to access and incorporate relevant information from external sources during generation. This external fact-checking helps reduce made-up information, or "hallucinations," in the LLM's outputs, making them more trustworthy.

- Enhances Relevance and Depth: RAG provides LLMs with a wider range of knowledge to draw on. When responding to a prompt or question, the LLM can not only use its internal knowledge but also supplement it with specific details retrieved from external data sources. This leads to more comprehensive and informative responses that are precisely tailored to the situation.

Overall, RAG elevates the credibility and usefulness of LLMs by ensuring their outputs are grounded in factual information and highly relevant to the context. This is crucial for applications like chatbots, educational tools, and even creative writing where factual grounding and rich detail are important.
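
The retrieve-then-generate idea can be sketched without any external library. The toy example below ranks documents by bag-of-words cosine similarity and pastes the best match into a prompt. It is a simplified stand-in: the notebook itself uses learned embeddings and a vector store, and the names `toy_documents`, `retrieve`, and `augmented_prompt` are made up for this illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphanumeric tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    query = bag_of_words(question)
    ranked = sorted(
        documents, key=lambda d: cosine(query, bag_of_words(d)), reverse=True
    )
    return ranked[:k]


toy_documents = [
    "Transformers are built around the attention mechanism.",
    "Chroma is a vector store for embeddings.",
    "Warsaw is the capital of Poland.",
]
question = "What are transformers?"
context = "\n\n".join(retrieve(question, toy_documents, k=1))
# The retrieved context is placed in the prompt next to the question
augmented_prompt = f"Question: {question}\nContext: {context}\nAnswer:"
print(augmented_prompt)
```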

In [None]:
import bs4
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_core.output_parsers import BaseTransformOutputParser
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter



In [None]:
# Helpers
class GemmaOutputParser(BaseTransformOutputParser[str]):
    """Output parser that extracts the generated part of the
    LLM response."""

    @classmethod
    def is_lc_serializable(cls) -> bool:
        """Return whether this class is serializable."""
        return True

    @property
    def _type(self) -> str:
        """Return the output parser type for serialization."""
        return "gemma_2_parser"

    def parse(self, text: str) -> str:
        """Return the text that follows the last model turn marker."""
        model_start_token = "<start_of_turn>model\n"
        idx = text.rfind(model_start_token)
        return text[idx + len(model_start_token) :] if idx > -1 else ""


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

## Creating vector store

You will use [this blog post](https://developers.google.com/machine-learning/resources/intro-llms) as the data source for this application. In this section you will fetch the data, chunk it, and load it into a vector store.

In [None]:
# Load, chunk and index the contents of the blog.
loader = WebBaseLoader(
    web_paths=("https://developers.google.com/machine-learning/resources/intro-llms",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(name=("h3", "p"))
    ),
)
docs = loader.load()

In [None]:
# Create a vector store with all the docs
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=HuggingFaceEmbeddings())




In [None]:
# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()
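
The `chunk_size=500` and `chunk_overlap=100` arguments used when building the vector store control how large each chunk is and how much text adjacent chunks share. The sketch below shows the overlap idea with plain fixed-size character windows. It is a deliberate simplification: `RecursiveCharacterTextSplitter` also tries to split on separators such as paragraph breaks, which this hypothetical `sliding_chunks` helper does not attempt.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: no separator-aware splitting.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if not text:
        return []
    # Each new window starts this many characters after the previous one
    step = chunk_size - chunk_overlap
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i : i + chunk_size] for i in range(0, last_start, step)]


for chunk in sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1):
    print(chunk)
```

With an overlap, a sentence that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.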

## Creating a RAG Chain

Here's a resource to learn more about the LCEL paradigm: [the official documentation](https://python.langchain.com/v0.1/docs/expression_language/why/)
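
The core idea behind the `|` syntax is left-to-right composition of steps. The toy class below imitates that behavior with Python's `__or__` operator. It is only an illustration of the pattern, not LangChain's implementation, and the `Step` class is invented for this sketch.

```python
class Step:
    """Wrap a function so that steps can be chained with `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose left to right: the output of self feeds into other
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


toy_chain = Step(str.strip) | Step(str.upper) | Step(lambda s: s + "!")
print(toy_chain.invoke("  hello "))  # prints HELLO!
```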

In [None]:
# Let's load a predefined prompt for this task
prompt = hub.pull("rlm/rag-prompt")
print(f"Prompt:\n\n{prompt.messages[0].prompt.template}")

Prompt:

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:


In [None]:
# Create an actual chain

rag_chain = (
    # First you need to retrieve the documents that are relevant to the
    # given query
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    # The output is passed to the prompt and fills fields like `{question}`
    # and `{context}`
    | prompt
    # The whole prompt with all the information is passed to the LLM
    | llm
    # The answer of the LLM is parsed by the class defined above
    | GemmaOutputParser()
)

## Let's try it out!

In [None]:
rag_chain.invoke("What are transformers?")

'Transformers are a state-of-the-art architecture for language modeling, designed around the concept of attention. They consist of an encoder and a decoder that convert input text into useful text, focusing on the most important parts of the input. Transformers are highly effective at various language tasks, including translation and text generation. \n<end_of_turn>'
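
Note that the answer above still ends with the `<end_of_turn>` control token, because `GemmaOutputParser.parse` only cuts off everything before the last model turn. If a clean string is needed, a post-processing step along these lines could be used. The function `extract_model_reply` is a hypothetical standalone helper, not part of LangChain.

```python
MODEL_TURN = "<start_of_turn>model\n"
END_OF_TURN = "<end_of_turn>"


def extract_model_reply(text):
    """Return the last model turn without the trailing end-of-turn token."""
    idx = text.rfind(MODEL_TURN)
    if idx == -1:
        return ""
    reply = text[idx + len(MODEL_TURN) :].strip()
    if reply.endswith(END_OF_TURN):
        reply = reply[: -len(END_OF_TURN)].rstrip()
    return reply


raw = (
    "<start_of_turn>user\nHi<end_of_turn>\n"
    "<start_of_turn>model\nHello there. \n<end_of_turn>"
)
print(extract_model_reply(raw))  # prints Hello there.
```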

# Extracting structured output (JSON)

Traditionally, information extraction involved complex systems with hand-written rules and custom models, which were costly to maintain.

Large Language Models (LLMs) offer a new approach. They can be instructed and given examples to perform specific extraction tasks, making them quicker to adapt and use.

The following section shows you how to use Gemma to extract information from a query using LangChain.

In [None]:
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field

## Implementing required steps

In [None]:
# Define the schema of the data you want to extract
class Person(BaseModel):
    name: str = Field(description="person's name")
    age: str = Field(description="person's age")

In [None]:
# Helpers
def get_data_schema(pydantic):
    """A helper function that renders a JSON-like schema template
    that the model can fill in with information"""
    # Collect (field name, description) pairs from the Pydantic schema
    properties = pydantic.schema()["properties"]
    fields = [(k, v["description"]) for k, v in properties.items()]
    body = "\n".join(f"  {name}: <{desc}>" for (name, desc) in fields)
    return "{\n" + body + "\n}"


print(f"Schema passed to the LLM:\n{get_data_schema(Person)}")

Schema passed to the LLM:
{
  name: <person's name>
  age: <person's age>
}


In [None]:
# Define a prompt for the task explaining what needs to be done
prompt_template = """Extract data from the query to JSON format.
Required schema:\n{format_instructions}. Do not add new keys.
\n{query}\n"""

format_instructions = get_data_schema(Person)

prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["query"],
    partial_variables={"format_instructions": format_instructions},
)

In [None]:
# Let's create a chain that will tie all the parts together
chain = prompt | llm | GemmaOutputParser() | JsonOutputParser()

## Let's try it out!

In [None]:
query = "Kate is 26 years old and lives in Warsaw."
chain.invoke({"query": query})

{'name': 'Kate', 'age': 26}

In [None]:
query = """In the midst of London's bustling streets, 33-year-old Ben
weaved between double-decker buses. Fueled by a quick bite between
museums, he was on a mission to absorb every corner of the city.
This trip was a dream come true, and Ben couldn't wait to unearth
the next hidden gem waiting to be discovered."""
chain.invoke({"query": query})

{'name': 'Ben', 'age': 33}
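
The chain above returns whatever JSON the model produced, without checking it against `Person`. For example, `age` is declared as `str`, while both outputs contain integers. The sketch below shows one way a reply could be parsed and normalized by hand. Both helpers are hypothetical, the choice to coerce `age` to `int` is an assumption made for this illustration, and the Markdown fence is built from a variable only to keep this example readable.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, which models often wrap JSON in
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_reply(reply):
    """Parse a JSON object from a reply, tolerating a Markdown code fence."""
    match = FENCED_JSON.search(reply)
    payload = match.group(1) if match else reply
    return json.loads(payload.strip())


def check_person(record):
    """Reject missing or extra keys and normalize the field types."""
    expected = {"name", "age"}
    if set(record) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(record)}")
    return {"name": str(record["name"]), "age": int(record["age"])}


reply = FENCE + 'json\n{"name": "Kate", "age": "26"}\n' + FENCE
print(check_person(parse_json_reply(reply)))
```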

# Using tools

There are two main reasons why tools are beneficial for LLMs (Large Language Models):

- Enhanced Capabilities: LLMs are incredibly knowledgeable, but they can't access and process information in real-time the way a human can. Tools like search engines and databases provide LLMs with a way to find and integrate up-to-date information, effectively extending their knowledge and abilities. For instance, an LLM could be writing a report, but it might need to access specific statistics or research papers to complete the task. By using search tools, the LLM can find this information and incorporate it into the report.

- Real-World Interaction: LLMs themselves can't directly interact with the physical world. However, tools like programming interfaces (APIs) allow LLMs to connect with and control various applications and devices. This opens doors to a much wider range of applications, like controlling smart home devices or generating code.

In essence, tools bridge the gap between the vast knowledge stored within an LLM and the ability to use that knowledge in a practical way.

In [None]:
from langchain_core.tools import tool
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.tools.render import render_text_description
from langchain_core.prompts import ChatPromptTemplate
from operator import itemgetter

## Implementing required steps

In [None]:
# Define set of tools that can be used by the LLM


@tool
def multiply(first_int: int, second_int: int) -> int:
    """Multiply two integers together.
       (operators: multiplied, *, times, etc.)"""
    print("(tool called: multiply)")
    return first_int * second_int


@tool
def add(first_int: int, second_int: int) -> int:
    """Add two integers.
       (operators: plus, added, +)"""
    print("(tool called: add)")
    return first_int + second_int


@tool
def exponentiate(base: int, exponent: int) -> int:
    """ Returns the value of `base` to the power of `exponent`
       (operators: power to, **, exp)"""
    print("(tool called: exponentiate)")
    return base**exponent


tools = [add, exponentiate, multiply]

In [None]:
# Helper
def tool_chain(model_output):
    """A function that maps the name of a tool to its actual
    implementation and passes all the args"""
    tool_map = {tool.name: tool for tool in tools}
    chosen_tool = tool_map[model_output["name"]]
    return itemgetter("arguments") | chosen_tool

In [None]:
# LLMs are text-based models, so to inform the model
# which tools can be used (and how), you need to describe them
# in natural language
rendered_tools = render_text_description(tools)
print(f"Available tools:\n{rendered_tools}")

Available tools:
add(first_int: int, second_int: int) -> int - Add two integers.
       (operators: plus, added, +)
exponentiate(base: int, exponent: int) -> int - Returns the value of `base` to the power of `exponent`
      (operators: power to, **, exp)
multiply(first_int: int, second_int: int) -> int - Multiply two integers together.
       (operators: multiplied, *, times, etc.)


In [None]:
# Let's define the prompt and inject the information about the tools
system_prompt = f"""You are an assistant that has access to the following set of tools. Here are the names and descriptions for each tool:

{rendered_tools}

Given the user input, return the name and input of the tool to use. Return your response as a JSON blob with 'name' and 'arguments' keys.
Arguments should also be a JSON where the key is argument's name and the value is the value of that argument."""

prompt = ChatPromptTemplate.from_messages([("ai", system_prompt), ("user", "{input}")])

In [None]:
# Define a chain that ties all the parts together
chain = prompt | llm | GemmaOutputParser() | JsonOutputParser() | tool_chain

## Let's try it out!

In [None]:
chain.invoke({"input": "what's 3 plus 1323?"})

(tool called: add)


1326

In [None]:
chain.invoke({"input": "what's 4 to the power of 3?"})

(tool called: exponentiate)


64

In [None]:
chain.invoke({"input": "what's 5 * 5?"})

(tool called: multiply)


25
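
The `tool_chain` helper assumes the model always returns a known tool name and well-formed arguments. A small model can occasionally produce something else, so a more defensive dispatcher may be worth considering. The sketch below uses plain Python functions instead of LangChain tools, and the names `safe_dispatch` and `toy_registry` are hypothetical.

```python
def safe_dispatch(model_output, registry):
    """Call the tool named in the model output, validating the request first."""
    name = model_output.get("name")
    if name not in registry:
        raise ValueError(
            f"unknown tool {name!r}; expected one of {sorted(registry)}"
        )
    arguments = model_output.get("arguments", {})
    if not isinstance(arguments, dict):
        raise TypeError("'arguments' must be a JSON object")
    return registry[name](**arguments)


def toy_add(first_int, second_int):
    return first_int + second_int


def toy_multiply(first_int, second_int):
    return first_int * second_int


toy_registry = {"add": toy_add, "multiply": toy_multiply}
request = {"name": "add", "arguments": {"first_int": 3, "second_int": 1323}}
print(safe_dispatch(request, toy_registry))  # prints 1326
```

Raising a descriptive error makes it possible to catch the failure and, for instance, ask the model to try again instead of failing with a bare `KeyError`.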