<a target="_blank" href="https://colab.research.google.com/github/shareproduct123/cohere_notebooks/blob/main/notebooks/guides/getting-started/tutorial_pt1.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Cohere Tutorial

#### Build your first Cohere application: An onboarding assistant for new hires

Welcome to the Cohere tutorial – a hands-on introduction to Cohere!

In this tutorial, you will learn how to use the Cohere API, specifically three endpoints: Chat, Embed, and Rerank.

This tutorial is split over seven parts, with each part focusing on one use case, as follows:

- Part 1: Installation and Setup (Pre-requisite)
- Part 2: Text Generation
- Part 3: Chatbots
- Part 4: Semantic Search
- Part 5: Reranking
- Part 6: Retrieval-Augmented Generation (RAG)
- Part 7: Agents with Tool Use

You'll learn about these use cases by building an onboarding assistant that helps new hires onboard to a fictitious company called Co1t. The assistant can help write introductions, answer user questions about the company, search for information from e-mails, and create meeting appointments.

We recommend that you follow the parts sequentially. However, feel free to skip to specific parts if you want (apart from Part 1, which is a pre-requisite) because each part also works as a standalone tutorial.

Total Duration: ~15 minutes

## Installation and Setup

The Cohere platform lets developers access large language model (LLM) capabilities with a few lines of code. These LLMs can solve a broad spectrum of natural language use cases, including classification, semantic search, paraphrasing, summarization, and content generation.

Cohere's models can be accessed through the playground, SDK, and CLI tool. We support SDKs in four different languages: Python, TypeScript, Java, and Go.

This tutorial uses the Python SDK and accesses the models through the Cohere platform.

To get started, first install the Cohere Python SDK.

In [1]:
!pip install cohere --upgrade # Upgrade to the latest version of cohere

Collecting cohere
  Downloading cohere-5.13.0-py3-none-any.whl.metadata (3.5 kB)
Collecting fastavro<2.0.0,>=1.9.4 (from cohere)
  Downloading fastavro-1.9.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.5 kB)
Collecting httpx-sse==0.4.0 (from cohere)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting parameterized<0.10.0,>=0.9.0 (from cohere)
  Downloading parameterized-0.9.0-py2.py3-none-any.whl.metadata (18 kB)
Collecting types-requests<3.0.0,>=2.0.0 (from cohere)
  Downloading types_requests-2.32.0.20241016-py3-none-any.whl.metadata (1.9 kB)
Downloading cohere-5.13.0-py3-none-any.whl (249 kB)
Downloading httpx_sse-0.4.0-py3-none-any.whl (7.8 kB)
Downloading fastavro-1.9.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)

Next, we'll import the `cohere` library and create a client to be used throughout the examples. We create a client by passing the Cohere API key as an argument. To get an API key, [sign up with Cohere](https://dashboard.cohere.com/welcome/register) and get the API key [from the dashboard](https://dashboard.cohere.com/api-keys).

In [2]:
import cohere

co = cohere.ClientV2(api_key="COHERE_API_KEY") # Get your API key here: https://dashboard.cohere.com/api-keys
co

<cohere.client_v2.ClientV2 at 0x79735dc11870>
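
Hard-coding an API key in a notebook makes it easy to leak when the notebook is shared. Below is a minimal sketch of one alternative, assuming the key has been stored in an environment variable named `COHERE_API_KEY`. The variable name and the `load_api_key` helper are assumptions made for illustration and are not part of the SDK.

```python
import os


def load_api_key(env=None, name="COHERE_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key


# Usage (uncomment once the variable is set):
# import cohere
# co = cohere.ClientV2(api_key=load_api_key())
```

Failing early with a clear message is a design choice here; it avoids a less obvious authentication error on the first API call.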

# Accessing Cohere from Other Platforms

The Cohere platform is the fastest way to access Cohere's models and get started.

However, if you prefer other options, you can access Cohere's models through other platforms such as Amazon Bedrock, Amazon SageMaker, Azure AI Studio, and Oracle Cloud Infrastructure (OCI) Generative AI Service.

Read this documentation on [Cohere SDK cloud platform compatibility](https://docs.cohere.com/docs/cohere-works-everywhere).

## Amazon Bedrock

The following is how you can create a Cohere client on Amazon Bedrock.

For further information, read this documentation on [Cohere on Bedrock](https://docs.cohere.com/docs/cohere-on-aws#amazon-bedrock).

In [None]:
import cohere

co = cohere.BedrockClient(
    aws_region="...",
    aws_access_key="...",
    aws_secret_key="...",
    aws_session_token="...",
)

## Amazon SageMaker

The following is how you can create a Cohere client on Amazon SageMaker.

For further information, read this documentation on [Cohere on SageMaker](https://docs.cohere.com/docs/cohere-on-aws#amazon-sagemaker).

In [None]:
import cohere

co = cohere.SagemakerClient(
    aws_region="us-east-1",
    aws_access_key="...",
    aws_secret_key="...",
    aws_session_token="...",
)

## Microsoft Azure

The following is how you can create a Cohere client on Microsoft Azure.

For further information, read this documentation on [Cohere on Azure](https://docs.cohere.com/docs/cohere-on-microsoft-azure).

In [None]:
import cohere

co = cohere.Client(
  api_key="...",
  base_url="...",
)

In Part 2, we'll get started with the first use case - text generation.

# Text Generation

Command is Cohere’s flagship LLM. It generates a response based on a user message or prompt. It is trained to follow user commands and to be instantly useful in practical business applications, like summarization, copywriting, extraction, and question-answering.

Command R and Command R+ are the most recent models in the Command family. They are designed to balance high efficiency with strong accuracy, helping enterprises move from proof of concept to production-grade AI.

You'll use Chat, the Cohere endpoint for accessing the Command models.

In this tutorial, you'll learn about:
- Basic text generation
- Prompt engineering
- Parameters for controlling output
- Structured output generation
- Streamed output

You'll learn these by building an onboarding assistant for new hires.

## Basic text generation

To get started with Chat, we need to pass two parameters: `model`, the LLM model ID, and `messages`, a list to which we add a single user message. We then call the Chat endpoint through the client we created earlier.

The response contains several objects. For simplicity, what we want right now is the `message.content[0].text` object.

Here's an example of the assistant responding to a new hire's query asking for help to make introductions.

In [3]:
# Add the user message
message = "I'm joining a new startup called Co1t today. Could you help me write a short introduction message to my teammates."
# Generate the response
response = co.chat(
    model="command-r-plus-08-2024",
    messages=[{"role": "user", "content": message}],
)

print(response.message.content[0].text)

Here's a draft of an introduction message for your first day at Co1t:

"Hello everyone!

My name is [Your Name], and I am thrilled to join the Co1t family as a new team member! Starting today, I will be working with all of you, and I couldn't be more excited to contribute to this innovative startup.

A little about myself: I have a background in [Your Educational or Professional Field] and have always been passionate about [Relevant Skills or Interests]. I believe my experience in [Specific Skills or Projects] will be valuable in helping Co1t achieve its goals. I'm looking forward to learning from all of you and sharing my insights as well.

I'm excited to collaborate, brainstorm, and tackle the challenges ahead as a team. Feel free to reach out if you'd like to connect and chat further. Let's make some amazing things happen together!

Cheers,
[Your Name]"

Feel free to customize and add more personal details to make the introduction more engaging and reflective of your personality!
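
The `message.content` field is a list, so `content[0].text` reads only its first item. Below is a minimal sketch of a helper that joins every text item, assuming each item exposes a `text` attribute as in the example above. The `response_text` helper and the stand-in response object are illustrative and are not part of the SDK.

```python
from types import SimpleNamespace


def response_text(response):
    """Join the text of every text item in response.message.content."""
    parts = []
    for block in response.message.content or []:
        # Skip any item that does not carry text.
        text = getattr(block, "text", None)
        if text:
            parts.append(text)
    return "".join(parts)


# Stand-in object that mimics only the fields read above (not a real SDK response).
fake_response = SimpleNamespace(
    message=SimpleNamespace(content=[SimpleNamespace(type="text", text="Hello everyone!")])
)
print(response_text(fake_response))
```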

Further reading:
- [Chat endpoint API reference](https://docs.cohere.com/v2/reference/chat)
- [Documentation on Chat fine-tuning](https://docs.cohere.com/docs/chat-fine-tuning)
- [Documentation on Command R+](https://docs.cohere.com/docs/command-r-plus)
- [LLM University module on text generation](https://cohere.com/llmu#text-generation)


## Prompt engineering

Prompting is at the heart of working with LLMs. The prompt provides context for the text that we want the model to generate. The prompts we create can be anything from simple instructions to more complex pieces of text, and they are used to encourage the model to produce a specific type of output.

In this section, we'll look at a couple of prompting techniques.

The first is to add more specific instructions to the prompt. The more instructions you provide in the prompt, the closer you can get to the response you need.

The maximum length of a prompt depends on the maximum context length that a model supports (in the case of Command R/R+, it's 128k tokens).

Below, we'll add one additional instruction to the earlier prompt: the length we need the response to be.

In [4]:
# Add the user message
message = "I'm joining a new startup called Co1t today. Could you help me write a one-sentence introduction message to my teammates."

# Generate the response
response = co.chat(model="command-r-plus-08-2024",
                   messages=[{"role": "user", "content": message}])

print(response.message.content[0].text)

"Excited to be on board as the newest member of Co1t, I'm [Your Name], a [Your Role/Position], eager to contribute my skills and collaborate with this talented team to drive innovation and success."


All our prompts so far use what is called zero-shot prompting, which means providing an instruction without any examples. But in many cases, it is extremely helpful to provide examples to the model to guide its response. This is called few-shot prompting.

Few-shot prompting is especially useful when we want the model response to follow a particular style or format. Also, it is sometimes hard to explain what you want in an instruction, and easier to show examples.

Below, we want the response to follow the style and length of the ticket titles shown in the examples.

In [5]:
# Add the user message
user_input = "Why can't I access the server? Is it a permissions issue?"

# Create a prompt containing example outputs
message=f"""Write a ticket title for the following user request:

User request: Where are the usual storage places for project files?
Ticket title: Project File Storage Location

User request: Emails won't send. What could be the issue?
Ticket title: Email Sending Issues

User request: How can I set up a connection to the office printer?
Ticket title: Printer Connection Setup

User request: {user_input}
Ticket title:"""

# Generate the response
response = co.chat(model="command-r-plus-08-2024",
                   messages=[{"role": "user", "content": message}])

print(response.message.content[0].text)

Server Access Denied: Possible Permissions Issue
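
When the examples live in a list rather than in a hand-written string, the few-shot prompt can be assembled programmatically. Below is a minimal sketch; the `build_few_shot_prompt` helper and its parameter names are assumptions for illustration, and the output reproduces the prompt layout used above.

```python
def build_few_shot_prompt(instruction, examples, user_input,
                          input_label="User request", output_label="Ticket title"):
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for request, title in examples:
        lines.append(f"{input_label}: {request}")
        lines.append(f"{output_label}: {title}")
        lines.append("")
    # End with the new input and an empty output slot for the model to complete.
    lines.append(f"{input_label}: {user_input}")
    lines.append(f"{output_label}:")
    return "\n".join(lines)


examples = [
    ("Where are the usual storage places for project files?", "Project File Storage Location"),
    ("Emails won't send. What could be the issue?", "Email Sending Issues"),
    ("How can I set up a connection to the office printer?", "Printer Connection Setup"),
]
prompt = build_few_shot_prompt(
    "Write a ticket title for the following user request:",
    examples,
    "Why can't I access the server? Is it a permissions issue?",
)
print(prompt)
```

The resulting string can be passed as the `content` of a single user message, exactly as in the cell above.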


Further reading:
- [Documentation on prompt engineering](https://docs.cohere.com/docs/crafting-effective-prompts)
- [LLM University module on prompt engineering](https://cohere.com/llmu#prompt-engineering)

## Parameters for controlling output

The Chat endpoint provides developers with an array of options and parameters.

For example, you can choose from several variations of the Command model. Different models produce different output profiles, such as quality and latency.

In [6]:
# Add the user message
message = "I'm joining a new startup called Co1t today. Could you help me write a one-sentence introduction message to my teammates."

# Generate the response
response = co.chat(model="command-r-plus-08-2024",
                   messages=[{"role": "user", "content": message}])

print(response.message.content[0].text)

"Excited to be on board as the newest member of Co1t, I'm [Your Name], a [Your Role/Position], eager to contribute my skills and collaborate with this talented team!"


Often, you’ll need to control the level of randomness of the output. You can control this using a few parameters.

The most commonly used parameter is `temperature`, which is a number used to tune the degree of randomness. You can enter values between 0.0 and 1.0.

A lower temperature gives more predictable outputs, and a higher temperature gives more "creative" outputs.
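
To build intuition for why this happens, here is a minimal sketch of the textbook softmax-with-temperature calculation. It is an assumption about how such a parameter typically works, not a description of Cohere's internal sampling, and the scores are made up.

```python
import math


def token_probabilities(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Treat zero as greedy decoding: all probability goes to the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0, 0.3, 1.0):
    print(t, [round(p, 3) for p in token_probabilities(logits, t)])
```

As the temperature falls, more of the probability mass concentrates on the highest-scoring token, which is why repeated calls tend to agree with each other.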

Here's an example of setting `temperature` to 0.

In [10]:
# Add the user message
message = "I like learning about the industrial revolution and how it shapes the modern world. How I can introduce myself in five words or less."

# Generate the response multiple times by specifying a low temperature value
for idx in range(3):
    response = co.chat(model="command-r-plus-08-2024",
                    messages=[{"role": "user", "content": message}],
                    temperature=0)

    print(f"{idx+1}: {response.message.content[0].text}\n")

1: History buff, curious about industrialization.



TooManyRequestsError: status_code: 429, body: data=None message="You are using a Trial key, which is limited to 10 API calls / minute. You can continue to use the Trial key for free or upgrade to a Production key with higher rate limits at 'https://dashboard.cohere.com/api-keys'. Contact us on 'https://discord.gg/XW44jPfYJu' or email us at support@cohere.com with any questions"
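
The 429 error in the output above comes from the Trial key rate limit, which interrupted the loop after the first call. One common way to cope is to wait and retry. Below is a minimal sketch, assuming the raised exception exposes a `status_code` attribute as the message suggests. The `call_with_retry` and `is_rate_limit` helpers are hypothetical and are not part of the SDK.

```python
import time


def call_with_retry(fn, is_rate_limit, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a rate-limit error wait base_delay * 2**attempt, then try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:
            # Re-raise anything that is not a rate limit, or the final failure.
            if not is_rate_limit(err) or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def is_rate_limit(err):
    """Hypothetical check: treat any error carrying status_code 429 as a rate limit."""
    return getattr(err, "status_code", None) == 429


# Usage with the client from earlier (not run here):
# response = call_with_retry(
#     lambda: co.chat(model="command-r-plus-08-2024",
#                     messages=[{"role": "user", "content": message}],
#                     temperature=0),
#     is_rate_limit,
#     base_delay=6.0,
# )
```

The `sleep` argument is injectable so the helper can be exercised without real waiting.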

And here's an example of setting `temperature` to 1.

In [9]:
# Add the user message
message = "I like learning about the industrial revolution and how it shapes the modern world. How I can introduce myself in five words or less."

# Generate the response multiple times by specifying a high temperature value
for idx in range(3):
    response = co.chat(model="command-r-plus-08-2024",
                    messages=[{"role": "user", "content": message}],
                    temperature=1)

    print(f"{idx+1}: {response.message.content[0].text}\n")

1: "I explore history's revolutionary impact."

2: Hi, I'm an Industrial Revolution enthusiast.

3: History enthusiast, Industrial Revolution lover.



Further reading:
- [Available models for the Chat endpoint](https://docs.cohere.com/docs/models#command)
- [Documentation on predictable outputs](https://docs.cohere.com/v2/docs/predictable-outputs)
- [Documentation on advanced generation parameters](https://docs.cohere.com/docs/advanced-generation-hyperparameters)


## Structured output generation

By adding the `response_format` parameter, you can get the model to generate the output as a JSON object. By generating JSON objects, you can structure and organize the model's responses in a way that can be used in downstream applications.

The `response_format` parameter allows you to specify the schema the JSON object must follow. The example below passes the following parameters to the Chat endpoint:
- `messages`: The list containing the user message
- `response_format`: The output type and the schema of the JSON object

In [11]:
import json # import the json module
# Add the user message
user_input = "Why can't I access the server? Is it a permissions issue?"
message = f"""Create an IT ticket for the following user request. Generate a JSON object.
{user_input}"""

# Generate the response, constrained by the JSON schema
response = co.chat(
  model="command-r-plus-08-2024",
  messages=[{"role": "user", "content": message}],
  response_format={
    "type": "json_object",
    "schema": {
      "type": "object",
      "required": ["title", "category", "status"],
      "properties": {
        "title": { "type": "string"},
        "category": { "type" : "string", "enum" : ["access", "software"]},
        "status": { "type" : "string" , "enum" : ["open", "closed"]}
      }
    }
  },
)

json_object = json.loads(response.message.content[0].text)

print(json_object)



{'title': 'Server Access Issue', 'category': 'access', 'status': 'open'}
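
Before handing the parsed object to a downstream application, it can be worth checking it against the fields you expect. Below is a minimal sketch using only the standard library. The `check_ticket` helper and the simplified `TICKET_SCHEMA` dictionary are assumptions for illustration; they mirror the required fields and enums of the schema above but do not implement full JSON Schema validation.

```python
import json

TICKET_SCHEMA = {
    "required": ["title", "category", "status"],
    "enums": {"category": ["access", "software"], "status": ["open", "closed"]},
}


def check_ticket(raw_text, schema=TICKET_SCHEMA):
    """Parse raw_text as JSON and return a list of problems (empty if none)."""
    try:
        ticket = json.loads(raw_text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(ticket, dict):
        return ["top-level value is not a JSON object"]
    problems = [f"missing field: {key}" for key in schema["required"] if key not in ticket]
    for key, allowed in schema["enums"].items():
        if key in ticket and ticket[key] not in allowed:
            problems.append(f"unexpected {key}: {ticket[key]!r}")
    return problems


sample = '{"title": "Server Access Issue", "category": "access", "status": "open"}'
print(check_ticket(sample))  # prints []
```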


Further reading:
- [Documentation on Structured Generations (JSON)](https://docs.cohere.com/docs/structured-outputs-json)

## Streaming responses

All the previous examples generate responses in a non-streamed manner. This means that the endpoint returns a response object only after the model has generated the text in full.

The Chat endpoint also provides streaming support. In a streamed response, the endpoint returns a response object for each token as it is being generated. This means you can display the text incrementally without having to wait for the full completion.

To activate it, use `co.chat_stream()` instead of `co.chat()`.

In streaming mode, the endpoint will generate a series of objects. To get the actual text contents, we take objects whose `type` is `content-delta`.

In [12]:
# Add the user message
message = "I'm joining a new startup called Co1t today. Could you help me write a one-sentence introduction message to my teammates."

# Generate the response by streaming it
response = co.chat_stream(model="command-r-plus-08-2024",
                          messages=[{"role": "user", "content": message}])

for event in response:
    if event:
        if event.type == "content-delta":
            print(event.delta.message.content.text, end="")

"Excited to be on board as the newest member of Co1t, I'm [Your Name], a [Your Role/Position], eager to contribute my skills and collaborate with this talented team to drive innovation and success."

Further reading:
- [Documentation on streaming responses](https://docs.cohere.com/docs/streaming)
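
Applications often need the complete text after streaming finishes, for example to append it to a chat history. Below is a minimal sketch that displays chunks as they arrive and also returns the joined text. The `collect_stream` helper is hypothetical, and `fake_event` builds stand-in objects with only the fields read in the cell above so the sketch runs without calling the API.

```python
from types import SimpleNamespace


def collect_stream(events, on_text=None):
    """Return the full text from a stream, optionally reporting each chunk as it arrives."""
    chunks = []
    for event in events:
        if event is not None and event.type == "content-delta":
            text = event.delta.message.content.text
            chunks.append(text)
            if on_text is not None:
                on_text(text)
    return "".join(chunks)


def fake_event(event_type, text=None):
    """Build a stand-in event with only the fields read above (not a real SDK object)."""
    content = SimpleNamespace(text=text)
    return SimpleNamespace(type=event_type,
                           delta=SimpleNamespace(message=SimpleNamespace(content=content)))


events = [fake_event("message-start"), fake_event("content-delta", "Hello, "),
          fake_event("content-delta", "team!"), fake_event("message-end")]
full_text = collect_stream(events, on_text=lambda t: print(t, end=""))
```

With the real client, the iterable returned by `co.chat_stream(...)` would take the place of `events`.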

## Conclusion

In this tutorial, you learned about:
- How to get started with a basic text generation
- How to improve outputs with prompt engineering
- How to control outputs using parameter changes
- How to generate structured outputs
- How to stream text generation outputs

However, we have only done all this using direct text generations. As its name implies, the Chat endpoint can also support building chatbots, which require features to support multi-turn conversations and maintain the conversation state.

In Part 3, you'll learn how to build chatbots with the Chat endpoint.

# Chatbots

As its name implies, the Chat endpoint enables developers to build chatbots that can handle conversations. At the core of a conversation is a multi-turn dialog between the user and the chatbot. This requires the chatbot to keep the state (or “memory”) of all the previous turns.

In this tutorial, you'll learn about:
- Creating a custom preamble
- Creating a single-turn conversation
- Building the conversation memory
- Running a multi-turn conversation
- Viewing the chat history

You'll learn these by building an onboarding assistant for new hires.

## Setup

To get started, first we need to install the `cohere` library and create a Cohere client.

In [14]:
# pip install cohere

#import cohere

#co = cohere.ClientV2(api_key="COHERE_API_KEY") # Get your free API key: https://dashboard.cohere.com/api-keys

## Creating a custom preamble

A conversation starts with a system message, or a preamble, to help steer a chatbot’s response toward certain characteristics.

For example, if we want the chatbot to adopt a formal style, the preamble can be used to encourage the generation of more business-like and professional responses.

The recommended approach is to use two H2 Markdown headers: "Task and Context" and "Style Guide" in the exact order.

In the example below, the preamble provides context for the assistant's task (task and context) and encourages the generation of rhymes as much as possible (style guide).

In [13]:
# Add the user message
message = "I'm joining a new startup called Co1t today. Could you help me write a short introduction message to my teammates."

# Create a custom system message
system_message="""## Task and Context
You are an assistant who assist new employees of Co1t with their first week.

## Style Guide
Try to speak in rhymes as much as possible. Be professional."""

# Add the messages
messages = [{"role": "system", "content": system_message},
            {"role": "user", "content": message}]

# Generate the response
response = co.chat(model="command-r-plus-08-2024",
                   messages=messages)

print(response.message.content[0].text)

Here's a little note, a fun way to say hello,
To your new colleagues, a warm welcome you'll bestow:

"Hello, fellow Co1t crew, a new adventure awaits!
I'm thrilled to join and eager to collaborate.
My name is [Your Name], a [Your Role] here to contribute,
With skills and enthusiasm, I'm ready to disseminate.

Let's connect and create, and together we'll innovate,
Exciting times ahead, a journey we'll navigate!"

A fun rhyme to break the ice,
A great team player, you'll be nice.


Further reading:
- [Documentation on preambles](https://docs.cohere.com/docs/preambles)

## Starting the first conversation turn

Let's start with the first conversation turn.

Here, we are also adding a custom preamble or system message for generating a concise response, just to keep the outputs brief for this tutorial.

In [16]:
# Add the user message
message = "I'm joining a new startup called Co1t today. Could you help me write a short introduction message to my teammates."

# Create a custom system message
system_message="""## Task and Context
Generate concise responses, with maximum one-sentence."""

# Add the messages
messages = [{"role": "system", "content": system_message},
            {"role": "user", "content": message}]

# Generate the response
response = co.chat(model="command-r-plus-08-2024",
                   messages=messages)

print(response.message.content[0].text)

"Hi everyone! I'm thrilled to join Co1t today and look forward to collaborating with this talented team and contributing to our startup's success."


## Building the conversation memory

Now, we want the model to refine the earlier response. This requires the next generation to have access to the state, or memory, of the conversation.

To do this, we append the model's previous response to the `messages` list using the `assistant` role.

Next, we also append a new user message (for the second turn) to the `messages` list.

Looking at the response, we see that the model is able to get the context from the chat history. The model is able to capture that "it" in the user message refers to the introduction message it had generated earlier.

In [14]:
# Append the previous response
messages.append({'role' : 'assistant', 'content': response.message.content[0].text})

# Add the user message
message = "Make it more upbeat and conversational."

# Append the user message
messages.append({"role": "user", "content": message})

# Generate the response with the current chat history as the context
response = co.chat(model="command-r-plus-08-2024",
                   messages=messages)

print(response.message.content[0].text)

"Hey, team! Super excited to be a part of the Co1t family now, can't wait to get to know you all and dive into some awesome projects together!"

Further reading:
- [Documentation on using the Chat endpoint](https://docs.cohere.com/docs/chat-api)
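
The append pattern above (store the assistant reply, then add the next user message) can be wrapped in a small helper so that no turn is forgotten. Below is a minimal sketch; the `Conversation` class is hypothetical, and `fake_generate` is a stand-in for the API call so the sketch runs offline.

```python
class Conversation:
    """Keep the running `messages` list and append both sides of every turn."""

    def __init__(self, generate, system_message=None):
        # `generate` is any callable that maps a messages list to reply text.
        self.generate = generate
        self.messages = []
        if system_message:
            self.messages.append({"role": "system", "content": system_message})

    def send(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        reply = self.generate(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Stand-in generator; it reports how many user turns it has seen.
def fake_generate(messages):
    turns = sum(1 for m in messages if m["role"] == "user")
    return f"(reply to user turn {turns})"


chat = Conversation(fake_generate, system_message="Generate concise responses.")
chat.send("Help me write a short introduction message.")
print(chat.send("Make it more upbeat and conversational."))
```

With the real client, `generate` could be a function that calls `co.chat(model=..., messages=messages)` and returns `response.message.content[0].text`.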

## Running a multi-turn conversation


You can continue doing this for any number of turns by continuing to append the chatbot's response and the new user message to the `messages` list.

In [18]:
# Append the previous response
messages.append({"role": "assistant", "content": response.message.content[0].text})

# Add the user message
message = "Thanks. Could you create another one for my DM to my manager."

# Append the user message
messages.append({"role": "user", "content": message})

# Generate the response with the current chat history as the context
response = co.chat(model="command-r-plus-08-2024",
                   messages=messages)

print(response.message.content[0].text)

"Hey [Manager's Name], just wanted to express my excitement about starting at Co1t and looking forward to learning and growing under your guidance!"


## Viewing the chat history

To look at the current chat history, you can print the `messages` list, which contains the system message followed by the `user` and `assistant` turns in the same sequence as they were created.

In [19]:
# Append the previous response
messages.append({"role": "assistant", "content": response.message.content[0].text})

# View the chat history
for message in messages:
    print(message,"\n")

{'role': 'system', 'content': '## Task and Context\nGenerate concise responses, with maximum one-sentence.'} 

{'role': 'user', 'content': "I'm joining a new startup called Co1t today. Could you help me write a short introduction message to my teammates."} 

{'role': 'assistant', 'content': '"Hi everyone! I\'m thrilled to join Co1t today and look forward to collaborating with this talented team and contributing to our startup\'s success."'} 

{'role': 'user', 'content': 'Make it more upbeat and conversational.'} 

{'role': 'assistant', 'content': '"Hey, team! Super excited to be a part of the Co1t family now, can\'t wait to get to know you all and dive into some awesome projects together!"'} 

{'role': 'user', 'content': 'Thanks. Could you create another one for my DM to my manager.'} 

{'role': 'assistant', 'content': '"Hey [Manager\'s Name], just wanted to express my excitement about starting at Co1t and looking forward to learning and growing under your guidance!"'} 



## Conclusion

In this tutorial, you learned about:
- How to create a custom preamble
- How to create a single-turn conversation
- How to build the conversation memory
- How to run a multi-turn conversation
- How to view the chat history

You will use the same method for running a multi-turn conversation when you learn about other use cases such as RAG (Part 6) and tool use (Part 7).

But to fully leverage these other capabilities, you will need another type of language model that generates text representations, or embeddings.

In Part 4, you will learn how text embeddings can power an important use case for RAG, which is semantic search.

# Semantic Search

Text embeddings are lists of numbers that represent the context or meaning of a piece of text. This makes them particularly useful in search and information retrieval applications, where searching by meaning rather than by keywords is called semantic search.

Semantic search solves the problem faced by the more traditional approach of lexical search, which is great at finding keyword matches, but struggles to capture the context or meaning of a piece of text.

With Cohere, you can generate text embeddings through the Embed endpoint (Embed v3 being the latest model), which supports over 100 languages.

In this tutorial, you'll learn about:
- Embedding the documents
- Embedding the query
- Performing semantic search
- Multilingual semantic search
- Changing embedding compression types

You'll learn these by building an onboarding assistant for new hires.

## Setup

To get started, first we need to install the `cohere` library and create a Cohere client.

In [15]:
# pip install cohere

#import cohere
import numpy as np

#co = cohere.ClientV2(api_key="COHERE_API_KEY") # Get your free API key: https://dashboard.cohere.com/api-keys

## Embedding the documents

The Embed endpoint takes in texts as input and returns embeddings as output.

For semantic search, there are two types of documents we need to turn into embeddings.
- The list of documents that we want to search from.
- The query that will be used to search the documents.

Right now, we are doing the former. We call the Embed endpoint using `co.embed()` and pass the following arguments:
- `model`: Here we choose `embed-english-v3.0`, which generates embeddings of size 1024
- `input_type`: We choose `search_document` to ensure the model treats these as the documents for search
- `texts`: The list of texts (the FAQs)
- `embedding_types`: We choose `float` to get the float embeddings.

In [16]:
# Define the documents
faqs_long = [
    {"text": "Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged."},
    {"text": "Finding Coffee Spots: For your caffeine fix, head to the break room's coffee machine or cross the street to the café for artisan coffee."},
    {"text": "Team-Building Activities: We foster team spirit with monthly outings and weekly game nights. Feel free to suggest new activity ideas anytime!"},
    {"text": "Working Hours Flexibility: We prioritize work-life balance. While our core hours are 9 AM to 5 PM, we offer flexibility to adjust as needed."},
    {"text": "Side Projects Policy: We encourage you to pursue your passions. Just be mindful of any potential conflicts of interest with our business."},
    {"text": "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool. Approvals are prompt and straightforward."},
    {"text": "Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours."},
    {"text": "Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance."},
    {"text": "Performance Reviews Frequency: We conduct informal check-ins every quarter and formal performance reviews twice a year."},
    {"text": "Proposing New Ideas: Innovation is welcomed! Share your brilliant ideas at our weekly team meetings or directly with your team lead."},
]

# Embed the documents
doc_emb = co.embed(
            model="embed-english-v3.0",
            input_type="search_document",
            texts=[doc['text'] for doc in faqs_long],
            embedding_types=["float"]).embeddings.float

Further reading:
- [Embed endpoint API reference](https://docs.cohere.com/reference/embed)
- [Documentation on the Embed endpoint](https://docs.cohere.com/docs/embeddings)
- [Documentation on the models available on the Embed endpoint](https://docs.cohere.com/docs/cohere-embed)
- [LLM University module on Text Representation](https://cohere.com/llmu#text-representation)

## Embedding the query

Next, we add a query, which asks about ways to connect with teammates.

We choose `search_query` as the `input_type` to ensure the model treats this as the query (instead of documents) for search.

In [17]:
# Add the user query
query = "Ways to connect with my teammates"

# Embed the query
query_emb = co.embed(
            model="embed-english-v3.0",
            input_type="search_query",
            texts=[query],
            embedding_types=["float"]).embeddings.float

## Performing semantic search

Now, we want to search for the most relevant documents to the query. We do this by computing the similarity between the embeddings of the query and each of the documents.

There are various approaches to computing similarity between embeddings, and we'll choose the dot product approach. For this, we use the `numpy` library, which provides an implementation.

Each query-document pair returns a score, which represents how similar the pair is. We then sort these scores in descending order and select the most similar pairs. Here we keep the top 2, which is an arbitrary choice; you can choose any number.

Here, we show the most relevant documents with their similarity scores.

In [18]:
# Compute dot product similarity and display results
def return_results(query_emb, doc_emb, documents):
    n = 2 # customize your top N results
    scores = np.dot(query_emb, np.transpose(doc_emb))[0]
    max_idx = np.argsort(-scores)[:n]

    for rank, idx in enumerate(max_idx):
        print(f"Rank: {rank+1}")
        print(f"Score: {scores[idx]}")
        print(f"Document: {documents[idx]}\n")

return_results(query_emb, doc_emb, faqs_long)

Rank: 1
Score: 0.38729846176279636
Document: {'text': 'Team-Building Activities: We foster team spirit with monthly outings and weekly game nights. Feel free to suggest new activity ideas anytime!'}

Rank: 2
Score: 0.3272549670724578
Document: {'text': 'Proposing New Ideas: Innovation is welcomed! Share your brilliant ideas at our weekly team meetings or directly with your team lead.'}
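
The dot product is only one of the possible similarity measures mentioned above. As a minimal sketch, the block below compares dot product and cosine similarity on toy 3-dimensional vectors. The vectors and the helper names `dot_scores` and `cosine_scores` are assumptions made for illustration; they are not real embeddings or part of the Cohere SDK.

```python
import numpy as np

# Toy vectors standing in for real embeddings (assumption, for illustration only)
query_vec = np.array([[0.8, 0.3, 0.1]])
doc_vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.2],
    [0.0, 0.2, 0.9],
])

def dot_scores(query, docs):
    # Same computation as in return_results(): one score per document
    return np.dot(query, docs.T)[0]

def cosine_scores(query, docs):
    # Normalize each vector to unit length, then take the dot product
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return np.dot(q, d.T)[0]

# Document indices sorted from most to least similar
dot_ranking = np.argsort(-dot_scores(query_vec, doc_vecs))
cos_ranking = np.argsort(-cosine_scores(query_vec, doc_vecs))
```

On these toy vectors both measures produce the same ordering. The two can disagree when vectors differ in length, because the dot product rewards longer vectors while cosine similarity ignores length.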



## Multilingual semantic search

The Embed endpoint also supports multilingual semantic search via the `embed-multilingual-...` models. This means you can perform semantic search on texts in different languages.

Specifically, you can do both multilingual and cross-lingual searches using one single model.

Multilingual search happens when the query and the result are of the same language. For example, an English query of “places to eat” returning an English result of “Bob's Burgers.” You can replace English with other languages and use the same model for performing search.

Cross-lingual search happens when the query and the result are of a different language. For example, a Hindi query of “खाने की जगह” (places to eat) returning an English result of “Bob's Burgers.”

In the example below, we repeat the steps of performing semantic search with one difference: we change the model to the multilingual version, `embed-multilingual-v3.0`. We search a French version of the FAQ list using an English query.

In [19]:
# Define the documents
faqs_short_fr = [
    {"text" : "Remboursement des frais de voyage : Gérez facilement vos frais de voyage en les soumettant via notre outil financier. Les approbations sont rapides et simples."},
    {"text" : "Travailler de l'étranger : Il est possible de travailler à distance depuis un autre pays. Il suffit de coordonner avec votre responsable et de vous assurer d'être disponible pendant les heures de travail."},
    {"text" : "Avantages pour la santé et le bien-être : Nous nous soucions de votre bien-être et proposons des adhésions à des salles de sport, des cours de yoga sur site et une assurance santé complète."},
    {"text" : "Fréquence des évaluations de performance : Nous organisons des bilans informels tous les trimestres et des évaluations formelles deux fois par an."}
]

# Embed the documents
doc_emb = co.embed(
            model="embed-multilingual-v3.0",
            input_type="search_document",
            texts=[doc['text'] for doc in faqs_short_fr],
            embedding_types=["float"]).embeddings.float

# Add the user query
query = "What's your remote-working policy?"

# Embed the query
query_emb = co.embed(
            model="embed-multilingual-v3.0",
            input_type="search_query",
            texts=[query],
            embedding_types=["float"]).embeddings.float

# Compute dot product similarity and display results
return_results(query_emb, doc_emb, faqs_short_fr)

Rank: 1
Score: 0.44275861574398384
Document: {'text': "Travailler de l'étranger : Il est possible de travailler à distance depuis un autre pays. Il suffit de coordonner avec votre responsable et de vous assurer d'être disponible pendant les heures de travail."}

Rank: 2
Score: 0.32783563708365715
Document: {'text': 'Avantages pour la santé et le bien-être : Nous nous soucions de votre bien-être et proposons des adhésions à des salles de sport, des cours de yoga sur site et une assurance santé complète.'}



Further reading:
- [The list of supported languages for multilingual Embed](https://docs.cohere.com/docs/cohere-embed#list-of-supported-languages)

## Changing embedding compression types

Semantic search over large datasets can require a lot of memory, which is expensive to host in a vector database. Changing the embeddings compression type can help reduce the memory footprint.

A typical embedding model generates embeddings in float32 format (4 bytes per dimension). By compressing the embeddings to int8 format (1 byte per dimension), we can reduce the memory 4x while keeping 99.99% of the original search quality.

We can go even further and use the binary format (1 bit), which reduces the needed memory 32x while keeping 90-98% of the original search quality.
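
To make the 4x and 32x figures concrete, here is a small arithmetic sketch. It assumes 1024-dimensional embeddings (the size of `embed-english-v3.0`) and a hypothetical corpus of one million vectors. The helper `embedding_memory_bytes` is illustrative, and the numbers cover raw vector storage only, not any index overhead a vector database may add.

```python
def embedding_memory_bytes(num_vectors, dimensions, bits_per_dimension):
    """Raw storage needed for the vectors alone, in bytes."""
    return num_vectors * dimensions * bits_per_dimension // 8

num_vectors = 1_000_000  # hypothetical corpus size (assumption)
dimensions = 1024        # embedding size of embed-english-v3.0

float32_bytes = embedding_memory_bytes(num_vectors, dimensions, 32)  # 4_096_000_000
int8_bytes = embedding_memory_bytes(num_vectors, dimensions, 8)      # 1_024_000_000
binary_bytes = embedding_memory_bytes(num_vectors, dimensions, 1)    # 128_000_000

print(f"float32: {float32_bytes / 1e9:.3f} GB")
print(f"int8:    {int8_bytes / 1e9:.3f} GB ({float32_bytes // int8_bytes}x smaller)")
print(f"binary:  {binary_bytes / 1e9:.3f} GB ({float32_bytes // binary_bytes}x smaller)")
```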

The Embed endpoint supports the following formats: `float`, `int8`, `uint8`, `binary`, and `ubinary`. You can get these different compression levels by passing the `embedding_types` parameter.

In the example below, we embed the documents in two formats: `float` and `int8`.

In [25]:
# Embed the documents with the given embedding types
doc_emb = co.embed(
            model="embed-english-v3.0",
            input_type="search_document",
            texts=[doc['text'] for doc in faqs_long],
            embedding_types=["float","int8"]).embeddings

# Add the user query
query = "Ways to connect with my teammates"

# Embed the query
query_emb = co.embed(
            model="embed-english-v3.0",
            input_type="search_query",
            texts=[query],
            embedding_types=["float","int8"]).embeddings

Here are the search results of using the `float` embeddings (same as the earlier example).

In [26]:
# Compute dot product similarity and display results
return_results(query_emb.float, doc_emb.float, faqs_long)

Rank: 1
Score: 0.38822364122128694
Document: {'text': 'Team-Building Activities: We foster team spirit with monthly outings and weekly game nights. Feel free to suggest new activity ideas anytime!'}

Rank: 2
Score: 0.32768235463023454
Document: {'text': 'Proposing New Ideas: Innovation is welcomed! Share your brilliant ideas at our weekly team meetings or directly with your team lead.'}



And here are the search results of using the `int8` embeddings.

In [27]:
# Compute dot product similarity and display results
return_results(query_emb.int8, doc_emb.int8, faqs_long)

Rank: 1
Score: 614131
Document: {'text': 'Team-Building Activities: We foster team spirit with monthly outings and weekly game nights. Feel free to suggest new activity ideas anytime!'}

Rank: 2
Score: 516092
Document: {'text': 'Proposing New Ideas: Innovation is welcomed! Share your brilliant ideas at our weekly team meetings or directly with your team lead.'}
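
The `int8` scores above are large integers because they are sums of products of integer components, so only their ordering is meaningful, not their magnitude. The sketch below illustrates this with a simple scale-and-round quantization of toy vectors. This scheme is an assumption chosen for illustration and is not necessarily how the Embed endpoint produces its `int8` embeddings; `quantize_int8` and `int8_dot_scores` are hypothetical helpers.

```python
import numpy as np

def quantize_int8(vectors):
    """Map floats in [-1, 1] to integers in [-127, 127] (illustrative scheme only)."""
    return np.round(np.asarray(vectors) * 127).astype(np.int8)

def int8_dot_scores(query_q, docs_q):
    # Cast to a wider integer type first: accumulating products in int8 would overflow
    return np.dot(query_q.astype(np.int32), docs_q.astype(np.int32).T)[0]

# Toy vectors standing in for real embeddings (assumption)
doc_vecs = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.2, 0.9]])
query_vec = np.array([[0.8, 0.3, 0.1]])

float_scores = np.dot(query_vec, doc_vecs.T)[0]
int8_scores = int8_dot_scores(quantize_int8(query_vec), quantize_int8(doc_vecs))

# The scores live on very different scales, but the ranking is the same
float_ranking = np.argsort(-float_scores)
int8_ranking = np.argsort(-int8_scores)
```

If you store `int8` embeddings as NumPy arrays with an `int8` dtype, casting to a wider type before the dot product, as done here, avoids silent overflow.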



Further reading:
- [Documentation on embeddings compression levels](https://docs.cohere.com/docs/embeddings#compression-levels)

## Conclusion

In this tutorial, you learned about:
- How to embed documents for search
- How to embed queries
- How to perform semantic search
- How to perform multilingual semantic search
- How to change the embedding compression types

A high-performance and modern search system typically includes a reranking stage, which further boosts the search results.

In Part 5, you will learn how to add reranking to a search system.

# Reranking

Reranking is a technique that leverages embeddings as the last stage of a retrieval process, and is especially useful in RAG systems.

We can rerank results from semantic search as well as any other search systems such as lexical search. This means that companies can retain an existing keyword-based (also called “lexical”) or semantic search system for the first-stage retrieval and integrate the Rerank endpoint in the second-stage reranking.
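
As a rough sketch of what a first-stage lexical retriever could look like, the block below ranks documents by the number of words they share with the query. This toy scorer is an assumption standing in for a real keyword search system; `tokenize` and `lexical_search` are hypothetical helpers, not part of the Cohere SDK.

```python
import re

def tokenize(text):
    # Lowercase and split into a set of word tokens
    return set(re.findall(r"\w+", text.lower()))

def lexical_search(query, documents, top_k=3):
    """Toy first-stage retriever: rank documents by the number of shared query terms."""
    query_terms = tokenize(query)
    scored = []
    for idx, doc in enumerate(documents):
        overlap = len(query_terms & tokenize(doc["text"]))
        if overlap > 0:
            scored.append((overlap, idx))
    # Highest overlap first; ties keep the original document order
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [documents[idx] for _, idx in scored[:top_k]]

docs = [
    {"text": "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool."},
    {"text": "Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance."},
    {"text": "Performance Reviews Frequency: We conduct informal check-ins every quarter and formal performance reviews twice a year."},
]

candidates = lexical_search("health insurance and gym benefits", docs)
no_match = lexical_search("Are there fitness-related perks?", docs)
```

Here `candidates` contains the health benefits document first, followed by the performance reviews document, which matches only on the word "and". The second query shares no keywords with any document, so `no_match` is empty. Cases like these are where a second-stage reranker or a semantic first stage helps.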

In this tutorial, you'll learn about:
- Reranking lexical/semantic search results
- Reranking semi-structured data
- Reranking tabular data
- Multilingual reranking

You'll learn these by building an onboarding assistant for new hires.

## Setup

To get started, first we need to install the `cohere` library and create a Cohere client.

In [39]:
# pip install cohere

#import cohere

#co = cohere.ClientV2(api_key="COHERE_API_KEY") # Get your free API key: https://dashboard.cohere.com/api-keys

## Reranking lexical/semantic search results

Rerank requires just a single line of code to implement.

Suppose we have a list of search results from an FAQ list, which can come from a semantic, lexical, or any other type of search system. This list may not be optimally ranked for relevance to the user query.

This is where Rerank can help. We call the endpoint using `co.rerank()` and pass the following arguments:
- `query`: The user query
- `documents`: The list of documents
- `top_n`: The top reranked documents to select
- `model`: The model ID; here we choose Rerank English 3 (`rerank-english-v3.0`)

In [20]:
# Define the documents
faqs_short = [
    {"text": "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool. Approvals are prompt and straightforward."},
    {"text": "Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours."},
    {"text": "Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance."},
    {"text": "Performance Reviews Frequency: We conduct informal check-ins every quarter and formal performance reviews twice a year."}
]

In [21]:
# Add the user query
query = "Are there fitness-related perks?"

# Rerank the documents
results = co.rerank(query=query,
                    documents=faqs_short,
                    top_n=2,
                    model='rerank-english-v3.0')

print(results)



In [22]:
# Display the reranking results
def return_results(results, documents):
    for idx, result in enumerate(results.results):
        print(f"Rank: {idx+1}")
        print(f"Score: {result.relevance_score}")
        print(f"Document: {documents[result.index]}\n")

return_results(results, faqs_short)

Rank: 1
Score: 0.01798621
Document: {'text': 'Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance.'}

Rank: 2
Score: 8.463939e-06
Document: {'text': 'Performance Reviews Frequency: We conduct informal check-ins every quarter and formal performance reviews twice a year.'}



Further reading:
- [Rerank endpoint API reference](https://docs.cohere.com/reference/rerank)
- [Documentation on Rerank](https://docs.cohere.com/docs/overview)
- [Documentation on Rerank fine-tuning](https://docs.cohere.com/docs/rerank-fine-tuning)
- [Documentation on Rerank best practices](https://docs.cohere.com/docs/reranking-best-practices)
- [LLM University module on Text Representation](https://cohere.com/llmu#text-representation)

## Reranking semi-structured data

The Rerank 3 model supports multi-aspect and semi-structured data like emails, invoices, JSON documents, code, and tables. By setting the rank fields, you can select which fields the model should consider for reranking.

In the following example, we'll use email data. It is semi-structured data that contains a number of fields: `from`, `to`, `date`, `subject`, and `text`.

Suppose the new hire now wants to search for any emails about check-in sessions. Let's pretend we have a list of 3 emails retrieved from the email provider's API.

To perform reranking over semi-structured data, we add an additional parameter, `rank_fields`, which contains the list of available fields.

The model will rerank based on the order of the fields passed in. For example, given `rank_fields=['title','author','text']`, the model will rerank using the values in `title`, `author`, and `text` sequentially.

In [23]:
# Define the documents
emails = [
    {"from": "hr@co1t.com", "to": "david@co1t.com", "date": "2024-06-24", "subject": "A Warm Welcome to Co1t!", "text": "We are delighted to welcome you to the team! As you embark on your journey with us, you'll find attached an agenda to guide you through your first week."},
    {"from": "it@co1t.com", "to": "david@co1t.com", "date": "2024-06-24", "subject": "Setting Up Your IT Needs", "text": "Greetings! To ensure a seamless start, please refer to the attached comprehensive guide, which will assist you in setting up all your work accounts."},
    {"from": "john@co1t.com", "to": "david@co1t.com", "date": "2024-06-24", "subject": "First Week Check-In", "text": "Hello! I hope you're settling in well. Let's connect briefly tomorrow to discuss how your first week has been going. Also, make sure to join us for a welcoming lunch this Thursday at noon—it's a great opportunity to get to know your colleagues!"}
]

In [24]:
# Add the user query
query = "Any email about check ins?"

# Rerank the documents
results = co.rerank(query=query,
                    documents=emails,
                    top_n=2,
                    model='rerank-english-v3.0',
                    rank_fields=["from", "to", "date", "subject", "text"]
                    )

return_results(results, emails)

TypeError: V2Client.rerank() got an unexpected keyword argument 'rank_fields'
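
The `TypeError` above indicates that the `ClientV2` version used here does not accept `rank_fields`. One possible workaround, sketched below under the assumption that the endpoint accepts plain strings as documents, is to serialize the selected fields of each record into a single string, in the order that `rank_fields` would have specified. The helper `to_rerank_text` is hypothetical and not part of the Cohere SDK.

```python
def to_rerank_text(record, fields):
    """Serialize the chosen fields of a record into 'field: value' lines, in order."""
    return "\n".join(f"{field}: {record[field]}" for field in fields if field in record)

# A shortened example record (assumption, for illustration only)
email = {
    "from": "john@co1t.com",
    "to": "david@co1t.com",
    "date": "2024-06-24",
    "subject": "First Week Check-In",
    "text": "Hello! I hope you're settling in well.",
}

# Only the subject and body are kept, with the subject first
email_text = to_rerank_text(email, ["subject", "text"])
print(email_text)
```

A list of such strings could then be passed as `documents` to `co.rerank()`. Because each result's `index` refers to a position in that list, it can still be used to look up the original record.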

## Reranking tabular data

Many enterprises rely on tabular data, such as relational databases, CSVs, and Excel. To perform reranking, you can transform a dataframe into a list of JSON records and use Rerank 3's JSON capabilities to rank them.

Here's an example of reranking a CSV file that contains employee information.

In [45]:
import pandas as pd
from io import StringIO

# Create a demo CSV file
data = """name,role,join_date,email,status
Rebecca Lee,Senior Software Engineer,2024-07-01,rebecca@co1t.com,Full-time
Emma Williams,Product Designer,2024-06-15,emma@co1t.com,Full-time
Michael Jones,Marketing Manager,2024-05-20,michael@co1t.com,Full-time
Amelia Thompson,Sales Representative,2024-05-20,amelia@co1t.com,Part-time
Ethan Davis,Product Designer,2024-05-25,ethan@co1t.com,Contractor"""
data_csv = StringIO(data)

# Load the CSV file
df = pd.read_csv(data_csv)
df.head()

|   | name | role | join_date | email | status |
|---|------|------|-----------|-------|--------|
| 0 | Rebecca Lee | Senior Software Engineer | 2024-07-01 | rebecca@co1t.com | Full-time |
| 1 | Emma Williams | Product Designer | 2024-06-15 | emma@co1t.com | Full-time |
| 2 | Michael Jones | Marketing Manager | 2024-05-20 | michael@co1t.com | Full-time |
| 3 | Amelia Thompson | Sales Representative | 2024-05-20 | amelia@co1t.com | Part-time |
| 4 | Ethan Davis | Product Designer | 2024-05-25 | ethan@co1t.com | Contractor |


In [47]:
# Define the documents and rank fields
employees = df.to_dict('records')
rank_fields = df.columns.tolist()

# Add the user query
query = "Any full-time product designers who joined recently?"

# The V2 client rejects `rank_fields` (see the TypeError above), so as a
# workaround we serialize each record into "field: value" lines, in
# `rank_fields` order. Assumption: the endpoint accepts plain strings as documents.
employee_texts = ["\n".join(f"{field}: {row[field]}" for field in rank_fields)
                  for row in employees]

# Rerank the documents
results = co.rerank(query=query,
                    documents=employee_texts,
                    top_n=1,
                    model='rerank-english-v3.0')

return_results(results, employees)

## Multilingual reranking

The Rerank endpoint also supports multilingual reranking via the `rerank-multilingual-...` models. This means you can rerank texts in different languages.

In the example below, we repeat the steps of performing reranking with one difference: we change the model to a multilingual one, `rerank-multilingual-v3.0`. We rerank the English FAQ list using an Arabic query.

In [None]:
# Define the query
query = "هل هناك مزايا تتعلق باللياقة البدنية؟" # Are there fitness benefits?

# Rerank the documents
results = co.rerank(query=query,
                    documents=faqs_short,
                    top_n=2,
                    model='rerank-multilingual-v3.0')

return_results(results, faqs_short)

## Conclusion

In this tutorial, you learned about:
- How to rerank lexical/semantic search results
- How to rerank semi-structured data
- How to rerank tabular data
- How to perform multilingual reranking

We have now seen two critical components of a powerful search system: semantic search, or dense retrieval (Part 4), and reranking (Part 5). These building blocks are essential for implementing RAG solutions.

In Part 6, you will learn how to implement RAG.

# RAG

The Chat endpoint provides comprehensive support for various text generation use cases, including retrieval-augmented generation (RAG).

While LLMs are good at maintaining the context of the conversation and generating responses, they can be prone to hallucinate and include factually incorrect or incomplete information in their responses.

RAG enables a model to access and utilize supplementary information from external documents, thereby improving the accuracy of its responses.

When using RAG with the Chat endpoint, these responses are backed by fine-grained citations linking to the source documents. This makes the responses easily verifiable.

In this tutorial, you'll learn about:
- Basic RAG
- Search query generation
- Retrieval with Embed
- Reranking with Rerank
- Response and citation generation

You'll learn these by building an onboarding assistant for new hires.

## Setup

To get started, first we need to install the `cohere` library and create a Cohere client.

In [25]:

# pip install cohere

import cohere
import numpy as np
import json
from typing import List

#co = cohere.ClientV2(api_key="COHERE_API_KEY") # Get your free API key: https://dashboard.cohere.com/api-keys

## Basic RAG


To see how RAG works, let's define the documents that the application has access to. We'll use a short list of documents consisting of internal FAQs about the fictitious company Co1t (in production, document collections are typically much larger).

In this example, each document is a `data` object with one field, `text`. But we can define any number of fields we want, depending on the nature of the documents. For example, emails could contain `title` and `text` fields.
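
As a small sketch of the multi-field case, the block below wraps raw records in the same `data` shape used in this tutorial. The email records and the helper `to_chat_documents` are hypothetical and only illustrate the structure.

```python
def to_chat_documents(records):
    """Wrap each record (a dict of fields) in the {"data": {...}} shape used in this tutorial."""
    return [{"data": dict(record)} for record in records]

# Hypothetical email records with `title` and `text` fields
email_records = [
    {"title": "First Week Check-In", "text": "Let's connect briefly tomorrow to discuss how your first week has been going."},
    {"title": "Setting Up Your IT Needs", "text": "Please refer to the attached guide to set up your work accounts."},
]

email_documents = to_chat_documents(email_records)
print(email_documents[0])
```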

In [26]:
documents = [
  {
    "data": {
      "text": "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool. Approvals are prompt and straightforward."
    }
  },
  {
    "data": {
      "text": "Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours."
    }
  },
  {
    "data": {
      "text": "Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance."
    }
  }
]

To call the Chat API with RAG, pass the following parameters at a minimum. This tells the model to run in RAG-mode and use these documents in its response.

- `model` for the model ID
- `messages` for the user's query.
- `documents` for defining the documents.

Let's create a query asking about the company's support for personal well-being. This information is not part of the data the model was trained on, so the model will need to use the external documents.

RAG introduces additional objects in the Chat response. One of them is `citations`, which contains details about:
- specific text spans from the retrieved documents on which the response is grounded.
- the documents referenced in the citations.

In [28]:
# Add the user query
query = "Are there health benefits?"

# Generate the response
response = co.chat(model="command-r-plus-08-2024",
                   messages=[{'role': 'user', 'content': query}],
                   documents=documents
                   )



# Display the response
print(response.message.content[0].text)

# Display the citations and source documents
if response.message.citations:
    print("\nCITATIONS:")
    for citation in response.message.citations:
        print(citation, "\n")

Yes, there are health benefits. We offer gym memberships, on-site yoga classes, and comprehensive health insurance.

CITATIONS:
start=41 end=115 text='gym memberships, on-site yoga classes, and comprehensive health insurance.' sources=[DocumentSource(type='document', id='doc:2', document={'id': 'doc:2', 'text': 'Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance.'})] 
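
The `start` and `end` values in a citation are character offsets into the response text, which makes it straightforward to mark cited spans. The sketch below appends a source marker after each cited span. It uses plain dictionaries as a stand-in for the SDK's citation objects (an assumption), and `annotate_with_citations` is a hypothetical helper.

```python
def annotate_with_citations(text, citations):
    """Append a [source ids] marker after each cited span of the response text."""
    annotated = text
    # Work from the last span backwards so that earlier offsets stay valid
    for citation in sorted(citations, key=lambda c: c["start"], reverse=True):
        marker = "[" + ", ".join(citation["source_ids"]) + "]"
        annotated = annotated[:citation["end"]] + marker + annotated[citation["end"]:]
    return annotated

response_text = (
    "Yes, there are health benefits. We offer gym memberships, "
    "on-site yoga classes, and comprehensive health insurance."
)

# Offsets and source id copied from the citation printed above
citations = [{"start": 41, "end": 115, "source_ids": ["doc:2"]}]

cited_span = response_text[citations[0]["start"]:citations[0]["end"]]
annotated = annotate_with_citations(response_text, citations)
print(annotated)
```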



## Search query generation

The previous example showed how to get started with RAG, and in particular, the augmented generation portion of RAG. But as its name implies, RAG consists of other steps, such as retrieval.

In a basic RAG application, the steps involved are:

- Transforming the user message into search queries
- Retrieving relevant documents for a given search query
- Generating the response and citations
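
The three steps above can be sketched as a single function that wires the stages together. This is only a structural outline under stated assumptions: `run_basic_rag` and its callable parameters are hypothetical, and the stub lambdas stand in for the Chat and Embed calls covered in the rest of this tutorial.

```python
from typing import Callable, Dict, List

def run_basic_rag(
    message: str,
    generate_queries: Callable[[str], List[str]],
    retrieve: Callable[[str], List[Dict]],
    generate_response: Callable[[str, List[Dict]], str],
) -> str:
    # Step 1: transform the user message into search queries
    queries = generate_queries(message)

    # Step 2: retrieve documents for each query, skipping duplicates
    documents: List[Dict] = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in documents:
                documents.append(doc)

    # Step 3: generate the response (and citations) from the message and documents
    return generate_response(message, documents)

# Stub components for illustration only
faq = {"text": "Team-Building Activities: We foster team spirit with monthly outings."}
answer = run_basic_rag(
    "Do you organize team events?",
    generate_queries=lambda msg: [msg.lower()],
    retrieve=lambda q: [faq],
    generate_response=lambda msg, docs: f"Answer based on {len(docs)} document(s)",
)
```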

Let's now look at the first step—search query generation. The chatbot needs to generate an optimal set of search queries to use for retrieval.

There are different possible approaches to this. In this example, we'll take a [tool use](v2/docs/tool-use) approach.

Here, we build a tool that takes a user query and returns a list of relevant document snippets for that query. The tool can generate zero, one or multiple search queries depending on the user query.

In [29]:
def generate_search_queries(message: str) -> List[str]:

    # Define the query generation tool
    query_gen_tool = [
        {
            "type": "function",
            "function": {
                "name": "internet_search",
                "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "queries": {
                            "type": "array",
                            "items": {"type": "string"},
                            "description": "a list of queries to search the internet with.",
                        }
                    },
                    "required": ["queries"],
                },
            },
        }
    ]


    # Define a preamble to optimize search query generation
    instructions = "Write a search query that will find helpful information for answering the user's question accurately. If you need more than one search query, write a list of search queries. If you decide that a search is very unlikely to find information that would be useful in constructing a response to the user, you should instead directly answer."

    # Generate search queries (if any)
    search_queries = []

    res = co.chat(
        model="command-r-08-2024",
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": message},
        ],
        tools=query_gen_tool
    )

    if res.message.tool_calls:
        for tc in res.message.tool_calls:
            queries = json.loads(tc.function.arguments)["queries"]
            search_queries.extend(queries)

    return search_queries

In the example below, the tool breaks down the user message into two separate queries.

In [30]:
query = "How to stay connected with the company, and do you organize team events?"
queries_for_search = generate_search_queries(query)
print(queries_for_search)

['how to stay connected with the company', 'do companies organise team events']


And in the example below, the tool decides that one query is sufficient.

In [31]:
query = "How flexible are the working hours"
queries_for_search = generate_search_queries(query)
print(queries_for_search)

['How flexible are the working hours at Cohere?']


And in the example below, the tool decides that no retrieval is needed to answer the query.

In [32]:
query = "What is 2 + 2"
queries_for_search = generate_search_queries(query)
print(queries_for_search)

[]
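
To see how `generate_search_queries()` turns tool calls into a list of queries without calling the API, the sketch below runs the same parsing logic on a mock tool call. The mock object only mimics the attributes accessed in the function above (`function.arguments` as a JSON string); it is an assumption for illustration, not the SDK's actual response type.

```python
import json
from types import SimpleNamespace

def extract_queries(tool_calls):
    """Collect the `queries` argument from each tool call, preserving order."""
    search_queries = []
    for tc in tool_calls or []:
        arguments = json.loads(tc.function.arguments)  # arguments arrive as a JSON string
        search_queries.extend(arguments.get("queries", []))
    return search_queries

# Mock tool call mimicking the attributes used in generate_search_queries()
mock_tool_call = SimpleNamespace(
    function=SimpleNamespace(
        name="internet_search",
        arguments='{"queries": ["how to stay connected with the company", "do companies organise team events"]}',
    )
)

parsed = extract_queries([mock_tool_call])  # two queries from one tool call
no_search = extract_queries(None)           # no tool calls means no retrieval
```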


## Retrieval with Embed

Given the search query, we need a way to retrieve the most relevant documents from a large collection of documents.

This is where we can leverage text embeddings through the Embed endpoint. It enables semantic search, which lets us compare the semantic meaning of the documents and the query. It solves the problem faced by the more traditional approach of lexical search, which is great at finding keyword matches but struggles to capture the context or meaning of a piece of text.

The Embed endpoint takes in texts as input and returns embeddings as output.

First, we need to embed the documents to search from. We call the Embed endpoint using `co.embed()` and pass the following arguments:

- `model`: Here we choose `embed-english-v3.0`, which generates embeddings of size 1024
- `input_type`: We choose `search_document` to ensure the model treats these as the documents (instead of the query) for search
- `texts`: The list of texts (the FAQs)

In [33]:
# Define the documents
faqs_long = [
    {
        "data": {
            "text": "Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged."
        }
    },
    {
        "data": {
            "text": "Finding Coffee Spots: For your caffeine fix, head to the break room's coffee machine or cross the street to the café for artisan coffee."
        }
    },
    {
        "data": {
            "text": "Team-Building Activities: We foster team spirit with monthly outings and weekly game nights. Feel free to suggest new activity ideas anytime!"
        }
    },
    {
        "data": {
            "text": "Working Hours Flexibility: We prioritize work-life balance. While our core hours are 9 AM to 5 PM, we offer flexibility to adjust as needed."
        }
    },
    {
        "data": {
            "text": "Side Projects Policy: We encourage you to pursue your passions. Just be mindful of any potential conflicts of interest with our business."
        }
    },
    {
        "data": {
            "text": "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool. Approvals are prompt and straightforward."
        }
    },
    {
        "data": {
            "text": "Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours."
        }
    },
    {
        "data": {
            "text": "Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance."
        }
    },
    {
        "data": {
            "text": "Performance Reviews Frequency: We conduct informal check-ins every quarter and formal performance reviews twice a year."
        }
    },
    {
        "data": {
            "text": "Proposing New Ideas: Innovation is welcomed! Share your brilliant ideas at our weekly team meetings or directly with your team lead."
        }
    },
]

# Embed the documents
doc_emb = co.embed(
            model="embed-english-v3.0",
            input_type="search_document",
            texts=[doc['data']['text'] for doc in faqs_long],
            embedding_types=["float"]).embeddings.float

Next, we add a query, which asks about how to get to know the team.

We choose `search_query` as the `input_type` to ensure the model treats this as the query (instead of the documents) for search.

In [34]:
# Add the user query
query = "How to get to know my teammates"

# Generate the search query
# Note: For simplicity, we are assuming only one query generated. For actual implementations, you will need to perform search for each query.
queries_for_search = generate_search_queries(query)[0]
print("Search query: ", queries_for_search)

# Embed the search query
query_emb = co.embed(
    model="embed-english-v3.0",
    input_type="search_query",
    texts=[queries_for_search],
    embedding_types=["float"]).embeddings.float

Search query:  how to get to know your teammates


Now, we want to search for the most relevant documents to the query. For this, we make use of the `numpy` library to compute the similarity between each query-document pair using the dot product approach.

Each query-document pair returns a score, which represents how similar the pair is. We then sort these scores in descending order and select the top `n` most similar pairs. Here we choose `n = 5` (an arbitrary choice; you can choose any number).

Here, we show the most relevant documents with their similarity scores.

In [35]:
# Compute dot product similarity and display results
n = 5
scores = np.dot(query_emb, np.transpose(doc_emb))[0]
max_idx = np.argsort(-scores)[:n]

retrieved_documents = [faqs_long[item] for item in max_idx]

for rank, idx in enumerate(max_idx):
    print(f"Rank: {rank+1}")
    print(f"Score: {scores[idx]}")
    print(f"Document: {retrieved_documents[rank]}\n")

Rank: 1
Score: 0.3261866441412288
Document: {'data': {'text': 'Team-Building Activities: We foster team spirit with monthly outings and weekly game nights. Feel free to suggest new activity ideas anytime!'}}

Rank: 2
Score: 0.2683765327379585
Document: {'data': {'text': 'Proposing New Ideas: Innovation is welcomed! Share your brilliant ideas at our weekly team meetings or directly with your team lead.'}}

Rank: 3
Score: 0.2578081147038258
Document: {'data': {'text': 'Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged.'}}

Rank: 4
Score: 0.18593656147295465
Document: {'data': {'text': "Finding Coffee Spots: For your caffeine fix, head to the break room's coffee machine or cross the street to the café for artisan coffee."}}

Rank: 5
Score: 0.1297671494011921
Document: {'data': {'text': 'Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive hea
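
The example above assumes a single search query. When several queries are generated, one possible approach is to score every document against every query and keep each document's best score. The sketch below shows this with toy 2-dimensional vectors; the helper `retrieve_for_queries` and the max-score merging strategy are assumptions for illustration, not something prescribed by the Embed endpoint.

```python
import numpy as np

def retrieve_for_queries(query_embs, doc_embs, top_n=2):
    """Return (doc_index, best_score) pairs, ranked by each document's best score across queries."""
    scores = np.dot(np.asarray(query_embs), np.asarray(doc_embs).T)  # shape: (num_queries, num_docs)
    best_scores = scores.max(axis=0)
    top_idx = np.argsort(-best_scores)[:top_n]
    return [(int(i), float(best_scores[i])) for i in top_idx]

# Toy vectors standing in for real embeddings (assumption)
toy_doc_embs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
toy_query_embs = [[1.0, 0.0], [0.0, 0.5]]

top_docs = retrieve_for_queries(toy_query_embs, toy_doc_embs)
```

With real embeddings, `query_embs` would be the result of embedding all generated queries in one `co.embed()` call with `input_type="search_query"`.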

## Reranking with Rerank

Reranking can boost the results from semantic or lexical search further. The Rerank endpoint takes a list of search results and reranks them according to the most relevant documents to a query. This requires just a single line of code to implement.

We call the endpoint using `co.rerank()` and pass the following arguments:

- `query`: The user query
- `documents`: The list of documents we get from the semantic search results
- `top_n`: The top reranked documents to select
- `model`: We choose Rerank English 3

Looking at the results, we see that since the query is about getting to know the team, the document that talks about joining Slack channels is now ranked higher (1st) compared to earlier (3rd).

Here we select `top_n` to be 2, which will be the documents we will pass next for response generation.

In [36]:
# Rerank the documents
results = co.rerank(query=queries_for_search,
                    documents=[doc['data']['text'] for doc in retrieved_documents],
                    top_n=2,
                    model='rerank-english-v3.0')

# Display the reranking results
for idx, result in enumerate(results.results):
    print(f"Rank: {idx+1}")
    print(f"Score: {result.relevance_score}")
    print(f"Document: {retrieved_documents[result.index]}\n")

reranked_documents = [retrieved_documents[result.index] for result in results.results]

Rank: 1
Score: 0.0040072887
Document: {'data': {'text': 'Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged.'}}

Rank: 2
Score: 0.0020829707
Document: {'data': {'text': 'Team-Building Activities: We foster team spirit with monthly outings and weekly game nights. Feel free to suggest new activity ideas anytime!'}}
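
Each rerank result carries an `index` that points back into the list of documents passed to `co.rerank()`, which is how the code above recovers the original document objects. The sketch below reproduces that mapping with mock result objects. The mocks and the shortened document texts are assumptions for illustration, and the scores are copied from the output above.

```python
from types import SimpleNamespace

# Shortened stand-ins for the first three retrieved documents, in retrieval order
retrieved = [
    {"data": {"text": "Team-Building Activities"}},
    {"data": {"text": "Proposing New Ideas"}},
    {"data": {"text": "Joining Slack Channels"}},
]

# Mock rerank results: `index` refers to a position in `retrieved`
mock_results = [
    SimpleNamespace(index=2, relevance_score=0.0040072887),
    SimpleNamespace(index=0, relevance_score=0.0020829707),
]

# Map each result back to its original document, now in reranked order
reranked = [retrieved[result.index] for result in mock_results]
```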



## Response and citation generation

Finally, we reach the step that we saw in the earlier `Basic RAG` section.

To call the Chat API with RAG, we pass the following parameters. This tells the model to run in RAG-mode and use these documents in its response.

- `model` for the model ID
- `messages` for the user's query.
- `documents` for defining the documents.

The response is then generated based on the the query and the documents retrieved.

RAG introduces additional objects in the Chat response. One of them is `citations`, which contains details about:
- specific text spans from the retrieved documents on which the response is grounded.
- the documents referenced in the citations.

In [37]:
# Generate the response
response = co.chat(model="command-r-plus-08-2024",
                   messages=[{'role': 'user', 'content': query}],
                   documents=reranked_documents)

# Display the response
print(response.message.content[0].text)

# Display the citations and source documents
if response.message.citations:
    print("\nCITATIONS:")
    for citation in response.message.citations:
        print(citation, "\n")

You can get to know your teammates by joining Slack channels and participating in team-building activities. You will receive an invite via email to join relevant Slack channels. You can also foster team spirit by participating in monthly outings and weekly game nights.

CITATIONS:
start=38 end=60 text='joining Slack channels' sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'text': 'Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged.'})] 

start=82 end=107 text='team-building activities.' sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'text': 'Team-Building Activities: We foster team spirit with monthly outings and weekly game nights. Feel free to suggest new activity ideas anytime!'})] 

start=117 end=144 text='receive an invite via email' sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'text': 'Joining Slack Channels: You 
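
Since each citation carries `start` and `end` offsets into the response text, one possible use is to render numbered markers inline. The following is a minimal sketch under that assumption. The `Span` class and the `annotate` helper are hypothetical stand-ins written for this example (they are not SDK objects), and the offsets are copied from the first two citations printed above.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for a citation: only the fields used here."""
    start: int
    end: int
    text: str


def annotate(response_text, spans):
    """Insert a numbered marker such as [1] after each cited span."""
    annotated = response_text
    numbered = list(enumerate(spans, start=1))
    # Work from the last span to the first so earlier offsets stay valid
    for number, span in sorted(numbered, key=lambda pair: pair[1].start, reverse=True):
        annotated = annotated[:span.end] + f"[{number}]" + annotated[span.end:]
    return annotated


response_text = ("You can get to know your teammates by joining Slack channels "
                 "and participating in team-building activities.")
spans = [
    Span(start=38, end=60, text="joining Slack channels"),
    Span(start=82, end=107, text="team-building activities."),
]

# Each span's offsets index directly into the response text
for span in spans:
    assert response_text[span.start:span.end] == span.text

print(annotate(response_text, spans))
```

Inserting from the end of the text backwards avoids having to shift the remaining offsets after each insertion.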

# Agents with Tool Use

Tool use extends the ideas from RAG, where external systems are used to guide the response of an LLM, by drawing on a much broader set of tools than is possible with RAG alone. It builds on the ability of LLMs to act as reasoning and decision-making engines.

While RAG enables applications that can *answer questions*, tool use enables those that can *automate tasks*.

Tool use also enables developers to build agentic applications that can take actions, that is, perform both read and write operations on an external system.

In this tutorial, you'll learn about:
- Creating tools
- Tool planning and calling
- Tool execution
- Response and citation generation
- Multi-step tool use

You'll learn these by building an onboarding assistant for new hires.

## Setup

To get started, first we need to install the `cohere` library and create a Cohere client.

In [39]:

# pip install cohere

import cohere
import json
import os

co = cohere.ClientV2(api_key="COHERE_API_KEY") # Replace with your own API key. Get a free key at: https://dashboard.cohere.com/api-keys

## Creating tools

Before we can run a tool use workflow, we need to set up the tools. Let's create three tools:
- `search_faqs`: A tool for searching the FAQs. For simplicity, we won't implement any retrieval logic; the tool simply returns a list of pre-defined documents, which are the FAQ documents we used in the Text Embeddings section.
- `search_emails`: A tool for searching the emails. As above, the tool simply returns a list of pre-defined emails from the Reranking section.
- `create_calendar_event`: A tool for creating new calendar events. Again, for simplicity, we won't implement actual event bookings, but will return a mock success event. In practice, we can connect to a calendar service API and implement all the necessary logic here.

Here, we are defining a Python function for each tool, but more broadly, the tool can be any function or service that can receive and send objects.

In [41]:
# Create the tools
def search_faqs(query):
    faqs = [
        {"text": "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool. Approvals are prompt and straightforward."},
        {"text": "Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours."}
    ]
    return faqs

def search_emails(query):
    emails = [
        {"from": "it@co1t.com", "to": "david@co1t.com", "date": "2024-06-24", "subject": "Setting Up Your IT Needs", "text": "Greetings! To ensure a seamless start, please refer to the attached comprehensive guide, which will assist you in setting up all your work accounts."},
        {"from": "john@co1t.com", "to": "david@co1t.com", "date": "2024-06-24", "subject": "First Week Check-In", "text": "Hello! I hope you're settling in well. Let's connect briefly tomorrow to discuss how your first week has been going. Also, make sure to join us for a welcoming lunch this Thursday at noon—it's a great opportunity to get to know your colleagues!"}
    ]
    return emails

def create_calendar_event(date: str, time: str, duration: int):
    # You can implement any logic here
    return {"is_success": True,
            "message": f"Created a {duration} hour long event at {time} on {date}"}

functions_map = {
    "search_faqs": search_faqs,
    "search_emails": search_emails,
    "create_calendar_event": create_calendar_event
}

The second and final setup step is to define the tool schemas in a format that can be passed to the Chat endpoint. The schema must contain the following fields: `name`, `description`, and `parameters` in the format shown below.

This schema informs the LLM about what the tool does, and the LLM decides whether to use a particular tool based on it. Therefore, the more descriptive and specific the schema, the more likely the LLM will make the right tool call decisions.

Further reading:
- [Documentation on parameter types in tool use](https://docs.cohere.com/v2/docs/parameter-types-in-tool-use)

In [40]:
# Define the tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_faqs",
            "description": "Given a user query, searches a company's frequently asked questions (FAQs) list and returns the most relevant matches to the query.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The query from the user"
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_emails",
            "description": "Given a user query, searches a person's emails and returns the most relevant matches to the query.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The query from the user"
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "create_calendar_event",
            "description": "Creates a new calendar event of the specified duration at the specified time and date. A new event cannot be created on the same time as an existing event.",
            "parameters": {
                "type": "object",
                "properties": {
                    "date": {
                        "type": "string",
                        "description": "the date on which the event starts, formatted as mm/dd/yy"
                    },
                    "time": {
                        "type": "string",
                        "description": "the time of the event, formatted using 24h military time formatting"
                    },
                    "duration": {
                        "type": "number",
                        "description": "the number of hours the event lasts for"
                    }
                },
                "required": ["date", "time", "duration"]
            }
        }
    }
]
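
Because the model generates arguments from the schema while the application calls the Python function, the two need to agree on parameter names. The sketch below is one possible way to check this before running the workflow. The helper `check_tool_schema`, the function `example_event_tool`, and the `example_tool` schema are hypothetical names introduced for this illustration only.

```python
import inspect


def check_tool_schema(tool, function):
    """Return a list of problems found when comparing a schema with a function."""
    problems = []
    spec = tool.get("function", {})
    # The schema needs these three fields
    for field in ("name", "description", "parameters"):
        if field not in spec:
            problems.append(f"missing field: {field}")
    # Parameter names in the schema should match the function signature
    schema_params = set(spec.get("parameters", {}).get("properties", {}))
    function_params = set(inspect.signature(function).parameters)
    if schema_params != function_params:
        problems.append(
            f"parameter mismatch: schema={sorted(schema_params)}, "
            f"function={sorted(function_params)}"
        )
    return problems


def example_event_tool(date: str, time: str, duration: int):
    # Hypothetical stand-in with the same signature as create_calendar_event
    return {"is_success": True}


example_tool = {
    "type": "function",
    "function": {
        "name": "example_event_tool",
        "description": "Creates a new calendar event.",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {"type": "string"},
                "time": {"type": "string"},
                "duration": {"type": "number"},
            },
            "required": ["date", "time", "duration"],
        },
    },
}

# An empty list means no problems were found
print(check_tool_schema(example_tool, example_event_tool))
```

The same check could be run over each entry in `tools` against its counterpart in `functions_map`.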

## Tool planning and calling

We can now run the tool use workflow. We can think of a tool use system as consisting of four components:
- The user
- The application
- The LLM
- The tools

At its most basic, these four components interact in a workflow through four steps:
- **Step 1: Get user message** – The LLM gets the user message (via the application)
- **Step 2: Tool planning and calling** – The LLM decides which tools to call (if any) and generates the tool calls
- **Step 3: Tool execution** – The application executes the tools and sends the results to the LLM
- **Step 4: Response and citation generation** – The LLM generates the response and citations to send back to the user

In [42]:
# Create custom system message
system_message = """## Task and Context
You are an assistant who assists new employees of Co1t with their first week. You respond to their questions and assist them with their needs. Today is Monday, June 24, 2024"""


# Step 1: Get user message
message = "Is there any message about getting setup with IT?"

# Add the system and user messages to the chat history
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": message},
]

# Step 2: Tool planning and calling
response = co.chat(model="command-r-plus-08-2024", messages=messages, tools=tools)

if response.message.tool_calls:
    print("Tool plan:")
    print(response.message.tool_plan, "\n")
    print("Tool calls:")
    for tc in response.message.tool_calls:
        print(f"Tool name: {tc.function.name} | Parameters: {tc.function.arguments}")

    # Append tool calling details to the chat history
    messages.append(
        {
            "role": "assistant",
            "tool_calls": response.message.tool_calls,
            "tool_plan": response.message.tool_plan,
        }
    )

Tool plan:
I will search the emails for messages about getting set up with IT. 

Tool calls:
Tool name: search_emails | Parameters: {"query":"getting setup with IT"}


Given three tools to choose from, the model is able to pick the right tool (in this case, `search_emails`) based on what the user is asking for.

Also, notice that the model first generates a plan about what it should do ("I will do ...") before actually generating the tool call(s).

## Tool execution

In [43]:
# Step 3: Tool execution
for tc in response.message.tool_calls:
    tool_result = functions_map[tc.function.name](**json.loads(tc.function.arguments))
    tool_content = []
    for idx, data in enumerate(tool_result):
        tool_content.append({"type": "document", "document": {"data": json.dumps(data)}})
        # Optional: add an "id" field in the "document" object, otherwise IDs are auto-generated
    # Append tool results to the chat history
    messages.append({"role": "tool", "tool_call_id": tc.id, "content": tool_content})

print("Tool results:")
for result in tool_content:
    print(result)

Tool results:
{'type': 'document', 'document': {'data': '{"from": "it@co1t.com", "to": "david@co1t.com", "date": "2024-06-24", "subject": "Setting Up Your IT Needs", "text": "Greetings! To ensure a seamless start, please refer to the attached comprehensive guide, which will assist you in setting up all your work accounts."}'}}
{'type': 'document', 'document': {'data': '{"from": "john@co1t.com", "to": "david@co1t.com", "date": "2024-06-24", "subject": "First Week Check-In", "text": "Hello! I hope you\'re settling in well. Let\'s connect briefly tomorrow to discuss how your first week has been going. Also, make sure to join us for a welcoming lunch this Thursday at noon\\u2014it\'s a great opportunity to get to know your colleagues!"}'}}
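
The loop above assumes the tool returns a list of results. A tool such as `create_calendar_event` returns a single dictionary instead, and iterating over a dictionary yields its keys rather than the result itself. The sketch below shows one way to normalize both shapes into document objects. The helper name `to_tool_content` and the sample values are assumptions for illustration.

```python
import json


def to_tool_content(tool_result):
    """Convert a tool's return value into a list of document objects."""
    # A single dict is treated as one result; looping over it directly
    # would yield its keys rather than the result itself
    if isinstance(tool_result, dict):
        tool_result = [tool_result]
    return [
        {"type": "document", "document": {"data": json.dumps(item)}}
        for item in tool_result
    ]


# Sample return values in the two shapes used by the tools in this tutorial
list_result = [{"subject": "Setting Up Your IT Needs"}, {"subject": "First Week Check-In"}]
dict_result = {"is_success": True, "message": "Created a 1 hour long event at 12:00 on 06/27/24"}

print(len(to_tool_content(list_result)))
print(to_tool_content(dict_result)[0]["document"]["data"])
```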


## Response and citation generation

In [44]:
# Step 4: Response and citation generation
response = co.chat(
    model="command-r-plus-08-2024",
    messages=messages,
    tools=tools
)

# Append assistant response to the chat history
messages.append({"role": "assistant", "content": response.message.content[0].text})

# Print final response
print("Response:")
print(response.message.content[0].text)
print("="*50)

# Print citations (if any)
if response.message.citations:
    print("\nCITATIONS:")
    for citation in response.message.citations:
        print(citation, "\n")

Response:
Yes, there is an email from it@co1t.com with the subject 'Setting Up Your IT Needs'. It includes a comprehensive guide to setting up your work accounts.

CITATIONS:
start=28 end=39 text='it@co1t.com' sources=[ToolSource(type='tool', id='search_emails_r142nnch4wed:0', tool_output={'date': '2024-06-24', 'from': 'it@co1t.com', 'subject': 'Setting Up Your IT Needs', 'text': 'Greetings! To ensure a seamless start, please refer to the attached comprehensive guide, which will assist you in setting up all your work accounts.', 'to': 'david@co1t.com'})] 

start=57 end=83 text="'Setting Up Your IT Needs'" sources=[ToolSource(type='tool', id='search_emails_r142nnch4wed:0', tool_output={'date': '2024-06-24', 'from': 'it@co1t.com', 'subject': 'Setting Up Your IT Needs', 'text': 'Greetings! To ensure a seamless start, please refer to the attached comprehensive guide, which will assist you in setting up all your work accounts.', 'to': 'david@co1t.com'})] 

start=99 end=152 text='comprehensi

## Multi-step tool use

The model can execute more complex tasks in tool use – tasks that require tool calls to happen in a sequence. This is referred to as "multi-step" tool use.

Let's create a function called `run_assistant` to implement these steps and, along the way, print out the key events and messages. Optionally, this function also accepts the chat history as an argument to keep the state in a multi-turn conversation.

In [45]:
model = "command-r-plus-08-2024"

system_message = """## Task and Context
You are an assistant who assists new employees of Co1t with their first week. You respond to their questions and assist them with their needs. Today is Monday, June 24, 2024"""


def run_assistant(query, messages=None):
    if messages is None:
        messages = []

    if "system" not in {m.get("role") for m in messages}:
        messages.append({"role": "system", "content": system_message})

    # Step 1: get user message
    print(f"Question:\n{query}")
    print("=" * 50)

    messages.append({"role": "user", "content": query})

    # Step 2: Generate tool calls (if any)
    response = co.chat(model=model, messages=messages, tools=tools)

    while response.message.tool_calls:

        print("Tool plan:")
        print(response.message.tool_plan, "\n")
        print("Tool calls:")
        for tc in response.message.tool_calls:
            print(
                f"Tool name: {tc.function.name} | Parameters: {tc.function.arguments}"
            )
        print("=" * 50)

        messages.append(
            {
                "role": "assistant",
                "tool_calls": response.message.tool_calls,
                "tool_plan": response.message.tool_plan,
            }
        )

        # Step 3: Get tool results
        for tc in response.message.tool_calls:
            tool_result = functions_map[tc.function.name](
                **json.loads(tc.function.arguments)
            )
            # A tool may return a single dict (e.g. create_calendar_event);
            # wrap it in a list so that each result becomes one document
            if isinstance(tool_result, dict):
                tool_result = [tool_result]
            tool_content = []
            for data in tool_result:
                tool_content.append({"type": "document", "document": {"data": json.dumps(data)}})
                # Optional: add an "id" field in the "document" object, otherwise IDs are auto-generated
            messages.append(
                {"role": "tool", "tool_call_id": tc.id, "content": tool_content}
            )

        # Step 4: Generate response and citations
        response = co.chat(model=model, messages=messages, tools=tools)

    messages.append({"role": "assistant", "content": response.message.content[0].text})

    # Print final response
    print("Response:")
    print(response.message.content[0].text)
    print("=" * 50)

    # Print citations (if any)
    if response.message.citations:
        print("\nCITATIONS:")
        for citation in response.message.citations:
            print(citation, "\n")

    return messages

To illustrate the concept of multi-step tool use, let's ask the assistant to block time for any lunch invites received in the email.

This requires tasks to happen over multiple steps in a sequence. Here, we see the assistant running these steps:
- First, it calls the `search_emails` tool to find any lunch invites, and it finds one.
- Next, it calls the `create_calendar_event` tool to create an event that blocks the person's calendar on the day mentioned in the email.

This is also an example of tool use enabling a write operation, rather than only the read operations we saw with RAG.

In [46]:
messages = run_assistant("Can you check if there are any lunch invites, and for those days, create a one-hour event on my calendar at 12PM.")

Question:
Can you check if there are any lunch invites, and for those days, create a one-hour event on my calendar at 12PM.
Tool plan:
I will search the user's emails for lunch invites. Then, I will create a one-hour event on the user's calendar at 12PM for each day that the user has a lunch invite. 

Tool calls:
Tool name: search_emails | Parameters: {"query":"lunch invites"}
Tool plan:
I found one lunch invite for Thursday at noon. I will now create a one-hour event on the user's calendar for Thursday at 12PM. 

Tool calls:
Tool name: create_calendar_event | Parameters: {"date":"06/27/24","duration":1,"time":"12:00"}
Response:
I found one lunch invite for Thursday at noon. I have created a one-hour event on your calendar for Thursday at 12PM.

CITATIONS:
start=29 end=46 text='Thursday at noon.' sources=[ToolSource(type='tool', id='search_emails_zdas36mjahhr:1', tool_output={'date': '2024-06-24', 'from': 'john@co1t.com', 'subject': 'First Week Check-In', 'text': "Hello! I hope you're 
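
The control flow of the multi-step loop can also be traced without calling the API. The sketch below replaces the model with a scripted list of turns and adds a step limit, which is an assumption of this example and not something `run_assistant` does. All names here (`fake_search_emails`, `fake_create_calendar_event`, `fake_functions_map`, `scripted_turns`, `run_scripted`) are hypothetical stand-ins.

```python
import json


def fake_search_emails(query):
    # Stub standing in for the search_emails tool
    return [{"subject": "First Week Check-In", "text": "Welcoming lunch this Thursday at noon"}]


def fake_create_calendar_event(date, time, duration):
    # Stub standing in for the create_calendar_event tool
    return {"is_success": True,
            "message": f"Created a {duration} hour long event at {time} on {date}"}


fake_functions_map = {
    "search_emails": fake_search_emails,
    "create_calendar_event": fake_create_calendar_event,
}

# Scripted "model" turns: a list of (tool name, JSON arguments) means tool calls,
# a plain string means the final response
scripted_turns = [
    [("search_emails", '{"query": "lunch invites"}')],
    [("create_calendar_event", '{"date": "06/27/24", "time": "12:00", "duration": 1}')],
    "I have created a one-hour event on your calendar for Thursday at 12PM.",
]


def run_scripted(turns, max_steps=5):
    """Execute tool calls turn by turn until a final response or the step limit."""
    trace = []
    for step, turn in enumerate(turns):
        if isinstance(turn, str):
            return turn, trace
        if step >= max_steps:
            raise RuntimeError(f"no final response after {max_steps} tool steps")
        for name, arguments in turn:
            result = fake_functions_map[name](**json.loads(arguments))
            trace.append((name, result))
    raise RuntimeError("turns ended without a final response")


answer, trace = run_scripted(scripted_turns)
print([name for name, _ in trace])
print(answer)
```

A step limit of this kind is one way to keep a tool loop from running indefinitely if the model keeps requesting tool calls.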

In this tutorial, you learned about:
- How to create tools
- How tool planning and calling happens
- How tool execution happens
- How to generate the response and citations
- How to run tool use in a multi-step scenario

And that concludes our 7-part Cohere tutorial. We hope it has provided you with a foundational understanding of the Cohere API, the available models and endpoints, and the types of use cases that you can build with them.

To continue your learning, check out:
- [LLM University - A range of courses and step-by-step guides to help you start building](https://cohere.com/llmu)
- [Cookbooks - A collection of basic to advanced example applications](https://docs.cohere.com/page/cookbooks)
- [Cohere's documentation](https://docs.cohere.com/docs/the-cohere-platform)
- [The Cohere API reference](https://docs.cohere.com/reference/about)