##### Copyright 2025 Google LLC.

In [1]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Search re-ranking using Gemini embeddings

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Search_reranking_using_embeddings.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

This notebook demonstrates the use of embeddings to re-rank search results. This walkthrough will focus on the following objectives:



1.   Setting up your development environment and API access to use Gemini.
2.   Using Gemini's function calling support to access the Wikipedia API.
3.   Embedding content via Gemini API.
4.   Re-ranking the search results.


This is how you will implement search re-ranking:


1.   The user will make a search query.
2.   You will use the Wikipedia API to return the relevant search results.
3.   The search results will be embedded and their relevance will be evaluated by calculating distance metrics like cosine similarity.
4.   The most relevant search result will be returned as the final answer.
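
Steps 3 and 4 above can be sketched with made-up numbers before any API is involved. This is only an illustration: the three-dimensional `doc_vectors` and `query_vector` are hypothetical stand-ins for real embeddings, and `cosine_similarity` is a helper written for this sketch, not part of any library used in this notebook.

```python
import numpy as np

# Hypothetical 3-dimensional "embeddings" for three search results.
docs = ["doc about oceans", "doc about volcanoes", "doc about deserts"]
doc_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.2, 0.9],
])
# Hypothetical embedding of the user's query.
query_vector = np.array([1.0, 0.2, 0.0])

def cosine_similarity(matrix: np.ndarray, vector: np.ndarray) -> np.ndarray:
  """Cosine similarity between each row of `matrix` and `vector`."""
  row_norms = np.linalg.norm(matrix, axis=1)
  return (matrix @ vector) / (row_norms * np.linalg.norm(vector))

scores = cosine_similarity(doc_vectors, query_vector)
best = docs[int(np.argmax(scores))]  # the highest-scoring result wins
print(best)
```

The rest of the notebook follows the same shape, with embeddings from the Gemini API in place of the hand-written vectors.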

> The non-source code materials in this notebook are licensed under Creative Commons - Attribution-ShareAlike CC-BY-SA 4.0, https://creativecommons.org/licenses/by-sa/4.0/legalcode.

## Setup


In [2]:
%pip install -q -U "google-genai>=1.0.0"

In [3]:
%pip install -q wikipedia



Note: The [`wikipedia` package](https://pypi.org/project/wikipedia/) notes that it was "designed for ease of use and simplicity, not for advanced use", and that production or heavy use should instead "use [Pywikipediabot](http://www.mediawiki.org/wiki/Manual:Pywikipediabot) or one of the other more advanced [Python MediaWiki API wrappers](http://en.wikipedia.org/wiki/Wikipedia:Creating_a_bot#Python)".

In [4]:
import json
import textwrap

from google import genai
from google.genai import types

import wikipedia
from wikipedia.exceptions import DisambiguationError, PageError

import numpy as np

from IPython.display import Markdown

def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see the [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) quickstart for an example.

In [5]:
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

# Initialize the client with your API key
client = genai.Client(api_key=GOOGLE_API_KEY)

As stated earlier, this tutorial uses Gemini's function calling support to access the Wikipedia API. Please refer to the [docs](https://ai.google.dev/docs/function_calling) to learn more about function calling.

### Define the search function

The search function is designed as follows:


*   For each search query, the search engine uses the `wikipedia.search` method to get relevant topics.
*   From those topics, the engine takes the top `n_topics` (an `int`) candidates and uses `gemini-2.0-flash` to extract relevant information from each page.
*   The engine avoids duplicate entries by maintaining a search history.
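
The candidate selection and de-duplication described in the bullets above can be sketched in isolation, with no network calls. `select_new_topics` is a hypothetical helper written for this illustration; the topic lists are copied from the example output later in this notebook.

```python
def select_new_topics(candidates: list[str], seen: set[str], n_topics: int = 3) -> list[str]:
  """Return the unseen topics among the first `n_topics` candidates, and record them."""
  fresh = []
  for topic in candidates[:n_topics]:  # only look at the first `n_topics` candidates
    if topic in seen:  # already fetched for an earlier query
      continue
    seen.add(topic)
    fresh.append(topic)
  return fresh

seen: set[str] = set()
first = select_new_topics(["Deep sea", "Deep-sea fish", "Deep-sea gigantism"], seen)
second = select_new_topics(["Deep-sea fish", "Deep sea", "Deep-sea wood"], seen)
print(first)   # all three topics are new
print(second)  # only "Deep-sea wood" has not been seen yet
```

Sharing one `seen` set across queries is what keeps overlapping queries from fetching the same page twice.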


In [18]:
def wikipedia_search(search_queries: list[str]) -> list[str]:
  """Search wikipedia for each query and summarize relevant docs."""
  n_topics=3
  search_history = set() # tracking search history
  search_urls = []
  summary_results = []

  for query in search_queries:
    print(f'Searching for "{query}"')
    search_terms = wikipedia.search(query)

    print(f"Related search terms: {search_terms[:n_topics]}")
    for search_term in search_terms[:n_topics]: # select first `n_topics` candidates
      if search_term in search_history: # check if the topic is already covered
        continue

      print(f'Fetching page: "{search_term}"')
      search_history.add(search_term) # add to search history

      try:
        # extract the relevant data using the client with gemini-2.0-flash model
        page = wikipedia.page(search_term, auto_suggest=False)
        url = page.url
        print(f"Information Source: {url}")
        search_urls.append(url)
        page = page.content

        # Using the client to generate content with the new SDK
        response = client.models.generate_content(
            model='gemini-2.0-flash',
            contents=textwrap.dedent(f"""\
                Extract relevant information
                about user's query: {query}
                From this source:

                {page}

                Note: Do not summarize. Only Extract and return the relevant information
            """)
        )

        urls = [url]
        # In the new SDK, citation metadata handling has changed
        # Check if response has citation metadata and handle it properly
        try:
          if response.candidates and response.candidates[0].citation_metadata:
            # In the new SDK, citation sources may be accessed differently
            # Check the structure of the citation_metadata
            if hasattr(response.candidates[0].citation_metadata, 'citation_sources'):
              extra_citations = response.candidates[0].citation_metadata.citation_sources
              extra_urls = [source.url for source in extra_citations]
              urls.extend(extra_urls)
              search_urls.extend(extra_urls)
              print("Additional citations:", response.candidates[0].citation_metadata.citation_sources)
        except AttributeError as e:
          print(f"Error processing citation metadata: {str(e)}")
          # Continue processing without citations
          pass

        try:
          text = response.text
        except (ValueError, AttributeError) as e:
          print(f"Error processing response text: {str(e)}")
          pass
        else:
          summary_results.append(text + "\n\nBased on:\n  " + ',\n  '.join(urls))

      except DisambiguationError:
        print(f"""Results when searching for "{search_term}" (originally for "{query}")
        were ambiguous, hence skipping""")
        continue

      except PageError:
        print(f'{search_term} did not match with any page id, hence skipping.')
        continue

      except Exception as e:
        print(f'Error processing {search_term}: {str(e)}')
        continue

  print("Information Sources:")
  for url in search_urls:
    print('    ', url)

  return summary_results



In [22]:
example = wikipedia_search(["What are LLMs?"])

Searching for "What are LLMs?"
Related search terms: ['Large language model', 'Retrieval-augmented generation', 'DeepSeek']
Fetching page: "Large language model"
Information Source: https://en.wikipedia.org/wiki/Large_language_model
Fetching page: "Retrieval-augmented generation"
Information Source: https://en.wikipedia.org/wiki/Retrieval-augmented_generation
Fetching page: "DeepSeek"
Information Source: https://en.wikipedia.org/wiki/DeepSeek
Information Sources:
     https://en.wikipedia.org/wiki/Large_language_model
     https://en.wikipedia.org/wiki/Retrieval-augmented_generation
     https://en.wikipedia.org/wiki/DeepSeek


Here is what the search results look like:

In [20]:
from IPython.display import display

for e in example:
  display(to_markdown(e))

> LLMs:
> 
> *   Are a type of machine learning model designed for natural language processing tasks like language generation.
> *   Are language models with many parameters.
> *   Are trained with self-supervised learning on a vast amount of text.
> *   The largest and most capable LLMs are generative pretrained transformers (GPTs).
> *   Modern models can be fine-tuned for specific tasks or guided by prompt engineering.
> *   These models acquire predictive power regarding syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained in.
> 
> 
> Based on:
>   https://en.wikipedia.org/wiki/Large_language_model

> According to the provided source:
> 
> *   RAG modifies interactions with a large language model (LLM) so that the model responds to user queries with reference to a specified set of documents, using this information to supplement information from its pre-existing training data.
> *   RAG improves large language models (LLMs) by incorporating information retrieval before generating responses.
> *   LLMs rely on static training data.
> *   RAG allows large language models (LLMs) to retrieve and incorporate additional information before generating responses.
> *   RAG enhances large language models (LLMs) by incorporating an information-retrieval mechanism that allows models to access and utilize additional data beyond their original training set.
> 
> 
> Based on:
>   https://en.wikipedia.org/wiki/Retrieval-augmented_generation

> Based on the provided text, here's the extracted information about what LLMs (Large Language Models) are, specifically in the context of DeepSeek:
> 
> *   DeepSeek is a Chinese artificial intelligence company that develops large language models (LLMs).
> *   DeepSeek-R1 provides responses comparable to other contemporary large language models, such as OpenAI's GPT-4 and o1.
> *   LLMs developed by DeepSeek are referred to as "open weight," meaning the exact parameters are openly shared, although certain usage conditions differ from typical open-source software.
> *   DeepSeek has created several series of LLMs, including DeepSeek-LLM, DeepSeek Coder, DeepSeek-MoE, DeepSeek-Math, V2, V3, and R1.
> *   DeepSeek significantly reduced training expenses for their R1 model by incorporating techniques such as mixture of experts (MoE) layers.
> *   DeepSeek models are transformer based, using decoder-only transformers.
> *   DeepSeek models incorporate techniques such as multi-head latent attention (MLA), Mixture of Experts (MoE), and KV caching.
> 
> 
> Based on:
>   https://en.wikipedia.org/wiki/DeepSeek

### Pass the tools to the model

In the Google GenAI SDK (v1.0.0+), you enable function calling by passing your Python functions in the `tools` field of the request configuration. The SDK derives each function's schema from its signature, and the model may return a function call object when it wants the function to be called.

For this to work correctly, the functions need proper type annotations for their parameters and return values.
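
To see why the annotations matter, the sketch below shows what can be read from a function's signature using only the standard library. It is not how the SDK is implemented; `wikipedia_search_stub` is a hypothetical stand-in with the same signature as `wikipedia_search`.

```python
import inspect
from typing import get_type_hints

def wikipedia_search_stub(search_queries: list[str]) -> list[str]:
  """Hypothetical stand-in with the same signature as `wikipedia_search`."""
  return []

# Parameter names, annotated types, and the docstring are all introspectable.
params = list(inspect.signature(wikipedia_search_stub).parameters)
hints = get_type_hints(wikipedia_search_stub)
doc = inspect.getdoc(wikipedia_search_stub)

print(params)
print(hints["search_queries"], hints["return"])
print(doc)
```

Without annotations, `get_type_hints` would return an empty dictionary, leaving nothing to build a parameter schema from.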

## Generate supporting search queries

To supplement the user's original query, you will ask the model to generate several related search queries. This helps the engine cover the question more comprehensively.

In [28]:
instructions = """You have access to the Wikipedia API which you will be using
to answer a user's query. Your job is to generate a list of search queries which
might answer a user's question. Be creative by using various key-phrases from
the user's query. To generate variety of queries, ask questions which are
related to  the user's query that might help to find the answer. The more
queries you generate the better are the odds of you finding the correct answer.
Here is an example:

user: Tell me about Cricket World cup 2023 winners.

function_call: wikipedia_search(['What is the name of the team that
won the Cricket World Cup 2023?', 'Who was the captain of the Cricket World Cup
2023 winning team?', 'Which country hosted the Cricket World Cup 2023?', 'What
was the venue of the Cricket World Cup 2023 final match?', 'Cricket World cup 2023',
'Who lifted the Cricket World Cup 2023 trophy?'])

The search function will return a list of article summaries, use these to
answer the  user's question.

Here is the user's query: {query}
"""

To get a more varied and creative set of questions, set the model's temperature parameter to a higher value. The lowest value is 0.0, and the upper bound depends on the model. Higher values produce responses that are more varied and creative, while values closer to 0.0 typically produce more straightforward responses. The chat below uses `temperature=0.6`.
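
As a rough intuition for what temperature does, the sketch below applies a temperature-scaled softmax to a few made-up scores. It illustrates the general sampling concept only and makes no claim about how the Gemini API implements it; `softmax_with_temperature` and the `logits` values are assumptions for this example.

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
  """Convert scores to probabilities; a higher temperature flattens the result."""
  scaled = logits / temperature
  scaled = scaled - scaled.max()  # subtract the max for numerical stability
  exp = np.exp(scaled)
  return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical scores for three tokens
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.0)
print(low.round(3))   # probability mass concentrates on the top score
print(high.round(3))  # probability mass is spread more evenly
```

At the lower temperature almost all probability goes to the top-scoring option, which is why low temperatures give more predictable output.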

## Enable automatic function calling and call the API

Now start a new chat and pass `wikipedia_search` in the `tools` field of the configuration. When you pass Python functions as tools, the SDK enables automatic function calling by default, so the chat session handles the back and forth required to call the function and returns the final response:

In [29]:
# Create a chat session with the client.
# Attach your function tool here in a GenerateContentConfig.
chat = client.chats.create(
    model='gemini-2.0-flash',
    config=types.GenerateContentConfig(
        temperature=0.6,
        tools=[wikipedia_search],
    )
)

query = "Explain how deep-sea life survives."

# Now you can just send the message; the chat session will handle the function calls
res = chat.send_message(
    message=instructions.format(query=query)
)

Searching for "deep sea life adaptations"
Related search terms: ['Deep sea', 'Deep-sea fish', 'Deep-sea gigantism']
Fetching page: "Deep sea"
Information Source: https://en.wikipedia.org/wiki/Deep_sea
Fetching page: "Deep-sea fish"
Information Source: https://en.wikipedia.org/wiki/Deep-sea_fish
Fetching page: "Deep-sea gigantism"
Information Source: https://en.wikipedia.org/wiki/Deep-sea_gigantism
Searching for "deep sea food sources"
Related search terms: ['Deep-sea fish', 'Deep sea', 'Deep-sea wood']
Fetching page: "Deep-sea wood"
Information Source: https://en.wikipedia.org/wiki/Deep-sea_wood
Searching for "deep sea environment"
Related search terms: ['Deep-sea fish', 'Deep-sea gigantism', 'Deep sea mining']
Fetching page: "Deep sea mining"
Information Source: https://en.wikipedia.org/wiki/Deep_sea_mining
Searching for "hydrothermal vent ecosystems"
Related search terms: ['Hydrothermal vent', 'Hydrothermal vent microbial communities', "Loki's Castle"]
Fetching page: "Hydrothermal ve

In [30]:
to_markdown(res.text)

> Deep-sea life survives in a challenging environment characterized by darkness, low temperatures, and immense pressure. Here's a breakdown of their survival mechanisms:
> 
> 1.  **Adaptations to Darkness:**
> 
>     *   Many deep-sea creatures are blind and rely on senses other than sight, such as pressure and smell.
>     *   Some have developed large, sensitive eyes to detect the faintest bioluminescent light.
>     *   Bioluminescence, the production of light, is common and used for various purposes like hunting, attracting prey, communication, and camouflage.
> 
> 2.  **Adaptations to Pressure:**
> 
>     *   Organisms have internal pressure that matches the external pressure to avoid being crushed.
>     *   Their cell membranes have a higher proportion of unsaturated fatty acids to maintain fluidity under pressure.
>     *   They have also evolved structural modifications in their proteins to withstand high pressure.
> 
> 3.  **Finding Food:**
> 
>     *   Food is scarce, so deep-sea life relies on:
>         *   **Marine snow:** Organic material that sinks from upper waters.
>         *   **Scavenging:** Feeding on dead organisms.
>         *   **Predation:** Hunting other deep-sea creatures.
>         *   **Hydrothermal vents and cold seeps:** Some communities thrive around these features, utilizing chemosynthesis.
>     *   Chemosynthesis is a process where bacteria use chemicals like hydrogen sulfide to produce organic material, forming the base of the food chain in these ecosystems.
>     *   Some fish have evolved features like long feelers, bioluminescent lures, sharp teeth, and expandable bodies to help them find and consume prey.
> 
> 4.  **Other Adaptations:**
> 
>     *   Many deep-sea fish have jelly-like flesh and minimal bone structure for buoyancy and to withstand pressure.
>     *   Some have high fat content and accumulate water to reduce tissue density.
>     *   Deep-sea gigantism, the tendency for deep-sea animals to grow larger than their shallow-water relatives, is observed in some species due to factors like colder temperatures, food scarcity, and reduced predation.


Check for additional citations:

In [31]:
res.candidates[0].citation_metadata or 'No citations found'

'No citations found'

That looks like it worked. You can go through the chat history to see the details of what was sent and received in the function calls:

In [34]:
for content in chat.get_history():
  print(f'{content.role} -> ')

  if hasattr(content, 'parts') and content.parts:
    part = content.parts[0]
    if hasattr(part, 'function_call') and part.function_call:
      print(f"Function Call: {part.function_call.name}")
      print(f"Arguments: {json.dumps(part.function_call.args, indent=2)}")
    elif hasattr(part, 'function_response') and part.function_response:
      print(f"Function Response: {part.function_response.name}")
      print(f"Response: {json.dumps(part.function_response.response, indent=2)}")
    else:
      print(content.parts[0].text if hasattr(content.parts[0], 'text') else content.parts[0])
  print('---' * 20)


user -> 
You have access to the Wikipedia API which you will be using
to answer a user's query. Your job is to generate a list of search queries which
might answer a user's question. Be creative by using various key-phrases from
the user's query. To generate variety of queries, ask questions which are
related to  the user's query that might help to find the answer. The more
queries you generate the better are the odds of you finding the correct answer.
Here is an example:

user: Tell me about Cricket World cup 2023 winners.

function_call: wikipedia_search(['What is the name of the team that
won the Cricket World Cup 2023?', 'Who was the captain of the Cricket World Cup
2023 winning team?', 'Which country hosted the Cricket World Cup 2023?', 'What
was the venue of the Cricket World Cup 2023 final match?', 'Cricket World cup 2023',
'Who lifted the Cricket World Cup 2023 trophy?'])

The search function will return a list of article summaries, use these to
answer the  user's question.

He

In the chat history you can see all the steps of the conversation:

1. The user sent the query.
2. The model replied with a function call to `wikipedia_search` with a number of relevant searches.
3. Because you passed `wikipedia_search` as a tool when creating the chat, the client executed the search function and returned the list of article summaries to the model.
4. Following the instructions in the prompt, the model generated a final answer based on those summaries.
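
Steps 2 and 3 amount to looking up a function by name and calling it with the arguments the model supplied. The sketch below shows that dispatch without any API calls. `fake_search`, `TOOLS`, and `execute_function_call` are hypothetical names for this illustration, and the plain dictionaries only approximate the objects the SDK uses.

```python
from typing import Any, Callable

def fake_search(search_queries: list[str]) -> list[str]:
  """Hypothetical stand-in for `wikipedia_search` that needs no network."""
  return [f"summary for: {q}" for q in search_queries]

# Registry mapping tool names to the Python callables that implement them.
TOOLS: dict[str, Callable[..., Any]] = {"wikipedia_search": fake_search}

def execute_function_call(name: str, args: dict[str, Any]) -> dict[str, Any]:
  """Run the requested tool and wrap its output for sending back to the model."""
  if name not in TOOLS:
    raise KeyError(f"Unknown tool: {name}")
  return {"name": name, "response": {"result": TOOLS[name](**args)}}

# A function call as the model might request it: a name plus keyword arguments.
call = {"name": "wikipedia_search", "args": {"search_queries": ["deep sea food sources"]}}
reply = execute_function_call(call["name"], call["args"])
print(reply)
```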


## [Optional] Manually execute the function call

If you want to understand what happened behind the scenes, this section executes the `FunctionCall` manually to demonstrate.

In [35]:
# Create a new chat session without automatic function calling
chat = client.chats.create(
    model='gemini-2.0-flash',
    config=types.GenerateContentConfig(temperature=0.6)
)


In [36]:
# Send a message to the chat without automatic function calling
result = chat.send_message(
    message=instructions.format(query=query),
    config=types.GenerateContentConfig(
        tools=[wikipedia_search],
        automatic_function_calling={'disable': True},
    )
)

Initially the model returns a FunctionCall:

In [None]:
# Extract the function call from the response
if result.candidates and len(result.candidates) > 0 and result.candidates[0].content.parts:
    function_call = result.candidates[0].content.parts[0].function_call
    if function_call:
        print(json.dumps({
            "name": function_call.name,
            "args": function_call.args
        }, indent=2))
    else:
        print("No function call found in the response.")
else:
    print("Unexpected response format.")

In [None]:
# Access the function name
if 'function_call' in locals() and function_call and function_call.name:
    print(function_call.name)
else:
    print("Function name not available")

Call the function with generated arguments to get the results.

In [None]:
# Execute the function with the arguments from the function call
if function_call and hasattr(function_call, 'args'):
    summaries = wikipedia_search(**function_call.args)
else:
    print("Could not execute function, missing function call or arguments.")
    summaries = []

Now send the `FunctionResponse` to the model.

In [None]:
# Send the function response back to the model as a single Part
response = chat.send_message(
    message=types.Part(
        function_response=types.FunctionResponse(
            name="wikipedia_search",
            response={
                "result": summaries
            }
        )
    )
)

to_markdown(response.text)

## Re-ranking the search results

Helper function to embed the content:

In [None]:
def get_embeddings(content: list[str]) -> np.ndarray:
  # Using client.models.embed_content for embeddings.
  # The task type is passed through an EmbedContentConfig.
  response = client.models.embed_content(
      model="embedding-001",  # Note: model name format has changed
      contents=content,
      config=types.EmbedContentConfig(task_type="SEMANTIC_SIMILARITY"),
  )

  # Extract embeddings from the response
  embds = [embedding.values for embedding in response.embeddings]
  embds = np.array(embds).reshape(len(embds), -1)
  return embds

Please refer to the [embeddings guide](https://ai.google.dev/docs/embeddings_guide) for more information on embeddings.

Your next step is to define functions that you can use to calculate similarity scores between two embedding vectors. These scores will help you decide which embedding vector is the most relevant vector to the user's query.


You will now implement cosine similarity as your metric. The returned embedding vectors are of unit length, so their L2 norm (the default for `np.linalg.norm()`) is ~1. Calculating cosine similarity is therefore essentially the same as calculating their dot product.
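
A quick numerical check of this equivalence, using random vectors rather than real embeddings (the vectors and variable names below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=8)
b = rng.normal(size=8)

# Normalize to unit L2 length, mimicking unit-length embeddings.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

# Cosine similarity of the raw vectors vs. plain dot product of the unit vectors.
cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot_of_units = a_unit @ b_unit

print(np.linalg.norm(a_unit), np.linalg.norm(b_unit))
print(np.isclose(cosine, dot_of_units))
```

If the embeddings were not unit length, the dot product would also reflect vector magnitude, and you would need to divide by the norms as in the `cosine` line.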

In [None]:
def dot_product(a: np.ndarray, b: np.ndarray):
  return (a @ b.T)

### Similarity with user's query

Now it's time to find the most relevant search result returned by the Wikipedia API.

Use the Gemini API to get embeddings for the user's query and the search results.

In [None]:
search_res = get_embeddings(summaries)
embedded_query = get_embeddings([query])

Calculate similarity score:

In [None]:
sim_value = dot_product(search_res, embedded_query)

The best candidate is selected using `np.argmax`.

**User's input:** Explain how deep-sea life survives.

**Answer:**

In [None]:
print(summaries[np.argmax(sim_value)])
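
`np.argmax` returns only the single best result. If you want the full ranking, one option is `np.argsort`, as in this sketch. The scores and summaries are invented; `toy_sim_value` is assumed to have the same `(n_results, 1)` shape that `dot_product` produces here.

```python
import numpy as np

# Hypothetical scores with the same (n_results, 1) shape as `sim_value`.
toy_sim_value = np.array([[0.62], [0.81], [0.47], [0.75]])
toy_summaries = ["summary A", "summary B", "summary C", "summary D"]

# Flatten to 1-D, then sort indices from highest to lowest score.
order = np.argsort(toy_sim_value.ravel())[::-1]
ranked = [(toy_summaries[i], float(toy_sim_value[i, 0])) for i in order]

for text, score in ranked:
  print(f"{score:.2f}  {text}")
```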

### Similarity with Hypothetical Document Embeddings (HyDE)

Drawing inspiration from [Gao et al.](https://arxiv.org/abs/2212.10496), the objective here is to generate a template answer to the user's query using `gemini-2.0-flash`'s internal knowledge. This hypothetical answer serves as a baseline for scoring the relevance of every search result.
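
The intuition can be shown with a deliberately crude example. A short query may share little vocabulary with a relevant result, while a hypothetical answer shares more. The `toy_embed` function below is a hypothetical bag-of-words embedding over a tiny hand-picked vocabulary; it stands in for the Gemini embeddings and is far simpler than they are.

```python
import numpy as np

VOCAB = ["pressure", "bioluminescence", "chemosynthesis", "mining", "metals", "survive"]

def toy_embed(text: str) -> np.ndarray:
  """Hypothetical bag-of-words embedding, normalized to unit length."""
  words = text.lower().split()
  counts = np.array([words.count(w) for w in VOCAB], dtype=float)
  norm = np.linalg.norm(counts)
  return counts / norm if norm else counts

toy_results = [
    "deep sea mining extracts metals",
    "animals use bioluminescence and chemosynthesis and tolerate pressure",
]
toy_query = "how does deep sea life survive"
toy_hypothetical = "life can survive pressure using bioluminescence and chemosynthesis"

result_vecs = np.stack([toy_embed(r) for r in toy_results])
query_scores = result_vecs @ toy_embed(toy_query)        # query as the anchor
hyde_scores = result_vecs @ toy_embed(toy_hypothetical)  # hypothetical answer as the anchor
print(query_scores, hyde_scores)
```

In this toy setup the query scores cannot separate the two results, while the hypothetical answer clearly favors the second one. Real embeddings capture meaning beyond shared words, so the effect is usually less stark.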

In [None]:
# Generate hypothetical answer using the client
res = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=f"""Generate a hypothetical answer
to the user's query by using your own knowledge. Assume that you know everything
about the said topic. Do not use factual information, instead use placeholders
to complete your answer. Your answer should feel like it has been written by a human.

query: {query}"""
)

to_markdown(res.text)

Use the Gemini API to get embeddings for the baseline answer and compare them with the search results.

In [None]:
hypothetical_ans = get_embeddings([res.text])

Calculate similarity scores to rank the search results:

In [None]:
sim_value = dot_product(search_res, hypothetical_ans)

In [None]:
sim_value

The best candidate is selected using `np.argmax`.

**User's input:** Explain how deep-sea life survives.

**Answer:**

In [None]:
to_markdown(summaries[np.argmax(sim_value)])

You have now created a search re-ranking engine using Gemini embeddings with the Google GenAI SDK v1.0.0+!

In this notebook, you've learned how to:
1. Set up and use the new Google GenAI client
2. Use function calling to access the Wikipedia API
3. Embed content using the new embedding API format
4. Re-rank search results using similarity metrics

## Next steps

I hope you found this example helpful! Check out more examples in the [Gemini Guide](https://github.com/google-gemini/gemini-guide/) to learn more.