##### Copyright 2025 Google LLC.

In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# What's new in Gemini-1.5-pro-002 and Gemini-1.5-flash-002

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/New_in_002.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

This notebook explores the new options added with the 002 versions of the 1.5 series models:

* Candidate count
* Presence and frequency penalties
* Response logprobs

## Setup

Install a `002`-compatible version of the SDK:

In [None]:
%pip install -q "google-genai>=1.0.0"

Import the package and give it your API key:

In [None]:
from google import genai
from google.genai import types

In [None]:
from google.colab import userdata
client = genai.Client(api_key=userdata.get('GOOGLE_API_KEY'))

Import other packages.

In [None]:
from IPython.display import display, Markdown, HTML

Check which 002 models are available:

In [None]:
for model in client.models.list():
  if '002' in model.name:
    print(model.name)

In [None]:
model_name = "models/gemini-1.5-flash-002"
test_prompt="Why don't people have tails"

## Quick refresher on the generation `config` [Optional]

In [None]:
response = client.models.generate_content(
    model=model_name,
    contents='hello',
    config=types.GenerateContentConfig(temperature=1.0, max_output_tokens=5))

Note:

* Each `generate_content` request is sent with a `config` (`chat.send_message` uses `generate_content`).
* You can set the `config` by passing it in the arguments to `generate_content` (or `chat.send_message`).
* You can pass the `config` as either a Python `dict`, or a `types.GenerateContentConfig`.
* If you're ever unsure about the parameters of `config`, check `types.GenerateContentConfig`.

## Candidate count

With 002 models you can now use `candidate_count > 1`.

In [None]:
config = dict(candidate_count=2)

In [None]:
response = client.models.generate_content(model=model_name, contents=test_prompt, config=config)

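But note that the `.text` quick-accessor is only meant for the simple 1-candidate case. Depending on the SDK version, it may raise an error or return only the first candidate's text.

In [None]:
try:
  response.text # With multiple candidates this may raise, or return only the first candidate's text
except ValueError as e:
  print(e)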

With multiple candidates you have to handle the list of candidates yourself:

In [None]:
for candidate in response.candidates:
  display(Markdown(candidate.content.parts[0].text))
  display(Markdown("-------------"))


The response contains multiple full `Candidate` objects.

In [None]:
response
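
The loop above reads only `parts[0]` of each candidate. As a minimal sketch, assuming each candidate exposes `content.parts` and each part exposes `.text` as in the cell above, a small helper can join all text parts per candidate. The name `candidate_texts` and the `SimpleNamespace` stand-in response are hypothetical and exist only so the example runs without an API call.

```python
from types import SimpleNamespace


def candidate_texts(response):
    """Return one string per candidate, joining all of its text parts."""
    texts = []
    for candidate in response.candidates or []:
        content = getattr(candidate, "content", None)
        parts = getattr(content, "parts", None) or []
        # Skip parts without text (for example, non-text parts).
        texts.append("".join(p.text for p in parts if getattr(p, "text", None)))
    return texts


# Hypothetical stand-in that mimics the attribute layout of a real response.
fake_response = SimpleNamespace(candidates=[
    SimpleNamespace(content=SimpleNamespace(parts=[
        SimpleNamespace(text="first "), SimpleNamespace(text="answer")])),
    SimpleNamespace(content=SimpleNamespace(parts=[
        SimpleNamespace(text="second answer")])),
])

print(candidate_texts(fake_response))  # ['first answer', 'second answer']
```

With a real `response`, `candidate_texts(response)` should return one string per requested candidate.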

## Penalties

The `002` models expose two penalty arguments, `presence_penalty` and `frequency_penalty`, that let you affect the statistics of output tokens.

### Presence penalty

The `presence_penalty` penalizes tokens that have already been used in the output, so it induces variety in the model's output. This is detectable if you count the unique words in the output.
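
To make the idea concrete, here is a toy sketch of a presence penalty applied to next-token scores. It is an illustration under stated assumptions, not the service's actual implementation: the function `apply_presence_penalty`, the three-word vocabulary, and the score values are all hypothetical.

```python
def apply_presence_penalty(logprobs, generated, presence_penalty):
    """Return a copy of `logprobs` with a flat penalty on tokens already generated."""
    seen = set(generated)
    return {
        token: (lp - presence_penalty) if token in seen else lp
        for token, lp in logprobs.items()
    }


# Hypothetical next-token scores for a tiny vocabulary.
logprobs = {"cat": -1.0, "dog": -1.5, "bird": -2.0}
generated = ["cat", "cat", "dog"]

penalized = apply_presence_penalty(logprobs, generated, presence_penalty=1.0)
print(penalized)  # {'cat': -2.0, 'dog': -2.5, 'bird': -2.0}
```

In this toy model "cat" is penalized once even though it appeared twice: the penalty depends only on whether a token has been used, not on how often.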

Here's a function to run a prompt a few times and report the fraction of unique words (words don't map perfectly to tokens but it's a simple way to see the effect).

In [None]:
from statistics import mean

In [None]:
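def unique_words(prompt, config, N=10):
  """Run `prompt` N times and return the fraction of unique words in each response."""
  vocab_fractions = []
  for n in range(N):
    response = client.models.generate_content(model=model_name, contents=prompt, config=config)

    # `response.text` can be None when the response has no text parts, so fall back to "".
    words = (response.text or "").lower().split()
    # Fraction of distinct words; 0.0 for an empty response avoids a ZeroDivisionError.
    score = len(set(words))/len(words) if words else 0.0
    print(score)
    vocab_fractions.append(score)

  return vocab_fractions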
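
To see what the score measures without calling the API, the same unique-word fraction can be computed on fixed strings. This is only a sketch: the helper name `unique_word_fraction` and the two sample sentences are made up for illustration.

```python
def unique_word_fraction(text):
    """Fraction of whitespace-separated words that are distinct (case-insensitive)."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


# Made-up sample strings: one repetitive, one with no repeated words.
repetitive = "the cat saw the cat and the cat ran"
varied = "a quick brown fox jumps over lazy dogs"

print(unique_word_fraction(repetitive))  # 5 distinct words out of 9
print(unique_word_fraction(varied))      # 1.0
```

A score near 1.0 means almost no word repeats, while lower scores indicate more repetition.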

In [None]:
prompt='Tell me a story'

In [None]:
# baseline
v = unique_words(prompt, config={})

In [None]:
mean(v)

In [None]:
# the penalty encourages diversity in the output tokens.
v = unique_words(prompt, config=dict(presence_penalty=1.999))

In [None]:
mean(v)

In [None]:
# a negative penalty discourages diversity in the output tokens.
v = unique_words(prompt, config=dict(presence_penalty=-1.999))

In [None]:
mean(v)

The `presence_penalty` has a small effect on the vocabulary statistics.

### Frequency penalty

The `frequency_penalty` is similar to the `presence_penalty`, but the penalty is multiplied by the number of times a token has been used, so its effect grows with every repetition.
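
As a toy sketch of that scaling (again an illustration, not the service's actual implementation), the penalty can be modeled as being subtracted once per prior occurrence of a token. The function `apply_frequency_penalty`, the vocabulary, and the scores are hypothetical.

```python
from collections import Counter


def apply_frequency_penalty(logprobs, generated, frequency_penalty):
    """Return a copy of `logprobs` penalized in proportion to each token's count."""
    counts = Counter(generated)  # missing tokens count as 0
    return {
        token: lp - frequency_penalty * counts[token]
        for token, lp in logprobs.items()
    }


# Hypothetical next-token scores for a tiny vocabulary.
logprobs = {"cat": -1.0, "dog": -1.5, "bird": -2.0}
generated = ["cat", "cat", "cat", "dog"]

penalized = apply_frequency_penalty(logprobs, generated, frequency_penalty=1.0)
print(penalized)  # {'cat': -4.0, 'dog': -2.5, 'bird': -2.0}
```

Here "cat" was used three times and loses three times the penalty, so the unused "bird" becomes the highest-scoring token.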

The easiest way to see that it works is to ask the model to do something repetitive. The model has to get creative while trying to complete the task.

In [None]:
response = client.models.generate_content(model=model_name, contents='please repeat "Cat" 50 times, 10 per line',
                                  config=dict(frequency_penalty=1.999))

In [None]:
print(response.text)

Since the frequency penalty accumulates with usage, it can have a much stronger effect on the output compared to the presence penalty.

> Caution: Be careful with negative frequency penalties. A negative penalty makes a token more likely the more it's used. This positive feedback quickly leads the model to repeat a common token until it hits the `max_output_tokens` limit (once it starts, the model can't produce the `<STOP>` token).
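
The feedback loop can be reproduced in a toy greedy decoder. Everything here is hypothetical: the scripted "model" in `toy_logprobs`, the vocabulary, the scores, and the `greedy_decode` helper are assumptions chosen only to show the runaway effect, not a description of how the real model decodes.

```python
from collections import Counter

VOCAB = ["once", "upon", "a", "time", "<STOP>"]
SCRIPT = ["once", "upon", "a", "time", "<STOP>"]


def toy_logprobs(step):
    """Hypothetical model that prefers the scripted token at each step."""
    preferred = SCRIPT[min(step, len(SCRIPT) - 1)]
    return {token: (-0.5 if token == preferred else -2.0) for token in VOCAB}


def greedy_decode(frequency_penalty, max_output_tokens=8):
    """Pick the best-scoring token each step after applying the frequency penalty."""
    counts = Counter()
    output = []
    for step in range(max_output_tokens):
        adjusted = {
            token: lp - frequency_penalty * counts[token]
            for token, lp in toy_logprobs(step).items()
        }
        token = max(adjusted, key=adjusted.get)
        if token == "<STOP>":
            break
        output.append(token)
        counts[token] += 1
    return output


print(greedy_decode(frequency_penalty=0.0))   # ['once', 'upon', 'a', 'time']
print(greedy_decode(frequency_penalty=-2.0))  # 'once' repeated 8 times
```

With the negative penalty, the first token gains a bonus each time it is chosen, so it outscores every alternative, including `<STOP>`, until the token limit ends the loop.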

In [None]:
response = client.models.generate_content(
    model=model_name,
    contents=prompt,
    config=types.GenerateContentConfig(
        max_output_tokens=400,
        frequency_penalty=-2.0))

In [None]:
Markdown(response.text)  # the, the, the, ...

In [None]:
response.candidates[0].finish_reason