# Gemini API: General Parameters

This notebook covers:

- Temperature
- Max output length
- Token counting


## Setup

Since we've put our gemini API key in Colab Secrets, we can just run the following cells to setup:

In [1]:
!pip install -q -U google-generativeai

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m164.2/164.2 kB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m718.3/718.3 kB[0m [31m9.3 MB/s[0m eta [36m0:00:00[0m
[?25h

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Safety.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

In [2]:
import google.generativeai as genai

In [3]:
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

## Model Temperature

Steve is indecisive about how to spend his Friday night as a Freshman at Berkeley. He decides to ask Gemini using the 1.5 Flash [variant](https://ai.google.dev/gemini-api/docs/models/gemini):

In [6]:
model_name = 'gemini-1.5-flash'
model = genai.GenerativeModel(model_name)

prompt = "Help me choose a fun way to spend my Friday night as a CS major at Berkeley looking for emotionally mature and ambitious friends in a single one-sentence idea"
response = model.generate_content(prompt)
print(response.text)


Host a "Tech Talks and Trivia" night with a focus on ethical AI and the future of work, offering pizza and board games for a relaxed atmosphere where you can connect with like-minded peers. 



Steve is very smart and knows that Gemini is non-deterministic; the same prompt can result in different outputs! To demonstrate this, the following code passes the same prompt 5 different times:

In [8]:
outputs = []
prompt = "Help me choose a fun way to spend my Friday night as a CS major at Berkeley looking for emotionally mature and ambitious friends in a single one-sentence idea"
for i in range(5):
  response = model.generate_content(prompt)
  outputs.append(response.text)
for index, sentence in enumerate(outputs, start=1):
    print(f"{index}. {sentence}")

1. Host a low-key "coding for good" hackathon at your apartment, focusing on projects that address social issues, inviting fellow ambitious CS majors for a night of tech and meaningful discussion. 

2. Host a board game night with a "hackathon-inspired" twist, inviting fellow CS majors to compete for bragging rights and maybe even a prize. 

3. Host a casual board game night at a park with a focus on strategy games, inviting fellow CS majors to share their competitive spirit and discuss their ambitious goals. 

4. Host a board game night at your apartment, focusing on strategy games like Settlers of Catan or Ticket to Ride, to attract fellow ambitious CS majors seeking intellectually stimulating company. 

5. Host a "Hackathon for Humanity" themed game night with coding puzzles and discussions about impactful tech projects, attracting like-minded friends with a passion for making a difference. 



Gemini returns a different response each time. Hmm. Steve doesn't like the fact that his Friday night might be determined by random chance.

Fortunately, Gemini has a temperature parameter that controls the randomness of the output. Temperature values can range from 0.0 to 2.0. We can check the temperature of our current model as follows:

In [9]:
for m in genai.list_models():
    if m.name == 'models/gemini-1.5-flash':
        print(m)

Model(name='models/gemini-1.5-flash',
      base_model_id='',
      version='001',
      display_name='Gemini 1.5 Flash',
      description='Fast and versatile multimodal model for scaling across diverse tasks',
      input_token_limit=1048576,
      output_token_limit=8192,
      supported_generation_methods=['generateContent', 'countTokens'],
      temperature=1.0,
      max_temperature=2.0,
      top_p=0.95,
      top_k=64)


Let's initialize a new model with a temperature of 0:

In [10]:
model_name = 'gemini-1.5-flash'
new_outputs = []
low_temp_model = genai.GenerativeModel(model_name, generation_config={"temperature": 0.798})
for i in range(5):
  response = low_temp_model.generate_content(prompt)
  new_outputs.append(response.text)
for index, sentence in enumerate(new_outputs, start=1):
    print(f"{index}. {sentence}")

1. Host a "Code & Cocktails" night at a trendy bar, inviting fellow CS majors to brainstorm innovative projects while enjoying drinks and conversation. 

2. Host a "Coding for Change" hackathon at a local non-profit, inviting fellow CS majors to build tech solutions for social good while connecting with like-minded individuals. 

3. How about attending a hackathon focused on social good and using your coding skills to make a positive impact while connecting with like-minded individuals? 

4. Host a board game night with a focus on strategy games, inviting fellow CS majors for some competitive fun and intellectually stimulating conversation. 

5. Host a board game night with a focus on strategy games and lively discussion, inviting fellow CS majors to connect and build friendships. 



All the outputs are the same! Finally Steve can be sure how to spend his night. Conversely, setting temperature to the max value of 2.0 would have the opposite effect, yielding more unpredictable responses.

## Max Output Length

Let's say Steve wants his outputs to be below a certain length. In large language models, text is generated in tokens. For Gemini models, a token is equivalent to about 4 characters. 100 tokens are about 60-80 English words. He can set the `max_output_tokens` variable as follows:

In [11]:
model_name = 'gemini-1.5-flash'
short_response_model = genai.GenerativeModel(model_name, generation_config={"max_output_tokens": 19})
prompt = "Help me choose a fun way to spend my Friday night as a CS major at Berkeley in a single one-sentence idea"
response = short_response_model.generate_content(prompt)
print(response.text)

Grab some friends, code up a fun project, and watch the sunset from the top of the


Notice that this simply halts token generation at a fixed quantity and does not guarantee that the output is complete.

## Token Count

Let's say that Steve is being charged for every token that he inputs to Gemini.

If Steve has billing enabled, the price of a paid request is controlled by the number of input and output tokens, so knowing how to count your tokens is important. As such, Steve might want to know the number of tokens in his prompt (input) before actually putting it into the model.

Let's create a new instance of Gemini 1.5 Flash and set our prompt:

In [13]:
model_name = 'gemini-1.5-flash'
token_n_model = genai.GenerativeModel(model_name, generation_config={"temperature": 0.0})
poem_prompt = "Write me a poem about Berkeley's data science wing of the campus"

In [14]:
response = token_n_model.generate_content(poem_prompt)
response.text

"On Berkeley's hills, where knowledge thrives,\nA new domain, where data strives.\nA wing of science, bold and bright,\nWhere numbers dance in digital light.\n\nFrom algorithms, complex and deep,\nTo insights hidden, secrets to keep.\nThe data scientists, minds so keen,\nUnraveling patterns, unseen, unseen.\n\nWith Python code and R's command,\nThey build models, understand,\nThe flow of information, vast and wide,\nUnveiling truths, where secrets hide.\n\nFrom social networks, to markets' sway,\nThey analyze trends, day by day.\nPredicting futures, with precision's art,\nA symphony of data, playing its part.\n\nIn classrooms bright, and labs so grand,\nThey learn and teach, hand in hand.\nThe future's promise, in their grasp,\nA data-driven world, a future to clasp.\n\nSo raise a glass, to Berkeley's might,\nWhere data science shines, so clear and bright.\nA beacon of knowledge, for all to see,\nThe future's promise, eternally. \n"

Before generating a response, we can check how many tokens are in this prompt using the `.count_tokens` function of the model:

In [15]:
prompt_token_count = token_n_model.count_tokens(poem_prompt)
output_token_count = token_n_model.count_tokens(response.text)
print(f'Tokens in prompt: {prompt_token_count} \n Estimated tokens in output {output_token_count}')

Tokens in prompt: total_tokens: 14
 
 Estimated tokens in output total_tokens: 238

