<a href="https://colab.research.google.com/github/sampathn2005/google-ai-studio-text-gen/blob/main/Gemini_2_0_Flash_Thinking.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Gemini 2.0 Flash Thinking

The Gemini 2.0 Flash Thinking model is an experimental model trained to generate the "thinking process" it goes through as part of its response. As a result, it shows stronger reasoning in its responses than the Gemini 2.0 Flash Experimental model.



# Use thinking models

Flash Thinking models are available in Google AI Studio and through the Gemini API. The Gemini API doesn't return thoughts in the response.

Note: gemini-2.0-flash-thinking-exp is set up as an alias to the latest Flash Thinking model. Use this alias to get the latest Flash Thinking model, or specify the full model name.

# Send a basic request

This example uses the new Google Gen AI SDK and the v1alpha version of the API.

In [None]:
import os

# Read the API key from the environment instead of hardcoding it in the notebook.
GEMINI_API_KEY = os.environ['GEMINI_API_KEY']

from google import genai

client = genai.Client(api_key=GEMINI_API_KEY, http_options={'api_version': 'v1alpha'})

response = client.models.generate_content(
    model='gemini-2.0-flash-thinking-exp',
    contents='Explain how RLHF works in simple terms.',
)

print(response.text)

Okay, let's break down RLHF (Reinforcement Learning from Human Feedback) in super simple terms.

Imagine you have a really smart student (the AI language model) who has read a huge amount of books (its initial training data). It's great at writing sentences and continuing stories, but it doesn't really know what kind of answers you *personally* find helpful, honest, or safe. It just knows how to predict the next word based on what it's seen.

RLHF is like giving this smart student a **personal coach** who represents human preferences. Here's how it works in three main steps:

1.  **Getting Human Opinions (The Coach's Taste):**
    *   You show humans different answers the AI gave for the *same question*.
    *   Humans **rank** these answers from best to worst. They say things like, "I like answer C the best, then A, then B, then D."
    *   You collect a lot of these human rankings. This teaches you what humans generally *prefer*.

2.  **Building a "Preference Judge" (Training the Coa

# Multi-turn thinking conversations

During a multi-turn conversation, you pass the entire conversation history as input, so the model has access to its previous thoughts.

The new Google Gen AI SDK can create a multi-turn chat session, which helps manage the state of a conversation.
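To make "passing the entire conversation history" concrete, here is a minimal offline sketch of what that history looks like when kept by hand. It uses the role/parts shape of Gemini API request contents, but `make_turn`, `send`, and `fake_model` are hypothetical helpers written for this illustration, not SDK functions, and `fake_model` only stands in for a real model call.

```python
# Sketch only: a hand-rolled conversation history.
# `fake_model` is a hypothetical stand-in so the example runs without an API key.

def make_turn(role, text):
    """Build one turn; role is 'user' or 'model'."""
    return {"role": role, "parts": [{"text": text}]}

def fake_model(history):
    """Hypothetical stub: reports how many turns it was given."""
    return f"(reply after seeing {len(history)} turns)"

def send(history, user_text, model_fn=fake_model):
    """Append the user turn, call the model with the full history, record the reply."""
    history.append(make_turn("user", user_text))
    reply = model_fn(history)  # the model receives every earlier turn, not just the latest
    history.append(make_turn("model", reply))
    return reply

history = []
first = send(history, "What is your name?")
second = send(history, "What did you just say before this?")
print(first)
print(second)
```

The history grows by two entries per exchange, so the second request already carries three turns. A chat session keeps this list for you, so you only send the newest message.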



In [None]:
from google import genai

client = genai.Client(api_key=GEMINI_API_KEY, http_options={'api_version': 'v1alpha'})

chat = client.aio.chats.create(
    model='gemini-2.0-flash-thinking-exp',
)
response = await chat.send_message('What is your name?')
print(response.text)
response = await chat.send_message('What did you just say before this?')
print(response.text)

I do not have a name. I am a large language model, trained by Google.
Before you asked "What did you just say before this?", I said:

"I do not have a name. I am a large language model, trained by Google."


# Limitations

The Flash Thinking model is an experimental model and has the following limitations:

*   No JSON mode or Search Grounding
*   Thoughts are only shown in Google AI Studio


# What's next?



*   Try the Flash Thinking model in Google AI Studio.
*   Try the Flash Thinking Colab.


