# Gemini intro

The Gemini API can generate many kinds of output, such as text, images, video and audio. In this tutorial we'll go through text generation and inspect the response metadata and token usage. At the end we'll explore multimodal inputs.

## Text generation

In [None]:
from google import genai

# Client() reads the API key from the GEMINI_API_KEY (or GOOGLE_API_KEY) environment variable
client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Generate some funny jokes about data engineering. Give 5 points in markdown format",
)
print(response.text)

Alright, here are 5 funny jokes about data engineering, in Markdown format:

*   Why did the data engineer break up with the database? Because they couldn't commit! They needed more isolation, and less of a "dirty read."

*   What's a data engineer's favorite band? Metallica... because they're always trying to "seek and destroy" bad data!

*   A data engineer walks into a bar and orders a beer. Then asks, "Can you give me the schema of your tap list, the lineage of your ingredients, and is your supply chain immutable?" The bartender replies, "Get out."

*   Why did the data engineer cross the road? To get to the other database... but needed to build a whole ETL pipeline just for that.

*   My therapist asked me why I'm so obsessed with data modeling. I said, "I'm just trying to normalize my life."



In [12]:
response.__dict__.keys()

dict_keys(['sdk_http_response', 'candidates', 'create_time', 'model_version', 'prompt_feedback', 'response_id', 'usage_metadata', 'automatic_function_calling_history', 'parsed'])

In [4]:
def ask_gemini(prompt, model="gemini-2.0-flash"):
    """Send a prompt to Gemini and return the full response object."""
    response = client.models.generate_content(
        model=model,
        contents=prompt,
    )
    return response

response = ask_gemini(prompt="Generate some funny jokes about data engineering. Give 5 points in markdown format")
response

GenerateContentResponse(
  automatic_function_calling_history=[],
  candidates=[
    Candidate(
      avg_logprobs=-0.5846955623532751,
      content=Content(
        parts=[
          Part(
            text="""Okay, here are 5 funny jokes about data engineering in markdown format:

*   Why did the data engineer break up with the database administrator? Because they couldn't see eye-to-eye on normalization. It was a non-relational relationship.

*   What's a data engineer's favorite Halloween costume? A Data Lake Monster... it's big, messy, and everyone's afraid to go near it.

*   Parallel processing is so good, I tried to parallel process my taxes. Now the IRS is giving me a segmentation fault.

*   A data engineer walks into a bar and orders a beer. They then ask for a second beer. Then a third. Then a fourth. The bartender asks, "Hey, are you going to order another?" The data engineer replies, "I'm just testing the latency."

*   Why did the data engineer refuse to go to therapy? B

In [None]:
from pydantic import BaseModel

# GenerateContentResponse is a pydantic model, so we can work with it
# in an object-oriented way and use that to explore what the API returns.
isinstance(response, BaseModel)


True

In [7]:
dict(response).keys()

dict_keys(['sdk_http_response', 'candidates', 'create_time', 'model_version', 'prompt_feedback', 'response_id', 'usage_metadata', 'automatic_function_calling_history', 'parsed'])

In [8]:
response.model_version

'gemini-2.0-flash'
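
The repr shown earlier nests the generated text as candidates → content → parts → text, and `response.text` is the shortcut for it. As a rough sketch of walking that structure by hand, the helper below joins the text parts of the first candidate. The name `first_candidate_text` and the stand-in object are illustrative assumptions, not part of the SDK; the stand-in only mimics the nesting so the snippet runs without an API call.

```python
from types import SimpleNamespace


def first_candidate_text(response):
    """Join the text parts of the first candidate, or return None if there are none."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return None
    content = candidates[0].content
    parts = (content.parts if content else None) or []
    # Keep only parts that carry non-empty text
    texts = [part.text for part in parts if getattr(part, "text", None)]
    return "".join(texts) if texts else None


# Stand-in with the same nesting as the repr above (not a real API response)
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            content=SimpleNamespace(
                parts=[SimpleNamespace(text="Hello "), SimpleNamespace(text="world")]
            )
        )
    ]
)
print(first_candidate_text(fake_response))
```

With a real response from `ask_gemini`, the same call would be `first_candidate_text(response)`.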

## Analyze tokens

Tokens are the basic units of text that LLMs use to process and generate language. The model splits text into these smaller units; as a rough simplification you can think of one word as one token. Tokens are also what you pay for when you use the API.
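
To check how many tokens a prompt occupies before generating anything, the SDK offers a token-counting call. The sketch below wraps `client.models.count_tokens`; the helper name `count_prompt_tokens` is an illustrative assumption, and the stand-in client simply counts whitespace-separated words (which is not real tokenization) so that the snippet runs without an API key.

```python
from types import SimpleNamespace


def count_prompt_tokens(client, prompt, model="gemini-2.0-flash"):
    """Return the number of tokens the API reports for `prompt`."""
    result = client.models.count_tokens(model=model, contents=prompt)
    return result.total_tokens


# Offline stand-in that counts words instead of tokens (NOT real tokenization),
# used only so the example runs without an API key.
word_counter = SimpleNamespace(
    models=SimpleNamespace(
        count_tokens=lambda model, contents: SimpleNamespace(
            total_tokens=len(contents.split())
        )
    )
)
print(count_prompt_tokens(word_counter, "Explain OOP and dunder methods"))
```

With the real client from the first cell, the call would be `count_prompt_tokens(client, "Explain OOP and dunder methods")`, and the number will generally differ from a plain word count.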

The free tier of the Gemini API allows the following for Gemini 2.5 Flash (rate limits as of 2025-11-11):

- Requests per minute (RPM): 10
- Tokens per minute (TPM): 250 000
- Requests per day (RPD): 250

It is possible to upgrade to higher tiers, which offer more generous rate limits but come at a higher cost.
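
One way to stay under a requests-per-minute limit is to pause on the client side before each call. The class below is a minimal sketch of a sliding-window throttle; `RequestThrottle` and its parameters are hypothetical names, not part of the Gemini SDK, and it only covers the per-minute request limit, not the token or daily limits.

```python
import time
from collections import deque


class RequestThrottle:
    """Block until another request fits inside a sliding one-minute window."""

    def __init__(self, max_per_minute=10, clock=time.monotonic, sleep=time.sleep):
        self.max_per_minute = max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests still inside the window

    def wait(self):
        now = self._clock()
        # Forget requests that are at least 60 seconds old
        while self._sent and now - self._sent[0] >= 60:
            self._sent.popleft()
        if len(self._sent) >= self.max_per_minute:
            # Sleep until the oldest request leaves the window
            self._sleep(60 - (now - self._sent[0]))
            now = self._clock()
            self._sent.popleft()
        self._sent.append(now)
```

Calling `throttle.wait()` right before each `ask_gemini(...)` call would then space the requests out. The `clock` and `sleep` arguments exist only so the behaviour can be checked without actually waiting.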

In [13]:
metadata = response.usage_metadata
metadata

GenerateContentResponseUsageMetadata(
  candidates_token_count=186,
  candidates_tokens_details=[
    ModalityTokenCount(
      modality=<MediaModality.TEXT: 'TEXT'>,
      token_count=186
    ),
  ],
  prompt_token_count=15,
  prompt_tokens_details=[
    ModalityTokenCount(
      modality=<MediaModality.TEXT: 'TEXT'>,
      token_count=15
    ),
  ],
  total_token_count=201
)
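
Since tokens are what the API bills for, the counts in the usage metadata can be turned into a rough cost estimate. The function below is a sketch under stated assumptions: `estimate_cost` is a hypothetical helper, the prices are passed in by the caller, and the prices in the example call are made-up placeholders rather than real Gemini pricing. The token counts are the ones shown in the output above.

```python
from types import SimpleNamespace


def estimate_cost(metadata, input_price_per_million, output_price_per_million):
    """Estimate the cost of one call; prices are per one million tokens."""
    input_cost = metadata.prompt_token_count * input_price_per_million / 1_000_000
    output_cost = metadata.candidates_token_count * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Token counts copied from the output above; the prices are placeholders only
usage = SimpleNamespace(prompt_token_count=15, candidates_token_count=186)
print(estimate_cost(usage, input_price_per_million=1.0, output_price_per_million=2.0))
```

With a real response, `estimate_cost(response.usage_metadata, ...)` would take the current prices for the chosen model as arguments.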

## System instruction

In [12]:
from google.genai import types
system_instruction = """you are an expert in Python programming. You always provide idiomatic code i.e. pythonic code.
So when you see my code or my question. 
Be very critical but answer in a concise way. 
Also be constructive to help me improve"""
prompt = "Explain OOP and dunder methods"

def ask_gemini(prompt, model="gemini-2.0-flash"):
    response = client.models.generate_content(
        model=model,
        contents=prompt,
        # The system instruction steers how the model answers every prompt
        config=types.GenerateContentConfig(system_instruction=system_instruction),
    )
    return response

response = ask_gemini(prompt)
response

GenerateContentResponse(
  automatic_function_calling_history=[],
  candidates=[
    Candidate(
      avg_logprobs=-0.11464369181494356,
      content=Content(
        parts=[
          Part(
            text="""Okay, let's break down Object-Oriented Programming (OOP) and dunder methods in Python, focusing on clarity and best practices.

**Object-Oriented Programming (OOP)**

OOP is a programming paradigm centered around "objects." Think of objects as bundles of data (attributes) and functions (methods) that operate on that data.  It's a way to structure your code to mirror real-world entities and their interactions.

**Key Concepts:**

*   **Class:** A blueprint or template for creating objects.  It defines the attributes and methods that objects of that class will have.

*   **Object (Instance):**  A specific realization of a class.  You create objects from classes.

*   **Encapsulation:** Bundling data and methods that operate on that data within a class.  It helps protect data and 

In [13]:
metadata = response.usage_metadata
metadata

GenerateContentResponseUsageMetadata(
  candidates_token_count=1135,
  candidates_tokens_details=[
    ModalityTokenCount(
      modality=<MediaModality.TEXT: 'TEXT'>,
      token_count=1135
    ),
  ],
  prompt_token_count=61,
  prompt_tokens_details=[
    ModalityTokenCount(
      modality=<MediaModality.TEXT: 'TEXT'>,
      token_count=61
    ),
  ],
  total_token_count=1196
)

In [14]:
# Output (candidate) tokens, prompt tokens, then the total reported by the API
print(metadata.candidates_token_count)
print(metadata.prompt_token_count)
print(metadata.total_token_count)

1135
61
1196
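
The `ask_gemini` helper in this section reads `system_instruction` from a global variable. A possible variation, sketched below, passes the instruction as an argument so different instructions can be tried per call. The name `ask_with_instruction` is hypothetical, and the sketch assumes the SDK accepts a plain dict for `config` in place of `types.GenerateContentConfig`. The echo client is a stand-in that returns the arguments it receives, so the snippet runs without an API key.

```python
from types import SimpleNamespace


def ask_with_instruction(client, prompt, system_instruction=None, model="gemini-2.0-flash"):
    """Send `prompt`, optionally steering the model with a system instruction."""
    # Assumption: a plain dict is accepted for `config`,
    # in place of types.GenerateContentConfig.
    config = {"system_instruction": system_instruction} if system_instruction else None
    return client.models.generate_content(model=model, contents=prompt, config=config)


# Offline stand-in that echoes the keyword arguments (not the real client)
echo_client = SimpleNamespace(
    models=SimpleNamespace(generate_content=lambda **kwargs: kwargs)
)
print(ask_with_instruction(echo_client, "Explain OOP", system_instruction="Be concise"))
```

With the real client, the call would be `ask_with_instruction(client, prompt, system_instruction=system_instruction)`. The system instruction counts towards the prompt tokens, which is why `prompt_token_count` above is much larger than the short prompt alone would suggest.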


## Multimodal model

In [19]:
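from pathlib import Path

text_input = "describe this image shortly"
# Read the image bytes up front so no file handle is left open
image_input = {"mime_type": "image/png", "data": Path("bella.png").read_bytes()}
response = client.models.generate_content(
    model="gemini-2.0-flash",
    # One content item with two parts: the text prompt and the inline image
    contents={"parts": [{"text": text_input}, {"inline_data": image_input}]},
)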
print(response.text)

The image shows a furry, gray rabbit wearing a small, white graduation cap with a black brim. A yellow and blue ribbon drapes around the rabbit's neck, seemingly attached to the cap. The rabbit is sitting on a gray carpet.
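
The cell above hardcodes `image/png` as the MIME type. For other image formats, one option is to guess the type from the file extension with the standard library. The helper below is a sketch; `image_part` is a hypothetical name, and it builds the same `inline_data` dict shape that the cell passes to `generate_content`.

```python
import mimetypes
from pathlib import Path


def image_part(path):
    """Build an inline-data part dict, guessing the MIME type from the file extension."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {str(path)!r}")
    return {"inline_data": {"mime_type": mime_type, "data": Path(path).read_bytes()}}
```

It could then be used as `contents={"parts": [{"text": text_input}, image_part("bella.png")]}`. Guessing from the extension does not inspect the file contents, so a mislabeled file would still be sent with the wrong type.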
