# Prompting with Files and Audio
This notebook demonstrates how to include files and audio in prompts to a Gemini model (`gemini-2.5-pro` here). It covers uploading image and audio files and passing them to the model to generate text responses.

In [1]:
import os

import google.generativeai as genai
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

# Read the API key from the environment and configure the client
GOOGLE_API_KEY = os.getenv("API_KEY")
genai.configure(api_key=GOOGLE_API_KEY)

# Model used throughout this notebook
model = genai.GenerativeModel('gemini-2.5-pro')


## Uploading and Using Files

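To upload a file, use the `genai.upload_file` function. It takes the path to the file you want to upload and returns a file object with metadata about the upload, such as its display name and URI. You can retrieve the same metadata later with `genai.get_file`, passing the file's `name`.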

In [2]:
image_file = genai.upload_file(path="guitar.jpg")
print(f"Uploaded file: {image_file.display_name} as {image_file.uri}")

Uploaded file: guitar.jpg as https://generativelanguage.googleapis.com/v1beta/files/tst0aj2xw78q


In [3]:
file = genai.get_file(name=image_file.name)
print(f"Retrieved file {file.display_name} as {file.uri}")

Retrieved file guitar.jpg as https://generativelanguage.googleapis.com/v1beta/files/tst0aj2xw78q
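Small images are usually usable right after upload, but larger media may take a moment to process on the server. The sketch below polls the file's metadata until it is ready. It assumes the file object exposes `state.name` with the values `"PROCESSING"`, `"ACTIVE"`, and `"FAILED"`, as the `google.generativeai` SDK does. The helper name `wait_until_active` and its parameters are hypothetical, and the lookup function is passed in so that `genai.get_file` can be supplied by the caller.

```python
import time


def wait_until_active(file, get_file, poll_seconds=2.0, timeout=120.0):
    """Poll until an uploaded file leaves the PROCESSING state.

    `file` is the object returned by the upload call; `get_file` is a
    callable (for example `genai.get_file`) that re-fetches metadata by name.
    """
    deadline = time.monotonic() + timeout
    while file.state.name == "PROCESSING":
        if time.monotonic() > deadline:
            raise TimeoutError(f"{file.name} is still processing after {timeout}s")
        time.sleep(poll_seconds)
        file = get_file(file.name)  # refresh the metadata
    if file.state.name != "ACTIVE":
        raise RuntimeError(f"{file.name} ended in state {file.state.name}")
    return file
```

In this notebook it could be called as `image_file = wait_until_active(image_file, genai.get_file)` before the file is passed to the model.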


In [8]:
from IPython.display import Markdown

response = model.generate_content(["Describe this image as an expert in music", image_file])
Markdown(response.text)

As an expert in music and musical instruments, this image captures a classic and inspiring scene, rich with detail for any guitarist.

Here's a detailed breakdown:

**The Centerpiece: The Stratocaster-style Electric Guitar**

At the forefront is a beautiful electric guitar, immediately recognizable as a **Fender Stratocaster** or a very close copy. This is one of the most iconic and influential guitar designs in history.

*   **Finish and Body:** The guitar features a classic creamy yellow finish, reminiscent of Fender's "Vintage White" or "Buttercream." The body's double-cutaway design with its ergonomic "comfort contours" is a hallmark of the Stratocaster, allowing for easy access to the highest frets. The way the light casts shadows across the body highlights these smooth, flowing lines.

*   **Electronics:** It's equipped with the traditional **three single-coil pickup** configuration (often abbreviated as S-S-S). This setup is renowned for producing bright, clear, and "chimey" tones. The five-way selector switch allows the player to choose each pickup individually or combine them (neck/middle and middle/bridge) for the characteristic "quack" or "cluck" that Strats are famous for, a sound central to genres from blues (Stevie Ray Vaughan) to funk (Nile Rodgers) and rock (David Gilmour). The control layout is standard: a master volume knob and two tone knobs, typically controlling the neck and middle pickups.

*   **Hardware and Neck:** The bridge is a **vintage-style synchronized tremolo**, allowing the player to alter the pitch of the strings for vibrato effects. You can see the six individual saddles, which are crucial for setting the guitar's intonation and string height precisely. The neck and fretboard appear to be maple, a common choice that contributes to the guitar's bright, snappy attack. The black dot inlays serve as standard position markers.

*   **Ready for Action:** A 1/4-inch instrument cable is plugged into the guitar's angled output jack, indicating it's either just been played or is ready to be. The coiled cable at the bottom suggests a space dedicated to making music.

**The Supporting Cast: The Environment**

The background elements are just as telling as the guitar itself.

*   **The Amplifier:** Positioned directly behind the Stratocaster is a quintessential piece of rock amplification: a **Marshall amplifier**. The iconic white script logo is unmistakable. A Stratocaster plugged into a Marshall is one of the foundational sounds of classic and hard rock. The visible "STANDBY" switches on the control panel strongly suggest this is a **tube amplifier**, prized by musicians for its warm, rich, and dynamic overdrive when pushed. This is a professional-grade setup, not just a beginner's practice amp.

*   **The Acoustic Counterpart:** To the right, we see the body of a **dreadnought-style acoustic guitar**. Its natural wood finish, soundhole with a multi-ring rosette, and bridge with bridge pins offer a beautiful contrast to the electric guitar. Its presence suggests a versatile musician or a recording environment where both electric and acoustic textures are valued.

**Overall Impression:**

This image portrays a musician's creative space. It's a snapshot of a timeless combination of gear that has defined popular music for decades. The scene speaks of potential and readiness—the electric guitar is plugged in, the amp is on standby, and the acoustic is within arm's reach. It's an invitation to create, capturing the soul of rock, blues, and countless other genres in a single, well-composed frame.

In [9]:
genai.delete_file(image_file.name)
print(f"Deleted {image_file.display_name}")

Deleted guitar.jpg
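If the generation step raises an error, the explicit delete call above never runs and the upload is left behind. One way to guard against that is a small context manager, sketched here. The name `temporary_upload` is hypothetical, and the upload and delete functions are passed in as arguments so that `genai.upload_file` and `genai.delete_file` can be supplied by the caller.

```python
from contextlib import contextmanager


@contextmanager
def temporary_upload(path, upload, delete):
    """Upload `path`, yield the file object, and always delete it afterwards."""
    uploaded = upload(path)
    try:
        yield uploaded
    finally:
        # Runs even if the body of the `with` block raises
        delete(uploaded.name)
```

Usage in this notebook might look like `with temporary_upload("guitar.jpg", genai.upload_file, genai.delete_file) as image_file:` followed by the `model.generate_content` call inside the block.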


## Tokens in Gemini
Tokens are the units of text that language models like Gemini read and generate. A token can be as short as a single character or as long as a whole word. For example, a short common word such as "chat" may be a single token, while a phrase such as "ChatGPT is great!" would be split into several tokens, perhaps "Chat", "GPT", "is", "great", and "!". The exact split depends on the model's tokenizer.

A context window is the maximum number of tokens that a model can consider at one time when generating text. For example, a model with a context window of 8,192 tokens can take into account at most 8,192 tokens when generating a response. What happens when the input exceeds this limit depends on the system: some applications drop older text and keep only the most recent 8,192 tokens, while an API may simply reject the request.

Important note: In general, the context window covers both input tokens (the text you provide to the model) and output tokens (the text generated by the model). For example, if you provide 4,000 tokens of input and the model generates 4,000 tokens of output, the total is 8,000 tokens, which fits within an 8,192-token context window. The Gemini API reports an input limit and an output limit separately for each model, as the next cell shows.

On images: Each image counts towards the token limit based on its size.
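The arithmetic in the note above can be sketched as a pair of small helpers. The function names are hypothetical and the numbers are the illustrative ones from the text, not limits of any particular model.

```python
def fits_context(input_tokens, output_tokens, context_window):
    """Return True if input plus output tokens fit within the context window."""
    return input_tokens + output_tokens <= context_window


def remaining_output_budget(input_tokens, context_window):
    """Return how many tokens are left for the response (never negative)."""
    return max(context_window - input_tokens, 0)


# Worked example from the text: 4,000 in + 4,000 out against 8,192
print(fits_context(4_000, 4_000, 8_192))        # True (8,000 <= 8,192)
print(remaining_output_budget(4_000, 8_192))    # 4192
```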

In [10]:
model_info = genai.get_model("models/gemini-2.5-pro")
(model_info.input_token_limit, model_info.output_token_limit)

(1048576, 65536)
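To compare a specific prompt against the input limit, the SDK's `count_tokens` method on the model returns a response with a `total_tokens` field. The helper below is a minimal sketch built on that call. The name `check_input_fits` is hypothetical, and the model is passed in as an argument so the function does not depend on how the client was configured.

```python
def check_input_fits(model, contents, input_token_limit):
    """Count the tokens in `contents` and compare them with the input limit.

    `model` is expected to provide `count_tokens(contents)` returning an
    object with a `total_tokens` attribute, as `genai.GenerativeModel` does.
    """
    total = model.count_tokens(contents).total_tokens
    return total, total <= input_token_limit
```

With the objects defined in this notebook, a call might look like `check_input_fits(model, ["Describe this image", image_file], model_info.input_token_limit)`.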

## Prompting with Audio
To upload an audio file, use `genai.upload_file` exactly as for images. You can then pass the returned file object in the prompt list alongside your text instructions.

In [12]:
audio_file = genai.upload_file("japanese.mp3")
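The upload call above lets the SDK infer the file type. If you prefer to state it explicitly, `genai.upload_file` also accepts a `mime_type` keyword argument. The sketch below uses Python's standard `mimetypes` module to guess the type from the file extension and refuses anything that is not audio. The helper name `audio_upload_kwargs` is hypothetical.

```python
import mimetypes


def audio_upload_kwargs(path):
    """Build keyword arguments for an audio upload with an explicit MIME type."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("audio/"):
        raise ValueError(f"{path!r} does not look like an audio file")
    return {"path": path, "mime_type": mime_type}
```

The result can be unpacked into the upload call, for example `genai.upload_file(**audio_upload_kwargs("japanese.mp3"))`.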

In [13]:
prompt = "Create a Japanese transcript and English translation of the audio file."

In [14]:
response = model.generate_content([prompt, audio_file])
print(response.text)

**Japanese**

気は抜いてもいかん、張りすぎてもいかん。
張り詰めた気は弱く、緩んだ気も弱い。
敵に気を見せるな。

**English**

You must neither let your guard down, nor be too tense.
A spirit stretched too thin is weak, and a spirit that is too slack is also weak.
Do not let the enemy see your spirit.
