##### Copyright 2025 Google LLC.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Voice memos

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Voice_memos.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

This notebook shows how to combine audio and text files in a single prompt. You'll use the Gemini API to draft your next blog post from voice memos you recorded on your phone and articles you've previously written.

In [1]:
%pip install -U -q "google-genai>=1.0.0"

In [2]:
from google import genai
from google.genai.types import GenerateContentConfig
from google.colab import userdata


### Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.

In [3]:
# Initialize client
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
client = genai.Client(api_key=GOOGLE_API_KEY)

Install `poppler-utils`, which provides the `pdftotext` command used later to convert the PDFs to plain text.

In [4]:
!apt install poppler-utils

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  poppler-utils
0 upgraded, 1 newly installed, 0 to remove and 30 not upgraded.
Need to get 186 kB of archives.
After this operation, 696 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 poppler-utils amd64 22.02.0-2ubuntu0.6 [186 kB]
Fetched 186 kB in 1s (194 kB/s)
Selecting previously unselected package poppler-utils.
(Reading database ... 126213 files and directories currently installed.)
Preparing to unpack .../poppler-utils_22.02.0-2ubuntu0.6_amd64.deb ...
Unpacking poppler-utils (22.02.0-2ubuntu0.6) ...
Setting up poppler-utils (22.02.0-2ubuntu0.6) ...
Processing triggers for man-db (2.10.2-1) ...


## Upload your audio and text files


In [5]:
# Download files
!wget -q https://storage.googleapis.com/generativeai-downloads/data/Walking_thoughts_3.m4a
!wget -q https://storage.googleapis.com/generativeai-downloads/data/A_Possible_Future_for_Online_Content.pdf
!wget -q https://storage.googleapis.com/generativeai-downloads/data/Unanswered_Questions_and_Endless_Possibilities.pdf

In [7]:
# Upload the voice memo through the File API
audio_file_name = "Walking_thoughts_3.m4a"
audio_file = client.files.upload(file=audio_file_name)
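
Uploaded media can take a moment to finish processing before it is usable in a prompt. Short audio clips are typically ready almost immediately, so the following is only a defensive sketch. It assumes the file object exposes `state.name` with values such as `PROCESSING`, `ACTIVE`, and `FAILED`, and that `client.files.get(name=...)` returns refreshed metadata. `wait_until_active` and its parameters are hypothetical names introduced for illustration.

```python
import time


def wait_until_active(client, file, poll_seconds=2.0, max_wait_seconds=120.0):
    """Poll the File API until `file` leaves the PROCESSING state.

    Hypothetical helper: `client` is expected to behave like `genai.Client`.
    """
    deadline = time.monotonic() + max_wait_seconds
    while file.state.name == "PROCESSING":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{file.name} is still processing")
        time.sleep(poll_seconds)
        # Fetch a fresh copy of the file metadata.
        file = client.files.get(name=file.name)
    if file.state.name != "ACTIVE":
        raise RuntimeError(f"{file.name} ended in state {file.state.name}")
    return file
```

In this notebook it could be called as `audio_file = wait_until_active(client, audio_file)` right after the upload.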

## Extract text from the PDFs

In [8]:
# Convert PDFs to text
!pdftotext -q A_Possible_Future_for_Online_Content.pdf
!pdftotext -q Unanswered_Questions_and_Endless_Possibilities.pdf
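
When `pdftotext` is given only an input path, it writes the text next to the PDF under the same base name with a `.txt` extension, which is why the upload cells refer to `.txt` files. The same step can be sketched in Python. `text_output_path` and `convert_pdf` are hypothetical helper names, and `convert_pdf` assumes `pdftotext` is available on the `PATH`.

```python
import subprocess
from pathlib import Path


def text_output_path(pdf_path):
    """Return the path pdftotext writes to when no output file is given."""
    return Path(pdf_path).with_suffix(".txt")


def convert_pdf(pdf_path):
    """Run `pdftotext -q` on one PDF and return the path of the text file."""
    subprocess.run(["pdftotext", "-q", str(pdf_path)], check=True)
    return text_output_path(pdf_path)


pdfs = [
    "A_Possible_Future_for_Online_Content.pdf",
    "Unanswered_Questions_and_Endless_Possibilities.pdf",
]
for pdf in pdfs:
    # Show where each converted file is expected to land.
    print(text_output_path(pdf))
```

Calling `convert_pdf(pdf)` inside the loop would perform the conversion itself and raise `CalledProcessError` if `pdftotext` fails.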

In [9]:
# Upload files
blog_file = client.files.upload(file="A_Possible_Future_for_Online_Content.txt")


In [10]:
blog_file2 = client.files.upload(file="Unanswered_Questions_and_Endless_Possibilities.txt")

## System instructions

Write a detailed system instruction to configure the model.

In [11]:
# System instruction
system_instruction = """
Objective: Transform raw thoughts and ideas into polished, engaging blog posts that capture a writer's unique style and voice.

Input:
- Example Blog Posts (1-5): Guides style preferences (word choice, sentence structure, voice).
- Audio Clips: Brainstorming thoughts in natural speech.

Output:
- Blog Post Draft (500-800 words):
  - Clear, concise, engaging writing.
  - Consistent tone/style with examples.
  - Logical structure with sections.
  - Strong conclusion.

Process:
1. Style Analysis: Analyze example posts for vocabulary, sentence structure, tone.
2. Audio Transcription: Extract key ideas from audio.
3. Draft Generation: Combine insights into a structured draft.
"""

## Generate content

In [13]:
# Generate content
prompt = "Draft my next blog post based on my thoughts in this audio file and these two previous blog posts I wrote."

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[prompt, blog_file, blog_file2, audio_file],
    config=GenerateContentConfig(
        system_instruction=system_instruction,
        # Per-request timeout; the google-genai SDK documents this value in milliseconds.
        httpOptions={"timeout": 7200}
    )
)

print(response.text)

Okay, here's a draft blog post based on your provided materials. I've tried to capture your conversational tone and focus on the "Write to Think" concept, framing it around your earlier experiences.

**Title: Embracing the Mess: Why "Writing to Think" is the Ultimate Productivity Hack**

Early in my career, I was obsessed with the final product. I poured hours into visions, roadmaps, and meticulously planned projects.  Coming straight from school, I was wired to believe that an assignment, once given, had to be executed perfectly. There were no "take-backsies." You got the brief, you delivered, and you were graded on it.

The reality of the workforce hit me hard.  Projects got scrapped. Priorities shifted.  Sometimes, entire initiatives were killed weeks, even months, into development. And, to be honest, it felt like a colossal waste of time. All that effort, all those late nights, for *nothing*. I struggled to reconcile this with my ingrained belief that every task needed to have a ta
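
The system instruction asks for a draft of 500-800 words, but the model is not guaranteed to stay within that range. A rough check that counts whitespace-separated tokens is sketched here. `word_count` and `draft_length_ok` are hypothetical helpers, and any Markdown markup in the response is counted as words too.

```python
def word_count(text):
    """Count whitespace-separated tokens in `text`."""
    return len(text.split())


def draft_length_ok(text, low=500, high=800):
    """Return True if the rough word count falls within [low, high]."""
    return low <= word_count(text) <= high


sample = "Early in my career, I was obsessed with the final product."
print(word_count(sample))       # 11
print(draft_length_ok(sample))  # False
```

In this notebook the check could be applied as `draft_length_ok(response.text)`.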

In [14]:
# Clean up files
client.files.delete(name=audio_file.name)
client.files.delete(name=blog_file.name)
client.files.delete(name=blog_file2.name)

DeleteFileResponse()
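
With more than a few uploads, the same cleanup can be written as a loop. The sketch below uses only the `client.files.delete(name=...)` call shown above. `delete_uploaded` is a hypothetical helper that keeps going when one deletion fails and reports which files were left behind.

```python
def delete_uploaded(client, files):
    """Delete each uploaded file; return (name, error) pairs for failures."""
    failed = []
    for f in files:
        try:
            client.files.delete(name=f.name)
        except Exception as exc:  # keep cleaning up the remaining files
            failed.append((f.name, exc))
    return failed
```

For this notebook the call would be `delete_uploaded(client, [audio_file, blog_file, blog_file2])`, and an empty list means every file was removed.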

## Learning more

* See the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb) quickstart to learn more about uploading and managing files.