# gem

> Simple utilities for working with Google's Gemini API

This notebook provides a minimal interface to Google's Gemini API. The goal is to make it dead simple to:

1. Generate text with just a prompt
2. Analyze files (PDFs, images) 
3. Process videos (like YouTube)

All through a single `gem()` function that just works.

In [1]:
#| default_exp gem

In [2]:
#| hide
from nbdev.showdoc import *

## Setup

First, make sure you have your Gemini API key set:

In [3]:
#| export
import os
from pathlib import Path
from fastcore.all import *
from google import genai
from google.genai import types
from functools import partial

In [4]:
# export GEMINI_API_KEY='your-api-key'
assert os.environ.get("GEMINI_API_KEY"), "Please set GEMINI_API_KEY environment variable"

## Building blocks

Let's start with the simple helper functions that make everything work.

### Client creation

We need a Gemini client to talk to the API:

In [5]:
#|export
def _client():
    "Get Gemini client"
    return genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

In [6]:
#|hide
c = _client()
assert c is not None
assert hasattr(c, 'models')

### Converting attachments to Parts

Gemini expects different types of content (files, URLs) to be wrapped in "Parts". This helper handles that conversion:

In [7]:
#| export
def _is_url(s):
    "Check if string is a URL"
    if not isinstance(s, str): return False
    return (s.startswith('http://') or 
            s.startswith('https://') or 
            s.startswith('www.') or 
            'youtube.com' in s or 
            'youtu.be' in s)

def _make_part(o):
    "Convert object to Gemini Part"
    if isinstance(o, (str, Path)):
        p = Path(o)
        if p.exists():
            mime_map = {'.pdf': 'application/pdf', 
                        '.png': 'image/png', 
                        '.jpg': 'image/jpeg', 
                        '.jpeg': 'image/jpeg', 
                        '.gif': 'image/gif'}
            mime = mime_map.get(p.suffix.lower(), 'application/octet-stream')
            return types.Part.from_bytes(mime_type=mime, data=p.read_bytes())
        elif _is_url(o): return types.Part.from_uri(file_uri=o, mime_type='video/*')
    return None
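
The hand-written `mime_map` above covers only PDFs and a few image types. As a sketch of one possible alternative, the standard library's `mimetypes` module can guess the type from the file extension. The helper name `guess_mime` is hypothetical and not part of this notebook's exported API, and the guessed types have not been checked against what Gemini accepts.

```python
import mimetypes
from pathlib import Path

def guess_mime(path, default='application/octet-stream'):
    "Guess a MIME type from the file extension, falling back to `default`."
    # guess_type returns (type, encoding); only the type is needed here
    mime, _ = mimetypes.guess_type(Path(path).name)
    return mime or default

for name in ['slides.pdf', 'photo.png', 'notes.unknownext']:
    print(name, '->', guess_mime(name))
```

Unknown extensions fall back to `application/octet-stream`, the same default `_make_part` uses.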

## The main interface

Now we can build our main `gem()` function that handles all use cases:

In [8]:
#| export
def gem(prompt, # Text prompt
        o=None, # Optional file/URL attachment or list of attachments
        model='gemini-2.5-flash',
        thinking=-1,
        search=False):
    "Generate content with Gemini"
    parts = [types.Part.from_text(text=prompt)]
    
    # Handle single attachment or list of attachments
    attachments = o if isinstance(o, list) else [o] if o else []
    for attachment in attachments:
        if part := _make_part(attachment): parts.insert(0, part)
    
    contents = types.Content(role='user', parts=parts) if attachments else prompt    
    config_dict = {
        'thinking_config': types.ThinkingConfig(thinking_budget=thinking),
        'response_mime_type': 'text/plain'
    }
    # Non-file attachments (e.g. video URLs): use low media resolution so each frame costs fewer tokens
    if any(attachment and not Path(str(attachment)).exists() for attachment in attachments): 
        config_dict['media_resolution'] = 'MEDIA_RESOLUTION_LOW'
    config_dict['tools'] = []
    if search: config_dict['tools'].append(types.Tool(google_search=types.GoogleSearch()))
    cfg = types.GenerateContentConfig(**config_dict)
    resp = _client().models.generate_content(model=model, contents=contents, config=cfg)
    return resp.text
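
To make the attachment handling in `gem()` easier to follow, here is a small sketch that mirrors its normalization and ordering logic. Plain strings stand in for `types.Part` objects, and the helper names `normalize_attachments` and `order_parts` are hypothetical, used only for illustration. The real function also skips any attachment for which `_make_part` returns `None`.

```python
def normalize_attachments(o):
    "Same expression as in `gem()`: None -> [], single item -> [item], list -> unchanged."
    return o if isinstance(o, list) else [o] if o else []

def order_parts(prompt, attachments):
    "Stand-in for the part-building loop; strings play the role of Gemini Parts."
    parts = [prompt]
    for a in attachments:
        parts.insert(0, a)  # each attachment goes to the front of the list
    return parts

print(normalize_attachments(None))                 # []
print(normalize_attachments("talk.pdf"))           # ['talk.pdf']
print(order_parts("prompt", ["a.pdf", "b.png"]))   # ['b.png', 'a.pdf', 'prompt']
```

Because each attachment is inserted at index 0, attachments end up before the prompt and in reverse order relative to the list that was passed in.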

## Examples

One function handles everything:
- Just text? Pass a prompt.
- Have a file? Pass it as the second argument.
- Got a YouTube URL? Same thing.

Let's test it out:

### Text generation

The simplest case - just generate some text:

In [9]:
gem("Write a haiku about Python programming")

'Clean, clear code unfolds,\nIndents guide the powerful flow,\nProblems solved with ease.'

### Video analysis

Perfect for creating YouTube chapters or summaries:

In [10]:
prompt = "5 word summary of this video."
gem(prompt, "https://youtu.be/1x3k0V2IITo")

'Late interaction beats single vector.'

### File analysis

Great for extracting information from PDFs or images:

In [11]:
gem("3 sentence summary of this presentation.", "NewFrontiersInIR.pdf")

'This presentation introduces new frontiers in Information Retrieval (IR), focusing on instruction following and reasoning capabilities, much like Large Language Models (LLMs). It presents two key models: Promptriever, a fast bi-encoder trained to follow natural language instructions for retrieval, and Rank1, a slower cross-encoder capable of complex, test-time reasoning for judging document relevance. These "promptable" and "reasoning" retrievers significantly enhance search performance, unlock new types of queries, and can even uncover previously overlooked relevant documents.'

In [12]:
gem("What's in this image?", "anton.png")



### Change Model

You can also choose the model with `model` and set the thinking budget with `thinking`:

In [13]:
gem("What is Hamel Husain's current job?", model="gemini-2.5-pro")

"Based on his public profiles and professional presence, Hamel Husain's current job is **Co-founder and CEO** of **Gantry**.\n\nGantry is a company that provides an AI observability platform designed to help teams monitor, analyze, and improve their machine learning models in production.\n\nBefore co-founding Gantry in 2022, he was well-known for his role as the Head of Machine Learning at GitHub."

### Grounded Search

As the previous answer shows, the model can get facts wrong without grounding. Pass `search=True` to enable grounded search:

In [14]:
gem("What is Hamel Husain's current job?.", search=True)

"\nthought\nThe search results indicate that Hamel Husain is currently an independent consultant specializing in AI and machine learning, particularly in operationalizing Large Language Models (LLMs). While he was previously a Staff Machine Learning Engineer at GitHub and is still listed as such in some contexts, more recent information from March and July 2025 explicitly states his role as an independent consultant. This suggests his independent consulting is his current primary job. Therefore, I have sufficient information to answer the user's request.Hamel Husain is currently an independent consultant, specializing in helping companies build, evaluate, and operationalize AI-powered systems and Large Language Models (LLMs). He focuses on making AI more reliable, understandable, and actionable.\n\nPreviously, he held the position of Staff Machine Learning Engineer at GitHub, where he was involved in the design and development of software engineering, machine learning, and developer to

### Multiple Attachments

You can analyze multiple files/URLs at once by passing a list:

In [17]:
prompt = "Is this PDF and YouTube video related or are they different talks? Answer with very short yes/no answer."
gem(prompt, ["https://youtu.be/Trps2swgeOg?si=yK7CO0Zk4E1rfp6s", "NewFrontiersInIR.pdf"])

'No.'

In [18]:
gem(prompt, ["https://youtu.be/YB3b-wPbSH8?si=WI0LqflY5SYIsRz9", "NewFrontiersInIR.pdf"])

'Yes.'

## Shortcuts

### Functions to help you do common tasks

In [19]:
#|export
def yt_chapters(link):
    "Generate YouTube Summary and Chapters From A Public Video."
    
    chapter_prompt="Generate a succinct video summary (1-2 sentences) followed by YouTube chapter timestamps for this video. Format each line of the chapter summaries as 'MM:SS - Chapter Title' (e.g., '02:30 - Introduction'). Start with 00:00. Include all major topics and transitions and be thorough - do not miss any important topics.  For the summary, do not say 'In this video, we will cover the following topics', 'This video discusses..' or anything like that. Instead, reference the main speaker's name if you know it.  If there is a Q&A Section, enumerate individual questions as additional chapters."
    return gem(prompt=chapter_prompt, o=link, model="gemini-2.5-pro")

This is what it looks like for Antoine's [Late Interaction Talk](https://youtu.be/1x3k0V2IITo):

In [20]:
chp = yt_chapters("https://youtu.be/1x3k0V2IITo")
print(chp)

Antoine Chaffin explains the inherent limitations of single vector search, such as information loss from pooling and poor performance in out-of-domain and long-context scenarios. He then introduces late interaction (multi-vector) models as a superior alternative that avoids these pitfalls and presents the PyLate library to make training and evaluating these powerful models more accessible.

00:00 - Going Further: Late Interaction Beats Single Vector Limits
00:32 - About Antoine Chaffin
01:40 - Explaining Dense (Single) Vector Search
03:08 - Why Single Vector Search is the Go-To for RAG
03:54 - Performance Evaluation and the MTEB Leaderboard
04:17 - BEIR: A School Case of Goodhart's Law
05:36 - Limitations Beyond Standard Benchmarks
08:24 - Pooling: The Intrinsic Flaw of Dense Models
08:41 - How Pooling Creates Problems in Production
10:42 - The Advantage of BM25
11:32 - Replacing Pooling with Late Interaction
12:17 - Why Not Just Use a Bigger Single Vector?
13:51 - Performance Comparis
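
Since the prompt asks for lines in the form `MM:SS - Chapter Title`, the returned text can be turned into structured data. The sketch below assumes the model follows that format. `parse_chapters` is a hypothetical helper, not part of the exported module, and it also tolerates `H:MM:SS` timestamps in case a long video produces them.

```python
import re

# Matches lines such as '02:30 - Introduction' (optionally 'H:MM:SS - Title')
_CHAPTER_RE = re.compile(r'^\s*((?:\d+:)?\d{1,2}:\d{2})\s*-\s*(.+?)\s*$')

def parse_chapters(text):
    "Return (seconds, title) pairs for every chapter-style line in `text`."
    chapters = []
    for line in text.splitlines():
        m = _CHAPTER_RE.match(line)
        if not m: continue  # skip the summary paragraph and blank lines
        secs = 0
        for field in m.group(1).split(':'):
            secs = secs * 60 + int(field)  # fold H:MM:SS or MM:SS into seconds
        chapters.append((secs, m.group(2)))
    return chapters

sample = "A short summary.\n\n00:00 - Intro\n02:30 - Main topic"
print(parse_chapters(sample))  # [(0, 'Intro'), (150, 'Main topic')]
```

Lines that do not start with a timestamp, such as the summary paragraph, are ignored.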

## Export -

In [21]:
#| hide
import nbdev; nbdev.nbdev_export()