# Other Techniques Examples

## Setup

In [1]:
%pip install -U -q "google-genai>=1.0.0"  # Install the Python SDK

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see the [Authentication](../quickstarts/Authentication.ipynb) quickstart for an example.

In [2]:
from google.colab import userdata
from google import genai

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
client = genai.Client(api_key=GOOGLE_API_KEY)

MODEL_ID = "gemini-2.5-flash"
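The cell above relies on `google.colab.userdata`, which only exists inside Colab. When running the notebook elsewhere, a minimal sketch of a fallback could look like the following. It assumes the key is exported as an environment variable that is also named `GOOGLE_API_KEY`, and the helper name `load_api_key` is hypothetical.

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from a Colab Secret if available, else from an environment variable."""
    try:
        # This import only succeeds inside Google Colab.
        from google.colab import userdata
        return userdata.get(name)
    except ImportError:
        # Outside Colab, fall back to an environment variable with the same name (assumption).
        key = os.environ.get(name)
        if not key:
            raise RuntimeError(f"Set the {name} environment variable or Colab Secret.")
        return key
```

With this helper, the client would be created as `client = genai.Client(api_key=load_api_key())`.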

In [3]:
from google.genai import types

import requests
import json
import math

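questions_text = requests.get("https://raw.githubusercontent.com/phil-daniel/gemini-batcher/refs/heads/main/examples/demo_files/questions.txt").text
questions = [q.strip() for q in questions_text.splitlines() if q.strip()]  # Skipping blank lines, e.g. from a trailing newline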
content = requests.get("https://raw.githubusercontent.com/phil-daniel/gemini-batcher/refs/heads/main/examples/demo_files/content.txt").text

## Token-Aware Chunking and Batching Example
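The cell below never computes the number of chunks up front; it keeps halving any content that is over the input token limit until every piece fits. Assuming tokens are spread roughly evenly through the content, and ignoring the questions (which are sent with every chunk), content needing $T$ tokens against a limit $L$ requires $k = \lceil \log_2(T / L) \rceil$ halvings and ends up as $2^k$ chunks. A small sketch of that arithmetic, using made-up token counts rather than measurements:

```python
import math


def halvings_needed(total_tokens: int, token_limit: int) -> int:
    """Times content must be halved so each piece fits, assuming tokens are evenly spread."""
    if total_tokens <= token_limit:
        return 0
    return math.ceil(math.log2(total_tokens / token_limit))


# Illustrative numbers only.
limit = 1_000_000
for total in (800_000, 1_500_000, 2_500_000, 4_000_000):
    k = halvings_needed(total, limit)
    print(f"{total:>9,} tokens -> {k} halvings -> {2 ** k} chunks of ~{total // 2 ** k:,} tokens")
```

Because each halving doubles the number of requests, content only slightly over the limit still costs two calls, which is the trade-off of this simple strategy.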

In [4]:
system_prompt = """
    Answer each of the inputted questions using the information provided to you in the prompt. Each answer should be a **single** string in the JSON response.
    There should be **exactly** one answer for each inputted question, no more, no less.
    * **Accuracy and Precision:** Provide direct, factual answers. **Do not** create or merge any of the questions.
    * **Source Constraint:** Use *only* information explicitly present in the transcript. Do not infer, speculate, or bring in outside knowledge.
    * **Completeness:** Ensure each answer fully addresses the question, *to the extent possible with the given transcript*.
    * **Missing Information:** If the information required to answer a question is not discussed or cannot be directly derived from the transcript, respond with "N/A".
"""

model_name = "gemini-2.5-flash"
input_token_limit = client.models.get(model = model_name).input_token_limit # Retrieving the input token limit of the specified model

answers = {}

# Starting with a single API call for the entire content and all questions; if that exceeds a limit, the work is split up as appropriate.
queue = [(content, questions)]

while len(queue) > 0:
    curr_content, curr_questions = queue.pop(0)

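    # Checking whether the input token limit is exceeded, counting the same content and questions that the request will send
    # (the system prompt adds a small number of extra tokens on top of this).
    input_tokens_required = client.models.count_tokens(
        model = model_name,
        contents = [f'Content:\n{curr_content}', f'\nQuestion:\n{curr_questions}']
    ).total_tokens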
    if input_tokens_required > input_token_limit:
        # In this case we know that an API call with the current content will exceed the input token limit for the current model.
        # If this is the case, we split the content in half so each API call processes half of the content.
        chunked_content = [curr_content[0 : len(curr_content)//2 + 1], curr_content[len(curr_content)//2 + 1 : len(curr_content)]]
        queue.append((chunked_content[0], curr_questions))
        queue.append((chunked_content[1], curr_questions))
        continue

    # Making the API call to the Gemini model
    response = client.models.generate_content(
        model=model_name,
        config=types.GenerateContentConfig(
            system_instruction=system_prompt,
            response_mime_type = "application/json",
            response_schema = list[str]
        ),
        contents=[f'Content:\n{curr_content}', f'\nQuestion:\n{curr_questions}']
    )

    # Checking the finish reason of token generation: 'STOP' is the natural stopping point, while 'MAX_TOKENS' means the output was cut off.
    if response.candidates[0].finish_reason == types.FinishReason.MAX_TOKENS:
        # Token generation stopped at the output token limit, so we have likely not received a full answer.
        # We therefore retry with the questions split into two batches of roughly half the size to reduce the output length.
        if len(curr_questions) == 1:
            # A single question cannot be split any further, so it is reported and skipped.
            print(f"Output token limit reached for a single question: {curr_questions[0]}")
            continue
        mid = len(curr_questions) // 2
        batch1, batch2 = curr_questions[:mid], curr_questions[mid:]
        queue.append((curr_content, batch1))
        queue.append((curr_content, batch2))
        continue

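    response_parsed = json.loads(response.text)

    # When the content has been split, each question is asked against every chunk, so an answer found in
    # one chunk should not be overwritten by "N/A" from a chunk that lacks the information.
    for question, answer in zip(curr_questions, response_parsed):
        if answers.get(question, "N/A") == "N/A":
            answers[question] = answer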

print("First 10 Answers:")
for key in (list(answers.keys())[:10]):
    print (f"{key}\n\t{answers[key]}")

First 10 Answers:
What is the goal of MIT 6.00 (Introduction to Computer Science and Programming)?
	To help everybody learn about computation and to learn how to think like a computer scientist.
Who teaches the course?
	Professor Eric Grimson and Professor John Guttag.
What kind of students is this course intended for?
	Students with little or no prior programming experience.
What are the strategic and tactical goals of the course?
	Strategic goals include preparing freshmen and sophomores for Course 6, helping non-Course 6 majors write and read small pieces of code, providing an understanding of computation's role in technical problems, and positioning students for jobs. Tactical goals include using computational thinking to write small programs, understanding programs written by others, understanding the capabilities and limitations of computations, and mapping scientific problems into a computational frame.
How is the course structured (lectures, recitations, workload)?
	The course 
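To see the queue-based splitting in isolation, without spending API calls, the sketch below replaces `client.models.count_tokens` with a hypothetical word-count stand-in (`count_tokens_stub`) and uses a deliberately tiny, made-up limit. It only reproduces the content-halving branch; the answering step is omitted, and `split_until_fits` is likewise a hypothetical name.

```python
from collections import deque


def count_tokens_stub(text: str) -> int:
    """Hypothetical stand-in for count_tokens: one 'token' per whitespace-separated word."""
    return len(text.split())


def split_until_fits(content: str, token_limit: int) -> list[str]:
    """Halve content by characters, as the notebook does, until every piece fits the limit."""
    queue = deque([content])  # deque gives O(1) pops from the left, unlike list.pop(0)
    fitted = []
    while queue:
        piece = queue.popleft()
        if count_tokens_stub(piece) > token_limit and len(piece) > 1:
            mid = len(piece) // 2
            queue.append(piece[:mid])
            queue.append(piece[mid:])
            continue
        fitted.append(piece)
    return fitted


text = " ".join(f"word{i}" for i in range(100))
pieces = split_until_fits(text, token_limit=30)
print(len(pieces), [count_tokens_stub(p) for p in pieces])
```

Pieces come out in breadth-first order, which is not always document order; that does not matter in the notebook because answers are keyed by question. Splitting by characters can also cut a word in two, so a token-based or sentence-based split point would be a reasonable refinement.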

## Semantic Batching and Chunking Example

Step 1: Generating chunks semantically

In [6]:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import re

transformer_model = 'all-MiniLM-L6-v2'
threshold_factor = 0.5
min_sentences_per_chunk = 150
max_sentences_per_chunk = 250

# Splitting the content into sentences, stripping surrounding whitespace and dropping empty strings
sentences = re.split(r'(?<=[.!?])\s+', content)
sentences = [sentence.strip() for sentence in sentences if sentence.strip()]

model = SentenceTransformer(transformer_model)

# Creating sentence embeddings using the SentenceTransformer model
sentence_embeddings = model.encode(sentences)

# Calculating the similarity between adjacent embeddings
similarities = []
for i in range(len(sentence_embeddings) - 1):
    # Reshape for cosine_similarity: (1, n_features) for each vector
    s1 = sentence_embeddings[i].reshape(1, -1)
    s2 = sentence_embeddings[i+1].reshape(1, -1)
    similarity = cosine_similarity(s1, s2)[0][0]
    similarities.append(similarity)

# Calculating a threshold value for cosine similarity.
mean = np.mean(similarities)
std_dev = np.std(similarities)
similarity_threshold = mean - (std_dev * threshold_factor)

boundaries = [0]
current_chunk_start_pos = 0
for i in range(len(similarities)):
    # Checking if there is a natural boundary.
    if similarities[i] < similarity_threshold and (i + 1) - current_chunk_start_pos >= min_sentences_per_chunk:
        boundaries.append(i+1)
        current_chunk_start_pos = i + 1
    elif (i+1) - current_chunk_start_pos >= max_sentences_per_chunk:
        boundaries.append(i+1)
        current_chunk_start_pos = i + 1

# Adding the end point if it has not already been added
if boundaries[-1] != len(similarities) + 1:
    boundaries.append(len(similarities) + 1)

# Creating the chunks based on the boundaries.
content_chunks = []
for i in range(len(boundaries) - 1):
    content_chunks.append(" ".join(sentences[boundaries[i] : boundaries[i+1]]))

print (len(content_chunks))

5
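The loop in Step 1 calls `cosine_similarity` once per adjacent pair of sentences. An equivalent, vectorized sketch in NumPy normalizes every embedding once and takes row-wise dot products of consecutive rows; the boundary rule afterwards is the same "mean minus `threshold_factor` standard deviations" test with minimum and maximum chunk sizes. The function names are hypothetical, and the toy embeddings are random, so the boundaries will not match the notebook's five chunks.

```python
import numpy as np


def adjacent_cosine_similarities(embeddings: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row and the next one, shape (n - 1,)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms  # assumes no all-zero embedding rows
    return np.sum(unit[:-1] * unit[1:], axis=1)


def find_boundaries(similarities, threshold_factor, min_size, max_size):
    """Sentence indices where chunks start, plus the end index, mirroring the notebook's rule."""
    threshold = np.mean(similarities) - threshold_factor * np.std(similarities)
    boundaries = [0]
    start = 0
    for i, sim in enumerate(similarities):
        size = (i + 1) - start  # sentences in the current chunk if we cut before sentence i + 1
        if (sim < threshold and size >= min_size) or size >= max_size:
            boundaries.append(i + 1)
            start = i + 1
    n_sentences = len(similarities) + 1
    if boundaries[-1] != n_sentences:
        boundaries.append(n_sentences)
    return boundaries


rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(40, 8))
sims = adjacent_cosine_similarities(toy_embeddings)
print(find_boundaries(sims, threshold_factor=0.5, min_size=5, max_size=10))
```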


Step 2: Generating batches semantically based on chunks

In [7]:
# Creating a batch for each chunk. Each batch only contains the questions assigned to its respective chunk.
question_batches = [[] for _ in range(len(content_chunks))]

# Creating embeddings for each question.
question_embeddings = model.encode(questions)
# Creating an embedding for each chunk as a whole, rather than for each sentence in a chunk.
chunk_embeddings = model.encode(content_chunks)

for i in range(len(question_embeddings)):
    # Calculating the similarity to each chunk.
    chunk_similarity = cosine_similarity(question_embeddings[i].reshape(1, -1), chunk_embeddings)[0]
    # Finding the most similar chunk and adding the question to its batch.
    most_similar_chunk = np.argmax(chunk_similarity)
    question_batches[most_similar_chunk].append(questions[i])

print (question_batches)

[['What is the goal of MIT 6.00 (Introduction to Computer Science and Programming)?', 'Who teaches the course?', 'What kind of students is this course intended for?', 'What are the strategic and tactical goals of the course?', 'What programming language is used in the course?', 'Are there any textbooks required for the course?', 'How are grades calculated in the course?', 'What does it mean to think like a computer scientist?'], ['How is the course structured (lectures, recitations, workload)?', 'How are quizzes and exams administered (e.g., open book)?', 'What are the expectations regarding lecture and recitation attendance?', 'Why are class notes not handed out?'], ['What is computation?', 'How do we capture imperative knowledge in a computer?', 'What is a fixed-program computer?', 'What is a stored-program computer?', 'What are the components of a stored-program computer?', 'What is a recipe in the context of computation?', 'What is the significance of Alan Turing’s work to computat
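Step 2 embeds each whole chunk with `model.encode(content_chunks)`. Sentence-Transformers models truncate input beyond their maximum sequence length (256 word pieces by default for `all-MiniLM-L6-v2`), so a chunk of 150 to 250 sentences is likely represented mainly by its opening text. One alternative, sketched here as an assumption rather than the notebook's method, is to average the already-computed sentence embeddings within each chunk and assign questions by cosine similarity to those averages. The helper names are hypothetical, and the small toy arrays stand in for `sentence_embeddings`, `boundaries` and `question_embeddings`.

```python
import numpy as np


def mean_chunk_embeddings(sentence_embeddings: np.ndarray, boundaries: list[int]) -> np.ndarray:
    """Average the sentence embeddings between consecutive boundaries, one row per chunk."""
    return np.stack([
        sentence_embeddings[start:end].mean(axis=0)
        for start, end in zip(boundaries[:-1], boundaries[1:])
    ])


def assign_to_chunks(question_embeddings: np.ndarray, chunk_embeddings: np.ndarray) -> list[list[int]]:
    """For each chunk, the indices of the questions whose most similar chunk it is."""
    q = question_embeddings / np.linalg.norm(question_embeddings, axis=1, keepdims=True)
    c = chunk_embeddings / np.linalg.norm(chunk_embeddings, axis=1, keepdims=True)
    best_chunk = np.argmax(q @ c.T, axis=1)  # cosine similarity matrix, shape (questions, chunks)
    batches = [[] for _ in range(len(chunk_embeddings))]
    for question_index, chunk_index in enumerate(best_chunk):
        batches[chunk_index].append(question_index)
    return batches


# Toy data: two "topics" pointing along different axes.
toy_sentences = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]])
toy_boundaries = [0, 2, 4]
toy_questions = np.array([[0.2, 1.0], [1.0, 0.0], [0.0, 0.5]])
chunks = mean_chunk_embeddings(toy_sentences, toy_boundaries)
print(assign_to_chunks(toy_questions, chunks))
```

In the notebook, the inputs would be `sentence_embeddings`, `boundaries` from Step 1 and `question_embeddings` from Step 2, and the returned indices map back into `questions`.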

Step 3: Make API calls using the chunks.
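Step 2 can route many questions to a single chunk (the first batch above already holds eight), and a large batch risks the `MAX_TOKENS` finish reason that the token-aware example handles by halving. A simple precaution, assumed here rather than taken from the notebook, is to cap each batch at a fixed size before making the calls; `cap_batches` and `max_questions_per_call` are hypothetical names.

```python
def cap_batches(chunks, question_batches, max_questions_per_call):
    """Yield (chunk, sub_batch) pairs with at most max_questions_per_call questions each."""
    for chunk, batch in zip(chunks, question_batches):
        # Empty batches produce no pairs, matching the "skip chunks without questions" check.
        for start in range(0, len(batch), max_questions_per_call):
            yield chunk, batch[start:start + max_questions_per_call]


toy_chunks = ["chunk A", "chunk B", "chunk C"]
toy_batches = [["q1", "q2", "q3", "q4", "q5"], [], ["q6"]]
for chunk, sub_batch in cap_batches(toy_chunks, toy_batches, max_questions_per_call=2):
    print(chunk, sub_batch)
```

In the cell below, `for chunk, sub_batch in cap_batches(content_chunks, question_batches, 5):` could replace the index-based loop, with `chunk` and `sub_batch` used in `contents`.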

In [8]:
system_prompt = """
    Answer each of the inputted questions using the information provided to you in the prompt. Each answer should be a **single** string in the JSON response.
    There should be **exactly** one answer for each inputted question, no more, no less.
    * **Accuracy and Precision:** Provide direct, factual answers. **Do not** create or merge any of the questions.
    * **Source Constraint:** Use *only* information explicitly present in the transcript. Do not infer, speculate, or bring in outside knowledge.
    * **Completeness:** Ensure each answer fully addresses the question, *to the extent possible with the given transcript*.
    * **Missing Information:** If the information required to answer a question is not discussed or cannot be directly derived from the transcript, respond with "N/A".
"""

answers = {}

for i in range(len(content_chunks)):
    # If there are no questions for the current chunk we don't need to bother querying.
    if len(question_batches[i]) == 0:
        continue

    # Making the API call to the Gemini model
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        config=types.GenerateContentConfig(
            system_instruction=system_prompt,
            response_mime_type = "application/json",
            response_schema = list[str]
        ),
        contents=[f'Content:\n{content_chunks[i]}', f'\nQuestion:\n{question_batches[i]}']
    )

    response_parsed = json.loads(response.text)

    for j in range(len(question_batches[i])):
        answers[question_batches[i][j]] = response_parsed[j]

print("First 10 Answers:")
for key in (list(answers.keys())[:10]):
    print (f"{key}\n\t{answers[key]}")


First 10 Answers:
What is the goal of MIT 6.00 (Introduction to Computer Science and Programming)?
	The goal of the course is to help everybody learn about computation, help students learn how to think like a computer scientist, and provide an understanding of the role computation can and cannot play in tackling technical problems.
Who teaches the course?
	Eric Grimson and John Guttag teach the course.
What kind of students is this course intended for?
	This course is primarily aimed at students who have little or no prior programming experience.
What are the strategic and tactical goals of the course?
	The strategic goals are to prepare freshmen and sophomores interested in majoring in course six to get an easy entry into the department, help students not majoring in course six feel justifiably confident in their ability to write and read small pieces of code, give all students an understanding of the role computation can and cannot play in tackling technical problems, and position al
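Step 3 indexes `response_parsed` by position, which assumes the model returned exactly one answer per question, as the system prompt requests. If fewer answers come back, that raises an `IndexError`; if more come back, the extras are silently ignored. A defensive sketch (the helper name `pair_answers` is hypothetical) pads missing answers with "N/A", the same placeholder the prompt asks for, and reports any mismatch:

```python
import json


def pair_answers(questions: list[str], response_text: str, placeholder: str = "N/A") -> dict[str, str]:
    """Map each question to its answer, padding with a placeholder if answers are missing."""
    parsed = json.loads(response_text)
    if len(parsed) != len(questions):
        print(f"Warning: expected {len(questions)} answers, got {len(parsed)}")
    # A negative repeat count yields an empty list, so extra answers are simply dropped by zip.
    padded = parsed + [placeholder] * (len(questions) - len(parsed))
    return dict(zip(questions, padded))


print(pair_answers(["Who teaches the course?", "What is computation?"], '["Eric Grimson and John Guttag."]'))
```

In the loop, `answers.update(pair_answers(question_batches[i], response.text))` would replace the index-based assignment.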