
perf: share few-shot preamble across prompts via PromptParts#447

Open
dan504512 wants to merge 6 commits into google:main from dan504512:perf/prompt-parts-shared-preamble

Conversation


@dan504512 dan504512 commented Apr 17, 2026

Fixes #446

Description

Share few-shot preamble across prompts via a new PromptParts dataclass,
reducing batch prompt memory from O(N × preamble_size) to O(1 × preamble_size + N × small_parts).

  • Add PromptParts(prefix, examples, suffix) frozen dataclass in prompting.py with __str__() for backward-compatible string conversion
  • QAPromptGenerator caches formatted examples in __post_init__ and exposes render_parts() returning PromptParts; render() reimplemented via str(render_parts(...))
  • PromptBuilder.build_prompt() and ContextAwarePromptBuilder.build_prompt() return PromptParts instead of str
  • _build_request() in gemini_batch.py emits 3 text parts in contents[0].parts when given PromptParts with non-empty examples, keeping the shared reference intact
  • Gemini single-prompt, OpenAI, and Ollama providers convert PromptParts to str at entry point (negligible memory impact since they process one at a time)
  • GCSBatchCache._compute_hash() resolves non-primitive values (e.g. PromptParts) to str transiently at hash time, producing hashes identical to the old string-based format. Only one temporary concatenated string exists at a time during sequential hash computation, so peak memory is unaffected. Cache compatibility is preserved — no cache misses on upgrade.
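The core of the change can be sketched as follows. This is a minimal illustration, not the code from the PR; in particular, the real `__str__()` in `prompting.py` may join the parts with separators this sketch omits:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptParts:
    """A rendered prompt split into three parts so that `examples`
    (the large few-shot preamble) can be shared by reference."""

    prefix: str    # description + optional context
    examples: str  # formatted few-shot examples, shared across prompts
    suffix: str    # question + answer prefix

    def __str__(self) -> str:
        # Backward-compatible: concatenating the parts reproduces the
        # old single-string render() output.
        return self.prefix + self.examples + self.suffix
```

Because the dataclass is frozen, downstream code cannot accidentally mutate the shared `examples` string out from under the other prompts in a batch.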

Expected impact at batch_length=10000 with ~300 KB examples:
10,000 × 640 KB = 6.4 GB → 1 × 640 KB + 10,000 × ~1 KB = ~10 MB (640× reduction)

Memory benchmarks

Tested with 1000 prompts, ~154 KB examples per prompt:

| Version | Batch prompts | key_data_list | Cache compat |
| --- | --- | --- | --- |
| main (str only) | 150.2 MB | 0.4 MB | baseline |
| This PR | 0.2 MB | 0.4 MB | hashes match ✓ |
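A rough way to reproduce the shape of these numbers with the standard library (a hypothetical harness, not the benchmark script used for this PR):

```python
import tracemalloc
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptParts:
    prefix: str
    examples: str
    suffix: str

    def __str__(self) -> str:
        return self.prefix + self.examples + self.suffix


examples = "Q: ...\nA: ...\n" * 10_000  # ~140 KB shared preamble
N = 200

# Structured prompts: N small objects referencing one examples string.
tracemalloc.start()
shared = [PromptParts(f"prefix {i}\n", examples, f"\nQ{i}:") for i in range(N)]
_, shared_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Flattened prompts: N full concatenated copies.
tracemalloc.start()
flat = [str(PromptParts(f"prefix {i}\n", examples, f"\nQ{i}:")) for i in range(N)]
_, flat_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"shared: {shared_peak / 1e6:.1f} MB, flattened: {flat_peak / 1e6:.1f} MB")
```

The flattened variant allocates roughly N × preamble_size, while the structured variant stays at roughly one preamble plus N small objects.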

How Has This Been Tested?

  • All existing tests updated and passing (433+ pass across py3.10/3.11/3.12)
  • New test_build_prompt_shares_examples_reference and test_context_aware_shares_examples_reference verify the memory-sharing invariant via assertIs (all prompts from the same generator share a single examples string object)
  • Verified str(PromptParts) produces byte-identical output to old render() for all 4 cases (examples/no-examples × context/no-context)
  • Verified _compute_hash produces identical hashes for PromptParts vs plain str prompts, confirming cache key stability
  • All 3 import-linter contracts kept
  • Format check passes (pyink + isort)
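The memory-sharing invariant those tests assert can be sketched like this, with a simplified stand-in for the real builder classes:

```python
import unittest
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptParts:
    prefix: str
    examples: str
    suffix: str


class PromptBuilder:
    """Minimal stand-in: formats examples once, shares them by reference."""

    def __init__(self, examples_text: str):
        self._examples_text = examples_text

    def build_prompt(self, question: str) -> PromptParts:
        return PromptParts("Answer:\n", self._examples_text, f"\nQ: {question}\nA:")


class SharesExamplesTest(unittest.TestCase):
    def test_build_prompt_shares_examples_reference(self):
        builder = PromptBuilder("Q: 1+1?\nA: 2\n" * 100)
        prompts = [builder.build_prompt(f"q{i}") for i in range(5)]
        for p in prompts[1:]:
            # assertIs checks object identity, not equality: every prompt
            # must hold the *same* examples string object.
            self.assertIs(p.examples, prompts[0].examples)
```

`assertIs` (rather than `assertEqual`) is what makes this a memory test: equal-but-distinct copies would pass an equality check while still duplicating the preamble.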

Checklist:

  • I have read and acknowledged Google's Open Source Code of Conduct.
  • I have read the Contributing page, and I either signed the Google Individual CLA or am covered by my company's Corporate CLA.
  • I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
  • I have made any needed documentation changes, or noted in the linked issue(s) that documentation elsewhere needs updating.
  • I have added tests, or I have ensured existing tests cover the changes.
  • I have followed Google's Python Style Guide and ran pylint over the affected code.

@github-actions github-actions bot added the size/M (Pull request with 150-600 lines changed) label on Apr 17, 2026
@github-actions

⚠️ Branch Update Required

Your branch is 1 commit behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

…ompts

Add a frozen PromptParts dataclass that splits rendered prompts into
prefix (description + context), examples (large, shared by reference),
and suffix (question + answer prefix).  QAPromptGenerator caches the
formatted examples text in __post_init__ and exposes render_parts()
which returns a PromptParts whose examples field is always the same
string object.  render() is reimplemented as str(render_parts(...)).

PromptBuilder.build_prompt() and ContextAwarePromptBuilder.build_prompt()
now return PromptParts instead of str, so downstream consumers receive
structured prompts that share the large examples allocation.

Closes google#446
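The generator side of the commit above can be sketched as follows (an illustrative reduction of `QAPromptGenerator`; the real formatting template may differ):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptParts:
    prefix: str
    examples: str
    suffix: str

    def __str__(self) -> str:
        return self.prefix + self.examples + self.suffix


@dataclass
class QAPromptGenerator:
    description: str
    examples: tuple  # (question, answer) pairs

    def __post_init__(self):
        # Format the few-shot examples exactly once; every subsequent
        # render_parts() call reuses this same string object.
        self._examples_text = "".join(
            f"Q: {q}\nA: {a}\n" for q, a in self.examples
        )

    def render_parts(self, question: str) -> PromptParts:
        return PromptParts(
            prefix=f"{self.description}\n",
            examples=self._examples_text,  # shared by reference
            suffix=f"Q: {question}\nA:",
        )

    def render(self, question: str) -> str:
        # Old string API, reimplemented on top of render_parts().
        return str(self.render_parts(question))
```

Caching in `__post_init__` rather than in `render_parts()` is what guarantees the identity invariant: the expensive formatting runs once per generator, not once per prompt.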
When _build_request receives a PromptParts with non-empty examples,
emit three text parts in contents[0].parts instead of one concatenated
string.  The middle part holds the shared examples reference, so
10,000 requests share one ~300 KB string instead of duplicating it
per request.

Gemini single-prompt, OpenAI, and Ollama providers convert PromptParts
to str at their entry points; since they process prompts one at a time
(or in small thread pools), the temporary string has negligible memory
impact.
Update PromptBuilder, ContextAwarePromptBuilder, Annotator, and
extract() tests to work with PromptParts instead of plain strings.
Add test_build_prompt_shares_examples_reference and
test_context_aware_shares_examples_reference to verify the memory-
sharing invariant (all prompts from the same generator share a single
examples string object via `assertIs`).
Convert PromptParts to str before inserting into cache key_data dicts
so that the SHA256 hash matches the old string-based format.  This
avoids a full cache miss on upgrade.  The str() call creates one
temporary string per prompt, processed sequentially, so peak memory
is unchanged.
@dan504512 dan504512 force-pushed the perf/prompt-parts-shared-preamble branch from b9c1238 to 221917c on April 18, 2026 at 16:11
…ptParts"

The str(prompt) conversion in key_data dicts negates the memory optimization
from PromptParts by materializing 10,000 × ~640 KB concatenated strings in
key_data_list (6.4 GB). Without it, PromptParts serializes via
dataclasses.asdict in _json_default, keeping the shared examples reference
intact. Cache keys will differ from pre-PromptParts entries, but those
expire via GCS lifecycle (retention_days) anyway.

This reverts commit b9c1238.
…hash

Convert non-primitive values (e.g. PromptParts) to str inside
_compute_hash rather than at key_data construction time.  This keeps
PromptParts references in key_data_list (shared examples, ~0.4 MB)
while producing hashes identical to the old string-based format
(cache compat preserved).  Only one transient str copy exists at a
time during sequential hash computation.

Replaces the reverted str(prompt) approach which materialized all
prompts upfront in key_data_list, negating the PromptParts memory
optimization (10,000 × 640 KB = 6.4 GB).
@github-actions

⚠️ Branch Update Required

Your branch is 6 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.



Development

Successfully merging this pull request may close these issues.

Performance: prompt building duplicates full few-shot preamble per prompt, causing O(batch_length × preamble_size) peak memory
