perf: share few-shot preamble across prompts via PromptParts#447
Open
dan504512 wants to merge 6 commits intogoogle:mainfrom
Open
perf: share few-shot preamble across prompts via PromptParts#447dan504512 wants to merge 6 commits intogoogle:mainfrom
dan504512 wants to merge 6 commits intogoogle:mainfrom
Conversation
|
Your branch is 1 commits behind git fetch origin main
git merge origin/main
git pushNote: Enable "Allow edits by maintainers" to allow automatic updates. |
…ompts Add a frozen PromptParts dataclass that splits rendered prompts into prefix (description + context), examples (large, shared by reference), and suffix (question + answer prefix). QAPromptGenerator caches the formatted examples text in __post_init__ and exposes render_parts() which returns a PromptParts whose examples field is always the same string object. render() is reimplemented as str(render_parts(...)). PromptBuilder.build_prompt() and ContextAwarePromptBuilder.build_prompt() now return PromptParts instead of str, so downstream consumers receive structured prompts that share the large examples allocation. Closes google#446
When _build_request receives a PromptParts with non-empty examples, emit three text parts in contents[0].parts instead of one concatenated string. The middle part holds the shared examples reference, so 10,000 requests share one ~300 KB string instead of duplicating it per request. Gemini single-prompt, OpenAI, and Ollama providers convert PromptParts to str at their entry points; since they process prompts one at a time (or in small thread pools), the temporary string has negligible memory impact.
Update PromptBuilder, ContextAwarePromptBuilder, Annotator, and extract() tests to work with PromptParts instead of plain strings. Add test_build_prompt_shares_examples_reference and test_context_aware_shares_examples_reference to verify the memory- sharing invariant (all prompts from the same generator share a single examples string object via `assertIs`).
Convert PromptParts to str before inserting into cache key_data dicts so that the SHA256 hash matches the old string-based format. This avoids a full cache miss on upgrade. The str() call creates one temporary string per prompt, processed sequentially, so peak memory is unchanged.
b9c1238 to
221917c
Compare
…ptParts" The str(prompt) conversion in key_data dicts negates the memory optimization from PromptParts by materializing 10,000 × ~640 KB concatenated strings in key_data_list (6.4 GB). Without it, PromptParts serializes via dataclasses.asdict in _json_default, keeping the shared examples reference intact. Cache keys will differ from pre-PromptParts entries, but those expire via GCS lifecycle (retention_days) anyway. This reverts commit b9c1238.
…hash Convert non-primitive values (e.g. PromptParts) to str inside _compute_hash rather than at key_data construction time. This keeps PromptParts references in key_data_list (shared examples, ~0.4 MB) while producing hashes identical to the old string-based format (cache compat preserved). Only one transient str copy exists at a time during sequential hash computation. Replaces the reverted str(prompt) approach which materialized all prompts upfront in key_data_list, negating the PromptParts memory optimization (10,000 × 640 KB = 6.4 GB).
|
Your branch is 6 commits behind git fetch origin main
git merge origin/main
git pushNote: Enable "Allow edits by maintainers" to allow automatic updates. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #446
Description
Share few-shot preamble across prompts via a new
PromptPartsdataclass,reducing batch prompt memory from O(N × preamble_size) to O(1 × preamble_size + N × small_parts).
PromptParts(prefix, examples, suffix)frozen dataclass inprompting.pywith__str__()for backward-compatible string conversionQAPromptGeneratorcaches formatted examples in__post_init__and exposesrender_parts()returningPromptParts;render()reimplemented viastr(render_parts(...))PromptBuilder.build_prompt()andContextAwarePromptBuilder.build_prompt()returnPromptPartsinstead ofstr_build_request()ingemini_batch.pyemits 3 text parts incontents[0].partswhen givenPromptPartswith non-empty examples, keeping the shared reference intactPromptPartstostrat entry point (negligible memory impact since they process one at a time)GCSBatchCache._compute_hash()resolves non-primitive values (e.g.PromptParts) tostrtransiently at hash time, producing hashes identical to the old string-based format. Only one temporary concatenated string exists at a time during sequential hash computation, so peak memory is unaffected. Cache compatibility is preserved — no cache misses on upgrade.Expected impact at
batch_length=10000with ~300 KB examples:10,000 × 640 KB = 6.4 GB → 1 × 640 KB + 10,000 × ~1 KB = ~10 MB (640× reduction)
Memory benchmarks
Tested with 1000 prompts, ~154 KB examples per prompt:
main(str only)How Has This Been Tested?
test_build_prompt_shares_examples_referenceandtest_context_aware_shares_examples_referenceverify the memory-sharing invariant viaassertIs(all prompts from the same generator share a single examples string object)str(PromptParts)produces byte-identical output to oldrender()for all 4 cases (examples/no-examples × context/no-context)_compute_hashproduces identical hashes forPromptPartsvs plainstrprompts, confirming cache key stabilityChecklist:
pylintover the affected code.