
perf: share few-shot preamble across prompts via PromptParts#447

Open
dan504512 wants to merge 6 commits into google:main from dan504512:perf/prompt-parts-shared-preamble

Conversation


@dan504512 dan504512 commented Apr 17, 2026

Fixes #446

Description

Share few-shot preamble across prompts via a new PromptParts dataclass,
reducing batch prompt memory from O(N × preamble_size) to O(1 × preamble_size + N × small_parts).

  • Add PromptParts(prefix, examples, suffix) frozen dataclass in prompting.py with __str__() for backward-compatible string conversion
  • QAPromptGenerator caches formatted examples in __post_init__ and exposes render_parts() returning PromptParts; render() reimplemented via str(render_parts(...))
  • PromptBuilder.build_prompt() and ContextAwarePromptBuilder.build_prompt() return PromptParts instead of str
  • _build_request() in gemini_batch.py emits 3 text parts in contents[0].parts when given PromptParts with non-empty examples, keeping the shared reference intact
  • Gemini single-prompt, OpenAI, and Ollama providers convert PromptParts to str at entry point (negligible memory impact since they process one at a time)
  • GCSBatchCache._compute_hash() resolves non-primitive values (e.g. PromptParts) to str transiently at hash time, producing hashes identical to the old string-based format. Only one temporary concatenated string exists at a time during sequential hash computation, so peak memory is unaffected. Cache compatibility is preserved — no cache misses on upgrade.
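The core of the change can be sketched as follows. This is a minimal illustration, not the code from the PR; in particular, the real `__str__()` in `prompting.py` may join the parts with separators this sketch omits:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptParts:
    """A rendered prompt split into three parts so that `examples`
    (the large few-shot preamble) can be shared by reference."""

    prefix: str    # description + optional context
    examples: str  # formatted few-shot examples, shared across prompts
    suffix: str    # question + answer prefix

    def __str__(self) -> str:
        # Backward-compatible: concatenating the parts reproduces the
        # old single-string render() output.
        return self.prefix + self.examples + self.suffix
```

Because the dataclass is frozen, downstream code cannot accidentally mutate the shared `examples` string out from under the other prompts in a batch.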

Expected impact at batch_length=10000 with ~300 KB examples:
10,000 × 640 KB = 6.4 GB → 1 × 640 KB + 10,000 × ~1 KB = ~10 MB (640× reduction)

Memory benchmarks

Tested with 1000 prompts, ~154 KB examples per prompt:

| Version | Batch prompts | key_data_list | Cache compat |
| --- | --- | --- | --- |
| main (str only) | 150.2 MB | 0.4 MB | baseline |
| This PR | 0.2 MB | 0.4 MB | hashes match ✓ |
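A rough way to reproduce the shape of these numbers with the standard library (a hypothetical harness, not the benchmark script used for this PR):

```python
import tracemalloc
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptParts:
    prefix: str
    examples: str
    suffix: str

    def __str__(self) -> str:
        return self.prefix + self.examples + self.suffix


examples = "Q: ...\nA: ...\n" * 10_000  # ~140 KB shared preamble
N = 200

# Structured prompts: N small objects referencing one examples string.
tracemalloc.start()
shared = [PromptParts(f"prefix {i}\n", examples, f"\nQ{i}:") for i in range(N)]
_, shared_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Flattened prompts: N full concatenated copies.
tracemalloc.start()
flat = [str(PromptParts(f"prefix {i}\n", examples, f"\nQ{i}:")) for i in range(N)]
_, flat_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"shared: {shared_peak / 1e6:.1f} MB, flattened: {flat_peak / 1e6:.1f} MB")
```

The flattened variant allocates roughly N × preamble_size, while the structured variant stays at roughly one preamble plus N small objects.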

How Has This Been Tested?

  • All existing tests updated and passing (433+ pass across py3.10/3.11/3.12)
  • New test_build_prompt_shares_examples_reference and test_context_aware_shares_examples_reference verify the memory-sharing invariant via assertIs (all prompts from the same generator share a single examples string object)
  • Verified str(PromptParts) produces byte-identical output to old render() for all 4 cases (examples/no-examples × context/no-context)
  • Verified _compute_hash produces identical hashes for PromptParts vs plain str prompts, confirming cache key stability
  • All 3 import-linter contracts kept
  • Format check passes (pyink + isort)
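The memory-sharing invariant those tests assert can be sketched like this, with a simplified stand-in for the real builder classes:

```python
import unittest
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptParts:
    prefix: str
    examples: str
    suffix: str


class PromptBuilder:
    """Minimal stand-in: formats examples once, shares them by reference."""

    def __init__(self, examples_text: str):
        self._examples_text = examples_text

    def build_prompt(self, question: str) -> PromptParts:
        return PromptParts("Answer:\n", self._examples_text, f"\nQ: {question}\nA:")


class SharesExamplesTest(unittest.TestCase):
    def test_build_prompt_shares_examples_reference(self):
        builder = PromptBuilder("Q: 1+1?\nA: 2\n" * 100)
        prompts = [builder.build_prompt(f"q{i}") for i in range(5)]
        for p in prompts[1:]:
            # assertIs checks object identity, not equality: every prompt
            # must hold the *same* examples string object.
            self.assertIs(p.examples, prompts[0].examples)
```

`assertIs` (rather than `assertEqual`) is what makes this a memory test: equal-but-distinct copies would pass an equality check while still duplicating the preamble.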

Checklist:

  • I have read and acknowledged Google's Open Source Code of Conduct.
  • I have read the Contributing page, and I either signed the Google Individual CLA or am covered by my company's Corporate CLA.
  • I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
  • I have made any needed documentation changes, or noted in the linked issue(s) that documentation elsewhere needs updating.
  • I have added tests, or I have ensured existing tests cover the changes.
  • I have followed Google's Python Style Guide and ran pylint over the affected code.

@github-actions github-actions bot added the size/M (Pull request with 150-600 lines changed) label on Apr 17, 2026
@github-actions

⚠️ Branch Update Required

Your branch is 1 commit behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

…ompts

Add a frozen PromptParts dataclass that splits rendered prompts into
prefix (description + context), examples (large, shared by reference),
and suffix (question + answer prefix).  QAPromptGenerator caches the
formatted examples text in __post_init__ and exposes render_parts()
which returns a PromptParts whose examples field is always the same
string object.  render() is reimplemented as str(render_parts(...)).

PromptBuilder.build_prompt() and ContextAwarePromptBuilder.build_prompt()
now return PromptParts instead of str, so downstream consumers receive
structured prompts that share the large examples allocation.

Closes google#446
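The generator side of the commit above can be sketched as follows (an illustrative reduction of `QAPromptGenerator`; the real formatting template may differ):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptParts:
    prefix: str
    examples: str
    suffix: str

    def __str__(self) -> str:
        return self.prefix + self.examples + self.suffix


@dataclass
class QAPromptGenerator:
    description: str
    examples: tuple  # (question, answer) pairs

    def __post_init__(self):
        # Format the few-shot examples exactly once; every subsequent
        # render_parts() call reuses this same string object.
        self._examples_text = "".join(
            f"Q: {q}\nA: {a}\n" for q, a in self.examples
        )

    def render_parts(self, question: str) -> PromptParts:
        return PromptParts(
            prefix=f"{self.description}\n",
            examples=self._examples_text,  # shared by reference
            suffix=f"Q: {question}\nA:",
        )

    def render(self, question: str) -> str:
        # Old string API, reimplemented on top of render_parts().
        return str(self.render_parts(question))
```

Caching in `__post_init__` rather than in `render_parts()` is what guarantees the identity invariant: the expensive formatting runs once per generator, not once per prompt.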
When _build_request receives a PromptParts with non-empty examples,
emit three text parts in contents[0].parts instead of one concatenated
string.  The middle part holds the shared examples reference, so
10,000 requests share one ~300 KB string instead of duplicating it
per request.

Gemini single-prompt, OpenAI, and Ollama providers convert PromptParts
to str at their entry points; since they process prompts one at a time
(or in small thread pools), the temporary string has negligible memory
impact.
Update PromptBuilder, ContextAwarePromptBuilder, Annotator, and
extract() tests to work with PromptParts instead of plain strings.
Add test_build_prompt_shares_examples_reference and
test_context_aware_shares_examples_reference to verify the memory-
sharing invariant (all prompts from the same generator share a single
examples string object via `assertIs`).
Convert PromptParts to str before inserting into cache key_data dicts
so that the SHA256 hash matches the old string-based format.  This
avoids a full cache miss on upgrade.  The str() call creates one
temporary string per prompt, processed sequentially, so peak memory
is unchanged.
@dan504512 dan504512 force-pushed the perf/prompt-parts-shared-preamble branch from b9c1238 to 221917c on April 18, 2026 at 16:11
…ptParts"

The str(prompt) conversion in key_data dicts negates the memory optimization
from PromptParts by materializing 10,000 × ~640 KB concatenated strings in
key_data_list (6.4 GB). Without it, PromptParts serializes via
dataclasses.asdict in _json_default, keeping the shared examples reference
intact. Cache keys will differ from pre-PromptParts entries, but those
expire via GCS lifecycle (retention_days) anyway.

This reverts commit b9c1238.
…hash

Convert non-primitive values (e.g. PromptParts) to str inside
_compute_hash rather than at key_data construction time.  This keeps
PromptParts references in key_data_list (shared examples, ~0.4 MB)
while producing hashes identical to the old string-based format
(cache compat preserved).  Only one transient str copy exists at a
time during sequential hash computation.

Replaces the reverted str(prompt) approach which materialized all
prompts upfront in key_data_list, negating the PromptParts memory
optimization (10,000 × 640 KB = 6.4 GB).
@github-actions

⚠️ Branch Update Required

Your branch is 6 commits behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.



Development

Successfully merging this pull request may close these issues.

Performance: prompt building duplicates full few-shot preamble per prompt, causing O(batch_length × preamble_size) peak memory
