FEAT implement batching for memory interface #1325
Open
maifeeulasad wants to merge 8 commits into microsoft:main from
Conversation
romanlutz
reviewed
Jan 25, 2026
Contributor
Pull request overview
This PR aims to address scaling limits in the PyRIT `MemoryInterface` by introducing batched filtering logic for large `IN (...)` queries, along with unit tests intended to validate behavior when many IDs/values are passed.

Changes:
- Added `_SQLITE_MAX_BIND_VARS` and a `_batched_in_condition` helper in `MemoryInterface` and applied it to several query filters.
- Updated multiple retrieval methods (e.g., scores, message pieces, attack results, scenario results) to use the new helper for `IN` filtering.
- Added a new unit test module to exercise batching behavior and large-input queries against the SQLite-backed memory instance.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `pyrit/memory/memory_interface.py` | Introduces a helper intended to avoid bind-variable limits and applies it across several query paths. |
| `tests/unit/memory/memory_interface/test_batching_scale.py` | Adds tests intended to validate that large ID/value inputs don't break memory queries. |
Author
@romanlutz eagerly waiting for your feedback!
Contributor
I apologize for taking ages. There were A LOT of PRs to work through.
Applies PR microsoft#1325 changes onto latest main with conflict resolution:
- Add _SQLITE_MAX_BIND_VARS constant and _execute_batched_query helper
- Add batching to get_scores, get_message_pieces, get_attack_results, get_scenario_results
- Preserve main's dedup-by-conversation-id in get_attack_results (all paths)
- Preserve main's identifier_filters, attack_class, converter_classes params
- Fix test helper: use constructor role param instead of mutating _role post-creation
- Extract _dedup_attack_entries helper to avoid mypy no-redef errors

Co-authored-by: maifeeulasad <maifeeulasad@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
683acd7 to 3420468
- Move _SQLITE_MAX_BIND_VARS to class attribute _MAX_BIND_VARS on MemoryInterface
- Override _MAX_BIND_VARS to 2000 in AzureSQLMemory (Azure SQL supports 2100)
- Add early return for score_ids=[] in get_scores (consistency with other methods)
- Add tests for get_attack_results batching (many IDs, many sha256, empty list)
- Add test for get_scores empty list handling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix dedup bug in _execute_batched_query: str(None) produced truthy 'None' string, making the no-id fallback dead code. Use explicit None check instead.
- Simplify get_scenario_results: remove duplicated elif/else branches, let _execute_batched_query own threshold logic for both scenario_result_ids and all_conversation_ids.
- Fix Optional -> pipe syntax in _execute_batched_query signature.
- Document bind-variable headroom limitation in docstring.
- Fix conditional test assertion that could pass vacuously.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
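The `str(None)` pitfall fixed above can be reproduced in isolation. This is an illustrative sketch, not the PR's actual dedup code; the `key_buggy`/`key_fixed` helpers and the `NoId` class are hypothetical stand-ins showing why the no-id fallback was dead code:

```python
def key_buggy(entry):
    # Buggy: str(None) == "None", which is a truthy string, so the
    # `or` fallback for entries without an id can never be reached.
    return str(getattr(entry, "id", None)) or id(entry)


def key_fixed(entry):
    # Fixed: check for None explicitly before stringifying, so
    # id-less entries fall back to a per-object unique key.
    entry_id = getattr(entry, "id", None)
    return str(entry_id) if entry_id is not None else id(entry)


class NoId:
    pass


e = NoId()
print(key_buggy(e))  # "None" -- every id-less entry collides on this key
print(key_fixed(e))  # unique per object, so no false dedup
```

With the buggy key, all rows lacking an id collapse into one dedup bucket; the explicit `is not None` check restores the intended fallback.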
- Extract shared list_params batching logic from get_message_pieces and get_attack_results into _query_with_list_params helper (~40 duplicated lines removed from each caller).
- Add batch_size parameter to _execute_batched_query so callers can reduce the batch size when other conditions consume bind variables.
- _query_with_list_params computes effective_batch_size by subtracting small_param bind count from _MAX_BIND_VARS, preventing cumulative overflow of the per-statement bind variable limit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Test that batch size is reduced when small IN-clause params consume bind variables (effective_batch_size = _MAX_BIND_VARS - small_binds). Verifies that the correct number of batched queries is issued.
- Test that _execute_batched_query respects the batch_size override parameter, using batch_size=30 to verify 4 queries for 100 items.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
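The query counts asserted by these tests follow from simple ceiling division. A minimal sketch of the arithmetic, using illustrative names (`effective_batch_size`, `num_batches` are not the PR's real helpers, and 500 is the stated class default):

```python
import math

MAX_BIND_VARS = 500  # default _MAX_BIND_VARS from the PR description


def effective_batch_size(small_binds: int) -> int:
    # Bind variables consumed by small IN-clause params in the same
    # statement reduce the room left for the batched large param.
    return MAX_BIND_VARS - small_binds


def num_batches(n_items: int, batch_size: int) -> int:
    # One query per chunk of at most batch_size items.
    return math.ceil(n_items / batch_size)


print(effective_batch_size(20))  # 480
print(num_batches(100, 30))      # 4 queries for 100 items
```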
Description
Closes #845
When `MemoryInterface` query methods (`get_scores`, `get_message_pieces`, `get_attack_results`, `get_scenario_results`) receive large IN-clause parameter lists (e.g., 1000+ prompt IDs), the generated SQL exceeds SQLite's per-statement bind variable limit of 999, causing the query to fail. This PR adds transparent batching: large IN-clause lists are split into chunks, executed as separate queries, and the results are merged and deduplicated.

What changed
Core batching infrastructure (`memory_interface.py`):
- `_MAX_BIND_VARS` class attribute on `MemoryInterface` (default: 500). Subclasses can override; `AzureSQLMemory` sets it to 2000 since Azure SQL supports 2100 params.
- `_execute_batched_query`: low-level helper that splits a single IN-clause into batches, runs separate queries, and deduplicates results by primary key. Accepts an optional `batch_size` override for callers that need a smaller batch.
- `_query_with_list_params`: higher-level helper used by `get_message_pieces` and `get_attack_results`. Classifies list parameters as "small" (fit in one query) or "large" (need batching), adds small params to SQL conditions, batches on the first large param, and filters remaining large params in Python. Computes an effective batch size that accounts for bind variables consumed by small params, preventing cumulative overflow.
- `_dedup_attack_entries`: static helper extracted from `get_attack_results` to deduplicate `AttackResultEntry` rows by `conversation_id` (legacy data integrity workaround). Applied consistently across all code paths (batched and non-batched).

Method changes:
- `get_scores`: batches `score_ids` via `_execute_batched_query`. Adds early return for `score_ids=[]` for consistency with other methods.
- `get_message_pieces`: routes list params (`prompt_ids`, `original_values`, `converted_values`, `converted_value_sha256`) through `_query_with_list_params`.
- `get_attack_results`: routes `attack_result_ids` and `objective_sha256` through `_query_with_list_params`. Preserves dedup-by-conversation-id on all paths.
- `get_scenario_results`: batches `scenario_result_ids` and inner `all_conversation_ids` via `_execute_batched_query`.

`azure_sql_memory.py`:
- `_MAX_BIND_VARS = 2000` so Azure SQL uses larger batches (its limit is 2100).

Why this approach
An alternative would be a single statement combining `OR(col.IN(batch1), col.IN(batch2), ...)`, but this still binds all parameters in one statement and doesn't solve the limit.
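The chosen approach (separate statements per batch, merged and deduplicated in Python) can be sketched against plain `sqlite3`. The table name, columns, constant, and `fetch_by_ids` helper below are illustrative stand-ins, not PyRIT's actual internals:

```python
import sqlite3

MAX_BIND_VARS = 500  # conservative; SQLite's historical default limit is 999


def fetch_by_ids(conn, ids):
    # Split one large IN (...) query into several smaller statements,
    # then merge the results and deduplicate by primary key.
    seen, results = set(), []
    for i in range(0, len(ids), MAX_BIND_VARS):
        batch = ids[i : i + MAX_BIND_VARS]
        placeholders = ",".join("?" * len(batch))
        rows = conn.execute(
            f"SELECT id, value FROM items WHERE id IN ({placeholders})", batch
        ).fetchall()
        for row in rows:
            if row[0] not in seen:
                seen.add(row[0])
                results.append(row)
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)", [(i, f"v{i}") for i in range(2000)]
)
rows = fetch_by_ids(conn, list(range(1500)))  # 1500 binds would overflow unbatched
print(len(rows))  # 1500
```

Each statement stays under the bind-variable cap regardless of input size, at the cost of a few extra round trips; the dedup-by-key pass makes overlapping batches harmless.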
New test file: `tests/unit/memory/memory_interface/test_batching_scale.py` (586 lines, 25 tests)

Covers:
- `get_message_pieces` with many prompt_ids, original_values, sha256 hashes (single and multiple large params)
- `get_scores` with many score_ids and empty list edge case
- `get_attack_results` with many IDs, many objective_sha256, and empty list edge case
- `_execute_batched_query` with small lists (single query), large lists (multiple queries), exact batch boundary, custom `batch_size` override
- `_query_entries` spy

All 220 `tests/unit/memory/memory_interface/` tests pass. Pre-commit hooks (ruff, mypy strict) clean.
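The spy-based tests can be sketched roughly as follows. `FakeMemory`, `get_items`, and the simplified `_query_entries` signature are hypothetical stand-ins for the real classes; the pattern is wrapping the low-level query method with `unittest.mock`, calling the public API with more IDs than one batch holds, and asserting on the number of underlying queries:

```python
from unittest.mock import patch


class FakeMemory:
    """Illustrative stand-in for a batching memory interface."""

    _MAX_BIND_VARS = 500

    def _query_entries(self, ids):
        # Stand-in for the real DB query; returns one entry per id.
        return [{"id": i} for i in ids]

    def get_items(self, ids):
        # Public API: batches the id list before querying.
        out = []
        for i in range(0, len(ids), self._MAX_BIND_VARS):
            out.extend(self._query_entries(ids[i : i + self._MAX_BIND_VARS]))
        return out


memory = FakeMemory()
with patch.object(memory, "_query_entries", wraps=memory._query_entries) as spy:
    results = memory.get_items(list(range(1200)))

assert spy.call_count == 3  # ceil(1200 / 500) batched queries
assert len(results) == 1200
```

Using `wraps=` keeps the real behavior intact while recording call counts, so the test verifies batching without mocking out the query results themselves.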