Avoid N+1 queries in PrimaryKeyRelatedField(many=True) validation #9984
adelkhayata76 wants to merge 4 commits into
ManyRelatedField.to_internal_value resolved each related object with a separate child_relation.to_internal_value() call, so validating a list of N primary keys issued N SELECT queries (encode#9607). Add an opt-in to_internal_value_bulk() hook on RelatedField (defaulting to the existing per-item loop, so SlugRelatedField, HyperlinkedRelatedField and custom relations are unchanged) and override it on PrimaryKeyRelatedField to resolve every pk with a single in_bulk() query. Per-item error semantics (incorrect_type / does_not_exist), input ordering, duplicate handling, the queryset filter and pk_field transforms are all preserved; a type the backend cannot compare falls back to the per-item path so the offending item still raises the same error.
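The difference between the per-item loop and the batched lookup can be sketched without a database. This is a minimal illustration, not the PR's code: `FakeQuerySet`, `resolve_per_item`, and `resolve_bulk` are hypothetical names, and the fake only mimics the shape of `get()` and `in_bulk()` (a dict keyed by pk) while counting simulated queries.

```python
class FakeQuerySet:
    """Hypothetical stand-in for a Django queryset that counts lookups."""

    def __init__(self, rows):
        self.rows = dict(rows)  # pk -> object
        self.query_count = 0

    def get(self, pk):
        # One simulated SELECT per call.
        self.query_count += 1
        return self.rows[pk]

    def in_bulk(self, pks):
        # One simulated SELECT for the whole batch; result is keyed by pk.
        self.query_count += 1
        return {pk: self.rows[pk] for pk in pks if pk in self.rows}


def resolve_per_item(queryset, pks):
    # N lookups for N primary keys.
    return [queryset.get(pk) for pk in pks]


def resolve_bulk(queryset, pks):
    found = queryset.in_bulk(pks)
    for pk in pks:
        if pk not in found:
            # Report the first missing pk in input order.
            raise LookupError(f"does_not_exist: {pk}")
    # Rebuild in input order so ordering and duplicates are preserved.
    return [found[pk] for pk in pks]
```

Rebuilding the result from the input list, rather than iterating over the returned dict, is what keeps ordering and duplicates identical to the per-item path.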
Handle string primary keys (e.g. from HTML form input): in_bulk() keys its result by the database pk type, so a string "1" must be coerced via the pk field's get_prep_value() before the membership check, exactly as queryset.get(pk=...) does. Without this, string pks raised a spurious does_not_exist error. Also make the regression tests rely on the pks actually created in setUp rather than hard-coding 1..5, which is not guaranteed across backends/test ordering, and add an explicit string-pk test.
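The string-pk problem comes down to dictionary key types, which a plain-Python sketch can show. Here `coerce_pk` is a hypothetical stand-in for the pk field's `get_prep_value()` on an integer primary key, and `found` only imitates the shape of an `in_bulk()` result; neither is the PR's actual code.

```python
def coerce_pk(value):
    """Hypothetical stand-in for get_prep_value() on an integer pk."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so reject it explicitly.
        raise TypeError("bool is not a valid pk")
    return int(value)


# Shaped like an in_bulk() result for integer pks: {pk: object}.
found = {1: "obj-1", 2: "obj-2"}

raw = ["1", 2]  # "1" might arrive as a string from HTML form input

# Without coercion the string key misses, which would surface as a
# spurious does_not_exist error.
naive_hits = [pk in found for pk in raw]

# After coercion both values match the dict's integer keys.
coerced = [coerce_pk(pk) for pk in raw]
coerced_hits = [pk in found for pk in coerced]
```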
Pull request overview
This PR introduces an opt-in bulk validation hook for relational fields to eliminate N+1 queries during PrimaryKeyRelatedField(many=True) validation, resolving related instances via a single in_bulk() query and adding regression coverage for query count and semantic parity.
Changes:
- Added `RelatedField.to_internal_value_bulk()` as a bulk-conversion hook used by `ManyRelatedField`.
- Implemented a batched `to_internal_value_bulk()` on `PrimaryKeyRelatedField` using `queryset.in_bulk(...)`.
- Added regression tests to ensure `PrimaryKeyRelatedField(many=True)` validates with one query and preserves ordering/duplicates/errors/`pk_field` behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `rest_framework/relations.py` | Adds the bulk validation hook and switches `ManyRelatedField` to use it; implements PK bulk resolution via `in_bulk()`. |
| `tests/test_relations_pk.py` | Adds regression tests asserting single-query validation and parity behaviors for PK many validation. |
```diff
-            self.child_relation.to_internal_value(item)
-            for item in data
-        ]
+        return self.child_relation.to_internal_value_bulk(data)
```
```python
        pks = []
        for item in data:
            value = item
            if self.pk_field is not None:
                value = self.pk_field.to_internal_value(value)
            try:
                if isinstance(value, bool):
                    raise TypeError
                # Coerce to the pk's Python type (e.g. "1" -> 1) so the lookup
                # below matches the keys returned by `in_bulk()`, exactly as
                # `queryset.get(pk=value)` would have.
                value = model_pk.get_prep_value(value)
            except (TypeError, ValueError):
                self.fail('incorrect_type', data_type=type(item).__name__)
            pks.append(value)
        try:
            objects = queryset.in_bulk(pks)
        except (TypeError, ValueError):
            # queryset doesn't support in_bulk (e.g. distinct/sliced); fall
            # back to the per-item path so behavior is unchanged.
            return [self.to_internal_value(item) for item in data]
        result = []
        for pk in pks:
            if pk not in objects:
                self.fail('does_not_exist', pk_value=pk)
```
- Report `incorrect_type` / `does_not_exist` details using the post-`pk_field` value (matching the per-item `to_internal_value` path) instead of the raw input or the pk-coerced lookup key. With a type-changing `pk_field` (e.g. BooleanField) the bulk path previously reported a different `data_type`.
- `ManyRelatedField.to_internal_value` now falls back to the per-item loop when the child field has no `to_internal_value_bulk`, so wrapping a non-RelatedField child no longer raises AttributeError.

Adds regression tests for both.
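The fallback for children without the bulk hook can be illustrated in isolation. The sketch below is an assumption-laden toy: `PlainChild`, `BulkChild`, and `many_to_internal_value` are hypothetical names, and `int()` stands in for whatever conversion a real child field performs.

```python
class PlainChild:
    """Hypothetical child field exposing only the per-item method."""

    def to_internal_value(self, item):
        return int(item)


class BulkChild(PlainChild):
    """Hypothetical child field that also offers the bulk hook."""

    def __init__(self):
        self.bulk_calls = 0

    def to_internal_value_bulk(self, data):
        self.bulk_calls += 1
        return [int(item) for item in data]


def many_to_internal_value(child, data):
    # Prefer the bulk hook when the child provides one; otherwise fall
    # back to the per-item loop instead of raising AttributeError.
    bulk = getattr(child, "to_internal_value_bulk", None)
    if bulk is not None:
        return bulk(data)
    return [child.to_internal_value(item) for item in data]
```

Looking the hook up with `getattr` keeps the many-field wrapper usable with any child that implements `to_internal_value`, whether or not it derives from the relational base class.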
Thanks for the review. Addressed both points in

1.

```python
bulk = getattr(self.child_relation, 'to_internal_value_bulk', None)
if bulk is not None:
    return bulk(data)
return [self.child_relation.to_internal_value(item) for item in data]
```

2. Error-detail divergence with a custom

Added regression tests for both: one asserting the bulk error detail matches the per-item path under a type-changing
Fixes #9607.
`ManyRelatedField.to_internal_value` resolved each related object with its own `to_internal_value()` call, so validating a list of N primary keys ran N SELECT queries. As @sevdog noted on the issue, the many-related path delegates per-item and does no DB-level batching.

Change
- Add an opt-in `to_internal_value_bulk()` hook on `RelatedField`, defaulting to the existing per-item loop, so `SlugRelatedField`, `HyperlinkedRelatedField`, and custom relations are completely unchanged.
- Override it on `PrimaryKeyRelatedField` to resolve every pk with a single `in_bulk()` query.

No behavior change, only fewer queries
Per-item error semantics (`incorrect_type` / `does_not_exist`), input ordering, duplicate handling, the queryset filter, and `pk_field` transforms are all preserved. A value whose type the backend cannot compare falls back to the per-item path so the offending item still raises the same error.

Tests
Adds regression tests in `tests/test_relations_pk.py`, including an `assertNumQueries(1)` guard that fails on the current code (5 queries executed, 1 expected) and passes with the fix, plus parity tests for ordering, duplicates, `does_not_exist`, `incorrect_type`, queryset filtering, and `pk_field`.

Follow-up
`ListSerializer.create` has the same per-item shape (also flagged on the issue); left out here to keep this change surgical.