Skip to content

Avoid N+1 queries in PrimaryKeyRelatedField(many=True) validation#9984

Open
adelkhayata76 wants to merge 4 commits into
encode:mainfrom
adelkhayata76:fix/9607-pk-related-n-plus-one
Open

Avoid N+1 queries in PrimaryKeyRelatedField(many=True) validation#9984
adelkhayata76 wants to merge 4 commits into
encode:mainfrom
adelkhayata76:fix/9607-pk-related-n-plus-one

Conversation

@adelkhayata76

Copy link
Copy Markdown

Fixes #9607.

ManyRelatedField.to_internal_value resolved each related object with its own to_internal_value() call, so validating a list of N primary keys ran N SELECT queries. As @sevdog noted on the issue, the many-related path delegates per-item and does no DB-level batching.

Change

  • Add an opt-in to_internal_value_bulk() hook on RelatedField, defaulting to the existing per-item loop — so SlugRelatedField, HyperlinkedRelatedField, and custom relations are completely unchanged.
  • Override it on PrimaryKeyRelatedField to resolve every pk with a single in_bulk() query.

No behavior change — only fewer queries

Per-item error semantics (incorrect_type / does_not_exist), input ordering, duplicate handling, the queryset filter, and pk_field transforms are all preserved. A value whose type the backend cannot compare falls back to the per-item path so the offending item still raises the same error.

input pks before after
10 10 SELECT 1 SELECT

Tests

Adds regression tests in tests/test_relations_pk.py, including an assertNumQueries(1) guard that fails on the current code (5 queries executed, 1 expected) and passes with the fix, plus parity tests for ordering, duplicates, does_not_exist, incorrect_type, queryset filtering, and pk_field.

Follow-up

ListSerializer.create has the same per-item shape (also flagged on the issue); left out here to keep this change surgical.

ManyRelatedField.to_internal_value resolved each related object with a
separate child_relation.to_internal_value() call, so validating a list of
N primary keys issued N SELECT queries (encode#9607).

Add an opt-in to_internal_value_bulk() hook on RelatedField (defaulting to
the existing per-item loop, so SlugRelatedField, HyperlinkedRelatedField and
custom relations are unchanged) and override it on PrimaryKeyRelatedField to
resolve every pk with a single in_bulk() query.

Per-item error semantics (incorrect_type / does_not_exist), input ordering,
duplicate handling, the queryset filter and pk_field transforms are all
preserved; a type the backend cannot compare falls back to the per-item path
so the offending item still raises the same error.
Handle string primary keys (e.g. from HTML form input): in_bulk() keys its
result by the database pk type, so a string "1" must be coerced via the pk
field's get_prep_value() before the membership check, exactly as
queryset.get(pk=...) does. Without this, string pks raised a spurious
does_not_exist error.

Also make the regression tests rely on the pks actually created in setUp
rather than hard-coding 1..5, which is not guaranteed across backends/test
ordering, and add an explicit string-pk test.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an opt-in bulk validation hook for relational fields to eliminate N+1 queries during PrimaryKeyRelatedField(many=True) validation, resolving related instances via a single in_bulk() query and adding regression coverage for query count and semantic parity.

Changes:

  • Added RelatedField.to_internal_value_bulk() as a bulk-conversion hook used by ManyRelatedField.
  • Implemented a batched to_internal_value_bulk() on PrimaryKeyRelatedField using queryset.in_bulk(...).
  • Added regression tests to ensure PrimaryKeyRelatedField(many=True) validates with one query and preserves ordering/duplicates/errors/pk_field behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
rest_framework/relations.py Adds the bulk validation hook and switches ManyRelatedField to use it; implements PK bulk resolution via in_bulk().
tests/test_relations_pk.py Adds regression tests asserting single-query validation and parity behaviors for PK many validation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread rest_framework/relations.py Outdated
self.child_relation.to_internal_value(item)
for item in data
]
return self.child_relation.to_internal_value_bulk(data)
Comment thread rest_framework/relations.py Outdated
Comment on lines +277 to +301
pks = []
for item in data:
value = item
if self.pk_field is not None:
value = self.pk_field.to_internal_value(value)
try:
if isinstance(value, bool):
raise TypeError
# Coerce to the pk's Python type (e.g. "1" -> 1) so the lookup
# below matches the keys returned by `in_bulk()`, exactly as
# `queryset.get(pk=value)` would have.
value = model_pk.get_prep_value(value)
except (TypeError, ValueError):
self.fail('incorrect_type', data_type=type(item).__name__)
pks.append(value)
try:
objects = queryset.in_bulk(pks)
except (TypeError, ValueError):
# queryset doesn't support in_bulk (e.g. distinct/sliced); fall
# back to the per-item path so behavior is unchanged.
return [self.to_internal_value(item) for item in data]
result = []
for pk in pks:
if pk not in objects:
self.fail('does_not_exist', pk_value=pk)
- Report `incorrect_type` / `does_not_exist` details using the post-`pk_field`
  value (matching the per-item `to_internal_value` path) instead of the raw
  input or the pk-coerced lookup key. With a type-changing `pk_field` (e.g.
  BooleanField) the bulk path previously reported a different `data_type`.
- `ManyRelatedField.to_internal_value` now falls back to the per-item loop
  when the child field has no `to_internal_value_bulk`, so wrapping a
  non-RelatedField child no longer raises AttributeError.

Adds regression tests for both.
@adelkhayata76

Copy link
Copy Markdown
Author

Thanks for the review. Addressed both points in b0a437d0:

1. AttributeError when the child isn't a RelatedField — good catch. ManyRelatedField.to_internal_value now falls back to the per-item loop when the child field has no to_internal_value_bulk, so the optimization only applies to relational children and any other child field type keeps working as before:

bulk = getattr(self.child_relation, 'to_internal_value_bulk', None)
if bulk is not None:
    return bulk(data)
return [self.child_relation.to_internal_value(item) for item in data]

2. Error-detail divergence with a custom pk_field — you're right, and it was observable. With pk_field=BooleanField() and input "true", the per-item path reported received bool while the bulk path reported received str. The bulk method now tracks (lookup_key, value) pairs and uses the post-pk_field value for both incorrect_type (type(value)) and does_not_exist (pk_value=value) details — matching to_internal_value exactly — while the pk-coerced lookup_key is used only to match in_bulk() results.

Added regression tests for both: one asserting the bulk error detail matches the per-item path under a type-changing pk_field, and one asserting a ManyRelatedField wrapping a non-relational child still validates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Performance issue: N+1 queries and slow validation when using many=True with serializers containing relational fields

2 participants