Conversation

The latest updates on your projects. Learn more about Vercel for GitHub. 2 Skipped Deployments
JadeCara left a comment

No major red flags, one question/comment; approving to unblock.
```python
if skip_validation:
    return JSONResponse(
        content=jsonable_encoder([result.__dict__ for result in results])
    )
```

I believe SQLAlchemy attaches `_sa_instance_state` to every model instance's `__dict__`. In this case `jsonable_encoder` will attempt to serialize it (falling back to `str()` on non-serializable objects), potentially leaking internal ORM state in the API response. Do we want that there for debugging? It may just look like ugly noise to the consumer.

Could we use `jsonable_encoder(result)` directly? I am pretty sure `jsonable_encoder` has SQLAlchemy-aware handling when given the model object rather than its `__dict__`.
Greptile Summary

This PR improves error handling for dataset validation failures by replacing opaque 500 errors with structured 422 responses containing actionable error details. It adds a global

Key changes:

The implementation correctly scopes error handling to avoid masking unrelated validation errors. Previous review comments about global

Confidence Score: 4/5

Important Files Changed

Last reviewed commit: f36ae01
src/fides/api/app_setup.py (Outdated)

```python
fastapi_app.add_exception_handler(
    ValidationError,
    pydantic_validation_error_handler,  # type: ignore[arg-type]
)
```
Global pydantic.ValidationError handler is overly broad

Registering a global handler for `pydantic.ValidationError` will catch all unhandled pydantic validation errors across the entire application, not just dataset response serialization failures. Any endpoint with an uncaught `model_validate()` or `model_dump()` call (now or in the future) will return the generic message "The requested resource contains data that fails validation", which may be misleading for non-dataset errors.

For example, if a future endpoint accidentally raises `pydantic.ValidationError` due to a coding error, this handler will convert what should be a 500 (signaling a bug) into a 422, silently masking the real issue.

Consider scoping this more narrowly: catch `pydantic.ValidationError` only in the dataset endpoints themselves (similar to how the create/update endpoints already do), or add a check in the handler to only handle known "response serialization" scenarios and re-raise unexpected ones.
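The scoped alternative can be sketched with plain pydantic v2: catch `ValidationError` only on the dataset serialization path and build the structured 422 payload there. `DatasetOut` and `serialize_dataset` below are hypothetical stand-ins, not project code:

```python
from pydantic import BaseModel, ValidationError


# Hypothetical stand-in for a dataset response model.
class DatasetOut(BaseModel):
    fides_key: str
    data_type: str


def serialize_dataset(raw: dict) -> tuple:
    """Validate one dataset for the response; convert failures to a 422 payload.

    Because the try/except lives only on this endpoint's path, ValidationErrors
    raised elsewhere in the app still surface as 500s and signal real bugs.
    """
    try:
        return 200, DatasetOut.model_validate(raw).model_dump()
    except ValidationError as exc:
        return 422, {
            "detail": "The requested resource contains data that fails validation",
            "errors": exc.errors(),
        }
```

The same pattern fits inside a FastAPI route body, returning a `JSONResponse(status_code=422, ...)` in the except branch.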
```python
if skip_validation:
    return JSONResponse(
        content=jsonable_encoder([result.__dict__ for result in results])
    )
```
`__dict__` on SQLAlchemy model exposes internal state

Using `result.__dict__` on a SQLAlchemy ORM instance includes SQLAlchemy's internal `_sa_instance_state` attribute. While `jsonable_encoder()` happens to handle this without error in most cases (it skips non-serializable attributes), it is fragile and can result in unexpected keys in the response depending on the encoder's behavior.
Consider filtering the dict or using a safer serialization approach:

```diff
 if skip_validation:
     return JSONResponse(
-        content=jsonable_encoder([result.__dict__ for result in results])
+        content=jsonable_encoder(
+            [{k: v for k, v in result.__dict__.items() if not k.startswith("_")} for result in results]
+        )
     )
```
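A quick illustration of why the underscore filter helps, using a plain-Python stand-in for an ORM row (`FakeOrmRow` is hypothetical; real SQLAlchemy attaches an `InstanceState` object under the same key):

```python
class FakeOrmRow:
    """Stand-in for a SQLAlchemy model instance; not real ORM code."""

    def __init__(self, fides_key: str):
        self.fides_key = fides_key
        # SQLAlchemy stores per-instance bookkeeping under this attribute.
        self._sa_instance_state = object()


rows = [FakeOrmRow("dataset_a"), FakeOrmRow("dataset_b")]

# Filtering underscore-prefixed keys drops the internal state before encoding.
cleaned = [
    {k: v for k, v in row.__dict__.items() if not k.startswith("_")}
    for row in rows
]
# cleaned == [{"fides_key": "dataset_a"}, {"fides_key": "dataset_b"}]
```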
```python
yield dataset

db.delete(dataset)
db.commit()
```
Manual record deletion in fixture teardown
The `dataset_with_invalid_field` fixture manually deletes records in its teardown. Per repository convention, the database is automatically cleared between test runs, making this cleanup unnecessary and potentially error-prone if the test fails before reaching the `yield`.
```diff
 yield dataset
-db.delete(dataset)
-db.commit()
```
Context Used: Rule from dashboard - Do not manually delete database records in test fixtures or at the end of tests, as the database is ... (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
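The convention the rule describes can be sketched as a yield-style fixture with no manual teardown; `FakeDb` and the dataset dict below are hypothetical stand-ins for the project's session and model:

```python
class FakeDb:
    """Hypothetical stand-in for a test database session."""

    def __init__(self):
        self.rows = []

    def add(self, row):
        self.rows.append(row)


def dataset_with_invalid_field(db):
    """Fixture-style generator: create the record, yield it, and stop.

    No db.delete()/db.commit() afterwards; the test database is assumed
    to be cleared automatically between test runs.
    """
    dataset = {"fides_key": "invalid_dataset"}
    db.add(dataset)
    yield dataset


db = FakeDb()
dataset = next(dataset_with_invalid_field(db))
# dataset == {"fides_key": "invalid_dataset"}; db.rows holds one record
```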
- Use `jsonable_encoder(result)` instead of `result.__dict__` to avoid leaking SQLAlchemy internal state
- Scope pydantic ValidationError handling to dataset endpoints instead of global handler

Made-with: Cursor
@greptile
The `Page` pydantic model triggers validation on items during `jsonable_encoder`, defeating `skip_validation`. Build the response dict manually to bypass pydantic serialization entirely.

Made-with: Cursor
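A sketch of that manual approach; the envelope fields mirror fastapi-pagination's `Page` shape but are assumptions here, and `manual_page` is a hypothetical helper:

```python
def manual_page(items: list, total: int, page: int, size: int) -> dict:
    """Build a paginated envelope as a plain dict.

    Skipping the Page pydantic model means items are never re-validated,
    so invalid persisted data can still be returned when skip_validation
    is requested.
    """
    return {
        "items": items,  # raw item dicts, untouched by pydantic
        "total": total,
        "page": page,
        "size": size,
        "pages": (total + size - 1) // size if size else 0,
    }


envelope = manual_page([{"fides_key": "bad_dataset"}], total=3, page=1, size=2)
# envelope["pages"] == 2
```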
Ticket ENG-2590
Description Of Changes
Previously, retrieving datasets via `GET /api/v1/dataset` or `GET /api/v1/dataset/{fides_key}` would return a generic `500 Internal Server Error` when persisted dataset data failed pydantic/fideslang validation during response serialization (e.g. a field with `data_type='string'` that has subfields). This made it impossible to diagnose or even retrieve the problematic data.

This PR adds two improvements:

- Exception handlers that catch `ResponseValidationError` (FastAPI response serialization) and `pydantic.ValidationError` (explicit `model_validate()` calls) and return structured `422` responses with actionable error details instead of opaque `500`s.
- A `skip_validation` query parameter on both the list and get dataset endpoints, allowing users to bypass pydantic response validation entirely for troubleshooting, making it possible to retrieve and inspect datasets that contain invalid data.

Code Changes
- `src/fides/api/api/v1/exception_handlers.py`: Added `response_validation_error_handler` and `pydantic_validation_error_handler`, which return 422 responses with structured error details and log at ERROR level
- `src/fides/api/app_setup.py`: Registered both new exception handlers with the FastAPI application
- `src/fides/api/api/v1/endpoints/generic_overrides.py`: Added a `skip_validation` query parameter (default `false`) to the `GET /dataset` and `GET /dataset/{fides_key}` endpoints; when `true`, returns raw JSON bypassing pydantic validation
- `tests/ops/api/v1/endpoints/test_generic_overrides.py`: Added test coverage for both the `skip_validation` parameter and the 422 error handler behavior
- `changelog/7475-dataset-validation-error-handling.yaml`: Changelog entry

Steps to Confirm
1. Create a dataset containing invalid data (e.g. a field with `data_type='string'` that has subfields)
2. `GET /api/v1/dataset` should return `422` with a structured `errors` array instead of `500`
3. `GET /api/v1/dataset?skip_validation=true` should return `200` with raw data including the invalid dataset
4. `GET /api/v1/dataset/{fides_key}?skip_validation=true` should return `200` with raw data for the specific invalid dataset
5. `GET /api/v1/dataset/{fides_key}` (without `skip_validation`) for an invalid dataset should return `422`
6. Logs should include `ERROR` level entries containing structured validation error details

Pre-Merge Checklist
- `CHANGELOG.md` updated