
ENG-2590: Handle dataset validation errors gracefully and add skip_validation param #7475

Merged
adamsachs merged 9 commits into main from asachs/handle-serialization-errors on Feb 26, 2026

Conversation

@adamsachs
Contributor

@adamsachs adamsachs commented Feb 24, 2026

Ticket ENG-2590

Description Of Changes

Previously, retrieving datasets via GET /api/v1/dataset or GET /api/v1/dataset/{fides_key} would return a generic 500 Internal Server Error when persisted dataset data failed pydantic/fideslang validation during response serialization (e.g. a field with data_type='string' that has subfields). This made it impossible to diagnose or even retrieve the problematic data.
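To make the failure mode concrete, here is a small recursive check that finds the inconsistency described above: a field still typed 'string' that has gained subfields. It is a hypothetical, simplified stand-in for the real fideslang rule, and the dict shape (a 'name', an optional 'fides_meta' holding 'data_type', and nested 'fields') is an assumption loosely modeled on a dataset field definition.

```python
from typing import Any, Dict, List


def find_string_fields_with_subfields(
    fields: List[Dict[str, Any]], path: str = ""
) -> List[str]:
    """Return dotted paths of fields typed 'string' that also declare subfields.

    Hypothetical, simplified check: each field is assumed to be a dict with a
    'name', an optional 'fides_meta' dict holding 'data_type', and optional
    nested 'fields'. The real rule is enforced by fideslang's pydantic models.
    """
    problems: List[str] = []
    for field in fields:
        field_path = f"{path}.{field['name']}" if path else field["name"]
        data_type = (field.get("fides_meta") or {}).get("data_type")
        subfields = field.get("fields") or []
        if subfields and data_type == "string":
            problems.append(field_path)
        # Recurse so that inconsistencies in nested fields are reported too.
        problems.extend(find_string_fields_with_subfields(subfields, field_path))
    return problems


# Example: 'address' gained subfields but kept its stale 'string' type.
collection_fields = [
    {"name": "email", "fides_meta": {"data_type": "string"}},
    {
        "name": "address",
        "fides_meta": {"data_type": "string"},
        "fields": [{"name": "city", "fides_meta": {"data_type": "string"}}],
    },
]
print(find_string_fields_with_subfields(collection_fields))  # ['address']
```

Reporting dotted paths rather than a bare boolean mirrors the goal of this PR: pointing at the exact offending field instead of failing opaquely.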

This PR adds two improvements:

  1. Exception handlers that catch ResponseValidationError (FastAPI response serialization) and pydantic.ValidationError (explicit model_validate() calls) and return structured 422 responses with actionable error details instead of opaque 500s.
  2. skip_validation query parameter on both the list and get dataset endpoints, allowing users to bypass pydantic response validation entirely for troubleshooting — making it possible to retrieve and inspect datasets that contain invalid data.
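The first improvement boils down to turning a list of pydantic-style error dicts into a 422 with a readable body. The sketch below shows that transformation in plain Python. The 'detail' string is the message quoted later in this thread; the 'errors' layout and the per-error keys ('location', 'message', 'type') are assumptions for illustration, not the exact schema the PR's handlers return.

```python
import json
from typing import Any, Dict, List, Sequence, Tuple

HTTP_422_UNPROCESSABLE_ENTITY = 422


def build_validation_error_body(
    errors: Sequence[Dict[str, Any]],
) -> Tuple[int, Dict[str, Any]]:
    """Turn pydantic-style error dicts into a 422 status and a JSON-safe body.

    Hypothetical sketch: the 'detail'/'errors' layout and the per-error keys
    are assumptions, not the exact schema returned by the PR's handlers.
    """
    structured: List[Dict[str, Any]] = [
        {
            # pydantic reports 'loc' as a tuple of path segments; join it so
            # the failing field is easy to spot in logs and responses.
            "location": ".".join(str(part) for part in err.get("loc", ())),
            "message": err.get("msg", ""),
            "type": err.get("type", ""),
        }
        for err in errors
    ]
    body = {
        "detail": "The requested resource contains data that fails validation",
        "errors": structured,
    }
    return HTTP_422_UNPROCESSABLE_ENTITY, body


# Illustrative error shaped like an entry from pydantic's ValidationError.errors().
status, body = build_validation_error_body(
    [
        {
            "loc": ("response", 0, "collections", 0, "fields", 1),
            "msg": "Value error, example message",
            "type": "value_error",
        }
    ]
)
print(status)  # 422
print(json.dumps(body, indent=2))
```

In a FastAPI handler, the resulting dict would be wrapped in a JSONResponse with the returned status code; keeping the transformation as a pure function makes it easy to unit-test.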

This is the companion PR to fidesplus#3134, which fixes the root cause (discovery monitor promotion not updating data_type when a field changes from string to object).

Code Changes

  • src/fides/api/api/v1/exception_handlers.py - Added response_validation_error_handler and pydantic_validation_error_handler that return 422 responses with structured error details and log at ERROR level
  • src/fides/api/app_setup.py - Registered both new exception handlers with the FastAPI application
  • src/fides/api/api/v1/endpoints/generic_overrides.py - Added skip_validation query parameter (default false) to GET /dataset and GET /dataset/{fides_key} endpoints; when true, returns raw JSON bypassing pydantic validation
  • tests/ops/api/v1/endpoints/test_generic_overrides.py - Added test coverage for both the skip_validation parameter and the 422 error handler behavior
  • changelog/7475-dataset-validation-error-handling.yaml - Changelog entry
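The skip_validation branch in generic_overrides.py can be pictured as two paths over the same rows: return the raw persisted data as-is, or validate every item and let failures propagate to the error handler. The following is a framework-free sketch; StoredDataset, strict_validate, DatasetValidationError and list_datasets are hypothetical names standing in for the ORM model, the pydantic model, and the endpoint.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class DatasetValidationError(Exception):
    """Hypothetical stand-in for pydantic.ValidationError in this sketch."""


@dataclass
class StoredDataset:
    """Hypothetical stand-in for a persisted dataset row."""

    fides_key: str
    collections: List[Dict[str, Any]] = field(default_factory=list)

    def as_raw_dict(self) -> Dict[str, Any]:
        return {"fides_key": self.fides_key, "collections": self.collections}


def strict_validate(item: Dict[str, Any]) -> Dict[str, Any]:
    """Toy validator: rejects 'string' fields that also have subfields."""
    for collection in item["collections"]:
        for f in collection.get("fields", []):
            if f.get("fields") and f.get("data_type") == "string":
                raise DatasetValidationError(f"{item['fides_key']}.{f['name']}")
    return item


def list_datasets(
    rows: List[StoredDataset],
    validate: Callable[[Dict[str, Any]], Dict[str, Any]] = strict_validate,
    skip_validation: bool = False,
) -> List[Dict[str, Any]]:
    raw = [row.as_raw_dict() for row in rows]
    if skip_validation:
        # Troubleshooting path: hand back persisted data untouched, even if invalid.
        return raw
    # Normal path: any failure propagates so an error handler can map it to a 422.
    return [validate(item) for item in raw]


bad = StoredDataset(
    "bad_dataset",
    [{"name": "users", "fields": [{"name": "address", "data_type": "string", "fields": [{"name": "city"}]}]}],
)
print(list_datasets([bad], skip_validation=True)[0]["fides_key"])  # bad_dataset
```

Defaulting skip_validation to false keeps the validated response as the normal contract; the raw path is opt-in and intended only for diagnosis.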

Steps to Confirm

  1. Ensure a dataset with invalid field data exists in the DB (e.g. a field with data_type='string' that has subfields)
  2. GET /api/v1/dataset — should return 422 with structured errors array instead of 500
  3. GET /api/v1/dataset?skip_validation=true — should return 200 with raw data including the invalid dataset
  4. GET /api/v1/dataset/{fides_key}?skip_validation=true — should return 200 with raw data for the specific invalid dataset
  5. GET /api/v1/dataset/{fides_key} (without skip_validation) for an invalid dataset — should return 422
  6. Check server logs for ERROR level entries containing structured validation error details
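Steps 2 to 5 above amount to four requests with expected status codes. This sketch only builds the URLs and pairs them with the expected codes; it assumes a local server at http://localhost:8080 and a hypothetical dataset key, and leaves sending the requests to whatever authenticated client you normally use against the API.

```python
from typing import List, Optional, Tuple
from urllib.parse import quote, urlencode

# Assumption: a local Fides server; substitute your own host and port.
BASE_URL = "http://localhost:8080"


def dataset_url(
    base_url: str, fides_key: Optional[str] = None, skip_validation: bool = False
) -> str:
    """Build the list or detail dataset URL, optionally with skip_validation=true."""
    path = "/api/v1/dataset"
    if fides_key is not None:
        # Percent-encode the key so unusual characters cannot alter the path.
        path += "/" + quote(fides_key, safe="")
    url = base_url.rstrip("/") + path
    if skip_validation:
        url += "?" + urlencode({"skip_validation": "true"})
    return url


def confirmation_plan(base_url: str, invalid_key: str) -> List[Tuple[str, int]]:
    """(URL, expected status) pairs for the list and detail checks."""
    return [
        (dataset_url(base_url), 422),
        (dataset_url(base_url, skip_validation=True), 200),
        (dataset_url(base_url, invalid_key, skip_validation=True), 200),
        (dataset_url(base_url, invalid_key), 422),
    ]


for url, expected in confirmation_plan(BASE_URL, "my_invalid_dataset"):
    print(expected, url)
```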

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • No UX review needed
  • Followup issues:
    • No followup issues
  • Database migrations:
    • No migrations
  • Documentation:
    • No documentation updates required

@vercel
Contributor

vercel bot commented Feb 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project | Deployment | Actions | Updated (UTC)
fides-plus-nightly | Ignored | Preview | Feb 26, 2026 6:37pm
fides-privacy-center | Ignored | | Feb 26, 2026 6:37pm


@adamsachs adamsachs changed the title from "[draft] add serialization handlers" to "ENG-2590: Handle dataset validation errors gracefully and add skip_validation param" on Feb 24, 2026
Contributor

@JadeCara JadeCara left a comment

No major red flags and one question/comment; approving to unblock.

Comment on lines 220 to 223
    if skip_validation:
        return JSONResponse(
            content=jsonable_encoder([result.__dict__ for result in results])
        )

I believe SQLAlchemy attaches _sa_instance_state to every model instance's __dict__. In this case jsonable_encoder will attempt to serialize it (falling back to str() on non-serializable objects), potentially leaking internal ORM state in the API response. Do we want that there for debugging? It may just look like ugly noise for the consumer.

Could we use jsonable_encoder(result) directly? I am pretty sure jsonable_encoder has SQLAlchemy-aware handling when given the model object rather than its __dict__.

@adamsachs adamsachs marked this pull request as ready for review February 26, 2026 15:44
@adamsachs adamsachs requested a review from a team as a code owner February 26, 2026 15:44
@adamsachs adamsachs requested review from johnewart and removed request for a team and johnewart February 26, 2026 15:44
@greptile-apps
Contributor

greptile-apps bot commented Feb 26, 2026

Greptile Summary

This PR improves error handling for dataset validation failures by replacing opaque 500 errors with structured 422 responses containing actionable error details. It adds a global ResponseValidationError handler in app_setup.py and a local _dataset_validation_error_response helper in generic_overrides.py for explicit validation. The skip_validation query parameter allows authorized users to bypass validation for troubleshooting purposes.

Key changes:

  • Added response_validation_error_handler to handle FastAPI response serialization failures globally
  • Added skip_validation parameter to GET /dataset and GET /dataset/{fides_key} endpoints
  • Wrapped explicit model_validate() calls in try/except blocks to return structured 422 errors
  • Comprehensive test coverage for both skip_validation and error handling paths

The implementation correctly scopes error handling to avoid masking unrelated validation errors. Previous review comments about global pydantic.ValidationError handlers have been addressed - only ResponseValidationError is handled globally.

Confidence Score: 4/5

  • Safe to merge with minor observations about existing patterns that could be improved
  • The implementation is sound and solves the stated problem effectively. Error handling is appropriately scoped, test coverage is comprehensive, and the troubleshooting mechanism works as intended. The score reflects that previous review comments about __dict__ usage on SQLAlchemy models and manual fixture cleanup remain unaddressed, though these follow existing patterns in the codebase.
  • No files require special attention - all changes are well-tested and implement the requirements correctly

Important Files Changed

  • src/fides/api/api/v1/exception_handlers.py: Adds a response validation error handler that converts ResponseValidationError into structured 422 responses with detailed error information
  • src/fides/api/app_setup.py: Registers the ResponseValidationError handler globally to catch FastAPI response serialization failures
  • src/fides/api/api/v1/endpoints/generic_overrides.py: Adds the skip_validation parameter and local validation error handling, including try/except blocks around explicit validation that return structured 422 error responses
  • tests/ops/api/v1/endpoints/test_generic_overrides.py: Comprehensive test coverage for the skip_validation parameter and 422 error responses; covers both paginated and non-paginated endpoints

Last reviewed commit: f36ae01

Contributor

@greptile-apps greptile-apps bot left a comment

5 files reviewed, 3 comments

Comment on lines 117 to 120
    fastapi_app.add_exception_handler(
        ValidationError,
        pydantic_validation_error_handler,  # type: ignore[arg-type]
    )

Global pydantic.ValidationError handler is overly broad

Registering a global handler for pydantic.ValidationError will catch all unhandled pydantic validation errors across the entire application, not just dataset response serialization failures. Any endpoint that has an uncaught model_validate() or model_dump() call (now or in the future) will return the generic message "The requested resource contains data that fails validation" — which may be misleading for non-dataset errors.

For example, if a future endpoint accidentally raises pydantic.ValidationError due to a coding error, this handler will convert what should be a 500 (signaling a bug) into a 422, silently masking the real issue.

Consider scoping this more narrowly — e.g., catching pydantic.ValidationError only in the dataset endpoints themselves (similar to how create/update endpoints already do), or adding a check in the handler to only handle known "response serialization" scenarios and re-raise unexpected ones.
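The narrower scoping suggested here can be sketched as a try/except around only the validation step, so a validation error raised anywhere else (for example while loading) still escapes and surfaces as a 500. ValidationFailure is a hypothetical stand-in for pydantic.ValidationError, and load/validate are hypothetical callables standing in for the database query and model_validate().

```python
from typing import Any, Callable, Dict, List, Tuple

DETAIL = "The requested resource contains data that fails validation"


class ValidationFailure(Exception):
    """Hypothetical stand-in for pydantic.ValidationError."""

    def __init__(self, errors: List[Dict[str, Any]]) -> None:
        super().__init__(f"{len(errors)} validation error(s)")
        self._errors = errors

    def errors(self) -> List[Dict[str, Any]]:
        return self._errors


def get_dataset_response(
    fides_key: str,
    load: Callable[[str], Dict[str, Any]],
    validate: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> Tuple[int, Dict[str, Any]]:
    # Not wrapped: a ValidationFailure raised while loading is treated as a bug
    # and propagates, so the framework reports it as a 500.
    raw = load(fides_key)
    try:
        validated = validate(raw)
    except ValidationFailure as exc:
        # Only this known case, persisted data failing the response model,
        # is translated into a structured 422.
        return 422, {"detail": DETAIL, "errors": exc.errors()}
    return 200, validated
```

Compared with a global handler, the 422 mapping here is opt-in per call site, which keeps unexpected validation errors visible as server errors.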

Comment on lines 220 to 223
    if skip_validation:
        return JSONResponse(
            content=jsonable_encoder([result.__dict__ for result in results])
        )

__dict__ on SQLAlchemy model exposes internal state

Using result.__dict__ on a SQLAlchemy ORM instance includes SQLAlchemy's internal _sa_instance_state attribute. While jsonable_encoder() happens to handle this without error in most cases (it skips non-serializable attributes), it's fragile and can result in unexpected keys in the response depending on the encoder's behavior.

Consider filtering the dict or using a safer serialization approach:

Suggested change
     if skip_validation:
-        return JSONResponse(
-            content=jsonable_encoder([result.__dict__ for result in results])
-        )
+        return JSONResponse(
+            content=jsonable_encoder(
+                [{k: v for k, v in result.__dict__.items() if not k.startswith("_")} for result in results]
+            )
+        )
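The filtering idea in this suggestion can be demonstrated without a database. FakeDatasetRow and FakeInstanceState below are hypothetical classes that mimic how SQLAlchemy stores its bookkeeping in an instance's __dict__ under '_sa_instance_state'; public_attrs is the same dict comprehension as the suggestion, pulled into a helper.

```python
import json
from typing import Any, Dict, List


class FakeInstanceState:
    """Stand-in for SQLAlchemy's InstanceState; deliberately not JSON-serializable."""


class FakeDatasetRow:
    """Hypothetical ORM-like object. SQLAlchemy keeps its bookkeeping in the
    instance __dict__ under '_sa_instance_state'; this class mimics that."""

    def __init__(self, fides_key: str, collections: List[Dict[str, Any]]) -> None:
        self._sa_instance_state = FakeInstanceState()
        self.fides_key = fides_key
        self.collections = collections


def public_attrs(obj: Any) -> Dict[str, Any]:
    """Copy an object's __dict__, dropping keys that start with an underscore."""
    return {k: v for k, v in vars(obj).items() if not k.startswith("_")}


rows = [FakeDatasetRow("my_dataset", [{"name": "users", "fields": []}])]
payload = [public_attrs(row) for row in rows]
print(json.dumps(payload))
```

Two caveats worth knowing: with a real SQLAlchemy instance, __dict__ only holds attributes that have been loaded, so expired or deferred columns would be missing; and FastAPI's jsonable_encoder takes a sqlalchemy_safe argument (True by default) that already drops dict keys beginning with "_sa". The explicit filter above does not rely on that flag.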

Comment on lines +276 to +279
    yield dataset

    db.delete(dataset)
    db.commit()

Manual record deletion in fixture teardown

The dataset_with_invalid_field fixture manually deletes records in its teardown. Per repository convention, the database is automatically cleared between test runs, making this cleanup unnecessary and potentially error-prone if the test fails before reaching the yield.

Suggested change
-    yield dataset
-
-    db.delete(dataset)
-    db.commit()
+    yield dataset

Context Used: Rule from dashboard - Do not manually delete database records in test fixtures or at the end of tests, as the database is ... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

- Use jsonable_encoder(result) instead of result.__dict__ to avoid leaking SQLAlchemy internal state
- Scope pydantic ValidationError handling to dataset endpoints instead of global handler

Made-with: Cursor
@adamsachs
Contributor Author

@greptile

The Page pydantic model triggers validation on items during
jsonable_encoder, defeating skip_validation. Build the response
dict manually to bypass pydantic serialization entirely.

Made-with: Cursor
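Building the paginated response by hand means emitting a plain dict with the same keys a Page model would produce, so no pydantic model ever touches the items. This sketch assumes the field names of fastapi-pagination's Page (items, total, page, size, pages); build_page_dict and paginate_raw are hypothetical helpers, and in a real endpoint the total would typically come from a count query rather than len(rows).

```python
import math
from typing import Any, Dict, List


def build_page_dict(
    items: List[Dict[str, Any]], total: int, page: int, size: int
) -> Dict[str, Any]:
    """Assemble a paginated body as a plain dict so no pydantic Page model
    (and therefore no per-item validation) is involved."""
    return {
        "items": items,
        "total": total,
        "page": page,
        "size": size,
        # Total number of pages; 0 when there are no results.
        "pages": math.ceil(total / size) if size > 0 else 0,
    }


def paginate_raw(rows: List[Dict[str, Any]], page: int, size: int) -> Dict[str, Any]:
    """Slice already-serialized rows for a 1-indexed page."""
    start = (page - 1) * size
    return build_page_dict(
        rows[start : start + size], total=len(rows), page=page, size=size
    )


rows = [{"fides_key": f"dataset_{i}"} for i in range(5)]
print(paginate_raw(rows, page=2, size=2))
```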
@adamsachs adamsachs added this pull request to the merge queue Feb 26, 2026
Merged via the queue into main with commit 0707523 Feb 26, 2026
54 checks passed
@adamsachs adamsachs deleted the asachs/handle-serialization-errors branch February 26, 2026 19:16
adamsachs added a commit that referenced this pull request Feb 26, 2026