Conversation

The latest updates on your projects. Learn more about Vercel for GitHub. 2 Skipped Deployments
JadeCara left a comment

No major red flags, one question/comment; approving to unblock.
```python
if skip_validation:
    return JSONResponse(
        content=jsonable_encoder([result.__dict__ for result in results])
    )
```

I believe SQLAlchemy attaches `_sa_instance_state` to every model instance's `__dict__`. In this case `jsonable_encoder` will attempt to serialize it (falling back to `str()` on non-serializable objects), potentially leaking internal ORM state in the API response. Do we want that there for debugging? It may just look like ugly noise to the consumer.

Could we use `jsonable_encoder(result)` directly? I am pretty sure `jsonable_encoder` has SQLAlchemy-aware handling when given the model object rather than its `__dict__`.
Greptile Summary

This PR improves error handling for dataset validation failures by replacing opaque 500 errors with structured 422 responses containing actionable error details. It adds a global

Key changes:

The implementation correctly scopes error handling to avoid masking unrelated validation errors. Previous review comments about global

Confidence Score: 4/5

Important Files Changed

Last reviewed commit: f36ae01
src/fides/api/app_setup.py (Outdated)

```python
fastapi_app.add_exception_handler(
    ValidationError,
    pydantic_validation_error_handler,  # type: ignore[arg-type]
)
```
Global pydantic.ValidationError handler is overly broad

Registering a global handler for `pydantic.ValidationError` will catch all unhandled pydantic validation errors across the entire application, not just dataset response serialization failures. Any endpoint with an uncaught `model_validate()` or `model_dump()` call (now or in the future) will return the generic message "The requested resource contains data that fails validation", which may be misleading for non-dataset errors.

For example, if a future endpoint accidentally raises `pydantic.ValidationError` due to a coding error, this handler will convert what should be a 500 (signaling a bug) into a 422, silently masking the real issue.

Consider scoping this more narrowly: catch `pydantic.ValidationError` only in the dataset endpoints themselves (similar to how the create/update endpoints already do), or add a check in the handler to only handle known "response serialization" scenarios and re-raise unexpected ones.
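The scoped alternative can be sketched with plain pydantic v2: catch `ValidationError` only on the dataset serialization path and build the structured 422 payload there. `DatasetOut` and `serialize_dataset` below are hypothetical stand-ins, not project code:

```python
from pydantic import BaseModel, ValidationError


# Hypothetical stand-in for a dataset response model.
class DatasetOut(BaseModel):
    fides_key: str
    data_type: str


def serialize_dataset(raw: dict) -> tuple:
    """Validate one dataset for the response; convert failures to a 422 payload.

    Because the try/except lives only on this endpoint's path, ValidationErrors
    raised elsewhere in the app still surface as 500s and signal real bugs.
    """
    try:
        return 200, DatasetOut.model_validate(raw).model_dump()
    except ValidationError as exc:
        return 422, {
            "detail": "The requested resource contains data that fails validation",
            "errors": exc.errors(),
        }
```

The same pattern fits inside a FastAPI route body, returning a `JSONResponse(status_code=422, ...)` in the except branch.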
```python
if skip_validation:
    return JSONResponse(
        content=jsonable_encoder([result.__dict__ for result in results])
    )
```
`__dict__` on SQLAlchemy model exposes internal state

Using `result.__dict__` on a SQLAlchemy ORM instance includes SQLAlchemy's internal `_sa_instance_state` attribute. While `jsonable_encoder()` happens to handle this without error in most cases (it skips non-serializable attributes), it is fragile and can result in unexpected keys in the response depending on the encoder's behavior.
Consider filtering the dict or using a safer serialization approach:

```diff
 if skip_validation:
     return JSONResponse(
-        content=jsonable_encoder([result.__dict__ for result in results])
+        content=jsonable_encoder(
+            [{k: v for k, v in result.__dict__.items() if not k.startswith("_")} for result in results]
+        )
     )
```
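A quick illustration of why the underscore filter helps, using a plain-Python stand-in for an ORM row (`FakeOrmRow` is hypothetical; real SQLAlchemy attaches an `InstanceState` object under the same key):

```python
class FakeOrmRow:
    """Stand-in for a SQLAlchemy model instance; not real ORM code."""

    def __init__(self, fides_key: str):
        self.fides_key = fides_key
        # SQLAlchemy stores per-instance bookkeeping under this attribute.
        self._sa_instance_state = object()


rows = [FakeOrmRow("dataset_a"), FakeOrmRow("dataset_b")]

# Filtering underscore-prefixed keys drops the internal state before encoding.
cleaned = [
    {k: v for k, v in row.__dict__.items() if not k.startswith("_")}
    for row in rows
]
# cleaned == [{"fides_key": "dataset_a"}, {"fides_key": "dataset_b"}]
```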
```python
yield dataset

db.delete(dataset)
db.commit()
```
Manual record deletion in fixture teardown
The `dataset_with_invalid_field` fixture manually deletes records in its teardown. Per repository convention, the database is automatically cleared between test runs, making this cleanup unnecessary and potentially error-prone if the test fails before reaching the `yield`.
```diff
 yield dataset
-db.delete(dataset)
-db.commit()
```
Context Used: Rule from dashboard - Do not manually delete database records in test fixtures or at the end of tests, as the database is ... (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
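The convention the rule describes can be sketched as a yield-style fixture with no manual teardown; `FakeDb` and the dataset dict below are hypothetical stand-ins for the project's session and model:

```python
class FakeDb:
    """Hypothetical stand-in for a test database session."""

    def __init__(self):
        self.rows = []

    def add(self, row):
        self.rows.append(row)


def dataset_with_invalid_field(db):
    """Fixture-style generator: create the record, yield it, and stop.

    No db.delete()/db.commit() afterwards; the test database is assumed
    to be cleared automatically between test runs.
    """
    dataset = {"fides_key": "invalid_dataset"}
    db.add(dataset)
    yield dataset


db = FakeDb()
dataset = next(dataset_with_invalid_field(db))
# dataset == {"fides_key": "invalid_dataset"}; db.rows holds one record
```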
- Use `jsonable_encoder(result)` instead of `result.__dict__` to avoid leaking SQLAlchemy internal state
- Scope pydantic ValidationError handling to dataset endpoints instead of global handler

Made-with: Cursor
@greptile
The `Page` pydantic model triggers validation on items during `jsonable_encoder`, defeating `skip_validation`. Build the response dict manually to bypass pydantic serialization entirely.

Made-with: Cursor
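A sketch of that manual approach; the envelope fields mirror fastapi-pagination's `Page` shape but are assumptions here, and `manual_page` is a hypothetical helper:

```python
def manual_page(items: list, total: int, page: int, size: int) -> dict:
    """Build a paginated envelope as a plain dict.

    Skipping the Page pydantic model means items are never re-validated,
    so invalid persisted data can still be returned when skip_validation
    is requested.
    """
    return {
        "items": items,  # raw item dicts, untouched by pydantic
        "total": total,
        "page": page,
        "size": size,
        "pages": (total + size - 1) // size if size else 0,
    }


envelope = manual_page([{"fides_key": "bad_dataset"}], total=3, page=1, size=2)
# envelope["pages"] == 2
```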
Ticket ENG-2590
Description Of Changes
Previously, retrieving datasets via `GET /api/v1/dataset` or `GET /api/v1/dataset/{fides_key}` would return a generic `500 Internal Server Error` when persisted dataset data failed pydantic/fideslang validation during response serialization (e.g. a field with `data_type='string'` that has subfields). This made it impossible to diagnose or even retrieve the problematic data.

This PR adds two improvements:

- Exception handlers that catch `ResponseValidationError` (FastAPI response serialization) and `pydantic.ValidationError` (explicit `model_validate()` calls) and return structured `422` responses with actionable error details instead of opaque `500`s.
- A `skip_validation` query parameter on both the list and get dataset endpoints, allowing users to bypass pydantic response validation entirely for troubleshooting, making it possible to retrieve and inspect datasets that contain invalid data.

Code Changes
- `src/fides/api/api/v1/exception_handlers.py`: Added `response_validation_error_handler` and `pydantic_validation_error_handler`, which return 422 responses with structured error details and log at ERROR level
- `src/fides/api/app_setup.py`: Registered both new exception handlers with the FastAPI application
- `src/fides/api/api/v1/endpoints/generic_overrides.py`: Added a `skip_validation` query parameter (default `false`) to the `GET /dataset` and `GET /dataset/{fides_key}` endpoints; when `true`, returns raw JSON bypassing pydantic validation
- `tests/ops/api/v1/endpoints/test_generic_overrides.py`: Added test coverage for both the `skip_validation` parameter and the 422 error handler behavior
- `changelog/7475-dataset-validation-error-handling.yaml`: Changelog entry

Steps to Confirm
1. Create a dataset containing invalid data (e.g. a field with `data_type='string'` that has subfields)
2. `GET /api/v1/dataset` should return `422` with a structured `errors` array instead of `500`
3. `GET /api/v1/dataset?skip_validation=true` should return `200` with raw data including the invalid dataset
4. `GET /api/v1/dataset/{fides_key}?skip_validation=true` should return `200` with raw data for the specific invalid dataset
5. `GET /api/v1/dataset/{fides_key}` (without `skip_validation`) for an invalid dataset should return `422`
6. Logs should include `ERROR` level entries containing structured validation error details

Pre-Merge Checklist
- `CHANGELOG.md` updated