Fix #27079: [BUG] S3 Ingestion Failure: Pydantic ValidationError 'ext…#27180
Fix #27079: [BUG] S3 Ingestion Failure: Pydantic ValidationError 'ext…#27180
Conversation
…ra_forbidden' for BucketArn in S3BucketResponse
There was a problem hiding this comment.
Pull request overview
This PR addresses ingestion-time Pydantic ValidationError failures caused by cloud SDK responses including additional (previously forbidden) fields, primarily for S3 bucket listing responses.
Changes:
- Relaxed Pydantic models to ignore unexpected fields in S3 bucket responses (and similarly for GCS bucket + SageMaker model response models).
- Updated S3 unit test fixtures to include additional bucket fields (e.g.,
BucketArn,BucketRegion) to reproduce/guard the regression.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| ingestion/tests/unit/topology/storage/test_s3_storage.py | Extends mocked S3 list_buckets response with extra fields to validate tolerant parsing. |
| ingestion/src/metadata/ingestion/source/storage/s3/models.py | Changes S3BucketResponse to ignore extra keys to prevent extra_forbidden failures. |
| ingestion/src/metadata/ingestion/source/storage/gcs/models.py | Migrates GCSBucketResponse config to Pydantic v2 ConfigDict and ignores extra keys. |
| ingestion/src/metadata/ingestion/source/mlmodel/sagemaker/metadata.py | Updates SageMakerModel to ignore extra keys. |
| from typing import List, Optional | ||
|
|
||
| from pydantic import BaseModel, Extra, Field | ||
| from pydantic import BaseModel, ConfigDict, Field |
There was a problem hiding this comment.
Extra was removed from the pydantic imports, but this module still references Extra in GCSContainerDetails.Config later in the file. This will raise a NameError at import time and break GCS ingestion/tests. Re-add the Extra import or (preferred, Pydantic v2 style) migrate GCSContainerDetails to model_config = ConfigDict(extra="forbid") and drop the inner Config class.
| class SageMakerModel(BaseModel): | ||
| model_config = ConfigDict( | ||
| extra="forbid", | ||
| extra="ignore", |
There was a problem hiding this comment.
ignore is the default so we can just delete that completely
Code Review ✅ ApprovedFixes S3 ingestion failure caused by Pydantic ValidationError. No issues found. OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
🟡 Playwright Results — all passed (25 flaky)✅ 3593 passed · ❌ 0 failed · 🟡 25 flaky · ⏭️ 207 skipped
🟡 25 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
|
Failed to cherry-pick changes to the 1.12.5 branch. |
|
…ionError 'ext… (open-metadata#27180) * Fix open-metadata#27079: [BUG] S3 Ingestion Failure: Pydantic ValidationError 'extra_forbidden' for BucketArn in S3BucketResponse * Address comments
…ionError 'ext… (open-metadata#27180) * Fix open-metadata#27079: [BUG] S3 Ingestion Failure: Pydantic ValidationError 'extra_forbidden' for BucketArn in S3BucketResponse * Address comments



…ra_forbidden' for BucketArn in S3BucketResponse
Describe your changes:
Fixes
I worked on ... because ...
Type of change:
Checklist:
Fixes <issue-number>: <short explanation>Summary by Gitar
extraconfig fromforbidtoignoreinS3BucketResponse,GCSBucketResponse, andSageMakerModelto handle extra fields likeBucketArnandBucketRegionBucketArnandBucketRegionfields to S3 bucket mock responses in test casesThis will update automatically on new commits.