MINOR: Fix missing session for s3 column ingestion #27393
        if connection.awsConfig.endPointURL
        else None
    )
    s3_kwargs = {"endpoint_url": endpoint_url} if endpoint_url else {}
⚠️ Bug: CloudWatch client missing endpoint_url for custom endpoints
The refactored get_connection in connection.py builds s3_kwargs with the custom endpoint_url and passes it only to the S3 client (line 58), but the CloudWatch client on line 59 never receives it. The previous code used aws_client.get_client(), which applied endpoint_url to all services when configured (see aws_client.py:226-230).
This means users with a custom endPointURL (e.g., LocalStack, MinIO with CloudWatch-compatible metrics) will have a broken CloudWatch client that points to the default AWS endpoint instead of their custom one. This could cause connection test failures or incorrect metric retrieval.
Suggested fix:
    kwargs = {"endpoint_url": endpoint_url} if endpoint_url else {}
    return S3ObjectStoreClient(
        s3_client=session.client(service_name="s3", **kwargs),
        cloudwatch_client=session.client(service_name="cloudwatch", **kwargs),
        session=session,
    )
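To check that both clients receive the custom endpoint, the suggestion can be exercised without AWS access. The following is a minimal sketch, not the project's code: `build_clients` and `RecordingSession` are hypothetical names, `S3ObjectStoreClient` is redefined here as a plain dataclass for illustration, and the stand-in session only mimics the `client(service_name=..., **kwargs)` call shape of a boto3 session.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class S3ObjectStoreClient:
    """Illustrative stand-in for the wrapper returned by get_connection."""
    s3_client: Any
    cloudwatch_client: Any
    session: Any


class RecordingSession:
    """Hypothetical stand-in for a boto3 session that records client() calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def client(self, service_name: str, **kwargs: Any) -> Tuple[str, Dict[str, Any]]:
        self.calls.append((service_name, kwargs))
        return (service_name, kwargs)


def build_clients(session: Any, endpoint_url: Optional[str]) -> S3ObjectStoreClient:
    """Pass endpoint_url to every client, or to none when it is unset."""
    kwargs = {"endpoint_url": endpoint_url} if endpoint_url else {}
    return S3ObjectStoreClient(
        s3_client=session.client(service_name="s3", **kwargs),
        cloudwatch_client=session.client(service_name="cloudwatch", **kwargs),
        session=session,
    )


# Assumed LocalStack-style URL, used only as example input.
custom = RecordingSession()
build_clients(custom, "http://localhost:4566")

default = RecordingSession()
build_clients(default, None)
```

With a custom endpoint, both recorded calls carry the same `endpoint_url`; with no endpoint configured, neither call receives the keyword, so the default endpoint resolution is left untouched.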
Pull request overview
This PR aims to fix S3 structured column inference during storage ingestion by ensuring a boto3 Session is available and propagated into the dataframe reader path (fetch_dataframe_first_chunk), which can be required by underlying S3 filesystem/readers.
Changes:
- Add an optional `session` parameter to `StorageServiceSource.extract_column_definitions`/`_get_columns` and pass it through to `fetch_dataframe_first_chunk`.
- Capture and pass a session from the S3 source (`S3Source`) into `_get_columns`.
- Update the S3 connection wrapper to expose the created boto3 session alongside the S3/CloudWatch clients.
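The plumbing described in the list above can be sketched as follows. This is an illustration under simplifying assumptions, not the actual implementation: `StorageSourceSketch` and `fetch_first_chunk_sketch` are hypothetical names, and their signatures are reduced to the arguments needed to show an optional `session` travelling from the source down to the reader.

```python
from typing import Any, List, Optional


def fetch_first_chunk_sketch(
    key: str, client: Any, session: Optional[Any] = None
) -> List[str]:
    """Hypothetical reader: reports whether it was handed a session."""
    credentials_source = "session" if session is not None else "default"
    return [f"{key}:{credentials_source}"]


class StorageSourceSketch:
    """Hypothetical, simplified stand-in for the generic storage source."""

    def extract_column_definitions(
        self, key: str, client: Any, session: Optional[Any] = None
    ) -> List[str]:
        # Forward the optional session instead of dropping it here.
        return fetch_first_chunk_sketch(key, client, session=session)

    def _get_columns(
        self, key: str, client: Any, session: Optional[Any] = None
    ) -> List[str]:
        # Default of None keeps existing callers that pass no session working.
        return self.extract_column_definitions(key, client, session)


source = StorageSourceSketch()
with_session = source._get_columns("data.csv", client=None, session=object())
without_session = source._get_columns("data.csv", client=None)
```

Keeping `session` optional with a `None` default means storage sources that have no session to offer need no changes, while the S3 source can pass the one it created.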
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| ingestion/src/metadata/ingestion/source/storage/storage_service.py | Plumbs an optional session through the generic storage column-extraction path into fetch_dataframe_first_chunk. |
| ingestion/src/metadata/ingestion/source/storage/s3/metadata.py | Stores the connection session on the source and passes it into _get_columns during container detail generation. |
| ingestion/src/metadata/ingestion/source/storage/s3/connection.py | Creates and returns a boto3 session as part of S3ObjectStoreClient, and uses it to build service clients. |
      metadata_entry: MetadataEntry,
    + session: Any = None,
      ) -> List[Column]:
      """Extract Column related metadata from s3"""
      extracted_cols = self.extract_column_definitions(
    -     container_name, sample_key, config_source, client, metadata_entry
    +     container_name,
    +     sample_key,
    +     config_source,
    +     client,
    +     metadata_entry,
    +     session,
      )
    - s3_client=aws_client.get_client(service_name="s3"),
    - cloudwatch_client=aws_client.get_client(service_name="cloudwatch"),
    + s3_client=session.client(service_name="s3", **s3_kwargs),
    + cloudwatch_client=session.client(service_name="cloudwatch"),
🔴 Playwright Results: 2 failure(s), 21 flaky
✅ 3633 passed · ❌ 2 failed · 🟡 21 flaky · ⏭️ 84 skipped
Genuine Failures (failed on all attempts) ❌



Describe your changes:
Fixes
I worked on ... because ...
Type of change:
Checklist:
Fixes <issue-number>: <short explanation>