-
Notifications
You must be signed in to change notification settings - Fork 2.6k
feat: added s3 artifact service #3873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Summary of ChangesHello @miyannishar, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands the application's artifact management capabilities by introducing native support for Amazon S3 as a storage backend. This integration provides users with greater flexibility in choosing their artifact storage solutions, complementing existing file-based and Google Cloud Storage options. The changes encompass the implementation of a dedicated S3 artifact service, its seamless registration within the application's framework, and thorough unit testing to guarantee its reliability and correct operation. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Response from ADK Triaging Agent Hello @miyannishar, thank you for creating this PR! Before we can review your contribution, we need you to address a few items from our contribution guidelines:
You can find more details in our contribution guidelines. This information will help us to review your PR more efficiently. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new S3ArtifactService for storing artifacts in Amazon S3, along with the necessary configuration and tests. The implementation is well-structured and follows the async-over-sync pattern seen in other artifact services in the project.
A major point of concern is that the pull request title, 'fix: pass context to client inceptors', and the empty description do not reflect the content of the changes. It is crucial to update them to accurately describe the addition of the S3 artifact service for clarity and history tracking.
I've added a few comments with suggestions for improving maintainability, robustness, and code clarity.
| except Exception as e: | ||
| logger.error( | ||
| "Failed to list session artifacts for %s: %s", session_id, e | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catching a broad Exception can hide unexpected errors and make debugging difficult, especially when the exception is not re-raised. It's better to catch more specific exceptions that you expect boto3 to raise, such as botocore.exceptions.ClientError. This makes the error handling more robust. This same issue is present in several other try...except blocks in this file where exceptions are not re-raised:
_list_artifact_keys_sync(lines 393-396 and 413-414)_delete_artifact_sync(lines 440-441)_list_versions_sync(lines 483-485)_get_artifact_version_sync(lines 538-545)
Please narrow the exception type in all these locations. You'll need to import ClientError from botocore.exceptions.
| except Exception as e: | |
| logger.error( | |
| "Failed to list session artifacts for %s: %s", session_id, e | |
| ) | |
| except ClientError as e: | |
| logger.error( | |
| "Failed to list session artifacts for %s: %s", session_id, e | |
| ) |
| """ | ||
| if self._file_has_user_namespace(filename): | ||
| # Remove "user:" prefix before encoding | ||
| actual_filename = filename[5:] # len("user:") == 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a magic number 5 for the length of "user:" can be brittle and harms readability. It's better to calculate the length directly using len("user:"). For further improvement and consistency with other artifact services (e.g., file_artifact_service.py), consider defining a module-level constant _USER_NAMESPACE_PREFIX = "user:" and using it here and in _file_has_user_namespace.
| actual_filename = filename[5:] # len("user:") == 5 | |
| actual_filename = filename[len("user:"):] # len("user:") == 5 |
| Bucket=self.bucket_name, Key=object_key | ||
| ) | ||
|
|
||
| metadata = response.get("Metadata", {}) or {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The or {} is redundant here. response.get("Metadata", {}) will never return a falsy value like None that would cause the or to be evaluated; it will return the value for "Metadata" if it exists, or the default value {} if it doesn't. Removing the redundant part makes the code cleaner.
| metadata = response.get("Metadata", {}) or {} | |
| metadata = response.get("Metadata", {}) |
| if service_type == ArtifactServiceType.GCS: | ||
| uri = f"gs://test_bucket/{app_name}/{user_id}/user/{user_scoped_filename}/{i}" | ||
| elif service_type == ArtifactServiceType.S3: | ||
| uri = f"s3://test_bucket/{app_name}/{user_id}/user/document.pdf/{i}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filename document.pdf is hardcoded here. It's better to derive it from the user_scoped_filename variable to make the test more robust and less reliant on magic values. This ensures that if user_scoped_filename changes, this test won't break unexpectedly.
| uri = f"s3://test_bucket/{app_name}/{user_id}/user/document.pdf/{i}" | |
| uri = f"s3://test_bucket/{app_name}/{user_id}/user/{user_scoped_filename.split(':', 1)[1]}/{i}" |
98ccc29 to
3c4d3e8
Compare
|
Hi @miyannishar , thanks for the PR, I think it is a great add to the adk-python-community repo, would you mind moving your change there? |
Please ensure you have read the contribution guide before creating a pull request.
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
Problem:
ADK currently provides
GcsArtifactServicefor Google Cloud Storage andFileArtifactServicefor local development, but there is no native artifact service for Amazon S3. This creates friction for developers and organizations using AWS infrastructure, as they must either:FileArtifactService(not production-ready, limited scalability)GcsArtifactService(requires cross-cloud setup, additional complexity)For AWS-native deployments (EC2, ECS, Lambda, EKS), developers need a first-class S3 artifact storage solution that integrates seamlessly with their existing infrastructure and IAM policies.
Solution:
Implemented
S3ArtifactServiceas a production-ready artifact storage backend that:BaseArtifactServicefollowing the same patterns asGcsArtifactService{app}/{user}/{session}/{filename}/{version}{app}/{user}/user/{filename}/{version}(usinguser:prefix)asyncio.to_threads3://URI scheme support in service registryextensionsKey Files Changed:
src/google/adk/artifacts/s3_artifact_service.py(NEW - ~612 lines)src/google/adk/artifacts/__init__.py(export S3ArtifactService)src/google/adk/cli/service_registry.py(register s3:// URI scheme)pyproject.toml(add boto3>=1.28.0 to extensions)tests/unittests/artifacts/test_artifact_service.py(add S3 tests)Testing Plan
Unit Tests
Unit Test Summary:
Added comprehensive unit tests for S3ArtifactService:
Mock Infrastructure:
MockS3Client: Simulates boto3 S3 clientMockS3Bucket: Simulates S3 bucketMockS3Object: Simulates S3 objects with metadatamock_s3_artifact_service(): Factory function for testsTest Coverage:
test_load_empty[S3]- Loading non-existent artifactstest_save_load_delete[S3]- Basic CRUD operationstest_list_keys[S3]- Listing artifact keystest_list_versions[S3]- Version managementtest_list_keys_preserves_user_prefix[S3]- User-scoped artifactstest_list_artifact_versions_and_get_artifact_version[S3]- Metadata operationstest_list_artifact_versions_with_user_prefix[S3]- User-scoped metadatatest_get_artifact_version_artifact_does_not_exist[S3]- Error handlingtest_get_artifact_version_out_of_index[S3]- Edge casesTest Execution:
Test Results: (Fill this in after running tests)
Manual End-to-End (E2E) Tests
E2E Test 1: Direct Instantiation
Setup:
Expected Output:
E2E Test 2: Service Registry URI Configuration
Setup:
Expected Output:
E2E Test 3: Integration with Runner
Setup:
Console Output: (Include actual output here)
Checklist
Additional Context
Code Quality
@overridedecoratorsfrom __future__ import annotationsDesign Decisions
extensionsto avoid breaking existing installations/,:, spaces)asyncio.to_threadmatchingGcsArtifactServicepatterngs://for consistencyBaseArtifactServiceimplementedComparison with Existing Services
Installation
Users can install with:
Usage Examples
Direct Instantiation:
URI Configuration:
Environment Variable:
Related Documentation
GcsArtifactServicegs://schemeFuture Enhancements (Not in this PR)
app:prefix)