-
Notifications
You must be signed in to change notification settings - Fork 5
Wip-bring-back-s3-mock-issue-36 #41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add .ruff_cache to .gitignore to exclude Ruff linter cache files from version control, preventing unnecessary commits of generated cache data.
Introduce USE_MOCK_S3 flag in .env to toggle between in-process mock S3 (default) and MinIO via Docker. This simplifies the development workflow by eliminating the need for Docker in local setups, while maintaining production compatibility. Updated documentation and startup script accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds an in-process Mock S3 implementation to eliminate Docker/MinIO dependency during development. The mock uses FastAPI TestClient to provide S3-compatible REST API responses, allowing file storage to work without external services.
Key changes:
- New S3 mock server with file-based storage supporting core S3 operations (PUT, GET, HEAD, DELETE, ListObjectsV2, tagging)
- MockS3StorageClient that wraps the mock server via TestClient
- Configuration toggle (
USE_MOCK_S3) to switch between mock and MinIO - Documentation updates reflecting the mock S3 option
Reviewed Changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
mocks/s3-mock/storage.py |
Core file storage logic with path sanitization and metadata handling |
mocks/s3-mock/s3_xml.py |
XML generation/parsing for S3 API responses |
mocks/s3-mock/main.py |
FastAPI server with S3 REST endpoints |
mocks/s3-mock/smoke_test.py |
Integration tests for S3 mock functionality |
mocks/s3-mock/README.md |
Documentation for the mock server |
backend/modules/file_storage/mock_s3_client.py |
TestClient-based S3 client for in-process testing |
backend/infrastructure/app_factory.py |
Factory logic to select between mock and real S3 |
backend/modules/config/manager.py |
Added use_mock_s3 setting |
backend/application/chat/preprocessors/message_builder.py |
Added debug logging for file manifests |
agent_start.sh |
Updated to conditionally start MinIO based on mock setting |
GEMINI.md, CLAUDE.md |
Added documentation about mock S3 |
.env.example |
Updated S3 configuration examples |
config/overrides/mcp.json |
Added progress_demo server entry |
.gitignore |
Added .ruff_cache |
Comments suppressed due to low confidence (4)
backend/infrastructure/app_factory.py:78
- Return type annotation is incorrect. This method returns either S3StorageClient or MockS3StorageClient depending on configuration, but is typed as only S3StorageClient. Use a Union type or create a common protocol/base class: 'Union[S3StorageClient, MockS3StorageClient]' or define a shared interface.
def get_file_storage(self) -> S3StorageClient: # noqa: D401
backend/modules/file_storage/mock_s3_client.py:205
- Except block directly handles BaseException.
except:
backend/modules/file_storage/mock_s3_client.py:205
- 'except' clause does nothing but pass and there is no explanatory comment.
except:
mocks/s3-mock/storage.py:115
- File is opened but is not closed.
"ETag": meta.get("etag") or calc_etag(open(obj_path, "rb").read()),
mocks/s3-mock/main.py
Outdated
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| port = int(os.getenv("PORT", "9002")) |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Port default is inconsistent with README documentation. The code defaults to 9002 but the README states the server runs on port 9001 by default (lines 7, 49, 57 of README.md). This mismatch will cause connection failures when following documentation.
| port = int(os.getenv("PORT", "9002")) | |
| port = int(os.getenv("PORT", "9001")) |
| @app.put("/{bucket}/{key:path}") | ||
| async def put_object(bucket: str, key: str, request: Request): |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate route definition: This PUT endpoint at line 58 conflicts with the put_object_tagging endpoint at line 183, both using the same path pattern '/{bucket}/{key:path}'. FastAPI will only register one of these routes, causing tagging operations to fail. The tagging endpoint should use a query parameter check or different decorator order to disambiguate.
mocks/s3-mock/storage.py
Outdated
| items.append({ | ||
| "Key": key, | ||
| "Size": size, | ||
| "ETag": meta.get("etag") or calc_etag(open(obj_path, "rb").read()), |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File is opened and read without using a context manager, risking resource leaks. Additionally, this fallback reads the entire file into memory just to calculate ETag when metadata is missing. Consider using 'with open(obj_path, "rb") as f: calc_etag(f.read())' and potentially logging when metadata is unavailable.
| items.append({ | |
| "Key": key, | |
| "Size": size, | |
| "ETag": meta.get("etag") or calc_etag(open(obj_path, "rb").read()), | |
| etag = meta.get("etag") | |
| if not etag: | |
| with open(obj_path, "rb") as f: | |
| etag = calc_etag(f.read()) | |
| items.append({ | |
| "Key": key, | |
| "Size": size, | |
| "ETag": etag, |
| value_elem = tag_elem.find("Value") | ||
| if key_elem is not None and value_elem is not None: | ||
| tags[key_elem.text] = value_elem.text | ||
| except: |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bare except clause silently swallows all exceptions including SystemExit and KeyboardInterrupt. This should catch specific exceptions like 'ET.ParseError' or 'Exception' to avoid hiding critical errors during XML tag parsing.
| except: | |
| except ET.ParseError: |
| if include_files_manifest: | ||
| session_context = build_session_context(session) | ||
| files_in_context = session_context.get("files", {}) | ||
| logger.info(f"DEBUG: Session has {len(files_in_context)} files: {list(files_in_context.keys())}") |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log messages prefixed with 'DEBUG:' are being sent at INFO level. Either change the log level to logger.debug() or remove the 'DEBUG:' prefix from the message. This applies to lines 59, 62, and 65.
mocks/s3-mock/main.py
Outdated
| import os | ||
| import urllib.parse | ||
| from pathlib import Path | ||
| from typing import Dict, Optional |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'Optional' is not used.
| from typing import Dict, Optional | |
| from typing import Dict |
mocks/s3-mock/smoke_test.py
Outdated
| Tests all supported S3 operations to ensure compatibility. | ||
| """ | ||
|
|
||
| import hashlib |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'hashlib' is not used.
| import hashlib |
mocks/s3-mock/smoke_test.py
Outdated
| """ | ||
|
|
||
| import hashlib | ||
| import time |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'time' is not used.
| import time |
mocks/s3-mock/smoke_test.py
Outdated
|
|
||
| import hashlib | ||
| import time | ||
| import os |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'os' is not used.
| import os |
mocks/s3-mock/smoke_test.py
Outdated
| import hashlib | ||
| import time | ||
| import os | ||
| from pathlib import Path |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'Path' is not used.
| from pathlib import Path |
- Adjust log levels from info to debug in message_builder.py and remove debug prefixes from warnings for cleaner output - Replace bare except with specific ET.ParseError in mock_s3_client.py for better error handling - Update README.md to use uv pip for installation - Consolidate object tagging logic into main PUT endpoint in main.py, removing redundant function for simpler API structure
- Remove outdated comment about enabling mock S3 in README.md - Add production usage warning in main README guidelines - Expand S3 mock README with detailed security limitations and alternatives - Ensures developers understand risks and avoid production deployment of insecure mock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 16 out of 17 changed files in this pull request and generated 12 comments.
mocks/s3-mock/storage.py
Outdated
|
|
||
|
|
||
| def calc_etag(data: bytes) -> str: | ||
| return hashlib.md5(data).hexdigest() |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MD5 is used for ETag calculation without the usedforsecurity=False parameter. While MD5 is acceptable for ETag generation (which is not a security-critical operation), Python 3.9+ recommends explicitly marking non-security uses to avoid warnings in FIPS mode. Consider adding usedforsecurity=False: hashlib.md5(data, usedforsecurity=False).hexdigest()
| return hashlib.md5(data).hexdigest() | |
| return hashlib.md5(data, usedforsecurity=False).hexdigest() |
mocks/s3-mock/storage.py
Outdated
| # Sanitize key to prevent traversal | ||
| key = key.lstrip("/") | ||
| safe_parts = [] | ||
| for part in key.split("/"): | ||
| if part in ("..", ""): | ||
| continue | ||
| safe_parts.append(part) | ||
| safe_key = "/".join(safe_parts) | ||
| obj_path = bucket_root / safe_key | ||
| meta_path = bucket_root / f"{safe_key}.meta.json" |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Path traversal protection only filters .. and empty strings, but doesn't handle other potentially dangerous path components like . (current directory) or absolute paths. While . is generally safe, consider also checking for absolute path indicators on Windows (e.g., drive letters) if cross-platform compatibility is a concern. Additionally, the current approach silently drops dangerous parts - consider whether this could lead to unexpected behavior where a key like ../../../etc/passwd becomes etc/passwd instead of raising an error.
| # Sanitize key to prevent traversal | |
| key = key.lstrip("/") | |
| safe_parts = [] | |
| for part in key.split("/"): | |
| if part in ("..", ""): | |
| continue | |
| safe_parts.append(part) | |
| safe_key = "/".join(safe_parts) | |
| obj_path = bucket_root / safe_key | |
| meta_path = bucket_root / f"{safe_key}.meta.json" | |
| # Sanitize key to prevent traversal and reject dangerous keys | |
| key = key.lstrip("/") | |
| if os.path.isabs(key): | |
| raise ValueError("Absolute paths are not allowed in S3 keys") | |
| safe_parts = [] | |
| for part in key.split("/"): | |
| if part in ("..", ".", ""): | |
| raise ValueError(f"Invalid path component in S3 key: {part!r}") | |
| # On Windows, reject drive letters (e.g., C:) | |
| if os.name == "nt" and len(part) == 2 and part[1] == ":" and part[0].isalpha(): | |
| raise ValueError(f"Drive letter not allowed in S3 key: {part!r}") | |
| safe_parts.append(part) | |
| safe_key = "/".join(safe_parts) | |
| obj_path = (bucket_root / safe_key).resolve() | |
| meta_path = (bucket_root / f"{safe_key}.meta.json").resolve() | |
| # Ensure the resolved paths are within the bucket root | |
| bucket_root_resolved = bucket_root.resolve() | |
| if not str(obj_path).startswith(str(bucket_root_resolved)): | |
| raise ValueError("Path traversal detected in S3 key") | |
| if not str(meta_path).startswith(str(bucket_root_resolved)): | |
| raise ValueError("Path traversal detected in S3 key (meta)") |
mocks/s3-mock/smoke_test.py
Outdated
| test_delete_object() | ||
|
|
||
| print("\n" + "=" * 40) | ||
| print("✅ All tests passed!") |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The success message uses an emoji '✅' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).
| print("✅ All tests passed!") | |
| print("All tests passed!") |
mocks/s3-mock/smoke_test.py
Outdated
| print("✅ All tests passed!") | ||
|
|
||
| except Exception as e: | ||
| print(f"\n❌ Test failed: {e}") |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message uses an emoji '❌' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).
| print(f"\n❌ Test failed: {e}") | |
| print(f"\nERROR: Test failed: {e}") |
|
|
||
| def _calculate_etag(self, content_bytes: bytes) -> str: | ||
| """Calculate ETag for file content.""" | ||
| return hashlib.md5(content_bytes).hexdigest() |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MD5 is used for ETag calculation without the usedforsecurity=False parameter. While MD5 is acceptable for ETag generation (which is not a security-critical operation), Python 3.9+ recommends explicitly marking non-security uses to avoid warnings in FIPS mode. Consider adding usedforsecurity=False: hashlib.md5(content_bytes, usedforsecurity=False).hexdigest()
| return hashlib.md5(content_bytes).hexdigest() | |
| return hashlib.md5(content_bytes, usedforsecurity=False).hexdigest() |
agent_start.sh
Outdated
| echo "✅ Using Mock S3 (no Docker required)" | ||
| else | ||
| # Check if MinIO is running | ||
| if ! docker ps | grep -q atlas-minio; then | ||
| echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..." | ||
| docker-compose up -d minio minio-init | ||
| echo "✅ MinIO started successfully" | ||
| sleep 3 | ||
| else | ||
| echo "✅ MinIO is already running" |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message uses an emoji '✅' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).
| echo "✅ Using Mock S3 (no Docker required)" | |
| else | |
| # Check if MinIO is running | |
| if ! docker ps | grep -q atlas-minio; then | |
| echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..." | |
| docker-compose up -d minio minio-init | |
| echo "✅ MinIO started successfully" | |
| sleep 3 | |
| else | |
| echo "✅ MinIO is already running" | |
| echo "Using Mock S3 (no Docker required)" | |
| else | |
| # Check if MinIO is running | |
| if ! docker ps | grep -q atlas-minio; then | |
| echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..." | |
| docker-compose up -d minio minio-init | |
| echo "MinIO started successfully" | |
| sleep 3 | |
| else | |
| echo "MinIO is already running" |
agent_start.sh
Outdated
| if ! docker ps | grep -q atlas-minio; then | ||
| echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..." | ||
| docker-compose up -d minio minio-init | ||
| echo "✅ MinIO started successfully" |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message uses an emoji '✅' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).
| echo "✅ MinIO started successfully" | |
| echo "MinIO started successfully" |
agent_start.sh
Outdated
| echo "✅ Using Mock S3 (no Docker required)" | ||
| else | ||
| # Check if MinIO is running | ||
| if ! docker ps | grep -q atlas-minio; then | ||
| echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..." | ||
| docker-compose up -d minio minio-init | ||
| echo "✅ MinIO started successfully" | ||
| sleep 3 | ||
| else | ||
| echo "✅ MinIO is already running" |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message uses an emoji '✅' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).
| echo "✅ Using Mock S3 (no Docker required)" | |
| else | |
| # Check if MinIO is running | |
| if ! docker ps | grep -q atlas-minio; then | |
| echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..." | |
| docker-compose up -d minio minio-init | |
| echo "✅ MinIO started successfully" | |
| sleep 3 | |
| else | |
| echo "✅ MinIO is already running" | |
| echo "Using Mock S3 (no Docker required)" | |
| else | |
| # Check if MinIO is running | |
| if ! docker ps | grep -q atlas-minio; then | |
| echo "MinIO is not running. Starting MinIO with docker-compose..." | |
| docker-compose up -d minio minio-init | |
| echo "MinIO started successfully" | |
| sleep 3 | |
| else | |
| echo "MinIO is already running" |
agent_start.sh
Outdated
| else | ||
| # Check if MinIO is running | ||
| if ! docker ps | grep -q atlas-minio; then | ||
| echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..." |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The warning message uses an emoji '
| echo "⚠️ MinIO is not running. Starting MinIO with docker-compose..." | |
| echo "MinIO is not running. Starting MinIO with docker-compose..." |
| if key_elem is not None and value_elem is not None: | ||
| tags[key_elem.text] = value_elem.text | ||
| except ET.ParseError: | ||
| pass |
Copilot
AI
Nov 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'except' clause does nothing but pass and there is no explanatory comment.
| pass | |
| # Failed to parse tags XML; tags will be left empty. This is non-fatal as tags are optional. | |
| logger.warning(f"Failed to parse tags XML for file {sanitize_for_logging(file_key)}", exc_info=True) |
…mock implementation
Introduce USE_MOCK_S3 flag in .env to toggle between in-process mock S3 (default) and MinIO via Docker. This simplifies the development workflow by eliminating the need for Docker in local setups, while maintaining production compatibility. Updated documentation and startup script accordingly.