Wip-bring-back-s3-mock-issue-36 #41

garland3 · 2025-11-02T18:18:57Z

Introduce USE_MOCK_S3 flag in .env to toggle between in-process mock S3 (default) and MinIO via Docker. This simplifies the development workflow by eliminating the need for Docker in local setups, while maintaining production compatibility. Updated documentation and startup script accordingly.

Add .ruff_cache to .gitignore to exclude Ruff linter cache files from version control, preventing unnecessary commits of generated cache data.

github-advanced-security

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copilot

Pull Request Overview

This PR adds an in-process Mock S3 implementation to eliminate Docker/MinIO dependency during development. The mock uses FastAPI TestClient to provide S3-compatible REST API responses, allowing file storage to work without external services.

Key changes:

- New S3 mock server with file-based storage supporting core S3 operations (PUT, GET, HEAD, DELETE, ListObjectsV2, tagging)
- MockS3StorageClient that wraps the mock server via TestClient
- Configuration toggle (USE_MOCK_S3) to switch between mock and MinIO
- Documentation updates reflecting the mock S3 option
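The USE_MOCK_S3 toggle comes down to reading one boolean from the environment, with mock mode as the default. The sketch below is a minimal illustration under stated assumptions: the helper name `use_mock_s3_enabled` and the accepted spellings are hypothetical, and the project's real `use_mock_s3` setting in `backend/modules/config/manager.py` may parse the value differently.

```python
import os
from typing import Mapping, Optional

# Hypothetical helper; the real setting lives in the project's config manager.
_TRUTHY = {"1", "true", "yes", "on"}
_FALSY = {"0", "false", "no", "off"}


def use_mock_s3_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the in-process mock should be used instead of MinIO."""
    source = os.environ if env is None else env
    raw = source.get("USE_MOCK_S3")
    if raw is None or raw.strip() == "":
        return True  # assumption: mock S3 is the default when the flag is unset
    value = raw.strip().lower()
    if value in _TRUTHY:
        return True
    if value in _FALSY:
        return False
    # Fail loudly instead of guessing which backend was intended
    raise ValueError(f"Unrecognized USE_MOCK_S3 value: {raw!r}")
```

Raising on an unrecognized value, rather than silently falling back, keeps a typo such as `USE_MOCK_S3=flase` from quietly selecting the mock.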

Reviewed Changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 12 comments.

| File | Description |
|---|---|
| `mocks/s3-mock/storage.py` | Core file storage logic with path sanitization and metadata handling |
| `mocks/s3-mock/s3_xml.py` | XML generation/parsing for S3 API responses |
| `mocks/s3-mock/main.py` | FastAPI server with S3 REST endpoints |
| `mocks/s3-mock/smoke_test.py` | Integration tests for S3 mock functionality |
| `mocks/s3-mock/README.md` | Documentation for the mock server |
| `backend/modules/file_storage/mock_s3_client.py` | TestClient-based S3 client for in-process testing |
| `backend/infrastructure/app_factory.py` | Factory logic to select between mock and real S3 |
| `backend/modules/config/manager.py` | Added `use_mock_s3` setting |
| `backend/application/chat/preprocessors/message_builder.py` | Added debug logging for file manifests |
| `agent_start.sh` | Updated to conditionally start MinIO based on mock setting |
| `GEMINI.md`, `CLAUDE.md` | Added documentation about mock S3 |
| `.env.example` | Updated S3 configuration examples |
| `config/overrides/mcp.json` | Added progress_demo server entry |
| `.gitignore` | Added `.ruff_cache` |

Comments suppressed due to low confidence (4)

backend/infrastructure/app_factory.py:78

Return type annotation is incorrect. This method returns either S3StorageClient or MockS3StorageClient depending on configuration, but is typed as only S3StorageClient. Use a Union type or create a common protocol/base class: 'Union[S3StorageClient, MockS3StorageClient]' or define a shared interface.

    def get_file_storage(self) -> S3StorageClient:  # noqa: D401
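One way to follow the "shared interface" option is a structural `typing.Protocol`: both clients satisfy it by having the right methods, without inheriting from a common base. A minimal sketch, assuming hypothetical names throughout: `FileStorage`, its `put_object`/`get_object` methods, `InMemoryStorage`, and `StorageFactory` are illustrative stand-ins, and the real S3StorageClient and MockS3StorageClient method names may differ.

```python
from typing import Dict, Protocol, runtime_checkable


@runtime_checkable
class FileStorage(Protocol):
    """Hypothetical shared interface; method names are illustrative only."""

    def put_object(self, key: str, data: bytes) -> None: ...

    def get_object(self, key: str) -> bytes: ...


class InMemoryStorage:
    """Stand-in for a mock client; satisfies FileStorage structurally."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put_object(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get_object(self, key: str) -> bytes:
        return self._objects[key]


class StorageFactory:
    """Hypothetical factory; the return type names the protocol, not a concrete class."""

    def __init__(self, use_mock_s3: bool) -> None:
        self.use_mock_s3 = use_mock_s3

    def get_file_storage(self) -> FileStorage:
        if self.use_mock_s3:
            return InMemoryStorage()
        raise NotImplementedError("real S3 client omitted from this sketch")
```

Compared with `Union[S3StorageClient, MockS3StorageClient]`, the protocol keeps callers independent of which concrete client the factory picked, and a type checker will flag either client if it drifts from the shared method set.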

backend/modules/file_storage/mock_s3_client.py:205

Except block directly handles BaseException.

                except:

backend/modules/file_storage/mock_s3_client.py:205

'except' clause does nothing but pass and there is no explanatory comment.

                except:

mocks/s3-mock/storage.py:115

File is opened but is not closed.

                "ETag": meta.get("etag") or calc_etag(open(obj_path, "rb").read()),

Copilot · 2025-11-02T18:22:26Z

mocks/s3-mock/main.py

+
+
+if __name__ == "__main__":
+    port = int(os.getenv("PORT", "9002"))


Port default is inconsistent with README documentation. The code defaults to 9002 but the README states the server runs on port 9001 by default (lines 7, 49, 57 of README.md). This mismatch will cause connection failures when following documentation.

Suggested change

-    port = int(os.getenv("PORT", "9002"))
+    port = int(os.getenv("PORT", "9001"))

Copilot · 2025-11-02T18:22:26Z

mocks/s3-mock/main.py

+@app.put("/{bucket}/{key:path}")
+async def put_object(bucket: str, key: str, request: Request):


Duplicate route definition: this PUT endpoint at line 58 and the put_object_tagging endpoint at line 183 both use the path pattern '/{bucket}/{key:path}'. FastAPI registers both, but requests are matched in registration order, so every PUT is handled by put_object and tagging requests never reach put_object_tagging. Disambiguate by checking for the 'tagging' query parameter (reordering the decorators alone would simply shadow the other route instead).
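A follow-up commit in this PR folds the tagging logic into the main PUT endpoint. The sketch below shows that dispatch in a framework-agnostic way: one handler inspects the query string and branches on `tagging`. `PutDispatcher` and its in-memory dictionaries are hypothetical; in Starlette/FastAPI, `request.query_params` keeps blank values, so `"tagging" in request.query_params` would play the same role as the `parse_qs` call here.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs

ObjectKey = Tuple[str, str]  # (bucket, key)


class PutDispatcher:
    """Hypothetical single PUT handler covering PutObject and PutObjectTagging."""

    def __init__(self) -> None:
        self.objects: Dict[ObjectKey, bytes] = {}
        self.tagging: Dict[ObjectKey, bytes] = {}

    def handle_put(self, bucket: str, key: str, query_string: str, body: bytes) -> int:
        # keep_blank_values=True matters: "?tagging" has no value and
        # parse_qs would otherwise drop it entirely.
        params = parse_qs(query_string, keep_blank_values=True)
        if "tagging" in params:
            if (bucket, key) not in self.objects:
                return 404  # tagging a missing object fails
            self.tagging[(bucket, key)] = body  # raw Tagging XML, parsed elsewhere
            return 200
        self.objects[(bucket, key)] = body
        return 200
```

Because the tagging branch is checked first inside one route, there is no ordering dependency between decorators.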

Copilot · 2025-11-02T18:22:27Z

mocks/s3-mock/storage.py

+            items.append({
+                "Key": key,
+                "Size": size,
+                "ETag": meta.get("etag") or calc_etag(open(obj_path, "rb").read()),


File is opened and read without using a context manager, risking resource leaks. Additionally, this fallback reads the entire file into memory just to calculate ETag when metadata is missing. Consider using 'with open(obj_path, "rb") as f: calc_etag(f.read())' and potentially logging when metadata is unavailable.

Suggested change

-            items.append({
-                "Key": key,
-                "Size": size,
-                "ETag": meta.get("etag") or calc_etag(open(obj_path, "rb").read()),
+            etag = meta.get("etag")
+            if not etag:
+                with open(obj_path, "rb") as f:
+                    etag = calc_etag(f.read())
+            items.append({
+                "Key": key,
+                "Size": size,
+                "ETag": etag,
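The suggestion above fixes the leaked file handle but still reads the whole object into memory. A minimal sketch of a streaming alternative, assuming the same metadata shape (an optional `"etag"` entry) as the snippet; `file_etag`, `etag_for`, and the 1 MiB chunk size are hypothetical choices. It hashes in chunks inside a `with` block and passes `usedforsecurity=False` (Python 3.9+), since an ETag is not a security use of MD5.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional

CHUNK_SIZE = 1024 * 1024  # assumption: 1 MiB reads


def file_etag(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """MD5 hex digest of a file, computed without loading it all into memory."""
    h = hashlib.md5(usedforsecurity=False)
    with open(path, "rb") as f:  # handle is closed even if hashing raises
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def etag_for(meta: Dict[str, str], obj_path: Path) -> str:
    """Prefer the stored ETag; fall back to hashing the object on disk."""
    etag: Optional[str] = meta.get("etag")
    if not etag:
        etag = file_etag(obj_path)
    return etag
```

For single-part uploads the chunked digest equals the one-shot `hashlib.md5(data).hexdigest()`, so listings stay consistent with ETags written at upload time.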

Copilot · 2025-11-02T18:22:27Z

backend/modules/file_storage/mock_s3_client.py

+                        value_elem = tag_elem.find("Value")
+                        if key_elem is not None and value_elem is not None:
+                            tags[key_elem.text] = value_elem.text
+                except:


Bare except clause silently swallows all exceptions including SystemExit and KeyboardInterrupt. This should catch specific exceptions like 'ET.ParseError' or 'Exception' to avoid hiding critical errors during XML tag parsing.

Suggested change

-                except:
+                except ET.ParseError:

Copilot · 2025-11-02T18:22:27Z

backend/application/chat/preprocessors/message_builder.py

        if include_files_manifest:
            session_context = build_session_context(session)
+            files_in_context = session_context.get("files", {})
+            logger.info(f"DEBUG: Session has {len(files_in_context)} files: {list(files_in_context.keys())}")


Log messages prefixed with 'DEBUG:' are being sent at INFO level. Either change the log level to logger.debug() or remove the 'DEBUG:' prefix from the message. This applies to lines 59, 62, and 65.
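A minimal sketch of the fix the comment asks for, mirroring the `session_context.get("files", {})` snippet above; the function name `log_files_manifest` is hypothetical. It logs at DEBUG level without a "DEBUG:" prefix and uses lazy `%`-style arguments, so the message string is only built when DEBUG is actually enabled.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def log_files_manifest(session_context: Dict[str, Any]) -> int:
    """Log the files in the session context at DEBUG level; return the count."""
    files_in_context = session_context.get("files", {})
    # Arguments are formatted only if a DEBUG record is actually emitted
    logger.debug(
        "Session has %d files: %s",
        len(files_in_context),
        list(files_in_context.keys()),
    )
    return len(files_in_context)
```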

Copilot · 2025-11-02T18:22:28Z

mocks/s3-mock/main.py

+import os
+import urllib.parse
+from pathlib import Path
+from typing import Dict, Optional


Import of 'Optional' is not used.

Suggested change

-from typing import Dict, Optional
+from typing import Dict

Copilot · 2025-11-02T18:22:28Z

mocks/s3-mock/smoke_test.py

+Tests all supported S3 operations to ensure compatibility.
+"""
+
+import hashlib


Import of 'hashlib' is not used.

Suggested change

-import hashlib

Copilot · 2025-11-02T18:22:28Z

mocks/s3-mock/smoke_test.py

+"""
+
+import hashlib
+import time


Import of 'time' is not used.

Suggested change

-import time

Copilot · 2025-11-02T18:22:29Z

mocks/s3-mock/smoke_test.py

+
+import hashlib
+import time
+import os


Import of 'os' is not used.

Suggested change

-import os

Copilot · 2025-11-02T18:22:29Z

mocks/s3-mock/smoke_test.py

+import hashlib
+import time
+import os
+from pathlib import Path


Import of 'Path' is not used.

Suggested change

-from pathlib import Path

- Adjust log levels from info to debug in message_builder.py and remove debug prefixes from warnings for cleaner output
- Replace bare except with specific ET.ParseError in mock_s3_client.py for better error handling
- Update README.md to use uv pip for installation
- Consolidate object tagging logic into main PUT endpoint in main.py, removing redundant function for simpler API structure

- Remove outdated comment about enabling mock S3 in README.md
- Add production usage warning in main README guidelines
- Expand S3 mock README with detailed security limitations and alternatives
- Ensures developers understand risks and avoid production deployment of insecure mock

Copilot

Pull Request Overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 12 comments.

Copilot · 2025-11-02T18:51:00Z

mocks/s3-mock/storage.py

+
+
+def calc_etag(data: bytes) -> str:
+    return hashlib.md5(data).hexdigest()


MD5 is used for ETag calculation without the usedforsecurity=False parameter. MD5 is acceptable for ETag generation, which is not a security-critical operation, but since Python 3.9 hashlib lets you mark such non-security uses explicitly; without the flag, hashlib.md5 can be blocked in restricted (e.g. FIPS) environments. Consider adding usedforsecurity=False: hashlib.md5(data, usedforsecurity=False).hexdigest()

Suggested change

-    return hashlib.md5(data).hexdigest()
+    return hashlib.md5(data, usedforsecurity=False).hexdigest()

Copilot · 2025-11-02T18:51:00Z

mocks/s3-mock/storage.py

+    # Sanitize key to prevent traversal
+    key = key.lstrip("/")
+    safe_parts = []
+    for part in key.split("/"):
+        if part in ("..", ""):
+            continue
+        safe_parts.append(part)
+    safe_key = "/".join(safe_parts)
+    obj_path = bucket_root / safe_key
+    meta_path = bucket_root / f"{safe_key}.meta.json"


Path traversal protection only filters .. and empty strings, but doesn't handle other potentially dangerous path components like . (current directory) or absolute paths. While . is generally safe, consider also checking for absolute path indicators on Windows (e.g., drive letters) if cross-platform compatibility is a concern. Additionally, the current approach silently drops dangerous parts - consider whether this could lead to unexpected behavior where a key like ../../../etc/passwd becomes etc/passwd instead of raising an error.

Suggested change

-    # Sanitize key to prevent traversal
-    key = key.lstrip("/")
-    safe_parts = []
-    for part in key.split("/"):
-        if part in ("..", ""):
-            continue
-        safe_parts.append(part)
-    safe_key = "/".join(safe_parts)
-    obj_path = bucket_root / safe_key
-    meta_path = bucket_root / f"{safe_key}.meta.json"
+    # Sanitize key to prevent traversal and reject dangerous keys
+    key = key.lstrip("/")
+    if os.path.isabs(key):
+        raise ValueError("Absolute paths are not allowed in S3 keys")
+    safe_parts = []
+    for part in key.split("/"):
+        if part in ("..", ".", ""):
+            raise ValueError(f"Invalid path component in S3 key: {part!r}")
+        # On Windows, reject drive letters (e.g., C:)
+        if os.name == "nt" and len(part) == 2 and part[1] == ":" and part[0].isalpha():
+            raise ValueError(f"Drive letter not allowed in S3 key: {part!r}")
+        safe_parts.append(part)
+    safe_key = "/".join(safe_parts)
+    obj_path = (bucket_root / safe_key).resolve()
+    meta_path = (bucket_root / f"{safe_key}.meta.json").resolve()
+    # Ensure the resolved paths are within the bucket root
+    bucket_root_resolved = bucket_root.resolve()
+    if not str(obj_path).startswith(str(bucket_root_resolved)):
+        raise ValueError("Path traversal detected in S3 key")
+    if not str(meta_path).startswith(str(bucket_root_resolved)):
+        raise ValueError("Path traversal detected in S3 key (meta)")
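One caveat with the suggested containment check: comparing strings with `startswith` also accepts sibling directories that share a prefix, since "/data/bucket2/x" starts with "/data/bucket". A minimal sketch of a component-wise check, assuming Python 3.9+ for `Path.is_relative_to`; `safe_object_path` is a hypothetical helper, not the function in `mocks/s3-mock/storage.py`.

```python
from pathlib import Path, PurePosixPath


def safe_object_path(bucket_root: Path, key: str) -> Path:
    """Map an S3 key to a path under bucket_root, rejecting any escape."""
    # PurePosixPath splits on "/" and collapses "//" and "." components
    parts = PurePosixPath(key.lstrip("/")).parts
    if not parts or ".." in parts:
        raise ValueError(f"Invalid S3 key: {key!r}")
    root = bucket_root.resolve()
    candidate = root.joinpath(*parts).resolve()  # also follows symlinks
    # Compares path components, so a sibling like "bucket2" is not inside "bucket"
    if not candidate.is_relative_to(root):
        raise ValueError(f"Path traversal detected in S3 key: {key!r}")
    return candidate
```

Resolving before the check also catches escapes through symlinks inside the bucket, which neither the ".." filter nor a check on the unresolved path would see.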

Copilot · 2025-11-02T18:51:00Z

mocks/s3-mock/smoke_test.py

+        test_delete_object()
+
+        print("\n" + "=" * 40)
+        print("✅ All tests passed!")


The success message uses an emoji '✅' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).

Suggested change

-        print("✅ All tests passed!")
+        print("All tests passed!")

Copilot · 2025-11-02T18:51:01Z

mocks/s3-mock/smoke_test.py

+        print("✅ All tests passed!")
+
+    except Exception as e:
+        print(f"\n❌ Test failed: {e}")


The error message uses an emoji '❌' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).

Suggested change

-        print(f"\n❌ Test failed: {e}")
+        print(f"\nERROR: Test failed: {e}")

Copilot · 2025-11-02T18:51:01Z

backend/modules/file_storage/mock_s3_client.py

+
+    def _calculate_etag(self, content_bytes: bytes) -> str:
+        """Calculate ETag for file content."""
+        return hashlib.md5(content_bytes).hexdigest()


MD5 is used for ETag calculation without the usedforsecurity=False parameter. MD5 is acceptable for ETag generation, which is not a security-critical operation, but since Python 3.9 hashlib lets you mark such non-security uses explicitly; without the flag, hashlib.md5 can be blocked in restricted (e.g. FIPS) environments. Consider adding usedforsecurity=False: hashlib.md5(content_bytes, usedforsecurity=False).hexdigest()

Suggested change

-        return hashlib.md5(content_bytes).hexdigest()
+        return hashlib.md5(content_bytes, usedforsecurity=False).hexdigest()

Copilot · 2025-11-02T18:51:02Z

agent_start.sh

+    echo "✅ Using Mock S3 (no Docker required)"
+else
+    # Check if MinIO is running
+    if ! docker ps | grep -q atlas-minio; then
+        echo "⚠️  MinIO is not running. Starting MinIO with docker-compose..."
+        docker-compose up -d minio minio-init
+        echo "✅ MinIO started successfully"
+        sleep 3
+    else
+        echo "✅ MinIO is already running"


The message uses an emoji '✅' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).

Suggested change

-    echo "✅ Using Mock S3 (no Docker required)"
-else
-    # Check if MinIO is running
-    if ! docker ps | grep -q atlas-minio; then
-        echo "⚠️  MinIO is not running. Starting MinIO with docker-compose..."
-        docker-compose up -d minio minio-init
-        echo "✅ MinIO started successfully"
-        sleep 3
-    else
-        echo "✅ MinIO is already running"
+    echo "Using Mock S3 (no Docker required)"
+else
+    # Check if MinIO is running
+    if ! docker ps | grep -q atlas-minio; then
+        echo "⚠️  MinIO is not running. Starting MinIO with docker-compose..."
+        docker-compose up -d minio minio-init
+        echo "MinIO started successfully"
+        sleep 3
+    else
+        echo "MinIO is already running"

Copilot · 2025-11-02T18:51:02Z

agent_start.sh

+    if ! docker ps | grep -q atlas-minio; then
+        echo "⚠️  MinIO is not running. Starting MinIO with docker-compose..."
+        docker-compose up -d minio minio-init
+        echo "✅ MinIO started successfully"


The message uses an emoji '✅' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).

Suggested change

-        echo "✅ MinIO started successfully"
+        echo "MinIO started successfully"

Copilot · 2025-11-02T18:51:02Z

agent_start.sh

+    echo "✅ Using Mock S3 (no Docker required)"
+else
+    # Check if MinIO is running
+    if ! docker ps | grep -q atlas-minio; then
+        echo "⚠️  MinIO is not running. Starting MinIO with docker-compose..."
+        docker-compose up -d minio minio-init
+        echo "✅ MinIO started successfully"
+        sleep 3
+    else
+        echo "✅ MinIO is already running"


The message uses an emoji '✅' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).

Suggested change

-    echo "✅ Using Mock S3 (no Docker required)"
-else
-    # Check if MinIO is running
-    if ! docker ps | grep -q atlas-minio; then
-        echo "⚠️  MinIO is not running. Starting MinIO with docker-compose..."
-        docker-compose up -d minio minio-init
-        echo "✅ MinIO started successfully"
-        sleep 3
-    else
-        echo "✅ MinIO is already running"
+    echo "Using Mock S3 (no Docker required)"
+else
+    # Check if MinIO is running
+    if ! docker ps | grep -q atlas-minio; then
+        echo "MinIO is not running. Starting MinIO with docker-compose..."
+        docker-compose up -d minio minio-init
+        echo "MinIO started successfully"
+        sleep 3
+    else
+        echo "MinIO is already running"

Copilot · 2025-11-02T18:51:02Z

agent_start.sh

+else
+    # Check if MinIO is running
+    if ! docker ps | grep -q atlas-minio; then
+        echo "⚠️  MinIO is not running. Starting MinIO with docker-compose..."


The warning message uses an emoji '⚠️' which violates the project's coding guidelines that explicitly state 'No emojis in code or docs' (from GEMINI.md and CLAUDE.md).

Suggested change

-        echo "⚠️  MinIO is not running. Starting MinIO with docker-compose..."
+        echo "MinIO is not running. Starting MinIO with docker-compose..."

Copilot · 2025-11-02T18:51:03Z

backend/modules/file_storage/mock_s3_client.py

+                        if key_elem is not None and value_elem is not None:
+                            tags[key_elem.text] = value_elem.text
+                except ET.ParseError:
+                    pass


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change

-                    pass
+                    # Failed to parse tags XML; tags will be left empty. This is non-fatal as tags are optional.
+                    logger.warning(f"Failed to parse tags XML for file {sanitize_for_logging(file_key)}", exc_info=True)
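Putting the two tag-parsing comments together (catch only `ET.ParseError`, and log instead of silently passing), a minimal sketch of a standalone parser might look like this. `parse_tagging_xml` and `_local_name` are hypothetical; the sketch uses `%r` where the project uses its own `sanitize_for_logging` helper, and it strips XML namespaces because S3 tagging documents may carry one, which would make plain `find("Key")` lookups miss.

```python
import logging
import xml.etree.ElementTree as ET
from typing import Dict, Optional

logger = logging.getLogger(__name__)


def _local_name(tag: str) -> str:
    # Drop a "{namespace}" prefix, if present
    return tag.rsplit("}", 1)[-1]


def parse_tagging_xml(xml_text: str, file_key: str) -> Dict[str, str]:
    """Parse <Tag><Key/><Value/></Tag> pairs; return {} on malformed XML."""
    tags: Dict[str, str] = {}
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Non-fatal: tags are optional, so record the failure and continue
        logger.warning("Failed to parse tags XML for file %r", file_key, exc_info=True)
        return tags
    for elem in root.iter():
        if _local_name(elem.tag) != "Tag":
            continue
        fields: Dict[str, Optional[str]] = {_local_name(c.tag): c.text for c in elem}
        key = fields.get("Key")
        if key is not None and "Value" in fields:
            tags[key] = fields["Value"] or ""  # empty <Value/> has text None
    return tags
```

Narrowing the handler to `ET.ParseError` means `KeyboardInterrupt`, `SystemExit`, and genuine bugs still propagate, while a malformed tagging document only costs the tags.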

garland3 added 3 commits November 2, 2025 05:20

added the progress demo back in

4582939

chore: add .ruff_cache to .gitignore

542dd48

Add .ruff_cache to .gitignore to exclude Ruff linter cache files from version control, preventing unnecessary commits of generated cache data.

garland3 requested a review from Copilot November 2, 2025 18:19

github-advanced-security bot found potential problems Nov 2, 2025

Copilot AI reviewed Nov 2, 2025

garland3 added 2 commits November 2, 2025 18:30

garland3 requested a review from Copilot November 2, 2025 18:48

Copilot AI reviewed Nov 2, 2025

refactor: improve logging messages and enhance security checks in S3 mock implementation

2343ecf

garland3 merged commit 08e0e4d into main Nov 2, 2025
5 checks passed

garland3 deleted the wip-bring-back-s3-mock-issue-36 branch November 2, 2025 19:08

garland3 mentioned this pull request Nov 2, 2025

Bring back the s3 mock #36

Closed
