Skip to content

Conversation

@garland3
Copy link
Collaborator

minio s3 integration

- Implement handle_attach_file method in ChatService to attach user files to sessions via S3 key
- Update WebSocket connection adapter to authenticate user from query params or config fallback
- Handle "attach_file" message type in WebSocket endpoint, calling handle_attach_file with authenticated user
- Ensures file ownership verification and adds file to session context securely
@garland3 garland3 requested a review from Copilot October 25, 2025 20:45
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a MinIO S3 integration and adds a File Library feature that allows users to view, manage, and load files across all sessions, not just the current session.

Key changes:

  • Added MinIO S3 storage integration with system configuration files
  • Implemented a new File Library UI with tabbed interface (Session Files vs. All Files)
  • Added backend support for attaching files from library to current session via WebSocket
  • Enhanced file management with search, filter, and sort capabilities

Reviewed Changes

Copilot reviewed 11 out of 31 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
minio-data/.minio.sys/format.json Removed MinIO system configuration file (should not be in version control)
frontend/src/components/SessionFilesView.jsx Created new component for displaying session-specific files, extracted from FileManager
frontend/src/components/FileManagerPanel.jsx Added tabbed interface to switch between Session Files and File Library views
frontend/src/components/FileManager.jsx Converted to re-export of SessionFilesView for backward compatibility
frontend/src/components/AllFilesView.jsx New component for browsing all user files with search, filter, and sort capabilities
docs/file_library_implementation.md Added implementation plan documentation for the File Library feature
backend/tests/test_file_library.py Added test file with placeholder tests for File Library functionality
backend/routes/files_routes.py Added file size validation and datetime serialization fixes for file listing
backend/main.py Added WebSocket message handler for attach_file messages and user authentication
backend/infrastructure/transport/websocket_connection_adapter.py Added user_email field to WebSocket adapter for authentication
backend/application/chat/service.py Implemented handle_attach_file method to attach library files to current session
Comments suppressed due to low confidence (1)

minio-data/.minio.sys/format.json:1

  • MinIO system configuration files should not be committed to version control as they contain instance-specific metadata and could expose system configuration details. Add minio-data/ to .gitignore to prevent these files from being tracked.

@@ -0,0 +1,133 @@
import { useState, useEffect } from 'react'
Copy link

Copilot AI Oct 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The imports include useState and useEffect from React, but neither hook is used in this component. Remove unused imports to keep the code clean.

Suggested change
import { useState, useEffect } from 'react'

Copilot uses AI. Check for mistakes.
Comment on lines 130 to 139
if isinstance(processed_file.get('last_modified'), str):
# If already a string, keep it
pass
else:
# Convert datetime to ISO format string
try:
processed_file['last_modified'] = processed_file['last_modified'].isoformat()
except AttributeError:
# If it's not a datetime object, convert to string
processed_file['last_modified'] = str(processed_file['last_modified'])
Copy link

Copilot AI Oct 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The empty pass statement in the if block when last_modified is already a string is unnecessary. Restructure the logic to eliminate it by inverting the condition: check if it's NOT a string first, then convert it.

Copilot uses AI. Check for mistakes.
Comment on lines +183 to +185
# Fallback to test user or require auth
config_manager = app_factory.get_config_manager()
user_email = config_manager.app_settings.test_user or 'test@test.com'
Copy link

Copilot AI Oct 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a hardcoded default email 'test@test.com' as a fallback for authentication bypasses proper user authentication. This could allow unauthorized access in production. Instead, reject the WebSocket connection when user_email is not provided or authentication fails.

Suggested change
# Fallback to test user or require auth
config_manager = app_factory.get_config_manager()
user_email = config_manager.app_settings.test_user or 'test@test.com'
# Reject connection if user is not provided or authentication fails
await websocket.close(code=4401, reason="Unauthorized: user authentication required")
return

Copilot uses AI. Check for mistakes.
Comment on lines 60 to 67
# Validate base64 content size (configurable limit to prevent abuse)
try:
content_size = len(request.content_base64) * 3 // 4 # approximate decoded size
max_size = 50 * 1024 * 1024 # 50MB default (configurable)
if content_size > max_size:
raise HTTPException(status_code=413, detail=f"File too large. Maximum size is {max_size // (1024*1024)}MB")
except Exception:
raise HTTPException(status_code=400, detail="Invalid base64 content")
Copy link

Copilot AI Oct 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The except block catches all exceptions and returns 'Invalid base64 content', but this could mask the HTTPException(413) raised for file size. Only catch exceptions related to base64 validation (e.g., during length calculation) and let HTTPException propagate.

Copilot uses AI. Check for mistakes.
@garland3 garland3 requested a review from Copilot October 25, 2025 20:55
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 11 out of 31 changed files in this pull request and generated 4 comments.

"""Upload a file to S3 storage."""
# Validate base64 content size (configurable limit to prevent abuse)
try:
content_size = len(request.content_base64) * 3 // 4 # approximate decoded size
Copy link

Copilot AI Oct 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base64 size estimation formula * 3 // 4 should be documented more thoroughly. This approximation doesn't account for padding characters and may underestimate the actual decoded size by up to 2 bytes, potentially allowing files slightly larger than the intended limit.

Suggested change
content_size = len(request.content_base64) * 3 // 4 # approximate decoded size
# Decode base64 to get the exact size of the file in bytes.
decoded_content = base64.b64decode(request.content_base64, validate=True)
content_size = len(decoded_content)

Copilot uses AI. Check for mistakes.
backend/main.py Outdated
Comment on lines 181 to 185
user_email = websocket.query_params.get('user')
if not user_email:
# Reject connection if user is not provided or authentication fails
await websocket.close(code=4401, reason="Unauthorized: user authentication required")
return
Copy link

Copilot AI Oct 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WebSocket authentication relies solely on a query parameter without any validation or token verification. This allows any client to impersonate any user by simply providing an email address. Implement proper authentication using the existing token-based system.

Copilot uses AI. Check for mistakes.
Comment on lines +337 to +343
if not self.file_manager or not user_email:
return {
"type": "file_attach",
"s3_key": s3_key,
"success": False,
"error": "File manager not available or no user email"
}
Copy link

Copilot AI Oct 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message combines two distinct failure cases. Split this into separate error messages: 'File manager not available' and 'User email required' for clearer debugging and better user experience.

Suggested change
if not self.file_manager or not user_email:
return {
"type": "file_attach",
"s3_key": s3_key,
"success": False,
"error": "File manager not available or no user email"
}
if not self.file_manager:
return {
"type": "file_attach",
"s3_key": s3_key,
"success": False,
"error": "File manager not available"
}
if not user_email:
return {
"type": "file_attach",
"s3_key": s3_key,
"success": False,
"error": "User email required"
}

Copilot uses AI. Check for mistakes.
const fetchAllFiles = async () => {
try {
setLoading(true)
const response = await fetch('/api/files?limit=1000', {
Copy link

Copilot AI Oct 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard-coded limit of 1000 files may cause performance issues with large datasets. Consider implementing pagination or virtual scrolling, especially since all files are loaded into memory for client-side filtering and sorting.

Copilot uses AI. Check for mistakes.
Update websocket endpoint to authenticate users primarily through X-Authenticated-User header set by authenticated reverse proxy in production, falling back to insecure query parameter for local development. Add comprehensive security documentation explaining the authentication flow, production requirements, and risks of direct access. This ensures secure connections in production while maintaining development flexibility.
Relocated the /files/healthz endpoint to appear before the /files/{file_key:path} route in files_routes.py to avoid path capture conflicts in the router. The endpoint now includes an explanatory note in its docstring for clarity.
Prevent leading '-' in tags by conditionally enabling SHA types: use branch prefix for branches and 'pr-' for pull requests.
- Replace template variables with ${{ github.ref_name }}, ${{ github.event_name }}, and ${{ github.ref }} for safer and more accurate branch/PR tagging
- Simplify enabling conditions for SHA tags on branches and PRs
- Set 'latest' tag only on main branch push
- Remove trailing newline_clip económicallyselectedocument at the end of the file.
- In chat service, reduced verbosity in log messages by removing server/prompt_name and filename details to streamline output.
- Removed unused response variable assignment in websocket endpoint.
- Eliminated unused Filter icon import from frontend component.
garland3 and others added 3 commits October 25, 2025 15:58
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…error

Previously, the error log for failed file attachments included raw s3_key and exception details, potentially introducing multi-line entries or injection risks. This update strips '\n' and '\r' characters from both to ensure clean, single-line log output for better readability and security.
@garland3 garland3 merged commit fe8ace1 into main Oct 25, 2025
7 of 8 checks passed
@garland3 garland3 deleted the minio branch October 25, 2025 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants