-
Notifications
You must be signed in to change notification settings - Fork 5
Fix file management #24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ndling - Add detailed step logging in tool_utils.py for execute_tools_workflow, execute_single_tool, and prepare_tool_arguments functions to trace tool execution process - Modify llm logging in error_utils.py to output full llm_response instead of just has_tool_calls for better debugging - Update pdfbasic main.py to include logging import and adjust _analyze_pdf_content function to accept optional original_filename parameter for improved context handling in PDF analysis
- Refactor chat service to directly manage file references in session context for existing S3 files, bypassing complex file handling utilities and improving efficiency - Modify PDF analysis tool to generate in-memory PDF reports with word frequency summaries, providing better visual output for text analytics
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds file transfer support for MCP tools and improves debugging capabilities. The changes enable tools to accept files via URLs (uploaded to S3) instead of base64 encoding, add new test tools for file handling, and enhance logging throughout the tool execution pipeline.
Key changes:
- Added URL-based file transfer support for MCP tools (alongside existing base64)
- Introduced two new MCP tools:
pdfbasic(PDF analysis) andfile_size_test(file transfer testing) - Enhanced logging with debug "Step" markers throughout tool execution flow
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 17 comments.
Show a summary per file
| File | Description |
|---|---|
| config/overrides/mcp.json | Added configuration for two new MCP tools: pdfbasic and file_size_test |
| base-image-update-plan.md | Removed outdated Ubuntu to Fedora migration plan document |
| backend/modules/mcp_tools/client.py | Added debug logging at tool execution entry point |
| backend/modules/llm/litellm_caller.py | Added validation when tool_choice is "required" to ensure LLM returns tool calls |
| backend/mcp/pdfbasic/main.py | Modified to accept URLs in addition to base64, added URL download logic, extensive logging |
| backend/mcp/file_size_test/main.py | New file implementing test tools for file transfer validation |
| backend/application/chat/utilities/tool_utils.py | Added debug logging to track tool execution workflow steps |
| backend/application/chat/utilities/error_utils.py | Changed logging to show full LLM response object for debugging |
| backend/application/chat/service.py | Simplified file attachment to use direct S3 client access and emit files update |
backend/mcp/pdfbasic/main.py
Outdated
| """ | ||
| Extract and analyze text content from PDF documents with comprehensive word frequency analysis. | ||
| This PDF processing tool provides detailed text analytics for PDF documents: |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spacing: 'This PDF' contains double space, should be 'This PDF'.
| This PDF processing tool provides detailed text analytics for PDF documents: | |
| This PDF processing tool provides detailed text analytics for PDF documents: |
backend/mcp/pdfbasic/main.py
Outdated
| if filename.startswith("/"): | ||
| # Construct absolute URL from relative path | ||
| # Default to localhost:8000 for local development | ||
| import os |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import statement should be moved to the top of the file with other imports. The os module is imported inside a conditional block, but it's a standard library import and should be at module level for consistency and readability.
backend/mcp/file_size_test/main.py
Outdated
| print(f"DEBUG: username: {username}") | ||
| try: |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.
| filename.startswith("/") | ||
| ) | ||
| print(f"DEBUG: is_url determined as: {is_url}") | ||
|
|
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.
| original_filename = "processed_file.txt" | ||
|
|
||
| print(f"DEBUG: Original file size: {len(file_bytes)} bytes") | ||
|
|
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.
backend/mcp/file_size_test/main.py
Outdated
| print(f"DEBUG: Exception type: {type(e).__name__}") | ||
| print(f"DEBUG: Filename that caused error: {filename}") | ||
| import traceback | ||
| print("DEBUG: Full traceback:") | ||
| traceback.print_exc() |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.
| } | ||
| } | ||
| print(f"DEBUG: About to return error result: {error_result}") | ||
| return error_result |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.
| try: | ||
| # Get file metadata | ||
| file_result = await self.file_manager.get_file(user_email, s3_key) | ||
| file_result = await self.file_manager.s3_client.get_file(user_email, s3_key) |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Direct access to s3_client breaks encapsulation. The file_manager should provide a method to retrieve files rather than exposing its internal s3_client. This creates tight coupling and makes the code harder to maintain or refactor.
| file_result = await self.file_manager.s3_client.get_file(user_email, s3_key) | |
| file_result = await self.file_manager.get_file(user_email, s3_key) |
| analysis_result = _analyze_pdf_content(instructions, filename, file_data_base64) | ||
| if "error" in analysis_result: | ||
| return analysis_result # Return the error if analysis failed | ||
| analysis_result = _analyze_pdf_content(instructions, filename, original_filename) |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error handling logic changed from checking top-level 'error' key to checking within 'results'. This is inconsistent with the error return structure in _analyze_pdf_content (line 122) which returns {'results': {'error': ...}}. However, the function should also handle cases where the entire analysis fails and returns {'error': ...} at the top level, not nested in results.
| analysis_result = _analyze_pdf_content(instructions, filename, original_filename) | |
| analysis_result = _analyze_pdf_content(instructions, filename, original_filename) | |
| if "error" in analysis_result: | |
| return analysis_result |
| """ | ||
|
|
||
| import base64 | ||
| import os |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'io' is not used.
| import os |
- Replace print() debugging statements with proper logger calls in file_size_test/main.py - Move os import to top of pdfbasic/main.py for consistency - Fix double space typo in pdfbasic/main.py docstring 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
Issue #20 is addressed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
|
|
||
| if tool_choice == "required" and not getattr(message, 'tool_calls', None): | ||
| logger.error(f"LLM failed to return tool calls when tool_choice was 'required'. Full response: {response}") | ||
| raise ValueError("LLM failed to return tool calls when tool_choice was 'required'.") |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This validation check will prevent the existing fallback mechanism from working. Lines 222-242 implement retry logic with tool_choice='auto' when 'required' fails, but this new check raises an exception before that retry logic can execute. The exception should only be raised after the retry logic has also failed, or the check should be removed to allow the existing error handling to work.
| raise ValueError("LLM failed to return tool calls when tool_choice was 'required'.") | |
| raise Exception("LLM failed to return tool calls when tool_choice was 'required'.") |
|
|
||
|
|
||
| def _analyze_pdf_content(instructions: str, filename: str, file_data_base64: str) -> Dict[str, Any]: | ||
| def _analyze_pdf_content(instructions: str, filename: str, original_filename: Optional[str] = None) -> Dict[str, Any]: |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function signature changed to remove the file_data_base64 parameter but the parameter name filename is now overloaded to mean either a URL or base64-encoded data (line 74). This is confusing API design. Consider renaming the parameter to file_reference or file_input to better reflect its dual purpose.
| if filename.startswith("/"): | ||
| # Construct absolute URL from relative path | ||
| # Default to localhost:8000 for local development | ||
| backend_url = os.getenv("BACKEND_URL", "http://localhost:8000") |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BACKEND_URL environment variable is used without validation. If this URL is user-controllable or comes from an untrusted source, it could enable SSRF attacks. Consider validating that the URL points to an allowed domain or use a configuration-based whitelist.
|
|
||
| if is_url: | ||
| if filename.startswith("/"): | ||
| backend_url = os.getenv("BACKEND_URL", "http://localhost:8000") |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BACKEND_URL environment variable is used without validation. If this URL is user-controllable or comes from an untrusted source, it could enable SSRF attacks. Consider validating that the URL points to an allowed domain or use a configuration-based whitelist.
backend/mcp/pdfbasic/main.py
Outdated
| url = filename | ||
|
|
||
| logger.info(f"Step 9: Downloading file from URL: {url}") | ||
| response = requests.get(url) |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The requests.get() call lacks a timeout parameter, which could cause the application to hang indefinitely if the server doesn't respond. Add a timeout parameter (e.g., timeout=30) to prevent indefinite blocking.
| response = requests.get(url) | |
| response = requests.get(url, timeout=30) |
backend/mcp/file_size_test/main.py
Outdated
| else: | ||
| url = filename | ||
| logger.info(f"Downloading file for processing: {url}") | ||
| response = requests.get(url) |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The requests.get() calls lack timeout parameters, which could cause the application to hang indefinitely if the server doesn't respond. Add a timeout parameter (e.g., timeout=30) to prevent indefinite blocking.
| response = requests.get(url) | |
| response = requests.get(url, timeout=30) |
backend/mcp/file_size_test/main.py
Outdated
|
|
||
| logger.debug(f"About to download from URL: {url}") | ||
| logger.info(f"Downloading file from URL: {url}") | ||
| response = requests.get(url) |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The requests.get() calls lack timeout parameters, which could cause the application to hang indefinitely if the server doesn't respond. Add a timeout parameter (e.g., timeout=30) to prevent indefinite blocking.
| response = requests.get(url) | |
| response = requests.get(url, timeout=30) |
| if "error" in analysis_result: | ||
| return analysis_result # Return the error if analysis failed | ||
| analysis_result = _analyze_pdf_content(instructions, filename, original_filename) | ||
| if "error" in analysis_result.get("results", {}): |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling logic changed from checking if \"error\" in analysis_result (line 275 in old code) to checking within the nested 'results' key. However, looking at line 122, errors are returned as {\"results\": {\"error\": ...}}, so the new check is correct, but if _analyze_pdf_content ever returns errors in a different format (e.g., at the top level), they won't be caught. Consider maintaining consistency in error response structure or handling both formats.
| if "error" in analysis_result.get("results", {}): | |
| if "error" in analysis_result or "error" in analysis_result.get("results", {}): |
- Add 30-second timeout to all requests.get() calls to prevent indefinite hangs - Replace traceback.print_exc() with logger.exception() for better logging - Consolidate error logging messages for improved readability 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This pull request introduces several improvements and new features related to file handling and tool integration, particularly focusing on file transfer, processing, and PDF analysis. The main highlights include a new MCP server for file size testing, enhanced PDF tool support for URL and base64 file inputs, and improved logging and debugging throughout the tool execution workflow.
New Features:
file_size_test/main.pythat provides two tools:get_file_size(returns the size of a transferred file, supporting both URLs and base64 data) andprocess_file_demo(demonstrates file processing and artifact return using the v2 MCP artifacts contract).PDF Tool Enhancements:
pdfbasic/main.pyto support file input via both URLs (including relative API paths) and base64, improving compatibility with frontend file transfer workflows. The PDF analysis tools now use the original filename when available and provide more robust error handling and logging. [1] [2] [3] [4] [5] [6] [7]File Attachment and Session Context Improvements:
chat/service.pyto directly add file references to the session context (instead of using a utility function), and emit afiles_updateevent to notify the UI after attachment. [1] [2]Logging and Debugging Improvements:
tool_utils.pyto aid in debugging and tracing tool calls, argument preparation, and file URL rewriting. [1] [2] [3] [4] [5]error_utils.pyfor better traceability.