Fix file management #24

garland3 · 2025-10-30T00:53:10Z

This pull request introduces several improvements and new features related to file handling and tool integration, particularly focusing on file transfer, processing, and PDF analysis. The main highlights include a new MCP server for file size testing, enhanced PDF tool support for URL and base64 file inputs, and improved logging and debugging throughout the tool execution workflow.

New Features:

Added a new MCP server file_size_test/main.py that provides two tools: get_file_size (returns the size of a transferred file, supporting both URLs and base64 data) and process_file_demo (demonstrates file processing and artifact return using the v2 MCP artifacts contract).

PDF Tool Enhancements:

Updated pdfbasic/main.py to support file input via both URLs (including relative API paths) and base64, improving compatibility with frontend file transfer workflows. The PDF analysis tools now use the original filename when available and provide more robust error handling and logging. [1] [2] [3] [4] [5] [6] [7]
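The three input forms can be told apart with a simple prefix check. The sketch below is a minimal illustration, not the PR's code. `resolve_file_input` is a hypothetical name. It assumes, as the review comments later in this thread suggest, that the accepted forms are absolute http(s) URLs, relative API paths starting with "/", and base64 strings, and that relative paths are resolved against `BACKEND_URL` (default `http://localhost:8000`).

```python
import base64
import os
from typing import Tuple, Union


def resolve_file_input(file_input: str) -> Tuple[str, Union[str, bytes]]:
    """Classify a tool's file argument (hypothetical helper).

    Returns ("url", absolute_url) or ("base64", decoded_bytes).
    """
    # Absolute URLs are used as-is.
    if file_input.startswith(("http://", "https://")):
        return "url", file_input
    # Relative API paths are resolved against the backend, defaulting to local dev.
    if file_input.startswith("/"):
        backend_url = os.getenv("BACKEND_URL", "http://localhost:8000")
        return "url", backend_url.rstrip("/") + file_input
    # Anything else is assumed to be base64-encoded file content.
    return "base64", base64.b64decode(file_input, validate=True)
```

One caveat with a bare `startswith("/")` test is that "/" belongs to the base64 alphabet. Base64-encoded JPEG data, for example, begins with "/9j/", so a tool that also accepted images would need a stricter check, such as a known API path prefix.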

File Attachment and Session Context Improvements:

Refactored file attachment logic in chat/service.py to directly add file references to the session context (instead of using a utility function), and emit a files_update event to notify the UI after attachment. [1] [2]

Logging and Debugging Improvements:

Added detailed step-by-step logging throughout the tool execution workflow in tool_utils.py to aid in debugging and tracing tool calls, argument preparation, and file URL rewriting. [1] [2] [3] [4] [5]
Improved logging of LLM tool call responses in error_utils.py for better traceability.

…ndling
- Add detailed step logging in tool_utils.py for the execute_tools_workflow, execute_single_tool, and prepare_tool_arguments functions to trace the tool execution process
- Modify LLM logging in error_utils.py to output the full llm_response instead of just has_tool_calls, for better debugging
- Update pdfbasic main.py to import logging and adjust the _analyze_pdf_content function to accept an optional original_filename parameter for better context in PDF analysis

- Refactor chat service to directly manage file references in session context for existing S3 files, bypassing complex file handling utilities and improving efficiency
- Modify PDF analysis tool to generate in-memory PDF reports with word frequency summaries, providing better visual output for text analytics

backend/mcp/file_size_test/main.py

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

Copilot

Pull Request Overview

This PR adds file transfer support for MCP tools and improves debugging capabilities. The changes let tools accept files via URLs (uploaded to S3) in addition to base64 encoding, add new test tools for file handling, and enhance logging throughout the tool execution pipeline.

Key changes:

Added URL-based file transfer support for MCP tools (alongside existing base64)
Introduced two new MCP tools: pdfbasic (PDF analysis) and file_size_test (file transfer testing)
Enhanced logging with debug "Step" markers throughout tool execution flow

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 17 comments.


File	Description
config/overrides/mcp.json	Added configuration for two new MCP tools: pdfbasic and file_size_test
base-image-update-plan.md	Removed outdated Ubuntu to Fedora migration plan document
backend/modules/mcp_tools/client.py	Added debug logging at tool execution entry point
backend/modules/llm/litellm_caller.py	Added validation when tool_choice is "required" to ensure LLM returns tool calls
backend/mcp/pdfbasic/main.py	Modified to accept URLs in addition to base64, added URL download logic, extensive logging
backend/mcp/file_size_test/main.py	New file implementing test tools for file transfer validation
backend/application/chat/utilities/tool_utils.py	Added debug logging to track tool execution workflow steps
backend/application/chat/utilities/error_utils.py	Changed logging to show full LLM response object for debugging
backend/application/chat/service.py	Simplified file attachment to use direct S3 client access and emit files update

Copilot · 2025-10-30T00:57:41Z

backend/mcp/pdfbasic/main.py

+    """
+    Extract and analyze text content from PDF documents with comprehensive word frequency analysis.
+
+    This  PDF processing tool provides detailed text analytics for PDF documents:


Corrected spacing: 'This  PDF' contains a double space and should be 'This PDF'.

Suggested change

This  PDF processing tool provides detailed text analytics for PDF documents:

This PDF processing tool provides detailed text analytics for PDF documents:

Copilot · 2025-10-30T00:57:42Z

backend/mcp/pdfbasic/main.py

+            if filename.startswith("/"):
+                # Construct absolute URL from relative path
+                # Default to localhost:8000 for local development
+                import os


Import statement should be moved to the top of the file with other imports. The os module is imported inside a conditional block, but it's a standard library import and should be at module level for consistency and readability.

Copilot · 2025-10-30T00:57:42Z

backend/mcp/file_size_test/main.py

+    print(f"DEBUG: username: {username}")
+    try:


Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.

Copilot · 2025-10-30T00:57:42Z

backend/mcp/file_size_test/main.py

+            filename.startswith("/")
+        )
+        print(f"DEBUG: is_url determined as: {is_url}")
+


Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.

Copilot · 2025-10-30T00:57:42Z

backend/mcp/file_size_test/main.py

+            original_filename = "processed_file.txt"
+
+        print(f"DEBUG: Original file size: {len(file_bytes)} bytes")
+


Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.

Copilot · 2025-10-30T00:57:43Z

backend/mcp/file_size_test/main.py

+        print(f"DEBUG: Exception type: {type(e).__name__}")
+        print(f"DEBUG: Filename that caused error: {filename}")
+        import traceback
+        print("DEBUG: Full traceback:")
+        traceback.print_exc()


Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.
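Applied to the snippets flagged above, the replacement might look like the following sketch. `decoded_size` and the `size_bytes` key are hypothetical. The `{"results": {...}}` shape follows the error structure mentioned later in this review. `logger.exception` logs at ERROR level and appends the active traceback, so it covers both the `print()` calls and `traceback.print_exc()`.

```python
import base64
import logging

logger = logging.getLogger(__name__)


def decoded_size(file_data_base64: str) -> dict:
    """Hypothetical example: return the decoded size of a base64 payload."""
    logger.debug("Decoding %d base64 characters", len(file_data_base64))
    try:
        file_bytes = base64.b64decode(file_data_base64, validate=True)
    except ValueError:
        # binascii.Error subclasses ValueError; logger.exception records the traceback.
        logger.exception("Failed to decode file data")
        return {"results": {"error": "invalid base64 input"}}
    logger.debug("Original file size: %d bytes", len(file_bytes))
    return {"results": {"size_bytes": len(file_bytes)}}
```

Passing values as arguments (`logger.debug("... %d", n)`) rather than f-strings defers formatting until a record is actually emitted, which helps when debug logging is disabled.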

Copilot · 2025-10-30T00:57:44Z

backend/mcp/file_size_test/main.py

+            }
+        }
+        print(f"DEBUG: About to return error result: {error_result}")
+        return error_result


Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.

Copilot · 2025-10-30T00:57:44Z

backend/application/chat/service.py

        try:
            # Get file metadata
-            file_result = await self.file_manager.get_file(user_email, s3_key)
+            file_result = await self.file_manager.s3_client.get_file(user_email, s3_key)


Direct access to s3_client breaks encapsulation. The file_manager should provide a method to retrieve files rather than exposing its internal s3_client. This creates tight coupling and makes the code harder to maintain or refactor.

Suggested change

file_result = await self.file_manager.s3_client.get_file(user_email, s3_key)

file_result = await self.file_manager.get_file(user_email, s3_key)
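A minimal sketch of the encapsulated version, assuming `FileManager.get_file` can simply delegate to the storage client. Both classes are simplified stand-ins. `InMemoryS3Client` is hypothetical, and the real `FileManager.get_file` may do more than this.

```python
import asyncio
from typing import Any, Dict, Optional, Tuple


class InMemoryS3Client:
    """Hypothetical stand-in for the real S3 client, keyed by (user, key)."""

    def __init__(self) -> None:
        self.objects: Dict[Tuple[str, str], Dict[str, Any]] = {}

    async def get_file(self, user_email: str, s3_key: str) -> Optional[Dict[str, Any]]:
        return self.objects.get((user_email, s3_key))


class FileManager:
    def __init__(self, s3_client: InMemoryS3Client) -> None:
        # Leading underscore signals that callers should not reach in directly.
        self._s3_client = s3_client

    async def get_file(self, user_email: str, s3_key: str) -> Optional[Dict[str, Any]]:
        # Single point of change for storage swaps, caching, or access checks.
        return await self._s3_client.get_file(user_email, s3_key)


if __name__ == "__main__":
    storage = InMemoryS3Client()
    storage.objects[("alice@example.com", "uploads/report.pdf")] = {"size": 1024}
    manager = FileManager(storage)
    print(asyncio.run(manager.get_file("alice@example.com", "uploads/report.pdf")))
```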

Copilot · 2025-10-30T00:57:44Z

backend/mcp/pdfbasic/main.py

-    analysis_result = _analyze_pdf_content(instructions, filename, file_data_base64)
-    if "error" in analysis_result:
-        return analysis_result # Return the error if analysis failed
+    analysis_result = _analyze_pdf_content(instructions, filename, original_filename)


The error handling logic changed from checking the top-level 'error' key to checking within 'results'. This is consistent with the error return structure in _analyze_pdf_content (line 122), which returns {'results': {'error': ...}}. However, the function should also handle cases where the entire analysis fails and returns {'error': ...} at the top level, not nested in results.

Suggested change

analysis_result = _analyze_pdf_content(instructions, filename, original_filename)

analysis_result = _analyze_pdf_content(instructions, filename, original_filename)
if "error" in analysis_result:
    return analysis_result

Copilot · 2025-10-30T00:57:44Z

backend/mcp/file_size_test/main.py

+"""
+
+import base64
+import os


Import of 'io' is not used.

Suggested change

import os

- Replace print() debugging statements with proper logger calls in file_size_test/main.py
- Move os import to top of pdfbasic/main.py for consistency
- Fix double space typo in pdfbasic/main.py docstring

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

garland3 · 2025-10-30T01:08:05Z

Issue #20 is addressed

Copilot

Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

Copilot · 2025-10-30T01:12:35Z

backend/modules/llm/litellm_caller.py

+
+            if tool_choice == "required" and not getattr(message, 'tool_calls', None):
+                logger.error(f"LLM failed to return tool calls when tool_choice was 'required'. Full response: {response}")
+                raise ValueError("LLM failed to return tool calls when tool_choice was 'required'.")


This validation check will prevent the existing fallback mechanism from working. Lines 222-242 implement retry logic with tool_choice='auto' when 'required' fails, but this new check raises an exception before that retry logic can execute. The exception should only be raised after the retry logic has also failed, or the check should be removed to allow the existing error handling to work.

Suggested change

raise ValueError("LLM failed to return tool calls when tool_choice was 'required'.")

raise Exception("LLM failed to return tool calls when tool_choice was 'required'.")

Copilot · 2025-10-30T01:12:35Z

backend/mcp/pdfbasic/main.py



-def _analyze_pdf_content(instructions: str, filename: str, file_data_base64: str) -> Dict[str, Any]:
+def _analyze_pdf_content(instructions: str, filename: str, original_filename: Optional[str] = None) -> Dict[str, Any]:


The function signature changed to remove the file_data_base64 parameter, but the parameter name filename is now overloaded to mean either a URL or base64-encoded data (line 74). This is confusing API design. Consider renaming the parameter to file_reference or file_input to better reflect its dual purpose.

Copilot · 2025-10-30T01:12:36Z

backend/mcp/pdfbasic/main.py

+            if filename.startswith("/"):
+                # Construct absolute URL from relative path
+                # Default to localhost:8000 for local development
+                backend_url = os.getenv("BACKEND_URL", "http://localhost:8000")


The BACKEND_URL environment variable is used without validation. If this URL is user-controllable or comes from an untrusted source, it could enable SSRF attacks. Consider validating that the URL points to an allowed domain or use a configuration-based whitelist.

Copilot · 2025-10-30T01:12:36Z

backend/mcp/file_size_test/main.py

+
+        if is_url:
+            if filename.startswith("/"):
+                backend_url = os.getenv("BACKEND_URL", "http://localhost:8000")


The BACKEND_URL environment variable is used without validation. If this URL is user-controllable or comes from an untrusted source, it could enable SSRF attacks. Consider validating that the URL points to an allowed domain or use a configuration-based whitelist.
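A minimal allowlist sketch, assuming a hypothetical `ALLOWED_HOSTS` set that would come from configuration in practice. `build_download_url` is also hypothetical. The sketch validates the final URL rather than only the environment variable. That also catches a tool argument such as "//other-host/x", which starts with "/" but which `urljoin` resolves to a different host.

```python
import os
from urllib.parse import urljoin, urlparse

# Hypothetical allowlist; in practice this would come from configuration.
ALLOWED_HOSTS = {"localhost", "127.0.0.1"}


def build_download_url(path_or_url: str) -> str:
    """Resolve a relative API path and refuse anything outside the allowlist."""
    if path_or_url.startswith("/"):
        backend_url = os.getenv("BACKEND_URL", "http://localhost:8000")
        url = urljoin(backend_url, path_or_url)
    else:
        url = path_or_url
    parsed = urlparse(url)
    # Only plain HTTP(S) to known hosts; blocks file://, other schemes and hosts.
    if parsed.scheme not in ("http", "https") or parsed.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"Refusing to download from disallowed URL: {url}")
    return url
```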

Copilot · 2025-10-30T01:12:36Z

backend/mcp/pdfbasic/main.py

+                url = filename
+
+            logger.info(f"Step 9: Downloading file from URL: {url}")
+            response = requests.get(url)


The requests.get() call lacks a timeout parameter, which could cause the application to hang indefinitely if the server doesn't respond. Add a timeout parameter (e.g., timeout=30) to prevent indefinite blocking.

Suggested change

response = requests.get(url)

response = requests.get(url, timeout=30)

Copilot · 2025-10-30T01:12:36Z

backend/mcp/file_size_test/main.py

+            else:
+                url = filename
+            logger.info(f"Downloading file for processing: {url}")
+            response = requests.get(url)


The requests.get() calls lack timeout parameters, which could cause the application to hang indefinitely if the server doesn't respond. Add a timeout parameter (e.g., timeout=30) to prevent indefinite blocking.

Suggested change

response = requests.get(url)

response = requests.get(url, timeout=30)

Copilot · 2025-10-30T01:12:37Z

backend/mcp/file_size_test/main.py

+
+            logger.debug(f"About to download from URL: {url}")
+            logger.info(f"Downloading file from URL: {url}")
+            response = requests.get(url)


The requests.get() calls lack timeout parameters, which could cause the application to hang indefinitely if the server doesn't respond. Add a timeout parameter (e.g., timeout=30) to prevent indefinite blocking.

Suggested change

response = requests.get(url)

response = requests.get(url, timeout=30)
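The same guard is sketched below with only the standard library, so it runs without requests installed. `download_bytes` is a hypothetical helper. With requests, the equivalent is `requests.get(url, timeout=30)` followed by `response.raise_for_status()`, and requests also accepts a `(connect, read)` tuple for `timeout`. In both libraries the timeout bounds each blocking socket operation, not the total download time.

```python
import urllib.request

DOWNLOAD_TIMEOUT_SECONDS = 30


def download_bytes(url: str, timeout: float = DOWNLOAD_TIMEOUT_SECONDS) -> bytes:
    """Fetch a URL, failing instead of blocking forever on an unresponsive server."""
    # urlopen raises HTTPError for 4xx/5xx responses, and URLError or TimeoutError
    # when the connection or a read exceeds the timeout.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```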

Copilot · 2025-10-30T01:12:37Z

backend/mcp/pdfbasic/main.py

-    if "error" in analysis_result:
-        return analysis_result # Return the error if analysis failed
+    analysis_result = _analyze_pdf_content(instructions, filename, original_filename)
+    if "error" in analysis_result.get("results", {}):


The error handling logic changed from checking if "error" in analysis_result (line 275 in old code) to checking within the nested 'results' key. Looking at line 122, errors are returned as {"results": {"error": ...}}, so the new check is correct; however, if _analyze_pdf_content ever returns errors in a different format (e.g., at the top level), they won't be caught. Consider keeping the error response structure consistent or handling both formats.

Suggested change

if "error" in analysis_result.get("results", {}):

if "error" in analysis_result or "error" in analysis_result.get("results", {}):
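Handling both formats could be factored into a small predicate. `has_analysis_error` is a hypothetical helper, and `word_count` in the usage below is an illustrative key. The `isinstance` guard differs from the one-line suggestion: if `results` is present but `None`, `"error" in analysis_result.get("results", {})` raises `TypeError`, while the predicate returns `False`.

```python
from typing import Any, Dict


def has_analysis_error(analysis_result: Dict[str, Any]) -> bool:
    """True if an error is reported at the top level or nested under "results"."""
    if "error" in analysis_result:
        return True
    results = analysis_result.get("results")
    return isinstance(results, dict) and "error" in results
```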

- Add 30-second timeout to all requests.get() calls to prevent indefinite hangs
- Replace traceback.print_exc() with logger.exception() for better logging
- Consolidate error logging messages for improved readability

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

garland3 added 2 commits October 29, 2025 05:38

garland3 requested a review from Copilot October 30, 2025 00:53

github-advanced-security bot found potential problems Oct 30, 2025


backend/mcp/file_size_test/main.py Fixed

Potential fix for code scanning alert no. 280: Unused import

bf6f559

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

Copilot AI reviewed Oct 30, 2025


garland3 requested a review from Copilot October 30, 2025 01:09

Copilot AI reviewed Oct 30, 2025


garland3 merged commit 87c917e into main Oct 30, 2025
7 of 8 checks passed

garland3 deleted the fix-file-management branch October 30, 2025 01:28

garland3 mentioned this pull request Oct 30, 2025

file transfer from the frontend to a mcp tool is broken #20

Closed
