Skip to content

Conversation

@garland3
Copy link
Collaborator

This pull request introduces several improvements and new features related to file handling and tool integration, particularly focusing on file transfer, processing, and PDF analysis. The main highlights include a new MCP server for file size testing, enhanced PDF tool support for URL and base64 file inputs, and improved logging and debugging throughout the tool execution workflow.

New Features:

  • Added a new MCP server file_size_test/main.py that provides two tools: get_file_size (returns the size of a transferred file, supporting both URLs and base64 data) and process_file_demo (demonstrates file processing and artifact return using the v2 MCP artifacts contract).

PDF Tool Enhancements:

  • Updated pdfbasic/main.py to support file input via both URLs (including relative API paths) and base64, improving compatibility with frontend file transfer workflows. The PDF analysis tools now use the original filename when available and provide more robust error handling and logging. [1] [2] [3] [4] [5] [6] [7]

File Attachment and Session Context Improvements:

  • Refactored file attachment logic in chat/service.py to directly add file references to the session context (instead of using a utility function), and emit a files_update event to notify the UI after attachment. [1] [2]

Logging and Debugging Improvements:

  • Added detailed step-by-step logging throughout the tool execution workflow in tool_utils.py to aid in debugging and tracing tool calls, argument preparation, and file URL rewriting. [1] [2] [3] [4] [5]
  • Improved logging of LLM tool call responses in error_utils.py for better traceability.

…ndling

- Add detailed step logging in tool_utils.py for execute_tools_workflow,
  execute_single_tool, and prepare_tool_arguments functions to trace tool
  execution process
- Modify llm logging in error_utils.py to output full llm_response instead
  of just has_tool_calls for better debugging
- Update pdfbasic main.py to include logging import and adjust
  _analyze_pdf_content function to accept optional original_filename parameter
  for improved context handling in PDF analysis
- Refactor chat service to directly manage file references in session context for existing S3 files, bypassing complex file handling utilities and improving efficiency
- Modify PDF analysis tool to generate in-memory PDF reports with word frequency summaries, providing better visual output for text analytics
@garland3 garland3 requested a review from Copilot October 30, 2025 00:53
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds file transfer support for MCP tools and improves debugging capabilities. The changes enable tools to accept files via URLs (uploaded to S3) instead of base64 encoding, add new test tools for file handling, and enhance logging throughout the tool execution pipeline.

Key changes:

  • Added URL-based file transfer support for MCP tools (alongside existing base64)
  • Introduced two new MCP tools: pdfbasic (PDF analysis) and file_size_test (file transfer testing)
  • Enhanced logging with debug "Step" markers throughout tool execution flow

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 17 comments.

Show a summary per file
File Description
config/overrides/mcp.json Added configuration for two new MCP tools: pdfbasic and file_size_test
base-image-update-plan.md Removed outdated Ubuntu to Fedora migration plan document
backend/modules/mcp_tools/client.py Added debug logging at tool execution entry point
backend/modules/llm/litellm_caller.py Added validation when tool_choice is "required" to ensure LLM returns tool calls
backend/mcp/pdfbasic/main.py Modified to accept URLs in addition to base64, added URL download logic, extensive logging
backend/mcp/file_size_test/main.py New file implementing test tools for file transfer validation
backend/application/chat/utilities/tool_utils.py Added debug logging to track tool execution workflow steps
backend/application/chat/utilities/error_utils.py Changed logging to show full LLM response object for debugging
backend/application/chat/service.py Simplified file attachment to use direct S3 client access and emit files update

"""
Extract and analyze text content from PDF documents with comprehensive word frequency analysis.
This PDF processing tool provides detailed text analytics for PDF documents:
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spacing: 'This PDF' contains double space, should be 'This PDF'.

Suggested change
This PDF processing tool provides detailed text analytics for PDF documents:
This PDF processing tool provides detailed text analytics for PDF documents:

Copilot uses AI. Check for mistakes.
if filename.startswith("/"):
# Construct absolute URL from relative path
# Default to localhost:8000 for local development
import os
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import statement should be moved to the top of the file with other imports. The os module is imported inside a conditional block, but it's a standard library import and should be at module level for consistency and readability.

Copilot uses AI. Check for mistakes.
Comment on lines 59 to 60
print(f"DEBUG: username: {username}")
try:
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.

Copilot uses AI. Check for mistakes.
filename.startswith("/")
)
print(f"DEBUG: is_url determined as: {is_url}")

Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.

Copilot uses AI. Check for mistakes.
original_filename = "processed_file.txt"

print(f"DEBUG: Original file size: {len(file_bytes)} bytes")

Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.

Copilot uses AI. Check for mistakes.
Comment on lines 261 to 265
print(f"DEBUG: Exception type: {type(e).__name__}")
print(f"DEBUG: Filename that caused error: {filename}")
import traceback
print("DEBUG: Full traceback:")
traceback.print_exc()
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.

Copilot uses AI. Check for mistakes.
}
}
print(f"DEBUG: About to return error result: {error_result}")
return error_result
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extensive use of print() statements for debugging should be replaced with proper logging using the logger object that's already configured (line 16). The logger provides better control over log levels, formatting, and output destinations.

Copilot uses AI. Check for mistakes.
try:
# Get file metadata
file_result = await self.file_manager.get_file(user_email, s3_key)
file_result = await self.file_manager.s3_client.get_file(user_email, s3_key)
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direct access to s3_client breaks encapsulation. The file_manager should provide a method to retrieve files rather than exposing its internal s3_client. This creates tight coupling and makes the code harder to maintain or refactor.

Suggested change
file_result = await self.file_manager.s3_client.get_file(user_email, s3_key)
file_result = await self.file_manager.get_file(user_email, s3_key)

Copilot uses AI. Check for mistakes.
analysis_result = _analyze_pdf_content(instructions, filename, file_data_base64)
if "error" in analysis_result:
return analysis_result # Return the error if analysis failed
analysis_result = _analyze_pdf_content(instructions, filename, original_filename)
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error handling logic changed from checking top-level 'error' key to checking within 'results'. This is inconsistent with the error return structure in _analyze_pdf_content (line 122) which returns {'results': {'error': ...}}. However, the function should also handle cases where the entire analysis fails and returns {'error': ...} at the top level, not nested in results.

Suggested change
analysis_result = _analyze_pdf_content(instructions, filename, original_filename)
analysis_result = _analyze_pdf_content(instructions, filename, original_filename)
if "error" in analysis_result:
return analysis_result

Copilot uses AI. Check for mistakes.
"""

import base64
import os
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'io' is not used.

Suggested change
import os

Copilot uses AI. Check for mistakes.
- Replace print() debugging statements with proper logger calls in file_size_test/main.py
- Move os import to top of pdfbasic/main.py for consistency
- Fix double space typo in pdfbasic/main.py docstring

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@garland3
Copy link
Collaborator Author

garland3 commented Oct 30, 2025

Issue #20 is addressed

@garland3 garland3 requested a review from Copilot October 30, 2025 01:09
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.


if tool_choice == "required" and not getattr(message, 'tool_calls', None):
logger.error(f"LLM failed to return tool calls when tool_choice was 'required'. Full response: {response}")
raise ValueError("LLM failed to return tool calls when tool_choice was 'required'.")
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validation check will prevent the existing fallback mechanism from working. Lines 222-242 implement retry logic with tool_choice='auto' when 'required' fails, but this new check raises an exception before that retry logic can execute. The exception should only be raised after the retry logic has also failed, or the check should be removed to allow the existing error handling to work.

Suggested change
raise ValueError("LLM failed to return tool calls when tool_choice was 'required'.")
raise Exception("LLM failed to return tool calls when tool_choice was 'required'.")

Copilot uses AI. Check for mistakes.


def _analyze_pdf_content(instructions: str, filename: str, file_data_base64: str) -> Dict[str, Any]:
def _analyze_pdf_content(instructions: str, filename: str, original_filename: Optional[str] = None) -> Dict[str, Any]:
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function signature changed to remove the file_data_base64 parameter but the parameter name filename is now overloaded to mean either a URL or base64-encoded data (line 74). This is confusing API design. Consider renaming the parameter to file_reference or file_input to better reflect its dual purpose.

Copilot uses AI. Check for mistakes.
if filename.startswith("/"):
# Construct absolute URL from relative path
# Default to localhost:8000 for local development
backend_url = os.getenv("BACKEND_URL", "http://localhost:8000")
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BACKEND_URL environment variable is used without validation. If this URL is user-controllable or comes from an untrusted source, it could enable SSRF attacks. Consider validating that the URL points to an allowed domain or use a configuration-based whitelist.

Copilot uses AI. Check for mistakes.

if is_url:
if filename.startswith("/"):
backend_url = os.getenv("BACKEND_URL", "http://localhost:8000")
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BACKEND_URL environment variable is used without validation. If this URL is user-controllable or comes from an untrusted source, it could enable SSRF attacks. Consider validating that the URL points to an allowed domain or use a configuration-based whitelist.

Copilot uses AI. Check for mistakes.
url = filename

logger.info(f"Step 9: Downloading file from URL: {url}")
response = requests.get(url)
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The requests.get() call lacks a timeout parameter, which could cause the application to hang indefinitely if the server doesn't respond. Add a timeout parameter (e.g., timeout=30) to prevent indefinite blocking.

Suggested change
response = requests.get(url)
response = requests.get(url, timeout=30)

Copilot uses AI. Check for mistakes.
else:
url = filename
logger.info(f"Downloading file for processing: {url}")
response = requests.get(url)
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The requests.get() calls lack timeout parameters, which could cause the application to hang indefinitely if the server doesn't respond. Add a timeout parameter (e.g., timeout=30) to prevent indefinite blocking.

Suggested change
response = requests.get(url)
response = requests.get(url, timeout=30)

Copilot uses AI. Check for mistakes.

logger.debug(f"About to download from URL: {url}")
logger.info(f"Downloading file from URL: {url}")
response = requests.get(url)
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The requests.get() calls lack timeout parameters, which could cause the application to hang indefinitely if the server doesn't respond. Add a timeout parameter (e.g., timeout=30) to prevent indefinite blocking.

Suggested change
response = requests.get(url)
response = requests.get(url, timeout=30)

Copilot uses AI. Check for mistakes.
if "error" in analysis_result:
return analysis_result # Return the error if analysis failed
analysis_result = _analyze_pdf_content(instructions, filename, original_filename)
if "error" in analysis_result.get("results", {}):
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling logic changed from checking if \"error\" in analysis_result (line 275 in old code) to checking within the nested 'results' key. However, looking at line 122, errors are returned as {\"results\": {\"error\": ...}}, so the new check is correct, but if _analyze_pdf_content ever returns errors in a different format (e.g., at the top level), they won't be caught. Consider maintaining consistency in error response structure or handling both formats.

Suggested change
if "error" in analysis_result.get("results", {}):
if "error" in analysis_result or "error" in analysis_result.get("results", {}):

Copilot uses AI. Check for mistakes.
- Add 30-second timeout to all requests.get() calls to prevent indefinite hangs
- Replace traceback.print_exc() with logger.exception() for better logging
- Consolidate error logging messages for improved readability

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@garland3 garland3 merged commit 87c917e into main Oct 30, 2025
7 of 8 checks passed
@garland3 garland3 deleted the fix-file-management branch October 30, 2025 01:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants