feat(tools): classify multimodal file reads #1240

Merged
Aaronontheweb merged 9 commits into netclaw-dev:dev from Aaronontheweb:feature/file-read-multimodal-classification
May 31, 2026

Conversation

Aaronontheweb (Collaborator) commented May 30, 2026

Summary

  • classify file_read targets before reading bytes
  • keep UTF-8 text reads and pagination behavior intact
  • hand image reads to image-capable models through session media references
  • return metadata and guidance for PDFs, media, archives, binary documents, and unknown binaries
  • harden model-input media materialization with modality, size, and magic-byte checks
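
As a rough illustration of the magic-byte check in the last bullet, the sketch below sniffs a file's leading bytes and compares the result against the claimed MIME type. `MagicByteSniffer`, `SniffedKind`, and `MatchesClaimedMimeType` are hypothetical names chosen for this example, not types from this PR, and the actual checks may differ.

```csharp
using System;

// Hypothetical names for illustration only; not the types added in this PR.
public enum SniffedKind { Unknown, Png, Jpeg, Gif, Pdf, Zip }

public static class MagicByteSniffer
{
    // Well-known file signatures (leading "magic" bytes).
    private static readonly byte[] PngSignature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    private static readonly byte[] JpegSignature = { 0xFF, 0xD8, 0xFF };
    private static readonly byte[] GifSignature = { 0x47, 0x49, 0x46, 0x38 };       // "GIF8"
    private static readonly byte[] PdfSignature = { 0x25, 0x50, 0x44, 0x46, 0x2D }; // "%PDF-"
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };       // "PK" 03 04

    // Guesses the container format from the first bytes of the file only.
    public static SniffedKind Sniff(ReadOnlySpan<byte> header)
    {
        if (HasPrefix(header, PngSignature)) return SniffedKind.Png;
        if (HasPrefix(header, JpegSignature)) return SniffedKind.Jpeg;
        if (HasPrefix(header, GifSignature)) return SniffedKind.Gif;
        if (HasPrefix(header, PdfSignature)) return SniffedKind.Pdf;
        if (HasPrefix(header, ZipSignature)) return SniffedKind.Zip;
        return SniffedKind.Unknown;
    }

    // Returns true only when the claimed MIME type is one we know how to verify
    // and the sniffed bytes agree with it.
    public static bool MatchesClaimedMimeType(string claimedMimeType, ReadOnlySpan<byte> header)
    {
        var expected = claimedMimeType.ToLowerInvariant() switch
        {
            "image/png" => SniffedKind.Png,
            "image/jpeg" => SniffedKind.Jpeg,
            "image/gif" => SniffedKind.Gif,
            "application/pdf" => SniffedKind.Pdf,
            "application/zip" => SniffedKind.Zip,
            _ => SniffedKind.Unknown
        };
        return expected != SniffedKind.Unknown && Sniff(header) == expected;
    }

    private static bool HasPrefix(ReadOnlySpan<byte> data, ReadOnlySpan<byte> prefix) =>
        data.Length >= prefix.Length && data.Slice(0, prefix.Length).SequenceEqual(prefix);
}
```

In this sketch a file claiming `image/jpeg` but starting with the PNG signature is rejected, and so is any MIME type the table does not cover; whether unknown types should fail closed like this is a design choice the sketch assumes rather than one stated in the PR.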

Closes #1237

Validation

  • dotnet test src/Netclaw.Actors.Tests/Netclaw.Actors.Tests.csproj
  • dotnet slopwatch analyze
  • pwsh ./scripts/Add-FileHeaders.ps1 -Verify
  • git diff --check
  • openspec validate enable-file-read-multimodal-classification --strict

Notes

  • Draft PR, rebased on upstream dev.
  • Behavioral evals were not run locally because eval target credentials were not configured.

Aaronontheweb added the tools label May 30, 2026
Comment thread src/Netclaw.Actors/Sessions/Pipelines/SessionToolExecutionPipeline.cs Outdated
Aaronontheweb (Collaborator, Author) commented

I have some thoughts on the code quality in this PR, but I'm going to see what Copilot comes up with first.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Aaronontheweb (Collaborator, Author) left a comment

Merging now to stop this PR from growing, but I'm turning around and immediately refactoring and cleaning a few things up.

Collaborator Author

Unrelated racy test bug fix.

return;

var itemText = mediaReferences.Count == 1 ? "file" : "files";
_state = _state.AddSystemNudge(
Collaborator Author

We'll have to see how well this works in practice.


namespace Netclaw.Security;

public static class MimeTypeCatalog
Collaborator Author

We're going to consolidate and clean this up in a subsequent PR; there are too many competing file-type primitives floating around in the codebase right now.

/// <summary>
/// Describes a file a tool wants to add to the next LLM call as model input.
/// </summary>
public sealed record ModelInputFileInfo(string FilePath, string FileName, string MimeType);
Collaborator Author

Need to use value objects here.
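
One possible shape for that, sketched under assumptions: a small `MimeType` value object that validates and normalizes on construction, so the record cannot carry an arbitrary string. `MimeType` and `TypedModelInputFileInfo` are hypothetical names for this example, not types from this PR, and the eventual refactor may look quite different.

```csharp
using System;

// Hypothetical value object; not a type defined in this PR.
public sealed record MimeType
{
    public string Value { get; }

    public MimeType(string value)
    {
        if (string.IsNullOrWhiteSpace(value))
            throw new ArgumentException("MIME type must not be empty.", nameof(value));

        // Normalize so that " Image/PNG " and "image/png" compare equal.
        var normalized = value.Trim().ToLowerInvariant();

        // Require a non-empty type and subtype separated by '/'.
        var slash = normalized.IndexOf('/');
        if (slash <= 0 || slash == normalized.Length - 1)
            throw new ArgumentException($"'{value}' is not of the form type/subtype.", nameof(value));

        Value = normalized;
    }

    public override string ToString() => Value;
}

// Hypothetical variant of ModelInputFileInfo that takes the value object
// instead of a raw string for the MIME type.
public sealed record TypedModelInputFileInfo(string FilePath, string FileName, MimeType MimeType);
```

With something like this, an invalid MIME type fails at construction instead of surfacing later during media materialization. The same treatment could apply to `FilePath` and `FileName`, which this sketch leaves as plain strings.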

Aaronontheweb merged commit 594953f into netclaw-dev:dev May 31, 2026
14 checks passed
Aaronontheweb deleted the feature/file-read-multimodal-classification branch May 31, 2026 14:42
Labels

tools Issues related to agent tools: file_read, web_search, shell_execute, image processing, etc.

Development

Successfully merging this pull request may close these issues.

Enable file_read to return parsed content for multimodal file types (images, audio, video)

2 participants