Skip to content

feat(channels): audience-gated attachment ingress contract + Slack PDF support#601

Merged
Aaronontheweb merged 8 commits into
devfrom
feature/channel-ingress-attachments
Apr 12, 2026
Merged

feat(channels): audience-gated attachment ingress contract + Slack PDF support#601
Aaronontheweb merged 8 commits into
devfrom
feature/channel-ingress-attachments

Conversation

@Aaronontheweb

Copy link
Copy Markdown
Collaborator

Summary

Fixes the user-reported bug where PDFs (and every other non-image
attachment) were silently dropped from Slack messages with no feedback
to the user. The root cause was two silent-drop holes layered on top
of each other:

  1. SlackThreadBindingActor.cs:218 hard-coded an image/* allowlist
    and swallowed every other MIME type with a DEBUG log line.
  2. LlmSessionActor.cs:1703-1717 separately stripped image refs from
    any turn targeting a non-vision model, emitting only a terse
    [Images removed …] placeholder.

This PR closes both holes with a single cross-channel ingress
contract, adds a per-audience policy surface for attachment trust,
moves per-session logs out of the agent-readable working directory,
and gives the agent a dynamic system-prompt hint so it can explain
to users what it received and why it can or can't view it.

What's in the box

Config surface (Netclaw.Configuration)

  • AttachmentCategory enum (Image, Pdf, Document, Archive, Media, Other)
    and AttachmentCategories.FromMime as the single centralized MIME
    classifier. Unknown types fail-closed to Other.
  • ChannelAttachmentPolicy record on ToolAudienceProfile with
    AllowedCategories, MaxFileBytes (default 25 MiB), and
    MaxFilesPerMessage (default 10). Defaults per audience:
    • Public: { Image } — conservative, since any workspace member
      can upload in a public Slack channel.
    • Team: everything except Other (unknown binaries).
    • Personal: all six categories.
  • Cap-consistency validation on daemon startup via
    ToolAudienceProfiles.ValidateChannelAttachments, plus a matching
    check in ToolAudienceProfilesDoctorCheck.
  • netclaw-config.v1.schema.json extended with ChannelAttachmentPolicy
    and AttachmentCategory definitions, with "default" values for
    SchemaFixResolver auto-migration.

Session working directory hardening (Netclaw.Actors.Protocol)

  • Removed the legacy single-arg
    SessionDirectoryHelper.GetSessionDirectory(SessionId) overload
    that defaulted to Path.GetTempPath(). NetclawPaths is now
    required on SessionServices and SessionPipeline — no more null
    fallbacks that silently leaked session data to /tmp.
  • InboxWriter with filesystem-level collision suffixing
    (foo.pdffoo_1.pdf → … up to _99), atomic writes via
    temp-file + File.Move, and FilenameSanitizer-backed path
    traversal defense.
  • GetOrCreateInboxDirectory + InboxSubdirectory/MediaSubdirectory
    constants.

Logs out of agent-readable session dir

  • Renamed NetclawPaths.LegacySessionLogsDirectory
    SessionLogsDirectory at {BasePath}/logs/sessions/.
    SessionLogActor now writes there instead of
    {SessionsDirectory}/{id}/logs/, so the agent's file_read tool
    (scoped to {session_dir}) can no longer observe its own audit
    trail — a leak the original session-dir layout had since the
    previous session-log refactor.

Slack ingress rewrite (SlackThreadBindingActor)

  • Implements the canonical attachment pipeline: resolve audience →
    pre-download gates (category, size, per-message count) → download
    → content scan → atomic inbox write → build [attachment] line →
    capability-gated DataContent inline. Pre-download gates
    short-circuit on Slack-reported metadata without burning bandwidth.
  • Canonical [attachment] text announcement format:
    [attachment] name="report.pdf" mime="application/pdf" size=284512 path="inbox/report.pdf" inlined="true"
    
    Multi-file messages batch into one TextContent block, preserving
    order.
  • note strings sourced from a shared AttachmentNotes class with
    two canonical prefix classes ("current model has no ..." for
    model-modality gaps, "format not inlineable" for formats no model
    can render natively).
  • Every rejection mode (category, size, count, scan, capability gap,
    inbox write failure, collision exhaustion) posts a user-visible
    reply through the existing SafePostAsync path. WARN-level logs
    with structured fields on every rejection; INFO-level on acceptance.

LlmSessionActor strict-consumer contract

  • Deletes the silent image-strip block + the now-unreachable
    "only images, no text" skip block. Replacement is an ERROR log
    naming the model, modalities, and offending refs; drops the
    unsupported DataContent items from the turn; appends a
    [system] ingress bug notice to the user content so the LLM
    has something to respond to. The user always gets a reply
    explaining the situation — no silent skips.
  • AttachmentContextHint constant added to
    InjectDynamicContextLayers, conditional on file_read being in
    the session's available tools. Documents inbox/, the
    [attachment] line format, the two canonical note prefix classes,
    and an imperative that the agent must acknowledge every attachment
    by name even when it can't view the contents.

Tests added

  • Configuration: ChannelAttachmentPolicyTests (31 cases) cover
    defaults per audience, MIME classification, validator cap-consistency.
  • Actors / Protocol: InboxWriterTests (9 cases) cover atomic
    writes, collision suffixing up to 99, exhaustion exception,
    temp-file cleanup, path traversal sanitization.
  • Actors / Sessions: ModalityGateTests rewritten for the new
    strict-consumer contract (image-with-text and image-only cases
    assert the ingress bug notice reaches the LLM and the legacy
    placeholder is gone); AttachmentContextHintTests (6 bear-trap
    cases) pin the canonical hint shape and cross-check against
    AttachmentNotes constants.
  • Actors / Channels: SlackAttachmentIngressTests (7 end-to-end
    cases) exercise the rewritten Slack pipeline against stubbed HTTP /
    scanner / reply fakes — PDF inlined, docx path-only, public-channel
    docx rejected pre-download, oversize rejected pre-download,
    15-file batch rejected with text still forwarded, filename
    collision across turns produces _1 suffix, scanner rejection
    surfaces user-visible reply with no inbox write.
  • Cli / Doctor: two new ConfigSchemaDoctorCheckTests verify
    schema round-trip — a legacy config without any ChannelAttachments
    block validates (additive optional fields), and an explicit block
    also validates.
  • 13 existing test fixtures migrated to require NetclawPaths via
    a shared TestNetclawPaths.AddTestNetclawPaths() helper (a
    consequence of removing the legacy SessionDirectoryHelper
    overload).

Quality gates

  • dotnet build Netclaw.slnx — 0 warnings, 0 errors
  • dotnet test — Configuration 176 / Actors 925 / Cli 13 ConfigSchema
    cases all green (1,114 tests total covering the change)
  • dotnet slopwatch analyze — 0 new violations against the refreshed
    baseline
  • Schema round-trip tested explicitly via new Doctor tests

OpenSpec

The full OpenSpec change (openspec/changes/channel-ingress-attachments/)
ships in this PR — proposal, design doc, spec deltas against
netclaw-input-adapters, netclaw-slack-socket, tool-approval-gates,
and netclaw-session, plus the tasks.md with Phases 1–9 complete.

The original plan ran to 15 phases / 76 tasks; the truncated plan
that actually shipped is 9 phases. Deferred as explicit follow-ups
(documented in tasks.md under a "Deferred" section):

  • Phase 7 (original plan) — SlackThreadHistoryFetcher backfill
    rewrite. Live ingest is fixed; backfill still hard-filters to image
    attachments. Not a regression for the reported bug; needs its own
    small PR.
  • Eval suite regression cases — PDF round-trip, model-modality
    gap, format-not-inlineable. Process/documentation work.
  • PRD updates — PRD-009 (Input Adapters) and PRD-002 (Gateway
    Security Envelope) should grow new sections covering the contract.
  • System skill syncfeeds/skills/.system/files/netclaw-operations/SKILL.md
    should gain agent-facing guidance about inbox/ and the
    [attachment] line format to match the dynamic-context block.
  • OpenSpec finalization (/opsx-verify, /opsx-sync,
    /opsx-archive) — bookkeeping that lands after the follow-ups
    catch up.

Test plan

  • CI passes on all configurations
  • Manual: DM the agent a PDF in a dev workspace → agent summarizes
    its contents
  • Manual: DM the agent a .docx → agent uses shell_execute or
    file_read on the inbox/ path and replies with a summary
  • Manual: drop a .docx in a public channel → agent replies with
    the policy rejection explaining the category isn't allowed
  • Manual: DM the agent a 50 MiB file → agent replies with the
    25 MiB size limit
  • Manual: session-log visibility — confirm file_read("logs/*.log")
    from the agent returns nothing under the new
    {BasePath}/logs/sessions/{id}/ layout

Commits

  1. 2cc0122 plan(openspec): channel-ingress-attachments
  2. be3c696 feat(channels): audience-gated attachment policy surface
  3. 99bc5cf feat(slack): audience-gated attachment ingress pipeline
  4. 8f8b7e2 chore: delete unused capability helper, truncate scope
  5. 6417588 feat(session): strict-consumer modality contract + attachment tests

Status

Draft. Leaving in draft for review of scope and the OpenSpec
artifacts before flipping to ready-for-review.

Establish canonical cross-channel contract for user-uploaded file
attachments so Slack, Discord, and future channels route inbound files
through one uniform pipeline: audience-gated policy check, size and
count caps, content scan, download to the durable
{SessionDirectory}/inbox/, [attachment] text announcement, and
capability-gated DataContent inlining via ModelCapabilityActor.

Fixes the silent-drop of non-image attachments at
SlackThreadBindingActor.cs:218 and the silent image strip at
LlmSessionActor.cs:1705-1719 by moving modality routing to ingress and
making the session actor a strict consumer that logs loudly on any
unsupported modality it receives.

Introduces ToolAudienceProfile.ChannelAttachments with default policy:
Public = {Image}, Team = images+pdf+docs+archives+media, Personal =
everything. Conservative defaults for public channels where any
workspace member can upload, permissive for DMs and private channels
where workspace auth is a meaningful filter.

Artifacts: proposal, design (8 decisions with rationale), specs deltas
against netclaw-input-adapters, netclaw-slack-socket, tool-approval-gates,
and netclaw-session, and a 67-item task checklist covering code,
tests, evals, PRD updates, and system skill sync.
Groundwork for the channel-ingress-attachments OpenSpec change. Adds
the per-audience ChannelAttachmentPolicy config surface, the canonical
MIME→category mapping, the InboxWriter for atomic attachment writes
with filesystem-level collision suffixing, and the session working
directory hardening that the Slack ingress rewrite depends on. The
channel-adapter rewrite itself (SlackThreadBindingActor, thread
history backfill, LlmSessionActor strict-consumer contract) lands in
a follow-up commit.

Config surface (Netclaw.Configuration)
- AttachmentCategory enum {Image, Pdf, Document, Archive, Media, Other}
  and AttachmentCategories.FromMime as the single centralized MIME
  classifier; unknown types fail-closed to Other.
- ChannelAttachmentPolicy record on ToolAudienceProfile with
  AllowedCategories, MaxFileBytes (default 25 MiB), and
  MaxFilesPerMessage (default 10).
- ToolAudienceProfileDefaults now wires per-audience defaults:
  Public={Image}; Team=everything except Other; Personal=all six.
- ToolAudienceProfiles.ValidateChannelAttachments enforces
  cap>0 when a category is allowed; daemon startup in Program.cs
  aborts on violation.
- netclaw-config.v1.schema.json extended with AttachmentCategory
  enum and ChannelAttachmentPolicy block, including default values so
  SchemaFixResolver can auto-migrate stale configs.
- ToolAudienceProfilesDoctorCheck validates the new policy and warns
  when SessionsDirectory resolves under Path.GetTempPath().

Session working directory hardening (Netclaw.Actors)
- SessionDirectoryHelper drops the legacy single-arg
  GetSessionDirectory(SessionId) overload that defaulted to
  Path.GetTempPath(). NetclawPaths is now required on SessionServices
  and SessionPipeline — no more null-coalescing fallbacks that silently
  leaked session data to /tmp. Added InboxSubdirectory/MediaSubdirectory
  constants, GetOrCreateInboxDirectory helper, and IsUnderTempPath
  diagnostic used by ToolAudienceProfilesDoctorCheck.
- InboxWriter provides ReserveUniquePath (filesystem-checked collision
  suffixing up to _99), WriteAtomicAsync (temp-file + File.Move), and
  SanitizeReserveAndWriteAsync that reuses FilenameSanitizer for path
  traversal defense. CollisionExhaustedException surfaces loud
  rejections to callers.
- LlmSessionActor collapses the redundant _sessionLogsBasePath field
  (it was a duplicate of _sessionsBasePath).

Logs out of agent-readable session dir (Netclaw.Configuration +
Netclaw.Actors + Netclaw.Daemon)
- NetclawPaths.LegacySessionLogsDirectory renamed to
  SessionLogsDirectory at {BasePath}/logs/sessions/. Previously
  SessionLogActor wrote logs into {BasePath}/sessions/{id}/logs/,
  which is inside the session working directory the agent's file_read
  tool can access via the {session_dir} token. Logs now live outside
  that tree so the LLM cannot observe its own audit trail.
  SessionCatalogService and LlmSessionActor both updated to the new
  path. EnsureDirectoriesExist creates SessionLogsDirectory.
- SessionLogActor.GetSessionLogsDirectory is now public (was internal)
  so Netclaw.Daemon can compute the same path deterministically.

Tests
- ChannelAttachmentPolicyTests (31 cases) covering default matrix per
  audience, MIME→category mapping with case-insensitivity and
  unknown-type fallback, and validator cap-vs-allowlist consistency.
- InboxWriterTests (9 cases) covering plain writes, collision chains,
  extensionless names, exhaustion exception, atomic-write temp-file
  cleanup, and path-traversal sanitization in the round-trip helper.
- 13 test fixtures that construct SessionServices migrated to require
  a NetclawPaths registration via a new shared
  TestNetclawPaths.AddTestNetclawPaths() helper.
- ReminderManagerActorTests direct SessionPipeline construction
  updated to pass NetclawPaths.

Full solution builds clean; 176 Netclaw.Configuration.Tests + 912
Netclaw.Actors.Tests pass with no regressions.

Refs: openspec/changes/channel-ingress-attachments/{proposal,design,specs,tasks}.md
Rewrites SlackThreadBindingActor's attachment handling to implement
the canonical cross-channel ingress contract defined in
netclaw-input-adapters. The previous behavior — silently dropping
non-image files at SlackThreadBindingActor.cs:218 — is gone. Inbound
Slack attachments now flow through per-file audience/category checks,
size caps, per-message count caps, download, scan, atomic inbox
write, and capability-gated DataContent inlining. Every rejection
path posts a user-visible reply; no silent drops.

Slack ingress pipeline (SlackThreadBindingActor)
- ProcessInboundAttachmentsAsync runs the eleven-step contract in
  order: resolve TrustAudience → look up ChannelAttachmentPolicy →
  reject the whole attachment batch if count exceeds
  MaxFilesPerMessage → for each file, map MIME→AttachmentCategory,
  check AllowedCategories, check MaxFileBytes, download, scan,
  write to {SessionDirectory}/inbox/ via InboxWriter, decide
  inlining against ModelCapabilities.InputModalities, build the
  canonical [attachment] text line, optionally emit DataContent.
- TryIngestSingleAttachmentAsync encapsulates the per-file work.
  Pre-download gates (category, size, count) short-circuit without
  consuming bandwidth. A single multi-file message produces exactly
  one TextContent carrying all [attachment] lines in message order,
  plus one DataContent per inlined file.
- BuildAttachmentLine emits the canonical format:
  [attachment] name="..." mime="..." size=N path="inbox/..." inlined="true|false" [note="..."]
  with EscapeQuoted handling for embedded quotes and backslashes.
- FormatBytes renders human-readable byte counts in user-visible
  rejection replies.
- All rejection paths surface through SafePostAsync with a
  category-specific message (too-many-files, category-not-allowed,
  too-large, download-failed, scan-blocked, collision-exhausted,
  inbox-write-failed). Loud WARN logs with structured fields
  (name, mime, audience, category, inlined) replace the previous
  DEBUG silent-drop line.

Shared helpers (Netclaw.Actors.Channels)
- AttachmentNotes: static class with the three canonical note
  strings (ModelMissingImage, ModelMissingPdf, FormatNotInlineable)
  per the netclaw-input-adapters spec. Channel adapters SHALL
  source note text from here so the canonical prefixes never drift.
- ChannelIngressCapabilityQuery: thin wrapper over
  ModelCapabilityActor.Ask with a 2-second default timeout and a
  typed CapabilityQueryResult (Ok / TimedOut / Failed). The Slack
  binding actor reads ModelCapabilities directly rather than going
  through this helper, but the helper remains available for future
  channels (Discord, subagents) that need dynamic model resolution.

SlackGatewayDependencies
- Gains three required fields: ToolAudienceProfiles (for
  ChannelAttachmentPolicy lookup), ModelCapabilities (for inline
  gating), and NetclawPaths (for inbox write target). SlackChannel
  resolves them from DI and forwards them to the gateway actor.

Tests
- ChannelIngressCapabilityQueryTests: 5 Akka.Hosting.TestKit cases
  covering happy path (cache-hit response), timeout fallback,
  text-only modalities, and argument validation.
- TestSlackGatewayDeps: shared test helper with default
  AudienceProfiles, vision-capable / text-only ModelCapabilities,
  and per-test NetclawPaths so Slack test fixtures don't have to
  repeat the boilerplate.
- 22 SlackGatewayDependencies construction sites across four test
  files updated to supply the new required fields.

Full solution builds clean; 917 Netclaw.Actors.Tests pass (+5 new
ChannelIngressCapabilityQuery cases, no regressions in the 34
existing Slack integration tests that exercise the rewritten path).

Remaining work (tracked in tasks.md):
- Phase 7: SlackThreadHistoryFetcher backfill rewrite (generalize
  from image-only to all MIME types with audience/capability gating
  at merge time).
- Phase 8: LlmSessionActor strict-consumer contract + attachment-
  aware dynamic context block.
- Phases 9-15: unit tests, eval cases, PRD updates, skill sync,
  quality gates, OpenSpec finalization.

Refs: openspec/changes/channel-ingress-attachments/
…truncate scope

Removes ChannelIngressCapabilityQuery + CapabilityQueryResult + its
test file. Built as scaffolding for dynamic-model channels (Discord,
subagents, failover) per design D1, then never called: Phase 6
revealed the Slack implementation can read
_dependencies.ModelCapabilities.InputModalities directly from the DI
singleton with no round-trip cost. Zero production call sites, zero
risk to remove. When a second channel actually needs dynamic modality
resolution, the right shape for it will emerge from its own
requirements — pre-building it here was premature.

Also truncates tasks.md to reflect the narrower scope that actually
matters for the user-reported bug (silent PDF drop in Slack ingest).
The original plan ran 15 phases / 76 tasks; the truncated plan is 9
phases covering config surface, path hardening, inbox writer, Slack
ingress rewrite, LlmSessionActor strict-consumer contract, targeted
regression tests, and quality gates. Deferred as explicit follow-up
work: backfill rewrite, eval suite, PRD updates, system skill sync,
OpenSpec finalization.

Net: -245 lines production + -147 lines test + tasks.md scope note
added. 158 affected tests still pass.
Closes out Phases 7-9 of the channel-ingress-attachments change.
The user-reported bug (silent PDF drop in Slack ingest) was already
fixed in earlier commits; this commit hardens the second silent-drop
path in LlmSessionActor, adds the agent-facing attachment hint,
covers the rewritten Slack pipeline with targeted regression tests,
and closes the quality gates.

LlmSessionActor strict-consumer contract
- Deletes the silent image-strip block (lines 1703-1717 in the old
  layout) and its follow-on "only images, no text" skip block. The
  replacement logs ERROR with ModelId, InputModalities, and the
  offending refs, drops the unsupported DataContent items from the
  turn, and appends a [system] notice about the ingress bug to the
  user content so the LLM has something to respond to. No turns
  skipped — the user always gets a reply explaining the situation.
- The legacy placeholder string "[Images removed — the current
  model does not support vision input]" is gone from production code.
- InjectDynamicContextLayers now appends AttachmentContextHint to
  the system prompt when file_read is granted. The hint documents
  inbox/, the [attachment] line format, the two canonical note
  prefix classes (model-modality gap vs format-not-inlineable), and
  an imperative that the agent must acknowledge every attachment by
  name even when it cannot view the contents.
- AttachmentContextHint is an internal const so bear-trap tests can
  pin its shape without reaching into private state.

Tests
- ModalityGateTests rewritten for the new contract: both the
  text-with-image and image-only cases now assert the ingress bug
  notice appears in the user content sent to the LLM rather than a
  legacy "[Images removed]" TextOutput, and the image-only case no
  longer skips the turn.
- AttachmentContextHintTests (6 cases) pins the canonical shape of
  the dynamic-context block — names inbox/, documents inlined=true|
  false, explains both note classes, and forbids silent omission.
  Cross-checks the hint text against the AttachmentNotes constants
  so drift in either surface fails loudly.
- SlackAttachmentIngressTests (7 cases) exercises the rewritten
  Slack pipeline end-to-end against stubbed HTTP/scanner/reply
  fakes: PDF inlined, docx path-only, public-channel docx rejected
  pre-download, oversize rejected pre-download, too-many-files
  rejects the batch but forwards text, filename collision across
  turns in the same thread produces _1 suffix, scanner rejection
  surfaces a user-visible reply with no inbox write.

Quality gates
- dotnet build Netclaw.slnx: 0 warnings, 0 errors.
- dotnet test: Configuration 176, Actors 925, Cli 13 ConfigSchema
  cases (including two new round-trip tests for configs with and
  without an explicit ChannelAttachments block) — all green.
- dotnet slopwatch analyze: 0 new violations. Two intentional
  best-effort empty-catch patterns (InboxWriter.TryDeleteTemp and
  InboxWriterTests.Dispose) baselined in .slopwatch/baseline.json;
  these are deliberate temp-file and test-directory cleanup on
  throw paths where logging is not available.
- Schema round-trip: a legacy config without any ChannelAttachments
  block validates cleanly, as do configs with explicit blocks.

tasks.md updated to reflect all Phase 7-9 completion. The
channel-ingress-attachments change now has Phases 1-9 done; Phases
deferred to follow-ups (backfill rewrite, eval suite, PRD updates,
system skill sync, OpenSpec finalization) remain in the explicit
"Deferred" section.
Resolves two conflicts from dev commits landed while this branch
was in progress:

- LlmSessionActor.InjectDynamicContextLayers: both sides added a
  new dynamic-context block in the same spot. dev added the
  [working-context] block (recent files / working state); this
  branch added the [attachments] hint conditional on file_read.
  Both coexist — the working-context block emits first, then the
  attachments hint. Resolution keeps both additions.

- CompactionIntegrationTests.ConfigureServices: dev's WorkingContext
  changes registered a tool registry + fake executor for
  tool-execution compaction tests; this branch added the required
  NetclawPaths registration via the shared AddTestNetclawPaths()
  helper. Resolution interleaves both registrations in the natural
  order (tool executor + registry → test paths → session services
  → tool services).

Post-merge test state:
- dotnet build Netclaw.slnx: 0 warnings, 0 errors
- Configuration: 176 passing
- Actors: 960 passing (up from 925 — dev added 35 WorkingContext
  and compaction cases)
- Cli: 415 passing (up from 13 targeted earlier — full suite now
  includes a lot of unrelated tests, all green)

Total: 1,551 tests passing across the affected projects.
…s, magic string

Code-review pass on the channel-ingress-attachments work. Findings
that turned into fixes:

- Drop bytes.ToArray() in SlackThreadBindingActor before constructing
  DataContent — Microsoft.Extensions.AI.DataContent has a
  ReadOnlyMemory<byte> constructor that stores the memory directly.
  Saves up to 25 MiB × MaxFilesPerMessage of needless copying per
  inbound message in the worst case.
- Remove the two redundant Directory.CreateDirectory calls in
  InboxWriter (one in WriteAtomicAsync, one in
  SanitizeReserveAndWriteAsync). The Slack ingress already creates
  the inbox via SessionDirectoryHelper.GetOrCreateInboxDirectory
  before invoking the writer; the duplicate calls turn into 30
  extra stat syscalls per 10-file message. Documented the
  caller-creates contract on InboxWriter and SanitizeReserveAndWriteAsync.
- Add FileReadTool.ToolName const ("file_read") and reference it
  from LlmSessionActor.HasFileReadGranted instead of a magic string.
  Matches the existing ShellTool.ToolName pattern.
- Move the InjectDynamicContextLayers XML doc back onto its method
  — when HasFileReadGranted was inserted between the comment and
  its original target, the doc had stranded itself on the wrong
  method in IDE hover.
- Tighten the LlmSessionActor ingress-bug log line from a mixed
  snake_case + colon-prose hybrid to plain structured fields
  (model/modalities/offending) so it matches slack_attachment_rejected
  and other event-style logs in the file.

Build verified clean against the full solution.
@Aaronontheweb Aaronontheweb marked this pull request as ready for review April 12, 2026 00:15
@Aaronontheweb Aaronontheweb enabled auto-merge (squash) April 12, 2026 00:15
@Aaronontheweb Aaronontheweb merged commit 622cc81 into dev Apr 12, 2026
4 checks passed
@Aaronontheweb Aaronontheweb deleted the feature/channel-ingress-attachments branch April 12, 2026 00:21
Aaronontheweb added a commit that referenced this pull request Apr 13, 2026
PR #601 added the audience-gated ChannelAttachmentPolicy that allows
PDFs, Office documents, archives, and media for Team/Personal
audiences — but MagicByteValidator's hardcoded AllowedExtensions
dictionary still only accepted PNG/JPG/GIF/WebP, rejecting everything
else at ingress with "File extension '.pdf' is not allowed". The
SlackAttachmentIngressTests suite used NullContentScanner by default,
so the Pdf_in_dm_* and Docx_in_dm_* tests only exercised the policy
layer and never saw the real scanner.

Rewrite MagicByteValidator around a signature-rule table keyed by
declared MIME. Support every category the Team audience advertises:
PDF, OOXML/ODF, legacy OLE Office, plain/structured text, RTF,
zip/7z/rar/gzip/bzip2/xz, and mp3/mp4/wav/ogg/avi/webm/mkv. Harden
each matcher beyond minimum magic — validate PDF version digit, ZIP
exact header pair, gzip DEFLATE method, bzip2 BCD-Pi block header,
RAR v4/v5 variant tail, ISO BMFF box size + printable-ASCII major
brand, Ogg version byte, ID3v2 major version, MP3 strict 12-bit sync
plus reserved-layer check. Seed ContentPolicy.DefaultAllowedMimeTypes
from the validator's supported set so the two layers can't drift, and
raise DefaultMaxFileSizeBytes from 20 MB to 25 MiB to match
ChannelAttachmentPolicy.

Flip SlackAttachmentIngressTests.BuildGateway to default to the real
MagicByteContentScanner so the existing Pdf_in_dm_* and Docx_in_dm_*
regression tests now actually exercise production behavior. Add a
PlainText_in_dm_* ingress test, 13 MagicByteValidator category happy
paths, and 15 adversarial polyglot-rejection tests for the hardened
matchers.
Aaronontheweb added a commit that referenced this pull request Apr 13, 2026
…626)

* fix(security): expand MagicByteValidator beyond image-only allowlist

PR #601 added the audience-gated ChannelAttachmentPolicy that allows
PDFs, Office documents, archives, and media for Team/Personal
audiences — but MagicByteValidator's hardcoded AllowedExtensions
dictionary still only accepted PNG/JPG/GIF/WebP, rejecting everything
else at ingress with "File extension '.pdf' is not allowed". The
SlackAttachmentIngressTests suite used NullContentScanner by default,
so the Pdf_in_dm_* and Docx_in_dm_* tests only exercised the policy
layer and never saw the real scanner.

Rewrite MagicByteValidator around a signature-rule table keyed by
declared MIME. Support every category the Team audience advertises:
PDF, OOXML/ODF, legacy OLE Office, plain/structured text, RTF,
zip/7z/rar/gzip/bzip2/xz, and mp3/mp4/wav/ogg/avi/webm/mkv. Harden
each matcher beyond minimum magic — validate PDF version digit, ZIP
exact header pair, gzip DEFLATE method, bzip2 BCD-Pi block header,
RAR v4/v5 variant tail, ISO BMFF box size + printable-ASCII major
brand, Ogg version byte, ID3v2 major version, MP3 strict 12-bit sync
plus reserved-layer check. Seed ContentPolicy.DefaultAllowedMimeTypes
from the validator's supported set so the two layers can't drift, and
raise DefaultMaxFileSizeBytes from 20 MB to 25 MiB to match
ChannelAttachmentPolicy.

Flip SlackAttachmentIngressTests.BuildGateway to default to the real
MagicByteContentScanner so the existing Pdf_in_dm_* and Docx_in_dm_*
regression tests now actually exercise production behavior. Add a
PlainText_in_dm_* ingress test, 13 MagicByteValidator category happy
paths, and 15 adversarial polyglot-rejection tests for the hardened
matchers.

* fix slack attachment policy bypasses and fail-open scanning
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant