Improve binary-file detection in template engine #47

@Ilyes512

Parent: #42 (architectural review)

Problem

pkg/template/template.go isBinary (~line 256):

func isBinary(path string) bool {
    f, err := os.Open(path)
    if err != nil { return false }
    defer f.Close()

    buf := make([]byte, 512)
    n, _ := f.Read(buf)
    buf = buf[:n]

    if slices.Contains(buf, 0) { return true }
    return !utf8.Valid(buf)
}

This is a heuristic with two failure modes:

  1. A binary file whose first 512 bytes contain no null bytes and parse as valid UTF-8 (rare for most formats, but possible for some image headers, audio metadata, or compressed payloads with text-shaped prefixes) is treated as text and run through text/template. If the body contains a literal {{, Parse will most likely fail; if it doesn't, the file passes through unchanged, but the engine has still paid the cost of parsing and executing it.
  2. A UTF-16 text file usually starts with a BOM and contains many zero bytes, so it is detected as binary and copied verbatim, skipping template expansion. This is consistent with most tools but worth acknowledging.
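
Both failure modes can be reproduced without touching the filesystem. The sketch below is illustrative only: looksBinary is a hypothetical in-memory stand-in for isBinary (it uses bytes.IndexByte in place of slices.Contains, which should behave the same here), and the sample byte slices are made up for the demonstration.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// looksBinary mirrors the isBinary heuristic on an in-memory sample:
// inspect at most 512 bytes, flag null bytes or invalid UTF-8 as binary.
// The name and signature are hypothetical, not part of the repository.
func looksBinary(sample []byte) bool {
	if len(sample) > 512 {
		sample = sample[:512]
	}
	if bytes.IndexByte(sample, 0) >= 0 {
		return true
	}
	return !utf8.Valid(sample)
}

func main() {
	// Failure mode 1: 512 bytes of ASCII followed by raw binary data.
	// The heuristic never sees the tail, so the blob is treated as text.
	textShaped := append(bytes.Repeat([]byte("A"), 512), 0x00, 0xff, 0xfe)
	fmt.Println("text-shaped binary:", looksBinary(textShaped))

	// Failure mode 2: UTF-16LE "hi" with a BOM. The zero bytes make the
	// heuristic classify this text file as binary.
	utf16Text := []byte{0xff, 0xfe, 'h', 0x00, 'i', 0x00}
	fmt.Println("UTF-16 text:", looksBinary(utf16Text))
}
```

The first sample is reported as not binary and the second as binary, matching the two cases described above.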

Proposed direction

Two approaches; pick one or combine:

  1. Use http.DetectContentType on the first 512 bytes. If the result starts with application/ or falls in a known binary class (for example image/, audio/, video/, or font/), copy verbatim. Otherwise, fall back to the existing null-byte / invalid-UTF-8 check.
  2. Promote .specsverbatim as the canonical mechanism. Document that binary detection is best-effort and that any binary-like file should be listed in .specsverbatim. Update docs/content/docs/architecture/template-engine.md accordingly. The implementation can stay simple if the docs make expectations clear.
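
A minimal sketch of approach 1, under stated assumptions: sniffBinary and binaryPrefixes are hypothetical names, the prefix list is one possible reading of "known binary class", and the function works on a byte sample rather than a path so it can run standalone. Only http.DetectContentType and the existing null-byte / UTF-8 checks come from the proposal itself.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"strings"
	"unicode/utf8"
)

// binaryPrefixes is an assumed set of MIME type prefixes to copy verbatim.
var binaryPrefixes = []string{"application/", "image/", "audio/", "video/", "font/"}

// sniffBinary checks the sniffed content type first and falls back to the
// existing null-byte / invalid-UTF-8 heuristic. Hypothetical helper.
func sniffBinary(sample []byte) bool {
	if len(sample) > 512 {
		sample = sample[:512] // DetectContentType considers at most 512 bytes
	}
	contentType := http.DetectContentType(sample)
	for _, prefix := range binaryPrefixes {
		if strings.HasPrefix(contentType, prefix) {
			return true
		}
	}
	// Fallback: same checks as the current isBinary.
	if bytes.IndexByte(sample, 0) >= 0 {
		return true
	}
	return !utf8.Valid(sample)
}

func main() {
	samples := map[string][]byte{
		"jpeg header": {0xff, 0xd8, 0xff, 0xe0},
		"gzip header": {0x1f, 0x8b, 0x08, 0x00},
		"go source":   []byte("package main\n"),
		"utf-16 text": {0xff, 0xfe, 'h', 0x00, 'i', 0x00},
	}
	for name, data := range samples {
		fmt.Printf("%-12s type=%q binary=%v\n", name, http.DetectContentType(data), sniffBinary(data))
	}
}
```

Because DetectContentType returns application/octet-stream when it cannot determine a more specific type, the application/ prefix also catches unrecognized data that it does not consider plain text. A BOM-prefixed UTF-16 file is sniffed as text/plain, so in this sketch it still reaches the fallback and stays classified as binary, which preserves the current behavior.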

Acceptance criteria

  • If approach 1: http.DetectContentType is the first check; null/UTF-8 is the fallback. Tests cover a small JPEG (binary), a UTF-8 source file (text), a UTF-16 BOM-prefixed file (currently treated as binary — preserve that behavior or change deliberately), and a tarball/gzip header (binary).
  • If approach 2: template-engine.md adds a "Binary detection" subsection with a recommended .specsverbatim pattern (e.g., *.png, *.jpg, *.ico, *.woff2, *.pdf).
  • Either way: a new test in pkg/template/template_test.go documents the chosen behavior with at least three file types.
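
The fixture set named in the criteria could look roughly like the sketch below. It is written as a standalone program so it runs outside the repository: isBinary is a copy of the function quoted in this issue (with bytes.IndexByte standing in for slices.Contains), while fixture, fixtures, and runFixtures are hypothetical names and the byte samples are minimal made-up headers, not real files. In pkg/template/template_test.go the same table would more naturally drive t.Run subtests with t.TempDir.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"unicode/utf8"
)

// isBinary is copied from the issue so the sketch compiles on its own.
func isBinary(path string) bool {
	f, err := os.Open(path)
	if err != nil {
		return false
	}
	defer f.Close()

	buf := make([]byte, 512)
	n, _ := f.Read(buf)
	buf = buf[:n]

	if bytes.IndexByte(buf, 0) >= 0 {
		return true
	}
	return !utf8.Valid(buf)
}

// fixture pairs a sample file with the classification the test documents.
type fixture struct {
	name       string
	data       []byte
	wantBinary bool
}

var fixtures = []fixture{
	{"small.jpg", []byte{0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10, 'J', 'F', 'I', 'F', 0x00}, true},
	{"main.go", []byte("package main\n\n// {{ .Name }}\n"), false},
	{"utf16.txt", []byte{0xff, 0xfe, 'h', 0x00, 'i', 0x00}, true}, // current behavior
	{"archive.tar.gz", []byte{0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00}, true},
}

// runFixtures writes every fixture to a temporary directory and returns the
// names of the fixtures that classify gets wrong.
func runFixtures(classify func(path string) bool) ([]string, error) {
	dir, err := os.MkdirTemp("", "isbinary-fixtures")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	var bad []string
	for _, fx := range fixtures {
		path := filepath.Join(dir, fx.name)
		if err := os.WriteFile(path, fx.data, 0o644); err != nil {
			return nil, err
		}
		if classify(path) != fx.wantBinary {
			bad = append(bad, fx.name)
		}
	}
	return bad, nil
}

func main() {
	bad, err := runFixtures(isBinary)
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("misclassified fixtures:", len(bad))
}
```

Passing the classifier as a function keeps the table reusable: the same fixtures can be run against the current heuristic and against whichever replacement is chosen, which makes any deliberate behavior change (for example for UTF-16) show up as an explicit edit to wantBinary.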

References

  • pkg/template/template.go:256 (isBinary)
  • pkg/template/template.go:194 (call site: if t.verbatim.Matches(...) || isBinary(srcPath))
  • pkg/template/verbatim.go (.specsverbatim mechanism)

Labels: enhancement (New feature or request)
