Skip to content

Add support for unparseable/encrypted files#309

Open
micprog wants to merge 9 commits into
masterfrom
slang-encrypted
Open

Add support for unparseable/encrypted files#309
micprog wants to merge 9 commits into
masterfrom
slang-encrypted

Conversation

@micprog
Copy link
Copy Markdown
Member

@micprog micprog commented Jun 2, 2026

No description provided.

@micprog micprog requested a review from fischeti June 2, 2026 11:07
@fischeti fischeti force-pushed the fischeti/filter-script branch from 9e123c1 to 3d9735e Compare June 3, 2026 13:00
Base automatically changed from fischeti/filter-script to master June 3, 2026 13:13
@micprog micprog force-pushed the slang-encrypted branch from 5ae5dae to f96c553 Compare June 3, 2026 13:32
micprog and others added 3 commits June 3, 2026 17:52
IEEE-1735 encrypted IP makes slang trip at the surrounding `endmodule`
even though the lexer skips the protect envelope itself. Previously,
that error in any one file aborted the entire `bender script --top X`
invocation because session.cpp threw on the first per-file error.

bender-slang now collects per-tree pass/fail status instead of throwing
on parse errors. System-load failures (file unreadable) are still
fatal — only parse errors are tolerated. Diagnostics are still surfaced
to stderr per file so users can see exactly what slang complained
about.

A new failed_tree_indices() bridge function lets the Rust side learn
which trees had errors. apply_slang_filters builds a force_kept_paths
set from those indices and unions it with the reachability-derived
kept_paths during file filtering, then prints a clear warning per
file: "slang reported parse errors in <path>; preserving in script
output regardless of reachability".

The tolerance is on by default with loud per-file warnings so real
syntax bugs still get noticed; users wanting strict behavior can read
stderr and act on it. The dangling-reference case (parent instantiates
a module slang can't see) is already handled in analysis.cpp's DFS,
which silently skips refs missing from the symbol table.

Fixture: tests/pickle has a new `encrypted` target with three files —
encrypted_ip.sv (realistic IEEE-1735 envelope around a module body),
encrypted_user.sv (instantiates encrypted_ip via a dangling ref), and
encrypted_top.sv. The test confirms all three survive `--top
encrypted_top` even though slang fails to parse encrypted_ip.sv.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iles

Three follow-ups on the parse-error tolerance added in the previous
commit:

* script now prints a single summary warning at end of filtering:
  "warning: slang reported parse errors in N file(s); kept in script
  output: a.sv, b.sv" — instead of one stderr line per file, which got
  noisy on big designs. The diagnostic for each file still comes
  through (slang emits those itself).

* script gains `--drop-unparseable`. Default behavior (keep) is
  unchanged. With the flag set, files slang couldn't parse are dropped
  from the output regardless of reachability; the summary warning's
  verb switches from "kept" to "dropped". Useful when the user knows
  the failures are real bugs in their RTL and wants the script to
  reflect that.

* `--source-annotations` now emits `// UNPARSEABLE: slang reported
  parse errors` above each kept-but-failed file in the script, so a
  reader can spot them in the file list itself. Reuses the existing
  FileEntry.comment plumbing via a fresh `unparseable_comment` helper
  in emit_template.

* pickle refuses on slang parse errors instead of silently emitting
  corrupt output: the slang printer reconstructs from the (partial)
  parse tree, and for encrypted protect envelopes that meant the
  base64 data block got mangled into something no downstream decrypt
  tool would accept. Pre-tolerance pickle threw automatically; now we
  check `failed_indices()` after parsing and surface a clear error
  pointing users at `bender script --top` for the file-list-only case.

apply_slang_filters now returns the filtered groups alongside the set
of kept-unparseable paths, which the caller threads into
emit_template for the comment annotation. The retain predicate is
unified into a single closure handling both the --top axis and the
--drop-unparseable axis.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously, the parse-error tolerance was blanket: any file slang
couldn't parse was kept in the script output with a warning. That
masked genuine syntax bugs in user RTL.

Use slang's own taxonomy to discriminate. The lexer/preprocessor emit
`slang::diag::ProtectedEnvelope` (the [-Wprotected-envelope] warning)
every time they skip an IEEE-1735 protect envelope; real syntax bugs
never trigger it. We capture that signal per tree alongside the
existing parse-error bit, expose it via `protected_tree_indices()`,
and split the failed set into two:

  * encrypted_paths — failed AND emitted ProtectedEnvelope. Always
    tolerated (legal SystemVerilog that slang merely struggles to
    digest fully).
  * broken_paths    — failed without ProtectedEnvelope. Looks like a
    real syntax bug; aborts the run by default.

A new `--allow-broken` flag flips that abort into a warning, mirroring
how `--allow-broken` (and similar escape hatches) work elsewhere in
bender. The summary stderr now prints separate lines for encrypted vs
broken so users can tell apart automatic vs explicit tolerance.

bender-slang itself stays policy-free: parse_group is still lenient
(returns Ok on parse errors, throws only on system-load failures), and
the new protected_indices() complements the existing failed_indices().
Policy lives in `bender` (script.rs decides; pickle.rs continues to
refuse all parse errors since rewriting a partial tree corrupts the
output).

bender-slang/tests/basic.rs: the old `parse_invalid_file_returns_parse_error`
test was wrong after the previous tolerance commit and is updated to
the new contract — it asserts parse_group succeeds, failed_indices()
reports broken.sv, and protected_indices() is empty.

Fixture: tests/pickle/Bender.yml gains a `broken` target wired to the
existing tests/pickle/src/broken.sv. New script tests cover both
failure (default) and `--allow-broken` (kept with warning).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@micprog micprog force-pushed the slang-encrypted branch from f96c553 to d83131a Compare June 3, 2026 15:52
@fischeti
Copy link
Copy Markdown
Contributor

fischeti commented Jun 3, 2026

The changes makes sense in general. I have some ideas on how to make the slang interface cleaner. Particularly the *_indices calls are not very elegant incl. the existing reachable_tree_indices. I implemented it like this previously, since cxx does not allow to pass vectors of syntax trees over the FFI e.g. Vec<slang::syntax::SyntaxTree>. But talking to the clanker, there might actually be a way around that which would result in a much more elegant FFI in general. I will have a go at it and will push some (maybe bigger) changes on top🤓

Copy link
Copy Markdown
Contributor

@fischeti fischeti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored the bender-slang FFI now which looks cleaner now imo. A new ParsedTree type now bundles the actual syntax tree as well as parsed_ok and encrypted flags that can be probed easily. I also bumped the bender-slang version again.

The only remaining issue I have with it now are the --drop-unparseable and --allow-broken flags. For me it is not entirely clear how they differ resp. what the actual use cases are.

Comment thread src/cmd/script.rs
Comment on lines +106 to +115
/// Drop files slang could not parse (default: keep them)
#[cfg(feature = "slang")]
#[arg(long, global = true, help_heading = "General Script Options")]
pub drop_unparseable: bool,

/// Tolerate slang parse errors in non-encrypted files instead of failing
#[cfg(feature = "slang")]
#[arg(long, global = true, help_heading = "General Script Options")]
pub allow_broken: bool,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference between those two flags?

Comment thread src/cmd/script.rs

/// Tolerate slang parse errors in non-encrypted files instead of failing
#[cfg(feature = "slang")]
#[arg(long, global = true, help_heading = "General Script Options")]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

potentially, it could make sense to group the slang flags separately

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants