Skip to content

fix(#969): canonize after meet compression so --compress still dedups#970

Merged
yegor256 merged 1 commit into
masterfrom
claude/github-issue-969-gxz98y
Jul 3, 2026
Merged

fix(#969): canonize after meet compression so --compress still dedups#970
yegor256 merged 1 commit into
masterfrom
claude/github-issue-969-gxz98y

Conversation

@yegor256

@yegor256 yegor256 commented Jul 3, 2026

Copy link
Copy Markdown
Member

Closes #969.

Problem

When --compress and --canonize are combined for LaTeX sequence output
(e.g. dataize --output=latex --sequence --compress --canonize),
structurally identical sub-expressions across reduction steps were not
factored out into \phinoMeet/\phinoAgain — they were printed in full over
and over.

The cause was ordering: canonize ran before the meet compression. It
renames each step's λ-bindings to F1, F2, … numbering by traversal
position and restarting the counter for every expression. Because the
counter restarts per step and numbers by position, the same logical lambda
got a different Fn in different steps. The meet pass compares candidates by
structural equality, so Function "F3" and Function "F4" looked distinct and
the enclosing sub-expressions never matched — --canonize silently sabotaged
the deduplication --compress is meant to perform.

Fix

Canonization now runs after the meet compression, so the meet pass sees the
stable, original lambda names:

  • The _canonize flag is threaded through PrintContext/LatexContext and
    applied at render time rather than pre-applied to the chain in the runners
    (runRewrite/runDataize no longer canonize up front).
  • For the LaTeX sequence path, rewrittensToLatex feeds un-canonized
    expressions to meetInExpressions (via compressedRewrittens /
    compressedExpressions) and canonizes only its output
    (canonizedRewrittens / canonizedExpressions).
  • For the other output formats (no meet pass), printRewrittens canonizes just
    before rendering, preserving the previous behavior.
  • canonizeExpression gains ExPhiMeet/ExPhiAgain cases so lambdas inside a
    factored-out \phinoMeet{…} body still get renamed.

Tests

Added a dataize --output=latex --compress --canonize --sequence CLI test over
a whole-expression sequence that asserts a \phinoMeet is produced — the
scenario that previously collapsed only when --canonize was dropped. The
existing focused compress+canonize test continues to pass.

🤖 Generated with Claude Code

https://claude.ai/code/session_01JJxZDChgfupZEqUMbtSzHm


Generated by Claude Code

--canonize renames each step's λ-bindings to F1, F2, … numbering by
traversal position and restarting per expression, so the same logical
lambda got a different Fn in different steps. The meet pass compares
sub-expressions by structural equality, so those renamed lambdas never
matched and identical sub-expressions were never factored into
\phinoMeet/\phinoAgain — --canonize silently defeated --compress.

Canonization now runs after the meet compression instead of before it:
the meet pass sees the original (stable) lambda names, and only its
output is canonized. The _canonize flag is threaded through
PrintContext/LatexContext and applied at render time (inside
rewrittensToLatex after the meet for the LaTeX sequence path, and in
printRewrittens for the other formats) rather than pre-applied to the
chain in the runners. canonizeExpression also gains ExPhiMeet/ExPhiAgain
cases so lambdas inside a factored-out \phinoMeet body still get renamed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JJxZDChgfupZEqUMbtSzHm
@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Benchmark comparison

Regression threshold: +20%. Same-runner A/B against the PR merge-base; the bench fixture is frozen across both runs.

Bench base (us) head (us) delta
parse/phi 126855.46 129545.97 +2.1%
parse/xmir 633916.01 638820.89 +0.8%
print/salty/multiline 957322.70 972558.76 +1.6%
print/sweet/flat 268446.09 275657.26 +2.7%
print/sweet/multiline 267677.85 299897.73 +12.0%
rewrite/normalize 59562.65 61431.03 +3.1%

OK: no case regressed above the +20% threshold.

@yegor256

yegor256 commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

@rultor merge

@rultor

rultor commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

@rultor merge

@yegor256 OK, I'll try to merge now. You can check the progress of the merge here.

@rultor

rultor commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

@rultor merge

@yegor256 Oops, I failed. You can see the full log here (spent 19min).

\u001b[?7l    with-phi.yaml [ ]\u001b[?7h
\u001b[K    with-phi.yaml [\u001b[32m✔\u001b[0m]
\u001b[?7l    simple.yaml [ ]\u001b[?7h
\u001b[K    simple.yaml [\u001b[32m✔\u001b[0m]
\u001b[?7l    with-package.yaml [ ]\u001b[?7h
\u001b[K    with-package.yaml [\u001b[32m✔\u001b[0m]
\u001b[?7l    with-duplicate-attribute.yaml [ ]\u001b[?7h
\u001b[K    with-duplicate-attribute.yaml [\u001b[32m✔\u001b[0m]
\u001b[?7l    base-dispatch.yaml [ ]\u001b[?7h
\u001b[K    base-dispatch.yaml [\u001b[32m✔\u001b[0m]
\u001b[?7l    application-with-dispatch.yaml [ ]\u001b[?7h
\u001b[K    application-with-dispatch.yaml [\u001b[32m✔\u001b[0m]
\u001b[?7l    formation.yaml [ ]\u001b[?7h
\u001b[K    formation.yaml [\u001b[32m✔\u001b[0m]
\u001b[?7l    application.yaml [ ]\u001b[?7h
\u001b[K    application.yaml [\u001b[32m✔\u001b[0m]
\u001b[?7l    reverse-dispatch.yaml [ ]\u001b[?7h
\u001b[K    reverse-dispatch.yaml [\u001b[32m✔\u001b[0m]
\u001b[?7l    atoms.yaml [ ]\u001b[?7h
\u001b[K    atoms.yaml [\u001b[32m✔\u001b[0m]
  prohibit to convert to XMIR
\u001b[?7l    [[ ]] [ ]\u001b[?7h
\u001b[K    [[ ]] [\u001b[32m✔\u001b[0m]
\u001b[?7l    T [ ]\u001b[?7h
\u001b[K    T [\u001b[32m✔\u001b[0m]
\u001b[?7l    [[ x -> ? ]] [ ]\u001b[?7h
\u001b[K    [[ x -> ? ]] [\u001b[32m✔\u001b[0m]
\u001b[?7l    [[ ^ -> 5 ]] [ ]\u001b[?7h
\u001b[K    [[ ^ -> 5 ]] [\u001b[32m✔\u001b[0m]
\u001b[?7l    Q.x.y.z [ ]\u001b[?7h
\u001b[K    Q.x.y.z [\u001b[32m✔\u001b[0m]
\u001b[?7l    "Hello" [ ]\u001b[?7h
\u001b[K    "Hello" [\u001b[32m✔\u001b[0m]
\u001b[?7l    Q [ ]\u001b[?7h
\u001b[K    Q [\u001b[32m✔\u001b[0m]
\u001b[?7l    $ [ ]\u001b[?7h
\u001b[K    $ [\u001b[32m✔\u001b[0m]
  XMIR printing packs
\u001b[?7l    simple.yaml [ ]\u001b[?7h
\u001b[K    simple.yaml [\u001b[32m✔\u001b[0m]
\u001b[?7l    application.yaml [ ]\u001b[?7h
\u001b[K    application.yaml [\u001b[32m✔\u001b[0m]
Yaml
  parses yaml rule
\u001b[?7l    simple.yaml [ ]\u001b[?7h
\u001b[K    simple.yaml [\u001b[32m✔\u001b[0m]
\u001b[?7l    where-without-args.yaml [ ]\u001b[?7h
\u001b[K    where-without-args.yaml [\u001b[32m✔\u001b[0m]
\u001b[?7l    special-attrs-in-condition.yaml [ ]\u001b[?7h
\u001b[K    special-attrs-in-condition.yaml [\u001b[32m✔\u001b[0m]
\u001b[?7l    when.yaml [ ]\u001b[?7h
\u001b[K    when.yaml [\u001b[32m✔\u001b[0m]
  fails on yaml typos
\u001b[?7l    wrong-condition-name.yaml [ ]\u001b[?7h
\u001b[K    wrong-condition-name.yaml [\u001b[32m✔\u001b[0m]
\u001b[?7l    two-condition-objects.yaml [ ]\u001b[?7h
\u001b[K    two-condition-objects.yaml [\u001b[32m✔\u001b[0m]
  rejects a label that equals the name
\u001b[?7l    in a morphing rule [ ]\u001b[?7h
\u001b[K    in a morphing rule [\u001b[32m✔\u001b[0m]
\u001b[?7l    in a dataization rule [ ]\u001b[?7h
\u001b[K    in a dataization rule [\u001b[32m✔\u001b[0m]
\u001b[?7l    in a contextualization rule [ ]\u001b[?7h
\u001b[K    in a contextualization rule [\u001b[32m✔\u001b[0m]
  keeps effective labels unique across rule sets
\u001b[?7l    across morphing, dataization and contextualization rules [ ]\u001b[?7h
\u001b[K    across morphing, dataization and contextualization rules [\u001b[32m✔\u001b[0m]

Finished in 0.9639 seconds
\u001b[32m1084 examples, 0 failures\u001b[0m
\u001b[?25hTest suite spec: PASS
Test suite logged to:
/home/r/repo/./dist-newstyle/build/x86_64-linux/ghc-9.6.7/phino-0.0.0/t/spec/test/phino-0.0.0-spec.log
1 of 1 test suites (1 of 1 test cases) passed.
+ mv /home/r/repo .
+ '[' -n '' ']'
++ whoami
+ sudo chown -R ubuntu repo
+ cd repo
+ git push origin master
remote: error: GH013: Repository rule violations found for refs/heads/master.        
remote: Review all repository rules at https://github.com/objectionary/phino/rules?ref=refs%2Fheads%2Fmaster        
remote: 
remote: - Changes must be made through a pull request.        
remote: 
To github.com:objectionary/phino.git
 ! [remote rejected] master -> master (push declined due to repository rule violations)
error: failed to push some refs to 'github.com:objectionary/phino.git'
container ec56cf3e4cb7adf2c1a01b62f3fa1741d2b68c75be2ba6c6386c4d32ead49a9f is dead
Fri Jul  3 12:58:44 UTC 2026

@yegor256 yegor256 marked this pull request as ready for review July 3, 2026 13:34
Copilot AI review requested due to automatic review settings July 3, 2026 13:34
@yegor256 yegor256 merged commit f828645 into master Jul 3, 2026
25 checks passed
@yegor256 yegor256 deleted the claude/github-issue-969-gxz98y branch July 3, 2026 13:34

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Reorders --canonize relative to LaTeX meet compression so --compress can still deduplicate structurally identical sub-expressions across sequence steps (producing \phinoMeet/\phinoAgain) even when canonization is enabled.

Changes:

  • Threaded a _canonize flag through PrintContextLatexContext and moved canonization to render-time, instead of canonizing the rewritten chain up-front in CLI runners.
  • In the LaTeX sequence pipeline, applied canonization only after meet compression (meetInExpressions) for both whole-expression and focused-expression sequences.
  • Extended the canonizer to traverse ExPhiMeet/ExPhiAgain and added a regression CLI test asserting \phinoMeet is produced for dataize --output=latex --sequence --compress --canonize.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
test/CLISpec.hs Adds regression coverage for --compress --canonize --sequence producing a \phinoMeet.
src/LaTeX.hs Implements “compress first, canonize after” for LaTeX sequence rendering and threads _canonize via LatexContext.
src/CLI/Types.hs Adds _canonize to PrintContext so printing decisions can be made at render time.
src/CLI/Runners.hs Stops canonizing rewritten chains up-front; passes _canonize through PrintContext.
src/CLI/Helpers.hs Canonizes right before rendering for non-LaTeX-sequence formats; passes _canonize into LatexContext.
src/Canonizer.hs Exports canonizeExpr, reuses it in canonize, and canonizes inside ExPhiMeet/ExPhiAgain wrappers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@0crat

0crat commented Jul 5, 2026

Copy link
Copy Markdown

@yegor256 Thanks for the contribution! You've earned +8 points for this: +16 as a basis; -8 for the lack of code review. Please, keep them coming. Your running score is +1707; don't forget to check your Zerocracy account too).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

--canonize defeats --compress: identical subexpressions with different F-numbers are never merged into \phinoMeet

5 participants