Skip to content

fix: sort compatibility matrices for deterministic output#3006

Merged
markphelps merged 1 commit into
mainfrom
fix/compat-matrix-sorting
May 7, 2026
Merged

fix: sort compatibility matrices for deterministic output#3006
markphelps merged 1 commit into
mainfrom
fix/compat-matrix-sorting

Conversation

@markphelps
Copy link
Copy Markdown
Collaborator

Fixes the compatibility matrix update workflow opening PRs even when the underlying data has not changed.

Problem

The scheduled update-compatibility-matrices workflow was creating PRs on days when no new CUDA/PyTorch/TensorFlow versions were released. This happened because the generated JSON was not deterministic.

Root Cause

  • CUDA: already sorted correctly
  • PyTorch: non-deterministic because fetchCurrentTorchVersions iterates over a Go map with random iteration order
  • TensorFlow: non-deterministic because it iterates over HTML table rows in document order

Changes

  • Added stable sorting to FetchTorchCompatibilityMatrix
  • Added stable sorting to FetchTensorFlowCompatibilityMatrix
  • Added tests verifying deterministic sorting for both generators

This should make the workflow idempotent: if the upstream data has not changed, the generated JSON will be byte-for-byte identical and no PR will be opened.

The compatibility matrix update workflow was opening PRs even when
the underlying data hadn't changed because the output was not
idempotent.

- CUDA was already sorted correctly
- PyTorch was non-deterministic because fetchCurrentTorchVersions
  iterates over a Go map with random iteration order
- TensorFlow was non-deterministic because it iterates over HTML
  table rows in document order

Add stable sorting to both PyTorch and TensorFlow generators before
serializing the JSON output, plus tests verifying deterministic
sorting.

Fixes non-idempotent compatibility matrix PRs.
@markphelps markphelps requested a review from a team as a code owner May 7, 2026 14:35
@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk Bot commented May 7, 2026

The tests pass and there are no linting issues. This is a well-crafted PR that:

  1. Fixes a real issue: Non-deterministic JSON output was causing unnecessary PRs from the compatibility matrix update workflow
  2. Root cause correctly identified:
    • PyTorch: Go map iteration order is random
    • TensorFlow: HTML table row order varies
  3. Solution is idiomatic: Uses slices.SortFunc with cmp.Or for clean multi-field sorting
  4. Edge cases handled: Nil CUDA pointers are properly converted to empty strings for comparison
  5. Tests verify correctness: Both sorting implementations have dedicated tests
  6. Follows conventions: Uses testify assertions (no raw if checks with t.Fatal)

LGTM

github run

Copy link
Copy Markdown
Contributor

@michaeldwan michaeldwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good fix

@markphelps markphelps merged commit de3af59 into main May 7, 2026
44 checks passed
@markphelps markphelps deleted the fix/compat-matrix-sorting branch May 7, 2026 16:08
pull Bot pushed a commit to Mu-L/cog that referenced this pull request May 7, 2026
…icate#3009)

torchGPUPackage picks the best match by CUDA version but never
compared torch versions when CUDA was equal -- so iteration order
determined which patch won. After replicate#3006 sorted the matrix ascending,
torch==2.3 started resolving to 2.3.0 instead of 2.3.1.

Break ties by preferring the higher torch patch version, consistent
with baseImageComponentNormalisation which already does this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants