Open
Labels: testing (Testing and coverage related issues), xla:tpu (TPU specific issues and PRs)
Description
We should have a test that trains a very simple model with a Pallas kernel across two TPU v4 slices and checks that it doesn't hang.
Our pre-submit CI currently runs on only one TPU v4 slice, so it doesn't cover multi-slice training.
Post-submit CI relies on humans to monitor results and revert bad changes, which has proven ineffective. As long as we can afford it, we should test in pre-submit rather than post-submit.
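As a rough illustration of the "doesn't hang" check, here is a minimal sketch of a timeout guard that a pre-submit test could wrap around the training launch. It uses only the Python standard library. The helper name `finishes_within` and the placeholder commands are hypothetical; the real test would pass in whatever command launches the two-slice training job, which this sketch does not attempt to show.

```python
import subprocess
import sys


def finishes_within(cmd, timeout_s):
    """Return True if `cmd` exits with status 0 within `timeout_s` seconds.

    `cmd` is a hypothetical stand-in for the command that launches the
    multi-slice training run; a hang shows up as a timeout, not a stuck CI job.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing is left running.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder commands: one that returns promptly and one that simulates a hang.
    quick = [sys.executable, "-c", "print('training step finished')"]
    stuck = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(finishes_within(quick, timeout_s=30))
    print(finishes_within(stuck, timeout_s=1))
```

Running the launch in a separate process means a hung collective or kernel cannot block the test runner itself; the timeout value would need to be chosen to comfortably exceed a healthy run.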