[CI/Build] Tweak Marlin Nondeterminism Issues (vllm-project#4713)
robertgshaw2-neuralmagic committed May 19, 2024
1 parent 18355a9 commit 64367a0
Showing 1 changed file with 3 additions and 5 deletions.
tests/models/test_gptq_marlin.py
@@ -1,13 +1,11 @@
 """Compares the outputs of gptq vs gptq_marlin
 Note: GPTQ and Marlin do not have bitwise correctness.
 As a result, in this test, we just confirm that the top selected tokens of the
-Marlin/GPTQ models are in the top 3 selections of each other.
+Marlin/GPTQ models are in the top 5 selections of each other.
 Note: Marlin internally uses locks to synchronize the threads. This can
 result in very slight nondeterminism for Marlin. As a result, we re-run the test
+up to 3 times to see if we pass.
-Note: This test currently fails running with --forked with the following:
-RuntimeError: Cannot re-initialize CUDA in forked subprocess.
-To use CUDA with multiprocessing, you must use the 'spawn' start method
 Run `pytest tests/models/test_gptq_marlin.py`.
 """
 
 import os
@@ -49,7 +47,7 @@
 ]
 
 
-@pytest.mark.flaky(reruns=2)
+@pytest.mark.flaky(reruns=3)
 @pytest.mark.skipif(gptq_marlin_not_supported,
                     reason="gptq_marlin is not supported on this GPU type.")
 @pytest.mark.parametrize("model", MODELS)
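The tolerance the docstring describes can be sketched as follows. `check_top_k_close` and its input format are hypothetical stand-ins for illustration, not vLLM's actual test helpers: at each generation step, the greedy (top-1) token from one implementation must appear among the other implementation's top-k candidates, and vice versa.

```python
# Hypothetical sketch of the top-k tolerance described in the docstring:
# rather than requiring bitwise-identical outputs, the top-1 token of each
# implementation must fall within the other's top-k candidates.
def check_top_k_close(outputs_a, outputs_b, top_k=5):
    """outputs_a / outputs_b: per-step candidate token lists, ordered from
    most to least likely (an assumed format, not vLLM's real structure)."""
    for step, (cands_a, cands_b) in enumerate(zip(outputs_a, outputs_b)):
        # Each side's greedy pick must be in the other side's top-k.
        assert cands_a[0] in cands_b[:top_k], (
            f"step {step}: {cands_a[0]!r} not in top-{top_k} {cands_b[:top_k]!r}")
        assert cands_b[0] in cands_a[:top_k], (
            f"step {step}: {cands_b[0]!r} not in top-{top_k} {cands_a[:top_k]!r}")
```

Combined with `@pytest.mark.flaky(reruns=3)` (provided by the pytest-rerunfailures plugin), a run that trips this tolerance because of Marlin's lock-based nondeterminism is retried up to three more times before the test is reported as failed.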
