Problem Description
For W4A16, lambada_openai accuracy is 0, but piqa produces a reasonable value.
For AutoScheme mixed-precision, we get an error: AttributeError: 'Autotuner' object has no attribute '_cache_lock'
Reproduction Steps
To reproduce:
For W4A16:
```python
def test_auto_scheme_export(self):
    model_name = get_model_path("facebook/opt-125m")
    ar = AutoRound(model=model_name, scheme="W4A16", iters=0, disable_opt_rtn=True)
    ar.quantize_and_save(self.save_dir)
    model_args = f"pretrained={self.save_dir}"
    task_name = "lambada_openai"
    # task_name = "piqa"
    result = simple_evaluate(model="hf", model_args=model_args, tasks=task_name, batch_size="auto")
    print(result["results"][task_name]["acc,none"])
    assert result["results"][task_name]["acc,none"] > 0.25
    shutil.rmtree(self.save_dir, ignore_errors=True)
```
For AutoScheme, the fix in 55a2797 is required first:
```python
def test_auto_scheme_export(self):
    model_name = get_model_path("facebook/opt-125m")
    scheme = AutoScheme(avg_bits=3, options=("W2A16", "W4A16", "W8A16", "BF16"))
    ar = AutoRound(model=model_name, scheme=scheme, iters=0, disable_opt_rtn=True)
    ar.quantize_and_save(self.save_dir)
    model_args = f"pretrained={self.save_dir}"
    result = simple_evaluate(model="hf", model_args=model_args, tasks="lambada_openai", batch_size="auto")
    print(result["results"]["lambada_openai"]["acc,none"])
    assert result["results"]["lambada_openai"]["acc,none"] > 0.25
    shutil.rmtree(self.save_dir, ignore_errors=True)
```
Environment Information
No response
Error Logs
Autotuner error log for AutoScheme:

```
../auto_round_extension/triton/qlinear_tritonv2_zp.py:182: in forward
    out = quant_linear_fn.apply(
/home/xinhe/.local/lib/python3.12/site-packages/torch/autograd/function.py:581: in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
../auto_round_extension/triton/triton_utils_zp/dequant.py:172: in forward
    output = quant_matmul_248(input, qweight, scales, qzeros, g_idx, bits, maxq)
../auto_round_extension/triton/triton_utils_zp/dequant.py:161: in quant_matmul_248
    W = dequant248(qweight, scales, qzeros, g_idx, bits, maxq=maxq, input_dtype=input_dtype)
../auto_round_extension/triton/triton_utils_zp/dequant.py:154: in dequant248
    return dequant248_core(qweight, scales, qzeros, g_idx, bits, maxq=maxq, input_dtype=input_dtype)
../auto_round_extension/triton/triton_utils_zp/dequant.py:132: in dequant248_core
    dequant_kernel_248[grid](
/home/xinhe/.local/lib/python3.12/site-packages/triton/runtime/jit.py:419: in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
/home/xinhe/.local/lib/python3.12/site-packages/gptqmodel/utils/nogil_patcher.py:224: in patched_run
    config, used_cached_result, bench_time = _get_config_for_key(self, key, args, kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <triton.runtime.autotuner.Autotuner object at 0x7a1258890b30>
key = (589824, 'torch.int32', 'torch.float16', 'torch.int32', 'torch.int32', 'torch.float16')
args = (tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
       dtype=torch.float16), 589824)
kwargs = {'bits': 2, 'grid': <function dequant248_core.<locals>.<lambda> at 0x7a11b03a5ee0>, 'maxq': 3, 'num_gro
ups': 6, ...}

    def _get_config_for_key(self, key, args, kwargs):
>       with self._cache_lock:
E       AttributeError: 'Autotuner' object has no attribute '_cache_lock'

/home/xinhe/.local/lib/python3.12/site-packages/gptqmodel/utils/nogil_patcher.py:149: AttributeError
```
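The traceback suggests that gptqmodel's nogil_patcher swaps in a `patched_run` that reads `self._cache_lock`, but the attribute is never installed on the `Autotuner` instance it ends up running on. A minimal defensive sketch of the pattern (the `Autotuner` class below is a stand-in, not the real triton class, and `ensure_cache_lock` is a hypothetical helper, not an API from either library):

```python
import threading


class Autotuner:
    """Stand-in for triton.runtime.autotuner.Autotuner; like the failing
    instance, its __init__ never creates a _cache_lock attribute."""

    def __init__(self):
        self.cache = {}


def ensure_cache_lock(tuner):
    """Attach the lock a patched run() would expect, if it is missing."""
    if not hasattr(tuner, "_cache_lock"):
        tuner._cache_lock = threading.Lock()
    return tuner


tuner = ensure_cache_lock(Autotuner())
with tuner._cache_lock:  # would raise AttributeError without the guard
    pass
```

This only illustrates where the attribute is missing; the actual fix presumably belongs in nogil_patcher itself (initializing the lock wherever it patches the class), or in pinning a gptqmodel/triton version pair whose patch targets match.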
Additional Context
No response