## Fix GPTQ producing invalid QArray when subchannel quantization not specified #198

Merged: copybara-service[bot] merged 1 commit into `main` from `test_859382529` on Jan 22, 2026

@copybara-service copybara-service Bot commented Jan 22, 2026

@copybara-service copybara-service Bot changed the title from "Fixes QArray identity reshape failure with subchannel quantization" to "Fix GPTQ producing invalid QArray when subchannel quantization not specified" on Jan 22, 2026
… specified

### Summary

Fixes a bug where `gptq_core.quantize_weight` creates inadvertent subchannel quantization that can produce invalid QArrays, even when the user did not request subchannel quantization.

### Problem

When subchannel quantization is not specified (`tiled_axes` is empty), `quantize_weight` defaults the groupsize to `rows`:

```python
groupsize = how.tiled_axes.get(1, rows)
```

This causes scales to be computed every `rows` columns, creating `ceil(columns/rows)` scale groups. When `columns` is not evenly divisible by `rows`, this produces an invalid QArray because the resulting scale shape violates the QArray contract (`all(qvalue_dim % scale_dim == 0 for each axis)`).

When subchannel quantization is not specified, the scale should have shape `(rows, 1)` (per-channel quantization), not `(rows, num_groups)` (sub-channel).
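
To make the shape arithmetic concrete, here is a minimal sketch in plain Python. It does not call the library: `scale_shape` and `satisfies_contract` are hypothetical helpers, and the 3 x 8 weight shape is an assumed example. The contract check only restates the divisibility condition quoted above.

```python
import math

def scale_shape(rows, columns, groupsize):
    # Hypothetical helper: one scale per `groupsize` columns in each row.
    return (rows, math.ceil(columns / groupsize))

def satisfies_contract(qvalue_shape, s_shape):
    # Restates the quoted contract: every qvalue dim must be divisible
    # by the matching scale dim.
    return all(q % s == 0 for q, s in zip(qvalue_shape, s_shape))

rows, columns = 3, 8  # assumed example: columns not divisible by rows

# Old default: groupsize falls back to `rows`.
old_shape = scale_shape(rows, columns, groupsize=rows)
old_ok = satisfies_contract((rows, columns), old_shape)

print(old_shape, old_ok)  # (3, 3) False, because 8 % 3 != 0
```

With these assumed dimensions the old default yields three scale groups along an axis of length 8, which is the invalid case described above.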

### Solution

This change modifies the default groupsize from `rows` to `columns`:

```python
groupsize = how.tiled_axes.get(1, columns)
```

This ensures:
- If subchannel is specified: use the user's groupsize
- If subchannel is not specified: use `columns` as groupsize, producing per-channel quantization with `scale.shape = (rows, 1)`
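
The two cases above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `tiled_axes` is modeled as a plain dict keyed by axis, `resolve_groupsize` and `absmax_scales` are hypothetical helpers, and the abs-max scale is a stand-in for whatever calibration GPTQ actually performs.

```python
def resolve_groupsize(tiled_axes, columns):
    # Mirrors the fixed default: fall back to `columns` when axis 1 is not tiled.
    return tiled_axes.get(1, columns)

def absmax_scales(weight, groupsize):
    # Illustrative stand-in for scale computation: one abs-max value
    # per group of `groupsize` columns in each row.
    return [
        [max(abs(v) for v in row[start:start + groupsize])
         for start in range(0, len(row), groupsize)]
        for row in weight
    ]

# Assumed 3 x 8 example weight.
weight = [
    [1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0],
    [0.5, 0.25, -0.75, 1.0, -1.5, 2.0, 0.0, -0.1],
    [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
]
columns = len(weight[0])

# No subchannel requested: one scale per row, shape (rows, 1).
per_channel = absmax_scales(weight, resolve_groupsize({}, columns))

# Subchannel requested with groupsize 4: shape (rows, 2).
subchannel = absmax_scales(weight, resolve_groupsize({1: 4}, columns))

print(per_channel)  # [[8.0], [2.0], [8.0]]
print(subchannel)   # [[4.0, 8.0], [1.0, 2.0], [8.0, 4.0]]
```

In both cases the number of scale columns divides `columns` evenly, so the divisibility contract holds.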

PiperOrigin-RevId: 859455381
@copybara-service copybara-service Bot merged commit 0684523 into main Jan 22, 2026
@copybara-service copybara-service Bot deleted the test_859382529 branch January 22, 2026 07:53