GPTQ quantization not working #12

@lopuhin

Description

Running `quantize.py` with `--mode int4-gptq` does not seem to work:

  • the code tries to import lm-evaluation-harness, which is not included, documented, or used anywhere else
  • the import in eval.py is incorrect; it should probably be `from model import Transformer as LLaMA` instead of `from model import LLaMA`
  • after fixing the two issues above, the next failure is a circular import
  • after fixing that, `import lm_eval` should be replaced with `import lm_eval.base`
  • there is one more circular import
  • a few other imports from lm_eval are missing
  • and there are a few other errors
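The eval.py import error can be reproduced in isolation. A minimal sketch, with a simulated `model` module standing in for the repo's model.py (which defines `Transformer` but no `LLaMA` symbol):

```python
import sys
import types

# Simulated stand-in for the repo's model.py: it defines Transformer, but no
# LLaMA symbol (names mirror the issue; the real module lives in the repo).
model = types.ModuleType("model")
class Transformer:
    """Stand-in for the model class defined in model.py."""
model.Transformer = Transformer
sys.modules["model"] = model

# The import as written in eval.py fails, because model.py has no LLaMA:
try:
    from model import LLaMA
except ImportError:
    pass  # expected: model.py defines Transformer, not LLaMA

# The suggested fix: import Transformer under the name eval.py expects.
from model import Transformer as LLaMA
print(LLaMA is Transformer)  # -> True
```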

Overall, here are the fixes I had to apply to make it run: lopuhin@86d990b

Based on this, could you please check whether the right version of the code was included for GPTQ quantization?
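For the circular-import failures, the linked commit has the actual changes; as a generic illustration only (not the repo's code), a module-level import cycle can be broken by deferring one of the imports into the function that needs it:

```python
import sys
import types

def load_module(name, src):
    """Create a module from source and register it so `import` can find it."""
    mod = types.ModuleType(name)
    sys.modules[name] = mod
    exec(src, mod.__dict__)
    return mod

# Module "b" defers its import of "a" into the function body, so loading "b"
# does not require "a" to exist yet -- this is what breaks the cycle.
b_src = """
def pong():
    import a  # deferred: resolved at call time, after both modules are loaded
    return "pong from " + a.__name__
"""

# Module "a" imports "b" at top level; no cycle occurs during load because "b"
# did not try to import "a" while it was being executed.
a_src = """
import b

def ping():
    return b.pong()
"""

b = load_module("b", b_src)
a = load_module("a", a_src)
print(a.ping())  # -> pong from a
```

The same pattern applies to eval.py: moving the offending import inside the function that uses it lets both modules finish loading before either one is actually needed.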
