Support for 4bit quantization #449

@rahuldshetty

Description

Feature request

It seems bitsandbytes>=0.39.0 now supports loading models with 4-bit quantization.
Link: FP4 Quantization
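
For reference, a minimal sketch of what this looks like when loading a model through transformers with a `BitsAndBytesConfig` (the model id is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights, requires bitsandbytes>=0.39.0
    bnb_4bit_quant_type="fp4",             # FP4 as in the linked post; "nf4" is the other option
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 while weights stay 4-bit
)

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",               # example model id
    quantization_config=quant_config,
    device_map="auto",
)
```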

Motivation

Running very large language models on smaller GPUs.

Your contribution

The plan is to upgrade the bitsandbytes package and provide an environment variable that controls which quantization method is used when running the server.
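
As a rough sketch of what that could look like (the variable name `QUANTIZE` and its accepted values are assumptions here, not an existing server option):

```python
import os
import torch
from transformers import BitsAndBytesConfig

def quantization_config_from_env():
    """Build a quantization config from a hypothetical QUANTIZE env variable."""
    mode = os.environ.get("QUANTIZE", "").lower()
    if mode == "bitsandbytes-fp4":
        return BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="fp4",
            bnb_4bit_compute_dtype=torch.float16,
        )
    if mode == "bitsandbytes-nf4":
        return BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16,
        )
    if mode == "bitsandbytes":
        # keep the existing 8-bit path working
        return BitsAndBytesConfig(load_in_8bit=True)
    return None  # no quantization
```

The config returned here would then be passed as `quantization_config` when the server loads the model.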
