Feature request
It seems we now have support for loading models with 4-bit quantization starting from bitsandbytes>=0.39.0.
Link: FP4 Quantization
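For reference, a minimal sketch of what this enables through the transformers integration (assuming transformers >= 4.30 and bitsandbytes >= 0.39.0; the model name is only a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit (FP4) quantization config exposed by transformers on top of bitsandbytes
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",                # placeholder model
    quantization_config=quant_config,
    device_map="auto",
)
```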
Motivation
Running really large language models on smaller GPUs.
Your contribution
The plan would be to upgrade the bitsandbytes package and provide an ENV variable that controls which quantization method is used when running the server; a rough sketch follows below.
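A rough sketch of how the server could map such a variable to a quantization config. The `QUANTIZATION` variable name and its values are assumptions for illustration, not an existing interface:

```python
import os
import torch
from transformers import BitsAndBytesConfig

def quantization_config_from_env():
    # Read the (hypothetical) QUANTIZATION env var, e.g. "4bit-fp4", "4bit-nf4", "8bit".
    mode = os.environ.get("QUANTIZATION", "none").lower()
    if mode in ("4bit-fp4", "4bit-nf4"):
        return BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type=mode.split("-")[1],   # "fp4" or "nf4"
            bnb_4bit_compute_dtype=torch.float16,
        )
    if mode == "8bit":
        return BitsAndBytesConfig(load_in_8bit=True)
    return None  # no quantization requested
```

The returned config would then be passed as `quantization_config` to `from_pretrained` when the server loads the model, as in the snippet above.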