Support for Loading GGUF Quantized Model in Python Pipeline #109

@NiloufarAb

Hello, and thanks for this excellent project!

I am currently using the Llama3-8B-1.58-100B-tokens quantized model (ggml-model-i2_s.gguf) from the BitNet repository. The model performs well at inference, but I am having difficulty loading the GGUF file directly in my Python chatbot pipeline, which needs to interact with CCV files.

I have tried using llama.cpp and the transformers library, but both approaches resulted in compatibility issues due to the GGUF file format.

What I’ve Tried:
- Tested inference with BitNet, which worked as expected.
- Attempted to load the GGUF file using llama.cpp and transformers, but encountered incompatibilities.
- Searched through the documentation and existing issues, but couldn’t find a solution or official guidance for loading GGUF models in custom Python pipelines.
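
In case it helps narrow down the incompatibility, here is a minimal sketch for checking the file's header from plain Python. It assumes the GGUF v2/v3 header layout (4-byte magic `GGUF`, then a little-endian uint32 version, a uint64 tensor count, and a uint64 metadata key/value count). The function name `read_gguf_header` is my own and is not part of BitNet or llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"   # first four bytes of every GGUF file
HEADER_SIZE = 24       # magic (4) + version (4) + two uint64 counts (16)


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) for a GGUF file.

    Assumes the GGUF v2/v3 layout with little-endian fields; this is an
    illustrative helper, not an official loader.
    """
    with open(path, "rb") as f:
        header = f.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path!r} does not look like a GGUF file")
    # "<IQQ": little-endian uint32 followed by two uint64 values, no padding
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:HEADER_SIZE])
    return version, tensor_count, kv_count


# Example call (the path is a placeholder for the local model file):
# print(read_gguf_header("ggml-model-i2_s.gguf"))
```

This only confirms that the file is a well-formed GGUF container and reports its version. It does not load weights or check whether a given runtime supports the quantization type stored inside.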

Request: Could you please provide guidance on how to load the GGUF model directly in Python? Alternatively, is there any internal BitNet function or script that supports GGUF model loading for integration into Python-based chatbot pipelines?

Thank you for your help, and I appreciate any advice or guidance you can provide!
