
jgraef/llama-cpp-rs


Rust bindings for llama.cpp

The crate llama-cpp contains idiomatic Rust bindings for llama.cpp. It offers a low-level synchronous API and a high-level asynchronous API.

A simple command line interface that also serves as an example is included in llama-cpp-cli. Try running:

cargo run -- chat -m path/to/model.gguf

llama-cpp-sys contains the low-level FFI bindings to llama.cpp, whose source code is included as a git submodule. The build script compiles llama.cpp, links it statically, and generates the bindings with bindgen. Make sure to initialize the submodule:

git submodule update --init -- llama-cpp-sys/llama.cpp/

Async Runtime

Both Tokio and async-std are supported. You choose which one is used by enabling one of the following features:

  • runtime-async-std
  • runtime-tokio
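For example, depending on the crate with the Tokio runtime enabled might look like this in Cargo.toml (a sketch: the git URL follows this repository's name, and the crate is assumed not to be published on crates.io — check the repository's Cargo.toml for the exact crate name and path):

```toml
[dependencies]
# pull the crate straight from the repository and enable the Tokio runtime
llama-cpp = { git = "https://github.com/jgraef/llama-cpp-rs", features = ["runtime-tokio"] }
```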

Features

  • Text generation
  • Embedding
  • GPU support
  • LoRA
  • Grammar sampling
  • Repetition penalties
  • Beam search
  • Classifier-free guidance
  • Logit bias
  • Tokio runtime
  • async-std runtime
  • API server
  • Sliding context
  • Prompt templates

Example

Examples are located in examples/. They are standalone crates.

// bring the stream helpers and IO traits into scope
use std::io::{stdout, Write};
use futures::{pin_mut, TryStreamExt};

// load model asynchronously
let model = ModelLoader::load(model_path, Default::default())
    .wait_for_model()
    .await?;

// prompt
let prompt = "The capital of France is";
print!("{}", prompt);
stdout().flush()?;

// create an inference session.
let session = Session::from_model(model, Default::default());

// create a sequence and feed prompt to it.
let mut sequence = session.sequence();
sequence
    .push(Tokenize {
        text: prompt,
        add_bos: true,
        allow_special: false,
    })
    .await?;

// create a response stream from the sequence
let stream = sequence.stream::<String>(Default::default());
pin_mut!(stream);

// stream LLM output piece by piece
while let Some(piece) = stream.try_next().await? {
    print!("{piece}");
    stdout().flush()?;
}
