VLLM Client

A Rust client library for the OpenAI-compatible API of the vLLM inference engine.

Design Philosophy

Python Compatible: The API style mirrors openai-python, which lowers migration cost.
Flexibility First: Both inputs and outputs support serde_json::Value for maximum flexibility.
Minimal Abstraction: The client avoids over-encapsulation and lets you work with JSON directly.
Convenience Helpers: Parsing helper methods are provided, but using them is optional.

Features

✅ Chat Completions API (/v1/chat/completions)
✅ Legacy Completions API (/v1/completions)
✅ Streaming Response (SSE)
✅ Tool Calling (Function Calling)
✅ Multimodal Support (Image Input)
✅ Thinking Mode (vLLM Reasoning Models Extension)

Installation

[dependencies]
vllm-client = "0.1"

Quick Start

Simple Chat

use vllm_client::*;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = VllmClient::new("http://localhost:8000/v1");

    let response = client.chat.completions.create()
        .model("Qwen/Qwen2.5-72B-Instruct")
        .messages(json!([
            {"role": "user", "content": "Hello, tell me about yourself"}
        ]))
        .temperature(0.7)
        .max_tokens(512)
        .send()
        .await?;

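    // `content` may be empty (for example, when the model only returns tool calls),
    // so print it only when present instead of panicking on `unwrap()`.
    if let Some(content) = &response.content {
        println!("{}", content);
    }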
    Ok(())
}

Streaming

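// `stream.next()` needs a `StreamExt` trait in scope. `futures::StreamExt` is
// an assumption here; check which trait the crate expects or re-exports.
use futures::StreamExt;
use std::io::Write;

let mut stream = client.chat.completions.create()
    .model("Qwen/Qwen2.5-72B-Instruct")
    .messages(json!([{"role": "user", "content": "Write a poem"}]))
    .stream(true)
    .send_stream()
    .await?;

while let Some(event) = stream.next().await {
    // Print each content delta as it arrives; other event kinds are ignored here.
    if let StreamEvent::Content(delta) = event {
        print!("{}", delta);
        // stdout is line-buffered, so flush to show partial lines immediately.
        std::io::stdout().flush()?;
    }
}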
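
For readers curious about what such a stream carries on the wire: OpenAI-compatible servers stream Server-Sent Events, with one `data: <json>` line per chunk and a final `data: [DONE]` sentinel. The sketch below classifies raw SSE lines using only the standard library. `SseLine`, `classify_sse_line`, and `collect_payloads` are hypothetical names chosen for illustration. They are not part of vllm-client, and the crate's own parsing may differ.

```rust
/// One classified line of an SSE response body (hypothetical type, illustration only).
#[derive(Debug, PartialEq)]
enum SseLine<'a> {
    /// A `data:` line carrying a JSON chunk, still unparsed.
    Data(&'a str),
    /// The `data: [DONE]` sentinel that ends the stream.
    Done,
    /// Blank separators, `:` comments, and any other SSE field.
    Ignored,
}

/// Classify a single line of the response body.
fn classify_sse_line(line: &str) -> SseLine<'_> {
    // Tolerate both LF and CRLF line endings.
    let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
    match line.strip_prefix("data:") {
        Some(payload) => {
            // Lenient: drop any whitespace between the field name and the payload.
            let payload = payload.trim_start();
            if payload == "[DONE]" {
                SseLine::Done
            } else {
                SseLine::Data(payload)
            }
        }
        None => SseLine::Ignored,
    }
}

/// Collect the JSON payloads of a whole body, stopping at the `[DONE]` sentinel.
fn collect_payloads(body: &str) -> Vec<&str> {
    let mut payloads = Vec::new();
    for line in body.lines() {
        match classify_sse_line(line) {
            SseLine::Data(payload) => payloads.push(payload),
            SseLine::Done => break,
            SseLine::Ignored => {}
        }
    }
    payloads
}
```

Each collected payload would then be parsed as JSON (for example with serde_json) to extract the content delta.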

Tool Calling

let response = client.chat.completions.create()
    .model("Qwen/Qwen2.5-72B-Instruct")
    .messages(json!([{"role": "user", "content": "What's the weather in Beijing?"}]))
    .tools(json!([
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get weather information",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"}
                    },
                    "required": ["city"]
                }
            }
        }
    ]))
    .send()
    .await?;

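if response.has_tool_calls() {
    for call in &response.tool_calls {
        // Parse the model-supplied arguments into JSON.
        let args: serde_json::Value = call.parse_args()?;
        // `execute_tool` is your own dispatcher, not part of this crate. It is
        // assumed here to return a serde_json::Value such as json!({"temp": 25}).
        let result = execute_tool(&call.name, args);

        // Construct the tool result message to send back to the model
        // in a follow-up request.
        let tool_message = call.result(result);
    }
}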
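
The snippet above calls an `execute_tool` function that the application has to supply. One way to structure it, sketched with the standard library only, is a registry that maps tool names to handler functions. Everything here (`ToolHandler`, `build_registry`, `dispatch_tool`, and the stubbed `get_weather`) is hypothetical, and the handlers take the raw JSON argument string rather than a `serde_json::Value` to keep the sketch dependency-free.

```rust
use std::collections::HashMap;

/// A handler receives the raw JSON arguments and returns a JSON result string.
/// (Hypothetical signature for illustration; adapt it to serde_json::Value as needed.)
type ToolHandler = fn(&str) -> Result<String, String>;

/// Stub handler: a real one would parse `city` from the arguments and query a weather service.
fn get_weather(_raw_args: &str) -> Result<String, String> {
    Ok(r#"{"temp": 25}"#.to_string())
}

/// Register every tool that was advertised to the model.
fn build_registry() -> HashMap<&'static str, ToolHandler> {
    let mut registry: HashMap<&'static str, ToolHandler> = HashMap::new();
    registry.insert("get_weather", get_weather);
    registry
}

/// Look up the handler by name and run it; unknown names become errors, not panics.
fn dispatch_tool(
    registry: &HashMap<&'static str, ToolHandler>,
    name: &str,
    raw_args: &str,
) -> Result<String, String> {
    match registry.get(name) {
        Some(handler) => handler(raw_args),
        None => Err(format!("unknown tool: {}", name)),
    }
}
```

Returning an error for an unknown tool name, rather than panicking, lets the application report the problem back to the model, which can occasionally request a tool that was never declared.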

API Style

// Chained calls mirror openai-python's client.chat.completions.create(...)
client.chat.completions.create()
    .model("model-name")
    .messages(json!([...]))
    .temperature(0.7)
    .max_tokens(1024)
    .tools(json!([...]))
    .stream(true)
    .send()
    .await?
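
The chained style is an instance of the builder pattern: each setter takes the builder by value, records one option, and returns the builder, so nothing happens until the final call. The sketch below illustrates the idea with the standard library only. `ChatRequestBuilder`, its fields, and `build` are hypothetical and do not reflect how vllm-client is actually implemented.

```rust
/// Hypothetical request builder, for illustration only.
#[derive(Debug, Default, Clone, PartialEq)]
struct ChatRequestBuilder {
    model: Option<String>,
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    stream: bool,
}

impl ChatRequestBuilder {
    fn new() -> Self {
        Self::default()
    }

    // Each setter consumes `self` and returns it, which is what enables chaining.
    fn model(mut self, model: &str) -> Self {
        self.model = Some(model.to_string());
        self
    }

    fn temperature(mut self, temperature: f32) -> Self {
        self.temperature = Some(temperature);
        self
    }

    fn max_tokens(mut self, max_tokens: u32) -> Self {
        self.max_tokens = Some(max_tokens);
        self
    }

    fn stream(mut self, stream: bool) -> Self {
        self.stream = stream;
        self
    }

    /// Stand-in for the final call: validate required fields before anything is sent.
    fn build(self) -> Result<ChatRequestBuilder, String> {
        if self.model.is_none() {
            return Err("model is required".to_string());
        }
        Ok(self)
    }
}
```

Because optional parameters default to `None`, callers set only what they need, which is what keeps the call site close to the keyword-argument style of openai-python.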

Documentation

License

MIT OR Apache-2.0

