PPL LLM Serving

Overview

ppl.llm.serving is part of the PPL.LLM system.

We recommend that users who are new to this project read the Overview of system first.

ppl.llm.serving is a serving framework based on ppl.nn.llm for various Large Language Models (LLMs). This repository contains a gRPC-based server and inference support for LLaMA.

Prerequisites

- Linux running on x86_64 or arm64 CPUs
- GCC >= 9.4.0
- CMake >= 3.18
- Git >= 2.7.0
- CUDA Toolkit >= 11.4, 11.6 recommended (for CUDA builds)
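
The minimum versions above can be checked programmatically before building. The following is a minimal sketch and not part of this repository: `parse_version` and `meets_minimum` are hypothetical helper names, and the version strings in the usage lines are illustrative values rather than the captured output of any particular tool.

```python
import re


def parse_version(text):
    """Extract the first dotted version number (e.g. '9.4.0') as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, required):
    """Return True if the version in `found` is at least the version in `required`."""
    found_v = parse_version(found)
    required_v = parse_version(required)
    # Pad the shorter tuple with zeros so that '3.18' compares equal to '3.18.0'.
    width = max(len(found_v), len(required_v))
    found_v += (0,) * (width - len(found_v))
    required_v += (0,) * (width - len(required_v))
    return found_v >= required_v


# Minimums copied from the prerequisites list.
MINIMUMS = {"gcc": "9.4.0", "cmake": "3.18", "git": "2.7.0", "cuda": "11.4"}

if __name__ == "__main__":
    # Illustrative inputs only.
    print(meets_minimum("cmake version 3.22.1", MINIMUMS["cmake"]))
    print(meets_minimum("9.3.0", MINIMUMS["gcc"]))
```

In practice the `found` string would come from commands such as `gcc --version` or `cmake --version`; the helper only looks at the first dotted number it finds.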

Quick Start

Here is a brief tutorial; refer to the LLaMA Guide for more details.

Installing Prerequisites (on Debian or Ubuntu, for example)
```
apt-get install build-essential cmake git
```

Cloning Source Code

```
git clone https://github.com/openppl-public/ppl.llm.serving.git
```

Building from Source

```
./build.sh -DPPLNN_USE_LLM_CUDA=ON -DPPLNN_CUDA_ENABLE_NCCL=ON -DPPLNN_ENABLE_CUDA_JIT=OFF -DPPLNN_CUDA_ARCHITECTURES="'80;86;87'" -DPPLCOMMON_CUDA_ARCHITECTURES="'80;86;87'"
```

NCCL is required if multiple GPU devices are used.

Exporting Models

Refer to ppl.pmx for details.

Running Server
```
./ppl-build/ppl_llama_server /path/to/server/config.json
```
Server config examples can be found in src/models/llama/conf. Set the following fields to the correct values before running the server:
- model_dir: path to the models exported by ppl.pmx.
- model_param_path: path to the model parameters file, i.e. $model_dir/params.json.
- tokenizer_path: path to the tokenizer files for sentencepiece.
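
A quick sanity check of these three fields can catch path mistakes before the server starts. The sketch below assumes the config is a JSON object with `model_dir`, `model_param_path`, and `tokenizer_path` as top-level string keys; that layout is an assumption, and the real files in src/models/llama/conf may contain additional fields that this check ignores. `check_config` is a hypothetical helper, not part of this repository.

```python
import json
import os

# The three path fields described in the list above.
REQUIRED_PATH_KEYS = ("model_dir", "model_param_path", "tokenizer_path")


def check_config(config_path):
    """Return a list of human-readable problems found in the config (empty if none)."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)

    if not isinstance(config, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    for key in REQUIRED_PATH_KEYS:
        value = config.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key}: missing or not a non-empty string")
        elif not os.path.exists(value):
            problems.append(f"{key}: path does not exist: {value}")
    return problems
```

Calling `check_config("/path/to/server/config.json")` returns an empty list when all three fields are present and point at existing paths; otherwise each entry names the offending field.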
Running Client: send a request through gRPC to query the model
```
./ppl-build/client_sample 127.0.0.1:23333
```
See tools/client_sample.cc for more details.
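
Before running the client, it can help to confirm that the server address is accepting connections. The following sketch only tests TCP reachability with Python's standard library; it does not speak gRPC and says nothing about whether the model has finished loading. `wait_for_port` is a hypothetical helper, not part of this repository.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if the port is closed or unreachable.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the address used in the client command above, the call would be `wait_for_port("127.0.0.1", 23333)`.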

Benchmarking

```
./ppl-build/client_qps_measure 127.0.0.1:23333 /path/to/tokenizer/path tools/samples_1024.json
```

See tools/client_qps_measure.cc for more details.

Running inference offline:
```
./ppl-build/offline_inference /path/to/server/config.json
```
See tools/offline_inference.cc for more details.

License

This project is distributed under the Apache License, Version 2.0.
