mani-kantap/llm-inference-solutions
llm-inference-solutions

A collection of available inference and serving solutions for LLMs.

| Name | Org | Description |
| --- | --- | --- |
| vLLM | UC Berkeley | A high-throughput and memory-efficient inference and serving engine for LLMs |
| Text-Generation-Inference | Hugging Face 🤗 | Large Language Model text generation inference |
| llm-engine | Scale AI | Scale LLM Engine public repository |
| DeepSpeed | Microsoft | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective |
| OpenLLM | BentoML | Operating LLMs in production |
| LMDeploy | InternLM Team | A toolkit for compressing, deploying, and serving LLMs |
| FlexFlow | CMU, Stanford, UCSD | A distributed deep learning framework |
| CTranslate2 | OpenNMT | Fast inference engine for Transformer models |
| FastChat | lm-sys | An open platform for training, serving, and evaluating large language models; release repo for Vicuna and Chatbot Arena |
| Triton Inference Server | NVIDIA | An optimized cloud and edge inferencing solution |
| Lepton.AI | lepton.ai | A Pythonic framework to simplify AI service building |
| ScaleLLM | Vectorch | A high-performance inference system for large language models, designed for production environments |
| LoRAX | Predibase | Serve hundreds of fine-tuned LLMs in production for the cost of one |
| TensorRT-LLM | NVIDIA | An easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines |
| mistral.rs | mistral.rs | Blazingly fast LLM inference |
