mani-kantap/llm-inference-solutions
llm-inference-solutions

A collection of available inference and serving solutions for LLMs.

| Name | Org | Description |
| --- | --- | --- |
| vLLM | UC Berkeley | A high-throughput and memory-efficient inference and serving engine for LLMs |
| Text-Generation-Inference | Hugging Face 🤗 | Large Language Model text generation inference |
| llm-engine | Scale AI | Scale LLM Engine public repository |
| DeepSpeed | Microsoft | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective |
| OpenLLM | BentoML | Operating LLMs in production |
| LMDeploy | InternLM Team | A toolkit for compressing, deploying, and serving LLMs |
| FlexFlow | CMU, Stanford, UCSD | A distributed deep learning framework |
| CTranslate2 | OpenNMT | Fast inference engine for Transformer models |
| FastChat | lm-sys | An open platform for training, serving, and evaluating large language models; release repo for Vicuna and Chatbot Arena |
| Triton Inference Server | NVIDIA | An optimized cloud and edge inferencing solution |
| Lepton.AI | lepton.ai | A Pythonic framework to simplify AI service building |
| ScaleLLM | Vectorch | A high-performance inference system for large language models, designed for production environments |
| LoRAX | Predibase | Serve hundreds of fine-tuned LLMs in production for the cost of one |
| TensorRT-LLM | NVIDIA | An easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines |
| mistral.rs | mistral.rs | Blazingly fast LLM inference |
