
Large Language Models (LLMs) Inference

Setting up LLM inference services within data centers and/or on-premise environments.

Large language models (LLMs) are a powerful tool with the potential to revolutionize a wide range of industries. However, deploying and managing LLMs on-premise can be a complex and challenging task. This repository provides ready-to-deploy configuration and Python code for setting up LLM inference servers, including a REST API and a web interface for chatting with LLM models. The implementation is based on Docker containers.
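
Once the inference container is running, the REST API can be exercised with a short Python script. The snippet below is a minimal sketch, assuming the backend exposes an OpenAI-compatible /v1/chat/completions endpoint on localhost:8000; the URL and model name are placeholders and must be adjusted to the deployed configuration.

```python
# Minimal client sketch for chatting with the deployed LLM inference server.
# Assumes an OpenAI-compatible REST endpoint; the URL and model name are
# placeholders for whatever the actual deployment exposes.
import requests

INFERENCE_URL = "http://localhost:8000/v1/chat/completions"  # placeholder

payload = {
    "model": "llama-2-7b-chat",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Summarise the benefits of on-premise LLM inference."}
    ],
    "max_tokens": 256,
    "temperature": 0.7,
}

response = requests.post(INFERENCE_URL, json=payload, timeout=120)
response.raise_for_status()

# Print the assistant's reply from the first completion choice.
print(response.json()["choices"][0]["message"]["content"])
```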

The focus is primarily on runtime inference; fine-tuning and training of LLMs are out of scope. Model serving covers the original models and/or their quantized versions.

Three-Tier Architecture - LLM Inference:

A three-tier architecture is used for on-premise deployment of LLM inference. This architecture allows greater flexibility and agility. It is assumed that the on-premise hosting infrastructure sits behind firewalls with no outbound connectivity to the internet, in line with security policies. The three tiers are listed below; a minimal sketch of the web application tier follows the diagram.

  1. Backend LLM inference server
  2. Web application server
  3. Front end accessed through a web browser

(Diagram: three-tier architecture for on-premise LLM inference)
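
To make the middle tier concrete, the sketch below shows a minimal web application server that proxies chat requests from the browser front end to the backend inference server. It is illustrative only: the Flask framework, the /chat route, and the BACKEND_URL value are assumptions rather than the repository's actual implementation, and would be replaced by the configuration shipped with the containers.

```python
# Minimal sketch of the web application tier (tier 2): it accepts chat
# requests from the browser front end and forwards them to the backend
# LLM inference server. Flask, the /chat route, and BACKEND_URL are
# illustrative assumptions, not the repository's actual implementation.
from flask import Flask, jsonify, request
import requests

app = Flask(__name__)

# Placeholder address of the backend inference server (tier 1),
# assumed to expose an OpenAI-compatible chat completions endpoint.
BACKEND_URL = "http://llm-backend:8000/v1/chat/completions"


@app.route("/chat", methods=["POST"])
def chat():
    user_message = request.get_json(force=True).get("message", "")
    payload = {
        "model": "llama-2-7b-chat",  # placeholder model name
        "messages": [{"role": "user", "content": user_message}],
    }
    backend_response = requests.post(BACKEND_URL, json=payload, timeout=120)
    backend_response.raise_for_status()
    reply = backend_response.json()["choices"][0]["message"]["content"]
    return jsonify({"reply": reply})


if __name__ == "__main__":
    # Bind to all interfaces so the container port can be published.
    app.run(host="0.0.0.0", port=8080)
```

In this layering, the browser front end (tier 3) only talks to the web application server, and only the web application server reaches the backend, so the inference server never has to be exposed directly to end users.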
