Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)


FineInfer

| Paper |

FineInfer is a research prototype for fine-tuning and serving large language models.

FineInfer supports concurrent parameter-efficient fine-tuning and inference through the following features:

  • Deferred continuous batching
  • Hybrid system architecture
  • Heterogeneous batching
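To give a flavor of the deferred continuous batching idea, here is an illustrative toy scheduler, not FineInfer's actual implementation or API: newly arrived inference requests are held back and merged into the running batch only at deferral-window boundaries, so fine-tuning iterations in between can run without interruption. The class name, the `defer_iters` knob, and the window policy are all hypothetical choices for this sketch.

```python
from collections import deque


class DeferredScheduler:
    """Toy sketch of deferred continuous batching (not FineInfer's API).

    Incoming inference requests are queued rather than merged into the
    running batch immediately; they join only when the current deferral
    window (a fixed number of iterations here) completes.
    """

    def __init__(self, defer_iters: int):
        self.defer_iters = defer_iters  # hypothetical deferral-window length
        self.pending: deque = deque()   # requests waiting to be merged
        self.batch: list = []           # requests currently being served

    def submit(self, request) -> None:
        # New requests are deferred, not merged right away.
        self.pending.append(request)

    def step(self, iteration: int) -> list:
        # Merge deferred requests only at window boundaries, so the
        # iterations in between can be dedicated to fine-tuning.
        if iteration % self.defer_iters == 0:
            while self.pending:
                self.batch.append(self.pending.popleft())
        return list(self.batch)
```

For example, with `defer_iters=2`, a request submitted before iteration 1 is not served at iteration 1 and joins the batch at iteration 2. The real system makes this trade-off to meet inference latency targets while sharing GPU time with parameter-efficient fine-tuning; see the paper for the actual policy.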

Get Started

Installation and examples

The current version removes some earlier features and functionality. If you need them, please use a previous release.

Citation

@inproceedings{FineInfer,
  author = {He, Yongjun and Lu, Yao and Alonso, Gustavo},
  title = {Deferred Continuous Batching in Resource-Efficient Large Language Model Serving},
  year = {2024},
  booktitle = {Proceedings of the 4th Workshop on Machine Learning and Systems},
  pages = {98–106},
  series = {EuroMLSys '24}
}
