v0.6.0

@maugustosilva maugustosilva released this 04 May 20:24
· 244 commits to main since this release
41b6d9a

What's Changed

  • Full conversion to Python, with a new CLI and a new declarative specification language for describing experiments
    • The plugin architecture makes adding new stages to the life cycle straightforward and scalable for future features.
    • User experience is improved by much more meaningful logging and message display.
    • Extensive health checking during and at the end of the deployment.
  • New standup method available: "Fast Model Actuator" (FMA)
    • Fast Model Actuation (FMA) is a Kubernetes-native system for efficiently managing LLM inference servers that reduces model startup latency from minutes to seconds. FMA uses two techniques: (1) vLLM sleep/wake, where model instances move tensors from GPU to CPU memory, freeing accelerator resources while keeping the process alive for rapid wake-up; and (2) model swapping, where a persistent launcher process handles initialization upfront so instances can be swapped without full cold starts.
  • Significant improvements to performance data collection, including related changes to the benchmark report
    • "Time-series" metrics in version 0.2 of the benchmark report now include both statistical summaries and links to the raw collected data in CSV format.
  • Tighter integration with Workload Variant Autoscaler (WVA), including the ability to deploy multiple models in the same namespace, as defined within a scenario. In the same vein, one or more stacks in a scenario can be deployed and torn down based on user preference.
  • Ability to provide different parameters to the vLLM process on different pods (using the LeaderWorkerSet (LWS) Kubernetes API).
    • Stack details can be filled in from a YAML file from the harness pod.
    • Assorted corrections and robustness improvements.
  • The "capacity planner" and "configuration explorer" are now part of a new project: https://github.com/llm-d-incubation/llm-d-planner
  • Substantially enhanced development tooling, including pre-commit hooks and CI/CD, that safeguards existing library patterns and functionality.
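
To illustrate the time-series change listed above, here is a minimal sketch of what pairing a statistical summary with a pointer to raw CSV data could look like. It is not the actual benchmark report schema: the metric name, the field names (`summary`, `raw_data`), the file name, and the sample values are all hypothetical and chosen only for illustration.

```python
import csv
import io
import statistics


def summarize_series(samples):
    """Reduce a list of numeric samples to a few summary statistics."""
    ordered = sorted(samples)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
    }


def series_to_csv(timestamps, samples):
    """Serialize the raw samples as CSV text with a timestamp,value header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", "value"])
    writer.writerows(zip(timestamps, samples))
    return buffer.getvalue()


# Hypothetical data: one metric sampled once per second.
timestamps = [0, 1, 2, 3]
samples = [0.10, 0.40, 0.30, 0.20]

# Hypothetical report entry: a summary plus a link to the raw data file.
entry = {
    "name": "gpu_cache_utilization",          # assumed metric name
    "summary": summarize_series(samples),
    "raw_data": "gpu_cache_utilization.csv",  # assumed relative link
}
raw_csv = series_to_csv(timestamps, samples)
```

Keeping the summary inline and the raw samples in a separate CSV file keeps the report small while still letting readers recompute any statistic from the original data.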

Full Changelog: v0.5.0...v0.6.0