v0.12.1

@tgaddair released this 25 Nov 21:15
· 59 commits to main since this release
c0e5798

🎉 Enhancements

  • Add support for adapter loading in mllama by @ajtejankar in #669
  • Record number of skipped tokens in the response by @tgaddair in #681
  • Record TTFT and TPOT in response headers by @tgaddair in #684
  • Add CLI arg --speculation-max-batch-size by @tgaddair in #686
  • Use --predibase-api-token parameter when downloading by @joseph-predibase in #687
  • Launcher args for compile max batch size and rank by @tgaddair in #690

🐛 Bugfixes

🔧 Maintenance

Full Changelog: v0.12.0...v0.12.1