Prerequisites
Feature Description
I need cold starts and readiness checks to be as fast as possible. I am hoping there is a way to disable the warmup, which runs the model once with an empty input when the server starts.
Motivation
I am using llamafile on AWS Lambda behind the Lambda Web Adapter. The lazier initialization can be, the more likely I am to get a working instance running without hitting the various 10s timeout issues.
Possible Implementation
Not sure. Perhaps a --warmup=false server option would be helpful?
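
To make the idea concrete, here is a rough sketch of the behavior I have in mind, written in Python for brevity rather than taken from the server's actual code. Everything in it is an assumption for illustration: the --warmup option does not exist today as far as I know, and `start_server`, `load_model`, and `warm_up` are hypothetical names standing in for whatever the server really does at startup.

```python
import argparse


def parse_bool(value: str) -> bool:
    """Interpret common true/false spellings for a --flag=value option."""
    lowered = value.strip().lower()
    if lowered in {"1", "true", "yes", "on"}:
        return True
    if lowered in {"0", "false", "no", "off"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="server-sketch")
    # Hypothetical option: warmup stays enabled by default so that
    # existing deployments keep their current startup behavior.
    parser.add_argument("--warmup", type=parse_bool, default=True, metavar="BOOL")
    return parser


def start_server(argv, load_model, warm_up):
    """Load the model, then run the warmup pass only if it was not disabled."""
    args = build_parser().parse_args(argv)
    model = load_model()
    if args.warmup:
        warm_up(model)  # the empty run this request would like to skip
    return model
```

With this shape, passing --warmup=false skips the empty run entirely, so the first real request pays the warmup cost instead of the init phase. Defaulting to true keeps the change opt-in.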