Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
 
Feature Description
During server startup, while the model is still loading, a web client that requests `/` or any `.html` page receives the JSON "Loading model" error. It would be better, for obvious web page requests only, to send a loading page that refreshes itself periodically until the model has finished loading.
Motivation
Georgi (and team?)-
Thank you for the wonderful work you've done with llama.cpp. I'm using it, and have proposed this feature request to make it easy for regular users to run LLMs on their Windows 11 Pro machines without needing any technical chops. My software starts an Ubuntu VM containing llama.cpp, then connects to it from the user's web browser. Usually the model is still loading when the browser opens, so I want users to see that it is working and have the page reload automatically when the model is ready, rather than see an error and have to reload the page themselves.
In short, some systems should look pretty to everyday users even when they're not ready.
-Brad
Brad Hutchings
brad@DemoMachine.net
Possible Implementation
- Create a `loading.html` embedded static file.
- Register a route that serves the `loading.html` embedded static file.
- Add logic to the `auto middleware_server_state` assignment to handle web page requests during loading by internally redirecting them to `loading.html`.
1. Create a loading.html embedded static file.
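```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <!-- Reload every 5 seconds; the browser URL is unchanged by the internal rewrite,
             so once the model has loaded, the reload reaches the real UI. -->
        <meta http-equiv="refresh" content="5">
        <title>Loading model</title>
    </head>
    <body>
        <div id="loading">
            The model is loading. Please wait.<br/>
            The user interface will appear soon.
        </div>
    </body>
</html>
```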
This can be overridden by a `loading.html` file in the static files path, for a custom UI.
2. Register a route for the loading.html embedded static file.
Around line 3344 of server.cpp (as of 2024-08-24):
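```cpp
    svr->Get("/index-new.html", handle_static_file(index_new_html, index_new_html_len, "text/html; charset=utf-8"));
    // loading_html / loading_html_len are assumed to come from a generated header, like index_new_html.
    svr->Get("/loading.html",   handle_static_file(loading_html,   loading_html_len,   "text/html; charset=utf-8"));
```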
It looks like `index_new_html` comes from an autogenerated file, so the step that generates it would need to be tracked down and extended for `loading.html` too. I used the static files path for my `loading.html` file rather than fully implementing a built-in file.
3. Add logic to the auto middleware_server_state assignment...
Around line 2670 of server.cpp (as of 2024-08-24):
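```cpp
    auto middleware_server_state = [&res_error, &state](const httplib::Request & req, httplib::Response & res) {
        server_state current_state = state.load();
        if (current_state == SERVER_STATE_LOADING_MODEL) {
            const std::string & path = req.path;
            // A web page request is the root path or any path ending in ".html".
            // Check the length first so paths shorter than ".html" (such as "/") are handled safely.
            static const std::string suffix = ".html";
            const bool ends_with_html = path.size() >= suffix.size() &&
                path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
            if (path == "/" || ends_with_html) {
                // The handler receives a const reference; const_cast makes the intent explicit
                // instead of a C-style cast. Rewriting the path makes the router serve loading.html.
                const_cast<httplib::Request &>(req).path = "/loading.html";
            } else {
                res_error(res, format_error_response("Loading model", ERROR_TYPE_UNAVAILABLE));
                return false;
            }
        }
        return true;
    };
```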
`req` arrives as a const reference, so rewriting its path means casting away const; with a C-style cast, the compiler warns about this.
The intention is to check `req.path` and change it to `/loading.html` if it is the root path or ends in `.html`. My C++ is rusty; apologies for that.