Skip to content

Conversation

@magicse
Copy link
Contributor

@magicse magicse commented Feb 7, 2025

ref #11605, ggml-org/llama.vim#35

For correct processing ctrl+c signal

Make sure to read the contributing guidelines before submitting a PR

For correct processing ctrl+c signal
@ngxson
Copy link
Collaborator

ngxson commented Feb 7, 2025

Can you explain why this change is needed and which case does it fix?

@ngxson
Copy link
Collaborator

ngxson commented Feb 7, 2025

What you explained is not related to this change. I don't see why ctrl+c handler is related to that

@magicse
Copy link
Contributor Author

magicse commented Feb 7, 2025

Wow, sorry about that! I made two changes.

Regarding these changes:
We attempted to register the Ctrl+C event after ctx_server.queue_tasks.start_loop(). As a result, the user was no longer able to interrupt the process.

To allow the user to use Ctrl+C at any time, I think we need to register the event earlier in the process.

@ngxson
Copy link
Collaborator

ngxson commented Feb 7, 2025

Wow, sorry about that! I made two changes.

That's why you should have a proper title and description for each PR.

@ngxson
Copy link
Collaborator

ngxson commented Feb 7, 2025

This introduce another bug, now I can no longer interrupt model loading.

@ngxson ngxson changed the title Update server.cpp llama : fix ctrl+c handler Feb 7, 2025
@ggerganov
Copy link
Member

ref #11605

@magicse
Copy link
Contributor Author

magicse commented Feb 7, 2025

I'll double check everything today. Could you please clarify what operating system you use?

@ngxson
Copy link
Collaborator

ngxson commented Feb 7, 2025

I'm using MacOS. It should be the same on all platforms. I think you moved the handler to before model loading.

@magicse
Copy link
Contributor Author

magicse commented Feb 9, 2025

Hi @ngxson @ggerganov
I make code with more messaging for testing, and test it in Windows (mingw build) and Linux Ubuntu.

std::function<void(int)> shutdown_handler;
std::atomic_flag is_terminating = ATOMIC_FLAG_INIT;

inline void signal_handler(int signum) {
    fprintf(stderr, "Signal handler invoked with signal: %d\n", signum);
    if (is_terminating.test_and_set()) {
        // in case it hangs, we can force terminate the server by hitting Ctrl+C twice
        // this is for better developer experience, we can remove when the server is stable enough
        fprintf(stderr, "Received second signal_handler interrupt, terminating immediately.\n");
        exit(1);
    }
    // First press detected, initiate shutdown
    fprintf(stderr, "Received first signal_handler Shutdown initiated, signal: %d\n", signum);
    shutdown_handler(signum);
}

int main(int argc, char ** argv) {
    // own arguments required by this example
    common_params params;

    if (!common_params_parse(argc, argv, params, LLAMA_EXAMPLE_SERVER)) {
        return 1;
    }

    common_init();

    // struct that contains llama context and inference
    server_context ctx_server;


    shutdown_handler = [&](int signum) {
	std::cout << "Shutdown handler invoked with signal: " << signum << std::endl;
        ctx_server.queue_tasks.terminate();
    };


	#if defined (__unix__) || (defined (__APPLE__) && defined (__MACH__))
	    struct sigaction sigint_action;
	    sigint_action.sa_handler = signal_handler;
	    sigemptyset (&sigint_action.sa_mask);
	    sigint_action.sa_flags = 0;
	    sigaction(SIGINT, &sigint_action, NULL);
	    sigaction(SIGTERM, &sigint_action, NULL);
	#elif defined (_WIN32)
        auto console_ctrl_handler = +[](DWORD ctrl_type) -> BOOL {
            std::cout << "\nCtrl+C detected, triggering signal handler...\n";
            if (ctrl_type == CTRL_C_EVENT) {
                signal_handler(SIGINT); 
                return true;
            }
            return false;
        };

	if (!SetConsoleCtrlHandler(reinterpret_cast<PHANDLER_ROUTINE>(console_ctrl_handler), true)) {
		std::cerr << "Failed to set control handler\n";
		return 1;
	}
	#endif

(twice pressing ctrl+c) Linux Ubuntu log_1 of ctrl+c (before server initialsation) - processed ok

llama_init_from_model: n_ctx_pre_seq (4096) > n_ctx_train (1024) -- possible training context overflow
llama_kv_cache_init: kv_size = 4096, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 12, can_shift = 1
^CSignal handler invoked with signal: 2
Received first signal_handler Shutdown initiated, signal: 2
Shutdown handler invoked with signal: 2
^CSignal handler invoked with signal: 2
Received second signal_handler interrupt, terminating immediately.
magic@Ubuntu2204:~/llama.cpp_31_01_2025/llama.cpp/Build/bin$ 

(once pressing ctrl+c) Linux Ubuntu log_2 of ctrl+c (after server initialsation) - processed ok

<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
'
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv  update_slots: all slots are idle
^CSignal handler invoked with signal: 2
Received first signal_handler Shutdown initiated, signal: 2
Shutdown handler invoked with signal: 2
srv  update_slots: all slots are idle
magic@Ubuntu2204:~/llama.cpp_31_01_2025/llama.cpp/Build/bin$ 

(twice pressing ctrl+c) Windows log_1 of ctrl+c (before server initialsation) - processed ok

common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)

Ctrl+C detected, triggering signal handler...
Signal handler invoked with signal: 2
Received first signal_handler Shutdown initiated, signal: 2
Shutdown handler invoked with signal: 2

Ctrl+C detected, triggering signal handler...
Signal handler invoked with signal: 2
Received second signal_handler interrupt, terminating immediately.

Z:\GPT_Local\llama.cpp_31_01_2025\llama.cpp\Build\bin>

(once pressing ctrl+c) Windows log_2 of ctrl+c (after server initialsation) - processed ok

<|im_start|>assistant
'
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv  update_slots: all slots are idle

Ctrl+C detected, triggering signal handler...
Signal handler invoked with signal: 2
Received first signal_handler Shutdown initiated, signal: 2
Shutdown handler invoked with signal: 2
srv  update_slots: all slots are idle

Z:\GPT_Local\llama.cpp_31_01_2025\llama.cpp\Build\bin>

@ngxson
Copy link
Collaborator

ngxson commented Feb 9, 2025

Linux Ubuntu log_1 of ctrl+c (before server initialsation) - processed ok

did you ctrl+C twice?

@magicse
Copy link
Contributor Author

magicse commented Feb 9, 2025

There are both - Log_1 - twice, Log_2 - once

@ngxson
Copy link
Collaborator

ngxson commented Feb 9, 2025

Same for windows, you ctrl+C twice and this is not expected.

Server should exit after you hit ctrl+C just once. Inside container environment, terminal signal is only delivered once.

@magicse
Copy link
Contributor Author

magicse commented Feb 9, 2025

on Windows - if we try register ctl+c event after ctx_server.queue_tasks.start_loop(); it didnt work and didnt proces ctrl+c events.
The registration should typically happen, before the server starts listening for requests.

@magicse
Copy link
Contributor Author

magicse commented Feb 9, 2025

also tested in docker

magic@Ubuntu2204:~/llama.cpp_31_01_2025/llama.cpp/Build/bin$ sudo docker run -v /home/magic/models:/models -p 8000:8080 llama-server /app/llama-server -m /models/small-llama2-Q8_0.gguf

By hit ctrl+C just once in docker

<|im_start|>assistant
'
main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv  update_slots: all slots are idle
^CShutdown handler invoked with signal: 2
Signal handler invoked with signal: 2
Received first signal_handler Shutdown initiated, signal: 2
srv  update_slots: all slots are idle
magic@Ubuntu2204:~/llama.cpp_31_01_2025/llama.cpp/Build/bin$ 

magic@Ubuntu2204:~/llama.cpp_31_01_2025/llama.cpp/Build/bin$ sudo docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
magic@Ubuntu2204:~/llama.cpp_31_01_2025/llama.cpp/Build/bin$ 

// struct that contains llama context and inference
server_context ctx_server;

shutdown_handler = [&](int) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why this whole code block must be this higher up. Can't it be move to just right before ctx_server.queue_tasks.start_loop(); ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ngxson , I think yes.

@ngxson
Copy link
Collaborator

ngxson commented Feb 10, 2025

I'm closing as #11795 is already addressed this issue (also the case where SIGTERM arrives when server is busy processing an on-going request)

@ngxson ngxson closed this Feb 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants