Feature Request: Direct way to check the status of the abort mechanism.

### Prerequisites

- [x] I am running the latest code. Mention the version if possible as well.
- [x] I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md).
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the [Discussions](https://github.com/ggml-org/llama.cpp/discussions), and have a new and useful enhancement to share.

### Feature Description

As far as I can tell, the only way to check whether the backend aborted is currently:
```cpp
if (llama_decode(ctx, batch) == 2) { /* 2 = aborted; handle it here */ }
```
This is hacky: checking the abort status requires an active batch, and if inference was aborted, that batch may already have been discarded. Because of how abort currently works, the backend graph also only attempts to abort when decode is called.

My proposal is to decouple decoding from status checking, if possible, allowing:
```cpp
if (llama_check_backend_abort_status(ctx) == 2) { /* handle abort */ }
```

This would have to prod the backends into checking and reporting their compute status, but without tying that to decode, so it can be done without a real batch.
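To make the idea concrete, here is a minimal sketch of what a decoupled query could look like. It uses stand-in types so it compiles without llama.cpp. `mock_context`, `mock_status`, and `mock_check_abort_status` are all hypothetical names, not part of `llama.h`, and the semantics (report `aborted` once an abort was requested and no compute is in flight) are an assumption on my part, not something the current code does.

```cpp
#include <atomic>

// Hypothetical status values; nothing like this exists in llama.h today.
enum class mock_status { idle, computing, aborted };

// Stand-in for llama_context holding only the state this sketch needs.
struct mock_context {
    std::atomic<bool> abort_requested{false}; // set by whoever wants to abort
    std::atomic<bool> computing{false};       // set while a graph compute is in flight
};

// Proposed-style query: reads state only, needs no batch, runs no compute.
// Assumed semantics: an in-flight compute wins over a pending abort request.
mock_status mock_check_abort_status(const mock_context & ctx) {
    if (ctx.computing.load()) {
        return mock_status::computing;
    }
    return ctx.abort_requested.load() ? mock_status::aborted : mock_status::idle;
}
```

A named status like this would also avoid comparing against the bare `2`, though that is a separate design choice from the decoupling itself.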

### Motivation

Servers often need to abort work, for a variety of reasons. With continuous batching, a server does not want to add work to its processing queue while adjacent work is trying to abort the decode, because the abort fails the entire continuous-batch decode step. Keeping `llama_decode` as the only way to check the backend status is likely to make future development fragile.

Because new work can be added *after* the abort was signaled, that new work can hit a batch decode error. So to abort work, we have to trap the abort signal and then run fake decodes until one returns status code 2. This creates many edge cases that are hard to handle, and it can deadlock server threads with no clear way to know whether it is possible or allowed to resume.
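For illustration, this is roughly the scheduling gate a server could use if a status query existed. It is a self-contained sketch with made-up names (`backend_state`, `request`, `admit_pending`); the state value is assumed to come from some direct status check rather than from a fake decode.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical backend state, as a direct status query might report it.
enum class backend_state { idle, computing, aborted };

// Minimal stand-in for a queued server request.
struct request { int id; };

// Moves queued requests into the next batch unless an abort is pending.
// Returns the number of requests admitted.
std::size_t admit_pending(backend_state state,
                          std::deque<request> & pending,
                          std::vector<request> & batch,
                          std::size_t max_batch) {
    if (state == backend_state::aborted) {
        return 0; // hold new work so it is not caught in the aborted decode step
    }
    std::size_t admitted = 0;
    while (!pending.empty() && batch.size() < max_batch) {
        batch.push_back(pending.front());
        pending.pop_front();
        ++admitted;
    }
    return admitted;
}
```

With a gate like this, new requests stay queued while an abort is pending instead of failing alongside the aborted step.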

### Possible Implementation

The backend interface would need a separate path for querying status that does not involve computing anything. From a cursory look, this means modifying every backend and adding a function to the backend iface. A status value for "not processing" or similar would also be very welcome.
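A rough sketch of how such an iface addition might look, using stand-in types rather than the real ggml backend structs. Every name here (`mock_backend_iface`, `get_status`, `mock_backend_get_status`, the `MOCK_STATUS_*` values) is hypothetical. The optional function pointer with a fallback is one assumption about how backends could adopt it incrementally instead of all at once.

```cpp
// Hypothetical status values, including one for "not processing".
enum mock_backend_status {
    MOCK_STATUS_IDLE,      // nothing is being computed
    MOCK_STATUS_COMPUTING, // a graph compute is in flight
    MOCK_STATUS_ABORTED,   // the last compute was aborted
    MOCK_STATUS_UNKNOWN    // backend does not implement the query
};

struct mock_backend;

// Stand-in for a backend interface table of function pointers.
struct mock_backend_iface {
    const char * (*get_name)(mock_backend * backend);
    // Hypothetical addition: report status without computing anything.
    // May be left as nullptr by backends that have not implemented it yet.
    mock_backend_status (*get_status)(mock_backend * backend);
};

struct mock_backend {
    mock_backend_iface  iface;
    mock_backend_status state; // whatever the backend tracks internally
};

// Wrapper callers would use; falls back to UNKNOWN when the hook is missing.
mock_backend_status mock_backend_get_status(mock_backend * backend) {
    if (backend->iface.get_status == nullptr) {
        return MOCK_STATUS_UNKNOWN;
    }
    return backend->iface.get_status(backend);
}
```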

Feature Request: Direct way to check the status of the abort mechanism. #12525

Prerequisites

Feature Description

Motivation

Possible Implementation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Feature Request: Direct way to check the status of the abort mechanism. #12525

Description

Prerequisites

Feature Description

Motivation

Possible Implementation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions