Refactor requests and responses APIs

Copied from #134, written by @frapschen 

The current request flow for the /v1/chat/completions and /v1/completions APIs:
```mermaid
flowchart TB
    rr@{ shape: circle, label: "Start" } --> ra["/v1/chat/completions"]
    rr@{ shape: circle, label: "Start" } --> rb["/v1/completions"]
    ra -->|s.HandleChatCompletions| A
    rb -->|s.HandleTextCompletions| A
    subgraph request-process
        A[handleCompletions] --> B{shouldInjectFailure}
        B -->|Yes| C[Return RandomFailure]
        B -->|No| E[parse raw request to a vllmReq]
        E --> F[validate vllmReq Request]
        F --> G[build CompletionReqCtx]
    end

    G -->|push to queue| H
    subgraph queue
        H@{ shape: das, label: "CompletionReqCtx channel" }
    end

    H -->|pop from queue| qb[reqCtx]
    subgraph reqProcessingWorker
        qb --> qc[mock metrics]
        qc --> qd[handle KVCache]
        qd --> qe[handle ToolCalls]
        qe --> qf{IsStream}
        qf -->|Yes| qg[return StreamingResponse]
        qg --> qj[build and write response chunk]
        qj --> ql[responseSentCallback]
        qf -->|No| qh[return Response]
        qh --> qk[build and write response]
        qk --> ql
        ql --> qm@{ shape: dbl-circ, label: "Stop" }
    end
```

The `CompletionReqCtx` queue is the only link between the two parts of this flow, request-process and reqProcessingWorker.

### request-process

This part of the flow performs three tasks:
- Parsing the raw request body into a Go request struct
- Validating the structure of the request body
- Constructing a `CompletionReqCtx` and sending it to the queue

Currently, these steps are tightly coupled to the two completions APIs, which makes it difficult to add new APIs. The following refactoring is required:

- Rename the `CompletionRequest` interface to `Request`, generalizing it to represent all potential future OpenAI requests. Add the following methods to this interface: 
  - `UnmarshalRequest`: Converts the raw request body into a specific request Go struct.
  - `ValidateRequest`: Validates the request Go struct.
  - `BuildRequestCtx`: Constructs a `RequestCtx` (a new interface, described in the reqProcessingWorker section) and sends it to the queue.
- Rename the `BaseCompletionRequest` struct to `BaseRequest`, generalizing it to represent the Go struct for all potential future OpenAI requests.  
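
The renamed interface and base struct could look roughly like the sketch below. The issue only names the three methods, so the signatures, the `BaseRequest` fields, and the `TextCompletionRequest` and `textCompletionCtx` types are assumptions made for illustration, and `RequestCtx` is reduced to an empty placeholder interface here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// RequestCtx is only a placeholder in this sketch.
type RequestCtx interface{}

// Request generalizes the former CompletionRequest interface.
// The method signatures are assumptions; the issue only names the methods.
type Request interface {
	UnmarshalRequest(body []byte) error
	ValidateRequest() error
	BuildRequestCtx(queue chan<- RequestCtx) error
}

// BaseRequest (formerly BaseCompletionRequest) holds fields shared by all
// request types. The fields shown here are illustrative.
type BaseRequest struct {
	Model  string `json:"model"`
	Stream bool   `json:"stream"`
}

// TextCompletionRequest is a hypothetical request type for /v1/completions.
type TextCompletionRequest struct {
	BaseRequest
	Prompt string `json:"prompt"`
}

// textCompletionCtx is a hypothetical context type pushed to the queue.
type textCompletionCtx struct {
	req *TextCompletionRequest
}

// UnmarshalRequest decodes the raw JSON body into the struct.
func (r *TextCompletionRequest) UnmarshalRequest(body []byte) error {
	return json.Unmarshal(body, r)
}

// ValidateRequest checks the decoded struct.
func (r *TextCompletionRequest) ValidateRequest() error {
	if r.Model == "" {
		return errors.New("model is required")
	}
	return nil
}

// BuildRequestCtx wraps the request in a context and sends it to the queue.
func (r *TextCompletionRequest) BuildRequestCtx(queue chan<- RequestCtx) error {
	queue <- &textCompletionCtx{req: r}
	return nil
}

func main() {
	body := []byte(`{"model":"my-model","prompt":"hello","stream":false}`)
	queue := make(chan RequestCtx, 1)

	var req Request = &TextCompletionRequest{}
	if err := req.UnmarshalRequest(body); err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	if err := req.ValidateRequest(); err != nil {
		fmt.Println("validation error:", err)
		return
	}
	if err := req.BuildRequestCtx(queue); err != nil {
		fmt.Println("build error:", err)
		return
	}
	fmt.Println("queued contexts:", len(queue))
}
```

A new API would then only need its own struct embedding `BaseRequest` and its own implementations of the three methods.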

With these changes, we abstract the request processing logic and establish a standardized processing pipeline:
```
body parsing → Request struct validation → building request ctx
```

After this refactoring, the request-process component will support easy integration of new APIs.
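
As a rough illustration of the pipeline, the handler side could shrink to a single function that knows nothing about the concrete API. This is a minimal sketch: `processRequest` and `echoRequest` are hypothetical names, the method signatures are assumed, and `RequestCtx` is again a placeholder.

```go
package main

import (
	"errors"
	"fmt"
)

// RequestCtx is only a placeholder in this sketch.
type RequestCtx interface{}

// Request uses assumed signatures; the issue only names the methods.
type Request interface {
	UnmarshalRequest(body []byte) error
	ValidateRequest() error
	BuildRequestCtx(queue chan<- RequestCtx) error
}

// processRequest (hypothetical name) runs the three stages in order and
// stops at the first error, so nothing invalid reaches the queue.
func processRequest(req Request, body []byte, queue chan<- RequestCtx) error {
	if err := req.UnmarshalRequest(body); err != nil {
		return fmt.Errorf("unmarshal: %w", err)
	}
	if err := req.ValidateRequest(); err != nil {
		return fmt.Errorf("validate: %w", err)
	}
	if err := req.BuildRequestCtx(queue); err != nil {
		return fmt.Errorf("build request ctx: %w", err)
	}
	return nil
}

// echoRequest is a hypothetical request type used only to exercise the pipeline.
type echoRequest struct {
	text string
}

func (r *echoRequest) UnmarshalRequest(body []byte) error {
	r.text = string(body)
	return nil
}

func (r *echoRequest) ValidateRequest() error {
	if r.text == "" {
		return errors.New("empty body")
	}
	return nil
}

func (r *echoRequest) BuildRequestCtx(queue chan<- RequestCtx) error {
	queue <- r.text
	return nil
}

func main() {
	queue := make(chan RequestCtx, 1)
	if err := processRequest(&echoRequest{}, []byte("hello"), queue); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("queued:", <-queue)
}
```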

### reqProcessingWorker

The current implementation mixes the processing logic for both APIs (completions and chat completions) in a single worker, which makes it inconvenient to add new APIs.

While preserving shared logic, we need to decouple the response construction and sending for each API. Specifically:

- Introduce a new interface `RequestCtx` with the following method:
  - `handle`: responsible for handling different types of APIs and returning the corresponding response.
- The existing `CompletionReqCtx` must implement the `RequestCtx` interface, thereby achieving separation of concerns.

With this refactoring, the `reqProcessingWorker` flow will no longer depend on any specific API.
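
A minimal sketch of the worker side, under the same caveats: the issue only specifies that `RequestCtx` has a `handle` method, so the return type (a plain string standing in for the written response) and the two context types below are assumptions for illustration.

```go
package main

import "fmt"

// RequestCtx is the new interface; the string return type is an assumption
// standing in for building and writing the real response.
type RequestCtx interface {
	handle() string
}

// chatCompletionCtx and textCompletionCtx are hypothetical context types,
// one per API.
type chatCompletionCtx struct {
	prompt string
}

func (c *chatCompletionCtx) handle() string {
	return "chat completion for: " + c.prompt
}

type textCompletionCtx struct {
	prompt string
}

func (c *textCompletionCtx) handle() string {
	return "text completion for: " + c.prompt
}

// reqProcessingWorker drains the queue until it is closed. It only calls
// handle, so it does not need to know which API a context belongs to.
func reqProcessingWorker(queue <-chan RequestCtx) []string {
	var responses []string
	for reqCtx := range queue {
		responses = append(responses, reqCtx.handle())
	}
	return responses
}

func main() {
	queue := make(chan RequestCtx, 2)
	queue <- &chatCompletionCtx{prompt: "hi"}
	queue <- &textCompletionCtx{prompt: "once upon a time"}
	close(queue)

	for _, resp := range reqProcessingWorker(queue) {
		fmt.Println(resp)
	}
}
```

Shared steps such as metrics and KV cache handling could stay in the worker or in a common helper called from each `handle`; the issue does not decide this.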

### The flowchart after refactoring:
```mermaid
flowchart TB
    rc1@{ shape: circle, label: "Start" } --> ra["/v1/chat/completions"]
    rc1@{ shape: circle, label: "Start" } --> ra2["/v1/completions"]
    ra --> A[build ChatCompletions request]
    ra2 --> A2[build Completions request]
    A --> B[Request]
    A2 --> B[Request]
    subgraph request-process 
        B --> D{shouldInjectFailure}
        D -->|Yes| C[Return RandomFailure]
        D -->|No| E[Request.UnmarshalRequest]
        E --> F[Request.ValidateRequest]
        F --> G[Request.BuildRequestCtx]
    end

    G -->|push to queue| H
    subgraph queue
        H@{ shape: das, label: "RequestCtx channel" }
    end

    H -->|pop from queue| qb[reqCtx]
    subgraph reqProcessingWorker
        qb --> qc[reqCtx.handle]
        qc --> qm@{ shape: dbl-circ, label: "Stop" }
    end
```

