Description
Copied from #134, written by @Frapschen
The current request-handling flow for the /v1/chat/completions and /v1/completions APIs:
flowchart TB
rr@{ shape: circle, label: "Start" } --> ra["/v1/chat/completions"]
rr@{ shape: circle, label: "Start" } --> rb["/v1/completions"]
ra -->|s.HandleChatCompletions| A
rb -->|s.HandleTextCompletions| A
subgraph request-process
A[handleCompletions] --> B{shouldInjectFailure}
B -->|Yes| C[Return RandomFailure]
B -->|No| E[parse raw request to a vllmReq]
E --> F[validate vllmReq Request]
F --> G[build CompletionReqCtx]
end
G -->|push to queue| H
subgraph queue
H@{ shape: das, label: "CompletionReqCtx channel" }
end
H -->|pop from queue| qb[reqCtx]
subgraph reqProcessingWorker
qb --> qc[mock metrics]
qc --> qd[handle KVCache]
qd --> qe[handle ToolCalls]
qe --> qf{IsStream}
qf -->|Yes| qg[return StreamingResponse]
qg --> qj[build and write response chunk]
qj --> ql[responseSentCallback]
qf -->|No| qh[return Response]
qh --> qk[build and write response]
qk --> ql
ql --> qm@{ shape: dbl-circ, label: "Stop" }
end
The CompletionReqCtx queue is the only link between the two parts of this flow, request-process and reqProcessingWorker.
request-process
In this flow, the main tasks performed are:
- Parsing the raw request body into a Go request struct
- Validating the structure of the request body
- Constructing a CompletionReqCtx and sending it to the queue
Currently, this process is tightly coupled to the two completions APIs, which makes it difficult to add new APIs. The following refactoring is required:
- Rename the CompletionRequest interface to Request, generalizing it to represent all potential future OpenAI requests. Add the following methods to this interface:
  - UnmarshalRequest: Converts the raw request body into a specific request Go struct.
  - ValidateRequest: Validates the request Go struct.
  - BuildRequestCtx: Constructs a RequestCtx (this is a new interface, which will be discussed later) and sends it to the queue.
- Rename the BaseCompletionRequest struct to BaseRequest, generalizing it to represent the Go struct for all potential future OpenAI requests.
With these changes, we abstract the request processing logic and establish a standardized processing pipeline:
body parsing → Request struct validation → building request ctx
After this refactoring, the request-process component will support easy integration of new APIs.
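The pipeline above can be sketched in Go as follows. This is a minimal illustration, not the project's code: the method signatures, the fields of BaseRequest, the TextCompletionRequest type, and the processRequest helper are all assumptions made for the example. In particular, BuildRequestCtx here returns the context and the helper pushes it to the queue, which is one possible reading of "constructs a RequestCtx and sends it to the queue".

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// RequestCtx is the queued unit of work. Its methods are out of scope for
// this sketch, so it is left empty here (assumption).
type RequestCtx interface{}

// Request is the generalized interface from the proposal.
// Only the method names come from the proposal; the signatures are assumed.
type Request interface {
	UnmarshalRequest(body []byte) error
	ValidateRequest() error
	BuildRequestCtx() RequestCtx
}

// BaseRequest holds fields shared by all request types (hypothetical fields).
type BaseRequest struct {
	Model  string `json:"model"`
	Stream bool   `json:"stream"`
}

// TextCompletionRequest is a toy request type standing in for /v1/completions.
type TextCompletionRequest struct {
	BaseRequest
	Prompt string `json:"prompt"`
}

// UnmarshalRequest parses the raw JSON body into the struct.
func (r *TextCompletionRequest) UnmarshalRequest(body []byte) error {
	return json.Unmarshal(body, r)
}

// ValidateRequest checks the parsed struct; here only the model name.
func (r *TextCompletionRequest) ValidateRequest() error {
	if r.Model == "" {
		return errors.New("model is required")
	}
	return nil
}

// BuildRequestCtx returns the value to enqueue. A real implementation would
// wrap the request together with whatever is needed to write the response.
func (r *TextCompletionRequest) BuildRequestCtx() RequestCtx {
	return r
}

// processRequest runs the standardized pipeline for any Request:
// body parsing, validation, then building the context and queueing it.
func processRequest(req Request, body []byte, queue chan<- RequestCtx) error {
	if err := req.UnmarshalRequest(body); err != nil {
		return fmt.Errorf("parse request: %w", err)
	}
	if err := req.ValidateRequest(); err != nil {
		return fmt.Errorf("validate request: %w", err)
	}
	queue <- req.BuildRequestCtx()
	return nil
}

func main() {
	queue := make(chan RequestCtx, 1)
	body := []byte(`{"model":"m","prompt":"hi"}`)
	err := processRequest(&TextCompletionRequest{}, body, queue)
	fmt.Println(err, len(queue))
}
```

With this shape, a new API only needs a new type implementing Request; processRequest itself does not change.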
reqProcessingWorker
The current code implementation combines the processing logic for two types of APIs (the completions API and the chat completions API), making it inconvenient to add new APIs.
While preserving shared logic, we need to decouple the response construction and sending for each API. Specifically:
- Introduce a new interface RequestCtx with the following method:
  - handle: responsible for handling different types of APIs and returning the corresponding response.
- The existing CompletionReqCtx must implement the RequestCtx interface, thereby achieving separation of concerns.
With this refactoring, the request-processing (reqProcessingWorker) flow will no longer depend on any specific API.
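As a rough sketch of what an API-agnostic worker could look like in Go: the proposal names only the handle method, so its signature, the fields of CompletionReqCtx, the string-typed response, and the runWorker helper below are assumptions chosen to keep the example small.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// RequestCtx is the interface from the proposal. Returning a string from
// handle is an assumption; a real response would be richer.
type RequestCtx interface {
	handle() string
}

// CompletionReqCtx stands in for the existing context type (hypothetical fields).
type CompletionReqCtx struct {
	Prompt   string
	IsStream bool
}

// handle builds the API-specific response. The streaming branch imitates
// chunked output by joining words with "|".
func (c *CompletionReqCtx) handle() string {
	if c.IsStream {
		return strings.Join(strings.Fields(c.Prompt), "|")
	}
	return c.Prompt
}

// reqProcessingWorker drains the queue and depends only on RequestCtx.
// Shared steps (metrics, KV cache handling) would run before handle.
func reqProcessingWorker(queue <-chan RequestCtx, out chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	for ctx := range queue {
		out <- ctx.handle()
	}
}

// runWorker is a small driver: it queues the contexts, runs one worker
// to completion, and collects the responses in order.
func runWorker(ctxs ...RequestCtx) []string {
	queue := make(chan RequestCtx, len(ctxs))
	out := make(chan string, len(ctxs))
	for _, c := range ctxs {
		queue <- c
	}
	close(queue)

	var wg sync.WaitGroup
	wg.Add(1)
	go reqProcessingWorker(queue, out, &wg)
	wg.Wait()
	close(out)

	var results []string
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	fmt.Println(runWorker(
		&CompletionReqCtx{Prompt: "hello world"},
		&CompletionReqCtx{Prompt: "hello world", IsStream: true},
	))
}
```

The worker loop never inspects the concrete type, so supporting another API would mean adding another RequestCtx implementation rather than another branch in the worker.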
The flowchart after refactoring:
flowchart TB
rc1@{ shape: circle, label: "Start" } --> ra["/v1/chat/completions"]
rc1@{ shape: circle, label: "Start" } --> ra2["/v1/completions"]
ra --> A[build ChatCompletions request]
ra2 --> A2[build Completions request]
A --> B[Request]
A2 --> B[Request]
subgraph request-process
B -->|Yes| C[Return RandomFailure]
B -->|No| E[Request.UnmarshalRequest]
E --> F[Request.ValidateRequest]
F --> G[Request.BuildRequestCtx]
end
G -->|push to queue| H
subgraph queue
H@{ shape: das, label: "RequestCtx channel" }
end
H -->|pop from queue| qb[reqCtx]
subgraph reqProcessingWorker
qb --> qc[reqCtx.handle]
qc --> qm@{ shape: dbl-circ, label: "Stop" }
end