Refactor requests and responses APIs #259

@irar2

Copied from #134, written by @Frapschen

The current processing flow for the /v1/chat/completions and /v1/completions APIs:

flowchart TB
    rr@{ shape: circle, label: "Start" } --> ra["/v1/chat/completions"]
    rr@{ shape: circle, label: "Start" } --> rb["/v1/completions"]
    ra -->|s.HandleChatCompletions| A
    rb -->|s.HandleTextCompletions| A
    subgraph request-process
        A[handleCompletions] --> B{shouldInjectFailure}
        B -->|Yes| C[Return RandomFailure]
        B -->|No| E[parse raw request to a vllmReq]
        E --> F[validate vllmReq Request]
        F --> G[build CompletionReqCtx]
    end

    G -->|push to queue| H
    subgraph queue
        H@{ shape: das, label: "CompletionReqCtx channel" }
    end

    H -->|pop from queue| qb[reqCtx]
    subgraph reqProcessingWorker
        qb --> qc[mock metrics]
        qc --> qd[handle KVCache]
        qd --> qe[handle ToolCalls]
        qe --> qf{IsStream}
        qf -->|Yes| qg[return StreamingResponse]
        qg --> qj[build and write response chunk]
        qj --> ql[responseSentCallback]
        qf -->|No| qh[return Response]
        qh --> qk[build and write response]
        qk --> ql
        ql --> qm@{ shape: dbl-circ, label: "Stop" }
    end

The CompletionReqCtx queue is the only link between the two parts of this flow: request-process and reqProcessingWorker.

request-process

This part of the flow performs three main tasks:

  • Parsing the raw request body into a Go request struct
  • Validating the structure of the request body
  • Constructing a CompletionReqCtx and sending it to the queue

Currently, this process is tightly coupled, which makes it difficult to add new APIs. The following refactoring is required:

  • Rename the CompletionRequest interface to Request, generalizing it to represent all potential future OpenAI requests. Add the following methods to this interface:
    • UnmarshalRequest: Converts the raw request body into a specific request Go struct.
    • ValidateRequest: Validates the request Go struct.
    • BuildRequestCtx: Constructs a RequestCtx (this is a new interface, which will be discussed later) and sends it to the queue.
  • Rename the BaseCompletionRequest struct to BaseRequest, generalizing it to represent the Go struct for all potential future OpenAI requests.
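The proposed interface could look roughly like the sketch below. This is only an illustration: the method signatures, the fields of BaseRequest, and the TextCompletionRequest and textCompletionCtx types are assumptions, not code from the repository. For brevity, BuildRequestCtx here returns the context and leaves the push to the queue to the caller.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// RequestCtx is a placeholder for the interface discussed later in this issue.
type RequestCtx interface{}

// Request is the generalized interface proposed above.
// The signatures are assumptions made for this sketch.
type Request interface {
	UnmarshalRequest(body []byte) error
	ValidateRequest() error
	BuildRequestCtx() (RequestCtx, error)
}

// BaseRequest holds fields shared by all requests (hypothetical field set).
type BaseRequest struct {
	Model  string `json:"model"`
	Stream bool   `json:"stream"`
}

// TextCompletionRequest is a hypothetical concrete request for /v1/completions.
type TextCompletionRequest struct {
	BaseRequest
	Prompt string `json:"prompt"`
}

// UnmarshalRequest converts the raw body into the request struct.
func (r *TextCompletionRequest) UnmarshalRequest(body []byte) error {
	return json.Unmarshal(body, r)
}

// ValidateRequest checks the fields this sketch treats as mandatory.
func (r *TextCompletionRequest) ValidateRequest() error {
	if r.Model == "" {
		return errors.New("model is required")
	}
	if r.Prompt == "" {
		return errors.New("prompt is required")
	}
	return nil
}

// textCompletionCtx is a hypothetical context wrapping the validated request.
type textCompletionCtx struct{ req *TextCompletionRequest }

// BuildRequestCtx constructs the context; enqueueing is left to the caller here.
func (r *TextCompletionRequest) BuildRequestCtx() (RequestCtx, error) {
	return &textCompletionCtx{req: r}, nil
}

// Compile-time check that the concrete type satisfies the interface.
var _ Request = (*TextCompletionRequest)(nil)

func main() {
	var req Request = &TextCompletionRequest{}
	if err := req.UnmarshalRequest([]byte(`{"model":"m","prompt":"hi"}`)); err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	fmt.Println("valid:", req.ValidateRequest() == nil)
}
```

A new API would then only need its own struct embedding BaseRequest and its own implementations of the three methods.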

With these changes, we abstract the request processing logic and establish a standardized processing pipeline:

body parsing → Request struct validation → building request ctx

After this refactoring, the request-process component will support easy integration of new APIs.
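As a minimal sketch of that pipeline, the function below runs the three stages against any Request and pushes the result to the queue. The name processRequest, the method signatures, and the echoRequest stand-in are assumptions made for illustration, and failure injection is omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// RequestCtx is a placeholder for the interface proposed in this issue.
type RequestCtx interface{}

// Request mirrors the proposed interface; the signatures are assumptions.
type Request interface {
	UnmarshalRequest(body []byte) error
	ValidateRequest() error
	BuildRequestCtx() (RequestCtx, error)
}

// processRequest (hypothetical name) runs the pipeline:
// body parsing, validation, building the request ctx, then enqueueing.
func processRequest(req Request, body []byte, queue chan<- RequestCtx) error {
	if err := req.UnmarshalRequest(body); err != nil {
		return fmt.Errorf("parse body: %w", err)
	}
	if err := req.ValidateRequest(); err != nil {
		return fmt.Errorf("validate request: %w", err)
	}
	ctx, err := req.BuildRequestCtx()
	if err != nil {
		return fmt.Errorf("build request ctx: %w", err)
	}
	queue <- ctx
	return nil
}

// echoRequest is a hypothetical stand-in for a real API request type.
type echoRequest struct{ text string }

func (r *echoRequest) UnmarshalRequest(body []byte) error {
	r.text = string(body)
	return nil
}

func (r *echoRequest) ValidateRequest() error {
	if r.text == "" {
		return errors.New("empty body")
	}
	return nil
}

func (r *echoRequest) BuildRequestCtx() (RequestCtx, error) {
	return r.text, nil
}

func main() {
	queue := make(chan RequestCtx, 1)
	if err := processRequest(&echoRequest{}, []byte("hello"), queue); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("queued:", <-queue)
}
```

Because processRequest only sees the Request interface, the same function could serve every endpoint; each HTTP handler would just pick the concrete request type.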

reqProcessingWorker

The current implementation combines the processing logic for both APIs (the completions API and the chat completions API), which makes it inconvenient to add new ones.

While preserving shared logic, we need to decouple the response construction and sending for each API. Specifically:

  • Introduce a new interface RequestCtx with the following method:
    • handle: responsible for handling different types of APIs and returning the corresponding response.
  • The existing CompletionReqCtx must implement the RequestCtx interface, thereby achieving separation of concerns.
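The interface and its first implementer might be sketched as follows. The return type of handle and the fields of CompletionReqCtx are assumptions for this example; the issue only states that handle returns the corresponding response.

```go
package main

import "fmt"

// RequestCtx is the interface proposed above. Returning the response as a
// string is an assumption made to keep the sketch small.
type RequestCtx interface {
	handle() (string, error)
}

// CompletionReqCtx stands in for the existing type; its fields are hypothetical.
type CompletionReqCtx struct {
	IsChatCompletion bool
	Text             string
}

// handle builds the response for the API this context belongs to.
func (c *CompletionReqCtx) handle() (string, error) {
	object := "text_completion"
	if c.IsChatCompletion {
		object = "chat.completion"
	}
	return fmt.Sprintf(`{"object":%q,"text":%q}`, object, c.Text), nil
}

// Compile-time check that CompletionReqCtx satisfies RequestCtx.
var _ RequestCtx = (*CompletionReqCtx)(nil)

func main() {
	var reqCtx RequestCtx = &CompletionReqCtx{IsChatCompletion: true, Text: "hello"}
	resp, err := reqCtx.handle()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(resp)
}
```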

With this refactoring, the request-processing (reqProcessingWorker) flow will no longer depend on any specific API.
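To illustrate what an API-agnostic worker might look like, here is a small sketch that drains a channel of RequestCtx values and delegates to handle. The worker signature, the results channel, and the stubCtx type are assumptions for the example and do not reflect the existing worker code.

```go
package main

import "fmt"

// RequestCtx mirrors the proposed interface; the signature is an assumption.
type RequestCtx interface {
	handle() (string, error)
}

// stubCtx is a hypothetical context used only to exercise the worker.
type stubCtx struct{ name string }

func (s stubCtx) handle() (string, error) {
	return "handled " + s.name, nil
}

// reqProcessingWorker pops contexts from the queue until it is closed and
// calls handle on each one, without knowing which API a context belongs to.
func reqProcessingWorker(queue <-chan RequestCtx, results chan<- string) {
	for reqCtx := range queue {
		resp, err := reqCtx.handle()
		if err != nil {
			results <- "error: " + err.Error()
			continue
		}
		results <- resp
	}
}

func main() {
	queue := make(chan RequestCtx, 2)
	results := make(chan string, 2)

	queue <- stubCtx{"chat"}
	queue <- stubCtx{"text"}
	close(queue)

	// Run synchronously here; the buffered results channel avoids blocking.
	reqProcessingWorker(queue, results)
	close(results)

	for r := range results {
		fmt.Println(r)
	}
}
```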

The flowchart after refactoring:

flowchart TB
    rc1@{ shape: circle, label: "Start" } --> ra["/v1/chat/completions"]
    rc1@{ shape: circle, label: "Start" } --> ra2["/v1/completions"]
    ra --> A[build ChatCompletions request]
    ra2 --> A2[build Completions request]
    A --> B[Request]
    A2 --> B[Request]
    subgraph request-process
        B --> D{shouldInjectFailure}
        D -->|Yes| C[Return RandomFailure]
        D -->|No| E[Request.UnmarshalRequest]
        E --> F[Request.ValidateRequest]
        F --> G[Request.BuildRequestCtx]
    end

    G -->|push to queue| H
    subgraph queue
        H@{ shape: das, label: "CompletionReqCtx channel" }
    end

    H -->|pop from queue| qb[reqCtx]
    subgraph reqProcessingWorker
        qb --> qc[reqCtx.handle]
        qc --> qm@{ shape: dbl-circ, label: "Stop" }
    end