Description
Code Pointer: service.py
The idea behind batching is simple: you aggregate similar requests into a single payload for more efficient processing.
What I'd like to get discussion on is: should Services and Replicas handle batching, or should they just blindly pass payloads along?
IMO there are currently 2 potential patterns (that can/should co-exist) for supporting batching without upstreaming it to Services/Replicas:
- Caller Batching: Agents expose an endpoint that explicitly expects a batched input (see the first sketch after this list)
  - Caller + Agent operate on batched payloads
  - Con: Caller has to deal with batching
- Actor Batching: Agents expose an endpoint to enqueue requests, and each Agent writes separate logic to batch-process them internally
  - Caller operates on singletons
  - Agent handles aggregating and operates on batched payloads
  - Con: Agents are a tad more complex and will share similar boilerplate
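To make the Caller Batching pattern concrete, here is a minimal sketch. The endpoint name (`generate_batched`), the payload shape, and the invocation style are illustrative assumptions, not existing Forge APIs; it assumes the same `ForgeActor`/`@endpoint` imports as the snippets further down.

```python
# Caller Batching sketch: the Agent only exposes a batched endpoint and the
# caller owns the aggregation loop. All names below are illustrative.
class BatchedGenerator(ForgeActor):
    @endpoint
    async def generate_batched(self, requests: list[dict]) -> list[dict]:
        # The Agent only ever sees an already-aggregated payload.
        return [await self._generate_one(r) for r in requests]

    async def _generate_one(self, request: dict) -> dict:
        ...

async def caller_loop(generator, incoming, max_batch: int = 8):
    # Caller-side boilerplate: accumulate singletons, then fire one batched call.
    batch: list[dict] = []
    async for request in incoming:
        batch.append(request)
        if len(batch) >= max_batch:
            results = await generator.generate_batched(batch)  # assumed call style
            batch.clear()
            ...  # hand results back to whoever is waiting
```

The con noted above is visible here: every caller has to carry its own accumulation loop.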
Where Services/Replicas could potentially come in is with replica routing and prebatching for the Actor Batching case (a rough sketch follows this list):
- Callers call singleton endpoints
- Services route batched requests to specific Replicas
  - Aware Routing: Services route payloads to specific Replicas using a similarity criterion
- Replicas aggregate singleton requests
  - Prebatching: Replicas create an aggregated payload and pass it along to the Agent's Batched endpoint
- Agents process the aggregated payload
~ Note that the singleton endpoint called by the user never reaches the Actor ~
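Here is a rough sketch of what a Service/Replica-owned routing + prebatching layer could look like. None of these hooks (`route_key`, a fixed batch-size window, the replica handle calls) exist in Forge today; they only illustrate the flow above.

```python
# Hypothetical Service-owned routing + prebatching layer. Purely illustrative.
from collections import defaultdict

class PrebatchingRouter:
    def __init__(self, replicas, max_batch: int = 8):
        self.replicas = replicas          # replica handles the Service routes to
        self.max_batch = max_batch
        self.pending = defaultdict(list)  # route key -> pending singleton requests

    def route_key(self, request: dict) -> int:
        # "Aware Routing": keep similar requests on the same Replica. A real
        # similarity criterion might look at model, prompt length, lora id, etc.
        return hash(request.get("model", "")) % len(self.replicas)

    async def submit(self, request: dict):
        # Callers only ever submit singletons; aggregation happens here.
        key = self.route_key(request)
        self.pending[key].append(request)
        if len(self.pending[key]) >= self.max_batch:
            batch, self.pending[key] = self.pending[key], []
            # Prebatching: hand the aggregated payload straight to the Agent's
            # batched endpoint -- the singleton path never reaches the Actor.
            await self.replicas[key].process_batched_request(batch)
```

The similarity criterion is deliberately pluggable; the point is that neither the caller nor the Agent carries this logic.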
In combination, this would give Forge a general way to customize the routing and batching process.
Beyond that, it has the added side effect of simplifying the Actor flow, both for execution and for writing new Actors:
[Current] Without Services/Replica handling batching
```python
class NewActor(ForgeActor):
    ...

    @endpoint
    async def add_request(self, request):
        # Enqueue the singleton request onto an internal queue
        ...

    async def run(self):
        # Process batches from the internal queue
        ...

    async def process_batched_request(self, batch):
        # Execution over the aggregated payload
        ...
```
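Filled in, those stubs end up looking roughly like the following in every Agent that batches; the queue handling and batching window are the boilerplate each Agent would repeat. The batch policy and constructor wiring are illustrative assumptions.

```python
import asyncio

# Hypothetical fill-in of the stubs above, showing the repeated boilerplate.
class QueueingActor(ForgeActor):
    def __init__(self, max_batch: int = 8):
        super().__init__()  # constructor wiring depends on ForgeActor
        self.queue: asyncio.Queue = asyncio.Queue()
        self.max_batch = max_batch

    @endpoint
    async def add_request(self, request):
        # Enqueue a singleton; the caller never sees a batch.
        await self.queue.put(request)

    @endpoint
    async def run(self):
        # Drain the internal queue into batches and dispatch them.
        while True:
            batch = [await self.queue.get()]
            while not self.queue.empty() and len(batch) < self.max_batch:
                batch.append(self.queue.get_nowait())
            await self.process_batched_request(batch)

    async def process_batched_request(self, batch):
        # Execution over the aggregated payload.
        ...
```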
With Services/Replica handling batching
```python
class NewActor(ForgeActor):
    ...

    @endpoint
    async def process_batched_request(self, batch):
        # Execution over the aggregated payload
        ...

    @classmethod
    def enbatch(cls, request, cumulative_batch):
        # Fold the request into the cumulative batch struct
        ...
```
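And a hedged sketch of how the Replica side could drive that hook. The `ReplicaPrebatcher` plumbing does not exist; it only shows where the shared boilerplate would live once it moves out of individual Agents, and it assumes `enbatch` returns the updated batch struct.

```python
# Hypothetical Replica-side driver for the `enbatch` hook. Purely illustrative.
class ReplicaPrebatcher:
    def __init__(self, actor_cls, actor, max_batch: int = 8):
        self.actor_cls = actor_cls
        self.actor = actor
        self.max_batch = max_batch
        self.cumulative_batch = None
        self.count = 0

    async def on_request(self, request):
        # Framework-owned aggregation: the Agent only supplies `enbatch` and
        # the batched execution endpoint.
        self.cumulative_batch = self.actor_cls.enbatch(request, self.cumulative_batch)
        self.count += 1
        if self.count >= self.max_batch:
            batch, self.cumulative_batch, self.count = self.cumulative_batch, None, 0
            await self.actor.process_batched_request(batch)  # assumed call style
```

With that in place, Agents only declare how to fold a request into a batch and how to execute one.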
All feedback is welcome, but here are a few seeding questions:
- If we plan to upstream Service/Replica, does this make them too RL-specific?
- Does custom routing match the identity of Forge ("Users don't need to think about the network magic")?