This Rust project implements a high-performance in-process asynchronous request dispatcher using tokio. It distributes requests across a pool of workers with support for least-loaded worker selection and session affinity.
- Async request handling: Uses tokio tasks and channels to process jobs concurrently.
- Least-loaded worker selection: Requests are routed to the worker with the lowest total load (active + queued jobs).
- Session affinity: Optionally routes requests from the same session to the same worker within a configurable TTL.
- Worker metrics: Tracks in-flight jobs, queue length, capacity, and errors for observability.
- Configurable concurrency: Each worker enforces a semaphore-limited concurrency cap to prevent overload.
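The session-affinity rule above can be sketched as a TTL-checked map lookup. This is a minimal, synchronous sketch: a std `HashMap` stands in for the concurrent map the pool actually uses, and the names `AffinityMap` and `affine_worker` are illustrative, not the crate's API.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// session -> (worker index, time the mapping was recorded)
type AffinityMap = HashMap<String, (usize, Instant)>;

/// Return the pinned worker for `session` if the entry is still within
/// `ttl`. Expired entries are removed so the next request for that
/// session falls back to least-loaded selection.
fn affine_worker(map: &mut AffinityMap, session: &str, ttl: Duration) -> Option<usize> {
    if let Some(&(idx, at)) = map.get(session) {
        if at.elapsed() <= ttl {
            return Some(idx);
        }
        // Entry expired: drop it so a fresh worker can be chosen.
        map.remove(session);
    }
    None
}

fn main() {
    let mut map = AffinityMap::new();
    map.insert("s1".into(), (2, Instant::now()));
    assert_eq!(affine_worker(&mut map, "s1", Duration::from_secs(30)), Some(2));
    assert_eq!(affine_worker(&mut map, "unknown", Duration::from_secs(30)), None);
}
```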
```
Client/API
     │
     ▼
+----------------------+
|     Worker Pool      |
|  +----------------+  |
|  |  Session Map   |  |
|  | session -> idx |  |
|  +----------------+  |
+----------------------+
     │
     ▼  Least-Loaded Worker Selection
     │
+----+----+----+----+
| W0 | W1 | W2 | WN |
+----+----+----+----+
  │    │    │
  ▼    ▼    ▼
Queue Queue Queue ...
  │    │    │
In-flight / Errors tracked per worker
```
A `Job` represents a request sent to a worker along with a channel to respond asynchronously:

```rust
pub struct Job<R, S> {
    pub req: R,
    pub respond_to: oneshot::Sender<S>,
}
```

A worker is responsible for processing jobs asynchronously. It includes:

- Job queue (MPSC channel)
- In-flight counter (`AtomicUsize`)
- Error counter (`AtomicUsize`)
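The request/response handshake that `Job` encodes can be sketched with std channels standing in for tokio's async ones. This is a synchronous sketch, not the crate's types: `mpsc::SyncSender` here plays the role of `oneshot::Sender`.

```rust
use std::sync::mpsc;

/// Synchronous stand-in for the async Job<R, S> above: the request
/// travels to the worker together with its own one-shot reply channel.
struct Job<R, S> {
    req: R,
    respond_to: mpsc::SyncSender<S>,
}

fn main() {
    let (reply_tx, reply_rx) = mpsc::sync_channel(1);
    let job = Job { req: 21u32, respond_to: reply_tx };

    // A "worker" thread processes the request and answers on the
    // channel embedded in the job.
    let handle = std::thread::spawn(move || {
        let result = job.req * 2;
        job.respond_to.send(result).unwrap();
    });
    handle.join().unwrap();

    // The caller receives the response without ever sharing state
    // with the worker.
    assert_eq!(reply_rx.recv().unwrap(), 42);
}
```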
```rust
pub struct WorkerHandle<R, S> {
    pub id: usize,
    pub tx: mpsc::Sender<Job<R, S>>,
    pub inflight: Arc<AtomicUsize>,
    pub errors: Arc<AtomicUsize>,
}
```

Workers are spawned with the helper function:
```rust
pub fn spawn_workers<R, S, F, Fut, E>(n: usize, handler: F) -> Vec<WorkerHandle<R, S>>
where
    F: Fn(R) -> Fut + Send + Clone + 'static,
    Fut: Future<Output = Result<S, E>> + Send + 'static,
    E: Send + std::fmt::Debug + 'static,
```

This function creates `n` workers, each executing the provided handler function under a per-worker concurrency limit.
The `WorkerPool` manages worker selection and job submission.

```rust
pub struct WorkerPool<R, S> {
    workers: Vec<WorkerHandle<R, S>>,
    affinity: DashMap<String, (usize, Instant)>,
    affinity_ttl: Duration,
}
```

Worker selection logic:
- Least-loaded: Picks the worker with the lowest in-flight + queued count.
- Session affinity: Routes requests for a session to the same worker if within TTL.
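The least-loaded rule can be sketched as a minimum over each worker's in-flight plus queued counts. A minimal std-only sketch, with an illustrative `LoadCounters` type standing in for the counters a `WorkerHandle` carries:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Illustrative stand-in for a worker's load counters.
struct LoadCounters {
    inflight: Arc<AtomicUsize>,
    queued: Arc<AtomicUsize>,
}

/// Pick the index of the worker with the lowest total load
/// (in-flight + queued). Ties go to the lowest index.
fn least_loaded(workers: &[LoadCounters]) -> Option<usize> {
    workers
        .iter()
        .enumerate()
        .min_by_key(|(_, w)| {
            w.inflight.load(Ordering::Relaxed) + w.queued.load(Ordering::Relaxed)
        })
        .map(|(i, _)| i)
}

fn main() {
    let mk = |inf: usize, q: usize| LoadCounters {
        inflight: Arc::new(AtomicUsize::new(inf)),
        queued: Arc::new(AtomicUsize::new(q)),
    };
    // Worker 1 has total load 2, the lowest of the three.
    let workers = vec![mk(3, 2), mk(1, 1), mk(0, 4)];
    assert_eq!(least_loaded(&workers), Some(1));
}
```

Relaxed atomic loads are enough here: the counters are advisory load hints, and a slightly stale value only costs routing accuracy, not correctness.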
Submitting a job:

```rust
let response: S = wp.submit(Some("session-id"), request).await?;
```

Getting metrics:

```rust
let metrics = wp.worker_metrics();
```

Example usage with axum:

```rust
use axum::{routing::post, Router};
use std::sync::Arc;

#[tokio::main]
async fn main() {
    let workers = spawn_workers(num_cpus::get(), move |req| {
        async move {
            // Handle the request asynchronously
            process_request(req).await
        }
    });
    let wp = Arc::new(WorkerPool::new(workers, std::time::Duration::from_secs(30)));

    let app = Router::new().route("/submit", post({
        let wp = wp.clone();
        move |req| async move {
            wp.submit(Some("session-id"), req).await
        }
    }));

    axum::Server::bind(&"0.0.0.0:8080".parse().unwrap())
        .serve(app.into_make_service())
        .await
        .unwrap();
}
```

Each worker reports:
- `worker_id` – Unique worker identifier
- `inflight` – Number of jobs currently being processed
- `queue_len` – Jobs waiting in the worker's queue
- `capacity` – Maximum queue capacity
- `errors` – Number of failed or dropped jobs
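A per-worker metrics report might look like the snapshot below. The `WorkerMetrics` struct and `snapshot` helper are illustrative (field names follow the list above; the crate's actual return type may differ):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Point-in-time view of one worker's counters.
#[derive(Debug, Clone, PartialEq)]
struct WorkerMetrics {
    worker_id: usize,
    inflight: usize,
    queue_len: usize,
    capacity: usize,
    errors: usize,
}

/// Read the shared atomic counters into a plain snapshot that can be
/// serialized or logged without holding any locks.
fn snapshot(
    id: usize,
    inflight: &AtomicUsize,
    queue_len: usize,
    capacity: usize,
    errors: &AtomicUsize,
) -> WorkerMetrics {
    WorkerMetrics {
        worker_id: id,
        inflight: inflight.load(Ordering::Relaxed),
        queue_len,
        capacity,
        errors: errors.load(Ordering::Relaxed),
    }
}

fn main() {
    let inflight = AtomicUsize::new(2);
    let errors = AtomicUsize::new(0);
    let m = snapshot(0, &inflight, 3, 64, &errors);
    assert_eq!(m.inflight, 2);
    assert_eq!(m.queue_len, 3);
}
```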
```rust
MAX_CONCURRENCY_PER_WORKER  // Maximum concurrent jobs per worker
MAX_WORKER_QUEUE_CAPACITY   // Maximum queue length per worker
```

Adjust these values to tune throughput and backpressure behavior.
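The backpressure effect of the queue-capacity setting can be seen with a bounded std channel (a stand-in for the tokio bounded mpsc queue each worker uses; the capacity value here is illustrative):

```rust
use std::sync::mpsc;

fn main() {
    const MAX_WORKER_QUEUE_CAPACITY: usize = 2; // illustrative value

    let (tx, _rx) = mpsc::sync_channel::<u32>(MAX_WORKER_QUEUE_CAPACITY);

    // The first `capacity` sends are accepted even with no consumer...
    assert!(tx.try_send(1).is_ok());
    assert!(tx.try_send(2).is_ok());
    // ...then the queue is full and the submitter sees backpressure
    // immediately instead of piling up unbounded work.
    assert!(matches!(tx.try_send(3), Err(mpsc::TrySendError::Full(_))));
}
```

A larger capacity absorbs bursts at the cost of latency for queued jobs; a smaller one surfaces overload to callers sooner.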
See CONTRIBUTING.md for details.
Apache 2.0. See LICENSE for details.
This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.