A hivemind server hosts one or several experts and processes incoming requests to those experts. It periodically re-publishes these experts to the DHT via a dedicated hivemind.dht.DHT peer that runs in the background. The experts can be accessed directly as hivemind.client.RemoteExpert("addr:port", "expert.uid.here") or as part of a hivemind.client.RemoteMixtureOfExperts, which finds the most suitable experts across the DHT.
The hivemind.server module is organized as follows:
- Server is the main class that publishes experts, accepts incoming requests, and passes them to Runtime for computation.
- Runtime balances device (GPU) usage between several ExpertBackend instances, each of which serves one expert.
- ExpertBackend is a wrapper for torch.nn.Module that can be accessed by remote clients. It has two TaskPools: one for forward requests and one for backward requests.
- TaskPool stores incoming requests for a batch-parallel computation (e.g. forward pass), groups them into batches and offers those batches to Runtime for processing.
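The division of labor between TaskPool and Runtime can be illustrated with a toy, dependency-free sketch. ToyTaskPool and run_until_idle are hypothetical names invented for this illustration and are not part of hivemind's API; the "longest queue first" scheduling policy is also an assumption, and the real Runtime may prioritize pools differently. Plain Python functions stand in for an expert's batched forward and backward passes.

```python
import itertools
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, Dict, List, Tuple


@dataclass
class ToyTaskPool:
    """Hypothetical stand-in for a TaskPool: queues requests, hands out batches."""
    name: str
    # Batch-parallel computation, e.g. a forward pass over a list of inputs.
    process_fn: Callable[[List[Any]], List[Any]]
    max_batch_size: int = 4
    queue: Deque[Tuple[int, Any]] = field(default_factory=deque)
    _ids: Any = field(default_factory=itertools.count)

    def submit(self, x: Any) -> int:
        """Store one incoming request and return its task id."""
        task_id = next(self._ids)
        self.queue.append((task_id, x))
        return task_id

    def ready(self) -> bool:
        return len(self.queue) > 0

    def next_batch(self) -> List[Tuple[int, Any]]:
        """Group up to max_batch_size queued requests into one batch."""
        n = min(self.max_batch_size, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]


def run_until_idle(pools: List[ToyTaskPool]):
    """Toy runtime loop: repeatedly pick a pool, run one batch, store outputs."""
    results: Dict[Tuple[str, int], Any] = {}
    log: List[Tuple[str, int]] = []  # (pool name, batch size) per processed batch
    while any(p.ready() for p in pools):
        # Assumed policy for this sketch: serve the pool with the longest queue.
        pool = max(pools, key=lambda p: len(p.queue))
        batch = pool.next_batch()
        outputs = pool.process_fn([x for _, x in batch])
        for (task_id, _), out in zip(batch, outputs):
            results[(pool.name, task_id)] = out
        log.append((pool.name, len(batch)))
    return results, log


# Usage: one "expert" with separate forward and backward pools.
forward = ToyTaskPool("expert0.forward", lambda xs: [2 * x for x in xs], max_batch_size=3)
backward = ToyTaskPool("expert0.backward", lambda gs: [g + 1 for g in gs], max_batch_size=3)
for v in range(5):
    forward.submit(v)
backward.submit(10)

results, log = run_until_idle([forward, backward])
print(log)  # [('expert0.forward', 3), ('expert0.forward', 2), ('expert0.backward', 1)]
```

Here the five forward requests are processed as two batches (three requests, then two) rather than five separate calls, which is the point of batch-parallel pools: the device processes many requests per invocation while the runtime decides which pool to serve next.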
hivemind.server