[CODE] Basic Inferencing API on hivemind.Server aka rpc_inference #3

Closed · justheuristic opened this issue May 31, 2022 · 2 comments

justheuristic (Collaborator):

Why: talk to a 176B model run on hundreds of small devices.

Implement an extended hivemind.Server that has forward/backward as usual, plus an additional bidirectional streaming RPC named forward_incremental (stream <-> stream).
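For concreteness, the step 1-2 handshake in the protocol below could be carried by messages with roughly these fields. This is only a sketch with illustrative names, not the final protobuf schema:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class InferenceRequest:
    """Step 1: what the client asks for when opening a session (illustrative names)."""
    requested_layers: Sequence[str]   # which of this server's layers the client wants to run
    max_sequence_length: int          # requested max sequence length
    bid: Optional[float] = None       # [optional: bid?]


@dataclass
class InferenceInfo:
    """Step 2: the server's reply (illustrative names)."""
    accepted: bool                    # if True, server awaits the first request for T=10 seconds
    queue_length: float = 0.0         # 0 if accepted right now, N if N other nodes must finish first
    throughput: float = 0.0           # server's estimated computation time, including time in queue
```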

Here's the protocol for forward_incremental:

  1. Client sends a request containing:
     • requested layers
     • requested max sequence length
     • [optional: bid?]
  2. Server responds with an info protobuf that contains:
     • bool accepted: if True, the server decides to let the client run inference and will await the first request for T=10 seconds.
     • [optional: queue?]
     • [float queue length: 0 if accepted right now, N if the client must wait for N other nodes to finish before running]
     • [float throughput: the server's estimated computation time, including time in queue]
  3. Client sends prefix embeddings:
     • Tensor prefix input_embeddings [1, prefix_length, hidden_size] with compression
     • [optional prefix attention mask [prefix_length, prefix_length], default = tril]
  4. Server runs a forward pass, saves the attention caches, and returns:
     • Tensor prefix output_embeddings [1, prefix_length, hidden_size] with compression
  5. Client sends the next token's input embeddings:
     • Tensor input_embeddings [1, 1, hidden_size] with compression
     • [optional attention mask [1, prefix_length + prev_tokens], default = tril]
  6. Server runs a forward pass with the attention cache and returns:
     • Tensor output_embeddings
     • current length

GOTO step 5 while current length <= max length
If the client does not send a ping within T seconds (possibly an empty message if it has no data yet), the server closes the connection.
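A minimal server-side sketch of this loop (steps 3-6) is below. It assumes the step 1-2 handshake already happened, that the block follows a transformers-style interface where use_cache=True returns (hidden_states, past_key_values), and that request fields like input_embeddings use the illustrative names from above; none of this is the final hivemind API.

```python
import asyncio
from typing import AsyncIterator

import torch

MAX_STEPS = 256       # inference steps per session, excluding the prefix
PING_TIMEOUT = 10.0   # T seconds to wait for the next client message


async def rpc_inference(block: torch.nn.Module, requests: AsyncIterator, max_length: int):
    """One inference session: prefix pass (steps 3-4), then token-by-token steps (5-6)."""
    # steps 3-4: prefix embeddings -> prefix outputs; the attention cache is kept for reuse
    first = await asyncio.wait_for(requests.__anext__(), timeout=PING_TIMEOUT)
    prefix_embeddings = first.input_embeddings                     # [1, prefix_length, hidden_size]
    outputs, attention_cache = block(prefix_embeddings, use_cache=True)
    current_length = prefix_embeddings.shape[1]
    yield outputs                                                  # prefix output_embeddings

    # steps 5-6: one token per request, reusing the cache; loop until max length or timeout
    for _ in range(MAX_STEPS):
        try:
            request = await asyncio.wait_for(requests.__anext__(), timeout=PING_TIMEOUT)
        except (StopAsyncIteration, asyncio.TimeoutError):
            break                                                  # client is done or missed its ping
        token_embeddings = request.input_embeddings                # [1, 1, hidden_size]
        outputs, attention_cache = block(
            token_embeddings, past_key_values=attention_cache, use_cache=True
        )
        current_length += 1
        yield outputs, current_length
        if current_length >= max_length:
            break
```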

Don't think about these yet (open questions for later):

  • support fixed max length for now, e.g. 1024 or 2048?
  • inference up to 256 steps excluding prefix - to ensure we don't spend too long with the same node?
  • select one or more of that node's consecutive layers to run at once?
  • send more than one token at a time?
  • option to backtrack for a few tokens for beam search inference?
  • beam search with multiple hypotheses - and an option to reorder them internally?
@justheuristic justheuristic self-assigned this May 31, 2022
@justheuristic justheuristic transferred this issue from another repository Jun 12, 2022
justheuristic (Collaborator, Author) commented Jun 19, 2022:

Things that should be done before the first demo:

  • rpc_inference actually computes something :)
  • client has a way of running rpc_inference for more than one step :)
  • test cache basic functionality (no leaks, values are properly reused)
  • test session basic functionality (closes properly, no leaks, multi-input)
  • rpc_inference works without an attention mask and other optional inputs
  • rpc_inference uses cache correctly

[list of non-required steps merged into a separate issue]
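As a sanity check for the cache and session items above, something like the following could compare stepwise inference against one full forward pass; the inference_session / step client API used here is hypothetical, with illustrative names:

```python
import torch


def test_stepwise_inference_matches_forward(remote_block, embeddings):
    """Stepwise rpc_inference over a sequence should reproduce a single full forward pass."""
    full_outputs = remote_block(embeddings)             # [1, seq_len, hidden_size] via ordinary forward

    stepwise = []
    with remote_block.inference_session(max_length=embeddings.shape[1]) as session:
        for t in range(embeddings.shape[1]):
            stepwise.append(session.step(embeddings[:, t : t + 1, :]))
    stepwise = torch.cat(stepwise, dim=1)

    # if the attention cache is stored and reused correctly, both paths agree
    assert torch.allclose(full_outputs, stepwise, atol=1e-4)
```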

@justheuristic justheuristic changed the title [CODE] Inferencing API on hivemind.Server aka forward_incremental [CODE] Inferencing API on hivemind.Server aka rpc_inference Jun 19, 2022
@justheuristic justheuristic changed the title [CODE] Inferencing API on hivemind.Server aka rpc_inference [CODE] Basic Inferencing API on hivemind.Server aka rpc_inference Jun 19, 2022
justheuristic added a commit that referenced this issue Jun 19, 2022
justheuristic (Collaborator, Author) commented:

awaiting post-merge review by @GreenFatGuy
