[CODE] Basic Inferencing API on hivemind.Server aka rpc_inference
Why: talk to a 176B model run on hundreds of small devices.

Implement an extended hivemind.Server that has forward/backward as usual, plus an additional RPC named forward_incremental (stream <-> stream):

- the server runs a forward pass with the attention cache and returns:
  - Tensor output_embeddings
  - current length
- GOTO step 5 while current length <= max length
- if the client does not send a ping within T seconds (maybe an empty message if no data yet), the server closes the connection
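A minimal client-side sketch of this loop, under stated assumptions: the session class, the stub call, and the dict-based messages below are illustrative only, not the actual hivemind API.

```python
import torch


class IncrementalInferenceSession:
    """Hypothetical client-side wrapper around a forward_incremental (stream <-> stream) RPC."""

    def __init__(self, stub, max_length: int = 2048):
        self.stub = stub              # assumed bidirectional-stream RPC stub
        self.max_length = max_length  # fixed maximum sequence length
        self.current_length = 0

    def step(self, input_embeddings: torch.Tensor) -> torch.Tensor:
        # One round trip: send new embeddings; the server reuses its attention cache
        # and replies with output embeddings plus the cache length so far.
        response = self.stub.forward_incremental({"input_embeddings": input_embeddings})
        self.current_length = response["current_length"]
        return response["output_embeddings"]

    def ping(self) -> None:
        # Empty message so the server does not close the connection after T seconds.
        self.stub.forward_incremental({})


def generate(session, prefix_embeddings: torch.Tensor, embed_next_token) -> list:
    """The 'GOTO step 5 while current length <= max length' loop, written out."""
    outputs = [session.step(prefix_embeddings)]
    while session.current_length <= session.max_length:
        # embed_next_token turns the latest hidden states into the next input embedding
        outputs.append(session.step(embed_next_token(outputs[-1])))
    return outputs
```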
Don't think about it:
- support a fixed max length for now, e.g. 1024 or 2048?
- inference for up to 256 steps, excluding the prefix, to ensure we don't spend too long with the same node?
- select one or more of that node's consecutive layers to run inference on at once?
- send more than one token at a time?
- an option to backtrack by a few tokens for beam-search inference?
- beam search with multiple hypotheses, with an option to reorder them internally?
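If some of these were picked up later, most of them would surface as optional fields on the step request. A speculative sketch; every field name here is an assumption, not a committed protocol:

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class InferenceStepRequest:
    # May carry several new tokens at once instead of exactly one.
    input_embeddings: torch.Tensor
    # Fixed cap for now, e.g. 1024 or 2048.
    max_length: int = 2048
    # Rewind the server-side attention cache to this position (beam-search backtracking).
    backtrack_to: Optional[int] = None
    # Which cached hypothesis each input row continues; lets the server reorder beams.
    hypothesis_ids: Optional[torch.Tensor] = None


@dataclass
class InferenceStepResponse:
    output_embeddings: torch.Tensor
    current_length: int  # server-side cache length after applying this step
```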
Client has a way of running rpc_inference for more than one step :)

- test cache basic functionality (no leaks, values are properly reused)
- test session basic functionality (closes properly, no leaks, multi-input)
- rpc_inference works w/o mask and other stuff
- rpc_inference uses cache correctly
- [list of non-required steps merged into a separate issue]
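A rough idea of how the first few checklist items could be exercised with pytest. The `server` fixture, `open_inference_session`, and `allocated_cache_bytes` are assumed names for illustration, not the actual test suite:

```python
import pytest
import torch


def test_cache_reuses_values(server):
    # Cache basics: the attention cache grows by one position per step and is freed on close.
    session = server.open_inference_session(max_length=16)
    first = session.step(torch.randn(1, 1, 4096))
    second = session.step(torch.randn(1, 1, 4096))
    assert session.current_length == 2
    assert first.shape == second.shape
    session.close()
    assert server.allocated_cache_bytes == 0  # nothing leaked


def test_session_closes_properly(server):
    # Session basics: multi-token input works, and a closed session rejects further steps.
    with server.open_inference_session(max_length=16) as session:
        session.step(torch.randn(1, 3, 4096))  # three tokens in one step
    with pytest.raises(RuntimeError):
        session.step(torch.randn(1, 1, 4096))
```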
justheuristic changed the title from [CODE] Inferencing API on hivemind.Server aka forward_incremental to [CODE] Inferencing API on hivemind.Server aka rpc_inference on Jun 19, 2022
justheuristic changed the title from [CODE] Inferencing API on hivemind.Server aka rpc_inference to [CODE] Basic Inferencing API on hivemind.Server aka rpc_inference on Jun 19, 2022