Direct communication between Client and LLM Inference Engine #143

Closed
2 tasks
dan-homebrew opened this issue Sep 11, 2023 · 0 comments
Assignees
Labels
P0: critical (Mission critical) · type: bug (Something isn't working)

Comments

@dan-homebrew (Contributor) commented Sep 11, 2023

Deliverable

To deliver by EOD Tuesday

  • Revert to the initial architecture: the Client and the LLM inference engine communicate directly, with server-sent events (SSE) streamed straight to the Client (see the sketch below)
  • Simple AuthZ (authorization) for the LLM Inference Server
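
A minimal sketch of what the reverted flow could look like, assuming an OpenAI-compatible `/v1/chat/completions` endpoint on the inference engine and a static bearer token standing in for the "Simple AuthZ" item; the URL, port, and `LLM_API_TOKEN` variable below are illustrative only and not confirmed by this issue:

```typescript
// Sketch of the reverted flow: the Client calls the LLM inference engine
// directly and consumes the SSE stream itself, with a simple bearer-token
// check standing in for "Simple AuthZ".
// Assumptions (illustrative): OpenAI-compatible endpoint on localhost:3928,
// shared-secret token supplied via LLM_API_TOKEN.

const INFERENCE_URL = "http://localhost:3928/v1/chat/completions"; // hypothetical endpoint
const API_TOKEN = process.env.LLM_API_TOKEN ?? "";                 // hypothetical shared secret

async function streamCompletion(prompt: string): Promise<void> {
  const res = await fetch(INFERENCE_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_TOKEN}`, // simple AuthZ: server rejects missing/invalid tokens
    },
    body: JSON.stringify({
      messages: [{ role: "user", content: prompt }],
      stream: true, // ask the engine to stream tokens back as SSE
    }),
  });
  if (!res.ok || !res.body) throw new Error(`Inference server error: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  // Read the SSE stream chunk by chunk and print each content delta as it arrives.
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial line for the next chunk
    for (const line of lines) {
      if (!line.startsWith("data:")) continue;
      const payload = line.slice("data:".length).trim();
      if (!payload || payload === "[DONE]") continue;
      const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
      if (delta) process.stdout.write(delta);
    }
  }
}

streamCompletion("Hello!").catch(console.error);
```

On the server side, the simple AuthZ could be as small as comparing the incoming Authorization header against the same shared token before handing the request to the engine; per-user keys or token expiry would be follow-up work beyond this deliverable.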

Problem

(Two private Zenhub images attached to the original issue; not accessible here.)

@dan-homebrew changed the title from Streamline LLM-Client Interactions to Direct communication between Client and LLM Inference Engine on Sep 11, 2023
@dan-homebrew added the P0: critical (Mission critical) label on Sep 11, 2023
@0xSage added the type: bug (Something isn't working) label on Sep 12, 2023
dan-homebrew added a commit that referenced this issue Sep 12, 2023
…een-client-llm

fix: #143 - Direct communication between client and llm inference service
louis-jan added a commit that referenced this issue Sep 12, 2023
* chore: use OpenAI parser

* chore: access host's services

* chore: take out llm service - GGUF model for the latest llama.cpp support