
Offline chat: Quality and Reliability Improvements #393

Merged
merged 29 commits into master from improve-llama-2-perf-and-quality-and-fixes on Aug 2, 2023

Conversation

@sabaimran (Collaborator) commented on Aug 1, 2023

Incoming

Major

  • Fix Prompt Size Exceeded Issue
  • Improve Llama 2 Model Download
  • Fix Segmentation Fault due to Race Condition
  • Improve Chat Response Latency
  • Fix Fake Dialogue Continuation

Minor

  • Improve default message for Chat window on web when it's not configured. Include hint to use offline chat.
  • Add null check in perform_chat_checks method
  • Add offline chat director unit tests

Performance Analysis (Time to First Token)

Query      v0.10.0    this branch
Query 1    52s        28s
Query 2    33s        42s
Query 3    67s        38s

debanjum and others added 7 commits July 31, 2023 17:21
Previously the system message was getting dropped when the context
size with chat history exceeded the max prompt size supported by the
chat model.

Now only the previous chat messages are dropped or the current
message is truncated, but the system message is kept to provide
guidance to the chat model.
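A minimal sketch of this truncation approach, assuming illustrative names (truncate_messages, count_tokens, max_prompt_size are not necessarily Khoj's actual API): drop the oldest chat messages first, then truncate the current message, but never drop the system message.

```python
# Sketch of prompt truncation that keeps the system message.
# Function and parameter names are illustrative, not Khoj's actual API.
def truncate_messages(system_msg, history, current_msg, max_prompt_size, count_tokens):
    budget = max_prompt_size - count_tokens(system_msg)
    kept = list(history)
    # Drop the oldest chat messages first until the rest fits in the budget
    while kept and sum(count_tokens(m) for m in kept) + count_tokens(current_msg) > budget:
        kept.pop(0)
    # If the current message alone still exceeds the budget, truncate it
    while current_msg and count_tokens(current_msg) > budget:
        current_msg = current_msg[:-100]  # crude character-level chop, for illustration only
    return [system_msg] + kept + [current_msg]
```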
- Fix download URL: it was mapping to q3_K_M, fixed to use q4_K_S
- Use a proper Llama tokenizer for counting tokens for truncation with Llama (see the sketch below)
- Add additional null checks when running
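Token counting with a real Llama tokenizer might look roughly like this; the tokenizer id below is an assumption (a publicly available Llama test tokenizer), not necessarily the one Khoj loads.

```python
from transformers import AutoTokenizer

# Assumption: any Llama 2 compatible tokenizer works for counting tokens;
# the exact model id used in the PR may differ.
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

def count_tokens(text: str) -> int:
    # Token counts drive the truncation decisions against max_prompt_size
    return len(tokenizer.encode(text))
```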
@sabaimran added labels: fix (Fix something that isn't working as expected), upgrade (New feature or request) on Aug 1, 2023
@sabaimran sabaimran requested a review from debanjum August 1, 2023 03:58
@debanjum (Collaborator) left a comment

PR looks great with all the bug fixes and quality improvements based on user feedback! 🚀

Review threads (resolved):
  • tests/test_conversation_utils.py
  • tests/conftest.py (outdated)
  • src/khoj/processor/conversation/gpt4all/utils.py (outdated)
  • src/khoj/processor/conversation/gpt4all/chat_model.py (outdated, 2 threads)
- Use the same batch_size in the extract question actor as the chat actor
- Log the final location the chat model will be stored in, instead of
  its temporary filename while it is being downloaded
The model would previously sometimes start generating fake dialogue using
its internal prompt patterns, such as <s>[INST], in responses.

This is a jarring experience. Stop generating the response when <s> is hit.

Resolves #398
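One way to implement that stop condition, sketched with illustrative names (the actual integration point in Khoj's gpt4all chat_model.py may differ): check the accumulated response for the prompt markers and break out of the streaming loop.

```python
# Sketch: halt streaming once the model begins emitting its own prompt markers.
# gpt4all's generate(..., streaming=True) yields tokens, but the surrounding
# plumbing here is an assumption, not the PR's exact code.
STOP_SEQUENCES = ("<s>", "[INST]")

def stream_response(model, prompt: str):
    response = ""
    for token in model.generate(prompt, streaming=True):
        response += token
        if any(seq in response for seq in STOP_SEQUENCES):
            # The model has started a fake dialogue turn; stop generating
            break
        yield token
```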
Create regression test to ensure it does not throw the prompt size
exceeded context window error
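The regression test might look roughly like the sketch below; it reuses the hypothetical truncate_messages and count_tokens names from the earlier sketch, and the real test in tests/test_conversation_utils.py will differ.

```python
def test_long_chat_history_does_not_exceed_prompt_size():
    # Hypothetical regression test sketch, not the actual test from this PR
    system_msg = "You are Khoj, a helpful personal assistant."
    history = [f"filler message {i} " * 50 for i in range(100)]  # deliberately oversized history
    max_prompt_size = 1024

    messages = truncate_messages(
        system_msg, history, "What did I say first?", max_prompt_size, count_tokens
    )

    assert messages[0] == system_msg, "system message should never be dropped"
    assert sum(count_tokens(m) for m in messages) <= max_prompt_size
```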
@debanjum force-pushed the improve-llama-2-perf-and-quality-and-fixes branch from 8a94dba to 185a1fb on August 2, 2023 03:52
- Only make them update the config when their run conditions are satisfied
- Use a static schema version to simplify reasoning about run conditions (sketched below)
This should ease readability and indicate which version the
migration script will update the schema to once applied.
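An illustrative sketch of that static schema version pattern, assuming hypothetical names and a packaging-based version comparison (not the actual migration scripts in the repo):

```python
from packaging.version import Version

# Illustrative pattern only; the real migration scripts differ.
MIGRATION_VERSION = "0.10.1"  # static: the schema version this script produces

def migrate(raw_config: dict) -> dict:
    current = Version(raw_config.get("version", "0.0.0"))
    if current >= Version(MIGRATION_VERSION):
        # Run conditions not satisfied: config is already at or past this version
        return raw_config
    # ... apply the schema changes for this version here ...
    raw_config["version"] = MIGRATION_VERSION
    return raw_config
```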
@debanjum debanjum merged commit 16c6bfc into master Aug 2, 2023
4 checks passed
@debanjum debanjum deleted the improve-llama-2-perf-and-quality-and-fixes branch August 2, 2023 05:11