
Migrate to Llama.cpp for Offline Chat #680

Merged
merged 10 commits into master from migrate-to-llama-cpp-for-offline-chat on Apr 2, 2024

Conversation

@debanjum (Collaborator) commented on Mar 20, 2024

Benefits

  • Support all GGUF format chat models
  • Support more GPUs like AMD, Nvidia, Mac and Vulkan (previously just Vulkan, Mac)
  • Support more capabilities like a larger context window, schema enforcement, speculative decoding, etc.

Changes

Major

  • 978ebfe Use llama.cpp (via llama-cpp-python) for offline chat models
    • Support a larger context window
    • Automatically apply the appropriate chat template, so offline chat models that don't use the llama2 prompt format are now supported (a minimal usage sketch follows this list)
    • Shiny new default offline chat model
  • f8ba541 Enable extract queries actor to improve notes search with offline chat
  • aafb878 Update documentation to use llama.cpp for offline chat in Khoj
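For reference, a minimal sketch of the llama-cpp-python flow this enables; the repo id, filename glob, and parameter values below are illustrative, not Khoj's actual defaults:

```python
# Minimal sketch (illustrative values, not Khoj's defaults).
from llama_cpp import Llama

# Download a GGUF model from HuggingFace, or reuse it from the local cache.
llm = Llama.from_pretrained(
    repo_id="NousResearch/Hermes-2-Pro-Mistral-7B-GGUF",  # example repo
    filename="*Q4_K_M.gguf",                              # glob over quantization variants
    n_ctx=4096,                                           # larger context window than before
)

# create_chat_completion() formats messages with the chat template stored in the
# GGUF metadata, so models that don't use the llama2 prompt format work too.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful personal assistant."},
        {"role": "user", "content": "Summarize my notes on llama.cpp."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```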

Minor

@debanjum force-pushed the migrate-to-llama-cpp-for-offline-chat branch 2 times, most recently from f8ba541 to 5f8e494, on March 20, 2024 23:05
@debanjum requested a review from sabaimran on March 20, 2024 23:27
@sabaimran (Collaborator) left a comment:
Exciting getting llama.cpp running & having extract_questions working with the offline models.

Review threads:
  • src/khoj/database/models/__init__.py (outdated, resolved)
  • pyproject.toml (resolved)
  • documentation/docs/get-started/setup.mdx (resolved)
  • src/khoj/processor/conversation/offline/utils.py (outdated, resolved)
  • documentation/docs/features/chat.md (outdated, resolved)
@debanjum force-pushed the migrate-to-llama-cpp-for-offline-chat branch from 5f8e494 to b68d88a on March 24, 2024 11:18
- Benefits of moving to llama-cpp-python from gpt4all:
  - Support for all GGUF format chat models
  - Support for AMD, Nvidia, Mac, Vulkan GPU machines (instead of just Vulkan, Mac)
  - Support for models with more capabilities like tools, schema
    enforcement, speculative decoding, image generation, etc.
- Upgrade default chat model, prompt size, tokenizer for the newly
  supported chat models

- Load the offline chat model, when present on disk, without requiring internet
  - Load the model onto GPU if not disabled and the device has a GPU
  - Fall back to loading the model onto CPU if loading onto GPU fails
  - Create a helper function to check for and load the model from disk when the
    model glob is present on disk (see the sketch after this list).

    `Llama.from_pretrained` needs internet to get repo info from
    HuggingFace. This isn't required if the model is already downloaded.

    Didn't find any existing HF or llama.cpp method that looks for the model
    glob on disk without internet.
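A minimal sketch of the helper described above, assuming the standard HuggingFace cache layout under ~/.cache/huggingface/hub; the function name and defaults are illustrative, not the actual Khoj implementation:

```python
import glob
import os

from llama_cpp import Llama


def load_offline_chat_model(repo_id: str, filename_glob: str, use_gpu: bool = True) -> Llama:
    """Load a GGUF chat model from the local HF cache if present, else download it."""
    # HuggingFace hub caches repos under ~/.cache/huggingface/hub/models--<org>--<name>/
    cache_dir = os.path.expanduser("~/.cache/huggingface/hub")
    repo_dir = os.path.join(cache_dir, "models--" + repo_id.replace("/", "--"))
    matches = glob.glob(os.path.join(repo_dir, "**", filename_glob), recursive=True)

    kwargs = {"n_gpu_layers": -1 if use_gpu else 0, "n_ctx": 4096}
    try:
        if matches:
            # Model already on disk: load it directly, no internet required
            return Llama(model_path=matches[0], **kwargs)
        # Model not on disk: let llama-cpp-python fetch it from HuggingFace
        return Llama.from_pretrained(repo_id=repo_id, filename=filename_glob, **kwargs)
    except Exception:
        if not use_gpu:
            raise
        # Loading onto the GPU failed (e.g. out of VRAM): retry on CPU
        return load_offline_chat_model(repo_id, filename_glob, use_gpu=False)
```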
- How to pip install khoj to run offline chat on GPU
  After the migration to llama-cpp-python more GPU types are supported, but
  they require a build step, so document how
- New default offline chat model
- Where to get supported chat models from on HuggingFace
Previously we were skipping the extract questions step for offline
chat, as the default offline chat model wasn't good enough at producing
proper JSON to justify the time it took to extract questions.

The new default offline chat model gives JSON much more reliably, and
with date filters, so the extract questions step becomes worth its
impact on latency.
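A minimal sketch of what such an extract-questions actor can look like with the new model; the prompt, JSON shape, and function name are illustrative, not Khoj's actual implementation:

```python
import json

from llama_cpp import Llama


def extract_questions(llm: Llama, user_message: str) -> list[str]:
    """Ask the offline chat model for a JSON list of search queries for a user message."""
    response = llm.create_chat_completion(
        messages=[
            {
                "role": "system",
                "content": 'Convert the user message into a JSON object of the form '
                           '{"queries": ["..."]} containing search queries, adding date '
                           'filters like dt>="2024-03-01" where relevant.',
            },
            {"role": "user", "content": user_message},
        ],
        # Constrain output to valid JSON via llama.cpp's grammar-based enforcement
        response_format={"type": "json_object"},
        temperature=0,
        max_tokens=200,
    )
    content = response["choices"][0]["message"]["content"]
    try:
        return json.loads(content).get("queries", [])
    except json.JSONDecodeError:
        # Fall back to searching with the raw user message on malformed JSON
        return [user_message]
```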
@debanjum force-pushed the migrate-to-llama-cpp-for-offline-chat branch from b68d88a to 4912c0e on March 27, 2024 05:03
@debanjum merged commit 3c3e48b into master on Apr 2, 2024
9 checks passed
@debanjum deleted the migrate-to-llama-cpp-for-offline-chat branch on April 2, 2024 16:02