
Migrate to Llama.cpp for Offline Chat #680

Merged
merged 10 commits on Apr 2, 2024
6 changes: 3 additions & 3 deletions documentation/docs/features/chat.md
@@ -14,16 +14,16 @@ You can configure Khoj to chat with you about anything. When relevant, it'll use

### Setup (Self-Hosting)
#### Offline Chat
Offline chat stays completely private and works without internet using open-source models.
Offline chat stays completely private and can work without internet using open-source models.

> **System Requirements**:
> - Minimum 8 GB RAM. Recommend **16 GB VRAM**
> - Minimum **5 GB of Disk** available
> - A CPU supporting [AVX or AVX2 instructions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) is required
> - A Mac M1+ or [Vulkan-supported GPU](https://vulkan.gpuinfo.org/) should significantly speed up chat response times
> - An Nvidia GPU, AMD GPU or a Mac M1+ machine would significantly speed up chat response times

1. Open your [Khoj offline settings](http://localhost:42110/server/admin/database/offlinechatprocessorconversationconfig/) and click *Enable* on the Offline Chat configuration.
2. Open your [Chat model options](http://localhost:42110/server/admin/database/chatmodeloptions/) and add a new option for the offline chat model you want to use. Make sure to use `Offline` as its type. We currently only support offline models that use the [Llama chat prompt](https://replicate.com/blog/how-to-prompt-llama#wrap-user-input-with-inst-inst-tags) format. We recommend using `mistral-7b-instruct-v0.1.Q4_0.gguf`.
2. Open your [Chat model options settings](http://localhost:42110/server/admin/database/chatmodeloptions/) and add any [GGUF chat model](https://huggingface.co/models?library=gguf) to use for offline chat. Make sure to use `Offline` as its type. For a balanced chat model that runs well on standard consumer hardware, we recommend [Hermes-2-Pro-Mistral-7B by NousResearch](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF).
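
To sanity check that the recommended model downloads and loads on your machine before wiring it into Khoj, you can exercise llama-cpp-python directly. This is an optional check rather than part of the official setup; it assumes `llama-cpp-python >= 0.2.55` (for `Llama.from_pretrained`) and `huggingface-hub` are installed, and the quantization filename glob below is only an example.

```python
# Optional sanity check: download and load a GGUF chat model with llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="NousResearch/Hermes-2-Pro-Mistral-7B-GGUF",
    filename="*Q4_K_M.gguf",  # example glob; pick the quantization you actually want
    n_ctx=4096,               # context window to allocate
    n_gpu_layers=-1,          # offload all layers to GPU if one is available
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}]
)
print(response["choices"][0]["message"]["content"])
```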


:::tip[Note]
Expand Down
28 changes: 24 additions & 4 deletions documentation/docs/get-started/setup.mdx
@@ -101,24 +101,44 @@ sudo -u postgres createdb khoj --password

##### Local Server Setup
- *Make sure [python](https://realpython.com/installing-python/) and [pip](https://pip.pypa.io/en/stable/installation/) are installed on your machine*
- Check [llama-cpp-python setup](https://python.langchain.com/docs/integrations/llms/llamacpp#installation) if you hit any llama-cpp issues with the installation

Run the following command in your terminal to install the Khoj backend.

```mdx-code-block
<Tabs groupId="operating-systems">
<TabItem value="macos" label="MacOS">
```shell
# ARM/M1+ Machines
CMAKE_ARGS="-DLLAMA_METAL=on" python -m pip install khoj-assistant

# Intel Machines
python -m pip install khoj-assistant
```
</TabItem>
<TabItem value="win" label="Windows">
```shell
py -m pip install khoj-assistant
# 1. (Optional) To use NVIDIA (CUDA) GPU
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
# 1. (Optional) To use AMD (ROCm) GPU
$env:CMAKE_ARGS = "-DLLAMA_HIPBLAS=on"
# 1. (Optional) To use Vulkan GPU
$env:CMAKE_ARGS = "-DLLAMA_VULKAN=on"

# 2. Install Khoj
py -m pip install khoj-assistant
```
</TabItem>
<TabItem value="unix" label="Linux">
```shell
python -m pip install khoj-assistant
# CPU
python -m pip install khoj-assistant
# NVIDIA (CUDA) GPU
CMAKE_ARGS="DLLAMA_CUBLAS=on" FORCE_CMAKE=1 python -m pip install khoj-assistant
# AMD (ROCm) GPU
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" FORCE_CMAKE=1 python -m pip install khoj-assistant
# Vulkan GPU
CMAKE_ARGS="-DLLAMA_VULKAN=on" FORCE_CMAKE=1 python -m pip install khoj-assistant
```
</TabItem>
</Tabs>
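
If installation completes but offline chat misbehaves, a quick way to confirm the llama-cpp-python backend is importable in the same environment is shown below. This is an optional troubleshooting aid, not a required step.

```python
# Optional: confirm llama-cpp-python is installed in the same environment as Khoj.
import llama_cpp

print("llama-cpp-python version:", llama_cpp.__version__)
```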
@@ -179,13 +199,13 @@ If you're using a custom domain, you must use an SSL certificate. You can use [L
1. Go to http://localhost:42110/server/admin and login with your admin credentials.
1. Go to [OpenAI settings](http://localhost:42110/server/admin/database/openaiprocessorconversationconfig/) in the server admin settings to add an OpenAI processor conversation config. This is where you set your API key. Alternatively, you can go to the [offline chat settings](http://localhost:42110/server/admin/database/offlinechatprocessorconversationconfig/) and simply create a new setting with `Enabled` set to `True`.
2. Go to the ChatModelOptions if you want to add additional models for chat.
- Set the `chat-model` field to a supported chat model[^1] of your choice. For example, you can specify `gpt-4-turbo-preview` if you're using OpenAI or `mistral-7b-instruct-v0.1.Q4_0.gguf` if you're using offline chat.
- Set the `chat-model` field to a supported chat model[^1] of your choice. For example, you can specify `gpt-4-turbo-preview` if you're using OpenAI or `NousResearch/Hermes-2-Pro-Mistral-7B-GGUF` if you're using offline chat.
- Make sure to set the `model-type` field to `OpenAI` or `Offline` respectively.
   - The `tokenizer` and `max-prompt-size` fields are optional. Set them only when using a non-standard model (i.e., not a Mistral, GPT or Llama 2 based model); a programmatic sketch of adding a chat model entry follows this list.
1. Select files and folders to index [using the desktop client](/get-started/setup#2-download-the-desktop-client). When you click 'Save', the files will be sent to your server for indexing.
- Select Notion workspaces and Github repositories to index using the web interface.
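
For reference, the same offline chat model entry can be created programmatically with the Django ORM instead of the admin panel. Treat this as an illustration rather than a supported interface: the import path and the assumption that Khoj's Django settings are already initialized in your Python process are not documented guarantees, though the field names match the `ChatModelOptions` model changed later in this PR.

```python
# Illustrative only: create an offline chat model entry via the Django ORM.
# Assumes Khoj's Django app and settings are already configured in this process.
from khoj.database.models import ChatModelOptions

ChatModelOptions.objects.create(
    chat_model="NousResearch/Hermes-2-Pro-Mistral-7B-GGUF",  # any GGUF chat model
    model_type=ChatModelOptions.ModelType.OFFLINE,
    max_prompt_size=None,  # optional; only set for non-standard models
    tokenizer=None,        # optional; only set for non-standard models
)
```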

[^1]: Khoj, by default, can use [OpenAI GPT3.5+ chat models](https://platform.openai.com/docs/models/overview) or [GPT4All chat models that follow Llama2 Prompt Template](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models2.json). See [this section](/miscellaneous/advanced#use-openai-compatible-llm-api-server-self-hosting) to use non-standard chat models.
[^1]: Khoj, by default, can use [OpenAI GPT3.5+ chat models](https://platform.openai.com/docs/models/overview) or [GGUF chat models](https://huggingface.co/models?library=gguf). See [this section](/miscellaneous/advanced#use-openai-compatible-llm-api-server-self-hosting) to use non-standard chat models.

:::tip[Note]
Using Safari on Mac? You might not be able to login to the admin panel. Try using Chrome or Firefox instead.
2 changes: 1 addition & 1 deletion documentation/docs/miscellaneous/credits.md
@@ -10,4 +10,4 @@ Many Open Source projects are used to power Khoj. Here's a few of them:
- Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
- [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface
- [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface
- [GPT4All](https://github.com/nomic-ai/gpt4all) to chat with local LLM
- [Llama.cpp](https://github.com/ggerganov/llama.cpp) to chat with local LLM
3 changes: 1 addition & 2 deletions pyproject.toml
@@ -62,8 +62,7 @@ dependencies = [
"pymupdf >= 1.23.5",
"django == 4.2.10",
"authlib == 1.2.1",
"gpt4all == 2.1.0; platform_system == 'Linux' and platform_machine == 'x86_64'",
"gpt4all == 2.1.0; platform_system == 'Windows' or platform_system == 'Darwin'",
"llama-cpp-python == 0.2.56",
"itsdangerous == 2.1.2",
"httpx == 0.25.0",
"pgvector == 0.2.4",
6 changes: 3 additions & 3 deletions src/khoj/database/adapters/__init__.py
@@ -43,7 +43,7 @@
from khoj.search_filter.file_filter import FileFilter
from khoj.search_filter.word_filter import WordFilter
from khoj.utils import state
from khoj.utils.config import GPT4AllProcessorModel
from khoj.utils.config import OfflineChatProcessorModel
from khoj.utils.helpers import generate_random_name, is_none_or_empty


@@ -705,8 +705,8 @@ def get_valid_conversation_config(user: KhojUser, conversation: Conversation):
    conversation_config = ConversationAdapters.get_default_conversation_config()

    if offline_chat_config and offline_chat_config.enabled and conversation_config.model_type == "offline":
        if state.gpt4all_processor_config is None or state.gpt4all_processor_config.loaded_model is None:
            state.gpt4all_processor_config = GPT4AllProcessorModel(conversation_config.chat_model)
        if state.offline_chat_processor_config is None or state.offline_chat_processor_config.loaded_model is None:
            state.offline_chat_processor_config = OfflineChatProcessorModel(conversation_config.chat_model)

    return conversation_config
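
The hunk above only shows the call site switching from `GPT4AllProcessorModel` to `OfflineChatProcessorModel`. For readers unfamiliar with the pattern, here is a minimal sketch of what such a lazy-loading wrapper plausibly does with llama-cpp-python; it is an illustration under stated assumptions, not Khoj's actual implementation.

```python
# Illustrative sketch, not Khoj's actual implementation: lazily download and
# load a GGUF chat model with llama-cpp-python the first time offline chat runs.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama


class OfflineChatProcessorModel:
    def __init__(self, chat_model: str = "NousResearch/Hermes-2-Pro-Mistral-7B-GGUF"):
        self.chat_model = chat_model
        # The filename is an example; real code would discover the available
        # quantizations in the repo rather than hard-coding one.
        model_path = hf_hub_download(
            repo_id=chat_model, filename="Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf"
        )
        self.loaded_model = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)
```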

2 changes: 1 addition & 1 deletion src/khoj/database/models/__init__.py
@@ -80,7 +80,7 @@ class ModelType(models.TextChoices):

    max_prompt_size = models.IntegerField(default=None, null=True, blank=True)
    tokenizer = models.CharField(max_length=200, default=None, null=True, blank=True)
    chat_model = models.CharField(max_length=200, default="mistral-7b-instruct-v0.1.Q4_0.gguf")
    chat_model = models.CharField(max_length=200, default="NousResearch/Hermes-2-Pro-Mistral-7B-GGUF")
    model_type = models.CharField(max_length=200, choices=ModelType.choices, default=ModelType.OFFLINE)


71 changes: 71 additions & 0 deletions src/khoj/migrations/migrate_offline_chat_default_model_2.py
@@ -0,0 +1,71 @@
"""
Current format of khoj.yml
---
app:
  ...
content-type:
  ...
processor:
  conversation:
    offline-chat:
      enable-offline-chat: false
      chat-model: mistral-7b-instruct-v0.1.Q4_0.gguf
  ...
search-type:
  ...

New format of khoj.yml
---
app:
  ...
content-type:
  ...
processor:
  conversation:
    offline-chat:
      enable-offline-chat: false
      chat-model: NousResearch/Hermes-2-Pro-Mistral-7B-GGUF
  ...
search-type:
  ...
"""
import logging

from packaging import version

from khoj.utils.yaml import load_config_from_file, save_config_to_file

logger = logging.getLogger(__name__)


def migrate_offline_chat_default_model(args):
    schema_version = "1.7.0"
    raw_config = load_config_from_file(args.config_file)
    previous_version = raw_config.get("version")

    if "processor" not in raw_config:
        return args
    if raw_config["processor"] is None:
        return args
    if "conversation" not in raw_config["processor"]:
        return args
    if "offline-chat" not in raw_config["processor"]["conversation"]:
        return args
    if "chat-model" not in raw_config["processor"]["conversation"]["offline-chat"]:
        return args

    if previous_version is None or version.parse(previous_version) < version.parse(schema_version):
        logger.info(
            f"Upgrading config schema to {schema_version} from {previous_version} to change default (offline) chat model to mistral GGUF"
        )
        raw_config["version"] = schema_version

        # Update offline chat model to use Nous Research's Hermes-2-Pro GGUF in path format suitable for llama-cpp
        offline_chat_model = raw_config["processor"]["conversation"]["offline-chat"]["chat-model"]
        if offline_chat_model == "mistral-7b-instruct-v0.1.Q4_0.gguf":
            raw_config["processor"]["conversation"]["offline-chat"][
                "chat-model"
            ] = "NousResearch/Hermes-2-Pro-Mistral-7B-GGUF"

        save_config_to_file(raw_config, args.config_file)

    return args
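
For context, config migrations like this receive the parsed CLI args that carry the path to `khoj.yml`. A hypothetical standalone invocation, assuming the default config location and a simple namespace standing in for Khoj's real args object, could look like the sketch below; Khoj normally runs its migrations automatically on startup.

```python
# Hypothetical standalone run of the migration; the path and args object are illustrative.
from pathlib import Path
from types import SimpleNamespace

from khoj.migrations.migrate_offline_chat_default_model_2 import (
    migrate_offline_chat_default_model,
)

args = SimpleNamespace(config_file=Path("~/.khoj/khoj.yml").expanduser())
migrate_offline_chat_default_model(args)
```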