13 changes: 6 additions & 7 deletions docs/docs/capabilities/models/index.mdx
@@ -30,7 +30,9 @@ For details on each format, see the [Model Formats](/docs/capabilities/models/mo
:::

## Built-in Models
Cortex.cpp offers a range of built-in models that include popular open-source options. These models, hosted on HuggingFace as [Cortex Model Repositories](/docs/hub/cortex-hub), are pre-compiled for different engines, enabling each model to have multiple branches in various formats.
Cortex offers a range of [Built-in models](/models) that include popular open-source options.

These models are hosted on [Cortex's HuggingFace](https://huggingface.co/cortexso) and are pre-compiled for different engines, enabling each model to have multiple branches in various formats.

### Built-in Model Variants
Built-in models are made available across the following variants:
@@ -39,10 +41,7 @@ Built-in models are made available across the following variants:
- **By size**: `7b`, `13b`, and more.
- **By quantization**: `q4`, `q8`, and more.
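
For example, a variant is selected with the model tag at pull time. A minimal sketch, assuming the tag naming used in the [models list](/models) — exact tags vary per model:

```sh
# Pull a built-in model, selecting a variant by tag:
# size (3b), format (gguf), and quantization (q4-km) are encoded in the tag.
cortex pull llama3.2:3b-gguf-q4-km
```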

:::info
You can see our full list of Built-in Models [here](/models).
:::

## Next steps
- Cortex requires a `model.yaml` file to run a model. Find out more [here](/docs/capabilities/models/model-yaml).
- Cortex supports multiple model hubs hosting built-in models. See details [here](/docs/model-sources).
- See Cortex's list of [Built-in Models](/models).
- Cortex supports multiple model hubs hosting built-in models. See details [here](/docs/capabilities/models/sources).
- Cortex requires a `model.yaml` file to run a model. Find out more [here](/docs/capabilities/models/model-yaml).
6 changes: 3 additions & 3 deletions docs/docs/capabilities/models/model-yaml.mdx
@@ -179,7 +179,7 @@ Model load parameters include the options that control how Cortex.cpp runs the m
| `prompt_template` | Template for formatting the prompt, including system messages and instructions. | Yes |
| `engine` | The engine that runs the model; defaults to `llama-cpp` for local models in GGUF format. | Yes |

All parameters from the `model.yml` file are used for running the model via the [CLI chat command](/docs/cli/chat) or [CLI run command](/docs/cli/run). These parameters also act as defaults when using the [model start API](/api-reference#tag/models/post/v1/models/start) through cortex.cpp.
All parameters from the `model.yml` file are used for running the model via the [CLI run command](/docs/cli/run). These parameters also act as defaults when using the [model start API](/api-reference#tag/models/post/v1/models/start) through cortex.cpp.
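
For illustration, a minimal sketch of starting a model through that API. The local address and the request body shape are assumptions here; parameters omitted from the body fall back to the `model.yml` defaults:

```sh
# Start a model via the API server; omitted parameters fall back
# to the defaults defined in the model's model.yml file.
curl http://127.0.0.1:39281/v1/models/start \
  -H "Content-Type: application/json" \
  -d '{"model": "tinyllama:1b-gguf"}'
```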

## Runtime parameters

@@ -217,8 +217,8 @@ The API is accessible at the `/v1/chat/completions` URL and accepts all paramete

With the `llama-cpp` engine, cortex.cpp accepts all parameters from the [`model.yml` inference section](#inference-parameters) as well as all parameters from the chat completion API.
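
As a rough sketch of mixing the two, the request below combines standard chat fields with an inference parameter such as `temperature` (the local address and model name are assumptions):

```sh
# Standard chat fields (model, messages) alongside an inference
# parameter (temperature) that would otherwise come from model.yml.
curl http://127.0.0.1:39281/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama:1b-gguf",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7
  }'
```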

:::info
<!-- :::info
You can download all the supported model formats from the following:
- [Cortex Model Repos](/docs/hub/cortex-hub)
- [HuggingFace Model Repos](/docs/hub/hugging-face)
:::
::: -->
66 changes: 66 additions & 0 deletions docs/docs/capabilities/models/sources/hugging-face.mdx
@@ -0,0 +1,66 @@
---
title: Hugging Face
description: Cortex supports all `GGUF` and `ONNX` models available in Hugging Face repositories, providing access to a wide range of models.
---

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

Cortex.cpp supports all `GGUF` models from the [Hugging Face Hub](https://huggingface.co).

You can pull Hugging Face models via:
- repository handle, e.g. `author/model_id`
- direct URL, e.g. `https://huggingface.co/QuantFactory/OpenMath2-Llama3.1-8B-GGUF/blob/main/OpenMath2-Llama3.1-8B.Q4_0.gguf`


## GGUF
To view all available `GGUF` models on Hugging Face, select the `GGUF` tag in the Libraries section.

![HF GGUF](/img/docs/gguf.png)
<Tabs>
<TabItem value="MacOs/Linux" label="MacOs/Linux">
```sh
# Pull the Codestral-22B-v0.1-GGUF model from the bartowski organization
cortex pull bartowski/Codestral-22B-v0.1-GGUF

# Pull a specific GGUF file (OpenMath2-Llama3.1-8B, Q4_0) via its direct URL
cortex pull https://huggingface.co/QuantFactory/OpenMath2-Llama3.1-8B-GGUF/blob/main/OpenMath2-Llama3.1-8B.Q4_0.gguf
```
</TabItem>
<TabItem value="Windows" label="Windows">
```sh
# Pull the Codestral-22B-v0.1-GGUF model from the bartowski organization
cortex.exe pull bartowski/Codestral-22B-v0.1-GGUF

# Pull the gemma-7b model from the google organization
cortex.exe pull google/gemma-7b
```
</TabItem>
</Tabs>

<!-- ## ONNX
![HF ONNX](/img/docs/onnx.png)
To view all available `ONNX` models on HuggingFace, select the `ONNX` tag in the Libraries section.
<Tabs>
<TabItem value="MacOs/Linux" label="MacOs/Linux">
```sh
## Pull the XLM-Roberta-Large-Vit-B-16Plus model from the immich-app organization
cortex pull immich-app/XLM-Roberta-Large-Vit-B-16Plus

# Pull the mt0-base model from the bigscience organization
cortex pull bigscience/mt0-base
```
</TabItem>
<TabItem value="Windows" label="Windows">
```sh
## Pull the XLM-Roberta-Large-Vit-B-16Plus model from the immich-app organization
cortex.exe pull immich-app/XLM-Roberta-Large-Vit-B-16Plus

# Pull the mt0-base model from the bigscience organization
cortex.exe pull bigscience/mt0-base
```
</TabItem>
</Tabs>

## TensorRT-LLM
We are still working to support all available `TensorRT-LLM` models on HuggingFace. For now, Cortex.cpp only supports built-in `TensorRT-LLM` models, which can be downloaded from the [Cortex Model Repos](/docs/capabilities/models/sources/cortex-hub). -->
@@ -1,14 +1,8 @@
---
slug: /model-sources
title: Model Sources
description: Model Sources
---

import DocCardList from "@theme/DocCardList";

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

# Pulling Models in Cortex

Cortex provides a streamlined way to pull (download) machine learning models from Hugging Face and other third-party sources, as well as import models from local storage. This functionality allows users to easily access a variety of pre-trained models to enhance their applications.
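
For example, the same `pull` command covers each of these sources; a brief sketch using model names that appear elsewhere in these docs:

```sh
# Pull a built-in model by name and tag
cortex pull tinyllama:1b-gguf

# Pull from a Hugging Face repository handle
cortex pull bartowski/Codestral-22B-v0.1-GGUF

# Pull a specific GGUF file via a direct URL
cortex pull https://huggingface.co/QuantFactory/OpenMath2-Llama3.1-8B-GGUF/blob/main/OpenMath2-Llama3.1-8B.Q4_0.gguf
```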
@@ -348,6 +342,4 @@ Response:
}
```

With Cortex, pulling and managing models is simplified, allowing you to focus more on building your applications!

<DocCardList />
With Cortex, pulling and managing models is simplified, allowing you to focus more on building your applications!
2 changes: 1 addition & 1 deletion docs/docs/chat-completions.mdx
@@ -146,5 +146,5 @@ Cortex also acts as an aggregator for remote inference requests from a single en
:::note
Learn more about Chat Completions capabilities:
- [Chat Completions API Reference](/api-reference#tag/inference/post/chat/completions)
- [Chat Completions CLI command](/docs/cli/chat)
- [`cortex run` CLI command](/docs/cli/run)
:::
71 changes: 0 additions & 71 deletions docs/docs/cli/chat.mdx

This file was deleted.

45 changes: 8 additions & 37 deletions docs/docs/cli/cortex.mdx
@@ -7,12 +7,8 @@ slug: /cli
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

# Cortex
This command lists all the available commands within Cortex.cpp.
# `cortex`
This command lists all the available commands in the Cortex CLI.

## Usage
:::info
@@ -21,48 +17,23 @@ You can use the `--verbose` flag to display more detailed output of the internal
<Tabs>
<TabItem value="MacOs/Linux" label="MacOs/Linux">
```sh
# Stable
cortex

# Beta
cortex-beta

# Nightly
cortex-nightly
```
</TabItem>
<TabItem value="Windows" label="Windows">
```sh
# Stable
cortex.exe

# Beta
cortex-beta.exe

# Nightly
cortex-nightly.exe
```
</TabItem>
</Tabs>
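
For example, applying the flag to the `ps` subcommand documented later in this reference (a sketch; any subcommand works the same way):

```sh
# Display detailed internal logs while listing running models
cortex --verbose ps
```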


## Command Chaining
The Cortex CLI supports command chaining, which executes multiple commands in sequence with a simplified syntax; a sketch follows the examples below.

For example:

- [cortex run](/docs/cli/run)
- [cortex chat](/docs/cli/chat)
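
A minimal sketch of what this chaining amounts to, using a model name that appears elsewhere in these docs: a single `run` pulls the model if it is missing, starts the server and the model, then opens an interactive chat.

```sh
# One command chains pull -> start -> chat
cortex run tinyllama:1b-gguf
```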

## Sub Commands

- [cortex start](/docs/cli/start): Start the Cortex API server (starts automatically with other commands)
- [cortex run](/docs/cli/run): Shortcut for `cortex models start`. Pull a remote model or start a local model, and start chatting.
- [cortex pull](/docs/cli/pull): Download a model.
- [cortex models](/docs/cli/models): Manage and configure models.
- [cortex chat](/docs/cli/chat): Send a chat request to a model.
- [cortex ps](/docs/cli/ps): Display active models and their operational status.
- [cortex embeddings](/docs/cli/embeddings): Create an embedding vector representing the input text.
- [cortex engines](/docs/cli/engines): Manage Cortex.cpp engines.
- [cortex pull|download](/docs/cli/pull): Download a model.
- [cortex run](/docs/cli/run): Shortcut to pull, start and chat with a model.
- [cortex update](/docs/cli/update): Update the Cortex.cpp version.
- [cortex start](/docs/cli/start): Start the Cortex.cpp API server.
- [cortex stop](/docs/cli/stop): Stop the Cortex.cpp API server.
- [cortex engines](/docs/cli/engines): Manage Cortex engines.
- [cortex update](/docs/cli/update): Update the Cortex version.
- [cortex stop](/docs/cli/stop): Stop the Cortex API server.
45 changes: 14 additions & 31 deletions docs/docs/cli/ps.mdx
@@ -7,59 +7,42 @@ slug: "ps"
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

# `cortex ps`

This command shows the running model and its status.


This command shows the running models and their status (engine, RAM, VRAM, and uptime).

## Usage
:::info
You can use the `--verbose` flag to display more detailed output of the internal processes. To apply this flag, use the following format: `cortex --verbose [subcommand]`.
:::
<Tabs>
<TabItem value="MacOs/Linux" label="MacOs/Linux">
```sh
# Stable
cortex ps [options]

# Beta
cortex-beta ps [options]

# Nightly
cortex-nightly ps [options]
```
</TabItem>
<TabItem value="Windows" label="Windows">
```sh
# Stable
cortex.exe ps [options]

# Beta
cortex-beta.exe ps [options]

# Nightly
cortex-nightly.exe ps [options]
```
</TabItem>
</Tabs>


For example, it returns the following table:

```bash
+----------------+-----------+----------+-----------+-----------+
| Model | Engine | RAM | VRAM | Up time |
+----------------+-----------+----------+-----------+-----------+
| tinyllama:gguf | llama-cpp | 35.16 MB | 601.02 MB | 5 seconds |
+----------------+-----------+----------+-----------+-----------+
> cortex ps
+------------------------+-----------+-----------+-----------+-------------------------------+
| Model | Engine | RAM | VRAM | Uptime |
+------------------------+-----------+-----------+-----------+-------------------------------+
| llama3.2:3b-gguf-q4-km | llama-cpp | 308.23 MB | 1.87 GB | 7 seconds |
+------------------------+-----------+-----------+-----------+-------------------------------+
| tinyllama:1b-gguf | llama-cpp | 35.16 MB | 636.18 MB | 1 hour, 5 minutes, 45 seconds |
+------------------------+-----------+-----------+-----------+-------------------------------+
```
## Options

| Option | Description | Required | Default value | Example |
|-------------------|-------------------------------------------------------|----------|---------------|-------------|
| `-h`, `--help` | Display help information for the command. | No | - | `-h` |

:::info
You can use the `--verbose` flag to display more detailed output of the internal processes. To apply this flag, use the following format: `cortex --verbose [subcommand]`.
:::