13 changes: 6 additions & 7 deletions docs/docs/capabilities/models/index.mdx
@@ -30,7 +30,9 @@ For details on each format, see the [Model Formats](/docs/capabilities/models/mo
:::

## Built-in Models
Cortex.cpp offers a range of built-in models that include popular open-source options. These models, hosted on HuggingFace as [Cortex Model Repositories](/docs/hub/cortex-hub), are pre-compiled for different engines, enabling each model to have multiple branches in various formats.
Cortex offers a range of [Built-in models](/models) that include popular open-source options.

These models are hosted on [Cortex's HuggingFace](https://huggingface.co/cortexso) and are pre-compiled for different engines, enabling each model to have multiple branches in various formats.

### Built-in Model Variants
Built-in models are made available across the following variants:
@@ -39,10 +41,7 @@ Built-in models are made available across the following variants:
- **By size**: `7b`, `13b`, and more.
- **By quantization**: `q4`, `q8`, and more.
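
For example, a variant is selected with the model tag at pull time. A minimal sketch, assuming the tag naming used in the [models list](/models) — exact tags vary per model:

```sh
# Pull a built-in model, selecting a variant by tag:
# size (3b), format (gguf), and quantization (q4-km) are encoded in the tag.
cortex pull llama3.2:3b-gguf-q4-km
```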

:::info
You can see our full list of Built-in Models [here](/models).
:::

## Next steps
- Cortex requires a `model.yaml` file to run a model. Find out more [here](/docs/capabilities/models/model-yaml).
- Cortex supports multiple model hubs hosting built-in models. See details [here](/docs/model-sources).
- See Cortex's list of [Built-in Models](/models).
- Cortex supports multiple model hubs hosting built-in models. See details [here](/docs/capabilities/models/sources).
- Cortex requires a `model.yaml` file to run a model. Find out more [here](/docs/capabilities/models/model-yaml).
6 changes: 3 additions & 3 deletions docs/docs/capabilities/models/model-yaml.mdx
@@ -179,7 +179,7 @@ Model load parameters include the options that control how Cortex.cpp runs the m
| `prompt_template` | Template for formatting the prompt, including system messages and instructions. | Yes |
| `engine` | The engine that runs the model; defaults to `llama-cpp` for local models in GGUF format. | Yes |

All parameters from the `model.yml` file are used for running the model via the [CLI chat command](/docs/cli/chat) or [CLI run command](/docs/cli/run). These parameters also act as defaults when using the [model start API](/api-reference#tag/models/post/v1/models/start) through cortex.cpp.
All parameters from the `model.yml` file are used for running the model via the [CLI run command](/docs/cli/run). These parameters also act as defaults when using the [model start API](/api-reference#tag/models/post/v1/models/start) through cortex.cpp.
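
For illustration, a minimal sketch of starting a model through that API. The local address and the request body shape are assumptions here; parameters omitted from the body fall back to the `model.yml` defaults:

```sh
# Start a model via the API server; omitted parameters fall back
# to the defaults defined in the model's model.yml file.
curl http://127.0.0.1:39281/v1/models/start \
  -H "Content-Type: application/json" \
  -d '{"model": "tinyllama:1b-gguf"}'
```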

## Runtime parameters

@@ -217,8 +217,8 @@ The API is accessible at the `/v1/chat/completions` URL and accepts all paramete

With the `llama-cpp` engine, cortex.cpp accepts all parameters from the [`model.yml` inference section](#inference-parameters) as well as all parameters from the chat completion API.
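
As a rough sketch of mixing the two, the request below combines standard chat fields with an inference parameter such as `temperature` (the local address and model name are assumptions):

```sh
# Standard chat fields (model, messages) alongside an inference
# parameter (temperature) that would otherwise come from model.yml.
curl http://127.0.0.1:39281/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tinyllama:1b-gguf",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7
  }'
```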

:::info
<!-- :::info
You can download all the supported model formats from the following:
- [Cortex Model Repos](/docs/hub/cortex-hub)
- [HuggingFace Model Repos](/docs/hub/hugging-face)
:::
::: -->
66 changes: 66 additions & 0 deletions docs/docs/capabilities/models/sources/hugging-face.mdx
@@ -0,0 +1,66 @@
---
title: Hugging Face
description: Cortex supports all `GGUF` and `ONNX` models available in Hugging Face repositories, providing access to a wide range of models.
---

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

Cortex.cpp supports all `GGUF` models from the [Hugging Face Hub](https://huggingface.co).

You can pull Hugging Face models via:
- repository handle, e.g. `author/model_id`
- direct URL, e.g. `https://huggingface.co/QuantFactory/OpenMath2-Llama3.1-8B-GGUF/blob/main/OpenMath2-Llama3.1-8B.Q4_0.gguf`


## GGUF
To view all available `GGUF` models on Hugging Face, select the `GGUF` tag in the Libraries section.

![HF GGUF](/img/docs/gguf.png)
<Tabs>
<TabItem value="MacOs/Linux" label="MacOs/Linux">
```sh
# Pull the Codestral-22B-v0.1-GGUF model from the bartowski organization
cortex pull bartowski/Codestral-22B-v0.1-GGUF

# Pull a specific GGUF file (OpenMath2-Llama3.1-8B, Q4_0) via its direct URL
cortex pull https://huggingface.co/QuantFactory/OpenMath2-Llama3.1-8B-GGUF/blob/main/OpenMath2-Llama3.1-8B.Q4_0.gguf
```
</TabItem>
<TabItem value="Windows" label="Windows">
```sh
# Pull the Codestral-22B-v0.1-GGUF model from the bartowski organization
cortex.exe pull bartowski/Codestral-22B-v0.1-GGUF

# Pull the gemma-7b model from the google organization
cortex.exe pull google/gemma-7b
```
</TabItem>
</Tabs>

<!-- ## ONNX
![HF ONNX](/img/docs/onnx.png)
To view all available `ONNX` models on HuggingFace, select the `ONNX` tag in the Libraries section.
<Tabs>
<TabItem value="MacOs/Linux" label="MacOs/Linux">
```sh
## Pull the XLM-Roberta-Large-Vit-B-16Plus model from the immich-app organization
cortex pull immich-app/XLM-Roberta-Large-Vit-B-16Plus

# Pull the mt0-base model from the bigscience organization
cortex pull bigscience/mt0-base
```
</TabItem>
<TabItem value="Windows" label="Windows">
```sh
## Pull the XLM-Roberta-Large-Vit-B-16Plus model from the immich-app organization
cortex.exe pull immich-app/XLM-Roberta-Large-Vit-B-16Plus

# Pull the mt0-base model from the bigscience organization
cortex.exe pull bigscience/mt0-base
```
</TabItem>
</Tabs>

## TensorRT-LLM
We are still working to support all available `TensorRT-LLM` models on HuggingFace. For now, Cortex.cpp only supports built-in `TensorRT-LLM` models, which can be downloaded from the [Cortex Model Repos](/docs/capabilities/models/sources/cortex-hub). -->
@@ -1,14 +1,8 @@
---
slug: /model-sources
title: Model Sources
description: Model Sources
---

import DocCardList from "@theme/DocCardList";

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

# Pulling Models in Cortex

Cortex provides a streamlined way to pull (download) machine learning models from Hugging Face and other third-party sources, as well as import models from local storage. This functionality allows users to easily access a variety of pre-trained models to enhance their applications.
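
For example, the same `pull` command covers each of these sources; a brief sketch using model names that appear elsewhere in these docs:

```sh
# Pull a built-in model by name and tag
cortex pull tinyllama:1b-gguf

# Pull from a Hugging Face repository handle
cortex pull bartowski/Codestral-22B-v0.1-GGUF

# Pull a specific GGUF file via a direct URL
cortex pull https://huggingface.co/QuantFactory/OpenMath2-Llama3.1-8B-GGUF/blob/main/OpenMath2-Llama3.1-8B.Q4_0.gguf
```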
@@ -348,6 +342,4 @@ Response:
}
```

With Cortex, pulling and managing models is simplified, allowing you to focus more on building your applications!

<DocCardList />
With Cortex, pulling and managing models is simplified, allowing you to focus more on building your applications!
2 changes: 1 addition & 1 deletion docs/docs/chat-completions.mdx
@@ -146,5 +146,5 @@ Cortex also acts as an aggregator for remote inference requests from a single en
:::note
Learn more about Chat Completions capabilities:
- [Chat Completions API Reference](/api-reference#tag/inference/post/chat/completions)
- [Chat Completions CLI command](/docs/cli/chat)
- [`cortex run` CLI command](/docs/cli/run)
:::
71 changes: 0 additions & 71 deletions docs/docs/cli/chat.mdx

This file was deleted.

45 changes: 8 additions & 37 deletions docs/docs/cli/cortex.mdx
@@ -7,12 +7,8 @@ slug: /cli
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

# Cortex
This command lists all the available commands within Cortex.cpp.
# `cortex`
This command lists all the available commands in the Cortex CLI.

## Usage
:::info
@@ -21,48 +17,23 @@ You can use the `--verbose` flag to display more detailed output of the internal
<Tabs>
<TabItem value="MacOs/Linux" label="MacOs/Linux">
```sh
# Stable
cortex

# Beta
cortex-beta

# Nightly
cortex-nightly
```
</TabItem>
<TabItem value="Windows" label="Windows">
```sh
# Stable
cortex.exe

# Beta
cortex-beta.exe

# Nightly
cortex-nightly.exe
```
</TabItem>
</Tabs>
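
For example, applying the flag to the `ps` subcommand documented later in this reference (a sketch; any subcommand works the same way):

```sh
# Display detailed internal logs while listing running models
cortex --verbose ps
```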


## Command Chaining
The Cortex CLI supports command chaining, which executes multiple commands in sequence with a simplified syntax; a sketch follows the examples below.

For example:

- [cortex run](/docs/cli/run)
- [cortex chat](/docs/cli/chat)
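
A minimal sketch of what this chaining amounts to, using a model name that appears elsewhere in these docs: a single `run` pulls the model if it is missing, starts the server and the model, then opens an interactive chat.

```sh
# One command chains pull -> start -> chat
cortex run tinyllama:1b-gguf
```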

## Sub Commands

- [cortex start](/docs/cli/start): Start the Cortex API server (starts automatically with other commands)
- [cortex run](/docs/cli/run): Shortcut for `cortex models start`. Pull a remote model or start a local model, and start chatting.
- [cortex pull](/docs/cli/pull): Download a model.
- [cortex models](/docs/cli/models): Manage and configure models.
- [cortex chat](/docs/cli/chat): Send a chat request to a model.
- [cortex ps](/docs/cli/ps): Display active models and their operational status.
- [cortex embeddings](/docs/cli/embeddings): Create an embedding vector representing the input text.
- [cortex engines](/docs/cli/engines): Manage Cortex.cpp engines.
- [cortex pull|download](/docs/cli/pull): Download a model.
- [cortex run](/docs/cli/run): Shortcut to pull, start and chat with a model.
- [cortex update](/docs/cli/update): Update the Cortex.cpp version.
- [cortex start](/docs/cli/start): Start the Cortex.cpp API server.
- [cortex stop](/docs/cli/stop): Stop the Cortex.cpp API server.
- [cortex engines](/docs/cli/engines): Manage Cortex engines.
- [cortex update](/docs/cli/update): Update the Cortex version.
- [cortex stop](/docs/cli/stop): Stop the Cortex API server.
45 changes: 14 additions & 31 deletions docs/docs/cli/ps.mdx
@@ -7,59 +7,42 @@ slug: "ps"
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

# `cortex ps`

This command shows the running model and its status.


This command shows the running models and their status (engine, RAM, VRAM, and uptime).

## Usage
:::info
You can use the `--verbose` flag to display more detailed output of the internal processes. To apply this flag, use the following format: `cortex --verbose [subcommand]`.
:::
<Tabs>
<TabItem value="MacOs/Linux" label="MacOs/Linux">
```sh
# Stable
cortex ps [options]

# Beta
cortex-beta ps [options]

# Nightly
cortex-nightly ps [options]
```
</TabItem>
<TabItem value="Windows" label="Windows">
```sh
# Stable
cortex.exe ps [options]

# Beta
cortex-beta.exe ps [options]

# Nightly
cortex-nightly.exe ps [options]
```
</TabItem>
</Tabs>


For example, it returns the following table:

```bash
+----------------+-----------+----------+-----------+-----------+
| Model | Engine | RAM | VRAM | Up time |
+----------------+-----------+----------+-----------+-----------+
| tinyllama:gguf | llama-cpp | 35.16 MB | 601.02 MB | 5 seconds |
+----------------+-----------+----------+-----------+-----------+
> cortex ps
+------------------------+-----------+-----------+-----------+-------------------------------+
| Model | Engine | RAM | VRAM | Uptime |
+------------------------+-----------+-----------+-----------+-------------------------------+
| llama3.2:3b-gguf-q4-km | llama-cpp | 308.23 MB | 1.87 GB | 7 seconds |
+------------------------+-----------+-----------+-----------+-------------------------------+
| tinyllama:1b-gguf | llama-cpp | 35.16 MB | 636.18 MB | 1 hour, 5 minutes, 45 seconds |
+------------------------+-----------+-----------+-----------+-------------------------------+
```
## Options

| Option | Description | Required | Default value | Example |
|-------------------|-------------------------------------------------------|----------|---------------|-------------|
| `-h`, `--help` | Display help information for the command. | No | - | `-h` |

:::info
You can use the `--verbose` flag to display more detailed output of the internal processes. To apply this flag, use the following format: `cortex --verbose [subcommand]`.
:::