Add the tensorrt example yaml file, and update the install code #57

Merged
merged 4 commits on Jun 26, 2024
4 changes: 4 additions & 0 deletions docs/cortex-onnx.mdx
@@ -4,6 +4,10 @@ description: Onnx Architecture
slug: "cortex-onnx"
---

:::warning
🚧 Cortex is under construction.
:::

## Introduction
Cortex.onnx is a C++ inference library for Windows built on [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai), which uses DirectML for hardware acceleration. [DirectML](https://github.com/microsoft/DirectML) is a high-performance DirectX 12 library for machine learning that provides GPU acceleration across a wide range of hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs. Cortex.onnx integrates [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) for inference and occasionally contributes changes upstream.

4 changes: 4 additions & 0 deletions docs/cortex-tensorrt-llm.mdx
@@ -4,6 +4,10 @@ description: NVIDIA TensorRT-LLM Architecture
slug: "cortex-tensorrt-llm"
---

:::warning
🚧 Cortex is under construction.
:::

## Introduction

[Cortex.tensorrt-llm](https://github.com/janhq/cortex.tensorrt-llm) is a C++ inference library for NVIDIA GPUs. It includes NVIDIA's [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) as a submodule for GPU-accelerated inference.
19 changes: 19 additions & 0 deletions docs/hardware.md
@@ -4,6 +4,10 @@ description: Get started quickly with Jan, a ChatGPT-alternative that runs on yo
slug: "/hardware"
---

:::warning
🚧 Cortex is under construction.
:::

To run LLMs on device, Cortex has the following hardware requirements:

:::info
@@ -16,6 +20,21 @@ These are the general hardware requirements for running Cortex on your system. P
- Windows 10 or higher.
- Ubuntu 12.04 and later.

## CPU
- Mac:
  - Supports both Intel and Apple Silicon CPUs.
- Windows & Linux:
  - Haswell processors (Q2 2013) and newer.
  - Tiger Lake (Q3 2020) and newer for Celeron and Pentium processors.

:::info
Cortex supports multiple CPU instruction sets: AVX, AVX2, and AVX-512.
:::
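One quick way to verify which of these instruction sets your CPU exposes is to inspect the CPU feature flags directly (a minimal sketch; the `/proc/cpuinfo` path assumes Linux, and the `sysctl` keys assume an Intel Mac):

```sh
# Linux: list the AVX-family flags the kernel reports for this CPU
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -i '^avx'

# macOS (Intel): AVX appears in machdep.cpu.features, AVX2 in leaf7_features
sysctl machdep.cpu.features machdep.cpu.leaf7_features | tr ' ' '\n' | grep -i avx
```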
## GPU
- At least 6 GB of VRAM is recommended when using NVIDIA, AMD, or Intel Arc GPUs.

:::info
Cortex currently supports NVIDIA GPU acceleration. (Support for other GPUs is coming soon!)
:::
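To confirm an NVIDIA card meets the 6 GB recommendation, `nvidia-smi` can report the installed VRAM (assumes the NVIDIA driver is already installed):

```sh
# Prints each GPU's name and total memory, e.g. "NVIDIA GeForce RTX 3060, 6144 MiB"
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
```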
## RAM (CPU Mode)

- 8GB for running up to 3B models.
4 changes: 2 additions & 2 deletions docs/installation/linux.mdx
@@ -110,7 +110,7 @@ Install NPM on your machine before proceeding with this step.

```sh
# Install globally on your system
-npm i -g @janhq/cortex
+npm i -g cortexso
```
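After the install completes, a quick sanity check confirms the CLI is on your `PATH` (this reuses the `cortex -h` command shown further down this page):

```sh
# Should print the Cortex CLI help text
cortex -h
```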
:::info
Cortex automatically detects your CPU and GPU, then downloads the build for the appropriate CPU instruction set and the dependencies required to optimize GPU performance.
@@ -160,5 +160,5 @@ cortex -h
Run the following command to uninstall Cortex globally on your machine:
```sh
# Uninstall globally on your system
-npm uninstall -g @janhq/cortex
+npm uninstall -g cortexso
```
4 changes: 2 additions & 2 deletions docs/installation/mac.mdx
@@ -66,7 +66,7 @@ Install NPM on your machine before proceeding with this step.

```sh
# Install globally on your system
-npm i -g @janhq/cortex
+npm i -g cortexso
```
:::info
Cortex automatically detects your CPU and GPU, then downloads the build for the appropriate CPU instruction set and the dependencies required to optimize GPU performance.
@@ -115,5 +115,5 @@ cortex -h
Run the following command to uninstall Cortex globally on your machine:
```sh
# Uninstall globally using NPM
-npm uninstall -g @janhq/cortex
+npm uninstall -g cortexso
```
6 changes: 3 additions & 3 deletions docs/installation/windows.mdx
@@ -88,7 +88,7 @@ Install NPM on your machine before proceeding with this step.

```sh
# Install globally on your system
-npm i -g @janhq/cortex
+npm i -g cortexso
```
:::info
Cortex automatically detects your CPU and GPU, then downloads the build for the appropriate CPU instruction set and the dependencies required to optimize GPU performance.
@@ -122,7 +122,7 @@ Install NPM on your machine before proceeding with this step.

```sh
# Install globally on your system
-npm i -g @janhq/cortex
+npm i -g cortexso
```
:::info
Cortex automatically detects your CPU and GPU, then downloads the build for the appropriate CPU instruction set and the dependencies required to optimize GPU performance.
@@ -171,5 +171,5 @@ cortex -h
Run the following command to uninstall Cortex globally on your machine:
```sh
# Uninstall globally on your system
-npm uninstall -g @janhq/cortex
+npm uninstall -g cortexso
```
63 changes: 63 additions & 0 deletions docs/model-operations.mdx
@@ -194,6 +194,69 @@ If a `model.yaml` is not available, Cortex autogenerates it from model metadata.

</TabItem>
<TabItem value="TensorRT-LLM" label="TensorRT-LLM">
**Example `model.yaml` for `llama3 8B-tensorrt-llm-windows-ada`**
```yaml
name: Llama 3
model: llama3:8B
version: 1

# files:

# Results Preferences
top_p: 0.95
temperature: 0.7
frequency_penalty: 0
presence_penalty: 0
max_tokens: 8192 # Infer from base config.json -> max_position_embeddings
stream: true # true | false

# Engine / Model Settings
engine: cortex.tensorrtllm
os: windows # from CI env var
gpu_arch: ada # from CI env var
quantization_method: awq # from CI env var
precision: int4 # from CI env var
tp: 1 # Tensor parallelism degree, from CI env var
trtllm_version: 0.9.0 # from CI env var
ctx_len: 8192 # Infer from base config.json -> max_position_embeddings
text_model: false # Fixed value - https://github.com/janhq/jan/blob/dev/extensions/tensorrt-llm-extension/resources/models.json#L41C7-L41C26
prompt_template: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
# Prompt template: Can only be retrieved from instruct model
# - https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/tokenizer_config.json#L2053
# - Requires jinja format parser
```
**Example `model.yaml` for `llama3 8B-tensorrt-llm-linux-ada`**

```yaml
name: Llama 3
model: llama3:8B
version: 1

# files:

# Results Preferences
top_p: 0.95
temperature: 0.7
frequency_penalty: 0
presence_penalty: 0
max_tokens: 8192 # Infer from base config.json -> max_position_embeddings
stream: true # true | false

# Engine / Model Settings
engine: cortex.tensorrtllm
os: linux # from CI env var
gpu_arch: ada # from CI env var
quantization_method: awq # from CI env var
precision: int4 # from CI env var
tp: 1 # Tensor parallelism degree, from CI env var
trtllm_version: 0.9.0 # from CI env var
ctx_len: 8192 # Infer from base config.json -> max_position_embeddings
text_model: false # Fixed value - https://github.com/janhq/jan/blob/dev/extensions/tensorrt-llm-extension/resources/models.json#L41C7-L41C26
prompt_template: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
# Prompt template: Can only be retrieved from instruct model
# - https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/tokenizer_config.json#L2053
# - Requires jinja format parser
```
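Both examples pin `gpu_arch: ada`, so the prebuilt engine only matches Ada Lovelace GPUs. One way to confirm the local card's architecture is its CUDA compute capability, which Ada reports as 8.9 (a sketch that assumes a recent NVIDIA driver, since older `nvidia-smi` builds lack the `compute_cap` query field):

```sh
# Ada Lovelace GPUs (RTX 40-series, L4, L40) report compute capability 8.9
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
```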
</TabItem>
</Tabs>

5 changes: 4 additions & 1 deletion docs/quickstart.md
@@ -58,8 +58,9 @@ cortex pull bartowski/Hermes-2-Theta-Llama-3-70B-GGUF
# Download a preconfigured model from https://huggingface.co/cortexhub
cortex pull llama3
```

:::info
Read more about [model operations](./model-operations).
:::

## Get Help

@@ -69,9 +70,11 @@ cortex
cortex CMD -h
```

:::info
Cortex is still in early development, so if you have any questions, please reach out to us:
- [GitHub](https://github.com/janhq/cortex)
- [Discord](https://discord.gg/YFKKeuVu)
:::

## Next Steps

3 changes: 3 additions & 0 deletions docs/telemetry.mdx
@@ -8,6 +8,9 @@ slug: "telemetry"
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

:::warning
🚧 Cortex is under construction.
:::

## Introduction
We collect anonymous usage data to enhance our product development.