diff --git a/docs/docs/demos/chatbox-vid.mdx b/docs/docs/demos/chatbox-vid.mdx
deleted file mode 100644
index 00c04a007..000000000
--- a/docs/docs/demos/chatbox-vid.mdx
+++ /dev/null
@@ -1,24 +0,0 @@
----
-title: Run local chatbox under 1 minute on MacOS with Nitro
----
-
-
-
-## Links
-
-- [Download Nitro](https://github.com/janhq/nitro/releases)
-- [Download Chatbox](https://github.com/Bin-Huang/chatbox)
-
-## Commands
-
-```bash title="Load model"
-curl http://localhost:3928/inferences/llamacpp/loadmodel \
- -H 'Content-Type: application/json' \
- -d '{
- "llama_model_path": "model/llama-2-7b-chat.Q5_K_M.gguf",
- "ctx_len": 512,
- "ngl": 100,
- }'
-```
-
-For more information, please refer to the [Nitro with Chatbox](examples/chatbox.md) documentation.
\ No newline at end of file
diff --git a/docs/docs/examples/chatbox.md b/docs/docs/examples/chatbox.md
deleted file mode 100644
index 965ba80e5..000000000
--- a/docs/docs/examples/chatbox.md
+++ /dev/null
@@ -1,63 +0,0 @@
----
-title: Nitro with Chatbox
----
-
-This guide demonstrates how to integrate Nitro with Chatbox, showcasing the compatibility of Nitro with various platforms.
-
-## What is Chatbox?
-Chatbox is a versatile desktop client that supports multiple cutting-edge Large Language Models (LLMs). It is available for Windows, Mac, and Linux operating systems.
-
-For more information, please visit the [Chatbox official GitHub page](https://github.com/Bin-Huang/chatbox).
-
-
-## Downloading and Installing Chatbox
-
-To download and install Chatbox, follow the instructions available at this [link](https://github.com/Bin-Huang/chatbox#download).
-
-## Using Nitro as a Backend
-
-**1. Start Nitro server**
-
-Open your command line tool and enter:
-```
-nitro
-```
-
-**2. Download Model**
-
-Use these commands to download and save the [Llama2 7B chat model](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main):
-
-```bash
-mkdir model && cd model
-wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true
-```
-
-> For more GGUF model, please look at [The Bloke](https://huggingface.co/TheBloke).
-
-**3. Run the Model**
-
-To load the model, use the following command:
-
-```
-curl http://localhost:3928/inferences/llamacpp/loadmodel \
- -H 'Content-Type: application/json' \
- -d '{
- "llama_model_path": "model/llama-2-7b-chat.Q5_K_M.gguf",
- "ctx_len": 512,
- "ngl": 100,
- }'
-```
-
-**4. Config chatbox**
-
-Adjust the `settings` in Chatbox to connect with Nitro. Change your settings to match the configuration shown in the image below:
-
-**5. Chat with the Model**
-
-Once the setup is complete, you can start chatting with the model using Chatbox. All functions of Chatbox are now enabled with Nitro as the backend.
-
-## Futher Usage
-
-For convenient usage, you can utilize [Jan](https://jan.ai/), as it is integrated with Nitro.
\ No newline at end of file
diff --git a/docs/docs/examples/jan.md b/docs/docs/examples/jan.md
index ec22d6fab..365050737 100644
--- a/docs/docs/examples/jan.md
+++ b/docs/docs/examples/jan.md
@@ -1,5 +1,6 @@
---
title: Nitro with Jan
+description: Nitro integrates with Jan to provide a functional, ChatGPT-like app optimized for local AI.
---
You can effortlessly utilize Nitro through [Jan](https://jan.ai/), as it is fully integrated with all its functions. With Jan, using Nitro becomes straightforward without the need for any coding.
diff --git a/docs/docs/examples/openai-node.md b/docs/docs/examples/openai-node.md
index 6a75982de..f12539e0f 100644
--- a/docs/docs/examples/openai-node.md
+++ b/docs/docs/examples/openai-node.md
@@ -1,9 +1,10 @@
---
title: Nitro with openai-node
+description: Nitro integration guide for Node.js.
---
You can migrate from OAI API or Azure OpenAI to Nitro using your existing NodeJS code quickly
-> The ONLY thing you need to do is to override `baseURL` in `openai` init with `Nitro` URL
+> The **ONLY** thing you need to do is to override `baseURL` in the `openai` init with the `Nitro` URL.
- NodeJS OpenAI SDK: https://www.npmjs.com/package/openai
## Chat Completion
@@ -240,17 +241,23 @@ embedding();
## Audio
-Coming soon
+
+:::info Coming soon
+:::
## How to reproduce
-1. Step 1: Dependencies installation
-```
+
+**Step 1:** Install the dependencies.
+
+```bash
npm install --save openai typescript
# or
yarn add openai
```
-2. Step 2: Fill `tsconfig.json`
-```json
+
+**Step 2:** Fill `tsconfig.json`
+
+```json title="tsconfig.json"
{
"compilerOptions": {
"moduleResolution": "node",
@@ -263,7 +270,9 @@ yarn add openai
"lib": ["es2015"]
}
```
-3. Step 3: Fill `index.ts` file with code
-3. Step 4: Build with `npx tsc`
-4. Step 5: Run the code with `node dist/index.js`
-5. Step 6: Enjoy!
\ No newline at end of file
+
+**Step 3:** Fill the `index.ts` file with code.
+
+**Step 4:** Build with `npx tsc`.
+
+**Step 5:** Run the code with `node dist/index.js`.
\ No newline at end of file
diff --git a/docs/docs/examples/openai-python.md b/docs/docs/examples/openai-python.md
index e3082078c..be36d6d43 100644
--- a/docs/docs/examples/openai-python.md
+++ b/docs/docs/examples/openai-python.md
@@ -1,10 +1,11 @@
---
title: Nitro with openai-python
+description: Nitro integration guide for Python.
---
You can migrate from OAI API or Azure OpenAI to Nitro using your existing Python code quickly
-> The ONLY thing you need to do is to override `baseURL` in `openai` init with `Nitro` URL
+> The **ONLY** thing you need to do is to override `baseURL` in the `openai` init with the `Nitro` URL.
- Python OpenAI SDK: https://pypi.org/project/openai/
## Chat Completion
@@ -22,7 +23,10 @@ import asyncio
from openai import AsyncOpenAI
# gets API Key from environment variable OPENAI_API_KEY
-client = AsyncOpenAI(base_url="http://localhost:3928/v1/", api_key="sk-xxx")
+client = AsyncOpenAI(
+ base_url="http://localhost:3928/v1/",
+ api_key="sk-xxx"
+)
async def main() -> None:
@@ -74,22 +78,16 @@ asyncio.run(main())
```python
from openai import AzureOpenAI
-openai.api_key = '...' # Default is environment variable AZURE_OPENAI_API_KEY
+# The API key defaults to the environment variable AZURE_OPENAI_API_KEY
stream = AzureOpenAI(
api_version=api_version,
- # https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal#create-a-resource
azure_endpoint="https://example-endpoint.openai.azure.com",
)
completion = client.chat.completions.create(
model="deployment-name", # e.g. gpt-35-instant
- messages=[
- {
- "role": "user",
- "content": "How do I output all files in a directory using Python?",
- },
- ],
+ messages=[{"role": "user", "content": "Say this is a test"}],
stream=True,
)
for part in stream:
@@ -115,11 +113,15 @@ import asyncio
from openai import AsyncOpenAI
# gets API Key from environment variable OPENAI_API_KEY
-client = AsyncOpenAI(base_url="http://localhost:3928/v1/", api_key="sk-xxx")
+client = AsyncOpenAI(base_url="http://localhost:3928/v1/",
+ api_key="sk-xxx")
async def main() -> None:
- embedding = await client.embeddings.create(input='Hello How are you?', model='text-embedding-ada-002')
+ embedding = await client.embeddings.create(
+ input='Hello How are you?',
+ model='text-embedding-ada-002'
+ )
print(embedding)
asyncio.run(main())
@@ -140,7 +142,10 @@ client = AsyncOpenAI(api_key="sk-xxx")
async def main() -> None:
- embedding = await client.embeddings.create(input='Hello How are you?', model='text-embedding-ada-002')
+ embedding = await client.embeddings.create(
+ input='Hello How are you?',
+ model='text-embedding-ada-002'
+ )
print(embedding)
asyncio.run(main())
@@ -173,13 +178,17 @@ print(embeddings)
## Audio
-Coming soon
+
+:::info Coming soon
+:::
## How to reproduce
-1. Step 1: Dependencies installation
-```
+**Step 1:** Install the dependencies.
+
+```bash title="Install OpenAI"
pip install openai
```
-3. Step 2: Fill `index.py` file with code
-4. Step 3: Run the code with `python index.py`
-5. Step 5: Enjoy!
\ No newline at end of file
+
+**Step 2:** Fill the `index.py` file with code.
+
+**Step 3:** Run the code with `python index.py`.
\ No newline at end of file
diff --git a/docs/docs/examples/palchat.md b/docs/docs/examples/palchat.md
index 2f6b7de6c..fd675eb81 100644
--- a/docs/docs/examples/palchat.md
+++ b/docs/docs/examples/palchat.md
@@ -1,5 +1,6 @@
---
title: Nitro with Pal Chat
+description: Nitro integration guide for mobile devices.
---
This guide demonstrates how to use Nitro with Pal Chat, enabling local AI chat capabilities on mobile devices.
@@ -15,7 +16,7 @@ Pal is a mobile app available on the App Store. It offers a customizable chat pl
**1. Start Nitro server**
Open your terminal:
-```
+```bash title="Run Nitro"
nitro
```
@@ -23,7 +24,7 @@ nitro
Use these commands to download and save the [Llama2 7B chat model](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main):
-```bash
+```bash title="Get a model"
mkdir model && cd model
wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true
```
@@ -34,7 +35,7 @@ wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GG
To load the model, use the following command:
-```
+```bash title="Load model to the server"
curl http://localhost:3928/inferences/llamacpp/loadmodel \
-H 'Content-Type: application/json' \
-d '{
@@ -44,11 +45,13 @@ curl http://localhost:3928/inferences/llamacpp/loadmodel \
}'
```
-**4. Config Pal Chat**
+**4. Configure Pal Chat**
+
+In the `OpenAI API Key` field, enter any placeholder text (e.g. key-xxxxxx); Nitro does not require a real API key.
-Adjust the `provide custom host` setting under `advanced settings` in Pal Chat to connect with Nitro. Enter your LAN IPv4 address (It should be something like 192.xxx.x.xxx).
+Set the `provide custom host` option under `advanced settings` in Pal Chat to your LAN IPv4 address (it looks like 192.xxx.x.xxx).
-> For instruction read: [How to find your IP](https://support.microsoft.com/en-us/windows/find-your-ip-address-in-windows-f21a9bbc-c582-55cd-35e0-73431160a1b9)
+> For instructions, see: [How to find your IP](https://support.microsoft.com/en-us/windows/find-your-ip-address-in-windows-f21a9bbc-c582-55cd-35e0-73431160a1b9)
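+
+To confirm the connection, you can bind Nitro to all network interfaces and query the health endpoint from another device on the same network (a minimal sketch; replace `192.xxx.x.xxx` with your actual LAN IPv4 address):
+
+```bash title="Verify LAN access (optional)"
+# Bind Nitro to all interfaces so other devices on the network can connect
+nitro 4 0.0.0.0 3928
+
+# From another device on the same network, check the health endpoint
+curl http://192.xxx.x.xxx:3928/healthz
+```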
diff --git a/docs/docs/features/chat.md b/docs/docs/features/chat.md
index 36955b035..229fb8b0e 100644
--- a/docs/docs/features/chat.md
+++ b/docs/docs/features/chat.md
@@ -1,10 +1,11 @@
---
title: Chat Completion
+description: Inference engine for chat completion, compatible with OpenAI's API.
---
The Chat Completion feature in Nitro provides a flexible way to interact with any local Large Language Model (LLM).
-## Single Request Example
+### Single Request Example
To send a single query to your chosen LLM, follow these steps:
diff --git a/docs/docs/features/cont-batch.md b/docs/docs/features/cont-batch.md
index 537c50dd4..65a5f950f 100644
--- a/docs/docs/features/cont-batch.md
+++ b/docs/docs/features/cont-batch.md
@@ -1,20 +1,19 @@
---
title: Continuous Batching
+description: Nitro's continuous batching combines multiple requests, enhancing throughput.
---
-## What is continous batching?
+Continuous batching boosts throughput and minimizes latency in large language model (LLM) inference. This technique groups multiple inference requests, significantly improving GPU utilization.
-Continuous batching is a powerful technique that significantly boosts throughput in large language model (LLM) inference while minimizing latency. This process dynamically groups multiple inference requests, allowing for more efficient GPU utilization.
+**Key Advantages:**
-## Why Continuous Batching?
+- Increased Throughput.
+- Reduced Latency.
+- Efficient GPU Use.
-Traditional static batching methods can lead to underutilization of GPU resources, as they wait for all sequences in a batch to complete before moving on. Continuous batching overcomes this by allowing new sequences to start processing as soon as others finish, ensuring more consistent and efficient GPU usage.
+**Implementation Insight:**
-## Benefits of Continuous Batching
-
-- **Increased Throughput:** Improvement over traditional batching methods.
-- **Reduced Latency:** Lower p50 latency, leading to faster response times.
-- **Efficient Resource Utilization:** Maximizes GPU memory and computational capabilities.
+To evaluate its effectiveness, compare continuous batching with traditional methods. For more details on benchmarking, refer to this [article](https://www.anyscale.com/blog/continuous-batching-llm-inference).
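+
+For a quick, informal comparison on your own machine, you can time a burst of concurrent requests with `cont_batching` switched on and off (a rough sketch, not a rigorous benchmark; it assumes a model has already been loaded as shown in the next section):
+
+```bash title="Rough throughput check"
+# Send 4 identical chat-completion requests concurrently and time the batch
+time (
+  for i in 1 2 3 4; do
+    curl -s http://localhost:3928/v1/chat/completions \
+      -H "Content-Type: application/json" \
+      -d '{"messages": [{"role": "user", "content": "Hello"}]}' > /dev/null &
+  done
+  wait
+)
+```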
## How to use continous batching
Nitro's `continuous batching` feature allows you to combine multiple requests for the same model execution, enhancing throughput and efficiency.
@@ -30,8 +29,4 @@ curl http://localhost:3928/inferences/llamacpp/loadmodel \
}'
```
-For optimal performance, ensure that the `n_parallel` value is set to match the `thread_num`, as detailed in the [Multithreading](features/multi-thread.md) documentation.
-
-### Benchmark and Compare
-
-To understand the impact of continuous batching on your system, perform benchmarks comparing it with traditional batching methods. This [article](https://www.anyscale.com/blog/continuous-batching-llm-inference) will help you quantify improvements in throughput and latency.
\ No newline at end of file
+For optimal performance, ensure that the `n_parallel` value is set to match the `thread_num`, as detailed in the [Multithreading](features/multi-thread.md) documentation.
\ No newline at end of file
diff --git a/docs/docs/features/embed.md b/docs/docs/features/embed.md
index 0925c6a6d..9e19cd125 100644
--- a/docs/docs/features/embed.md
+++ b/docs/docs/features/embed.md
@@ -1,10 +1,9 @@
---
title: Embedding
+description: Inference engine for embedding, compatible with OpenAI's API.
---
-## What are embeddings?
-
-Embeddings are lists of numbers (floats). To find how similar two embeddings are, we measure the [distance](https://en.wikipedia.org/wiki/Cosine_similarity) between them. Shorter distances mean they're more similar; longer distances mean less similarity.
+Embeddings are lists of numbers (floats). To find how similar two embeddings are, we measure the [distance](https://en.wikipedia.org/wiki/Cosine_similarity) between them.
## Activating Embedding Feature
@@ -43,7 +42,7 @@ curl https://api.openai.com/v1/embeddings \
-## Embedding Reponse
+### Embedding Response
The example response used the output from model [llama2 Chat 7B Q5 (GGUF)](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main) loaded to Nitro server.
@@ -51,18 +50,14 @@ The example response used the output from model [llama2 Chat 7B Q5 (GGUF)](https
```js title="Nitro"
{
- "data": [
- {
- "embedding": [
- -0.9874749,
- 0.2965493,
- ...
- -0.253227
- ],
- "index": 0,
- "object": "embedding"
- }
- ]
+ "embedding": [
+ -0.9874749,
+ 0.2965493,
+ ...
+ -0.253227
+ ],
+ "index": 0,
+ "object": "embedding"
}
```
@@ -75,18 +70,14 @@ The example response used the output from model [llama2 Chat 7B Q5 (GGUF)](https
"embedding": [
0.0023064255,
-0.009327292,
- .... (1536 floats total for ada-002)
+ ....
-0.0028842222,
],
"index": 0,
"object": "embedding"
}
-
-
-
-
```
-The embedding feature in Nitro demonstrates a high level of compatibility with OpenAI, simplifying the transition between using OpenAI and local AI models. For more detailed information and advanced use cases, refer to the comprehensive [API Reference](https://nitro.jan.ai/api-reference).
+The embedding feature in Nitro demonstrates a high level of compatibility with OpenAI. For more detailed information and advanced use cases, refer to the comprehensive [API Reference](https://nitro.jan.ai/api-reference).
diff --git a/docs/docs/features/feat.md b/docs/docs/features/feat.md
index 334a0daa7..51a526331 100644
--- a/docs/docs/features/feat.md
+++ b/docs/docs/features/feat.md
@@ -1,5 +1,6 @@
---
title: Nitro Features
+description: What Nitro supports
---
Nitro enhances the `llama.cpp` research base, optimizing it for production environments with advanced features:
diff --git a/docs/docs/features/load-unload.md b/docs/docs/features/load-unload.md
index 536a13690..167e554b1 100644
--- a/docs/docs/features/load-unload.md
+++ b/docs/docs/features/load-unload.md
@@ -1,5 +1,6 @@
---
-title: Load and Unload models
+title: Load and Unload models
+description: Nitro loads and unloads local AI models (local LLMs).
---
## Load model
@@ -68,7 +69,8 @@ In case you got error while loading models. Please check for the correct model p
| `ngl` | Integer | The number of GPU layers to use. |
| `ctx_len` | Integer | The context length for the model operations. |
| `embedding` | Boolean | Whether to use embedding in the model. |
-| `n_parallel` | Integer | The number of parallel operations. Uses Drogon thread count if not set. |
+| `n_parallel` | Integer | The number of parallel operations. |
+| `cpu_threads` | Integer | The number of threads for CPU inference. |
| `cont_batching` | Boolean | Whether to use continuous batching. |
| `user_prompt` | String | The prompt to use for the user. |
| `ai_prompt` | String | The prompt to use for the AI assistant. |
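+
+For example, the new `cpu_threads` setting can be combined with the parallelism options above when loading a model (a sketch; the values shown are illustrative, not recommendations):
+
+```bash title="Load model with cpu_threads"
+curl http://localhost:3928/inferences/llamacpp/loadmodel \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "llama_model_path": "model/llama-2-7b-chat.Q5_K_M.gguf",
+    "ctx_len": 512,
+    "ngl": 100,
+    "cpu_threads": 4,
+    "n_parallel": 4,
+    "cont_batching": true
+  }'
+```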
diff --git a/docs/docs/features/multi-thread.md b/docs/docs/features/multi-thread.md
index 5ea4328ef..a2ba2583b 100644
--- a/docs/docs/features/multi-thread.md
+++ b/docs/docs/features/multi-thread.md
@@ -1,39 +1,26 @@
---
title: Multithreading
+description: Nitro utilizes multithreading to optimize hardware usage.
---
-## What is Multithreading?
+Multithreading in programming allows concurrent task execution, improving efficiency and responsiveness. It's key for optimizing hardware and application performance.
-Multithreading is a programming concept where a process executes multiple threads simultaneously, improving efficiency and performance. It allows concurrent execution of tasks, such as data processing or user interface updates. This technique is crucial for optimizing hardware usage and enhancing application responsiveness.
+**Effective multithreading offers:**
-## Drogon's Threading Model
+- Faster Performance.
+- Responsive IO.
+- Deadlock Prevention.
+- Resource Optimization.
+- Asynchronous Programming Support.
+- Scalability Enhancement.
-Nitro powered by Drogon, a high-speed C++ web application framework, utilizes a thread pool where each thread possesses its own event loop. These event loops are central to Drogon's functionality:
+For more information on threading, visit [Drogon's Documentation](https://github.com/drogonframework/drogon/wiki/ENG-FAQ-1-Understanding-drogon-threading-model).
-- **Main Loop**: Runs on the main thread, responsible for starting worker loops.
-- **Worker Loops**: Handle tasks and network events, ensuring efficient task execution without blocking.
-
-## Why it's important
-
-Understanding and effectively using multithreading in Drogon is crucial for several reasons:
-
-1. **Optimized Performance**: Multithreading enhances application efficiency by enabling simultaneous task execution for faster response times.
-
-2. **Non-blocking IO Operations**: Utilizing multiple threads prevents long-running tasks from blocking the entire application, ensuring high responsiveness.
-
-3. **Deadlock Avoidance**: Event loops and threads helps prevent deadlocks, ensuring smoother and uninterrupted application operation.
-
-4. **Effective Resource Utilization**: Distributing tasks across multiple threads leads to more efficient use of server resources, improving overall performance.
-
-5. **Async Programming**
-
-6. **Scalability**
-
-## Enabling More Threads on Nitro
+## Enabling Multithreading on Nitro
To increase the number of threads used by Nitro, use the following command syntax:
-```js
+```bash title="Nitro server syntax"
nitro [thread_num] [host] [port]
```
@@ -42,11 +29,8 @@ nitro [thread_num] [host] [port]
- **port:** The port number where Nitro is to be deployed.
To launch Nitro with 4 threads, enter this command in the terminal:
-```js
+```bash title="Example"
nitro 4 127.0.0.1 5000
```
-> After enabling multithreading, monitor your system's performance. Adjust the `thread_num` as needed to optimize throughput and latency based on your workload.
-
-## Acknowledgements
-For more information on Drogon's threading, visit [Drogon's Documentation](https://github.com/drogonframework/drogon/wiki/ENG-FAQ-1-Understanding-drogon-threading-model).
\ No newline at end of file
+> After enabling multithreading, monitor your system's performance. Adjust the `thread_num` as needed to optimize throughput and latency based on your workload.
\ No newline at end of file
diff --git a/docs/docs/features/prompt.md b/docs/docs/features/prompt.md
index 0dbefc663..99418f8ac 100644
--- a/docs/docs/features/prompt.md
+++ b/docs/docs/features/prompt.md
@@ -1,5 +1,6 @@
---
title: Prompt Role Support
+description: Setting up Nitro prompts to build an AI assistant.
---
System, user, and assistant prompt is crucial for effectively utilizing the Large Language Model. These prompts work together to create a coherent and functional conversational flow.
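+
+As an illustration, these roles map onto the prompt fields accepted when loading a model (a hedged sketch using the default prefixes from the API reference):
+
+```bash title="Prompt roles at load time"
+curl http://localhost:3928/inferences/llamacpp/loadmodel \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "llama_model_path": "model/llama-2-7b-chat.Q5_K_M.gguf",
+    "pre_prompt": "A chat between a curious user and an artificial intelligence assistant.",
+    "user_prompt": "USER:",
+    "ai_prompt": "ASSISTANT:"
+  }'
+```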
diff --git a/docs/docs/features/warmup.md b/docs/docs/features/warmup.md
index d39c16193..b709cfd7f 100644
--- a/docs/docs/features/warmup.md
+++ b/docs/docs/features/warmup.md
@@ -1,18 +1,13 @@
---
-title: Warming Up Model
+title: Warming Up Model
+description: Nitro warms up the model to optimize delays.
---
-## What is Model Warming Up?
-
-Model warming up is the process of running pre-requests through a model to optimize its components for production use. This step is crucial for reducing initialization and optimization delays during the first few inference requests.
-
-## What are the Benefits?
-
-Warming up an AI model offers several key benefits:
-
-- **Enhanced Initial Performance:** Unlike in `llama.cpp`, where the first inference can be very slow, warming up reduces initial latency, ensuring quicker response times from the start.
-- **Consistent Response Times:** Especially beneficial for systems updating models frequently, like those with real-time training, to avoid performance lags with new snapshots.
+Model warming up involves pre-running requests through an AI model to prepare its components for production. This step minimizes delays during the first inference requests, ensuring the model is ready for immediate use.
+**Key Advantages:**
+- Improved Initial Performance.
+- Stable Response Times.
## How to Enable Model Warming Up?
On the Nitro server, model warming up is automatically enabled whenever a new model is loaded. This means that the server handles the warm-up process behind the scenes, ensuring that the model is ready for efficient and effective performance from the first inference request.
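+
+In other words, there is no separate warm-up call; the standard load request is what triggers it (a minimal sketch, reusing the quickstart model):
+
+```bash title="Load model (warm-up runs automatically)"
+curl http://localhost:3928/inferences/llamacpp/loadmodel \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "llama_model_path": "model/llama-2-7b-chat.Q5_K_M.gguf",
+    "ctx_len": 512,
+    "ngl": 100
+  }'
+```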
diff --git a/docs/docs/new/about.md b/docs/docs/new/about.md
index ef6e32228..202336b1e 100644
--- a/docs/docs/new/about.md
+++ b/docs/docs/new/about.md
@@ -1,6 +1,7 @@
---
title: About Nitro
slug: /docs
+description: Efficient LLM inference engine for edge computing
---
Nitro is a high-efficiency C++ inference engine for edge computing, powering [Jan](https://jan.ai/). It is lightweight and embeddable, ideal for product integration.
@@ -12,12 +13,12 @@ Learn more on [GitHub](https://github.com/janhq/nitro).
- **Fast Inference:** Built on top of the cutting-edge inference library `llama.cpp`, modified to be production ready.
- **Lightweight:** Only 3MB, ideal for resource-sensitive environments.
- **Easily Embeddable:** Simple integration into existing applications, offering flexibility.
-- **Quick Setup:** Approximately 10-second initialization for swift deployment.
+- **Quick Setup:** Approximately 10-second initialization.
- **Enhanced Web Framework:** Incorporates `drogon cpp` to boost web service efficiency.
### OpenAI-compatible API
-One of the significant advantages of using Nitro is its compatibility with OpenAI's API structure. The command format for making inference calls with Nitro is very similar to that used with OpenAI's API. This similarity ensures a transition for users who are already familiar with OpenAI's system.
+Nitro's compatibility with OpenAI's API structure is a notable advantage. Its command format for inference calls closely mirrors that of OpenAI, facilitating an easy transition for users.
For instance, compare the Nitro inference call:
diff --git a/docs/docs/new/build-source.md b/docs/docs/new/build-source.md
index 819a141f1..62e4e55b2 100644
--- a/docs/docs/new/build-source.md
+++ b/docs/docs/new/build-source.md
@@ -1,6 +1,7 @@
---
title: Build From Source
slug: /build-source
+description: Install Nitro manually
---
This guide provides step-by-step instructions for building Nitro from source on Linux, macOS, and Windows systems.
diff --git a/docs/docs/new/faq.md b/docs/docs/new/faq.md
index 1f7c1541f..0bd25f1a8 100644
--- a/docs/docs/new/faq.md
+++ b/docs/docs/new/faq.md
@@ -1,6 +1,7 @@
---
title: FAQs
slug: /faq
+description: Frequently Asked Questions about Nitro
---
diff --git a/docs/docs/new/install.md b/docs/docs/new/install.md
index 4fd4ecff1..4b737c9dd 100644
--- a/docs/docs/new/install.md
+++ b/docs/docs/new/install.md
@@ -1,6 +1,7 @@
---
title: Installation
slug: /install
+description: How to install Nitro
---
# Nitro Installation Guide
diff --git a/docs/docs/new/quickstart.md b/docs/docs/new/quickstart.md
index 2ca248cd2..3f2ea0d21 100644
--- a/docs/docs/new/quickstart.md
+++ b/docs/docs/new/quickstart.md
@@ -1,33 +1,32 @@
---
title: Quickstart
slug: /quickstart
+description: How to use Nitro
---
## Step 1: Install Nitro
+Download and install Nitro on your system.
+
### For Linux and MacOS
-Open your terminal and enter the following command. This will download and install Nitro on your system.
```bash
curl -sfL https://raw.githubusercontent.com/janhq/nitro/main/install.sh | sudo /bin/bash -
```
### For Windows
-Open PowerShell and execute the following command. This will perform the same actions as for Linux and MacOS but is tailored for Windows.
```bash
powershell -Command "& { Invoke-WebRequest -Uri 'https://raw.githubusercontent.com/janhq/nitro/main/install.bat' -OutFile 'install.bat'; .\install.bat; Remove-Item -Path 'install.bat' }"
```
-> **NOTE:**Installing Nitro will add new files and configurations to your system to enable it to run.
+> Installing Nitro will add new files and configurations to your system to enable it to run.
For a manual installation process, see: [Install from Source](install.md)
## Step 2: Downloading a Model
-Next, we need to download a model. For this example, we'll use the [Llama2 7B chat model](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main).
-
-- Create a `/model` and navigate into it:
+For this example, we'll use the [Llama2 7B chat model](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main).
```bash
mkdir model && cd model
@@ -36,8 +35,6 @@ wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GG
## Step 3: Run Nitro server
-To start using Nitro, you need to run its server.
-
```bash title="Run Nitro server"
nitro
```
@@ -50,7 +47,7 @@ curl http://localhost:3928/healthz
## Step 4: Load model
-To load the model to Nitro server, you need to run:
+To load the model to the Nitro server, run:
```bash title="Load model"
curl http://localhost:3928/inferences/llamacpp/loadmodel \
@@ -64,9 +61,7 @@ curl http://localhost:3928/inferences/llamacpp/loadmodel \
## Step 5: Making an Inference
-Finally, let's make an actual inference call using Nitro.
-
-- In your terminal, execute:
+Finally, let's chat with the model using Nitro.
```bash title="Nitro Inference"
curl http://localhost:3928/v1/chat/completions \
@@ -81,6 +76,4 @@ curl http://localhost:3928/v1/chat/completions \
}'
```
-This command sends a request to Nitro, asking it about the 2020 World Series winner.
-
-- As you can see, A key benefit of Nitro is its alignment with [OpenAI's API structure](https://platform.openai.com/docs/guides/text-generation?lang=curl). Its inference call syntax closely mirrors that of OpenAI's API, facilitating an easier shift for those accustomed to OpenAI's framework.
+As you can see, a key benefit of Nitro is its alignment with [OpenAI's API structure](https://platform.openai.com/docs/guides/text-generation?lang=curl). Its inference call syntax closely mirrors that of OpenAI's API, facilitating an easier shift for those accustomed to OpenAI's framework.
diff --git a/docs/docusaurus.config.js b/docs/docusaurus.config.js
index 291637bd3..3ef4a1b8f 100644
--- a/docs/docusaurus.config.js
+++ b/docs/docusaurus.config.js
@@ -124,6 +124,43 @@ const config = {
liveCodeBlock: {
playgroundPosition: "bottom",
},
+ metadata: [
+ { name: 'description', content: 'Nitro is a high-efficiency Large Language Model inference engine for edge computing.'},
+ { name: 'keywords', content: 'Nitro, OpenAI compatible, fast inference, local AI, llm, small AI, free, open source, production ready' },
+ { property: 'og:title', content: 'Embeddable AI | Nitro' },
+ { property: 'og:description', content: 'Nitro is a high-efficiency Large Language Model inference engine for edge computing.' },
+ { name: 'twitter:card', content: 'summary_large_image' },
+ { name: 'twitter:site', content: '@janhq_' },
+ { name: 'twitter:title', content: 'Embeddable AI | Nitro' },
+ { name: 'twitter:description', content: 'Nitro is a high-efficiency Large Language Model inference engine for edge computing.' },
+ ],
+ headTags: [
+ // Declare a preconnect tag
+ {
+ tagName: 'link',
+ attributes: {
+ rel: 'preconnect',
+ href: 'https://nitro.jan.ai/',
+ },
+ },
+ // Declare some json-ld structured data
+ {
+ tagName: 'script',
+ attributes: {
+ type: 'application/ld+json',
+ },
+ innerHTML: JSON.stringify({
+ '@context': 'https://schema.org/',
+ '@type': 'SoftwareApplication',
+ name: 'Nitro',
+ description: "Nitro is a high-efficiency Large Language Model inference engine for edge computing.",
+ keywords: "Nitro, OpenAI compatible, fast inference, local AI, llm, small AI, free, open source, production ready",
+ applicationCategory: "BusinessApplication",
+ operatingSystem: "Multiple",
+ url: 'https://nitro.jan.ai/',
+ }),
+ },
+ ],
navbar: {
title: "Nitro",
logo: {
diff --git a/docs/openapi/NitroAPI.yaml b/docs/openapi/NitroAPI.yaml
index 595dbd1f6..51a932423 100644
--- a/docs/openapi/NitroAPI.yaml
+++ b/docs/openapi/NitroAPI.yaml
@@ -235,6 +235,11 @@ components:
example: 4
nullable: true
description: The number of parallel operations. Only set when enable continuous batching.
+ cpu_threads:
+ type: integer
+ example: 4
+ nullable: true
+ description: The number of threads for CPU-based inference.
pre_prompt:
type: string
default: A chat between a curious user and an artificial intelligence assistant. The assistant follows the given rules no matter what.
@@ -255,7 +260,6 @@ components:
default: "ASSISTANT:"
nullable: true
description: The prefix for assistant prompt.
-
required:
- llama_model_path
diff --git a/docs/static/robots.txt b/docs/static/robots.txt
new file mode 100644
index 000000000..6f27bb66a
--- /dev/null
+++ b/docs/static/robots.txt
@@ -0,0 +1,2 @@
+User-agent: *
+Disallow:
\ No newline at end of file