This repository was archived by the owner on Jul 4, 2025. It is now read-only.
Merged
25 commits
d96b995
add robots.txt + sitemaps
hahuyhoang411 Nov 24, 2023
564c713
add plugin sitemap
hahuyhoang411 Nov 24, 2023
60351e5
fix plugin sitemap
hahuyhoang411 Nov 24, 2023
7761be2
fix plugin sitemap version
hahuyhoang411 Nov 24, 2023
76f4a5a
update the plugin sitemap
hahuyhoang411 Nov 24, 2023
eb9196b
Merge branch 'main' into update-seo
hahuyhoang411 Nov 24, 2023
a98a2c9
update pal chat instruct
hahuyhoang411 Nov 27, 2023
76a6572
Merge branch 'main' into update-seo
hahuyhoang411 Nov 27, 2023
a665d3f
draft SEO content
hahuyhoang411 Nov 27, 2023
7e2e6a0
update jsonld for SEO
hahuyhoang411 Nov 27, 2023
f9f11c5
update description for pages on Nitro
hahuyhoang411 Nov 27, 2023
1c04018
fix the null description
hahuyhoang411 Nov 27, 2023
0de3d0f
Merge branch 'update-seo' into docs-fix-27-11
hahuyhoang411 Nov 27, 2023
5135656
add cpu_threads api + docs
hahuyhoang411 Nov 27, 2023
693ba46
fix table views in openai-python
hahuyhoang411 Nov 27, 2023
925a482
fix typo in Palchat
hahuyhoang411 Nov 27, 2023
5b432e8
succinct voice tone for features
hahuyhoang411 Nov 27, 2023
8245cb1
update the quickstart
hahuyhoang411 Nov 27, 2023
e6c5290
update the about
hahuyhoang411 Nov 27, 2023
ce8b162
update the embeddings
hahuyhoang411 Nov 27, 2023
d91670f
multithread update docs
hahuyhoang411 Nov 27, 2023
5f2d7c3
change palchat
hahuyhoang411 Nov 27, 2023
03e7a46
update openai-py
hahuyhoang411 Nov 27, 2023
0d5c546
update openai-node
hahuyhoang411 Nov 27, 2023
70bde1a
update openai-node docs
hahuyhoang411 Nov 27, 2023
24 changes: 0 additions & 24 deletions docs/docs/demos/chatbox-vid.mdx

This file was deleted.

63 changes: 0 additions & 63 deletions docs/docs/examples/chatbox.md

This file was deleted.

1 change: 1 addition & 0 deletions docs/docs/examples/jan.md
@@ -1,5 +1,6 @@
---
title: Nitro with Jan
description: Nitro integrates with Jan to enable a ChatGPT-like functional app, optimized for local AI.
---

You can effortlessly use Nitro through [Jan](https://jan.ai/), since Jan is fully integrated with all of Nitro's functions. With Jan, using Nitro is straightforward and requires no coding.
29 changes: 19 additions & 10 deletions docs/docs/examples/openai-node.md
@@ -1,9 +1,10 @@
---
title: Nitro with openai-node
description: Nitro integration guide for Node.js.
---

You can quickly migrate from the OpenAI API or Azure OpenAI to Nitro using your existing NodeJS code.
> The ONLY thing you need to do is to override `baseURL` in `openai` init with `Nitro` URL
> The **ONLY** thing you need to do is to override `baseURL` in `openai` init with `Nitro` URL
- NodeJS OpenAI SDK: https://www.npmjs.com/package/openai
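
As a minimal sketch of that override (assuming a local Nitro server on the default port 3928), only the `baseURL` differs from a stock setup:

```ts
import OpenAI from 'openai';

// A stock OpenAI setup would be: new OpenAI({ apiKey: 'sk-...' })
// For Nitro, only baseURL changes; the key is an unused placeholder.
const openai = new OpenAI({
  baseURL: 'http://localhost:3928/v1/',
  apiKey: 'sk-xxx',
});
```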

## Chat Completion
@@ -240,17 +241,23 @@ embedding();
</table>

## Audio
Coming soon

:::info Coming soon
:::

## How to reproduce
1. Step 1: Dependencies installation
```

**Step 1:** Dependencies installation

```bash
npm install --save openai typescript
# or
yarn add openai
```
2. Step 2: Fill `tsconfig.json`
```json

**Step 2:** Fill `tsconfig.json`

```js
{
  "compilerOptions": {
    "moduleResolution": "node",
@@ -263,7 +270,9 @@ yarn add openai
"lib": ["es2015"]
}
```
3. Step 3: Fill `index.ts` file with code
3. Step 4: Build with `npx tsc`
4. Step 5: Run the code with `node dist/index.js`
5. Step 6: Enjoy!

**Step 3:** Fill `index.ts` file with code.

**Step 4:** Build with `npx tsc`.

**Step 5:** Run the code with `node dist/index.js`.
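
For reference, a minimal `index.ts` along these lines should work (a sketch, not the doc's exact file; it assumes the default port, a model already loaded, and that Nitro accepts a placeholder model name):

```ts
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://localhost:3928/v1/', // local Nitro server
  apiKey: 'sk-xxx', // placeholder; Nitro does not validate it
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo', // placeholder; Nitro serves the loaded local model
    messages: [{ role: 'user', content: 'Say this is a test' }],
  });
  console.log(completion.choices[0]?.message.content);
}

main();
```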
47 changes: 28 additions & 19 deletions docs/docs/examples/openai-python.md
@@ -1,10 +1,11 @@
---
title: Nitro with openai-python
description: Nitro integration guide for Python.
---


You can quickly migrate from the OpenAI API or Azure OpenAI to Nitro using your existing Python code.
> The ONLY thing you need to do is to override `baseURL` in `openai` init with `Nitro` URL
> The **ONLY** thing you need to do is to override `baseURL` in `openai` init with `Nitro` URL
- Python OpenAI SDK: https://pypi.org/project/openai/
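
As a minimal sketch of that override (assuming a local Nitro server on the default port 3928), only the `base_url` differs from a stock setup:

```python
from openai import OpenAI

# A stock OpenAI setup would be: OpenAI(api_key="sk-...")
# For Nitro, only base_url changes; the key is an unused placeholder.
client = OpenAI(
    base_url="http://localhost:3928/v1/",
    api_key="sk-xxx",
)
```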

## Chat Completion
@@ -22,7 +23,10 @@ import asyncio
from openai import AsyncOpenAI

# gets API Key from environment variable OPENAI_API_KEY
client = AsyncOpenAI(base_url="http://localhost:3928/v1/", api_key="sk-xxx")
client = AsyncOpenAI(
    base_url="http://localhost:3928/v1/",
    api_key="sk-xxx"
)


async def main() -> None:
@@ -74,22 +78,16 @@ asyncio.run(main())
```python
from openai import AzureOpenAI

openai.api_key = '...' # Default is environment variable AZURE_OPENAI_API_KEY
openai.api_key = '...' # Default is AZURE_OPENAI_API_KEY

client = AzureOpenAI(
    api_version=api_version,
    # https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal#create-a-resource
    azure_endpoint="https://example-endpoint.openai.azure.com",
)

stream = client.chat.completions.create(
    model="deployment-name",  # e.g. gpt-35-instant
    messages=[
        {
            "role": "user",
            "content": "How do I output all files in a directory using Python?",
        },
    ],
    messages=[{"role": "user", "content": "Say this is a test"}],
    stream=True,
)
for part in stream:
@@ -115,11 +113,15 @@ import asyncio
from openai import AsyncOpenAI

# gets API Key from environment variable OPENAI_API_KEY
client = AsyncOpenAI(base_url="http://localhost:3928/v1/", api_key="sk-xxx")
client = AsyncOpenAI(base_url="http://localhost:3928/v1/",
                     api_key="sk-xxx")


async def main() -> None:
    embedding = await client.embeddings.create(input='Hello How are you?', model='text-embedding-ada-002')
    embedding = await client.embeddings.create(
        input='Hello How are you?',
        model='text-embedding-ada-002'
    )
    print(embedding)

asyncio.run(main())
@@ -140,7 +142,10 @@ client = AsyncOpenAI(api_key="sk-xxx")


async def main() -> None:
    embedding = await client.embeddings.create(input='Hello How are you?', model='text-embedding-ada-002')
    embedding = await client.embeddings.create(
        input='Hello How are you?',
        model='text-embedding-ada-002'
    )
    print(embedding)

asyncio.run(main())
@@ -173,13 +178,17 @@ print(embeddings)
</table>

## Audio
Coming soon

:::info Coming soon
:::

## How to reproduce
1. Step 1: Dependencies installation
```
**Step 1:** Dependencies installation.

```bash title="Install OpenAI"
pip install openai
```
3. Step 2: Fill `index.py` file with code
4. Step 3: Run the code with `python index.py`
5. Step 5: Enjoy!

**Step 2:** Fill `index.py` file with code.

**Step 3:** Run the code with `python index.py`.
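
For reference, a minimal `index.py` along these lines should work (a sketch, not the doc's exact file; it assumes the default port, a model already loaded, and that Nitro accepts a placeholder model name):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3928/v1/",  # local Nitro server
    api_key="sk-xxx",  # placeholder; Nitro does not validate it
)

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder; Nitro serves the loaded local model
    messages=[{"role": "user", "content": "Say this is a test"}],
)
print(completion.choices[0].message.content)
```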
15 changes: 9 additions & 6 deletions docs/docs/examples/palchat.md
@@ -1,5 +1,6 @@
---
title: Nitro with Pal Chat
description: Nitro integration guide for mobile device usage.
---

This guide demonstrates how to use Nitro with Pal Chat, enabling local AI chat capabilities on mobile devices.
@@ -15,15 +16,15 @@ Pal is a mobile app available on the App Store. It offers a customizable chat pl
**1. Start Nitro server**

Open your terminal:
```
```bash title="Run Nitro"
nitro
```

**2. Download Model**

Use these commands to download and save the [Llama2 7B chat model](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main):

```bash
```bash title="Get a model"
mkdir model && cd model
wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true
```
@@ -34,7 +35,7 @@ wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GG

To load the model, use the following command:

```
```bash title="Load model to the server"
curl http://localhost:3928/inferences/llamacpp/loadmodel \
-H 'Content-Type: application/json' \
-d '{
@@ -44,11 +45,13 @@ curl http://localhost:3928/inferences/llamacpp/loadmodel \
}'
```
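
Before wiring up Pal Chat, you can sanity-check the server with a quick request (a sketch; it assumes Nitro's OpenAI-compatible chat endpoint, and the prompt is arbitrary):

```bash title="Test the server"
curl http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```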

**4. Config Pal Chat**
**4. Configure Pal Chat**

In the `OpenAI API Key` field, just type any random text (e.g. key-xxxxxx).

Adjust the `provide custom host` setting under `advanced settings` in Pal Chat to connect with Nitro. Enter your LAN IPv4 address (It should be something like 192.xxx.x.xxx).
Adjust the `provide custom host` setting under `advanced settings` in Pal Chat with your LAN IPv4 address (a series of numbers like 192.xxx.x.xxx).

> For instruction read: [How to find your IP](https://support.microsoft.com/en-us/windows/find-your-ip-address-in-windows-f21a9bbc-c582-55cd-35e0-73431160a1b9)
> For instructions, see: [How to find your IP](https://support.microsoft.com/en-us/windows/find-your-ip-address-in-windows-f21a9bbc-c582-55cd-35e0-73431160a1b9)
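
For a quick terminal check (a sketch; interface names vary by machine):

```bash title="Find your LAN IP"
ipconfig                # Windows
ipconfig getifaddr en0  # macOS (Wi-Fi is usually en0)
hostname -I             # Linux
```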

![PalChat](img/pal.png)

3 changes: 2 additions & 1 deletion docs/docs/features/chat.md
@@ -1,10 +1,11 @@
---
title: Chat Completion
description: Inference engine for chat completion, the same as OpenAI's
---

The Chat Completion feature in Nitro provides a flexible way to interact with any local Large Language Model (LLM).

## Single Request Example
### Single Request Example

To send a single query to your chosen LLM, follow these steps:
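
The collapsed steps boil down to one HTTP call; as a sketch (assuming the default port and a previously loaded model):

```bash title="Single request"
curl http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```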

23 changes: 9 additions & 14 deletions docs/docs/features/cont-batch.md
@@ -1,20 +1,19 @@
---
title: Continuous Batching
description: Nitro's continuous batching combines multiple requests, enhancing throughput.
---

## What is continuous batching?
Continuous batching boosts throughput and minimizes latency in large language model (LLM) inference. This technique groups multiple inference requests, significantly improving GPU utilization.

Continuous batching is a powerful technique that significantly boosts throughput in large language model (LLM) inference while minimizing latency. This process dynamically groups multiple inference requests, allowing for more efficient GPU utilization.
**Key Advantages:**

## Why Continuous Batching?
- Increased Throughput.
- Reduced Latency.
- Efficient GPU Use.

Traditional static batching methods can lead to underutilization of GPU resources, as they wait for all sequences in a batch to complete before moving on. Continuous batching overcomes this by allowing new sequences to start processing as soon as others finish, ensuring more consistent and efficient GPU usage.
**Implementation Insight:**

## Benefits of Continuous Batching

- **Increased Throughput:** Improvement over traditional batching methods.
- **Reduced Latency:** Lower p50 latency, leading to faster response times.
- **Efficient Resource Utilization:** Maximizes GPU memory and computational capabilities.
To evaluate its effectiveness, compare continuous batching with traditional methods. For more details on benchmarking, refer to this [article](https://www.anyscale.com/blog/continuous-batching-llm-inference).

## How to use continuous batching
Nitro's `continuous batching` feature allows you to combine multiple requests for the same model execution, enhancing throughput and efficiency.
@@ -30,8 +29,4 @@ curl http://localhost:3928/inferences/llamacpp/loadmodel \
}'
```

For optimal performance, ensure that the `n_parallel` value is set to match the `thread_num`, as detailed in the [Multithreading](features/multi-thread.md) documentation.

### Benchmark and Compare

To understand the impact of continuous batching on your system, perform benchmarks comparing it with traditional batching methods. This [article](https://www.anyscale.com/blog/continuous-batching-llm-inference) will help you quantify improvements in throughput and latency.
For optimal performance, ensure that the `n_parallel` value is set to match the `thread_num`, as detailed in the [Multithreading](features/multi-thread.md) documentation.
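
To see the batching effect, one rough check is to fire several requests concurrently and compare wall-clock time against sending them one by one (a sketch; it assumes the default port, a loaded model, and `n_parallel` greater than 1):

```bash title="Concurrent requests"
for i in 1 2 3 4; do
  curl -s http://localhost:3928/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Hello"}]}' &
done
wait  # block until all background requests finish
```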