235 changes: 178 additions & 57 deletions docs/docs/engines/engine-extension.mdx
@@ -1,89 +1,210 @@
---
title: Adding a Third-Party Engine to Cortex
description: Cortex supports Engine Extensions to integrate both local inference engines and remote APIs.
---

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

# Guide to Adding a Third-Party Engine to Cortex

## Introduction

This guide outlines the steps to integrate a custom engine with Cortex. We hope this helps developers understand the integration process.

## Implementation Steps

### 1. Implement the Engine Interface

First, create an engine that implements the `EngineI.h` interface. Here's the interface definition:

```cpp
class EngineI {
 public:
  struct RegisterLibraryOption {
    std::vector<std::filesystem::path> paths;
  };

  struct EngineLoadOption {
    // engine
    std::filesystem::path engine_path;
    std::filesystem::path cuda_path;
    bool custom_engine_path;

    // logging
    std::filesystem::path log_path;
    int max_log_lines;
    trantor::Logger::LogLevel log_level;
  };

  struct EngineUnloadOption {
    bool unload_dll;
  };

  virtual ~EngineI() {}

  virtual void RegisterLibraryPath(RegisterLibraryOption opts) = 0;

  virtual void Load(EngineLoadOption opts) = 0;

  virtual void Unload(EngineUnloadOption opts) = 0;

  // Cortex.llamacpp interface methods
  virtual void HandleChatCompletion(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  virtual void HandleEmbedding(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  virtual void LoadModel(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  virtual void UnloadModel(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  virtual void GetModelStatus(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  // Compatibility and model management
  virtual bool IsSupported(const std::string& f) = 0;

  virtual void GetModels(
      std::shared_ptr<Json::Value> jsonBody,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  // Logging configuration
  virtual bool SetFileLogger(int max_log_lines,
                             const std::string& log_path) = 0;
  virtual void SetLogLevel(trantor::Logger::LogLevel logLevel) = 0;
};
```

#### Lifecycle Management

##### RegisterLibraryPath

```cpp
virtual void RegisterLibraryPath(RegisterLibraryOption opts) = 0;
```

This method is called during engine initialization to set up dynamic library search paths. For example, on Linux we still need to use `LD_LIBRARY_PATH` to add CUDA dependencies to the search path.

**Parameters:**

- `opts.paths`: Vector of filesystem paths that the engine should register

**Implementation Requirements:**

- Register provided paths for dynamic library loading
- Handle invalid paths gracefully
- Thread-safe implementation
- No exceptions should escape the method
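
For illustration, here is one way these requirements might be met on Linux by appending the registered paths to `LD_LIBRARY_PATH`. This is a minimal sketch: `MyEngine` is a hypothetical class implementing `EngineI`, not the implementation used by cortex.llamacpp.

```cpp
// Minimal sketch for a hypothetical MyEngine : public EngineI.
// Requires <cstdlib> and <filesystem>; Linux-only for brevity.
void MyEngine::RegisterLibraryPath(RegisterLibraryOption opts) {
#if defined(__linux__)
  try {
    std::string value;
    if (const char* current = std::getenv("LD_LIBRARY_PATH"); current) {
      value = current;
    }
    for (const auto& p : opts.paths) {
      std::error_code ec;
      if (!std::filesystem::exists(p, ec)) {
        continue;  // handle invalid paths gracefully by skipping them
      }
      if (!value.empty()) {
        value += ":";
      }
      value += p.string();
    }
    // A production implementation should guard this with a mutex,
    // since setenv itself is not thread-safe.
    setenv("LD_LIBRARY_PATH", value.c_str(), /*overwrite=*/1);
  } catch (...) {
    // No exceptions may escape this method.
  }
#else
  (void)opts;
#endif
}
```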

##### Load

```cpp
virtual void Load(EngineLoadOption opts) = 0;
```

Initializes the engine with the provided configuration options.

**Parameters:**

- `engine_path`: Base path for engine files
- `cuda_path`: Path to CUDA installation
- `custom_engine_path`: Flag for using custom engine location
- `log_path`: Location for log files
- `max_log_lines`: Maximum number of lines per log file
- `log_level`: Logging verbosity level

**Implementation Requirements:**

- Validate all paths before use
- Initialize engine components
- Set up logging configuration
- Handle missing dependencies gracefully
- Clean initialization state in case of failures
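
To make these requirements concrete, the sketch below shows how a `Load` implementation might validate its paths and configure logging through the interface's own logging methods. `MyEngine`, `load_opts_`, `use_cuda_`, and `initialized_` are hypothetical names used only for illustration.

```cpp
// Minimal sketch for a hypothetical MyEngine : public EngineI.
void MyEngine::Load(EngineLoadOption opts) {
  namespace fs = std::filesystem;

  // Validate all paths before use; keep a clean state on failure.
  if (!fs::exists(opts.engine_path)) {
    initialized_ = false;
    return;
  }

  // Handle missing dependencies gracefully, e.g. fall back to CPU
  // when the CUDA toolkit path is absent.
  use_cuda_ = !opts.cuda_path.empty() && fs::exists(opts.cuda_path);

  // Set up logging with the options supplied by Cortex.
  SetLogLevel(opts.log_level);
  SetFileLogger(opts.max_log_lines, opts.log_path.string());

  // Initialize engine components with the validated configuration.
  load_opts_ = std::move(opts);
  initialized_ = true;
}
```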

##### Unload

```cpp
virtual void Unload(EngineUnloadOption opts) = 0;
```

Performs cleanup and shutdown of the engine.

**Parameters:**

- `unload_dll`: Boolean flag indicating whether to unload dynamic libraries

**Implementation Requirements:**

- Clean up all allocated resources
- Close file handles and connections
- Release memory
- Ensure proper shutdown of running models
- Handle cleanup in a thread-safe manner
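
A matching `Unload` sketch, again with hypothetical members (`models_`, `mutex_`, `initialized_`); the exact cleanup steps depend on what your engine allocates.

```cpp
// Minimal sketch for a hypothetical MyEngine : public EngineI.
// Requires <mutex>; models_ is an illustrative map of loaded models.
void MyEngine::Unload(EngineUnloadOption opts) {
  std::lock_guard<std::mutex> lock(mutex_);  // thread-safe cleanup

  // Ensure running models are shut down before releasing resources.
  for (auto& [name, model] : models_) {
    model->Stop();  // hypothetical per-model shutdown
  }
  models_.clear();  // release memory held by model instances

  CloseFileHandles();  // hypothetical helper: close logs, sockets, etc.

  if (opts.unload_dll) {
    // The dynamic library will be dropped after this call, so make sure
    // nothing still references symbols or memory owned by it.
  }
  initialized_ = false;
}
```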

### 2. Create a Dynamic Library

We recommend using the [dylib library](https://github.com/martin-olivier/dylib) to build your dynamic library. This library provides helpful tools for creating cross-platform dynamic libraries.
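
Cortex loads the engine from this dynamic library through a small set of exported symbols. The snippet below only sketches the idea: the `create_engine` / `destroy_engine` names are assumptions made for illustration, so check the cortex.llamacpp sources for the entry points Cortex actually resolves.

```cpp
// Sketch of exporting a factory function from the engine's shared library.
// NOTE: create_engine / destroy_engine are illustrative names, not
// necessarily the symbols Cortex looks up.
#if defined(_WIN32)
#define ENGINE_API __declspec(dllexport)
#else
#define ENGINE_API __attribute__((visibility("default")))
#endif

extern "C" {
ENGINE_API EngineI* create_engine() {
  return new MyEngine();  // MyEngine implements the EngineI interface
}

ENGINE_API void destroy_engine(EngineI* engine) {
  delete engine;
}
}
```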

### 3. Package Dependencies

Please ensure all dependencies are included with your dynamic library. This allows us to create a single, self-contained package for distribution.

### 4. Publication and Integration

#### 4.1 Publishing Your Engine (Optional)

If you wish to make your engine publicly available, you can publish it through GitHub. For reference, examine the [cortex.llamacpp releases](https://github.com/janhq/cortex.llamacpp/releases) structure:

- Each release tag should represent your version
- Include all variants within the same release
- Cortex will automatically select the most suitable variant or allow users to specify their preferred variant

#### 4.2 Integration with Cortex

Once your engine is ready, we encourage you to:

1. Notify the Cortex team about your engine for potential inclusion in our default supported engines list
2. Allow us to help test and validate your implementation

### 5. Local Testing Guide

To test your engine locally:

1. Create a directory structure following this hierarchy:

```bash
engines/
└── cortex.llamacpp/
    └── mac-arm64/
        └── v0.1.40/
            ├── libengine.dylib
            └── version.txt
```

2. Configure your engine:

- Edit the `~/.cortexrc` file to register your engine name
- Add your model with the appropriate engine field in `model.yaml`

3. Testing:
- Start the engine
- Load your model
- Verify functionality

## Future Development

We're currently working on expanding support for additional release sources to make distribution more flexible.

## Contributing

We welcome suggestions and contributions to improve this integration process. Please feel free to submit issues or pull requests through our repository.
22 changes: 6 additions & 16 deletions engine/cli/commands/server_start_cmd.cc
@@ -1,9 +1,12 @@
#include "server_start_cmd.h"
#include "commands/cortex_upd_cmd.h"
#include "services/engine_service.h"
#include "utils/cortex_utils.h"
#include "utils/engine_constants.h"
#include "utils/file_manager_utils.h"

#if defined(_WIN32) || defined(_WIN64)
#include "utils/widechar_conv.h"
#endif

namespace commands {

@@ -108,22 +111,9 @@ bool ServerStartCmd::Exec(const std::string& host, int port,
    std::cerr << "Could not start server: " << std::endl;
    return false;
  } else if (pid == 0) {
    // No need to configure LD_LIBRARY_PATH for macOS
#if !defined(__APPLE__) || !defined(__MACH__)
    const char* name = "LD_LIBRARY_PATH";
    auto data = getenv(name);
    std::string v;
    if (auto g = getenv(name); g) {
      v += g;
    }
    CTL_INF("LD_LIBRARY_PATH: " << v);
    auto llamacpp_path = file_manager_utils::GetCudaToolkitPath(kLlamaRepo);
    auto trt_path = file_manager_utils::GetCudaToolkitPath(kTrtLlmRepo);
    // Some engines requires to add lib search path before process being created
    EngineService().RegisterEngineLibPath();

    auto new_v = trt_path.string() + ":" + llamacpp_path.string() + ":" + v;
    setenv(name, new_v.c_str(), true);
    CTL_INF("LD_LIBRARY_PATH: " << getenv(name));
#endif
    std::string p = cortex_utils::GetCurrentPath() + "/" + exe;
    execl(p.c_str(), exe.c_str(), "--start-server", "--config_file_path",
          get_config_file_path().c_str(), "--data_folder_path",
5 changes: 2 additions & 3 deletions engine/controllers/engines.cc
@@ -23,10 +23,9 @@ std::string NormalizeEngine(const std::string& engine) {
void Engines::ListEngine(
    const HttpRequestPtr& req,
    std::function<void(const HttpResponsePtr&)>&& callback) const {
  std::vector<std::string> supported_engines{kLlamaEngine, kOnnxEngine,
                                             kTrtLlmEngine};
  Json::Value ret;
  for (const auto& engine : supported_engines) {
  auto engine_names = engine_service_->GetSupportedEngineNames().value();
  for (const auto& engine : engine_names) {
    auto installed_engines =
        engine_service_->GetInstalledEngineVariants(engine);
    if (installed_engines.has_error()) {
30 changes: 30 additions & 0 deletions engine/cortex-common/EngineI.h
@@ -1,14 +1,44 @@
#pragma once

#include <filesystem>
#include <functional>
#include <memory>

#include "json/value.h"
#include "trantor/utils/Logger.h"
class EngineI {
 public:
  struct RegisterLibraryOption {
    std::vector<std::filesystem::path> paths;
  };

  struct EngineLoadOption {
    // engine
    std::filesystem::path engine_path;
    std::filesystem::path cuda_path;
    bool custom_engine_path;

    // logging
    std::filesystem::path log_path;
    int max_log_lines;
    trantor::Logger::LogLevel log_level;
  };

  struct EngineUnloadOption {
    bool unload_dll;
  };

  virtual ~EngineI() {}

  /**
   * Being called before starting process to register dependencies search paths.
   */
  virtual void RegisterLibraryPath(RegisterLibraryOption opts) = 0;

  virtual void Load(EngineLoadOption opts) = 0;

  virtual void Unload(EngineUnloadOption opts) = 0;

  // cortex.llamacpp interface
  virtual void HandleChatCompletion(
      std::shared_ptr<Json::Value> json_body,