
QA: release 1.0.11 #2134

@vansangpfiev

Description

QA details:

Version: v1.0.x-xxx

OS (select one)

  • Windows 11 (online & offline)
  • Ubuntu 24, 22 (online & offline)
  • Mac Silicon OS 14/15 (online & offline)
  • Mac Intel (online & offline)

1. Manual QA (CLI)

Installation

  • it should install with the local installer (default; no internet required during installation, all dependencies bundled)
  • it should install with the network installer
  • it should install 2 binaries (cortex and cortex-server) [mac: binaries in /usr/local/bin] (spot-check below)
  • it should install with correct folder permissions
  • it should install with folders: /engines /logs (no /models folder until model pull)
  • it should install with the Docker image: https://cortex.so/docs/installation/docker/
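
A quick spot-check for the binary and folder items above on mac/Linux (the binary path comes from the note in the list; the data-folder path is an assumption, so use your install's actual folder):

    # both binaries should exist with correct permissions
    ls -l /usr/local/bin/cortex /usr/local/bin/cortex-server
    # /engines and /logs should exist; /models should not until the first pull
    ls ~/cortexcpp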

Data/Folder structures

  • cortex.so models are stored in cortex.so/model_name/variants/, with the .gguf and model.yml files
  • huggingface models are stored in huggingface.co/author/model_name, with the .gguf and model.yml files
  • downloaded models are saved in cortex.db with the right fields: model, author_repo_id, branch_name, path_to_model_yaml (view via SQL; query sketch below)
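
A minimal sketch of the SQL check in the last item, assuming the sqlite3 CLI, a cortex.db in the data folder, and a table named models (path and table name are assumptions):

    sqlite3 ~/cortexcpp/cortex.db \
      "SELECT model, author_repo_id, branch_name, path_to_model_yaml FROM models;"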

Cortex Update

  • cortex -v should output the current version and check for updates (command sketch below)
  • cortex update replaces the app, installer, uninstaller, and binary file (without installing cortex.llamacpp)
  • cortex update should update from ~3-5 versions ago to the latest (+3 to +5 bump)
  • cortex update should update from the previous version to the latest (+1 bump)
  • cortex update -v 1.x.x-xxx should update from the previous version to the specified version
  • cortex update should update from the previous stable version to the latest
  • it should gracefully update when the server is actively running
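
The update paths above can be exercised with the commands named in the list (the version string is illustrative, not a real release):

    cortex -v                     # current version + update check
    cortex update                 # previous/stable version -> latest
    cortex update -v 1.0.9-xxx    # previous version -> the specified version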

Overall / App Shell

  • cortex returns helpful text in a timely way (< 5s); timed check below
  • cortex or cortex -h displays help commands
  • CLI commands should start the API server, if not running [except …]
  • it should correctly log to cortex-cli.log and cortex.log
  • there should be no stdout from inactive shell sessions
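
A rough way to run the timed check referenced above (wall-clock, not a benchmark):

    time cortex -h    # should complete well under the 5 s budget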

Engines

  • llama.cpp should be installed by default
  • it should run gguf models on llamacpp
  • it should list engines
  • it should get engines
  • it should install engines (latest version if not specified)
  • it should install engines (with specified variant and version)
  • it should get default engine
  • it should set default engine (with specified variant/version)
  • it should load engine
  • it should unload engine
  • it should update engine (to latest version)
  • it should update engine (to specified version)
  • it should uninstall engines
  • it should gracefully continue engine installation if interrupted halfway (partial download)
  • it should gracefully handle when users try to CRUD incompatible engines (No variant found for xxx)
  • it should run trtllm models on trt-llm [WIP, not tested]
  • it should handle engine variants [WIP, not tested]
  • it should update engines versions [WIP, not tested]
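
A hedged CLI pass over the list/get/install/uninstall items above; the engines subcommand spellings and the engine name llama-cpp are assumptions that mirror the API routes in section 2:

    cortex engines list                  # all engines and their states
    cortex engines get llama-cpp         # one engine's variant/version details
    cortex engines install llama-cpp     # latest version when none is specified
    cortex engines uninstall llama-cpp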

Server

  • cortex start should start the server and output the localhost URL & port number
  • users can access the API Swagger documentation page at that localhost URL & port
  • cortex start can be configured with parameters (port, logLevel [WIP]): https://cortex.so/docs/cli/start/
  • it should correctly log to the cortex logs (logs/cortex.log, logs/cortex-cli.log)
  • cortex ps should return server status and running models (or no model loaded)
  • cortex stop should stop the server (see the sequence below)
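
The start/status/stop sequence above, using only commands named in the list:

    cortex start    # prints the localhost URL & port; Swagger is served there
    cortex ps       # server status plus running models, or no model loaded
    cortex stop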

Model Pulling

  • Pulling a model should pull the .gguf and model.yml files
  • Model download progress should appear as download bars for each file
  • Model download progress should be accurate (%, total time, download size, speed)

cortex.so

  • it should pull by built-in model_id
  • pulling by model_id should recommend the default variant at the top (set in the HF model.yml)
  • it should pull by built-in model_id:variant

huggingface.co

  • it should pull by HF repo/model ID
  • it should pull by full HF URL (ending in .gguf); examples below
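
The pull shapes from both lists above, sketched with placeholder model names and URL (illustrative, not a recommendation):

    cortex pull tinyllama                              # built-in model_id
    cortex pull tinyllama:1b-gguf                      # model_id:variant
    cortex pull bartowski/Llama-3.2-1B-Instruct-GGUF   # HF repo/model ID
    cortex pull https://huggingface.co/<author>/<repo>/resolve/main/<file>.gguf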

Interrupted Download

  • it should allow the user to interrupt / stop a download
  • pulling again after an interruption should accurately calculate the remainder of the model file size that still needs to be downloaded ("Found unfinished download! Additional XGB needs to be downloaded")
  • it should allow the user to continue downloading the remainder after the interruption

Model Management

  • it should list downloaded models
  • it should get a local model
  • it should update model parameters in model.yaml
  • it should delete a model
  • it should import models with model_id and model_path
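
A hedged pass over the list/get/delete items above; the models subcommand spellings are assumptions that mirror the API routes in section 2:

    cortex models list
    cortex models get <model_id>
    cortex models delete <model_id>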

Model Running

  • cortex run <cortexso model> - if no local models are detected, shows the pull-model menu
  • cortex run - if a local model is detected, runs the local model
  • cortex run - if multiple local models are detected, shows a list of local models (from multiple model sources, e.g. cortexso, HF authors) for users to select from (via regex search)
  • cortex run <invalid model id> should gracefully return "Model not found!"
  • run should autostart the server
  • cortex run <model> starts interactive chat (by default)
  • cortex run <model> -d runs in detached mode
  • cortex models start <model> should start the model
  • terminating stdin or typing exit() should exit the interactive chat (examples below)
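
The run modes referenced above, using only invocations named in the list (model name illustrative):

    cortex run tinyllama         # interactive chat by default
    cortex run tinyllama -d      # detached mode
    cortex models start tinyllama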

Hardware Detection / Acceleration [WIP, no need to QA]

  • it should auto-offload max ngl
  • it should correctly detect available GPUs
  • it should gracefully detect missing dependencies/drivers:
      ◦ CPU extensions (e.g. AVX2, noAVX, AVX-512)
      ◦ GPU acceleration (e.g. CUDA 11, CUDA 12, Vulkan, SYCL, etc.)

Uninstallation / Reinstallation

  • it should uninstall 2 binaries (cortex and cortex-server)
  • it should uninstall with 2 options: delete or keep the data folder
  • it should gracefully uninstall when the server is still running
  • uninstalling should not leave any dangling files (sweep sketch below)
  • uninstalling should not leave any dangling processes
  • it should reinstall without conflicts with existing cortex data folders
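
A rough post-uninstall sweep for the dangling files/processes items above (binary path per the mac note under Installation; adjust per OS):

    pgrep -fl cortex-server                  # expect no output
    ls /usr/local/bin/cortex* 2>/dev/null    # expect nothing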

--

2. API QA

Checklist for each endpoint

  • Upon cortex start, the API page is served at the localhost:port endpoint
  • Endpoints should support the parameters stated in the API reference (towards OpenAI compatibility)
  • https://cortex.so/api-reference is updated

Endpoints

Chat Completions

  • Chat completion: POST /v1/chat/completions (assuming the standard OpenAI-compatible route)

Engines

  • List engines: GET /v1/engines
  • Get engine: GET /v1/engines/{name}
  • Install engine: POST /v1/engines/install/{name}
  • Get default engine variant/version: GET /v1/engines/{name}/default
  • Set default engine variant/version: POST /v1/engines/{name}/default
  • Load engine: POST /v1/engines/{name}/load
  • Unload engine: DELETE /v1/engines/{name}/load
  • Update engine: POST /v1/engines/{name}/update
  • Uninstall engine: DELETE /v1/engines/install/{name}
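
The engine routes above can be smoke-tested with curl; the port 39281 and the engine name llama-cpp are assumptions (use whatever cortex start reports):

    curl http://127.0.0.1:39281/v1/engines
    curl http://127.0.0.1:39281/v1/engines/llama-cpp
    curl -X POST http://127.0.0.1:39281/v1/engines/install/llama-cpp
    curl -X DELETE http://127.0.0.1:39281/v1/engines/install/llama-cpp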

Pulling Models

  • Pull model: POST /v1/models/pull starts download (websockets)
  • Pull model: websockets /events emitted
  • Stop model download: DELETE /v1/models/pull (websockets)
  • Stop model download: websockets /events stopped
  • Import model: POST /v1/models/import
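
A sketch of the pull/stop pair above; the JSON body field model is an assumption, and progress is emitted over the websocket /events channel rather than in these responses:

    curl -X POST http://127.0.0.1:39281/v1/models/pull \
      -H "Content-Type: application/json" \
      -d '{"model": "tinyllama:1b-gguf"}'     # model name illustrative
    curl -X DELETE http://127.0.0.1:39281/v1/models/pull \
      -H "Content-Type: application/json" \
      -d '{"model": "tinyllama:1b-gguf"}'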

Running Models

  • List models: GET /v1/models
  • Start model: POST /v1/models/start
  • Stop model: POST /v1/models/stop
  • Get model: GET /v1/models/{id}
  • Delete model: DELETE /v1/models/{id}
  • Update model: PATCH /v1/models/{model} updates model.yaml params
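
The model lifecycle routes above as curl calls (same port assumption; the model body field and model name are assumptions):

    curl http://127.0.0.1:39281/v1/models
    curl -X POST http://127.0.0.1:39281/v1/models/start \
      -H "Content-Type: application/json" \
      -d '{"model": "tinyllama:1b-gguf"}'
    curl -X POST http://127.0.0.1:39281/v1/models/stop \
      -H "Content-Type: application/json" \
      -d '{"model": "tinyllama:1b-gguf"}'
    curl http://127.0.0.1:39281/v1/models/tinyllama:1b-gguf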

Server

  • CORS [WIP]
  • health: GET /healthz
  • terminate server: DELETE /processManager/destroy
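
Both server routes above are plain HTTP calls (same port assumption as before):

    curl http://127.0.0.1:39281/healthz
    curl -X DELETE http://127.0.0.1:39281/processManager/destroy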

Test list for reference:
