This repository was archived by the owner on Jul 4, 2025. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 181
This repository was archived by the owner on Jul 4, 2025. It is now read-only.
QA: release 1.0.11 #2134
Copy link
Copy link
Closed
Labels
type: QA checklistQA checklistQA checklist
Description
QA details:
Version: v1.0.x-xxx
OS (select one)
- Windows 11 (online & offline)
- Ubuntu 24, 22 (online & offline)
- Mac Silicon OS 14/15 (online & offline)
- Mac Intel (online & offline)
1. Manual QA (CLI)
Installation
- it should install with local installer (default; no internet required during installation, all dependencies bundled)
- it should install with network installer
- it should install 2 binaries (cortex and cortex-server) [mac: binaries in
/usr/local/bin] - it should install with correct folder permissions
- it should install with folders: /engines /logs (no /models folder until model pull)
- It should install with Docker image https://cortex.so/docs/installation/docker/
Data/Folder structures
- cortex.so models are stored in
cortex.so/model_name/variants/, with .gguf and model.yml file - huggingface models are stored
huggingface.co/author/model_namewith .gguf and model.yml file - downloaded models are saved in cortex.db with the right fields:
model,author_repo_id,branch_name,path_to_model_yaml(view via SQL)
Cortex Update
- cortex -v should check output current version and check for updates
- cortex update replaces the app, installer, uninstaller and binary file (without installing cortex.llamacpp)
-
cortex updateshould update from ~3-5 versions ago to latest (+3 to 5 bump) -
cortex updateshould update from the previous version to latest (+1 bump) -
cortex update -v 1.x.x-xxxshould update from the previous version to specified version -
cortex updateshould update from previous stable version to latest - it should gracefully update when server is actively running
Overall / App Shell
- cortex returns helpful text in a timely* way (< 5s)
-
cortexorcortex -hdisplays help commands - CLI commands should start the API server, if not running [except
- it should correctly log to cortex-cli.log and cortex.log
- There should be no stdout from inactive shell session
Engines
- llama.cpp should be installed by default
- it should run gguf models on llamacpp
- it should list engines
- it should get engines
- it should install engines (latest version if not specified)
- it should install engines (with specified variant and version)
- it should get default engine
- it should set default engine (with specified variant/version)
- it should load engine
- it should unload engine
- it should update engine (to latest version)
- it should update engine (to specified version)
- it should uninstall engines
- it should gracefully continue engine installation if interrupted halfway (partial download)
- it should gracefully handle when users try to CRUD incompatible engines (No variant found for xxx)
- it should run trtllm models on trt-llm [WIP, not tested]
- it shoud handle engine variants [WIP, not tested]
- it should update engines versions [WIP, not tested]
Server
-
cortex startshould start server and output localhost URL & port number - users can access API Swagger documentation page at localhost URL & port number
-
cortex startcan be configured with parameters (port, logLevel [WIP]) https://cortex.so/docs/cli/start/ - it should correctly log to cortex logs (logs/cortex.log, logs/cortex-cli.log)
-
cortex psshould return server status and running models (or no model loaded) -
cortex stopshould stop server
Model Pulling
- Pulling a model should pull .gguf and model.yml file
- Model download progress should appear as download bars for each file
- Model download progress should be accurate (%, total time, download size, speed)
cortex.so
- it should pull by built in model_ID
- pull by model_ID should recommend default variant at the top (set in HF model.yml)
- it should pull by built-in model_id:variant
huggingface.co
- it should pull by HF repo/model ID
- it should pull by full HF url (ending in .gguf)
Interrupted Download
- it should allow user to interrupt / stop download
- pulling again after interruption should accurately calculates remainder of model file size neeed to be downloaded (
Found unfinished download! Additional XGB needs to be downloaded) - it should allow to continue downloading the remainder after interruption
Model Management
- it should list downloaded models
- it should get a local model
- it should update model parameters in model.yaml
- it should delete a model
- it should import models with model_id and model_path
Model Running
-
cortex run <cortexso model>- if no local models detected, showspullmodel menu -
cortex run- if local model detected, runs the local model -
cortex run- if multiple local models detected, shows list of local models (from multiple model sources eg cortexso, HF authors) for users to select (via regex search) -
cortex run <invalid model id>should return gracefullyModel not found! - run should autostart server
-
cortex run <model>starts interactive chat (by default) -
cortex run <model> -druns in detached mode -
cortex models start <model> - terminate StdIn or
exit()should exit interactive chat
Hardware Detection / Acceleration [WIP, no need to QA]
- it should auto offload max ngl
- it should correctly detect available GPUs
- it should gracefully detect missing dependencies/drivers
CPU Extension (e.g. AVX-2, noAVX, AVX-512)
GPU Acceleration (e.g. CUDA11, CUDA12, Vulkan, sycl, etc)
Uninstallation / Reinstallation
- it should uninstall 2 binaries (cortex and cortex-server)
- it should uninstall with 2 options to delete or not delete data folder
- it should gracefully uninstall when server is still running
- uninstalling should not leave any dangling files
- uninstalling should not leave any dangling processes
- it should reinstall without having conflict issues with existing cortex data folders
--
2. API QA
Checklist for each endpoint
- Upon
cortex start, API page is displayed at localhost:port endpoint - Endpoints should support the parameters stated in API reference (towards OpenAI Compatibility)
- https://cortex.so/api-reference is updated
Endpoints
Chat Completions
- POST
v1/chat/completions - Cortex supports Function Calling feat: Cortex supports Function Calling #295
Engines
- List engines: GET
/v1/engines - Get engine: GET
/v1/engines/{name} - Install engine: POST
/v1/engines/install/{name} - Get default engine variant/version: GET
v1/engines/{name}/default - Set default engine variant/version: POST
v1/engines/{name}/default - Load engine: POST
v1/engines/{name}/load - Unload engine: DELETE
v1/engines/{name}/load - Update engine: POST
v1/engines/{name}/update - uninstall engine: DELETE
/v1/engines/install/{name}
Pulling Models
- Pull model: POST
/v1/models/pullstarts download (websockets) - Pull model:
websockets /eventsemitted - Stop model download: DELETE
/v1/models/pull(websockets) - Stop model download:
websockets /eventsstopped - Import model: POST
v1/models/import
Running Models
- List models: GET
v1/models - Start model: POST
/v1/models/start - Stop model: POST
/v1/models/stop - Get model: GET
/v1/models/{id} - Delete model: DELETE
/v1/models/{id} - Update model: PATCH
/v1/models/{model}updates model.yaml params
Server
- CORs [WIP]
- health: GET
/healthz - terminate server: DELETE
/processManager/destroy
Test list for reference:
- feat: e2e tests for APIs #1357 e2e tests for APIs in CI
- epic: Basic Automated test for Installation & Inference on compatible Hardware & OS #1147, epic: Structure Manual QA for cortex.cpp #1225 for starting QA list
Metadata
Metadata
Assignees
Labels
type: QA checklistQA checklistQA checklist