SkillOpt: local-first skill optimization for agent workflows

Optimize a skill document / system prompt like you would optimize code: iterate, evaluate, and keep the model weights frozen.

Python 3.10+ | License: MIT | Paper

This open-source fork is focused on local AI workflows:

  • run SkillOpt against any OpenAI-compatible local server
  • use the included DotNetDebug example for cheap end-to-end smoke tests
  • keep private configs, outputs, and secrets out of git
  • document the local path first, while still supporting cloud backends

Important: SkillOpt optimizes a skill document / system prompt, not model weights. Your model stays frozen; SkillOpt improves the instructions it receives.

What you can do with this repo

  • optimize prompts/skills for benchmarked agent tasks
  • compare skill revisions with validation-gated training loops
  • run local experiments through openai_compat
  • inspect generated skills, histories, patches, and evaluation summaries
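
As a rough illustration of what "validation-gated" means in the list above, the sketch below keeps a candidate skill only when it scores higher on a validation set than the best skill so far. This is a minimal sketch under stated assumptions, not SkillOpt's actual training loop: `validation_gated_search`, `propose`, and `score_on_val` are hypothetical names, and in practice the proposal and scoring steps would call a model.

```python
from typing import Callable, List, Tuple


def validation_gated_search(
    seed_skill: str,
    propose: Callable[[str], str],
    score_on_val: Callable[[str], float],
    num_steps: int,
) -> Tuple[str, List[float]]:
    """Keep a candidate skill only if it beats the best validation score so far.

    All names here are hypothetical; this is not SkillOpt's API.
    """
    best_skill = seed_skill
    best_score = score_on_val(seed_skill)
    history = [best_score]
    for _ in range(num_steps):
        candidate = propose(best_skill)
        candidate_score = score_on_val(candidate)
        # The gate: a revision that does not improve validation is discarded.
        if candidate_score > best_score:
            best_skill, best_score = candidate, candidate_score
        history.append(best_score)
    return best_skill, history
```

Because rejected candidates never replace the current best, the recorded best score cannot decrease from one step to the next.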

Local-first quick start

1) Clone and install

Requirements: Python 3.10+

git clone https://github.com/mitkox/SkillOpt.git
cd SkillOpt

python -m venv .venv
source .venv/bin/activate

pip install -e .

# Optional extras:
pip install -e ".[webui]"
pip install -e ".[alfworld]"

If you install the ALFWorld extra, also download its assets:

alfworld-download

2) Point SkillOpt at your local model server

Copy the environment template and load it:

cp .env.example .env
set -a
source .env
set +a

The default local workflow expects an OpenAI-compatible endpoint such as llama.cpp server, vLLM, LM Studio, Ollama's OpenAI bridge, or your own local server.

export OPENAI_COMPAT_BASE_URL="http://localhost:8000/v1"
export OPENAI_COMPAT_API_KEY="local"
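
Before starting a run, it can help to confirm that the endpoint answers at all. The sketch below assumes the server implements the OpenAI-style `GET /models` route under the base URL, which many (not necessarily all) OpenAI-compatible servers do. The helper names are hypothetical and are not part of SkillOpt.

```python
import json
import urllib.request
from typing import List


def models_url(base_url: str) -> str:
    """Build the model-listing URL from a base URL such as http://localhost:8000/v1."""
    return base_url.rstrip("/") + "/models"


def list_model_ids(payload: dict) -> List[str]:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}, ...]} payload."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str, api_key: str, timeout: float = 5.0) -> List[str]:
    """Query the server; assumes it exposes GET <base_url>/models."""
    request = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return list_model_ids(json.load(response))
```

Calling `fetch_model_ids("http://localhost:8000/v1", "local")` against a running server should return the model names it serves, which is a quick way to check that the configured model name matches.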

The included local sample config is:

  • config: configs/dotnetdebug/local_mitko.yaml
  • backend: openai_compat
  • default model name: mitko
  • sample dataset: data/dotnetdebug/tasks.json
  • seed skill: skillopt/envs/dotnetdebug/skills/initial.md

If your server exposes a different model name, change model.optimizer and model.target in the config or override them with --cfg-options.
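
To make the dotted-key override idea concrete, here is a minimal sketch of how `key.subkey=value` strings could be merged into a nested config dictionary. It illustrates the general technique only. SkillOpt's real `--cfg-options` parser may behave differently, and `apply_overrides` and `parse_value` are hypothetical helpers.

```python
import ast
from typing import Any, Dict, Iterable


def parse_value(raw: str) -> Any:
    """Interpret numbers, booleans, and similar literals; fall back to the raw string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_overrides(config: Dict[str, Any], overrides: Iterable[str]) -> Dict[str, Any]:
    """Apply 'a.b.c=value' overrides in place (hypothetical helper, not SkillOpt's API)."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw_value)
    return config
```

With this sketch, `apply_overrides(cfg, ["model.target=my-model", "env.limit=2"])` would set `cfg["model"]["target"]` to the string `"my-model"` and `cfg["env"]["limit"]` to the integer `2`.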

3) Run the included smoke test

This is the fastest way to verify the local setup end to end.

python scripts/train.py \
  --config configs/dotnetdebug/local_mitko.yaml \
  --cfg-options \
    train.num_epochs=1 \
    train.batch_size=2 \
    gradient.minibatch_size=2 \
    gradient.analyst_workers=1 \
    env.workers=1 \
    env.limit=2 \
    optimizer.learning_rate=2 \
    env.out_root=outputs/dotnetdebug_smoke

Inspect the main artifact at:

  • outputs/dotnetdebug_smoke/best_skill.md

Other useful artifacts:

  • outputs/dotnetdebug_smoke/history.json
  • outputs/dotnetdebug_smoke/summary.json (if present)
  • outputs/dotnetdebug_smoke/steps/

4) Evaluate a trained skill

python scripts/eval_only.py \
  --config configs/dotnetdebug/local_mitko.yaml \
  --skill outputs/dotnetdebug_smoke/best_skill.md \
  --split test \
  --cfg-options \
    env.limit=2 \
    env.workers=1 \
    env.out_root=outputs/dotnetdebug_eval_smoke

Optional cloud backends

Local is the default path in this fork, but SkillOpt also supports hosted backends and a separate Qwen-compatible local backend.

Azure OpenAI

export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"

# Option 1: API key auth
export AZURE_OPENAI_API_KEY="your-key"

# Option 2: Azure CLI auth
export AZURE_OPENAI_AUTH_MODE="azure_cli"

OpenAI

export OPENAI_API_KEY="sk-..."

Anthropic Claude

export ANTHROPIC_API_KEY="sk-ant-..."

Qwen-compatible local backend

export QWEN_CHAT_BASE_URL="http://localhost:8000/v1"
export QWEN_CHAT_MODEL="Qwen/Qwen3.5-4B"

Supported benchmarks

Benchmark               Type                 Config
SearchQA                QA                   configs/searchqa/default.yaml
ALFWorld                Embodied agent       configs/alfworld/default.yaml
DocVQA                  Document QA          configs/docvqa/default.yaml
LiveMathematicianBench  Math                 configs/livemathematicianbench/default.yaml
SpreadsheetBench        Code generation      configs/spreadsheetbench/default.yaml
OfficeQA                Tool-augmented QA    configs/officeqa/default.yaml
DotNetDebug             C# debugging example configs/dotnetdebug/default.yaml

Data preparation

SkillOpt expects data in a split directory with train/, val/, and test/ subdirectories, each containing a JSON file such as items.json.

data/my_split/
├── train/items.json
├── val/items.json
└── test/items.json

Each JSON file is an array of task items. The exact schema depends on the benchmark. For example, SearchQA items look like:

[
  {
    "id": "unique_item_id",
    "question": "Who wrote the novel ...",
    "context": "[DOC] relevant passage text ...",
    "answers": ["expected answer"]
  }
]

See skillopt/envs/<benchmark>/dataloader.py for benchmark-specific formats.

Note: Most benchmark datasets are not included in this repository. The bundled exception is data/dotnetdebug/tasks.json, which exists specifically to support a runnable local smoke test.

Common CLI arguments

Argument       Description                   Example
--config       Benchmark config YAML         configs/dotnetdebug/local_mitko.yaml
--split_dir    Path to data split directory  /path/to/split
--skill        Skill document to evaluate    outputs/my_run/best_skill.md
--split        Split to evaluate             test
--cfg-options  Inline config overrides       env.limit=2 env.workers=1

Output structure

Each run writes to a structured output directory:

outputs/<run_name>/
├── config.json             # Flattened runtime config
├── history.json            # Per-step training history
├── runtime_state.json      # Resume checkpoint
├── best_skill.md           # Best validated skill document
├── skills/skill_vXXXX.md   # Skill snapshot per step
├── steps/step_XXXX/        # Per-step artifacts
├── slow_update/epoch_XX/   # Slow-update logs
└── meta_skill/epoch_XX/    # Meta-skill logs

Re-running the same command resumes from the last completed step when possible.

WebUI

Launch the optional monitoring dashboard:

python -m skillopt_webui.app

Common flags:

Flag     Default  Description
--port   7860     Server port
--host   0.0.0.0  Bind address
--share  off      Create a public Gradio share link

Research background

This repo is grounded in the original SkillOpt research. If you want the paper/demo context, see:

Citation

@article{skillopt2026,
  title={SKILLOPT: Executive Strategy for Self-Evolving Agent Skills},
  author={SkillOpt Team},
  year={2026}
}

About

SkillOpt with local AI is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_skill.md artifacts.
