nbcraft

nb — one CLI for image and video generation across Gemini, DashScope (wan), Volcengine Ark (Seedream / Seedance), and OpenAI gpt-image-2. Switch backends with --model, keep one history, no SDK juggling.

pip install nbcraft
nb config set api_key $GEMINI_API_KEY
nb gen -p "a tiny astronaut surfing a banana wave, neon dusk" --ar 16:9 --res 2K

Why

Each provider ships a different SDK, request shape, polling protocol, and quirk. nbcraft papers over that:

  • One nb gen for 8 image models — pick speed, price, region, or text-rendering strength.
  • One nb video gen for Seedance 2.0 with submit/poll/download in a single call.
  • One history (~/.nbcraft/history.json) tracks prompts, models, outputs across all backends.
  • No accounts you don't already have — bring whichever API keys you have; unused backends stay dormant.

Models

Image — nb gen --model <alias>

| Alias | Backend | Model | Notes |
|---|---|---|---|
| pro (default) | Gemini | gemini-3-pro-image-preview | High quality |
| flash | Gemini | gemini-3.1-flash-image-preview | Faster, cheaper |
| wan-pro | DashScope (Alibaba Cloud) | wan2.7-image-pro | Text-to-image up to 4K, edits up to 2K, 9 reference images |
| wan | DashScope (Alibaba Cloud) | wan2.7-image | Up to 2K, faster |
| seedream | Volcengine Ark | doubao-seedream-4-5-251128 | 4K, 14 reference images, top text rendering |
| seedream-lite | Volcengine Ark | doubao-seedream-5.0-lite | Latest 5.0, faster and cheaper |
| seedream-legacy | Volcengine Ark | doubao-seedream-4-0-250828 | 4.0 fallback |
| gpt | OpenAI | gpt-image-2 | 4K, ~99% character-level text accuracy |

Video — nb video gen --model <alias>

| Alias | Backend | Model | Notes |
|---|---|---|---|
| seedance (default) | Volcengine Ark | doubao-seedance-2-0-260128 | High quality, ~2-3 min/clip |
| seedance-fast | Volcengine Ark | doubao-seedance-2-0-fast-260128 | ~30-60s/clip, ~36% cheaper |

Install

# Stable
pip install nbcraft

# Or from source
git clone https://github.com/jieyefriic/nbcraft
cd nbcraft
pip install -e .

Requires Python 3.10+.

Configure API keys

nbcraft reads keys from ~/.nbcraft/config.json first, then environment variables. Configure only the backends you actually use.

# Gemini (Google AI Studio: https://aistudio.google.com/app/apikey)
nb config set api_key <key>          # or export GEMINI_API_KEY

# DashScope, Alibaba Cloud (https://dashscope.console.aliyun.com)
nb config set dashscope_api_key <key>  # or export DASHSCOPE_API_KEY

# Volcengine Ark (https://console.volcengine.com/ark)
nb config set ark_api_key <key>        # or export ARK_API_KEY

# OpenAI (https://platform.openai.com/api-keys)
nb config set openai_api_key <key>     # or export OPENAI_API_KEY
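The lookup order described above (config file first, then environment variable) can be sketched in Python. This is an illustrative sketch, not nbcraft's actual code: `load_config`, `resolve_key`, and `ENV_NAMES` are hypothetical names, and the sketch assumes config.json is a flat JSON object. Only the config keys and environment variable names are taken from the commands above.

```python
import json
import os
from pathlib import Path

# Config key -> environment variable, as listed in the commands above.
ENV_NAMES = {
    "api_key": "GEMINI_API_KEY",
    "dashscope_api_key": "DASHSCOPE_API_KEY",
    "ark_api_key": "ARK_API_KEY",
    "openai_api_key": "OPENAI_API_KEY",
}


def load_config(path=None):
    """Return the config file as a dict, or {} if it does not exist.

    Assumption: the file is a flat JSON object of key/value pairs.
    """
    path = Path(path) if path else Path.home() / ".nbcraft" / "config.json"
    if not path.is_file():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def resolve_key(name, config, env=None):
    """Hypothetical helper: the config value wins, then the environment."""
    env = os.environ if env is None else env
    return config.get(name) or env.get(ENV_NAMES[name])
```

A backend whose key resolves to `None` is simply one you have not configured; the other backends are unaffected.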

Volcengine Ark needs extra setup ⚠️

An API key alone is not enough. In the Ark console:

  1. Activate the model service: 系统管理 (System Management) → 开通管理 (Activation Management) → 视觉大模型 (Vision Models) → activate Doubao-Seedream (and Seedance for video)
  2. (Optional) Create an inference endpoint if you prefer endpoint IDs: 在线推理 (Online Inference) → 创建推理接入点 (Create Inference Endpoint) → copy ep-xxx, then nb config set ark_endpoint_id ep-xxx
  3. Generate an API key: API Key 管理 (API Key Management) → 创建 (Create)

If you skip step 1, the first call fails with InvalidEndpointOrModel.NotFound.

OpenAI proxy / custom base URL

nb config set openai_base_url https://your-proxy/v1

Quickstart

Image

# Gemini default
nb gen -p "a watercolor portrait of a librarian cat"

# Reference image + aspect ratio + resolution
nb gen -p "make this scene at sunset" -i photo.jpg --ar 16:9 --res 2K -o sunset.png

# Seedream 4.5 — best for posters with embedded Chinese text
nb gen -p "promotional poster, headline 春日上新, soft pastel" --model seedream --ar 3:4 --res 4K

# GPT Image 2 — strong character-level text + agentic reasoning
nb gen -p "minimalist logo with the words Stay Curious" --model gpt --ar 1:1 --res 2K

# Multi-reference fusion (Seedream up to 14)
nb gen -p "outfit from img 1, pose from img 2" -i a.png -i b.png --model seedream

# Generate 4 variations in parallel
nb gen -p "a quiet study room" --repeat 4 -o studies.png
# → studies_1.png, studies_2.png, studies_3.png, studies_4.png

# Prompt from a file
nb gen -p @prompt.txt --model wan-pro --res 4K
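The numbered file names produced by --repeat in the example above follow a simple stem-plus-index pattern. A minimal Python sketch of that pattern is below; `repeat_outputs` is a hypothetical helper, not part of nbcraft, and returning the path unchanged for a single image is an assumption.

```python
from pathlib import Path


def repeat_outputs(output, repeat):
    """Hypothetical helper: expand one -o path into numbered variants."""
    path = Path(output)
    if repeat <= 1:
        # Assumption: a single image keeps the name passed to -o.
        return [str(path)]
    return [
        str(path.with_name(f"{path.stem}_{i}{path.suffix}"))
        for i in range(1, repeat + 1)
    ]


print(repeat_outputs("studies.png", 4))
# ['studies_1.png', 'studies_2.png', 'studies_3.png', 'studies_4.png']
```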

Video (Seedance 2.0)

# Text-to-video, 5s, 720p, no audio (default)
nb video gen -p "a shiba inu turning under cherry blossoms, slow motion" --ar 16:9

# Image-to-video with first-frame
nb video gen -p "camera dollies in, then orbits the product" -i product.png --duration 8

# Cinema-grade 1080p with synced audio
nb video gen -p "a barista narrates over morning espresso" --res 1080p --duration 10 --audio

# Fast iteration draft
nb video gen -p "rough concept" --model seedance-fast --duration 4

# Submit only (don't wait); check later
nb video gen -p "..." --no-poll
nb video status <task_id>
nb video download <task_id> -o final.mp4

⚠️ Generated videos expire 24 hours after they finish. With --poll (default) they're downloaded immediately; with --no-poll, run nb video download before the window closes.
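If you submit with --no-poll, it can help to track the download deadline yourself. The sketch below only does the arithmetic for the 24-hour window stated above. It assumes you already know when the task finished; how nb video status reports that time is not covered here, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=24)  # expiry window stated above


def download_deadline(finished_at):
    """Latest moment the finished video can still be downloaded."""
    return finished_at + RETENTION


def time_left(finished_at, now=None):
    """Remaining time before expiry, clamped at zero once the window closes."""
    now = datetime.now(timezone.utc) if now is None else now
    return max(download_deadline(finished_at) - now, timedelta(0))


finished = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(download_deadline(finished).isoformat())
# 2025-01-02T12:00:00+00:00
```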

Batch image jobs (Gemini only)

# tasks.jsonl
# {"key": "img1", "prompt": "...", "images": ["ref.png"], "ar": "3:4", "res": "2K"}
# {"key": "img2", "prompt": "..."}

nb batch submit -f tasks.jsonl --poll -o ./output/
nb batch list
nb batch status <job_name>
nb batch download <job_name> -o ./output/
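For more than a handful of tasks, the tasks.jsonl file can be generated rather than typed by hand. The Python sketch below emits one JSON object per line using the field names from the example above (key, prompt, images, ar, res). Treating key and prompt as required and the rest as optional is an assumption based on those two example rows; `tasks_to_jsonl` and `write_tasks` are hypothetical helpers.

```python
import json
from pathlib import Path

REQUIRED = {"key", "prompt"}  # assumption: both example rows carry these


def tasks_to_jsonl(tasks):
    """Serialize task dicts to JSON Lines text, one object per line."""
    lines = []
    for task in tasks:
        missing = REQUIRED - task.keys()
        if missing:
            raise ValueError(f"task is missing {sorted(missing)}")
        # ensure_ascii=False keeps non-ASCII prompt text readable in the file.
        lines.append(json.dumps(task, ensure_ascii=False))
    return "\n".join(lines) + "\n"


def write_tasks(tasks, path="tasks.jsonl"):
    """Write the serialized tasks to disk and return the number of tasks."""
    Path(path).write_text(tasks_to_jsonl(tasks), encoding="utf-8")
    return len(tasks)
```

Calling `write_tasks([...])` and then `nb batch submit -f tasks.jsonl` keeps the task list in version-controlled code instead of a hand-edited file.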

Reverse-analyze an image (Gemini only)

# Infer HTML structure from a UI screenshot
nb rev struct -i screenshot.png -o structure.md

# Describe an image
nb rev desc -i photo.png -p "analyze the color palette"

History and stats

nb history                    # last 20
nb history -s "poster"        # filter by prompt
nb history <record_id>        # full JSON detail
nb stats                      # totals, success/fail, monthly/daily

Command reference

| Command | What it does |
|---|---|
| nb gen | Generate image(s) from a text prompt |
| nb video gen | Generate video (Seedance 2.0): submit, poll, and download |
| nb video status <id> | Check Seedance task status |
| nb video download <id> | Download a finished Seedance video |
| nb batch submit/status/download/list | Gemini Batch API |
| nb rev struct | UI screenshot → inferred HTML structure (Gemini) |
| nb rev desc | General image description (Gemini) |
| nb history | Browse past generations |
| nb config show / set | Manage configuration |
| nb stats | Usage statistics |

Run nb <command> --help for full options.

Defaults you can preset

nb config set default_model flash       # Picked when --model is omitted
nb config set default_ar 3:4
nb config set default_res 2K
nb config set output_dir ./out
nb config set poll_interval 20          # Batch poll seconds
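The effect of default_model can be pictured as a three-step fallback: an explicit --model flag, then the configured default, then the built-in default. The sketch below is a guess at that precedence, not nbcraft's implementation; `effective_model` is a hypothetical helper, and "pro" is used as the built-in because it is the alias marked "(default)" in the image model table.

```python
BUILTIN_DEFAULT_MODEL = "pro"  # alias marked "(default)" in the image table


def effective_model(cli_model, config):
    """Hypothetical helper: --model flag, then default_model, then built-in."""
    return cli_model or config.get("default_model") or BUILTIN_DEFAULT_MODEL


print(effective_model(None, {"default_model": "flash"}))
# flash
```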

Backend caveats at a glance

  • Reference image limits: Gemini 14, DashScope wan* 9, Seedream 14, OpenAI gpt 16
  • nb batch is Gemini-only (other providers don't expose comparable batch APIs)
  • nb rev is Gemini-only (the image-gen models don't do image→text)
  • 4K: native on Gemini, wan-pro (text-to-image only), all Seedream models, and gpt; a 4K request to wan is downgraded to 2K
  • Negative prompt: native on DashScope, folded into the main prompt for Seedream/gpt, ignored by Gemini
  • Volcengine Ark: API key alone is not enough — activate the model service in the console first
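Two of the caveats above (reference image limits and the wan 4K downgrade) are easy to check before spending an API call. The Python sketch below encodes them as lookup tables. The limits and aliases come from this document; `check_refs` and `effective_res` are hypothetical helpers, and raising an error on too many references is an assumption about how a pre-flight check might behave, not a description of what nb does.

```python
# Reference image limits per backend, from the caveat list above.
REF_LIMITS = {"gemini": 14, "dashscope": 9, "ark": 14, "openai": 16}

# Image alias -> backend, from the image model table.
BACKEND = {
    "pro": "gemini",
    "flash": "gemini",
    "wan-pro": "dashscope",
    "wan": "dashscope",
    "seedream": "ark",
    "seedream-lite": "ark",
    "seedream-legacy": "ark",
    "gpt": "openai",
}


def check_refs(alias, n_refs):
    """Return the limit for this alias, or raise if n_refs exceeds it."""
    limit = REF_LIMITS[BACKEND[alias]]
    if n_refs > limit:
        raise ValueError(f"{alias} accepts at most {limit} reference images, got {n_refs}")
    return limit


def effective_res(alias, res):
    """Resolution actually used: a 4K request to wan becomes 2K."""
    if alias == "wan" and res == "4K":
        return "2K"
    return res
```

The wan-pro restriction (4K for text-to-image only) is left out because the document does not say what happens when a 4K edit is requested.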

For Claude Code users

This repo also ships SKILL.md, a Claude Code skill description that lets Claude invoke nb automatically when you ask it to generate images. Install the skill once and Claude will pick the right model and arguments based on the request.

Development

git clone https://github.com/jieyefriic/nbcraft
cd nbcraft
pip install -e ".[dev]"

pytest                   # run tests
ruff check .             # lint
ruff format .            # format

See CONTRIBUTING.md for the full workflow.

License

MIT
