Releases: genli-ai/market-research-skills
v1.7.1 — strict-YAML-safe descriptions (skills.sh detects all 4)
Fixed
- `verifying` & `analyst-research`: frontmatter `description` is now valid under strict YAML parsers. Both previously used a single-line unquoted scalar containing `: ` (colon-space) at `Triggers:` / `Covers five scenarios:` / `Not for:`, which strict parsers reject (`mapping values are not allowed here`). Claude Code's own loader is lenient — so all four skills always worked as a plugin — but skills.sh / `npx skills` (and other strict-YAML terminals) detected only 2 of 4. Converted both to `>-` folded block scalars (same style as `local-vault`). Description text is byte-identical; triggering behavior unchanged.
Added
- README Option 6 — `npx skills` / skills.sh cross-terminal install:
```bash
npx skills add genli-ai/market-research-skills -a claude-code -s '*'
```
The repo is now indexed by skills.sh (GitHub topics `skills-sh` / `claude-skill`). This path resolves the latest from `main` — no version to pin.
All four skill zips are attached (repo-wide snapshot at v1.7.1).
🤖 Generated with Claude Code
v1.7.0 — local-vault: PDF plain-text fallback + cached-whisper reuse
local-vault
Added
- PDF plain-text fallback. When
pymupdf4llmcrashes mid-extraction (commonly a missing-font error like MuPDFno font file for digest), the converter now falls back to a local PyMuPDF plain-text pass (page.get_text()) before resorting to cloud MinerU. A font glitch no longer forces a cross-border cloud round-trip. The result is markedconverted_by: pymupdf (plain text)so it's identifiable as a degraded artifact. - Reuse an already-downloaded whisper model without re-asking. When
.envhas no model choice yet but a model's weights are already in the user-global HuggingFace cache, the tool auto-selects the best cached tier (turbo > large-v3 > small > tiny) and proceeds — no menu, no re-download, works non-interactively. The 'show download sizes + ask' prompt only appears when there is actually something to download.
Changed
- A
pymupdf4llmerror on a digital PDF now logs 'trying plain-text fallback first' instead of going straight to MinerU.
Full snapshot of the repo at v1.7.0. All four skill zips attached: analyst-research, local-vault, topic-brief, verifying.
v1.6.0 — local-vault: platform-aware whisper engine + first-run model consent
Added
local-vault: whisper engine auto-selected by platform —mlx-whisper(GPU-native) on Apple Silicon,faster-whisper(CTranslate2, cross-platform CPU/CUDA) everywhere else. Both paths normalize to one transcript shape, so the rest of the pipeline is engine-agnostic. Override withKB_WHISPER_ENGINE(auto/mlx/faster).local-vault: first-run consent + model-size picker. When audio/video is detected and no model has been chosen yet, the tool (in a terminal) shows each model tier with its download size — tiny ~75 MB / small ~480 MB / turbo ~1.6 GB (recommended) / large-v3 ~3 GB — and lets the user pick or skip. The choice is saved to.env(KB_WHISPER_MODEL) so it never re-asks. On consent the engine is auto-installed (pip install --user). Non-interactive runs (e.g.claude -p) never download a multi-GB model unattended — they report a hint to run once in a terminal first.
Changed
- Audio/video no longer needs a manual
pip installof the transcription engine — the tool installs the right one for the platform after you consent. ffmpeg remains a prerequisite (system package; the tool prints the platform-appropriate install command if missing and skips, rather than auto-installing a system package).
Setup: brew install ffmpeg (or distro pkg) — that's it; the engine installs itself on first consent.
v1.5.0 — local-vault: audio/video local transcription (whisper)
Added
local-vault: audio & video now transcribe locally via whisper (mlx-whisper)..mp3/.m4a/.wav/.aac/.flac/.oggand.mp4/.mov/.m4vwere previously skipped; they now route toprocess_transcribe, producing a Markdown transcript with the same source backlink + retrieval frontmatter as every other converter. Fully local — no token, no quota, no API key (first run downloads the model ~1.6 GB once, then offline). Per-segment[mm:ss]timestamps (audio/video equivalent of page-number traceback) + auto-detected language. Video is audio-track only — ffmpeg pulls the audio stream from the container, so video and audio share one path (no keyframe OCR in this version).- New config knobs in
scripts/config.py:TRANSCRIBE_EXTS,WHISPER_MODEL(defaultmlx-community/whisper-large-v3-turbo),WHISPER_LANGUAGE,TRANSCRIBE_TIMESTAMPS. - Fail-soft dependency preflight — when
ffmpegormlx-whisperis missing, the whole audio/video batch is reported as skipped with a one-line install hint instead of repeated tracebacks.
Notes
- mlx-whisper is Apple-Silicon-only. On Intel/Linux the preflight degrades gracefully (files skipped with hint, no crash) — swap in a portable engine (e.g. faster-whisper) for cross-platform use.
- Best on clear speech (lectures, interviews, meetings, podcasts). Songs/music transcribe poorly — an ASR limitation, not a bug; use the
sourcebacklink to verify against the original.
Setup for the new feature: brew install ffmpeg && python3 -m pip install --user mlx-whisper
v1.4.0 — local-vault: local HTML conversion + PDF image tightening
local-vault
Changed
.html/.htmnow convert locally via pandoc instead of MinerU cloud (no token/quota, faster). HTML is pre-cleaned —style/class/idattributes and layout-onlydiv/section/spanwrappers stripped — so the vault gets the article content, not inline-CSS noise. Raw HTML is kept so complex tables survive losslessly (no[TABLE]placeholder loss).- Digital-PDF image extraction tightened. Size floor 5% → 12% of page area, plus a min-bytes drop (
PYMUPDF4LLM_IMAGE_MIN_BYTES, default 6000) and content de-duplication (a logo repeated on every page is stored once). Fixes the slowdown / no-value-image clutter from v1.3.0.
Added
- Master switch
PYMUPDF4LLM_WRITE_IMAGES(default on;.env:KB_PDF_NO_IMAGES=1) to turn digital-PDF image extraction off entirely for a text-only, fastest run.
Release bundles all four skills (verifying / topic-brief / analyst-research / local-vault). Full notes: CHANGELOG.md.
v1.3.0 — local-vault: digital-PDF images + launcher relocation
local-vault
Added
- Digital PDFs now keep their images. The pymupdf4llm path previously dropped every image (text-only output). It now extracts images (≥
PYMUPDF4LLM_IMAGE_SIZE_LIMIT, default 5% of page area) intoattachments/<stem>/, renaming them to ASCII-safe names so source filenames with spaces/CJK don't break the Markdown links. Discarded on a sparse→MinerU fallback. - New tuning knob
PYMUPDF4LLM_IMAGE_SIZE_LIMIT(scripts/config.py, default0.05).
Changed
sync.commandis now placed in the knowledge-base root (parent of SOURCE), not inside SOURCE. A stale auto-generated copy left in SOURCE by an older version is removed automatically (user-written launchers untouched).- When a different
sync.commandalready exists at the root, an interactive terminal prompts update / skip; non-interactively our own out-of-date launcher self-heals while a user-customized one is left alone.
Release bundles all four skills (verifying / topic-brief / analyst-research / local-vault). Full notes: CHANGELOG.md.
v1.2.0 — local-vault clickable launcher in SOURCE folder
local-vault
- The sync pipeline now drops a clickable
sync.commandinto the user's SOURCE folder (macOS), with the absolute path tosync.pybaked in. Tool and data live apart — under/plugin installthe script sits in~/.claude/plugins/cache/…, so the old relative-path launcher was effectively unreachable for plugin users. The new launcher makes the daily loop drop files → double-click → read the.mdwork regardless of install path. Created by the setup wizard and idempotently on every run, and self-heals (re-points itself when a plugin upgrade movessync.py). - New
KB_NO_LAUNCHER=1.envflag to opt out.
中文:同步管线现在会往用户原始文件目录放一个可双击的 sync.command(macOS),硬编码 sync.py 绝对路径,解决插件安装时工具埋在 cache 深处、相对路径 launcher 不可达的问题。向导 + 每次运行幂等补放,升级自愈。KB_NO_LAUNCHER=1 可关。
All four skills (verifying / topic-brief / analyst-research / local-vault) are attached as a full snapshot of this version.
🤖 Generated with Claude Code
v1.1.0 — local-vault skill
Added
New skill: local-vault — build and query a local Markdown knowledge base ("vault").
- Convert / sync: turn raw files (PDF, Word/docx, PowerPoint/pptx, Excel/xlsx, csv/tsv, images, html, md/txt, code) into clean Markdown with retrieval-friendly frontmatter (abstract / tags / synonyms / key_data + a source backlink). Local-first (pandoc / pymupdf4llm / openpyxl / python-pptx); cloud OCR (MinerU) only as a fallback.
- Retrieve / answer: answer questions over the vault with retrieval discipline — vault health check, coverage self-monitoring, lossy-content flags, and Maps-of-Content (MOC) proposals that grow from real usage.
- Incremental sync, orphan staging (deleted sources never hard-deleted), frontmatter-only enrichment (document bodies never rewritten → zero content-loss risk from the tool).
The collection is now four skills: verifying / topic-brief / analyst-research / local-vault.
This release attaches the full set of skill zips (repo-wide snapshot at v1.1.0).
新增
新 skill:local-vault —— 构建并查询本地 Markdown 知识库。
- 转换 / 同步:把 PDF / Office / 图片 / 代码等原始文件转成带检索 frontmatter 的干净 Markdown,本地优先,云端 OCR(MinerU)仅兜底。
- 检索 / 回答:基于 vault 负责任地回答 —— 健康检查、覆盖度自检、有损内容标注、按真实使用沉淀 MOC。
- 增量同步、孤儿暂存、仅改 frontmatter(正文永不改写,零内容丢失风险)。
合集现为四个 skill:verifying / topic-brief / analyst-research / local-vault。本次 Release 挂载全部 skill 的 zip(v1.1.0 全量快照)。
v1.0.0 — analyst-research 3-mode skill + full bilingual sources
First stable release. Bumps from v0.5.0 → v1.0.0, folding the internal 0.6.x analyst-research development series (never published) into one milestone.
Highlights
- New skill
analyst-research— three-mode end-to-end research workflow: light (4-5p memo, 0 charts, 60-80 min) / medium (12-15p, 6-10 charts, half-day) / heavy (30-40p flagship, 25-35+ charts, multi-LLM optional). Mode picked at trigger time. - BREAKING:
light-researchremoved as a standalone skill — preserved verbatim asanalyst-researchlight mode. MODE_REGISTRY.mdsingle source of truth for the 3 modes; SessionStart announce hook; AI-disclosure footer convention (spec §八).- Full Chinese mirrors (
.zh.md) for analyst-research; bilingual-sync notes added to topic-brief / verifying. - Report-language policy: reports default English, AI replies in the user's chat language.
- Fixed repo URL (reagan475614947 → genli-ai) in README + announce hook.
Skills in this release
verifying— fact-check against primary sourcestopic-brief— single-piece HTML briefing generatoranalyst-research— 3-mode research workflow
Full changelog: see CHANGELOG.md.
v0.5.0 — new light-research skill
Added
- New skill:
light-research— lightweight research workflow that produces a 5-page decision memo / executive brief (PDF + Word) in 60–80 min. Single LLM, 0 hard stops, BLUF consulting-style summary, plain text no charts, inline footnote citations.- 6-step skeleton: hypothesis → search → plan → draft → self-check → freeze
- 1-question onboarding (hypothesis only), everything else locked to sensible defaults
- 6 source categories, 20–30 source ledger + 5–8 core PDFs in full text
- 12-item grep self-check enforces writing discipline (dashes, headings, bold-in-body, colon ratio, filler words, meta-language, page count, etc.)
- Self-check failure auto-rolls back to step 4 — the only quality gate, no user-confirmation hard stop
Files
light-research.zip— new skill (this release)topic-brief.zip— unchanged since v0.4.0, re-attached per release-completeness policyverifying.zip— unchanged since v0.2.0, re-attached per release-completeness policy
Install (Claude Code)
/plugin marketplace add https://github.com/reagan475614947/market-research-skills.git
/plugin update market-research-skills@market-research-skills
See CHANGELOG.md for full details (English + 中文).