diff --git a/.github/workflows/dictation-prompt.md b/.github/workflows/dictation-prompt.md
index dbb8386c168..e9e45c01ccd 100644
--- a/.github/workflows/dictation-prompt.md
+++ b/.github/workflows/dictation-prompt.md
@@ -47,7 +47,7 @@ Extract technical vocabulary from documentation files and create a concise dicta
 ## Your Mission
 
 Create a concise dictation instruction file at `skills/dictation/SKILL.md` that:
-1. Contains a glossary of approximately 1000 project-specific terms extracted from documentation
+1. Contains a glossary of exactly 256 project-specific terms extracted from documentation
 2. Provides instructions for fixing speech-to-text errors (ambiguous terms, spacing, hyphenation)
 3. Provides instructions for "agentifying" text: removing filler words (humm, you know, um, uh, like, etc.), improving clarity, and making text more professional
 4. Does NOT include planning guidelines or examples (keep it short and focused on error correction and text cleanup)
@@ -55,7 +55,34 @@ Create a concise dictation instruction file at `skills/dictation/SKILL.md` that:
 
 ## Task Steps
 
-### 1. Scan Documentation for Project-Specific Glossary
+### 1. Run NLP Word-Frequency Histogram
+
+Run the following Python script to compute a word-frequency histogram of code-formatted tokens across all documentation files. Use the output as the **primary source** for selecting the 256 glossary terms — prefer tokens with high frequency that are project-specific (not generic English words).
+
+```bash
+python3 - <<'EOF'
+import re
+from pathlib import Path
+from collections import Counter
+
+docs = Path("docs/src/content/docs")
+tokens = Counter()
+
+for md_file in docs.rglob("*.md"):
+    text = md_file.read_text(errors="replace")
+    # Collect backtick-quoted technical tokens
+    tokens.update(re.findall(r'`([^`\n]+)`', text))
+    # Also collect hyphenated/dotted/underscored identifiers
+    tokens.update(re.findall(r'\b([\w][\w\-\.]{2,}[\w])\b', text))
+
+print("Frequency histogram — top 500 project tokens:")
+for tok, n in tokens.most_common(500):
+    if len(tok) > 2:
+        print(f"  {n:5d}  {tok}")
+EOF
+```
+
+### 2. Scan Documentation for Project-Specific Glossary
 
 Use `search` to efficiently discover documentation covering different areas of the project, then read the returned files to extract vocabulary. This is more targeted than scanning all files with `find`:
 
@@ -68,7 +95,7 @@ Read each returned file path for its content, then also scan any remaining docum
 
 **Focus areas for extraction:**
 - Configuration: safe-outputs, permissions, tools, cache-memory, toolset, frontmatter
-- Engines: copilot, claude, codex, custom
+- Engines: @copilot, claude, codex, custom
 - Bot mentions: @copilot (for GitHub issue assignment)
 - Commands: compile, audit, logs, mcp, recompile
 - GitHub concepts: workflow_dispatch, pull_request, issues, discussions
@@ -79,13 +106,13 @@ Read each returned file path for its content, then also scan any remaining docum
 
 **Exclude**: makefile, Astro, starlight (tooling-specific, not user-facing)
 
-### 2. Create the Dictation Instructions File
+### 3. Create the Dictation Instructions File
 
 Create `skills/dictation/SKILL.md` with:
 - Frontmatter with name and description fields
 - Title: Dictation Instructions
 - Technical Context: Brief description of gh-aw
-- Project Glossary: ~1000 terms, alphabetically sorted, one per line
+- Project Glossary: 256 terms, alphabetically sorted, one per line
 - Fix Speech-to-Text Errors: Common misrecognitions → correct terms
 - Clean Up and Improve Text: Instructions for removing filler words and improving clarity
 - Guidelines: General instructions as follows
@@ -100,7 +127,7 @@ You do not have enough background information to plan or provide code examples.
 - maintain the user's intended meaning
 ```
 
-### 3. Create Pull Request
+### 4. Create Pull Request
 
 Use the create-pull-request tool to submit your changes with:
 - Title: "[docs] Update dictation skill instructions"
@@ -109,9 +136,9 @@ Use the create-pull-request tool to submit your changes with:
 ## Guidelines
 
 - Scan only `docs/src/content/docs/**/*.md` files
-- Extract ~1000 terms (950-1050 acceptable)
+- Extract 256 terms (240-270 acceptable)
 - Exclude tooling-specific terms (makefile, Astro, starlight)
-- Prioritize frequently used project-specific terms
+- Prioritize frequently used project-specific terms (use NLP histogram from Step 1)
 - Alphabetize the glossary
 - No descriptions in glossary (just term names)
 - Focus on fixing speech-to-text errors, not planning or examples
@@ -120,7 +147,7 @@ Use the create-pull-request tool to submit your changes with:
 
 - ✅ File `skills/dictation/SKILL.md` exists
 - ✅ Contains proper SKILL.md frontmatter (name, description)
-- ✅ Contains ~1000 project-specific terms (950-1050 acceptable)
+- ✅ Contains 256 project-specific terms (240-270 acceptable)
 - ✅ Terms extracted from documentation only
 - ✅ Focuses on fixing speech-to-text errors
 - ✅ Includes instructions for removing filler words and improving text clarity
diff --git a/skills/dictation/SKILL.md b/skills/dictation/SKILL.md
index 918b7cd57e4..1ee0e5784aa 100644
--- a/skills/dictation/SKILL.md
+++ b/skills/dictation/SKILL.md
@@ -20,1085 +20,254 @@ The following project-specific technical terms should be corrected when encounte
 @copilot
 ACTIONS_STEP_DEBUG
 ANTHROPIC_API_KEY
-ANTHROPIC_BASE_URL
 CLAUDE_CODE_OAUTH_TOKEN
 CODEX_API_KEY
 COPILOT_GITHUB_TOKEN
 DEBUG
 FUZZY:BI-WEEKLY
 FUZZY:DAILY
-FUZZY:HOURLY
-FUZZY:TRI-WEEKLY
 FUZZY:WEEKLY
 GEMINI_API_KEY
-GH_AW_ACTION_MODE
 GH_AW_AGENT_TOKEN
 GH_AW_ALLOWED_DOMAINS
-GH_AW_CI_TRIGGER_TOKEN
-GH_AW_GITHUB_MCP_SERVER_TOKEN
 GH_AW_GITHUB_TOKEN
-GH_AW_PHASE
-GH_AW_PROJECT_GITHUB_TOKEN
 GH_AW_PROMPT
-GH_AW_READ_PROJECT_TOKEN
 GH_AW_SAFE_OUTPUTS
-GH_AW_SAFE_OUTPUTS_PORT
-GH_AW_SAFE_OUTPUTS_STAGED
 GH_AW_VERSION
 GH_AW_WORKFLOW_ID
-GH_AW_WRITE_PROJECT_TOKEN
 GH_TOKEN
-GITHUB_ACTIONS
-GITHUB_ACTOR
-GITHUB_COPILOT_BASE_URL
-GITHUB_PERSONAL_ACCESS_TOKEN
-GITHUB_REF
-GITHUB_REPOSITORY
-GITHUB_SERVER_URL
-GITHUB_STEP_SUMMARY
 GITHUB_TOKEN
-GITHUB_WORKFLOW
-GITHUB_WORKSPACE
-ITERATION
-MEMBER
-NDJSON
-NONE
 OPENAI_API_KEY
-OPENAI_BASE_URL
-OWNER
 RUNNER_TEMP
 SARIF
-SLACK_WEBHOOK
-access
-accessible
-action
-action-mode
-action-pins.json
-action-repo
-action-version
-action.yml
-action_pins.json
-actionlint
-actions
-actions-lock.json
-actions-read
-actions/cache
-actions/checkout
-actions/github-script
-actions/setup-dotnet
-actions/setup-go
-actions/setup-java
-actions/setup-node
-actions/setup-python
 activation
-activation-app-token
 activation-job
-active
-actor
-actual
-add-comment
-add-comment.discussions
-add-labels
-add-labels.allowed
-add-reviewer
-additional
-after_run_id
 agent-job
-agent_output.json
 agentic
 agentic-workflows
-agentics-maintenance.yml
-agents
-ai-generated
-ai-moderator
-allow
 allow-workflows
 allowed
 allowed-domains
-allowed-events
-allowed-extensions
 allowed-files
-allowed-github-references
 allowed-labels
-allowed-pull-request-repos
-allowed-reasons
 allowed-repos
-allowed_repositories
 allowlist
-analysis
-analyze_imports
-api-key
-api-target
 api.github.com
-apiKey
-app-id
-append
-append-only-comments
-apply
-approval
-approval-labels
-approved
-architecture
-args
 artifact
-artifacts
-assign
-assign-milestone
 assign-to-agent
-assign-to-bot
 assign-to-copilot
-assign-to-user
-assignee
-assignees
-assignment
 audit
-audit-finding
 audit-workflows
-audits
-auto-close
 auto-merge
-auto-triage-issues
-autofix-code-scanning-alert
 automation
-automation-enabled
-availability
-aw-patch
-aw.patch
-aw_info.json
-awf-diagnostic-logs
-background
-banner
-base-branch
 bash
-before_run_id
-bi-weekly
-blob
-blocked
-blocked-users
-board
-boards
-body
-boolean
-boolean-field
 branch
 branch-name
-branch-prefix
-branch_name
-branches
-branches-ignore
-breaking
-breaking-change
-bug
 build
-builds
 bun
 cache
 cache-key
 cache-memory
-cached
-caches
-call-workflow
-capabilities
-capture
-category
-category-filter
-changes
-characters
-chart
-charts
-charts-with-trending
-chat-ops
-chatops
 checkout
-checkouts
 checks
-choice
 claude
-claude-haiku-4-5
-claude-sonnet-4.5
-cleanup
-close-discussion
-close-issue
-close-older-discussions
-close-older-issues
-close-pull-request
-close_older
-closed
-cluster
 code-scanning
-codespaces
 codex
-coding
 coding-agent
-collaboration
-collection
-command
-commands
 comment
-comment-body
-comment-thread
-comment_id
-comment_repo
-commit
-commit-changes
-commits
 compile
-compile-stable
-compile-time
 compile-workflow
-compileWorkflow
 compiled
-compiled_file
 compiler
-completion
-compliance
-component
-component-spec
-components
-conclusion
-conclusion-check
 concurrency
 concurrency-group
-concurrency.job-discriminator
 config
-configuration-file
-configurations
-configure
-configured
-conflicts
-connection
-console
-constraints
-container
-container.env
-containerized
-containers
-content
-content-security
-content.message
-content.path
-content.size
-content.type
 contents
-context
-context-variable
-contributors
-control
-copilot
-copy-from
-core
-count-limit
-coverage
-create-agent-session
-create-code-scanning-alert
 create-discussion
-create-discussion.labels
 create-issue
-create-issue.labels
-create-project
-create-project-status-update
 create-pull-request
-create-pull-request-review-comment
-create-pull-request.fallback-as-issue
-create-pull-request.labels
-create_discussion
-create_fields
-create_issue
-create_labels
-create_project
-create_pull_request
-create_pull_request_review_comment
-create_views
-created_issue_number
-created_issue_url
-created_pr_number
-created_pr_url
-creation
-credentials
-criteria
 cron
-cross-repo
-cross-repository
 custom
 custom-agent
-customSchemas
-customServerConfig
 daily
-daily-ops
-data-analysis
-data-ops
-data-server
-database
-date
-day-of-week
-days
 debug
-debug-logging
-deep
-deepwiki
-default
 default-branch
 defaults
-definition
-definitions
-delete_symbol
-demand
 deno
 dependabot
-dependencies
-deploy
-deploy-app
-deploy-preview
-deployment
-deployment-check
-deployments
 description
-description-field
-destination
-details
-detection
-devcontainer.json
-development
-directory
-disable
-disabled
-discussion
-discussion_comment
-discussion_number
-discussions
-dispatch
-dispatch-ops
 dispatch-workflow
-dispatch_repository
-dispatched
-dispatching
+discussions
 docker
 documentation
-domains
 dotnet
-download
-downloaded_files
-downstream
 draft
-duplicate
-echo-command
-ecosystem
 edit
-effective-tokens
-empty
-enabled
-encoding/json
-end_date
-endpoints
-enforce
-enforcement
 engine
 engine-config
-engine.concurrency
-engine.env
 engines
-entity
-entrypoint
-entrypointArgs
 environment
 environment-variables
-ephemerals
-error
-errors
-event-filter
-event-trigger
 events
-exec
-execute
-execution
-execution-context
-executor
-experiments
-expiration-date
-expire
 expires
-explicit
-explore
-expressions
-extension
-extraction
 fail-fast
-failures
-fallback
 fallback-as-issue
-fallback-to-issue
-false
-faster
-feature
 feature-flag
 features
-feedback
-fetch
-file-glob
-file-path
-filesystem
-filter
-find_anomalies
-find_files
-find_referencing_symbols
-find_symbol
-firewall
-firewall-audit-logs
-flag
-flags
-fmt
-footer
-footer-install
-footer-text
-force-push
-force-update
-forks
-formatted
 frontmatter
-frontmatter-field
-function
-functionality
-functions
 fuzzy
 fuzzy-schedule
 gateway
-gateway.apiKey
-gateway.jsonl
-gateway.trustedBots
-gatewayConfig
-gatewayVersion
 gemini
-generate
-generate-report
-generated
-generation
-get_file_contents
-get_me
-get_project_structure
-get_pull_request
-get_repository
-get_symbol_definition
-get_symbols_overview
-get_team_members
-get_teams
-get_user
 gh-aw
-gh-aw-as-mcp-server
 git-branch
 git-commit
-git-diff
-git-status
 github
 github-actions
 github-app
-github-app-token
-github-context
-github-graphql
-github-script
 github-token
-github.actor
-github.base_ref
-github.event.issue.number
-github.event.pull_request.number
-github.job
-github.owner
-github.ref_name
 github.repository
 github.run_id
-github.run_number
-github.server_url
-github.workflow
-github.workspace
 github/gh-aw
-global
-glossary
-go-mod-file
-guard-policy
-guidance
-guides
-handlers
-hash-check
-headers
-health
-health-check
-hide-comment
-high-priority
-host
-host-network
-hour
 hourly
-hourly-schedule
-hours
-http
-http-request
-https
 id-token
-identifier
-identifiers
-if-condition
-implement
-implementation
-implements
-import-path
-import-schema
-importResolver
-imported
 imports
-incremental
-injection-protection
 input
-input-field
-input-validation
 inputs
-insert_after_symbol
-insert_before_symbol
 inspect
-install
-install-gh
 installation
-instructions
 integration
-integrity-proxy
-integrity-reactions
-interactive-mode
-interface
-isolation
 issue
 issue-ops
-issue-triage
 issue_comment
-issue_number
 issueops
 issues
-iterations
-java
 javascript
 job-discriminator
-job-output
 jobs
-jq
 json
 json-schema
-keys
-keyword
-keyword-search
-knowledge
 label
-label-filter
 label-ops
-label_command
-labeled
-labeling
 labelops
 labels
-language
-language-detection
 latest
-layer
-libraries
-limit
-limit-per-run
-limits
-line
-lines
-link-sub-issue
-linking
-list_code_scanning_alerts
-list_codemods
-list_commits
-list_discussions
-list_issues
-list_pull_requests
-list_symbols_in_file
-list_users
-list_workflow_runs
-list_workflows
-load
-local
-lock
-lock-file
-lockdown
-lockdown-mode
 lockfile
-logic
 logs
-loops
-machine
 main
-maintainer
-management
-managing
-manual
-manual-approval
-manually
 markdown
-match
-matching
 max-continuations
-max-file-count
-max-file-size
-max-patch-size
 max-turns
-max_tokens
-maximize
 mcp
 mcp-gateway
 mcp-inspect
 mcp-list
 mcp-registry
 mcp-scripts
-mcp-scripts-mode-removal
-mcp-scripts.mode
 mcp-server
 mcp-servers
-mcp.port
-mcp_failures
-mechanism
-mechanisms
-member
-memory
-mention
 merge
-merged
-messages
 metadata
-metadata-read
-migrate
 milestone
 min-integrity
-minimal
 mode
 model
-module
-modules
-monitoring
-monthly-report
-multi-repo
-multirepo
-multirepoops
 needs.activation
 network
-network-firewall-migration
 network.allowed
 network.firewall
-nightly-run
 node
-none
 noop
-null
-observability.otlp
 on-demand
-operations
-operations-log
-operator
-optional
-orchestration
-orchestrator
 org
 organization
-organization-projects
-organizations
-organize
-outdated
-output-field
-output-variable
 outputs
 override
-overrides
 owner
-owner-name
 package
-package.json
 parallel
-parameters
-parent
-parent_issue_number
 parsing
-patch
-patch-update
-path
-paths
-patterns
-payloadDir
-payloadPath
-payloadPathPrefix
-payloadSizeThreshold
 permissions
 phase
-phases
-php
-pin-versions
-pinned
-pinning
 pip
 pipeline
 playwright
-plugins
-plugins.github-token
-post-steps
-poutine
-powershell
-pr-comment
-pr-event
-pr-fix
-pr-label
-pr-merge
-pr-review
-pr-title
-pre-activation
-pre-check
-pre_activation
-prepend
-preview.first_item
-preview.item_count
-preview.schema
-priority
-private
-private-key
-privilege
-problem
-process
-processes
-project
-project-board
-project-column
-project-field
-project-id
-project-item
-project-number
-project-status
-project-title
-project-url
-project-view
-projectops
-projects
 prompt
 prompt-injection
 protected-files
-protection
-protocol
-provider
-public
-public_repo
-pull-request-repo
 pull-requests
-pull_number
 pull_request
-pull_request_comment
-pull_request_number
-pull_request_review_comment
 pull_request_target
-pull_requests
-purpose
-push-ref
-push-to-branch
-push-to-pull-request-branch
-py
 python
 python3
-qmd
-quality
-quality-gate
-query
-quick-start
-ranges
-rate
-rate-limit
-rate-limiting
-reactions
-read
-read-all
-read-only
-read-permission
-ready-for-review
-recommendations
 recompile
-reference
-references
-regenerate
 registry
-related
-relationships
 release
-remote
-remote-server
-remove-labels
-replace
-replace-island
-replace_symbol_body
-reply-to-pull-request-review-comment
-reply_to_pull_request_review_comment
 repo
 repo-memory
-repo-ops
-repomix
-report-as-issue
 report-summary
-report_diagnostics_to_pull_request
-reporting
-repos
 repository
 repository_dispatch
-repository_features_validation
-repository_slug
-requirements.txt
-research
-resolution
-resolve-pull-request-review-thread
-resolveReviewThread
-respond
-response
-retention-days
-retrieve
-reusable
-reuse
 review
-reviewer
-reviewers
-reviews
-roles
-ruby
-run-context
-run-failure
-run-id
-run-name
-run-started
-run-success
-run_id_or_url
-runner
-running
-runs
 runs-on
-runs-on-slim
 runtime
-runtime-env
 runtimes
-rust
-safe
 safe-inputs
-safe-inputs-mode-removal
-safe-inputs.mode
-safe-mode
-safe-output-app
 safe-outputs
-safe-outputs.app
-safe-outputs.concurrency-group
-safe-outputs.env
-safe-outputs.footer
-safe-outputs.jobs
-safe-outputs.messages
-safe-outputs.runs-on
-safe-outputs.staged
-safe-outputs.threat-detection
-safe_outputs
 sandbox
-sandbox-agent-false-removal
-sandbox.agent.env
-sandbox.agent.mounts
-sandbox.mcp.env
-sandbox.mcp.trusted-bots
 sanitized
-scale
-scenarios
 schedule
-schedule-cron
 scheduled
-schedules
 schema
-schemas
-scope
-script
-script-step
 scripts
 search
-secret-key
-secret-masking
 secrets
 security
-security-events
-semantic
-sensitive
-separate
-serena
-services
 session
-session-analysis
-session-insights
 setup
-severity
-share
 shared
 shared-workflow
-shared/common-tools
-shared/file
-shared/gh
-shared/mcp
-sharing
 shell
-shows
-sidebar
-similar
-size
-skillz
-skip
-skip-bots
-skip-if-match
-skip-if-no-match
-skip-roles
 slack
-slash
-slash_command
-small
-software
-source
-source-destination
-spec-ops
-specVersion
-specfile
-specifications
-specified
-specifies
-specify
 staged
-staged-description
 staged-mode
-staged-title
-staging
 stale
-standard
-start_date
-startupTimeout
-state
-states
-static
-status-update-id
-stdio
 steps.sanitized.outputs.body
 steps.sanitized.outputs.text
 steps.sanitized.outputs.title
-stop-after
-strict
-string
-structured
-sub-issues
-sub_issue_number
-submit-pull-request-review
-submit_pull_request_review
-summary
-system
-target-repo
-task-ops
-tavily
-tavily-search
-template
-template-file
-temporary-id
-temporary_id
-testing
-threat-detection
-time_remaining
 timeout
 timeout-minutes
-timeout-minutes-migration
-timeout_minutes
-timestamp
-timezone
-timezone-offset
-title-prefix
-todo
 token-weights
-tokens
-toolTimeout
-tool_usage
 toolsets
-traceparent
-tracker-id
-tracking
-tracking-issue
-transform
-translation
-trial
-trial-ops
-trials
 trigger
-trigger-event
 triggers
 trusted
-trusted-domain
 trusted-users
-trustedBots
 ubuntu
-ubuntu-22.04
-ubuntu-24.04
 ubuntu-latest
-ubuntu-slim
-unassign-first
-unassign-from-user
-unique
-unstructured
-update-discussion
 update-issue
-update-project
 update-pull-request
-update-release
-upgrade
-upload-asset
-users
-utc-N
-valid
 validate
-validated
 validation
-values
-variables
-variations
-vars
-verbose
-verbose-mode
 version
 version-bump
-version-check
-views
-visibility
-visible-fields
-volume
 vulnerability-scan
-wasm
-wasm-compilation
-watch
 web-fetch
 web-search
 webhook
-webhook-notify
-webhook_notify
 weekly
-weekly-ops
-weekly-research
-weekly-summary
-worker
 workflow
-workflow-compile
-workflow-compiler
 workflow-dispatch
-workflow-generator
-workflow-health-manager
 workflow-run
 workflow-status
-workflow-trigger
 workflow_call
 workflow_dispatch
-workflow_file
-workflow_file_path
-workflow_name
 workflow_run
 workflows
-workflows/
 workspace
 write-all
-write-permission
 yaml
 zizmor
-zsh
 
 ## Fix Speech-to-Text Errors
 
@@ -1138,7 +307,7 @@ When fixing dictated text, correct these common misrecognitions:
 - "effective tokens" → effective-tokens
 
 ### AI Engines
-- "co-pilot" → copilot
+- "co-pilot" → @copilot
 - "code x" → codex
 - "cloud" → claude (when referring to the AI engine)
 - "gem ini" → gemini (when referring to the AI engine)
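
Reviewer note: the prompt now asks for 256 glossary terms (240-270 acceptable), alphabetized, one per line. A minimal sketch of how that criterion could be checked is shown below. It is not part of this change. The function name `check_glossary` is hypothetical, the caller is assumed to have already extracted the glossary lines from `skills/dictation/SKILL.md`, and "alphabetized" is assumed to mean plain code-point order (`@` and uppercase before lowercase), which the glossary in this diff appears to follow.

```python
def check_glossary(terms, low=240, high=270):
    """Return a list of problems; an empty list means the glossary passes.

    `terms` is a list of glossary entries, one string per line, already
    extracted from the skill file (extraction is out of scope here).
    """
    problems = []

    # Size criterion taken from the prompt: 240-270 terms is acceptable.
    if not low <= len(terms) <= high:
        problems.append(f"expected {low}-{high} terms, found {len(terms)}")

    # Assumption: "alphabetized" means Python's default code-point ordering.
    if terms != sorted(terms):
        problems.append("terms are not in sorted order")

    # Duplicates would inflate the count without adding vocabulary.
    seen, duplicates = set(), set()
    for term in terms:
        if term in seen:
            duplicates.add(term)
        seen.add(term)
    if duplicates:
        problems.append("duplicate terms: " + ", ".join(sorted(duplicates)))

    return problems


if __name__ == "__main__":
    sample = ["@copilot", "GH_TOKEN", "gh-aw", "workflow_dispatch"]
    # A four-term sample is sorted and unique but far below the size range.
    print(check_glossary(sample))
```

Returning a list of problems rather than raising on the first failure lets a single run report the count, ordering, and duplicate issues together.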