feat: add scheduled issue labeler for type/domain triage by williamfzc · Pull Request #251 · larksuite/cli

williamfzc · 2026-04-03T11:41:40Z

Summary

This PR adds an automated issue labeling workflow that polls issues and applies Type and Domain labels based on heuristics.

Changes

Add GitHub Actions workflow: .github/workflows/issue-labels.yml
- Runs hourly on schedule and writes labels by default
- Supports workflow_dispatch with dry_run (default: true) and lookback_hours override
- Upgrade to actions/checkout@v6 and actions/setup-node@v6 for Node 24 runtime alignment
Add issue labeler script: scripts/issue-labels/index.js
- Type labels (keyword-based); does not override existing type labels by default
- Domain labels domain/<service>; add-only by default, optional strict sync via --sync-domains
- Optional --override-type to enforce computed type labels
Add regression dataset and runner:
- scripts/issue-labels/samples.json
- scripts/issue-labels/test.js
Add usage documentation: scripts/issue-labels/README.md

Safety / Behavior Notes

Type labels are only applied when keywords match; otherwise no type changes are made.
Existing type labels are preserved by default to avoid reverting manual triage.
Domain labels are add-only by default; no removals unless --sync-domains is explicitly enabled.
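
The defaults described in these notes can be sketched roughly as below. This is a minimal illustration, not the code in scripts/issue-labels/index.js: the function name `planLabelChanges`, the option names, and the assumption that type labels are the bare names listed in the CodeRabbit summary are all hypothetical. The rule that `syncDomains` only removes labels when at least one domain matched follows the reviewer's description of `--sync-domains`.

```javascript
// Hypothetical sketch of the default label-planning rules; all names are illustrative.
const TYPE_LABELS = new Set([
  "bug", "enhancement", "question", "documentation", "performance", "security",
]);

function planLabelChanges(existing, computed, options = {}) {
  const { overrideType = false, syncDomains = false } = options;
  const current = new Set(existing);
  const add = new Set();
  const remove = new Set();

  // Type: act only when a keyword matched (computed.type is non-null).
  const existingTypes = existing.filter((label) => TYPE_LABELS.has(label));
  if (computed.type) {
    if (existingTypes.length === 0) {
      add.add(computed.type);
    } else if (overrideType) {
      // Only with the explicit override do we replace a manually chosen type.
      for (const label of existingTypes) {
        if (label !== computed.type) remove.add(label);
      }
      if (!current.has(computed.type)) add.add(computed.type);
    }
  }

  // Domains: add-only unless syncDomains is set and something matched.
  const wanted = new Set(computed.domains.map((d) => `domain/${d}`));
  for (const label of wanted) {
    if (!current.has(label)) add.add(label);
  }
  if (syncDomains && wanted.size > 0) {
    for (const label of existing) {
      if (label.startsWith("domain/") && !wanted.has(label)) remove.add(label);
    }
  }
  return { add: [...add].sort(), remove: [...remove].sort() };
}
```

With the defaults, an issue already labeled `bug` and `domain/im` whose text computes to `question` and `mail` only gains `domain/mail`; nothing is removed.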

Test Plan

node scripts/issue-labels/test.js

Summary by CodeRabbit

New Features
- Automated issue-labeling that assigns a single Type (bug, enhancement, question, documentation, performance, security) and one or more Domain labels with configurable sync/override options.
Chores
- Added a scheduled (hourly) and manual workflow to run label syncs with safe dry-run by default, concurrency controls, limited permissions, and failure warnings.
Documentation
- Added README describing labeling rules, flags, and usage.
Tests
- Added sample dataset and test runner to validate classification.

coderabbitai · 2026-04-03T11:41:54Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Adds a scheduled/manual GitHub Actions workflow and a new Node.js issue-labeler with heuristics for assigning single-select Type and multi-select Domain labels, plus sample data, a test runner, and README documentation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Workflow `/.github/workflows/issue-labels.yml` | New hourly + manual GitHub Actions workflow (`Issue Labels`) with `dry_run` input, minimal permissions, concurrency group, checkout + Node.js setup, runs `node scripts/issue-labels/index.js` and emits warning/summary on failure. |
| Labeler script & core logic `scripts/issue-labels/index.js` | New executable implementing type/domain classification, regex/heuristic scoring, deterministic tie-breaking, domain extraction rules (explicit mentions, command-style, cautious title matches, service keywords), label-change planning (add-only default, optional `--sync-domains` and `--override-type`), GitHub client with retry/rate-limit handling, CLI flags, and exported helpers/constants. |
| Samples & tests `scripts/issue-labels/samples.json`, `scripts/issue-labels/test.js` | Adds a dataset of labeled issue samples and a test runner that validates `classifyIssueText` against expected type/domains, logs per-sample results, and exits nonzero on failures. |
| Documentation `scripts/issue-labels/README.md` | New README describing labeling dimensions, domain precedence rules, CLI flags, execution contexts (Actions/local), and test instructions. |
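
To make "regex/heuristic scoring" with "deterministic tie-breaking" concrete, here is a small sketch of one way such a classifier could work. The keyword patterns, the `TYPE_RULES` table, and the `classifyType` name are assumptions for illustration; the actual patterns in `index.js` are not shown on this page.

```javascript
// Hypothetical keyword table. Order doubles as tie-break priority:
// on an equal score, the earlier entry wins.
const TYPE_RULES = [
  { type: "bug", patterns: [/\bcrash(es|ed)?\b/i, /\berror\b/i, /\bbug\b/i] },
  { type: "enhancement", patterns: [/\bfeature request\b/i, /\bsupport for\b/i] },
  { type: "question", patterns: [/\bhow (do|can) i\b/i, /\?\s*$/] },
];

function classifyType(title, body = "") {
  const text = `${title}\n${body}`;
  let best = null;
  let bestScore = 0;
  for (const rule of TYPE_RULES) {
    // Score = number of patterns that match at least once.
    const score = rule.patterns.filter((p) => p.test(text)).length;
    // A strict ">" keeps the earlier rule when scores tie.
    if (score > bestScore) {
      best = rule.type;
      bestScore = score;
    }
  }
  return best; // null when no keyword matched, so no type label is applied
}
```

Because the result depends only on the rule order and the text, the same issue always gets the same type, which is what makes a regression dataset like `samples.json` meaningful.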

Sequence Diagram

sequenceDiagram
    actor Trigger as Cron / Manual
    participant Workflow as "GitHub Actions"
    participant Script as "Node.js Labeler"
    participant API as "GitHub REST API"
    participant Repo as "GitHub Repo"

    Trigger->>Workflow: start workflow
    Workflow->>Workflow: evaluate inputs (dry_run, max-issues)
    Workflow->>Script: run `node scripts/issue-labels/index.js --max-issues 300 [--dry-run --json]`
    Script->>API: search/list unlabeled issues (paged)
    API-->>Script: issues[]
    loop per issue
        Script->>Script: classifyIssueText(title, body)
        Script->>Script: collectDomainsFromText(...)
        Script->>Script: planIssueLabelChanges(...)
    end
    alt dry-run
        Script-->>Workflow: emit JSON report / summary
    else apply changes
        Script->>API: add labels to issue
        API-->>Script: add responses
        Script->>API: remove labels from issue (handle 404)
        API-->>Script: remove responses
    end
    Script-->>Workflow: final summary (scanned, updated, skipped)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I sniffed the titles, counted each clue,
Types in neat rows, domains I drew,
Hourly I wander and nudge with care,
Adding labels here and there,
A little rabbit — tidy, true.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The pull request title accurately and concisely summarizes the main change: adding a scheduled GitHub Actions workflow for automated issue labeling with type and domain classification.
Description check	✅ Passed	The pull request description is comprehensive, covering all required template sections (Summary, Changes, Test Plan) with clear explanations of the new workflow, labeling logic, safety considerations, and testing approach.
Docstring Coverage	✅ Passed	Docstring coverage is 94.44% which is sufficient. The required threshold is 80.00%.

greptile-apps · 2026-04-03T11:45:58Z

Greptile Summary

This PR adds an automated, hourly issue-labeling workflow that classifies unlabeled GitHub issues into Type (bug, enhancement, question, …) and Domain (domain/<service>) labels using keyword/heuristic scoring, with a safe dry-run default for manual dispatches and add-only domain logic.

All concerns from the prior review thread have been resolved in the current code:

Shell injection via lookback_hours — that input was removed; dry_run is safely routed through an env var.
Rate-limit back-off for secondary limits — the 60-second floor (60_000 ms) is now in place.
parseArgs closure bug — let i = 0 is correctly declared in the outer function scope before readFlagValue, so the closure is valid.
Missing-label 422 risk — listMissingManagedLabels() pre-flight check skips issues whose computed labels don't yet exist in the repo.
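
As an illustration of the 60-second floor for secondary rate limits, a retry-delay helper might look like the following. Only the `60_000` ms floor comes from the review; the function name `retryDelayMs`, the exponential schedule, and the 5-minute cap are assumed values for this sketch.

```javascript
const SECONDARY_LIMIT_FLOOR_MS = 60_000; // floor mentioned in the review

// Hypothetical helper: picks how long to wait before retrying a request.
function retryDelayMs({ retryAfterSeconds, attempt, secondaryLimit }) {
  // Assumed schedule: 1s, 2s, 4s, ... capped at 5 minutes.
  const exponential = Math.min(1000 * 2 ** attempt, 300_000);
  // Honor a Retry-After value when the server supplied one.
  const fromHeader = Number.isFinite(retryAfterSeconds) ? retryAfterSeconds * 1000 : 0;
  const base = Math.max(exponential, fromHeader);
  // Secondary limits never wait less than the floor.
  return secondaryLimit ? Math.max(base, SECONDARY_LIMIT_FLOOR_MS) : base;
}

console.log(retryDelayMs({ attempt: 0, secondaryLimit: true }));  // 60000
console.log(retryDelayMs({ attempt: 2, secondaryLimit: false })); // 4000
```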

Remaining findings are all P2:

Several service names used in the loose title-match heuristic (drive, task, event, base, core, auth) are common English words and may produce false-positive domain labels — consider extending the exclusion list beyond just im.
The bare /\bconfig(\.json)?\b/i keyword in the core domain map is much broader than the rest of the core patterns and likely to match unrelated issues.
The actions/checkout and actions/setup-node steps reference floating major-version tags rather than commit SHAs; with issues: write permission this is a minor supply-chain hardening gap.
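
The first finding could be addressed along these lines. This is a sketch under assumptions: the `SERVICES` array is an illustrative subset (the real service list is not shown here), and `AMBIGUOUS` and `looseTitleDomains` are hypothetical names. The ambiguous set contains `im` plus the names the review flags as common English words.

```javascript
// Illustrative service list, not the script's actual configuration.
const SERVICES = ["im", "drive", "task", "event", "base", "core", "auth", "calendar", "mail", "wiki"];
// Service names excluded from loose title matching.
const AMBIGUOUS = new Set(["im", "drive", "task", "event", "base", "core", "auth"]);

function looseTitleDomains(title) {
  // Split the title into lowercase alphanumeric words.
  const words = new Set(title.toLowerCase().match(/[a-z0-9]+/g) || []);
  // A bare word in the title counts only for unambiguous service names.
  return SERVICES.filter((s) => words.has(s) && !AMBIGUOUS.has(s)).sort();
}

console.log(looseTitleDomains("Calendar event fails to drive wiki sync")); // [ 'calendar', 'wiki' ]
```

Ambiguous names can still be picked up by stricter rules, such as an explicit `domain/<service>` mention or a command-style match.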

Confidence Score: 5/5

Safe to merge; all previously flagged P0/P1 issues are resolved and only P2 style/heuristic suggestions remain.

The three critical issues from the prior review thread (shell injection, secondary rate-limit back-off, parseArgs ReferenceError) are all fixed in the current diff. The missing-label pre-flight guard prevents 422 errors. The remaining findings are heuristic tuning suggestions and an action-pinning best practice — none block correct execution or cause data loss.
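
A pre-flight guard of the kind described here can be quite small. The sketch below is not the script's `listMissingManagedLabels()`; `findMissingLabels` is a hypothetical stand-in, and comparing names case-insensitively is an assumption of this example.

```javascript
// Hypothetical pre-flight check: which planned labels are not defined in the repo?
function findMissingLabels(plannedLabels, repoLabelNames) {
  const known = new Set(repoLabelNames.map((name) => name.toLowerCase()));
  return plannedLabels.filter((label) => !known.has(label.toLowerCase()));
}

const missing = findMissingLabels(["bug", "domain/im"], ["Bug", "enhancement"]);
if (missing.length > 0) {
  // Skip the issue and warn instead of sending a request that would be rejected.
  console.warn(`skipping: labels not defined in repo: ${missing.join(", ")}`);
}
```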

scripts/issue-labels/index.js — the loose title-match exclusion list and the broad config keyword pattern are worth a second look before the labeler runs on a large backlog.

Important Files Changed

Filename	Overview
.github/workflows/issue-labels.yml	New hourly scheduled + dispatchable workflow; previously flagged shell-injection and rate-limit issues are resolved; inputs are safely routed through env vars.
scripts/issue-labels/index.js	Core labeler script with heuristic type/domain classification; parseArgs closure issue from prior thread is fixed; rate-limit 60s back-off and missing-label pre-flight checks are in place; loose title matching for common English words may cause domain false positives.
scripts/issue-labels/test.js	Lightweight regression runner; exercises classifyIssueText with a subset-check for domains; does not cover API or CLI-arg parsing paths.
scripts/issue-labels/README.md	Usage documentation; accurately describes flags, dry-run behavior, and managed-labels prerequisite.
scripts/issue-labels/samples.json	Regression dataset with real-world issue excerpts and expected type/domain annotations; no code concerns.

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions (hourly / dispatch)
    participant Script as index.js
    participant GH_API as GitHub REST API

    GHA->>Script: node index.js --max-issues 300 [--dry-run --json]
    Script->>GH_API: GET /repos/{repo}/labels (paginated)
    GH_API-->>Script: existing labels
    Script->>GH_API: GET /search/issues?q=repo:X is:issue no:label (paginated)
    GH_API-->>Script: up to 300 unlabeled issues

    loop For each issue
        Script->>Script: classifyIssueText(title, body)
        Script->>Script: planIssueLabelChanges()
        alt Labels missing in repo
            Script->>Script: skip + warn
        else dry-run mode
            Script->>Script: log planned changes only
        else write mode
            Script->>GH_API: POST /issues/{n}/labels
            Script->>GH_API: DELETE /issues/{n}/labels/{name}
        end
    end

    Script->>GHA: stdout summary (text or JSON)
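
The search step in the diagram can be sketched as a URL builder. The query string mirrors the one in the diagram; the helper name `buildUnlabeledSearchUrl` and the choice of 100 results per page are assumptions, not details confirmed by the PR.

```javascript
// Hypothetical helper: builds one page of the "unlabeled issues" search request.
function buildUnlabeledSearchUrl(repo, page = 1, perPage = 100) {
  const params = new URLSearchParams({
    q: `repo:${repo} is:issue no:label`,
    per_page: String(perPage),
    page: String(page),
  });
  return `https://api.github.com/search/issues?${params}`;
}

// With 100 results per page, a cap of 300 issues needs at most 3 pages.
const pages = Math.ceil(300 / 100);
for (let page = 1; page <= pages; page += 1) {
  console.log(buildUnlabeledSearchUrl("larksuite/cli", page));
}
```

`URLSearchParams` handles the encoding of spaces, colons, and the slash in the repository name, so the query does not need manual escaping.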

Reviews (6): Last reviewed commit: "docs(issue-labels): clarify one-shot unl..."

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/issue-labels.yml:
- Around line 39-42: The job "Sync managed issue labels" (id: sync_issue_labels)
currently sets continue-on-error: true which hides failures; remove that key (or
set it to false) so the scheduled write-mode job surfaces errors instead of
appearing successful, ensuring failed label syncs are visible and can be acted
on.

In `@scripts/issue-labels/index.js`:
- Around line 151-155: The bug keyword matcher array (the "bug" entry in
scripts/issue-labels/index.js) is missing the Chinese term "错误", so add it to
that matcher list (e.g., include a /\b错误\b/ or plain /错误/ pattern alongside the
existing Chinese terms) so the Chinese sample (used by node
scripts/issue-labels/test.js, e.g. sample `#185`) is correctly classified as a
bug; update the "bug" array in scripts/issue-labels/index.js accordingly.
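
Of the two patterns suggested above, the plain one is the safer choice. In JavaScript regular expressions, `\b` marks a transition between an ASCII word character (`[A-Za-z0-9_]`) and anything else, so it never matches between two CJK characters. The short check below shows the difference; the sample sentence (roughly "an error occurred during login") is made up for illustration, and 错误 means "error".

```javascript
const withBoundary = /\b错误\b/;
const plain = /错误/;
const sample = "登录时出现错误";

// No ASCII word characters are adjacent to the keyword, so \b cannot match.
console.log(withBoundary.test(sample)); // false
console.log(plain.test(sample));        // true
```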

In `@scripts/issue-labels/README.md`:
- Around line 28-30: The README wording for the --sync-domains flag overstates
behavior; update the two lines describing default behavior and --sync-domains to
clarify it is not an exact-sync mode by stating that the script only removes
existing domain/* labels when at least one domain in the issue matches (i.e., it
will not remove all stale domain labels if none of the configured domains
match), and add a short note warning operators that --sync-domains may remove
some manual labels but will not guarantee full cleanup of all stale domain/*
labels.

In `@scripts/issue-labels/test.js`:
- Around line 28-32: The test currently treats only undefined expectations as
assertions; change the logic to treat explicit expectations (including null for
expected_type and empty array for expected_domains) as real assertions by
checking whether the sample actually contains the keys rather than only whether
their values are undefined. For expected_type, use a presence check (e.g.,
sample.hasOwnProperty('expected_type')) and if present assert (result.type ||
null) === sample.expected_type (so explicit null requires result.type to be
null); for expected_domains, check presence of sample.expected_domains and, if
present, assert equality/emptiness by comparing sorted arrays (using sortArray)
or using isSubset appropriately so that an explicit [] requires result.domains
to be empty; update the matchType and matchDomains logic to follow these
presence-based assertions (referencing expectedType, matchType, expectedDomains,
matchDomains, sample.expected_type, sample.expected_domains, result.type,
result.domains, sortArray, isSubset).
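
A presence-based check along the lines of this comment might look as follows. It is a sketch rather than the fix that landed: `checkSample`, `has`, and `sorted` are hypothetical names, and it uses exact equality on sorted arrays rather than a subset check.

```javascript
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
const sorted = (arr) => [...arr].sort();

// Hypothetical per-sample check: a key that is present is always asserted,
// even when its value is null or an empty array.
function checkSample(sample, result) {
  let matchType = true;
  if (has(sample, "expected_type")) {
    matchType = (result.type || null) === sample.expected_type;
  }

  let matchDomains = true;
  if (has(sample, "expected_domains")) {
    const want = sorted(sample.expected_domains);
    const got = sorted(result.domains || []);
    matchDomains = want.length === got.length && want.every((d, i) => d === got[i]);
  }
  return matchType && matchDomains;
}
```

With this shape, a sample that declares `"expected_type": null` fails if the classifier returns `bug`, while a sample that omits the key makes no claim about the type.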

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a90a35aa-8543-4fb9-8cb1-e50e8951f86b

📥 Commits

Reviewing files that changed from the base of the PR and between 135fde8 and d56d6ca.

📒 Files selected for processing (5)

.github/workflows/issue-labels.yml
scripts/issue-labels/README.md
scripts/issue-labels/index.js
scripts/issue-labels/samples.json
scripts/issue-labels/test.js

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

scripts/issue-labels/index.js (1)

25-42: Derive domain regex alternations from DOMAIN_SERVICES to prevent drift.

The service alternation is duplicated in regex literals; a future service update can silently desync matching behavior from DOMAIN_SERVICES.

♻️ Suggested refactor

 const DOMAIN_SERVICES = [
@@
 ];
+const DOMAIN_ALIASES = ["docs"];
+const DOMAIN_REGEX_ALTERNATION = [...DOMAIN_SERVICES, ...DOMAIN_ALIASES].join("|");
 const DOMAIN_LABELS = DOMAIN_SERVICES.map((s) => `domain/${s}`);
@@
-  const explicit = /\bdomain\/(im|doc|docs|base|sheets|calendar|mail|task|vc|whiteboard|minutes|wiki|event|auth|core)\b/gi;
+  const explicit = new RegExp(`\\bdomain\\/(${DOMAIN_REGEX_ALTERNATION})\\b`, "gi");
@@
-  const cmd = /\blark[-\s]?cli\s+(im|doc|docs|base|sheets|calendar|mail|task|vc|whiteboard|minutes|wiki|event|auth|core)\b/gi;
+  const cmd = new RegExp(`\\blark[-\\s]?cli\\s+(${DOMAIN_REGEX_ALTERNATION})\\b`, "gi");

Also applies to: 144-152

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scripts/issue-labels/index.js` around lines 25 - 42, The code duplicates the
service alternation in regex literals which can drift from DOMAIN_SERVICES;
update the regex construction to derive alternations from DOMAIN_SERVICES (e.g.,
const serviceAlternation = DOMAIN_SERVICES.join('|')) and use that value to
build the RegExp instances (or dynamic regex strings) instead of hard-coded
alternations; replace the literal alternations found around the regex uses
(including the occurrences referenced by DOMAIN_LABELS/DOMAIN_LABEL_SET and the
second occurrence at lines ~144-152) with RegExp constructed from
serviceAlternation so the domain matching always follows DOMAIN_SERVICES.
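
A runnable variant of this refactor is sketched below, with two additions that are not part of the suggestion itself: names are regex-escaped before joining, and longer names are listed first in the alternation. The `SERVICES` subset and the `docs` to `doc` alias mapping are assumptions for illustration.

```javascript
const SERVICES = ["im", "doc", "mail", "calendar"]; // illustrative subset only
const ALIASES = { docs: "doc" };                     // assumed alias mapping

// Escape regex metacharacters so a service name can never alter the pattern.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// One source of truth: the alternation is derived from the lists above.
const alternation = [...SERVICES, ...Object.keys(ALIASES)]
  .sort((a, b) => b.length - a.length)
  .map(escapeRegExp)
  .join("|");

function explicitDomains(text) {
  // A fresh RegExp per call avoids shared lastIndex state from the "g" flag.
  const re = new RegExp(`\\bdomain\\/(${alternation})\\b`, "gi");
  const found = new Set();
  for (const match of text.matchAll(re)) {
    const name = match[1].toLowerCase();
    found.add(ALIASES[name] || name);
  }
  return [...found].sort();
}

console.log(explicitDomains("see domain/docs and domain/IM, not domain/other")); // [ 'doc', 'im' ]
```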

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/issue-labels/index.js`:
- Around line 589-623: The parsing loop currently consumes argv[++i] for
value-taking flags (e.g., "--token", "--repo", "--max-pages", "--max-issues",
"--state"), which can swallow the next flag; update the logic so before
incrementing you peek at argv[i+1] and only consume it if it exists and does not
start with "-" (treat as missing/keep default or report error otherwise),
reusing toInt for numeric flags (--max-pages, --max-issues) and String() for
string flags; add a small helper or inline check (e.g., next = argv[i+1]; if
(!next || next.startsWith("-")) { /* handle missing */ } else { i++; args.xxx =
... }) to avoid flipping behavior by accidentally ingesting other flags.
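
The peek-before-consume idea can be sketched with a pure helper that returns both the value and the new index. The flag set is a subset of the flags named in the comment plus `--dry-run`; `takeValue`, the `args` shape, and the default of 300 for `maxIssues` are assumptions for this example.

```javascript
// Hypothetical helper: returns the flag's value without swallowing another flag.
function takeValue(argv, index, flag) {
  const next = argv[index + 1];
  if (next === undefined || next.startsWith("-")) {
    throw new Error(`Missing value for ${flag}`);
  }
  return { value: next, nextIndex: index + 1 };
}

function parseArgs(argv) {
  const args = { repo: null, maxIssues: 300, dryRun: false };
  for (let i = 0; i < argv.length; i += 1) {
    const flag = argv[i];
    if (flag === "--dry-run") {
      args.dryRun = true;
    } else if (flag === "--repo" || flag === "--max-issues") {
      const { value, nextIndex } = takeValue(argv, i, flag);
      i = nextIndex; // skip the value we just consumed
      if (flag === "--repo") args.repo = value;
      else args.maxIssues = Number.parseInt(value, 10);
    } else {
      throw new Error(`Unknown flag: ${flag}`);
    }
  }
  return args;
}
```

One consequence of the `startsWith("-")` check is that a value such as `-5` is reported as missing; for the flags in question that seems acceptable, since none of them take negative numbers.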

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bca06752-41ea-4dd6-9d9b-6f09fdbc48c8

📥 Commits

Reviewing files that changed from the base of the PR and between 965a537 and 9f921fe.

📒 Files selected for processing (4)

.github/workflows/issue-labels.yml
scripts/issue-labels/README.md
scripts/issue-labels/index.js
scripts/issue-labels/test.js

✅ Files skipped from review due to trivial changes (1)

.github/workflows/issue-labels.yml

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/issue-labels/index.js`:
- Around line 604-613: The helper function readFlagValue incorrectly references
and mutates the loop variable i which is block-scoped inside the for loop,
causing a ReferenceError; fix this by declaring let i; (or let i = 0;) in the
outer scope before the for (...) so readFlagValue and the loop share the same i
variable, leaving the rest of readFlagValue (checks, throw, i += 1, return
String(value)) and the for loop intact; alternatively, you can refactor
readFlagValue to accept the current index and return the new index/value pair,
but the simplest fix is moving the i declaration outside the for loop so
readFlagValue can access and increment it.
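
The scoping problem is easy to reproduce in isolation. The two functions below are a reduced illustration, not the script's `parseArgs`; `parseBroken` and `parseFixed` are hypothetical names.

```javascript
// Broken: the helper refers to `i`, but `i` only exists inside the for loop's block.
function parseBroken(argv) {
  const readFlagValue = () => {
    i += 1; // throws ReferenceError: i is not defined
    return String(argv[i]);
  };
  const values = [];
  for (let i = 0; i < argv.length; i += 1) {
    if (argv[i] === "--repo") values.push(readFlagValue());
  }
  return values;
}

// Fixed: `i` is declared in the enclosing function, so the helper and the loop share it.
function parseFixed(argv) {
  let i = 0;
  const readFlagValue = () => {
    i += 1;
    return String(argv[i]);
  };
  const values = [];
  for (; i < argv.length; i += 1) {
    if (argv[i] === "--repo") values.push(readFlagValue());
  }
  return values;
}
```

The broken version only fails once a value-taking flag is actually passed, which is why a run with no such flags would not surface it.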

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 62e268cb-330c-4c8a-9c74-90a8ad23ade6

📥 Commits

Reviewing files that changed from the base of the PR and between 87a4823 and ca5532e.

📒 Files selected for processing (1)

scripts/issue-labels/index.js

github-actions · 2026-04-03T14:49:13Z

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@e5f2d2f28e47f388ac931c7a1d5481bd1ff749d9

🧩 Skill update

npx skills add williamfzc/cli#main -y -g

williamfzc added 7 commits April 3, 2026 16:35

ci: add issue labeler workflow

02981b8

Add a manual GitHub Actions workflow and script to poll issues and apply type/domain labels.

feat(issue-labels): refine heuristics and add docs

4f79a5d

Improve domain detection and add safeguards to avoid overriding manual type triage by default. Refresh regression samples from real issues and document usage.

ci(issue-labels): enable hourly scheduled labeling

8242dac

Run hourly on schedule with write mode by default while keeping manual dispatch dry-run by default.

ci(issue-labels): shorten lookback window to 6h

ab5c842

Reduce scheduled scan window while keeping overlap for missed runs.

ci(issue-labels): opt into Node 24 actions runtime

9523863

Set FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 and use Node 24 for the script runtime to avoid upcoming Node 20 deprecation warnings.

ci(issue-labels): restore lookback input for manual runs

cf92839

Allow workflow_dispatch to override lookback_hours while keeping hourly schedule fixed.

ci(issue-labels): upgrade checkout/setup-node to v6

d56d6ca

Use actions/checkout@v6 and actions/setup-node@v6 to align with Node 24 runtime and avoid Node 20 deprecation warnings.

github-actions bot added the size/L label (Large or sensitive change across domains or core paths) Apr 3, 2026

greptile-apps bot reviewed Apr 3, 2026

coderabbitai bot reviewed Apr 3, 2026

fix(ci): label only unlabeled issues via search api

9f921fe

williamfzc force-pushed the main branch from 965a537 to 9f921fe Compare April 3, 2026 12:28

github-actions bot added the size/XL label (Architecture-level or global-impact change) and removed the size/L label (Large or sensitive change across domains or core paths) Apr 3, 2026

coderabbitai bot reviewed Apr 3, 2026

williamfzc added 2 commits April 3, 2026 21:38

fix(ci): refine issue labeling heuristics from live issues

87a4823

fix(ci): address remaining issue label review comments

ca5532e

greptile-apps bot reviewed Apr 3, 2026

coderabbitai bot reviewed Apr 3, 2026

williamfzc added 2 commits April 3, 2026 22:03

fix(ci): fix issue label arg parsing regression

32ab13b

docs(issue-labels): clarify one-shot unlabeled triage scope

e5f2d2f

fengzhangchi-bytedance self-requested a review April 3, 2026 14:44

fengzhangchi-bytedance approved these changes Apr 3, 2026

fengzhangchi-bytedance merged commit 2efadec into larksuite:main Apr 7, 2026
7 checks passed

coderabbitai bot mentioned this pull request Apr 7, 2026

fix(issue-labels): reduce mislabeling and handle missing labels #288

Merged
