
feat: add scheduled issue labeler for type/domain triage#251

Merged
fengzhangchi-bytedance merged 12 commits into larksuite:main from williamfzc:main
Apr 7, 2026

@williamfzc
Contributor

@williamfzc williamfzc commented Apr 3, 2026

Summary

This PR adds an automated issue labeling workflow that polls issues and applies Type and Domain labels based on heuristics.

Changes

  • Add GitHub Actions workflow: .github/workflows/issue-labels.yml
    • Runs hourly on schedule and writes labels by default
    • Supports workflow_dispatch with dry_run (default: true) and lookback_hours override
    • Upgrade to actions/checkout@v6 and actions/setup-node@v6 for Node 24 runtime alignment
  • Add issue labeler script: scripts/issue-labels/index.js
    • Type labels (keyword-based); does not override existing type labels by default
    • Domain labels domain/<service>; add-only by default, optional strict sync via --sync-domains
    • Optional --override-type to enforce computed type labels
  • Add regression dataset and runner:
    • scripts/issue-labels/samples.json
    • scripts/issue-labels/test.js
  • Add usage documentation: scripts/issue-labels/README.md

Safety / Behavior Notes

  • Type labels are only applied when keywords match; otherwise no type changes are made.
  • Existing type labels are preserved by default to avoid reverting manual triage.
  • Domain labels are add-only by default; no removals unless --sync-domains is explicitly enabled.
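
The defaults in these notes can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/issue-labels/index.js: the function name planLabelChanges, the option names, and the rule that sync mode removes stale domain labels only when at least one domain matched are all assumptions made for the example.

```javascript
// Hypothetical sketch; names and option shapes are assumptions, not the PR's code.
const TYPE_LABELS = ["bug", "enhancement", "question", "documentation", "performance", "security"];

function planLabelChanges(existing, computed, opts = {}) {
  const { overrideType = false, syncDomains = false } = opts;
  const current = new Set(existing);
  const add = [];
  const remove = [];

  // Type is single-select and is only touched when a keyword matched.
  const existingTypes = existing.filter((label) => TYPE_LABELS.includes(label));
  if (computed.type) {
    if (existingTypes.length === 0) {
      add.push(computed.type);
    } else if (overrideType) {
      if (!current.has(computed.type)) add.push(computed.type);
      for (const t of existingTypes) if (t !== computed.type) remove.push(t);
    }
  }

  // Domains are add-only by default.
  const wanted = computed.domains.map((d) => `domain/${d}`);
  for (const label of wanted) if (!current.has(label)) add.push(label);

  // Removals happen only in sync mode, and only when at least one domain matched.
  if (syncDomains && wanted.length > 0) {
    for (const label of existing) {
      if (label.startsWith("domain/") && !wanted.includes(label)) remove.push(label);
    }
  }
  return { add, remove };
}

console.log(planLabelChanges(["question", "domain/mail"], { type: "bug", domains: ["im"] }));
```

With the defaults, the manually applied question label is kept and only domain/im is added; passing { overrideType: true, syncDomains: true } would also swap the type and drop domain/mail.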

Test Plan

  • node scripts/issue-labels/test.js

Summary by CodeRabbit

  • New Features

    • Automated issue-labeling that assigns a single Type (bug, enhancement, question, documentation, performance, security) and one or more Domain labels with configurable sync/override options.
  • Chores

    • Added a scheduled (hourly) and manual workflow to run label syncs with safe dry-run by default, concurrency controls, limited permissions, and failure warnings.
  • Documentation

    • Added README describing labeling rules, flags, and usage.
  • Tests

    • Added sample dataset and test runner to validate classification.

Add a manual GitHub Actions workflow and script to poll issues and apply type/domain labels.
Improve domain detection and add safeguards to avoid overriding manual type triage by default. Refresh regression samples from real issues and document usage.
Run hourly on schedule with write mode by default while keeping manual dispatch dry-run by default.
Reduce scheduled scan window while keeping overlap for missed runs.
Set FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 and use Node 24 for the script runtime to avoid upcoming Node 20 deprecation warnings.
Allow workflow_dispatch to override lookback_hours while keeping hourly schedule fixed.
Use actions/checkout@v6 and actions/setup-node@v6 to align with Node 24 runtime and avoid Node 20 deprecation warnings.
@github-actions github-actions bot added the size/L Large or sensitive change across domains or core paths label Apr 3, 2026
@coderabbitai

coderabbitai bot commented Apr 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a scheduled/manual GitHub Actions workflow and a new Node.js issue-labeler with heuristics for assigning single-select Type and multi-select Domain labels, plus sample data, a test runner, and README documentation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Workflow: /.github/workflows/issue-labels.yml | New hourly + manual GitHub Actions workflow (Issue Labels) with dry_run input, minimal permissions, concurrency group, checkout + Node.js setup, runs node scripts/issue-labels/index.js and emits warning/summary on failure. |
| Labeler script & core logic: scripts/issue-labels/index.js | New executable implementing type/domain classification, regex/heuristic scoring, deterministic tie-breaking, domain extraction rules (explicit mentions, command-style, cautious title matches, service keywords), label-change planning (add-only default, optional --sync-domains and --override-type), GitHub client with retry/rate-limit handling, CLI flags, and exported helpers/constants. |
| Samples & tests: scripts/issue-labels/samples.json, scripts/issue-labels/test.js | Adds a dataset of labeled issue samples and a test runner that validates classifyIssueText against expected type/domains, logs per-sample results, and exits nonzero on failures. |
| Documentation: scripts/issue-labels/README.md | New README describing labeling dimensions, domain precedence rules, CLI flags, execution contexts (Actions/local), and test instructions. |
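
As an illustration of the "regex/heuristic scoring" and "deterministic tie-breaking" mentioned in this summary, here is a minimal sketch. The rule table and the classifyType name are hypothetical; the actual patterns and scoring in scripts/issue-labels/index.js may differ.

```javascript
// Hypothetical keyword table; the real patterns live in scripts/issue-labels/index.js.
const TYPE_RULES = [
  { type: "bug", patterns: [/\bcrash(es|ed)?\b/i, /\berror\b/i, /\bfails?\b/i] },
  { type: "enhancement", patterns: [/\bfeature request\b/i, /\bsupport for\b/i] },
  { type: "question", patterns: [/\bhow (do|can) i\b/i, /\?\s*$/] },
];

function classifyType(title, body) {
  const text = `${title}\n${body}`;
  let best = null;
  for (const rule of TYPE_RULES) {
    // Score = number of patterns that match at least once.
    const score = rule.patterns.filter((p) => p.test(text)).length;
    // Strict ">" keeps the earlier rule on a tie, so the result is stable.
    if (score > 0 && (best === null || score > best.score)) {
      best = { type: rule.type, score };
    }
  }
  return best ? best.type : null;
}

console.log(classifyType("CLI crashes with error on login", ""));
console.log(classifyType("hello", "world"));
```

Returning null when nothing matches mirrors the stated behavior that no type change is made without a keyword hit.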

Sequence Diagram

sequenceDiagram
    actor Trigger as Cron / Manual
    participant Workflow as "GitHub Actions"
    participant Script as "Node.js Labeler"
    participant API as "GitHub REST API"
    participant Repo as "GitHub Repo"

    Trigger->>Workflow: start workflow
    Workflow->>Workflow: evaluate inputs (dry_run, max-issues)
    Workflow->>Script: run `node scripts/issue-labels/index.js --max-issues 300 [--dry-run --json]`
    Script->>API: search/list unlabeled issues (paged)
    API-->>Script: issues[]
    loop per issue
        Script->>Script: classifyIssueText(title, body)
        Script->>Script: collectDomainsFromText(...)
        Script->>Script: planIssueLabelChanges(...)
    end
    alt dry-run
        Script-->>Workflow: emit JSON report / summary
    else apply changes
        Script->>API: add labels to issue
        API-->>Script: add responses
        Script->>API: remove labels from issue (handle 404)
        API-->>Script: remove responses
    end
    Script-->>Workflow: final summary (scanned, updated, skipped)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I sniffed the titles, counted each clue,
Types in neat rows, domains I drew,
Hourly I wander and nudge with care,
Adding labels here and there,
A little rabbit — tidy, true.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The pull request title accurately and concisely summarizes the main change: adding a scheduled GitHub Actions workflow for automated issue labeling with type and domain classification. |
| Description check | ✅ Passed | The pull request description is comprehensive, covering all required template sections (Summary, Changes, Test Plan) with clear explanations of the new workflow, labeling logic, safety considerations, and testing approach. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 94.44% which is sufficient. The required threshold is 80.00%. |


Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps

greptile-apps bot commented Apr 3, 2026

Greptile Summary

This PR adds an automated, hourly issue-labeling workflow that classifies unlabeled GitHub issues into Type (bug, enhancement, question, …) and Domain (domain/<service>) labels using keyword/heuristic scoring, with a safe dry-run default for manual dispatches and add-only domain logic.

All concerns from the prior review thread have been resolved in the current code:

  • Shell injection via lookback_hours — that input was removed; dry_run is safely routed through an env var.
  • Rate-limit back-off for secondary limits — the 60-second floor (60_000 ms) is now in place.
  • parseArgs closure bug — let i = 0 is correctly declared in the outer function scope before readFlagValue, so the closure is valid.
  • Missing-label 422 risk — listMissingManagedLabels() pre-flight check skips issues whose computed labels don't yet exist in the repo.
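
The 60-second floor for secondary rate limits could look something like the sketch below. It assumes response headers arrive as a plain object with lower-cased names and that the floor applies to every computed delay; the backoffMs name and the exponential fallback are illustrative, not taken from the PR.

```javascript
// Sketch under stated assumptions; not the PR's actual retry logic.
const SECONDARY_LIMIT_FLOOR_MS = 60_000;

function backoffMs(headers, attempt) {
  // Prefer the server's Retry-After (seconds) when it is present and positive.
  const retryAfterSec = Number(headers["retry-after"]);
  const base = Number.isFinite(retryAfterSec) && retryAfterSec > 0
    ? retryAfterSec * 1000
    : 1000 * 2 ** attempt; // otherwise fall back to exponential growth
  // Never wait less than the floor.
  return Math.max(base, SECONDARY_LIMIT_FLOOR_MS);
}

console.log(backoffMs({}, 0));
console.log(backoffMs({ "retry-after": "120" }, 0));
```

A short Retry-After (for example 5 seconds) is still raised to the floor in this sketch, which trades a slower retry for a lower chance of tripping the limit again.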

Remaining findings are all P2:

  • Several service names used in the loose title-match heuristic (drive, task, event, base, core, auth) are common English words and may produce false-positive domain labels — consider extending the exclusion list beyond just im.
  • The bare /\bconfig(\.json)?\b/i keyword in the core domain map is much broader than the rest of the core patterns and likely to match unrelated issues.
  • The actions/checkout and actions/setup-node steps reference floating major-version tags rather than commit SHAs; with issues: write permission this is a minor supply-chain hardening gap.
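
To make the first two findings concrete, the sketch below shows how a loose word-boundary title match picks up ordinary English words, and how an exclusion list narrows it. The looseTitleDomains helper and the sample titles are hypothetical; only the config pattern and the service names come from the review.

```javascript
// Pattern quoted in the review finding.
const configPattern = /\bconfig(\.json)?\b/i;

// Hypothetical loose title matcher with a configurable exclusion list.
function looseTitleDomains(title, services, excluded) {
  return services.filter(
    (s) => !excluded.includes(s) && new RegExp(`\\b${s}\\b`, "i").test(title)
  );
}

const services = ["im", "task", "event", "base", "core", "auth"];
const title = "The core problem: my task list is empty";

// Excluding only "im" still lets common words through.
console.log(looseTitleDomains(title, services, ["im"]));
// A wider exclusion list suppresses those matches.
console.log(looseTitleDomains(title, services, ["im", "task", "core"]));
// The bare config keyword matches an issue that is really about calendar sync.
console.log(configPattern.test("Calendar sync ignores my proxy config"));
```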

Confidence Score: 5/5

Safe to merge; all previously flagged P0/P1 issues are resolved and only P2 style/heuristic suggestions remain.

The three critical issues from the prior review thread (shell injection, secondary rate-limit back-off, parseArgs ReferenceError) are all fixed in the current diff. The missing-label pre-flight guard prevents 422 errors. The remaining findings are heuristic tuning suggestions and an action-pinning best practice — none block correct execution or cause data loss.
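
A pre-flight guard of the kind described here might be sketched as below. The listMissingLabels name is a stand-in for the PR's listMissingManagedLabels(), whose real signature is not shown on this page, and the case-insensitive comparison is an assumption.

```javascript
// Hypothetical stand-in for the PR's pre-flight check.
// existingRepoLabels: names of labels already defined in the repository.
function listMissingLabels(plannedLabels, existingRepoLabels) {
  // Assumption: compare names case-insensitively.
  const known = new Set(existingRepoLabels.map((name) => name.toLowerCase()));
  return plannedLabels.filter((name) => !known.has(name.toLowerCase()));
}

const missing = listMissingLabels(["bug", "domain/im"], ["Bug", "enhancement"]);
if (missing.length > 0) {
  // The summary above says such issues are skipped with a warning.
  console.warn(`skip: labels not defined in repo: ${missing.join(", ")}`);
}
```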

scripts/issue-labels/index.js — the loose title-match exclusion list and the broad config keyword pattern are worth a second look before the labeler runs on a large backlog.

Important Files Changed

Filename Overview
.github/workflows/issue-labels.yml New hourly scheduled + dispatchable workflow; previously flagged shell-injection and rate-limit issues are resolved; inputs are safely routed through env vars.
scripts/issue-labels/index.js Core labeler script with heuristic type/domain classification; parseArgs closure issue from prior thread is fixed; rate-limit 60s back-off and missing-label pre-flight checks are in place; loose title matching for common English words may cause domain false positives.
scripts/issue-labels/test.js Lightweight regression runner; exercises classifyIssueText with a subset-check for domains; does not cover API or CLI-arg parsing paths.
scripts/issue-labels/README.md Usage documentation; accurately describes flags, dry-run behavior, and managed-labels prerequisite.
scripts/issue-labels/samples.json Regression dataset with real-world issue excerpts and expected type/domain annotations; no code concerns.

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions (hourly / dispatch)
    participant Script as index.js
    participant GH_API as GitHub REST API

    GHA->>Script: node index.js --max-issues 300 [--dry-run --json]
    Script->>GH_API: GET /repos/{repo}/labels (paginated)
    GH_API-->>Script: existing labels
    Script->>GH_API: GET /search/issues?q=repo:X is:issue no:label (paginated)
    GH_API-->>Script: up to 300 unlabeled issues

    loop For each issue
        Script->>Script: classifyIssueText(title, body)
        Script->>Script: planIssueLabelChanges()
        alt Labels missing in repo
            Script->>Script: skip + warn
        else dry-run mode
            Script->>Script: log planned changes only
        else write mode
            Script->>GH_API: POST /issues/{n}/labels
            Script->>GH_API: DELETE /issues/{n}/labels/{name}
        end
    end

    Script->>GHA: stdout summary (text or JSON)
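
The paginated search step in the diagram can be approximated with the sketch below. The buildSearchUrl helper, the page size of 100, and the larksuite/cli repository name are assumptions for illustration; the query string mirrors the one shown in the diagram.

```javascript
// Sketch only: helper name, page size, and repo are illustrative assumptions.
function buildSearchUrl(repo, page, perPage = 100) {
  const params = new URLSearchParams({
    q: `repo:${repo} is:issue no:label`,
    per_page: String(perPage),
    page: String(page),
  });
  return `https://api.github.com/search/issues?${params.toString()}`;
}

// With --max-issues 300 and 100 results per page, three pages cover the scan.
const maxIssues = 300;
const pages = Math.ceil(maxIssues / 100);
const urls = Array.from({ length: pages }, (_, idx) => buildSearchUrl("larksuite/cli", idx + 1));
console.log(urls);
```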

Reviews (6): Last reviewed commit: "docs(issue-labels): clarify one-shot unl..." | Re-trigger Greptile


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/issue-labels.yml:
- Around line 39-42: The job "Sync managed issue labels" (id: sync_issue_labels)
currently sets continue-on-error: true which hides failures; remove that key (or
set it to false) so the scheduled write-mode job surfaces errors instead of
appearing successful, ensuring failed label syncs are visible and can be acted
on.

In `@scripts/issue-labels/index.js`:
- Around line 151-155: The bug keyword matcher array (the "bug" entry in
scripts/issue-labels/index.js) is missing the Chinese term "错误" ("error"), so
add it to that matcher list (e.g., include a plain /错误/ pattern alongside the
existing Chinese terms; a /\b错误\b/ form would not match inside Chinese text,
because JavaScript's \b only recognizes ASCII word characters) so the Chinese
sample (used by node scripts/issue-labels/test.js, e.g. sample `#185`) is
correctly classified as a bug; update the "bug" array in
scripts/issue-labels/index.js accordingly.
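
A quick check of the two candidate patterns shows why the choice matters. In JavaScript, \b marks a transition between an ASCII word character and anything else, so it never fires between two CJK characters. The sample sentence is made up for this example and means "an error occurs when logging in".

```javascript
const sampleText = "登录时出现错误";

const withBoundary = /\b错误\b/;
const plain = /错误/;

// No ASCII word character is adjacent to the term, so \b cannot match.
console.log(withBoundary.test(sampleText)); // false
// The plain pattern matches the term wherever it appears.
console.log(plain.test(sampleText)); // true
```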

In `@scripts/issue-labels/README.md`:
- Around line 28-30: The README wording for the --sync-domains flag overstates
behavior; update the two lines describing default behavior and --sync-domains to
clarify it is not an exact-sync mode by stating that the script only removes
existing domain/* labels when at least one domain in the issue matches (i.e., it
will not remove all stale domain labels if none of the configured domains
match), and add a short note warning operators that --sync-domains may remove
some manual labels but will not guarantee full cleanup of all stale domain/*
labels.

In `@scripts/issue-labels/test.js`:
- Around line 28-32: The test currently treats only undefined expectations as
assertions; change the logic to treat explicit expectations (including null for
expected_type and empty array for expected_domains) as real assertions by
checking whether the sample actually contains the keys rather than only whether
their values are undefined. For expected_type, use a presence check (e.g.,
sample.hasOwnProperty('expected_type')) and if present assert (result.type ||
null) === sample.expected_type (so explicit null requires result.type to be
null); for expected_domains, check presence of sample.expected_domains and, if
present, assert equality/emptiness by comparing sorted arrays (using sortArray)
or using isSubset appropriately so that an explicit [] requires result.domains
to be empty; update the matchType and matchDomains logic to follow these
presence-based assertions (referencing expectedType, matchType, expectedDomains,
matchDomains, sample.expected_type, sample.expected_domains, result.type,
result.domains, sortArray, isSubset).
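
One way to read the presence-based assertions requested here is sketched below. checkSample is a hypothetical helper, and it compares domains by exact sorted equality rather than the subset check the existing runner uses, which is a deliberate simplification for the example.

```javascript
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
const sorted = (xs) => [...xs].sort();

// Hypothetical helper: a key that is present is always asserted, even if null or [].
function checkSample(sample, result) {
  const matchType = !has(sample, "expected_type")
    || (result.type || null) === sample.expected_type;

  let matchDomains = true;
  if (has(sample, "expected_domains")) {
    const want = sorted(sample.expected_domains);
    const got = sorted(result.domains || []);
    matchDomains = want.length === got.length && want.every((d, i) => d === got[i]);
  }
  return matchType && matchDomains;
}

// An explicit null now requires that no type was computed.
console.log(checkSample({ expected_type: null }, { type: "bug", domains: [] }));
// A sample without expectations asserts nothing.
console.log(checkSample({}, { type: "bug", domains: ["im"] }));
```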

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a90a35aa-8543-4fb9-8cb1-e50e8951f86b

📥 Commits

Reviewing files that changed from the base of the PR and between 135fde8 and d56d6ca.

📒 Files selected for processing (5)
  • .github/workflows/issue-labels.yml
  • scripts/issue-labels/README.md
  • scripts/issue-labels/index.js
  • scripts/issue-labels/samples.json
  • scripts/issue-labels/test.js

@github-actions github-actions bot added size/XL Architecture-level or global-impact change and removed size/L Large or sensitive change across domains or core paths labels Apr 3, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
scripts/issue-labels/index.js (1)

25-42: Derive domain regex alternations from DOMAIN_SERVICES to prevent drift.

The service alternation is duplicated in regex literals; a future service update can silently desync matching behavior from DOMAIN_SERVICES.

♻️ Suggested refactor
 const DOMAIN_SERVICES = [
@@
 ];
+const DOMAIN_ALIASES = ["docs"];
+const DOMAIN_REGEX_ALTERNATION = [...DOMAIN_SERVICES, ...DOMAIN_ALIASES].join("|");
 const DOMAIN_LABELS = DOMAIN_SERVICES.map((s) => `domain/${s}`);
@@
-  const explicit = /\bdomain\/(im|doc|docs|base|sheets|calendar|mail|task|vc|whiteboard|minutes|wiki|event|auth|core)\b/gi;
+  const explicit = new RegExp(`\\bdomain\\/(${DOMAIN_REGEX_ALTERNATION})\\b`, "gi");
@@
-  const cmd = /\blark[-\s]?cli\s+(im|doc|docs|base|sheets|calendar|mail|task|vc|whiteboard|minutes|wiki|event|auth|core)\b/gi;
+  const cmd = new RegExp(`\\blark[-\\s]?cli\\s+(${DOMAIN_REGEX_ALTERNATION})\\b`, "gi");

Also applies to: 144-152

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/issue-labels/index.js` around lines 25 - 42, The code duplicates the
service alternation in regex literals which can drift from DOMAIN_SERVICES;
update the regex construction to derive alternations from DOMAIN_SERVICES (e.g.,
const serviceAlternation = DOMAIN_SERVICES.join('|')) and use that value to
build the RegExp instances (or dynamic regex strings) instead of hard-coded
alternations; replace the literal alternations found around the regex uses
(including the occurrences referenced by DOMAIN_LABELS/DOMAIN_LABEL_SET and the
second occurrence at lines ~144-152) with RegExp constructed from
serviceAlternation so the domain matching always follows DOMAIN_SERVICES.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/issue-labels/index.js`:
- Around line 589-623: The parsing loop currently consumes argv[++i] for
value-taking flags (e.g., "--token", "--repo", "--max-pages", "--max-issues",
"--state"), which can swallow the next flag; update the logic so before
incrementing you peek at argv[i+1] and only consume it if it exists and does not
start with "-" (treat as missing/keep default or report error otherwise),
reusing toInt for numeric flags (--max-pages, --max-issues) and String() for
string flags; add a small helper or inline check (e.g., next = argv[i+1]; if
(!next || next.startsWith("-")) { /* handle missing */ } else { i++; args.xxx =
... }) to avoid flipping behavior by accidentally ingesting other flags.
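
The peek-before-consume idea can be sketched as follows. Only three of the flags named in this thread are handled, the default of 300 for maxIssues is an assumption taken from the workflow invocation, and throwing on a missing value is one of the two options the comment allows.

```javascript
// Minimal sketch; the real parseArgs handles more flags.
function parseArgs(argv) {
  const args = { repo: "", maxIssues: 300, dryRun: false };
  for (let i = 0; i < argv.length; i += 1) {
    const flag = argv[i];
    const next = argv[i + 1];
    // Only treat the next token as a value if it exists and is not another flag.
    const hasValue = next !== undefined && !next.startsWith("-");

    if (flag === "--dry-run") {
      args.dryRun = true;
    } else if (flag === "--repo" || flag === "--max-issues") {
      if (!hasValue) throw new Error(`Missing value for ${flag}`);
      i += 1; // consume the value
      if (flag === "--repo") args.repo = next;
      else args.maxIssues = Number.parseInt(next, 10);
    }
  }
  return args;
}

console.log(parseArgs(["--repo", "larksuite/cli", "--max-issues", "50", "--dry-run"]));
```

With this check, an invocation such as `--repo --dry-run` fails loudly instead of silently treating `--dry-run` as the repository name.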


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bca06752-41ea-4dd6-9d9b-6f09fdbc48c8

📥 Commits

Reviewing files that changed from the base of the PR and between 965a537 and 9f921fe.

📒 Files selected for processing (4)
  • .github/workflows/issue-labels.yml
  • scripts/issue-labels/README.md
  • scripts/issue-labels/index.js
  • scripts/issue-labels/test.js
✅ Files skipped from review due to trivial changes (1)
  • .github/workflows/issue-labels.yml


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/issue-labels/index.js`:
- Around line 604-613: The helper function readFlagValue incorrectly references
and mutates the loop variable i which is block-scoped inside the for loop,
causing a ReferenceError; fix this by declaring let i; (or let i = 0;) in the
outer scope before the for (...) so readFlagValue and the loop share the same i
variable, leaving the rest of readFlagValue (checks, throw, i += 1, return
String(value)) and the for loop intact; alternatively, you can refactor
readFlagValue to accept the current index and return the new index/value pair,
but the simplest fix is moving the i declaration outside the for loop so
readFlagValue can access and increment it.
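
The scoping fix suggested here can be illustrated with a small sketch. parseFlags and the two flags it handles are simplified stand-ins; the point is only that the index is declared in the enclosing function so the helper and the loop share it.

```javascript
// Simplified stand-in for the PR's parseArgs.
function parseFlags(argv) {
  const out = {};
  let i = 0; // declared outside the loop so the helper below can read and advance it

  function readFlagValue(flag) {
    const value = argv[i + 1];
    if (value === undefined || String(value).startsWith("-")) {
      throw new Error(`Missing value for ${flag}`);
    }
    i += 1; // skip the consumed value in the outer loop
    return String(value);
  }

  // With `for (let i = 0; ...)` here instead, readFlagValue could not see i
  // and would throw a ReferenceError on first use.
  for (; i < argv.length; i += 1) {
    if (argv[i] === "--state") out.state = readFlagValue("--state");
    else if (argv[i] === "--token") out.token = readFlagValue("--token");
  }
  return out;
}

console.log(parseFlags(["--state", "open", "--token", "abc"]));
```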

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 62e268cb-330c-4c8a-9c74-90a8ad23ade6

📥 Commits

Reviewing files that changed from the base of the PR and between 87a4823 and ca5532e.

📒 Files selected for processing (1)
  • scripts/issue-labels/index.js

@github-actions

github-actions bot commented Apr 3, 2026

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@e5f2d2f28e47f388ac931c7a1d5481bd1ff749d9

🧩 Skill update

npx skills add williamfzc/cli#main -y -g

@fengzhangchi-bytedance fengzhangchi-bytedance merged commit 2efadec into larksuite:main Apr 7, 2026
7 checks passed

Labels

size/XL Architecture-level or global-impact change


2 participants