pkg/metrics: add auto analyze failed alert rule #67733

ti-chi-bot[bot] merged 1 commit into pingcap:master
Conversation
Codecov Report

All modified and coverable lines are covered by tests. Additional details and impacted files:

@@ Coverage Diff @@
## master #67733 +/- ##
================================================
- Coverage 77.6100% 77.4344% -0.1756%
================================================
Files 1981 1965 -16
Lines 548611 548624 +13
================================================
- Hits 425777 424824 -953
- Misses 122024 123798 +1774
+ Partials 810 2 -808
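The percentages in the diff follow directly from the hits/lines columns; a quick illustrative re-computation (not part of the PR, just a sanity check of the report):

```python
# Coverage % = hits / lines, using the numbers from the Codecov table above.
base_hits, base_lines = 425777, 548611  # master
head_hits, head_lines = 424824, 548624  # this PR

base_pct = base_hits / base_lines * 100
head_pct = head_hits / head_lines * 100

print(f"master: {base_pct:.3f}%")              # 77.610%
print(f"PR:     {head_pct:.3f}%")              # 77.434%
print(f"delta:  {head_pct - base_pct:+.3f}%")  # -0.176%, matching -0.1756% above
```

The drop comes almost entirely from the -16 files / -953 hits shift in what was collected, not from the 13 added lines.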
Flags with carried forward coverage won't be shown.
Tested locally:

export WORK_DIR="$HOME/tmp/review-67733-alert"
export PLAY_TAG="case-review-67733-alert-default"
export PORT_OFFSET=38000
export SQL_PORT=42000
export STATUS_PORT=48080
export PROM_PORT=43194
export PROM_RULES="$WORK_DIR/tidb-auto-analyze.rules.yml"
export PROM_CFG="$WORK_DIR/prometheus.yml"
export PROM_DATA="$WORK_DIR/prom-data"
export PROM_BIN="$HOME/.tiup/components/prometheus/v8.5.5/prometheus/prometheus"

1. Start A Real Playground Cluster

tiup playground nightly \
  --db 1 --pd 1 --kv 1 --tiflash 0 --without-monitor \
  --tag "$PLAY_TAG" \
  --port-offset "$PORT_OFFSET" --db.binpath /Users/poe/code/tidb/bin/tidb-server

2. Start Standalone Prometheus With The PR Rule

command mkdir -p -- "$WORK_DIR" "$PROM_DATA"
cat > "$PROM_RULES" <<'RULES'
groups:
  - name: alert.rules
    rules:
      - alert: TiDB_auto_analyze_failed
        expr: increase( tidb_statistics_auto_analyze_total{type="failed"}[10m] ) > 0
        for: 1m
        labels:
          env: ENV_LABELS_ENV
          level: warning
          expr: increase( tidb_statistics_auto_analyze_total{type="failed"}[10m] ) > 0
        annotations:
          description: 'cluster: ENV_LABELS_ENV, instance: {{ $labels.instance }}, values:{{ $value }}'
          value: '{{ $value }}'
          summary: TiDB auto analyze failed
RULES
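The rule's expression watches a monotonically increasing counter. Roughly speaking (ignoring Prometheus's range-window extrapolation and staleness handling), `increase()` sums the positive deltas of the counter over the window and treats any decrease as a counter reset. A simplified model of that semantics, as a sketch only:

```python
def simple_increase(samples):
    """Rough model of PromQL increase(): sum the deltas over the window,
    treating any decrease as a counter reset (process restart -> counter
    starts again from 0).  Real Prometheus also extrapolates toward the
    window boundaries, which is why the verified run below reports
    fractional values such as 1.24 rather than whole failure counts."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # Counter reset: everything seen after the reset is new increase.
            total += cur
    return total

# Two auto-analyze failures inside the window -> the alert condition (> 0) holds.
print(simple_increase([0, 0, 1, 2]))  # 2.0
# A TiDB restart resets the counter; increase() still sees the new failure.
print(simple_increase([5, 5, 0, 1]))  # 1.0
```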
cat > "$PROM_CFG" <<EOF_CFG
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - $PROM_RULES
scrape_configs:
  - job_name: "tidb-review-67733-default"
    static_configs:
      - targets:
          - "127.0.0.1:$STATUS_PORT"
EOF_CFG
"$PROM_BIN" \
--config.file="$PROM_CFG" \
--web.listen-address="127.0.0.1:$PROM_PORT" \
--storage.tsdb.path="$PROM_DATA"

Wait until the readiness probe succeeds:

curl -sf "http://127.0.0.1:$PROM_PORT/-/ready"

3. Python Checker

Create and activate a local venv in the same shell first:

python3 -m venv "$WORK_DIR/.venv"
. "$WORK_DIR/.venv/bin/activate"
python -m pip install PyMySQL

cat > "$WORK_DIR/check_alert_timeout_e2e.py" <<'PY'
import json
import os
import sys
import time
import urllib.parse
import urllib.request

try:
    import pymysql
except ModuleNotFoundError as exc:
    raise SystemExit("Missing dependency: activate the venv and run `python -m pip install PyMySQL`") from exc

# Tunables, all overridable via environment variables.
HOST = os.getenv("CASE_HOST", "127.0.0.1")
PORT = int(os.getenv("CASE_SQL_PORT", "42000"))
STATUS_PORT = int(os.getenv("CASE_STATUS_PORT", "48080"))
PROM_PORT = int(os.getenv("CASE_PROM_PORT", "43194"))
DB = os.getenv("CASE_DB", "review67733_timeout")
TABLE = os.getenv("CASE_TABLE", "t_auto")
ROW_BATCH = int(os.getenv("CASE_ROW_BATCH", "5000"))
BASE_ROWS = int(os.getenv("CASE_BASE_ROWS", "5000000"))
DELTA_ROWS = int(os.getenv("CASE_DELTA_ROWS", "1000000"))
FAIL_ROUNDS = int(os.getenv("CASE_FAIL_ROUNDS", "2"))
MAX_TIME = int(os.getenv("CASE_MAX_AUTO_ANALYZE_TIME", "1"))
POLL_INTERVAL = float(os.getenv("CASE_POLL_INTERVAL", "0.1"))


def connect(db=None, autocommit=True):
    return pymysql.connect(
        host=HOST,
        port=PORT,
        user="root",
        password="",
        database=db,
        autocommit=autocommit,
        charset="utf8mb4",
        cursorclass=pymysql.cursors.DictCursor,
        read_timeout=120,
        write_timeout=120,
    )


def exec_sql(cur, sql, args=None):
    cur.execute(sql, args)
    try:
        return cur.fetchall()
    except Exception:
        return None


def http_json(url):
    with urllib.request.urlopen(url, timeout=15) as resp:
        return json.loads(resp.read().decode("utf-8"))


def prom_query(expr):
    query = urllib.parse.quote(expr)
    data = http_json(f"http://127.0.0.1:{PROM_PORT}/api/v1/query?query={query}")
    return data["data"]["result"]


def prom_rule_state():
    data = http_json(f"http://127.0.0.1:{PROM_PORT}/api/v1/rules")
    for group in data["data"]["groups"]:
        for rule in group["rules"]:
            if rule.get("name") == "TiDB_auto_analyze_failed":
                return rule
    raise RuntimeError("TiDB_auto_analyze_failed not found")


def prom_alerts():
    return http_json(f"http://127.0.0.1:{PROM_PORT}/api/v1/alerts")["data"]["alerts"]


def wait_for_alert_state(target, timeout_sec):
    # Poll the rules and alerts endpoints until the rule reaches `target`.
    deadline = time.time() + timeout_sec
    last_rule = None
    last_alerts = []
    while time.time() < deadline:
        last_rule = prom_rule_state()
        last_alerts = prom_alerts()
        if last_rule.get("state") == target:
            return last_rule, last_alerts
        time.sleep(POLL_INTERVAL)
    return last_rule, last_alerts


def insert_rows(cur, start, count):
    remaining = count
    next_id = start
    sql = f'''
        insert into {DB}.{TABLE}
            (id, a, b, c, d, e, f, g, h, s)
        values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
    '''
    while remaining > 0:
        batch = min(ROW_BATCH, remaining)
        values = []
        for i in range(batch):
            v = next_id + i
            row = [v]
            for mod in [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]:
                row.append(v % mod)
            row.append(f"s{v:08d}")
            values.append(tuple(row))
        cur.executemany(sql, values)
        next_id += batch
        remaining -= batch


def latest_analyze_rows(cur):
    rows = exec_sql(cur, f"show analyze status where table_schema = '{DB}' and table_name = '{TABLE}'")
    return rows or []


def latest_failed_auto_analyze(rows):
    for row in rows:
        txt = json.dumps(row, default=str).lower()
        if "auto analyze" in txt and "failed" in txt:
            return row
    return None


def metric_failed():
    # Scrape the TiDB status port and pick out the failed auto-analyze counter.
    text = urllib.request.urlopen(f"http://{HOST}:{STATUS_PORT}/metrics", timeout=15).read().decode("utf-8", errors="replace")
    for line in text.splitlines():
        if line.startswith('tidb_statistics_auto_analyze_total{type="failed"}'):
            return float(line.split()[-1])
    return 0.0


def main():
    urllib.request.urlopen(f"http://127.0.0.1:{PROM_PORT}/-/ready", timeout=15).read()
    print("initial rule:", json.dumps(prom_rule_state(), default=str, indent=2), flush=True)
    with connect() as conn:
        with conn.cursor() as cur:
            exec_sql(cur, f"drop database if exists {DB}")
            exec_sql(cur, f"create database {DB}")
            exec_sql(cur, f"use {DB}")
            exec_sql(cur, "set global tidb_enable_auto_analyze = 0")
            exec_sql(cur, "set global tidb_auto_analyze_ratio = 0.01")
            exec_sql(cur, "set global tidb_auto_analyze_start_time = '00:00 +0000'")
            exec_sql(cur, "set global tidb_auto_analyze_end_time = '23:59 +0000'")
            exec_sql(cur, "set global tidb_analyze_version = 2")
            # A 1s cap makes auto analyze of the wide, heavily indexed table time out.
            exec_sql(cur, "set global tidb_max_auto_analyze_time = %s", (MAX_TIME,))
            exec_sql(
                cur,
                f'''
                create table {TABLE} (
                    id bigint primary key,
                    a bigint,
                    b bigint,
                    c bigint,
                    d bigint,
                    e bigint,
                    f bigint,
                    g bigint,
                    h bigint,
                    s varchar(32),
                    index ia(a),
                    index ib(b),
                    index ic(c),
                    index idd(d),
                    index ie(e),
                    index iff(f),
                    index ig(g),
                    index ih(h),
                    index is1(s)
                )
                ''',
            )
            print("loading baseline rows", BASE_ROWS, flush=True)
            insert_rows(cur, 1, BASE_ROWS)
            exec_sql(cur, "flush stats_delta")
            exec_sql(cur, f"analyze table {TABLE}")
            before = metric_failed()
            next_id = BASE_ROWS + 1
            for i in range(FAIL_ROUNDS):
                print("round", i + 1, "delta rows", DELTA_ROWS, flush=True)
                round_before = metric_failed()
                exec_sql(cur, "set global tidb_enable_auto_analyze = 0")
                insert_rows(cur, next_id, DELTA_ROWS)
                next_id += DELTA_ROWS
                exec_sql(cur, "flush stats_delta")
                exec_sql(cur, "set global tidb_enable_auto_analyze = 1")
                deadline = time.time() + 300
                row = None
                rows = []
                while time.time() < deadline:
                    rows = latest_analyze_rows(cur)
                    if metric_failed() > round_before:
                        row = latest_failed_auto_analyze(rows)
                        if row is not None:
                            break
                    time.sleep(POLL_INTERVAL)
                print("rows", json.dumps(rows, default=str, indent=2), flush=True)
                if row is None:
                    return 2
            after = metric_failed()
            print("failed metric", before, after, flush=True)
            print("prom increase", json.dumps(prom_query('increase(tidb_statistics_auto_analyze_total{type="failed"}[10m])'), indent=2), flush=True)
            pending_rule, pending_alerts = wait_for_alert_state("pending", 120)
            print("pending_rule", json.dumps(pending_rule, default=str, indent=2), flush=True)
            print("pending_alerts", json.dumps(pending_alerts, default=str, indent=2), flush=True)
            if pending_rule.get("state") != "pending":
                return 3
            firing_rule, firing_alerts = wait_for_alert_state("firing", 180)
            print("firing_rule", json.dumps(firing_rule, default=str, indent=2), flush=True)
            print("firing_alerts", json.dumps(firing_alerts, default=str, indent=2), flush=True)
            if firing_rule.get("state") != "firing":
                return 4
            return 0


if __name__ == "__main__":
    sys.exit(main())
PY

Run it:

CASE_SQL_PORT="$SQL_PORT" CASE_STATUS_PORT="$STATUS_PORT" CASE_PROM_PORT="$PROM_PORT" \
python "$WORK_DIR/check_alert_timeout_e2e.py"

Verified result:

initial rule: {
"state": "inactive",
"name": "TiDB_auto_analyze_failed",
"query": "increase(tidb_statistics_auto_analyze_total{type=\"failed\"}[10m]) > 0",
"duration": 60,
"keepFiringFor": 0,
"labels": {
"env": "ENV_LABELS_ENV",
"expr": "increase( tidb_statistics_auto_analyze_total{type=\"failed\"}[10m] ) > 0",
"level": "warning"
},
"annotations": {
"description": "cluster: ENV_LABELS_ENV, instance: {{ $labels.instance }}, values:{{ $value }}",
"summary": "TiDB auto analyze failed",
"value": "{{ $value }}"
},
"alerts": [],
"health": "ok",
"evaluationTime": 0.000341417,
"lastEvaluation": "2026-04-15T14:51:41.250725+02:00",
"type": "alerting"
}
loading baseline rows 5000000
round 1 delta rows 1000000
rows [
{
"Table_schema": "review67733_timeout",
"Table_name": "t_auto",
"Partition_name": "",
"Job_info": "auto analyze table all indexes, all columns with 256 buckets, 100 topn, 0.018333333333333333 samplerate",
"Processed_rows": 655661,
"Start_time": "2026-04-15 14:58:29",
"End_time": "2026-04-15 14:58:30",
"State": "failed",
"Fail_reason": "[executor:1317]Query execution was interrupted",
"Instance": "127.0.0.1:42000",
"Process_ID": null,
"Remaining_seconds": null,
"Progress": null,
"Estimated_total_rows": null
},
{
"Table_schema": "review67733_timeout",
"Table_name": "t_auto",
"Partition_name": "",
"Job_info": "analyze table all indexes, all columns with 256 buckets, 100 topn, 0.03308270676691729 samplerate",
"Processed_rows": 5000000,
"Start_time": "2026-04-15 14:57:20",
"End_time": "2026-04-15 14:57:22",
"State": "finished",
"Fail_reason": null,
"Instance": "127.0.0.1:42000",
"Process_ID": null,
"Remaining_seconds": null,
"Progress": null,
"Estimated_total_rows": null
}
]
round 2 delta rows 1000000
rows [
{
"Table_schema": "review67733_timeout",
"Table_name": "t_auto",
"Partition_name": "",
"Job_info": "auto analyze table all indexes, all columns with 256 buckets, 100 topn, 0.015714285714285715 samplerate",
"Processed_rows": 655661,
"Start_time": "2026-04-15 14:59:38",
"End_time": "2026-04-15 14:59:39",
"State": "failed",
"Fail_reason": "[executor:1317]Query execution was interrupted",
"Instance": "127.0.0.1:42000",
"Process_ID": null,
"Remaining_seconds": null,
"Progress": null,
"Estimated_total_rows": null
},
{
"Table_schema": "review67733_timeout",
"Table_name": "t_auto",
"Partition_name": "",
"Job_info": "auto analyze table all indexes, all columns with 256 buckets, 100 topn, 0.018333333333333333 samplerate",
"Processed_rows": 655661,
"Start_time": "2026-04-15 14:58:29",
"End_time": "2026-04-15 14:58:30",
"State": "failed",
"Fail_reason": "[executor:1317]Query execution was interrupted",
"Instance": "127.0.0.1:42000",
"Process_ID": null,
"Remaining_seconds": null,
"Progress": null,
"Estimated_total_rows": null
},
{
"Table_schema": "review67733_timeout",
"Table_name": "t_auto",
"Partition_name": "",
"Job_info": "analyze table all indexes, all columns with 256 buckets, 100 topn, 0.03308270676691729 samplerate",
"Processed_rows": 5000000,
"Start_time": "2026-04-15 14:57:20",
"End_time": "2026-04-15 14:57:22",
"State": "finished",
"Fail_reason": null,
"Instance": "127.0.0.1:42000",
"Process_ID": null,
"Remaining_seconds": null,
"Progress": null,
"Estimated_total_rows": null
}
]
failed metric 0.0 2.0
prom increase [
{
"metric": {
"instance": "127.0.0.1:48080",
"job": "tidb-review-67733-default",
"type": "failed"
},
"value": [
1776257979.41,
"0"
]
}
]
pending_rule {
"state": "pending",
"name": "TiDB_auto_analyze_failed",
"query": "increase(tidb_statistics_auto_analyze_total{type=\"failed\"}[10m]) > 0",
"duration": 60,
"keepFiringFor": 0,
"labels": {
"env": "ENV_LABELS_ENV",
"expr": "increase( tidb_statistics_auto_analyze_total{type=\"failed\"}[10m] ) > 0",
"level": "warning"
},
"annotations": {
"description": "cluster: ENV_LABELS_ENV, instance: {{ $labels.instance }}, values:{{ $value }}",
"summary": "TiDB auto analyze failed",
"value": "{{ $value }}"
},
"alerts": [
{
"labels": {
"alertname": "TiDB_auto_analyze_failed",
"env": "ENV_LABELS_ENV",
"expr": "increase( tidb_statistics_auto_analyze_total{type=\"failed\"}[10m] ) > 0",
"instance": "127.0.0.1:48080",
"job": "tidb-review-67733-default",
"level": "warning",
"type": "failed"
},
"annotations": {
"description": "cluster: ENV_LABELS_ENV, instance: 127.0.0.1:48080, values:1.2420914387808162",
"summary": "TiDB auto analyze failed",
"value": "1.2420914387808162"
},
"state": "pending",
"activeAt": "2026-04-15T12:59:56.224973056Z",
"value": "1.2420914387808162e+00"
}
],
"health": "ok",
"evaluationTime": 0.002013916,
"lastEvaluation": "2026-04-15T15:00:11.21733+02:00",
"type": "alerting"
}
pending_alerts [
{
"labels": {
"alertname": "TiDB_auto_analyze_failed",
"env": "ENV_LABELS_ENV",
"expr": "increase( tidb_statistics_auto_analyze_total{type=\"failed\"}[10m] ) > 0",
"instance": "127.0.0.1:48080",
"job": "tidb-review-67733-default",
"level": "warning",
"type": "failed"
},
"annotations": {
"description": "cluster: ENV_LABELS_ENV, instance: 127.0.0.1:48080, values:1.2420914387808162",
"summary": "TiDB auto analyze failed",
"value": "1.2420914387808162"
},
"state": "pending",
"activeAt": "2026-04-15T12:59:56.224973056Z",
"value": "1.2420914387808162e+00"
}
]
firing_rule {
"state": "firing",
"name": "TiDB_auto_analyze_failed",
"query": "increase(tidb_statistics_auto_analyze_total{type=\"failed\"}[10m]) > 0",
"duration": 60,
"keepFiringFor": 0,
"labels": {
"env": "ENV_LABELS_ENV",
"expr": "increase( tidb_statistics_auto_analyze_total{type=\"failed\"}[10m] ) > 0",
"level": "warning"
},
"annotations": {
"description": "cluster: ENV_LABELS_ENV, instance: {{ $labels.instance }}, values:{{ $value }}",
"summary": "TiDB auto analyze failed",
"value": "{{ $value }}"
},
"alerts": [
{
"labels": {
"alertname": "TiDB_auto_analyze_failed",
"env": "ENV_LABELS_ENV",
"expr": "increase( tidb_statistics_auto_analyze_total{type=\"failed\"}[10m] ) > 0",
"instance": "127.0.0.1:48080",
"job": "tidb-review-67733-default",
"level": "warning",
"type": "failed"
},
"annotations": {
"description": "cluster: ENV_LABELS_ENV, instance: 127.0.0.1:48080, values:1.1344957115544",
"summary": "TiDB auto analyze failed",
"value": "1.1344957115544"
},
"state": "firing",
"activeAt": "2026-04-15T12:59:56.224973056Z",
"value": "1.1344957115544e+00"
}
],
"health": "ok",
"evaluationTime": 0.002007042,
"lastEvaluation": "2026-04-15T15:01:11.217456+02:00",
"type": "alerting"
}
firing_alerts [
{
"labels": {
"alertname": "TiDB_auto_analyze_failed",
"env": "ENV_LABELS_ENV",
"expr": "increase( tidb_statistics_auto_analyze_total{type=\"failed\"}[10m] ) > 0",
"instance": "127.0.0.1:48080",
"job": "tidb-review-67733-default",
"level": "warning",
"type": "failed"
},
"annotations": {
"description": "cluster: ENV_LABELS_ENV, instance: 127.0.0.1:48080, values:1.1344957115544",
"summary": "TiDB auto analyze failed",
"value": "1.1344957115544"
},
"state": "firing",
"activeAt": "2026-04-15T12:59:56.224973056Z",
"value": "1.1344957115544e+00"
}
]
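The inactive → pending → firing progression captured above is exactly what the rule's `for: 1m` clause buys: the expression must stay true across a full minute of evaluations before the alert fires. A toy model of that state machine (a simplification I'm assuming for illustration; real Prometheus tracks wall-clock time since `activeAt`, evaluating every `evaluation_interval`, 15s here):

```python
def alert_states(expr_true_per_eval, for_evals):
    """Toy model of Prometheus alert states for a rule with a `for` clause.

    expr_true_per_eval: whether the alert expression held at each evaluation.
    for_evals: consecutive true evaluations needed before firing
               (for: 1m at a 15s evaluation_interval -> 4 evaluations).
    """
    states = []
    true_run = 0
    for ok in expr_true_per_eval:
        true_run = true_run + 1 if ok else 0
        if true_run == 0:
            states.append("inactive")
        elif true_run <= for_evals:
            states.append("pending")
        else:
            states.append("firing")
    return states

# increase(...[10m]) stays > 0 for ~10 minutes after a failure, so after
# one minute of pending evaluations the alert transitions to firing.
print(alert_states([False, True, True, True, True, True], for_evals=4))
# ['inactive', 'pending', 'pending', 'pending', 'pending', 'firing']
```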
0xPoe left a comment:

🔢 Self-check (PR reviewed by myself and ready for feedback)

- Code compiles successfully
- Tested locally
- No AI-generated elegant nonsense in PR.
- Comments added where necessary
- PR title and description updated
- Documentation PR created (I will update it later)
- PR size is reasonable
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: time-and-fate, XuHuaiyu. The full list of commands accepted by this bot can be found here; the pull request process is described here.

[LGTM Timeline notifier] Timeline:
What problem does this PR solve?
Issue Number: ref #63934
Problem Summary:
TiDB does not have an alert rule for failed auto-analyze tasks.
What changed and how does it work?
Add a Prometheus alert rule that fires when
tidb_statistics_auto_analyze_total{type="failed"} increases in the last 10 minutes.

Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.