
pkg/metrics: add auto analyze failed alert rule #67733

Merged
ti-chi-bot[bot] merged 1 commit into pingcap:master from 0xPoe:poe/auto-analyze-failed-alert
Apr 16, 2026

Conversation

@0xPoe (Member) commented Apr 13, 2026

What problem does this PR solve?

Issue Number: ref #63934

Problem Summary:
TiDB does not have an alert rule for failed auto-analyze tasks.

What changed and how does it work?

Add an Alertmanager rule that fires when tidb_statistics_auto_analyze_total{type="failed"} increases in the last 10 minutes.
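For reference, the added rule in pkg/metrics/alertmanager/tidb.rules.yml takes this shape (reproduced from the rule file exercised in the local test later in this thread; ENV_LABELS_ENV is a placeholder for the cluster's environment label):

```yaml
- alert: TiDB_auto_analyze_failed
  expr: increase( tidb_statistics_auto_analyze_total{type="failed"}[10m] ) > 0
  for: 1m
  labels:
    env: ENV_LABELS_ENV
    level: warning
    expr: increase( tidb_statistics_auto_analyze_total{type="failed"}[10m] ) > 0
  annotations:
    description: 'cluster: ENV_LABELS_ENV, instance: {{ $labels.instance }}, values:{{ $value }}'
    value: '{{ $value }}'
    summary: TiDB auto analyze failed
```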

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.
    • This change only adds an alert rule for an existing metric.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to the Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • New Features
  • Added an alert rule for TiDB auto-analyze failures that fires when a failure is detected over a 10-minute interval, with annotations carrying instance and environment context for faster diagnosis and response.

@ti-chi-bot ti-chi-bot Bot added the release-note-none Denotes a PR that doesn't merit a release note. label Apr 13, 2026
@pantheon-ai (Bot) commented Apr 13, 2026

Review Complete

Findings: 0 issues
Posted: 0
Duplicates/Skipped: 0


@ti-chi-bot ti-chi-bot Bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Apr 13, 2026
@coderabbitai (Bot) commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d4a77252-3f35-40d9-ab28-c9ef2bc04065

📥 Commits

Reviewing files that changed from the base of the PR and between 885b297 and 37fac92.

📒 Files selected for processing (1)
  • pkg/metrics/alertmanager/tidb.rules.yml

📝 Walkthrough


A new alert rule TiDB_auto_analyze_failed has been added to the TiDB alerting configuration. This rule triggers when the failed auto-analyze statistics counter increases within a 10-minute window and persists for 1 minute, including environment labels and detailed annotations for incident context.

Changes

Cohort / File(s): TiDB Auto-Analyze Alert (pkg/metrics/alertmanager/tidb.rules.yml)
Summary: Added alert rule TiDB_auto_analyze_failed that fires when auto-analyze operations fail, with labels for environment and severity level, plus annotations for dynamic description, summary, and value reporting.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A watchful whisker twitches with delight,
New alerts now guard the stats through the night,
When analyze stumbles and fails to complete,
The monitoring bell rings—no defeat!
One rule to alert them, one rule so right! 🔔

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: Passed. The title clearly and concisely describes the main change: adding an alert rule for failed auto-analyze tasks in the metrics package.
  • Description check: Passed. The description is complete with all required sections: issue reference, problem summary, explanation of changes, properly marked test checkboxes, and release notes.
  • Docstring coverage: Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.



@codecov (Bot) commented Apr 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.4344%. Comparing base (885b297) to head (37fac92).
⚠️ Report is 15 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #67733        +/-   ##
================================================
- Coverage   77.6100%   77.4344%   -0.1756%     
================================================
  Files          1981       1965        -16     
  Lines        548611     548624        +13     
================================================
- Hits         425777     424824       -953     
- Misses       122024     123798      +1774     
+ Partials        810          2       -808     
Flag Coverage Δ
integration 40.9241% <ø> (+6.5844%) ⬆️
unit 76.6434% <ø> (+0.3059%) ⬆️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
dumpling 61.5065% <ø> (ø)
parser ∅ <ø> (∅)
br 49.9052% <ø> (-10.5308%) ⬇️

@pantheon-ai (Bot) left a comment

✅ Code looks good. No issues found.

@0xPoe (Member, Author) commented Apr 15, 2026

Tested locally:

export WORK_DIR="$HOME/tmp/review-67733-alert"
export PLAY_TAG="case-review-67733-alert-default"
export PORT_OFFSET=38000

export SQL_PORT=42000
export STATUS_PORT=48080
export PROM_PORT=43194

export PROM_RULES="$WORK_DIR/tidb-auto-analyze.rules.yml"
export PROM_CFG="$WORK_DIR/prometheus.yml"
export PROM_DATA="$WORK_DIR/prom-data"
export PROM_BIN="$HOME/.tiup/components/prometheus/v8.5.5/prometheus/prometheus"

1. Start A Real Playground Cluster

tiup playground nightly \
  --db 1 --pd 1 --kv 1 --tiflash 0 --without-monitor \
  --tag "$PLAY_TAG" \
  --port-offset "$PORT_OFFSET" --db.binpath /Users/poe/code/tidb/bin/tidb-server

2. Start Standalone Prometheus With The PR Rule

command mkdir -p -- "$WORK_DIR" "$PROM_DATA"

cat > "$PROM_RULES" <<'RULES'
groups:
  - name: alert.rules
    rules:
      - alert: TiDB_auto_analyze_failed
        expr: increase( tidb_statistics_auto_analyze_total{type="failed"}[10m] ) > 0
        for: 1m
        labels:
          env: ENV_LABELS_ENV
          level: warning
          expr: increase( tidb_statistics_auto_analyze_total{type="failed"}[10m] ) > 0
        annotations:
          description: 'cluster: ENV_LABELS_ENV, instance: {{ $labels.instance }}, values:{{ $value }}'
          value: '{{ $value }}'
          summary: TiDB auto analyze failed
RULES
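To make the expr concrete, here is a rough, reset-aware model of what increase() computes over the counter samples in the 10m window. This is a sketch only: it ignores Prometheus's extrapolation to the window boundaries, which is why the real alert values in the output further down come out as non-integers like 1.24 even though the raw counter only moved by whole numbers.

```python
# Simplified model of PromQL increase() over an ordered counter series.
# Assumes samples are the raw counter values inside the range window.
def approx_increase(samples):
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            # A drop means the counter was reset (e.g. tidb-server restart);
            # count the post-reset value as fresh growth.
            total += cur
        else:
            total += cur - prev
    return total

# Two auto-analyze failures inside the window -> alert condition (> 0) holds.
assert approx_increase([5.0, 5.0, 6.0, 7.0]) == 2.0
# A counter reset mid-window still registers the growth after the reset.
assert approx_increase([5.0, 0.0, 1.0]) == 1.0
```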

cat > "$PROM_CFG" <<EOF_CFG
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - $PROM_RULES

scrape_configs:
  - job_name: "tidb-review-67733-default"
    static_configs:
      - targets:
          - "127.0.0.1:$STATUS_PORT"
EOF_CFG

"$PROM_BIN" \
  --config.file="$PROM_CFG" \
  --web.listen-address="127.0.0.1:$PROM_PORT" \
  --storage.tsdb.path="$PROM_DATA"

Wait until:

curl -sf "http://127.0.0.1:$PROM_PORT/-/ready"

returns:

Prometheus Server is Ready.

3. Python Checker

Create and activate a local venv in the same shell first:

python3 -m venv "$WORK_DIR/.venv"
. "$WORK_DIR/.venv/bin/activate"
python -m pip install PyMySQL
cat > "$WORK_DIR/check_alert_timeout_e2e.py" <<'PY'
import json
import os
import sys
import time
import urllib.parse
import urllib.request

try:
    import pymysql
except ModuleNotFoundError as exc:
    raise SystemExit("Missing dependency: activate the venv and run `python -m pip install PyMySQL`") from exc

HOST = os.getenv("CASE_HOST", "127.0.0.1")
PORT = int(os.getenv("CASE_SQL_PORT", "42000"))
STATUS_PORT = int(os.getenv("CASE_STATUS_PORT", "48080"))
PROM_PORT = int(os.getenv("CASE_PROM_PORT", "43194"))
DB = os.getenv("CASE_DB", "review67733_timeout")
TABLE = os.getenv("CASE_TABLE", "t_auto")
ROW_BATCH = int(os.getenv("CASE_ROW_BATCH", "5000"))
BASE_ROWS = int(os.getenv("CASE_BASE_ROWS", "5000000"))
DELTA_ROWS = int(os.getenv("CASE_DELTA_ROWS", "1000000"))
FAIL_ROUNDS = int(os.getenv("CASE_FAIL_ROUNDS", "2"))
MAX_TIME = int(os.getenv("CASE_MAX_AUTO_ANALYZE_TIME", "1"))
POLL_INTERVAL = float(os.getenv("CASE_POLL_INTERVAL", "0.1"))


def connect(db=None, autocommit=True):
    return pymysql.connect(
        host=HOST,
        port=PORT,
        user="root",
        password="",
        database=db,
        autocommit=autocommit,
        charset="utf8mb4",
        cursorclass=pymysql.cursors.DictCursor,
        read_timeout=120,
        write_timeout=120,
    )


def exec_sql(cur, sql, args=None):
    cur.execute(sql, args)
    try:
        return cur.fetchall()
    except Exception:
        return None


def http_json(url):
    with urllib.request.urlopen(url, timeout=15) as resp:
        return json.loads(resp.read().decode("utf-8"))


def prom_query(expr):
    query = urllib.parse.quote(expr)
    data = http_json(f"http://127.0.0.1:{PROM_PORT}/api/v1/query?query={query}")
    return data["data"]["result"]


def prom_rule_state():
    data = http_json(f"http://127.0.0.1:{PROM_PORT}/api/v1/rules")
    for group in data["data"]["groups"]:
        for rule in group["rules"]:
            if rule.get("name") == "TiDB_auto_analyze_failed":
                return rule
    raise RuntimeError("TiDB_auto_analyze_failed not found")


def prom_alerts():
    return http_json(f"http://127.0.0.1:{PROM_PORT}/api/v1/alerts")["data"]["alerts"]


def wait_for_alert_state(target, timeout_sec):
    deadline = time.time() + timeout_sec
    last_rule = None
    last_alerts = []
    while time.time() < deadline:
        last_rule = prom_rule_state()
        last_alerts = prom_alerts()
        if last_rule.get("state") == target:
            return last_rule, last_alerts
        time.sleep(POLL_INTERVAL)
    return last_rule, last_alerts


def insert_rows(cur, start, count):
    remaining = count
    next_id = start
    sql = f'''
    insert into {DB}.{TABLE}
    (id, a, b, c, d, e, f, g, h, s)
    values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
    '''
    while remaining > 0:
        batch = min(ROW_BATCH, remaining)
        values = []
        for i in range(batch):
            v = next_id + i
            row = [v]
            for mod in [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]:
                row.append(v % mod)
            row.append(f"s{v:08d}")
            values.append(tuple(row))
        cur.executemany(sql, values)
        next_id += batch
        remaining -= batch


def latest_analyze_rows(cur):
    rows = exec_sql(cur, f"show analyze status where table_schema = '{DB}' and table_name = '{TABLE}'")
    return rows or []


def latest_failed_auto_analyze(rows):
    for row in rows:
        txt = json.dumps(row, default=str).lower()
        if "auto analyze" in txt and "failed" in txt:
            return row
    return None


def metric_failed():
    text = urllib.request.urlopen(f"http://{HOST}:{STATUS_PORT}/metrics", timeout=15).read().decode("utf-8", errors="replace")
    for line in text.splitlines():
        if line.startswith('tidb_statistics_auto_analyze_total{type="failed"}'):
            return float(line.split()[-1])
    return 0.0


def main():
    urllib.request.urlopen(f"http://127.0.0.1:{PROM_PORT}/-/ready", timeout=15).read()
    print("initial rule:", json.dumps(prom_rule_state(), default=str, indent=2), flush=True)

    with connect() as conn:
        with conn.cursor() as cur:
            exec_sql(cur, f"drop database if exists {DB}")
            exec_sql(cur, f"create database {DB}")
            exec_sql(cur, f"use {DB}")
            exec_sql(cur, "set global tidb_enable_auto_analyze = 0")
            exec_sql(cur, "set global tidb_auto_analyze_ratio = 0.01")
            exec_sql(cur, "set global tidb_auto_analyze_start_time = '00:00 +0000'")
            exec_sql(cur, "set global tidb_auto_analyze_end_time = '23:59 +0000'")
            exec_sql(cur, "set global tidb_analyze_version = 2")
            exec_sql(cur, "set global tidb_max_auto_analyze_time = %s", (MAX_TIME,))
            exec_sql(
                cur,
                f'''
                create table {TABLE} (
                    id bigint primary key,
                    a bigint,
                    b bigint,
                    c bigint,
                    d bigint,
                    e bigint,
                    f bigint,
                    g bigint,
                    h bigint,
                    s varchar(32),
                    index ia(a),
                    index ib(b),
                    index ic(c),
                    index idd(d),
                    index ie(e),
                    index iff(f),
                    index ig(g),
                    index ih(h),
                    index is1(s)
                )
                ''',
            )

            print("loading baseline rows", BASE_ROWS, flush=True)
            insert_rows(cur, 1, BASE_ROWS)
            exec_sql(cur, "flush stats_delta")
            exec_sql(cur, f"analyze table {TABLE}")

            before = metric_failed()
            next_id = BASE_ROWS + 1
            for i in range(FAIL_ROUNDS):
                print("round", i + 1, "delta rows", DELTA_ROWS, flush=True)
                round_before = metric_failed()
                exec_sql(cur, "set global tidb_enable_auto_analyze = 0")
                insert_rows(cur, next_id, DELTA_ROWS)
                next_id += DELTA_ROWS
                exec_sql(cur, "flush stats_delta")
                exec_sql(cur, "set global tidb_enable_auto_analyze = 1")

                deadline = time.time() + 300
                row = None
                rows = []
                while time.time() < deadline:
                    rows = latest_analyze_rows(cur)
                    if metric_failed() > round_before:
                        row = latest_failed_auto_analyze(rows)
                        if row is not None:
                            break
                    time.sleep(POLL_INTERVAL)
                print("rows", json.dumps(rows, default=str, indent=2), flush=True)
                if row is None:
                    return 2

            after = metric_failed()
            print("failed metric", before, after, flush=True)
            print("prom increase", json.dumps(prom_query('increase(tidb_statistics_auto_analyze_total{type=\"failed\"}[10m])'), indent=2), flush=True)
            pending_rule, pending_alerts = wait_for_alert_state("pending", 120)
            print("pending_rule", json.dumps(pending_rule, default=str, indent=2), flush=True)
            print("pending_alerts", json.dumps(pending_alerts, default=str, indent=2), flush=True)
            if pending_rule.get("state") != "pending":
                return 3
            firing_rule, firing_alerts = wait_for_alert_state("firing", 180)
            print("firing_rule", json.dumps(firing_rule, default=str, indent=2), flush=True)
            print("firing_alerts", json.dumps(firing_alerts, default=str, indent=2), flush=True)
            if firing_rule.get("state") != "firing":
                return 4
            return 0


if __name__ == "__main__":
    sys.exit(main())
PY

Run it:

CASE_SQL_PORT="$SQL_PORT" CASE_STATUS_PORT="$STATUS_PORT" CASE_PROM_PORT="$PROM_PORT" \
python "$WORK_DIR/check_alert_timeout_e2e.py"

Verified Result

initial rule: {
  "state": "inactive",
  "name": "TiDB_auto_analyze_failed",
  "query": "increase(tidb_statistics_auto_analyze_total{type=\"failed\"}[10m]) > 0",
  "duration": 60,
  "keepFiringFor": 0,
  "labels": {
    "env": "ENV_LABELS_ENV",
    "expr": "increase( tidb_statistics_auto_analyze_total{type=\"failed\"}[10m] ) > 0",
    "level": "warning"
  },
  "annotations": {
    "description": "cluster: ENV_LABELS_ENV, instance: {{ $labels.instance }}, values:{{ $value }}",
    "summary": "TiDB auto analyze failed",
    "value": "{{ $value }}"
  },
  "alerts": [],
  "health": "ok",
  "evaluationTime": 0.000341417,
  "lastEvaluation": "2026-04-15T14:51:41.250725+02:00",
  "type": "alerting"
}
loading baseline rows 5000000
round 1 delta rows 1000000
rows [
  {
    "Table_schema": "review67733_timeout",
    "Table_name": "t_auto",
    "Partition_name": "",
    "Job_info": "auto analyze table all indexes, all columns with 256 buckets, 100 topn, 0.018333333333333333 samplerate",
    "Processed_rows": 655661,
    "Start_time": "2026-04-15 14:58:29",
    "End_time": "2026-04-15 14:58:30",
    "State": "failed",
    "Fail_reason": "[executor:1317]Query execution was interrupted",
    "Instance": "127.0.0.1:42000",
    "Process_ID": null,
    "Remaining_seconds": null,
    "Progress": null,
    "Estimated_total_rows": null
  },
  {
    "Table_schema": "review67733_timeout",
    "Table_name": "t_auto",
    "Partition_name": "",
    "Job_info": "analyze table all indexes, all columns with 256 buckets, 100 topn, 0.03308270676691729 samplerate",
    "Processed_rows": 5000000,
    "Start_time": "2026-04-15 14:57:20",
    "End_time": "2026-04-15 14:57:22",
    "State": "finished",
    "Fail_reason": null,
    "Instance": "127.0.0.1:42000",
    "Process_ID": null,
    "Remaining_seconds": null,
    "Progress": null,
    "Estimated_total_rows": null
  }
]
round 2 delta rows 1000000
rows [
  {
    "Table_schema": "review67733_timeout",
    "Table_name": "t_auto",
    "Partition_name": "",
    "Job_info": "auto analyze table all indexes, all columns with 256 buckets, 100 topn, 0.015714285714285715 samplerate",
    "Processed_rows": 655661,
    "Start_time": "2026-04-15 14:59:38",
    "End_time": "2026-04-15 14:59:39",
    "State": "failed",
    "Fail_reason": "[executor:1317]Query execution was interrupted",
    "Instance": "127.0.0.1:42000",
    "Process_ID": null,
    "Remaining_seconds": null,
    "Progress": null,
    "Estimated_total_rows": null
  },
  {
    "Table_schema": "review67733_timeout",
    "Table_name": "t_auto",
    "Partition_name": "",
    "Job_info": "auto analyze table all indexes, all columns with 256 buckets, 100 topn, 0.018333333333333333 samplerate",
    "Processed_rows": 655661,
    "Start_time": "2026-04-15 14:58:29",
    "End_time": "2026-04-15 14:58:30",
    "State": "failed",
    "Fail_reason": "[executor:1317]Query execution was interrupted",
    "Instance": "127.0.0.1:42000",
    "Process_ID": null,
    "Remaining_seconds": null,
    "Progress": null,
    "Estimated_total_rows": null
  },
  {
    "Table_schema": "review67733_timeout",
    "Table_name": "t_auto",
    "Partition_name": "",
    "Job_info": "analyze table all indexes, all columns with 256 buckets, 100 topn, 0.03308270676691729 samplerate",
    "Processed_rows": 5000000,
    "Start_time": "2026-04-15 14:57:20",
    "End_time": "2026-04-15 14:57:22",
    "State": "finished",
    "Fail_reason": null,
    "Instance": "127.0.0.1:42000",
    "Process_ID": null,
    "Remaining_seconds": null,
    "Progress": null,
    "Estimated_total_rows": null
  }
]
failed metric 0.0 2.0
prom increase [
  {
    "metric": {
      "instance": "127.0.0.1:48080",
      "job": "tidb-review-67733-default",
      "type": "failed"
    },
    "value": [
      1776257979.41,
      "0"
    ]
  }
]
pending_rule {
  "state": "pending",
  "name": "TiDB_auto_analyze_failed",
  "query": "increase(tidb_statistics_auto_analyze_total{type=\"failed\"}[10m]) > 0",
  "duration": 60,
  "keepFiringFor": 0,
  "labels": {
    "env": "ENV_LABELS_ENV",
    "expr": "increase( tidb_statistics_auto_analyze_total{type=\"failed\"}[10m] ) > 0",
    "level": "warning"
  },
  "annotations": {
    "description": "cluster: ENV_LABELS_ENV, instance: {{ $labels.instance }}, values:{{ $value }}",
    "summary": "TiDB auto analyze failed",
    "value": "{{ $value }}"
  },
  "alerts": [
    {
      "labels": {
        "alertname": "TiDB_auto_analyze_failed",
        "env": "ENV_LABELS_ENV",
        "expr": "increase( tidb_statistics_auto_analyze_total{type=\"failed\"}[10m] ) > 0",
        "instance": "127.0.0.1:48080",
        "job": "tidb-review-67733-default",
        "level": "warning",
        "type": "failed"
      },
      "annotations": {
        "description": "cluster: ENV_LABELS_ENV, instance: 127.0.0.1:48080, values:1.2420914387808162",
        "summary": "TiDB auto analyze failed",
        "value": "1.2420914387808162"
      },
      "state": "pending",
      "activeAt": "2026-04-15T12:59:56.224973056Z",
      "value": "1.2420914387808162e+00"
    }
  ],
  "health": "ok",
  "evaluationTime": 0.002013916,
  "lastEvaluation": "2026-04-15T15:00:11.21733+02:00",
  "type": "alerting"
}
pending_alerts [
  {
    "labels": {
      "alertname": "TiDB_auto_analyze_failed",
      "env": "ENV_LABELS_ENV",
      "expr": "increase( tidb_statistics_auto_analyze_total{type=\"failed\"}[10m] ) > 0",
      "instance": "127.0.0.1:48080",
      "job": "tidb-review-67733-default",
      "level": "warning",
      "type": "failed"
    },
    "annotations": {
      "description": "cluster: ENV_LABELS_ENV, instance: 127.0.0.1:48080, values:1.2420914387808162",
      "summary": "TiDB auto analyze failed",
      "value": "1.2420914387808162"
    },
    "state": "pending",
    "activeAt": "2026-04-15T12:59:56.224973056Z",
    "value": "1.2420914387808162e+00"
  }
]
firing_rule {
  "state": "firing",
  "name": "TiDB_auto_analyze_failed",
  "query": "increase(tidb_statistics_auto_analyze_total{type=\"failed\"}[10m]) > 0",
  "duration": 60,
  "keepFiringFor": 0,
  "labels": {
    "env": "ENV_LABELS_ENV",
    "expr": "increase( tidb_statistics_auto_analyze_total{type=\"failed\"}[10m] ) > 0",
    "level": "warning"
  },
  "annotations": {
    "description": "cluster: ENV_LABELS_ENV, instance: {{ $labels.instance }}, values:{{ $value }}",
    "summary": "TiDB auto analyze failed",
    "value": "{{ $value }}"
  },
  "alerts": [
    {
      "labels": {
        "alertname": "TiDB_auto_analyze_failed",
        "env": "ENV_LABELS_ENV",
        "expr": "increase( tidb_statistics_auto_analyze_total{type=\"failed\"}[10m] ) > 0",
        "instance": "127.0.0.1:48080",
        "job": "tidb-review-67733-default",
        "level": "warning",
        "type": "failed"
      },
      "annotations": {
        "description": "cluster: ENV_LABELS_ENV, instance: 127.0.0.1:48080, values:1.1344957115544",
        "summary": "TiDB auto analyze failed",
        "value": "1.1344957115544"
      },
      "state": "firing",
      "activeAt": "2026-04-15T12:59:56.224973056Z",
      "value": "1.1344957115544e+00"
    }
  ],
  "health": "ok",
  "evaluationTime": 0.002007042,
  "lastEvaluation": "2026-04-15T15:01:11.217456+02:00",
  "type": "alerting"
}
firing_alerts [
  {
    "labels": {
      "alertname": "TiDB_auto_analyze_failed",
      "env": "ENV_LABELS_ENV",
      "expr": "increase( tidb_statistics_auto_analyze_total{type=\"failed\"}[10m] ) > 0",
      "instance": "127.0.0.1:48080",
      "job": "tidb-review-67733-default",
      "level": "warning",
      "type": "failed"
    },
    "annotations": {
      "description": "cluster: ENV_LABELS_ENV, instance: 127.0.0.1:48080, values:1.1344957115544",
      "summary": "TiDB auto analyze failed",
      "value": "1.1344957115544"
    },
    "state": "firing",
    "activeAt": "2026-04-15T12:59:56.224973056Z",
    "value": "1.1344957115544e+00"
  }
]

@0xPoe (Member, Author) left a comment

🔢 Self-check (PR reviewed by myself and ready for feedback)

  • Code compiles successfully

  • Tested locally

  • No AI-generated elegant nonsense in PR.

  • Comments added where necessary

  • PR title and description updated

  • Documentation PR created (I will update it later)

  • PR size is reasonable

/cc @AilinKid @time-and-fate @XuHuaiyu

@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Apr 15, 2026
@ti-chi-bot (Bot) commented Apr 16, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: time-and-fate, XuHuaiyu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added lgtm approved and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 16, 2026
@ti-chi-bot (Bot) commented Apr 16, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-04-15 18:37:04.90911353 +0000 UTC m=+1586230.114473587: ☑️ agreed by time-and-fate.
  • 2026-04-16 13:40:31.12331886 +0000 UTC m=+1654836.328678907: ☑️ agreed by XuHuaiyu.

@ti-chi-bot ti-chi-bot Bot merged commit d53f621 into pingcap:master Apr 16, 2026
35 checks passed
@0xPoe 0xPoe deleted the poe/auto-analyze-failed-alert branch April 16, 2026 16:07

Labels

  • approved
  • lgtm
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • size/S: Denotes a PR that changes 10-29 lines, ignoring generated files.


3 participants