Four configurations of TiDB are overwritten by the operator #6134

@kos-team

Bug Report

What version of Kubernetes are you using?
Client Version: v1.31.1
Kustomize Version: v5.4.2

What version of TiDB Operator are you using?
v1.6.0

What did you do?
We deployed a TidbCluster and specified some configurations in spec.tidb.config. However, the runtime configuration used by TiDB differs from what we specified.

Properties List

  1. host: Overwritten by the operator's startup script, which hard-codes it to 0.0.0.0.
  2. log.slow-query-file: The TiDB Operator exposes a CR property, spec.tidb.slowLogVolumeName, that lets the user configure the slow log file name; it defaults to slowlog. Even if the user configures the cluster through the configuration file and provides no value for spec.tidb.slowLogVolumeName, the operator still uses this default value to start the TiDB cluster.
  3. store: Replaced by a hard-coded value and always set to tikv. This can cause the deployment of a standalone TiDB cluster that uses local storage to fail.
  4. path: Overwritten by the --path flag in the startup script. If we deploy a standalone TiDB cluster without PD and TiKV, the deployment fails because TiDB cannot contact the nonexistent PD pods.
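
The overrides listed above come down to precedence: a value passed as a command-line flag wins over the same key in the configuration file. A minimal model of that behaviour is sketched below. The `effective_config` helper, the flat key layout, and the `<pd-address>` placeholder are assumptions made for illustration, not the operator's or TiDB's actual code.

```python
def effective_config(file_config, flag_overrides):
    """Return the settings a process ends up with when flags beat the file."""
    merged = dict(file_config)
    merged.update(flag_overrides)  # flags are applied last, so they win
    return merged


# Values the user wrote in spec.tidb.config (taken from the examples in this report).
user_config = {"host": "192.168.100.113", "store": "unistore", "path": "/tmp/tidb"}

# Values this report describes as hard-coded by the startup script.
# The PD address is a placeholder, not the operator's real value.
script_flags = {"host": "0.0.0.0", "store": "tikv", "path": "<pd-address>:2379"}

runtime = effective_config(user_config, script_flags)

# Keys whose user-supplied value did not survive to runtime.
overwritten = sorted(k for k in user_config if runtime[k] != user_config[k])
print(overwritten)
```

In this model every key the script passes as a flag is lost from the user's file, which matches the behaviour reported for host, store, and path.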

How to reproduce

  1. Deploy a TiDB cluster with the properties listed above, for example:
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: test-cluster
spec:
  configUpdateStrategy: RollingUpdate
  enableDynamicConfiguration: true
  helper:
    image: alpine:3.16.0
  pd:
    baseImage: pingcap/pd
    config: "[dashboard]\n  internal-proxy = true\n"
    maxFailoverCount: 0
    mountClusterClientSecret: true
    replicas: 3
    requests:
      storage: 10Gi
  pvReclaimPolicy: Retain
  tidb:
    baseImage: pingcap/tidb
    config: |
      host = "192.168.100.113"

      [log]
      slow-query-file = "tidb_slow.log"

      [performance]
      tcp-keep-alive = true
    initializer:
      createPassword: true
    maxFailoverCount: 0
    replicas: 3
    service:
      externalTrafficPolicy: Local
      type: NodePort
  tikv:
    baseImage: pingcap/tikv
    config: 'log-level = "info"

      '
    maxFailoverCount: 0
    mountClusterClientSecret: true
    replicas: 3
    requests:
      storage: 100Gi
  timezone: UTC
  version: v8.1.0

Or deploy a standalone TiDB cluster:

apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: test-cluster
spec:
  configUpdateStrategy: RollingUpdate
  enableDynamicConfiguration: true
  helper:
    image: alpine:3.16.0
  pvReclaimPolicy: Retain
  tidb:
    baseImage: pingcap/tidb
    config: 'store = "unistore"

      path="/tmp/tidb"

      [performance]

      tcp-keep-alive = true

      '
    maxFailoverCount: 1
    replicas: 1
    service:
      externalTrafficPolicy: Local
      type: NodePort
  timezone: UTC
  version: v8.1.0
  2. Check the runtime configuration of TiDB.
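
However the runtime configuration is obtained, the check itself is a comparison between what was written in spec.tidb.config and what the server reports. A small sketch of that comparison follows. The helpers `flatten` and `diff_config` are hypothetical, and the runtime values are illustrative rather than captured from a real cluster.

```python
def flatten(config, prefix=""):
    """Flatten nested sections into dotted keys, e.g. log -> slow-query-file becomes log.slow-query-file."""
    flat = {}
    for key, value in config.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, dotted + "."))
        else:
            flat[dotted] = value
    return flat


def diff_config(expected, runtime):
    """Return {key: (expected, actual)} for every expected key that differs at runtime."""
    actual = flatten(runtime)
    return {
        key: (want, actual.get(key))
        for key, want in flatten(expected).items()
        if actual.get(key) != want
    }


# What the first example manifest asks for.
expected = {
    "host": "192.168.100.113",
    "log": {"slow-query-file": "tidb_slow.log"},
    "performance": {"tcp-keep-alive": True},
}

# Illustrative runtime values (assumed, not taken from a real cluster).
runtime = {
    "host": "0.0.0.0",
    "log": {"slow-query-file": "/var/log/tidb/slowlog"},
    "performance": {"tcp-keep-alive": True},
}

for key, (want, got) in diff_config(expected, runtime).items():
    print(f"{key}: expected {want!r}, got {got!r}")
```

Only keys present in the expected configuration are compared, so server-side defaults the user never set do not show up as mismatches.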

What did you expect to see?
We expect the values we provide in the configuration file to match the runtime configuration of TiDB when we do not provide a value for the corresponding CR property exposed by the operator.

What did you see instead?
Some values are overwritten by the operator using the command line flags in the startup script.

Root Cause
For log.slow-query-file, the default value is used if spec.tidb.slowLogVolumeName is nil, at this line: https://github.com/pingcap/tidb-operator/blob/main/pkg/manager/member/tidb_member_manager.go#L835.
The host, store, and path values are hard-coded in the startup script.

How to fix
We think these issues can be resolved by checking the value of the config property when building or initializing the container.
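
One way to read this proposal is that a default flag is emitted only when the user's configuration leaves the key unset. The sketch below illustrates that guard. The `build_start_flags` function, the `DEFAULT_FLAGS` table, and the `<pd-address>` placeholder are hypothetical and do not reflect the operator's actual startup-script generation.

```python
# Hypothetical defaults the startup script would otherwise always apply.
# The PD address is a placeholder, not the operator's real value.
DEFAULT_FLAGS = {"host": "0.0.0.0", "store": "tikv", "path": "<pd-address>:2379"}


def build_start_flags(user_config, defaults=DEFAULT_FLAGS):
    """Return --key=value flags only for keys the user left unset."""
    return [
        f"--{key}={value}"
        for key, value in defaults.items()
        if key not in user_config  # respect the user's config file value
    ]


# Standalone example from this report: store and path come from the config file,
# so only the host default is passed as a flag.
standalone_flags = build_start_flags({"store": "unistore", "path": "/tmp/tidb"})
print(standalone_flags)

# With an empty user config, every default is still applied.
default_flags = build_start_flags({})
print(default_flags)
```

A guard like this keeps the current behaviour for users who set nothing, while letting values in spec.tidb.config take effect when they are present.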
