When upgrading or reloading a cluster, increase retries when accessing PD #2494

@wsiqiang6

Bug Report

  1. What did you do?
    tiup cluster upgrade <cluster_name>

In the TiKV evict leader phase:
error requesting pd api , response: no leader

  2. What did you expect to see?

Investigation showed that, because leader priority is configured in PD, a PD leader switch occurred during the PD stage of the cluster upgrade. PD checks the leader priority every minute, and that check subsequently caused a PD leader transfer that took 0.5 seconds.

Coincidentally, within this 0.5-second window, the upgrade had already reached the TiKV stage and was performing the "set leader evict scheduler" operation. The request to PD failed with a "no leader" error, which caused TiUP to exit.

I think a retry mechanism should be added when calling the PD API to prevent TiUP upgrade or reload operations from being interrupted due to such short-term changes in PD.
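A minimal sketch of the kind of retry proposed above, written in Python for illustration only and not taken from TiUP's code. Every name here (`PDRequestError`, `is_transient`, `call_pd_with_retry`, `add_evict_leader_scheduler`) is hypothetical, and the assumption that a leader transfer shows up as an error containing "no leader" is based only on the message quoted in this report.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class PDRequestError(Exception):
    """Hypothetical error type standing in for a failed PD API request."""


def is_transient(err: Exception) -> bool:
    # Assumption: a PD leader transfer surfaces as an error whose message
    # contains "no leader", as in the output quoted in this report.
    return "no leader" in str(err)


def call_pd_with_retry(
    request: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying transient PD errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return request()
        except PDRequestError as err:
            # Re-raise non-transient errors, or the last error once
            # all attempts have been used.
            if not is_transient(err) or attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


# Simulated PD call: fails twice with "no leader", then succeeds.
calls = {"n": 0}


def add_evict_leader_scheduler() -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise PDRequestError("error requesting pd api, response: no leader")
    return "ok"


delays = []  # record the backoff delays instead of actually sleeping
result = call_pd_with_retry(add_evict_leader_scheduler, sleep=delays.append)
```

With these illustrative defaults the waits between attempts add up to 3 seconds (0.2 + 0.4 + 0.8 + 1.6), which would comfortably cover a 0.5-second leader transfer. Errors that are not recognized as transient are raised immediately, so a real PD failure is not hidden.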

  3. What did you see instead?
    TiUP exits with an error.

  4. What version of TiUP are you using (tiup --version)?
    v1.14.0

Metadata

    Labels

    type/bug: Categorizes issue as related to a bug.
