Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix support for multiple pvc for pd #3820

Merged
merged 24 commits into from Mar 10, 2021
Merged

Conversation

dragonly
Copy link
Contributor

@dragonly dragonly commented Mar 2, 2021

What problem does this PR solve?

ref: #3802

What is changed and how does it work?

Change pd related code to support multiple pvc from pod spec, instead of constructing a single pvc name from tc name and ordinal.

Similar to #3816.

Code changes

  • Has Go code change
  • Has CI related scripts change

Tests

  • Unit test

  • E2E test

  • Manual test

    Scalar Test

    1. deploy a tc
    apiVersion: pingcap.com/v1alpha1
    kind: TidbCluster
    metadata:
        name: basic
    spec:
        version: v4.0.10
        timezone: UTC
        pvReclaimPolicy: Delete
        configUpdateStrategy: RollingUpdate
        discovery: {}
        pd:
            baseImage: pingcap/pd
            replicas: 4
            requests:
                storage: "1Gi"
            storageVolumes:
                - name: additional
                storageSize: "1Gi"
                mountPath: /mnt
            config: {}
        tikv:
            baseImage: pingcap/tikv
            replicas: 1
            requests:
                storage: "1Gi"
            config: {}
        tidb:
            baseImage: pingcap/tidb
            replicas: 1
            config: {}
    1. check PVC with kubectl get pvc
    NAME                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    pd-additional-basic-pd-0   Bound    pvc-9e530faa-d73a-4656-b1e0-bae655379224   1Gi        RWO            standard       103s
    pd-additional-basic-pd-1   Bound    pvc-93216fd5-be51-4f9b-9b3f-db12acc6da76   1Gi        RWO            standard       103s
    pd-additional-basic-pd-2   Bound    pvc-0b8b3658-d1fc-406f-ba4d-3cbb1439cc52   1Gi        RWO            standard       103s
    pd-additional-basic-pd-3   Bound    pvc-d30b679b-c3cf-435d-b03e-9c3e7cda701a   1Gi        RWO            standard       103s
    pd-basic-pd-0              Bound    pvc-34d01898-e50b-4df0-8385-3692213b3cd1   1Gi        RWO            standard       103s
    pd-basic-pd-1              Bound    pvc-26c0e7f3-7678-4b47-80c9-8e0d74fc54cc   1Gi        RWO            standard       103s
    pd-basic-pd-2              Bound    pvc-f54d5784-056e-4da6-8811-7270d2c23738   1Gi        RWO            standard       103s
    pd-basic-pd-3              Bound    pvc-e9dda077-63dc-44f0-a0bd-754560c06ca7   1Gi        RWO            standard       103s
    tikv-basic-tikv-0          Bound    pvc-76f92001-66c3-4065-ac36-fb509a7b6cd5   1Gi        RWO            standard       55s
    
    1. scale in pd to 3 replicas, check pvc again, returns same results as in step 2, because PVCs are retained by default.
    NAME                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    pd-additional-basic-pd-0   Bound    pvc-9e530faa-d73a-4656-b1e0-bae655379224   1Gi        RWO            standard       2m19s
    pd-additional-basic-pd-1   Bound    pvc-93216fd5-be51-4f9b-9b3f-db12acc6da76   1Gi        RWO            standard       2m19s
    pd-additional-basic-pd-2   Bound    pvc-0b8b3658-d1fc-406f-ba4d-3cbb1439cc52   1Gi        RWO            standard       2m19s
    pd-additional-basic-pd-3   Bound    pvc-d30b679b-c3cf-435d-b03e-9c3e7cda701a   1Gi        RWO            standard       2m19s
    pd-basic-pd-0              Bound    pvc-34d01898-e50b-4df0-8385-3692213b3cd1   1Gi        RWO            standard       2m19s
    pd-basic-pd-1              Bound    pvc-26c0e7f3-7678-4b47-80c9-8e0d74fc54cc   1Gi        RWO            standard       2m19s
    pd-basic-pd-2              Bound    pvc-f54d5784-056e-4da6-8811-7270d2c23738   1Gi        RWO            standard       2m19s
    pd-basic-pd-3              Bound    pvc-e9dda077-63dc-44f0-a0bd-754560c06ca7   1Gi        RWO            standard       2m19s
    tikv-basic-tikv-0          Bound    pvc-76f92001-66c3-4065-ac36-fb509a7b6cd5   1Gi        RWO            standard       91s
    
    1. scale out PD to 4 replicas, check pvc again, found that pd-basic-pd-3 and pd-additional-basic-pd-3 has changed, because the old PVCs are deleted and new ones are created with the same name. we can tell from the different VOLUME name.
    NAME                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    pd-additional-basic-pd-0   Bound    pvc-9e530faa-d73a-4656-b1e0-bae655379224   1Gi        RWO            standard       2m48s
    pd-additional-basic-pd-1   Bound    pvc-93216fd5-be51-4f9b-9b3f-db12acc6da76   1Gi        RWO            standard       2m48s
    pd-additional-basic-pd-2   Bound    pvc-0b8b3658-d1fc-406f-ba4d-3cbb1439cc52   1Gi        RWO            standard       2m48s
    pd-additional-basic-pd-3   Bound    pvc-56455d64-0e9c-4fa2-a906-22d2693d3b23   1Gi        RWO            standard       11s
    pd-basic-pd-0              Bound    pvc-34d01898-e50b-4df0-8385-3692213b3cd1   1Gi        RWO            standard       2m48s
    pd-basic-pd-1              Bound    pvc-26c0e7f3-7678-4b47-80c9-8e0d74fc54cc   1Gi        RWO            standard       2m48s
    pd-basic-pd-2              Bound    pvc-f54d5784-056e-4da6-8811-7270d2c23738   1Gi        RWO            standard       2m48s
    pd-basic-pd-3              Bound    pvc-a7ce2bb5-ef39-42f9-aa5a-7a377a9be729   1Gi        RWO            standard       11s
    tikv-basic-tikv-0          Bound    pvc-76f92001-66c3-4065-ac36-fb509a7b6cd5   1Gi        RWO            standard       2m
    

    Failover Test

    1. deploy tc
    apiVersion: pingcap.com/v1alpha1
    kind: TidbCluster
    metadata:
        name: basic
    spec:
        version: v4.0.10
        timezone: UTC
        pvReclaimPolicy: Delete
        configUpdateStrategy: RollingUpdate
        discovery: {}
        pd:
            baseImage: pingcap/pd
            replicas: 3
            requests:
                storage: "1Gi"
            config: {}
            storageVolumes:
                - name: additional
                storageSize: "1Gi"
                mountPath: /mnt
        tikv:
            baseImage: pingcap/tikv
            replicas: 1
            requests:
                storage: "1Gi"
            config: {}
        tidb:
            baseImage: pingcap/tidb
            replicas: 1
            config: {}
    1. check PVC with kubectl get pvc
    NAME                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    pd-additional-basic-pd-0   Bound    pvc-75938966-581c-473a-880f-f8a8012ad195   1Gi        RWO            standard       2m56s
    pd-additional-basic-pd-1   Bound    pvc-0a5019b1-cef2-48d2-929d-64bdf7d99bba   1Gi        RWO            standard       2m56s
    pd-additional-basic-pd-2   Bound    pvc-4a318448-6abd-4a1d-84e7-3fdc0a5f3738   1Gi        RWO            standard       2m56s
    pd-basic-pd-0              Bound    pvc-f861df73-c351-4d01-bbd1-ac50ed5ad1da   1Gi        RWO            standard       2m56s
    pd-basic-pd-1              Bound    pvc-3de3254b-4180-4d1e-aadf-8523d42936b7   1Gi        RWO            standard       2m56s
    pd-basic-pd-2              Bound    pvc-1e0646c3-bb32-47e6-9cf7-63b826721de8   1Gi        RWO            standard       2m56s
    tikv-basic-tikv-0          Bound    pvc-14a0fcc2-bca6-48d8-9243-35387f4141cf   1Gi        RWO            standard       2m26s
    
    1. patch Pod basic-pd-0 with wrong image using kubectl patch pod basic-pd-0 --patch '{"spec": {"containers": [{"name": "pd", "image": "bad"}]}}'. then check pod status with kubectl get pod, and find that basic-pd-0 is in ErrImagePull status.
    NAME                               READY   STATUS         RESTARTS   AGE
    basic-discovery-59c77754b4-ljfdx   1/1     Running        0          3m35s
    basic-pd-0                         0/1     ErrImagePull   0          3m35s
    basic-pd-1                         1/1     Running        1          3m35s
    basic-pd-2                         1/1     Running        0          3m35s
    basic-tidb-0                       2/2     Running        0          2m39s
    basic-tikv-0                       1/1     Running        0          3m5s
    
    1. after failover period (default to 5 minutes), a new Pod basic-pd-3 is created, which is the failover operation. then Pod basic-pd-0 and its mounted PVCs are deleted, and recreated by StatefulSet controller. then basic-pd-3 is deleted, which is the recover operation. we can check by kubectl describe tc basic and find that the basic-pd-3 is in the unjoined member list.
    # failover
    NAME                               READY   STATUS    RESTARTS   AGE
    basic-discovery-59c77754b4-ljfdx   1/1     Running   0          8m46s
    basic-pd-0                         1/1     Running   0          8s
    basic-pd-1                         1/1     Running   1          8m46s
    basic-pd-2                         1/1     Running   0          8m46s
    basic-pd-3                         1/1     Running   0          19s
    basic-tidb-0                       2/2     Running   0          7m50s
    basic-tikv-0                       1/1     Running   0          8m16s
    
    # recover
    NAME                               READY   STATUS    RESTARTS   AGE
    basic-discovery-59c77754b4-ljfdx   1/1     Running   0          9m2s
    basic-pd-0                         1/1     Running   0          24s
    basic-pd-1                         1/1     Running   1          9m2s
    basic-pd-2                         1/1     Running   0          9m2s
    basic-tidb-0                       2/2     Running   0          8m6s
    basic-tikv-0                       1/1     Running   0          8m32s
    
    1. check that all PVCs are recreated for basic-pd-0 by kubectl get pvc. note that the UIDs are changed for pd-basic-pd-0 and pd-additional-basic-pd-0.
    NAME                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    pd-additional-basic-pd-0   Bound    pvc-d3174c48-b165-4a2b-95a6-bffc28d538c5   1Gi        RWO            standard       42s
    pd-additional-basic-pd-1   Bound    pvc-0a5019b1-cef2-48d2-929d-64bdf7d99bba   1Gi        RWO            standard       9m20s
    pd-additional-basic-pd-2   Bound    pvc-4a318448-6abd-4a1d-84e7-3fdc0a5f3738   1Gi        RWO            standard       9m20s
    pd-additional-basic-pd-3   Bound    pvc-437fc0d6-7a48-42e1-a1fd-4fdebddd5a12   1Gi        RWO            standard       53s
    pd-basic-pd-0              Bound    pvc-a9c900f8-44ee-4888-b515-3da64272fe68   1Gi        RWO            standard       42s
    pd-basic-pd-1              Bound    pvc-3de3254b-4180-4d1e-aadf-8523d42936b7   1Gi        RWO            standard       9m20s
    pd-basic-pd-2              Bound    pvc-1e0646c3-bb32-47e6-9cf7-63b826721de8   1Gi        RWO            standard       9m20s
    pd-basic-pd-3              Bound    pvc-7911b63e-8d3c-45a0-bef5-4b3fb9768e9d   1Gi        RWO            standard       53s
    tikv-basic-tikv-0          Bound    pvc-14a0fcc2-bca6-48d8-9243-35387f4141cf   1Gi        RWO            standard       8m50s
    
    1. check unjoined PD members with kubectl get tc -o yaml | grep -i unjoinedmembers -A 10, and find out that PD member basic-pd-3 has joined the PD cluster and now is in the unjoinedMembers list. Also note that the pvcUID is the same as pd-basic-pd-3.
    unjoinedMembers:
        basic-pd-3:
            createdAt: "2021-03-10T09:04:35Z"
            podName: basic-pd-3
            pvcUID: 7911b63e-8d3c-45a0-bef5-4b3fb9768e9d
  • No code

Side effects

  • Breaking backward compatibility
  • Other side effects:

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release Notes

Please refer to Release Notes Language Style Guide before writing the release note.

Fix support for multiple PVC for PD.

@codecov-io
Copy link

codecov-io commented Mar 2, 2021

Codecov Report

Merging #3820 (19f56e8) into master (a3f1515) will decrease coverage by 0.94%.
The diff coverage is 80.64%.

@@            Coverage Diff             @@
##           master    #3820      +/-   ##
==========================================
- Coverage   68.72%   67.77%   -0.95%     
==========================================
  Files         173      173              
  Lines       18452    18448       -4     
==========================================
- Hits        12681    12503     -178     
- Misses       4667     4848     +181     
+ Partials     1104     1097       -7     
Flag Coverage Δ
e2e 40.38% <38.70%> (-2.53%) ⬇️
unittest 62.35% <79.56%> (?)

@dragonly dragonly marked this pull request as ready for review March 4, 2021 03:09
pkg/manager/member/pd_failover.go Outdated Show resolved Hide resolved
pkg/manager/member/pd_failover.go Outdated Show resolved Hide resolved
pkg/manager/member/pd_failover.go Outdated Show resolved Hide resolved
pkg/manager/member/pd_failover.go Outdated Show resolved Hide resolved
pkg/manager/member/pd_failover.go Show resolved Hide resolved
pkg/manager/member/pd_failover.go Outdated Show resolved Hide resolved
pkg/manager/member/pd_failover.go Outdated Show resolved Hide resolved
pkg/manager/member/pd_failover_test.go Show resolved Hide resolved
pkg/manager/member/pd_scaler_test.go Outdated Show resolved Hide resolved
pkg/util/util.go Outdated
}
pvcs = append(pvcs, pvc)
}
}
if len(pvcs) == 0 {
return nil, errors.NewNotFound(corev1.Resource("pvc"), pod.Name)
err := errors.NewNotFound(corev1.Resource("pvc"), "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

details are fixed in the next line

@@ -686,8 +722,12 @@ func oneFailureMember(tc *v1alpha1.TidbCluster) {
pd0: {Name: pd0, ID: "0", Health: true},
pd2: {Name: pd2, ID: "2", Health: true},
}

pvcUIDSet := make(map[types.UID]struct{})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this indicates that the failure member pd1 has 2 mounted PVCs as is set in the test function.

pkg/manager/member/pd_failover.go Outdated Show resolved Hide resolved
pkg/manager/member/failover.go Show resolved Hide resolved
pkg/manager/member/pd_failover.go Outdated Show resolved Hide resolved
pkg/manager/member/pd_failover.go Outdated Show resolved Hide resolved
pkg/manager/member/pd_failover.go Outdated Show resolved Hide resolved
pkg/manager/member/pd_failover.go Outdated Show resolved Hide resolved
@@ -724,8 +764,12 @@ func oneNotReadyMemberAndAFailureMember(tc *v1alpha1.TidbCluster) {
pd1: {Name: pd1, ID: "12891273174085095651", Health: false, LastTransitionTime: metav1.Time{Time: time.Now().Add(-10 * time.Minute)}},
pd2: {Name: pd2, ID: "2", Health: true},
}

pvcUIDSet := make(map[types.UID]struct{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure the code changes in pd_failover.go are fully covered.

pkg/manager/member/pd_member_manager.go Outdated Show resolved Hide resolved
pkg/manager/member/utils.go Outdated Show resolved Hide resolved
Copy link
Contributor

@DanielZhangQD DanielZhangQD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@handlerww handlerww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot
Copy link
Member

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • DanielZhangQD
  • handlerww

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by writing /lgtm in a comment.
Reviewer can cancel approval by writing /lgtm cancel in a comment.

@DanielZhangQD
Copy link
Contributor

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 3f7fa0b

@ti-chi-bot ti-chi-bot merged commit 52e1f7f into pingcap:master Mar 10, 2021
ti-srebot pushed a commit to ti-srebot/tidb-operator that referenced this pull request Mar 10, 2021
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Copy link
Contributor

cherry pick to release-1.1 in PR #3837

shonge pushed a commit to shonge/tidb-operator that referenced this pull request Mar 30, 2021
shonge pushed a commit to shonge/tidb-operator that referenced this pull request Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants