
ceph: fix probable cause of intermittent fails in the manager test #8690

Merged: 1 commit merged into rook:master on Sep 20, 2021

Conversation

@jmolmo (Contributor) commented Sep 10, 2021

Explicitly set the length of the string parameter in json.Unmarshal method

fixes: #8669

Signed-off-by: Juan Miguel Olmo Martínez jolmomar@redhat.com

@jmolmo jmolmo added test unit or integration testing ceph-mgr Relating to the Ceph mgr or mgr modules labels Sep 10, 2021
@jmolmo jmolmo requested a review from leseb September 10, 2021 10:35
@mergify mergify bot added the ceph main ceph tag label Sep 10, 2021
@leseb (Member) left a comment

PTAL at my last comment in the issue #8669 (comment)

tests/integration/ceph_mgr_test.go: 3 review comments (outdated, resolved)
@jmolmo (Contributor, Author) commented Sep 13, 2021

@travisn: I take note of your useful comments. I have reverted all the changes related to retrying "get k8s nodes" and replaced them with another fix that I think addresses the real cause of the problem. See #8669 (comment)

@jmolmo jmolmo changed the title ceph: retry retrieval of k8s nodes in manager test ceph: fix probable cause of intermittent fails in the manager test Sep 13, 2021
@leseb (Member) left a comment

See my last comment #8669 (comment)

@@ -221,6 +223,14 @@ func (s *CephMgrSuite) TestStatus() {
assert.Equal(s.T(), status, "Backend: rook\nAvailable: Yes")
}

func bytesInfo(bytesSlice []byte){

just a nit on the func name

Suggested change
func bytesInfo(bytesSlice []byte){
func logBytesInfo(bytesSlice []byte){

@leseb (Member) commented Sep 17, 2021

New failure?

2021-09-17 08:23:28.630469 I | integrationTest: Ceph manager modules still not ready ... 
2021-09-17 08:23:33.630593 I | integrationTest: Waiting for rook orchestrator module enabled and ready ...
2021-09-17 08:23:33.630637 D | exec: Running command: kubectl exec -i rook-ceph-tools-78cdfd976c-bslzm -n mgr-ns -- timeout 15 ceph orch status --format json --connect-timeout=15
2021-09-17 08:23:34.647080 I | integrationTest: {"available": true, "backend": "rook"}
2021-09-17 08:23:34.647101 I | integrationTest: Ceph orchestrator ready to execute commands
2021-09-17 08:23:34.647106 I | integrationTest: ---- bytes slice info ---
2021-09-17 08:23:34.647111 I | integrationTest: bytes: [123 34 97 118 97 105 108 97 98 108 101 34 58 32 116 114 117 101 44 32 34 98 97 99 107 101 110 100 34 58 32 34 114 111 111 107 34 125]
2021-09-17 08:23:34.647115 I | integrationTest: length: 38
2021-09-17 08:23:34.647125 I | integrationTest: string: -->{"available": true, "backend": "rook"}<--
2021-09-17 08:23:34.647128 I | integrationTest: -------------------------
2021-09-17 08:23:34.647176 I | integrationTest: Orchestrator backend is <Rook>
2021-09-17 08:23:34.647183 I | testutil: kubectl apply manifest:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
2021-09-17 08:23:34.647192 I | testutil: Running kubectl [apply -f -]
storageclass.storage.k8s.io/local-storage created
2021-09-17 08:23:34.944226 D | exec: Running command: kubectl exec -i rook-ceph-tools-78cdfd976c-bslzm -n mgr-ns -- timeout 15 ceph config set mgr mgr/rook/storage_class local-storage --connect-timeout=15
2021-09-17 08:23:36.308535 I | integrationTest: Storage class "local-storage" set in manager config
=== RUN   TestCephMgrSuite/TestDeviceLs
2021-09-17 08:23:36.308753 I | integrationTest: Testing .... <ceph orch device ls>
2021-09-17 08:23:36.308777 D | exec: Running command: kubectl exec -i rook-ceph-tools-78cdfd976c-bslzm -n mgr-ns -- timeout 15 ceph orch device ls --connect-timeout=15
2021-09-17 08:23:51.664119 W | installer: Error executing command "ceph": <exit status 124>
    ceph_mgr_test.go:213: 
        	Error Trace:	ceph_mgr_test.go:213
        	Error:      	Expected nil, but got: &exec.ExitError{ProcessState:(*os.ProcessState)(0xc000138258), Stderr:[]uint8{0x63, 0x6f, 0x6d, 0x6d, 0x61, 0x6e, 0x64, 0x20, 0x74, 0x65, 0x72, 0x6d, 0x69, 0x6e, 0x61, 0x74, 0x65, 0x64, 0x20, 0x77, 0x69, 0x74, 0x68, 0x20, 0x65, 0x78, 0x69, 0x74, 0x20, 0x63, 0x6f, 0x64, 0x65, 0x20, 0x31, 0x32, 0x34, 0xa}}
        	Test:       	TestCephMgrSuite/TestDeviceLs
2021-09-17 08:23:51.664276 I | integrationTest: output = . command terminated with exit code 124

@leseb (Member) commented Sep 20, 2021

@Mergifyio rebase

Explicitly set the length of the string parameter in json.Unmarshal method

fixes: rook#8669

Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
@mergify bot commented Sep 20, 2021

Command rebase: success

Branch has been successfully rebased

@travisn (Member) commented Sep 20, 2021

The latest run failed:

2021-09-20 13:42:07.353876 D | exec: Running command: kubectl exec -i rook-ceph-tools-78cdfd976c-m99hm -n mgr-ns -- timeout 15 ceph orch device ls --connect-timeout=15
2021-09-20 13:42:22.617685 W | installer: Error executing command "ceph": <exit status 124>
    ceph_mgr_test.go:214: 
        	Error Trace:	ceph_mgr_test.go:214
        	Error:      	Expected nil, but got: &exec.ExitError{ProcessState:(*os.ProcessState)(0xc0002fccd8), Stderr:[]uint8{0x63, 0x6f, 0x6d, 0x6d, 0x61, 0x6e, 0x64, 0x20, 0x74, 0x65, 0x72, 0x6d, 0x69, 0x6e, 0x61, 0x74, 0x65, 0x64, 0x20, 0x77, 0x69, 0x74, 0x68, 0x20, 0x65, 0x78, 0x69, 0x74, 0x20, 0x63, 0x6f, 0x64, 0x65, 0x20, 0x31, 0x32, 0x34, 0xa}}
        	Test:       	TestCephMgrSuite/TestDeviceLs

@jmolmo (Contributor, Author) commented Sep 20, 2021

> The latest run failed with the same TestDeviceLs failure (exit code 124) quoted in the previous comment.

Taking a look at it; this is a different cause. Exit code 124 comes from the `timeout 15` wrapper expiring while executing `ceph orch device ls`.

@jmolmo (Contributor, Author) commented Sep 20, 2021

@travisn, @leseb: I have created a new issue to track the timeout problem: #8759

@travisn travisn merged commit 8755759 into rook:master Sep 20, 2021
Labels: ceph (main ceph tag), ceph-mgr (Relating to the Ceph mgr or mgr modules), test (unit or integration testing)

Successfully merging this pull request may close these issues:

Mgr suite fails intermittently

3 participants