Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the problem that Darwin memory leak detection may fail #8213

Merged

Conversation

yangbodong22011
Copy link
Collaborator

refer: #8205

@oranagra oranagra linked an issue Dec 18, 2020 that may be closed by this pull request
@oranagra
Copy link
Member

@yangbodong22011 thanks for looking into this.
I wonder what are the pros and cons of using your approach rather than add another string matching on the response to cover another version of leaks.
I suppose there's no race possible in your approach since you check if the process is still alive after the execution of leaks, but still, maybe it's better to let it decide if it succeeded or failed.
P. S I see this in the man page (maybe we should use it instead of string matching):

     The leaks command exits with one of the following values:

     0	   No leaks were detected.
     1	   One or more leaks were detected.
     >1    An error occurred.

@yangbodong22011
Copy link
Collaborator Author

@oranagra Agree to use the return value. My code is from: here.

I also test this, but when pid not exists, the code is 1 and not 255, so I used the above method.

catch {exec leaks $pid} output option
puts [dict get $option -code]

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code you wrote / copied looks good to me.
i changed the comment a bit, let me know if you agree it's better.

tests/support/server.tcl Outdated Show resolved Hide resolved
Co-authored-by: Oran Agra <oran@redislabs.com>
oranagra
oranagra previously approved these changes Dec 18, 2020
@oranagra
Copy link
Member

thanks @yangbodong22011 can you please make sure again that this works on all MacOS variants you have (since we have no coverage for this in CI)

@yangbodong22011
Copy link
Collaborator Author

thanks @yangbodong22011 can you please make sure again that this works on all MacOS variants you have (since we have no coverage for this in CI)

On my computer, MacOS High Sierra, I've tested it.
At present, I can't test the MacOS Big Sur because I borrowed other people's computers before. I will verify the MacOS Big Sur at a convenient time later.

@oranagra oranagra added the state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten label Dec 18, 2020
@yangbodong22011
Copy link
Collaborator Author

thanks @yangbodong22011 can you please make sure again that this works on all MacOS variants you have (since we have no coverage for this in CI)

On my computer, MacOS High Sierra, I've tested it.
At present, I can't test the MacOS Big Sur because I borrowed other people's computers before. I will verify the MacOS Big Sur at a convenient time later.

I borrowed someone else’s "MacOS Big Sur" test and passed, but I found that the leaks error still did not match the original issue. The original error is: [fatal] process 4812 has not started. But my "MacOS Big Sur" is:

MacOS Big Sur
➜ ~ leaks 55555
leaks[27180]: leaks cannot examine process 55555 (with name like '55555') because it no longer appears to be running.
leaks[27180]: [fatal] mach port for process 55555 not valid

So let's go back to the original issue and let the person who raised the issue help verify it.

@oranagra oranagra merged commit ee59dc1 into redis:unstable Dec 23, 2020
oddcc pushed a commit to oddcc/redis that referenced this pull request Dec 27, 2020
…dis#8213)

Apparently the "leaks" took reports a different error string about process
that's not found in each version of MacOS.
This cause the test suite to fail on some OS versions, since some tests terminate
the process before looking for leaks.
Instead of looking at the error string, we now look at the (documented) exit code.
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Mar 2, 2021
…dis#8213)

Apparently the "leaks" took reports a different error string about process
that's not found in each version of MacOS.
This cause the test suite to fail on some OS versions, since some tests terminate
the process before looking for leaks.
Instead of looking at the error string, we now look at the (documented) exit code.
@enjoy-binbin
Copy link
Collaborator

@yangbodong22011 do you have some times to look into this one? https://github.com/enjoy-binbin/redis/actions/runs/5017425080/jobs/8995524700#step:5:15263
i don't know how to read the leak report, and it may be a false positive. and you may have better vision

@yangbodong22011
Copy link
Collaborator Author

@yangbodong22011 do you have some times to look into this one? https://github.com/enjoy-binbin/redis/actions/runs/5017425080/jobs/8995524700#step:5:15263 i don't know how to read the leak report, and it may be a false positive. and you may have better vision

seems this test fail:

Starting test Check for memory leaks (pid 61056) in tests/unit/cluster/hostnames.tcl - with servers:

and have some log

61056:M 18 May 2023 21:43:12.561 - Connection with Node 426efca909c1e37120f9b0e0ee58cf43a1e16669 at 127.0.0.1:31668 failed: Connection refused
61056:M 18 May 2023 21:43:12.561 - Connection with Node 5ca5a2d7fd19904b113ebd7bbb47ad8cdf07b624 at 127.0.0.1:31670 failed: Connection refused
61056:M 18 May 2023 21:43:12.561 - Connection with Node ebcd8d132b88052f68fa01a8a66073744ef1033a at 127.0.0.1:31669 failed: Connection refused
61056:M 18 May 2023 21:43:12.663 - Connection with Node 45053a08a706609ce0530f95008e3b36f8cd80ea at 127.0.0.1:31667 failed: Connection refused
61056:M 18 May 2023 21:43:12.663 - Connection with Node 118c512d5a7889718584a7611948da38e01bf2b3 at 127.0.0.1:31671 failed: Connection refused
61056:M 18 May 2023 21:43:12.663 - Connection with Node 426efca909c1e37120f9b0e0ee58cf43a1e16669 at 127.0.0.1:31668 failed: Connection refused
61056:M 18 May 2023 21:43:12.663 - Connection with Node 5ca5a2d7fd19904b113ebd7bbb47ad8cdf07b624 at 127.0.0.1:31670 failed: Connection refused
61056:M 18 May 2023 21:43:12.663 - Connection with Node ebcd8d132b88052f68fa01a8a66073744ef1033a at 127.0.0.1:31669 failed: Connection refused
61056:M 18 May 2023 21:43:12.765 - Connection with Node 45053a08a706609ce0530f95008e3b36f8cd80ea at 127.0.0.1:31667 failed: Connection refused

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[fatal] process 4812 has not started
3 participants