Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disk monitor improvements #3895

Merged
merged 6 commits into from Dec 21, 2021

Conversation

lukebakken
Copy link
Collaborator

@lukebakken lukebakken commented Dec 15, 2021

Related to VESC-1015

  • Remove infinity timeouts
  • Improve free disk space retrieval on win32

This PR fixes an issue I observed while reproducing VESC-1015 on Windows 10. Within an hour or so of running a 3-node cluster that has health checks being run against it, one or more nodes' memory use would spike. I would see that the rabbit_disk_monitor process is stuck executing os:cmd to retrieve free disk space information. Thus, all gen_server:call calls to the process would never return, especially since they used an infinity timeout.

@lukebakken lukebakken self-assigned this Dec 15, 2021
@lukebakken lukebakken force-pushed the lukebakken/rabbit_disk_monitor-improvements branch 3 times, most recently from e0a1d4c to de8eddc Compare December 16, 2021 13:51
@lukebakken lukebakken marked this pull request as ready for review December 16, 2021 14:06
@lukebakken lukebakken marked this pull request as draft December 16, 2021 14:07
@lukebakken lukebakken force-pushed the lukebakken/rabbit_disk_monitor-improvements branch from d531182 to 545da1a Compare December 16, 2021 15:24
@lukebakken lukebakken marked this pull request as ready for review December 16, 2021 15:24
@lukebakken lukebakken force-pushed the lukebakken/rabbit_disk_monitor-improvements branch from 545da1a to be691a0 Compare December 18, 2021 16:20
@lukebakken lukebakken marked this pull request as draft December 18, 2021 20:53
@lukebakken lukebakken marked this pull request as ready for review December 19, 2021 01:59
@lukebakken lukebakken marked this pull request as draft December 20, 2021 13:50
@lukebakken lukebakken marked this pull request as ready for review December 20, 2021 15:14
@lukebakken
Copy link
Collaborator Author

I added the ETS table after running this command to test:

lists:foreach(fun (_) -> rabbit_disk_monitor ! update end, lists:seq(0, 8)).

Since the infinity timeouts are removed (a good thing), repeated updates will cause gen_server:call calls to time out because retrieving FS information on win32 isn't particularly fast.

Since the minimum update interval is 100ms I figured better safe than sorry.

@lhoguin
Copy link
Contributor

lhoguin commented Dec 20, 2021

Can the ets:lookup functions be called before the value is first computed, and if so, can that cause an issue?

@lukebakken lukebakken marked this pull request as draft December 20, 2021 15:46
Related to VESC-1015

* Remove `infinity` timeouts
* Improve free disk space retrieval on win32

Run commands with a timeout

This PR fixes an issue I observed while reproducing VESC-1015 on Windows
10. Within an hour or so of running a 3-node cluster that has health
checks being run against it, one or more nodes' memory use would spike.
I would see that the rabbit_disk_monitor process is stuck executing
os:cmd to retrieve free disk space information. Thus, all
gen_server:call calls to the process would never return, especially
since they used an infinity timeout.

Do something with timeout

Fix unit_disk_monitor_mocks_SUITE
@lukebakken lukebakken force-pushed the lukebakken/rabbit_disk_monitor-improvements branch from 3e0af4c to 89aa4f7 Compare December 20, 2021 16:41
@lukebakken lukebakken force-pushed the lukebakken/rabbit_disk_monitor-improvements branch from 89aa4f7 to 2b63bae Compare December 20, 2021 16:42
@lukebakken lukebakken marked this pull request as ready for review December 20, 2021 17:45
@lukebakken
Copy link
Collaborator Author

@lhoguin thanks for the reviews. All set for the next one.

@lukebakken lukebakken merged commit dcb8e0f into master Dec 21, 2021
@lukebakken lukebakken deleted the lukebakken/rabbit_disk_monitor-improvements branch December 21, 2021 14:42
lukebakken added a commit that referenced this pull request Dec 21, 2021
@michaelklishin michaelklishin added this to the 3.9.12 milestone Dec 22, 2021
lukebakken added a commit that referenced this pull request Jan 8, 2022
Fsutil has language-specific messages. Fix by using wmic.exe instead.

Follow-up to #3895

Reported here:
https://groups.google.com/g/rabbitmq-users/c/ypk51AtmrSM
@lukebakken lukebakken mentioned this pull request Jan 8, 2022
lukebakken added a commit that referenced this pull request Jan 8, 2022
Fsutil has language-specific messages. Fix by using wmic.exe instead.

Follow-up to #3895

Reported here:
https://groups.google.com/g/rabbitmq-users/c/ypk51AtmrSM
lukebakken added a commit that referenced this pull request Jan 8, 2022
Fsutil has language-specific messages. Fix by using powershell.exe instead.

Follow-up to #3895

Reported here:
https://groups.google.com/g/rabbitmq-users/c/ypk51AtmrSM
michaelklishin added a commit that referenced this pull request Jan 10, 2022
mergify bot pushed a commit that referenced this pull request Jan 10, 2022
Fsutil has language-specific messages. Fix by using powershell.exe instead.

Follow-up to #3895

Reported here:
https://groups.google.com/g/rabbitmq-users/c/ypk51AtmrSM

(cherry picked from commit fd78144)
michaelklishin added a commit that referenced this pull request Jan 10, 2022
Fix issue with fsutil

(cherry picked from commit 98539b9)
michaelklishin pushed a commit that referenced this pull request Mar 18, 2022
Fsutil has language-specific messages. Fix by using powershell.exe instead.

Follow-up to #3895

Reported here:
https://groups.google.com/g/rabbitmq-users/c/ypk51AtmrSM
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants