zfs dataset becomes stuck #8321
Comments
I think I'm hitting the same issue. We see it happen sporadically when running Spark inside Docker containers. We don't have a way to reproduce it intentionally either; it just seems to happen, and then writes to the zpool stop or slow to a trickle. Update 1: For some added information, it seems to correlate with Docker + ZFS (I'm running Docker CE 18.09.2). I was running Here is a zpool history of the operations it performed
@Machske Are you running nodejs inside of a docker container? Logs and other Info
Thanks for commenting. From the posted stacks it looks like there may be some issue reclaiming memory, but unfortunately there's not enough information to really understand why that's the case. If you do encounter this again, it would be great if you could dump all of the blocked kernel stack traces for analysis. You can do this by running
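For readers who want to collect similar information, here is a minimal sketch of one way to list blocked tasks and their kernel stacks. It is not necessarily the command suggested in the comment above, which is cut off in this copy of the thread. It assumes a Linux `/proc` filesystem, root privileges, and a kernel that exposes `/proc/<pid>/stack`; the helper names `parse_stat` and `blocked_tasks` are made up for illustration. It only walks processes, not individual threads under `/proc/<pid>/task`.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) parsed from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')' instead of splitting on spaces.
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar])
    comm = line[lpar + 1:rpar]
    state = line[rpar + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Yield (pid, comm, stack_text) for processes in uninterruptible sleep (state D)."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return  # no procfs available at this path
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were looking at it
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "stack")) as f:
                stack = f.read()
        except OSError:
            stack = "(stack unavailable: needs root and kernel stack-trace support)\n"
        yield pid, comm, stack


if __name__ == "__main__":
    for pid, comm, stack in blocked_tasks():
        print(f"--- pid {pid} ({comm}) ---")
        print(stack, end="")
```

Run as root while the hang is in progress and attach the output to the report. A process that appears here repeatedly across several runs is a likely candidate for the stuck task.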
Thanks for the quick response @behlendorf. I've attached the stack traces here:
@justenwalker Regarding your question about nodejs in a container: yes, we are running nodejs processes in one or more containers.
@justenwalker thanks. It looks to me like you're hitting issue #7939, which is resulting in the deadlock. This was fixed in the master branch with 779a6c0, and that fix was backported for the 0.7.13 release. I'd suggest updating to the 0.7.13 release if you can.
Thanks @behlendorf - We use Ubuntu 16.04, so it looks like this package will not be available through official channels. I'm not sure we'd move to unofficial channels in order to upgrade to 0.7.13 (as this isn't even in the latest Ubuntu release). Is there any way to work around this bug other than updating to 0.7.13?
@justenwalker I'd like to point out both for your own information and for others that you can install the Ubuntu "LTS Enablement Stack" on a 16.04 system which mainly just installs a new kernel (on servers) and will give you the same version of ZoL that's used on 18.04. It's generally the very latest 0.7.X.
Thanks @dweeezil. We're on Update: I tried installing the HWE kernel for 16.04. While it didn't seem to break anything, it also didn't install any newer version of ZFS.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
Describe the problem you're observing
This server has 303 datasets in 1 zpool
One specific dataset became "stuck". The command "ls" hangs on that specific dataset, though other datasets appear to function correctly.
This issue appears to have started after running an npm process (nodejs) which writes a huge number of files.
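To tell a stuck dataset apart from healthy ones without hanging an interactive shell, one option is to run the probing command with a timeout. The following is only a sketch under stated assumptions: the helper name `is_responsive` and the example mountpoint are hypothetical, and the timeout is arbitrary. It deliberately does not wait for the child after a timeout, because a task blocked inside the kernel may not exit even after being sent a kill signal.

```python
import subprocess


def is_responsive(argv, timeout):
    """Return True if the command finishes within `timeout` seconds, else False.

    The exit status is ignored: the only question is whether the command returns.
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Best effort only: a task in uninterruptible sleep will not die
        # until the kernel operation it is blocked on completes, so we
        # do not wait for it here.
        proc.kill()
        return False


if __name__ == "__main__":
    # Hypothetical mountpoint; substitute the dataset's real mountpoint.
    mountpoint = "/tank/example-dataset"
    print(mountpoint, "ok" if is_responsive(["ls", mountpoint], 5.0) else "STUCK?")
```

A `False` result only shows that `ls` did not return in time; it does not identify the cause, so the kernel stack traces are still needed for diagnosis.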
Describe how to reproduce the problem
We have not found a specific way to reproduce it. After a reboot, the same command runs successfully on the same dataset.
Include any warning/errors/backtraces from the system logs