(Reproducible) After the pool is imported, "Out of memory: killed process" is triggered and the operating system freezes #16322
I forgot to mention that executing zpool import -Ff pool0 does not solve the problem; it triggers the same crash.
Did you try limiting the memory allocated by ZFS? Also, this module parameter might be useful:
@AllKind Thank you very much, I was unaware of these ZFS parameter settings. I tried the following:
# Here is the original default value
root@pve1:~# cat /sys/module/zfs/parameters/zfs_arc_max
1671430144
root@pve1:~# cat /sys/module/zfs/parameters/zfs_arc_sys_free
0
# 0.5 GiB: 0.5 * 1024 * 1024 * 1024 = 536870912
root@pve1:~# echo 536870912 > /sys/module/zfs/parameters/zfs_arc_max
root@pve1:~# cat /sys/module/zfs/parameters/zfs_arc_max
536870912
root@pve1:~# cat /sys/module/zfs/parameters/zfs_scan_strict_mem_lim
0
root@pve1:~# echo 1 > /sys/module/zfs/parameters/zfs_scan_strict_mem_lim
root@pve1:~# cat /sys/module/zfs/parameters/zfs_scan_strict_mem_lim
1
# 8 GiB: 8 * 1024 * 1024 * 1024 = 8589934592
root@pve1:~# echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_sys_free
root@pve1:~# cat /sys/module/zfs/parameters/zfs_arc_sys_free
8589934592
root@pve1:~# zpool import -Ff pool0
root@pve1:~# Connection to 192.168.1.2 closed by remote host. The system still crashes after the pool is imported. Animotica_5_7_17_44_48.mp4 |
I don't think zfs_scan_strict_mem_lim is relevant here. Generally I'd apply the parameters at module load time (/etc/modprobe.d/ or similar; distro dependent). So you are allowing the ARC a maximum size of 0.5 GB, and then telling ZFS to keep 8 GB free for other applications if needed.
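For illustration, a minimal /etc/modprobe.d/zfs.conf matching the runtime values tried above might look like this (a sketch reusing the exact values from the session above, not a confirmed fix):

# /etc/modprobe.d/zfs.conf, read at module load time
options zfs zfs_arc_max=536870912
options zfs zfs_arc_sys_free=8589934592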
Hello, thank you very much for your help. I set up /etc/modprobe.d/zfs.conf and restarted the computer. arcstat output is shown below; it looks like ZFS can still use 8 GB of memory, so I'm not sure the settings took effect. Importing pool0 again still crashes. May I ask a question? Currently, when I reboot, the system automatically imports pool0 and crashes. My workaround is to boot the Ubuntu 24.04 ISO system, manually import pool0 there, and start PVE again after Ubuntu crashes. Since pool0 then has an unreleased Ubuntu mount record, PVE does not automatically import it, so the system does not crash immediately and I have time to modify the ZFS settings. Apart from this method, is there any other way to stop PVE from automatically importing pool0 at boot?
The PVE system is primarily for storage, with only one internal LXC container providing SMB/sync. Sorry, since I don't know much about ZFS, I'm not sure the parameters I just set are correct.
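Regarding the question above about preventing the automatic import at boot: one hypothetical approach (not confirmed in this thread) is to drop the pool from the zpool cache file, or to mask the systemd import services entirely:

# Drop pool0 from the cache file so the cache-based import skips it
# (must be run while the pool is imported, e.g. from the Ubuntu live system):
zpool set cachefile=none pool0
# Or prevent the ZFS import services from running at boot at all:
systemctl mask zfs-import-cache.service zfs-import-scan.service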
I have removed this parameter. I am not familiar with the ZFS parameters; I just read the documentation and thought it might be useful before adding it.
That would be the default on Linux: use half the available system memory. Since you also use ZFS on root, I'm not sure how the behavior is with /etc/modprobe.d.
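Side note: on Debian-based systems such as PVE with root on ZFS, options placed in /etc/modprobe.d/ generally also need to be rebuilt into the initramfs so they apply before the root pool is imported (my assumption here, worth verifying for your setup):

# Rebuild the initramfs so the modprobe.d options are available at early boot:
update-initramfs -u -k all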
I don't know.
The same problem appeared for me 5 hours ago, on the same versions of PVE and the kernel. Memory limits were useless. The problem occurs when importing a large 44 TB pool.
Try this and show the output, please:
As you can see, "Anonymous metadata size: 99.4% 50.1 GiB" is incorrect; on our other PVE host it does not exceed 400 KB.
Hello, my guess is that the root pool has nothing to do with it, as Ubuntu 24.04 also suffers from the same crash.
The recording above suggests that it's claiming the slabs for zio_buf_comb_4096 total 9 GiB, which is, I think, basically all your RAM. My wild blind guess would be that it's issuing a lot of IOs and for some reason the old buffers from them aren't being reaped from the cache before you OOM. Maybe the kernel in newer versions changed some logic about how that process works, and that's why it's not biting people on older versions?
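For anyone following along, the slab figures referenced here come from the SPL slab statistics. One way to watch the buffer caches in question during an import (assuming the standard /proc/spl interface is present) is:

# Watch the 512 B and 4 KiB zio buffer slabs grow while the import runs:
watch -n 0.5 'grep -E "zio_buf_comb_(512|4096)" /proc/spl/kmem/slab'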
@wangxinmian do you have an older kernel (preferably older than 6.6) available in your distro to test?
Testing with the Ubuntu 23.10.1 ISO system, the system still hangs. The graphical interface does not display an error message but is completely unresponsive. I also tried disabling Multi-Gen LRU:
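The exact command used is not shown above; a common way to disable MGLRU at runtime (an assumption on my part, and only available on kernels built with MGLRU support) is:

# Disable the multi-generational LRU via its sysfs switch:
echo n > /sys/kernel/mm/lru_gen/enabled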
As it seems it's not that easy to limit the memory usage of the pool import, I'd suggest trying it with swap and OOM settings. If you do not have a dedicated partition (or disk) for swap, you can use a file on ZFS as a swapfile. I think this guide is a good starting point:
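Since the linked guide turns out to be unreachable for the reporter, here is one commonly cited recipe from the OpenZFS FAQ, which uses a zvol rather than a plain file; treat the property values as a starting point, not a definitive configuration:

# Create an 8 GiB zvol-backed swap device on the root pool:
zfs create -V 8G -b $(getconf PAGESIZE) -o compression=zle \
    -o logbias=throughput -o sync=always \
    -o primarycache=metadata -o secondarycache=none rpool/swap
mkswap -f /dev/zvol/rpool/swap
swapon /dev/zvol/rpool/swap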
Thank you very much for your help. I am very sorry, but I have network problems and cannot access the link you provided. I tried adding a swap partition and importing pool0. The screen does not show any error messages such as out-of-memory kills, but both SSH and the local console are unresponsive. I'll wait a day or so to see if the import completes successfully.
There's a couple of directions I'd like to tackle this from. I need some info to get started. Could you capture /proc/spl/kmem/slab and /proc/meminfo during the import? Has this problem only started since using 2.2.4? What was the last "good" version that it did work on? What about kernel versions? When you say you tried Ubuntu, can you please confirm that that was with the kernel and ZFS that come with it, that is, not the PVE builds? (I expect so, since you say it was an ISO; I just want to be sure.)
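A simple capture loop for the requested data might look like this (a sketch; the 0.5 s interval matches what the reporter used below):

# Append timestamped slab and memory snapshots twice a second until the crash:
while true; do
    date '+--- %F %T.%N ---' >> zfs-mem.log
    cat /proc/spl/kmem/slab /proc/meminfo >> zfs-mem.log
    sleep 0.5
done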
It has been one day and the system is still not responding. |
Thank you very much. I am not familiar with Linux and ZFS; please let me know if you need any more information. Below is output at 0.5 second intervals (/proc/spl/kmem/slab and /proc/meminfo), containing records from before the import until the out-of-memory crash. I used to use FreeNAS Core and FreeNAS Scale and didn't have this problem, but I'm not sure whether that's only because nothing triggered it there. There was an earlier question that I wasn't sure would make a difference: so far I have tested ubuntu-24.04-desktop-amd64.iso (should be zfs 2.2.4), ubuntu-23.10.1-desktop-amd64.iso (zfs-2.1.0-rc3-0ubuntu4) and ubuntu-22.04.4-desktop-amd64.iso (zfs-2.1.5-1ubuntu6-22.04.2, zfs-kmod-2.0-0ubuntu1-23.10), and all of them crash after importing the pool. Also, I am not sure whether pool0 was damaged by PVE while deleting the large file, such that no version of ZFS can import the pool afterwards.
Thanks for that info. These are the time points where things went from "fine" to "very bad", and the specific memory caches that blew up:
It's pretty clear that this is a ton of 512B and 4K IO being issued at high speed, and it doesn't stop. Obviously that shouldn't happen.
If you've got output from this crash, it would be very useful! If this was the only report, I would guess that there's some damage in the pool that is causing something to run in a tight loop, blasting out IO until all memory is consumed. From the sizes, I'd guess raidz IO. But! #16325 reports basically the same issue, on the same kernel and OpenZFS versions. That suggests something subtle has changed in an interaction between OpenZFS and the kernel, and OpenZFS is responding incorrectly. I can't help any more right now; time for bed here. If no one else is able to help out overnight, I'll have a bit more of a think about it tomorrow.
Conceivably, you could try setting spl_kmem_cache_slab_limit=0.
It seems that with options spl spl_kmem_cache_slab_limit=0 set, the system will not boot; I used the Ubuntu ISO to remove it again.
I tried to see if I could reproduce the crash; I suspect that will be a little difficult.
I may have hit that, I forget, but I didn't think it would break that horrendously. You could try setting it below 512, to like 511, but not 0, so it still uses Linux's caches for very small things; maybe it's doing something foolish like trying to build 4k caches for its own cache metadata out of itself.
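For reference, the parameter can be changed at runtime (as the reporter does next), assuming it is writable on the installed build, or pinned at module load time:

# Runtime; affects caches created after the write:
echo 511 > /sys/module/spl/parameters/spl_kmem_cache_slab_limit
# Or at module load time, e.g. in /etc/modprobe.d/spl.conf:
#   options spl spl_kmem_cache_slab_limit=511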
Thank you for your help. I set this value to 511 after the operating system started; executing zpool import pool0 still crashes the system with out-of-memory.
Hi! I have a similar problem and created a separate issue: "The ARC size (current) is ten times larger than the Max size (zfs_arc_max), it will result in "Out of memory: killed process"" #16325. My pool mounts successfully read-only. Important note: if I import the pool as read-only, everything works fine and the RAM is not consumed. It's a pity that no one answers me in my issue.
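For readers, the read-only import mentioned here is done with the readonly pool property, for example:

# Import read-only; per the reports above this avoids the memory blow-up:
zpool import -o readonly=on pool0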
I posted in your thread a link to this one, suggesting I thought it was a duplicate; that also creates a link in this thread back to yours, if you scroll up. Usually what happens when that is suggested is that people keep debugging in the more active issue until enough evidence exists to test whether it really is a duplicate, often by fixing the problem and seeing whether the other people's problems go away. If you'd like support in a more timely fashion than random volunteers provide, I believe various companies out there will sell you support on demand, though I have no idea what their rates are.
Thank you very much. I can also mount read-only here, but the amount of data is too large; it is not easy to back everything up and rebuild the pool.
Did you try importing the pool from something other than PVE, e.g. from Ubuntu or a different OS? Perhaps it imports fine on a different OS?
Thank you very much for your reply. So far, I have tested ubuntu-24.04-desktop-amd64.iso (should be zfs 2.2.4), ubuntu-23.10.1-desktop-amd64.iso (zfs-2.1.0-rc3-0ubuntu4) and ubuntu-22.04.4-desktop-amd64.iso (zfs-2.1.5-1ubuntu6-22.04.2, zfs-kmod-2.0-0ubuntu1-23.10); all of them fail. I'm not sure whether these versions have bugs, or whether PVE has broken the pool so that it cannot be imported under Ubuntu either.
System information
Describe the problem you're observing
Immediately after zpool import pool0, out-of-memory is triggered and many processes are killed; then the system completely stops responding. After restarting the operating system, the problem is triggered again when the pool is imported automatically or manually.
The computer is an HP Gen8 MicroServer with 16 GB of memory; since the device supports a maximum of 16 GB, I cannot add more.
An attempt to add a 100 GB swap file to the root pool rpool did not help with this out-of-memory problem.
I tried booting the Ubuntu 24.04 desktop ISO to import this pool; that also failed. The graphical interface hangs with no response, so I cannot see any error message.
I have a screen recording of the crash in the PVE OS, slowed down to 0.25x; I am not sure whether it contains anything useful.
I checked journalctl; it only contains logs from before the suspected failure. Maybe the system crashed without retaining the information.
Sorry, I'm not a Linux professional and don't know what other information would help with this error. If you need anything, please let me know and I will reproduce the error again and collect the information.
Describe how to reproduce the problem
pool0 is a pool of four 12 TB disks with a capacity of 20 TB. Since there was only 300 GB of free space left in the pool, I was trying to free up space; manually deleting a 1 TB file triggered the problem.
Include any warning/errors/backtraces from the system logs
Screen recording of executing the import operation over SSH:
https://github.com/openzfs/zfs/assets/174770717/eba99562-5c0c-4cf5-9deb-ba045992a4df
Some screenshots captured from the screen recording at the time of failure (for easier viewing):
journalctl.log (attached)