Cannot destroy snapshots on full drive: "internal error: Channel number out of range" #9849
Comments
Ok, interesting. This seems to be a regression! I pulled the drive and stuck it in another machine running the same version (0.8.2-pve2), and had the same issue. I then put it in an older Proxmox (v5.4) machine:
and I can delete the snapshots without issue!
This issue appears to have been caused by your pool being entirely out of free space. Normally, this isn't a problem because a percentage of total capacity is reserved to ensure administrative commands like `zfs destroy` can always complete.

What I suspect happened is that the recursive snapshot was created while the pool was already at, or near, 100% reported capacity. This operation, and possibly others, resulted in a significant amount of the reserve space being consumed. If more than three quarters of the reserve space is used, commands such as `zfs destroy` will fail.

The latest versions of ZFS use channel programs internally to more quickly handle removing a large number of datasets in one command. This is why you saw that internal "Channel number out of range" error when the channel program unexpectedly failed. Trying an older version of ZFS was a good idea: internally things are a little different there, so you managed to not exceed this limit. You might have also been able to successfully remove the snapshot by setting the `spa_slop_shift` module parameter to a larger value.

I'd suggest trying not to run your pool above 95% capacity if possible. That will avoid this kind of issue and improve performance. Regardless, I've gone ahead and tagged this as a defect, since internally ZFS should always keep a large enough reserve of free space to avoid this. We may want to look at having `zfs destroy` handle this case more gracefully.
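A minimal sketch of that `spa_slop_shift` workaround on Linux; the pool/snapshot names and the shift values are assumptions, not taken from this issue:

```sh
# The slop reserve is pool_capacity / 2^spa_slop_shift (default shift: 5, i.e. 1/32).
# Raising the shift temporarily shrinks the reserve so the destroy can proceed.
echo 6 > /sys/module/zfs/parameters/spa_slop_shift

# Retry the destroy that previously failed.
zfs destroy -r rpool@dump

# Restore the default once space has been freed.
echo 5 > /sys/module/zfs/parameters/spa_slop_shift
```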
Without the snapshots, the pool is not full:
There's 80GB free (so it's ~83% full) after clearing out the snapshots.
I'm also observing this problem on a pool that was less than 95% full before making the snapshots. When the pool uses encryption, using 0.7.X is not an option, so this bug seems to be able to permanently wedge ZFS pools (as the snapshots cannot be destroyed). EDIT: Increasing spa_slop_shift resolved it for me.
I'm totally going to run into this situation soon-ish; the pool was 95-97% full recently (3 TB), and spa_slop_shift is already at 6. Thanks to @loli10K for the suggested fix 👍
If you're using a 3TB drive that's nearly full, I suggest replacing it with a 4TB or larger, with `zpool set autoexpand=on poolname` set first. I haven't tried this with a single-disk zpool, but it should be possible to simulate with a file-backed zpool.
Update: I just verified this works OK with a single file-based pool on OSX. You should see more free space immediately when the resilver finishes. NOTE: I haven't tried doing this on a live rpool (I don't use them), so it might be best to power off the system and try this from a rescue environment...
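A rough sketch of that replace-and-autoexpand procedure; the pool name and device paths below are placeholders:

```sh
# Let the pool grow automatically once a larger vdev is in place.
zpool set autoexpand=on poolname

# Swap the nearly-full 3TB disk for the 4TB one; this triggers a resilver.
zpool replace poolname /dev/disk/by-id/ata-OLD_3TB /dev/disk/by-id/ata-NEW_4TB

# Watch the resilver; the extra capacity should appear when it finishes.
zpool status poolname
zpool list poolname
```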
I've got the same problem, but I don't know where the space went. There were 5 TB free, and within an hour they were gone. The pool is on LVM, and I could grow each vdev from 3T to 3.1T, but the pool didn't get any more space. The VMs were still working; I could shut them down, and vzdump still makes a backup.
I have found some old data from my console. I don't know how usage could grow so much, or where exactly the space went. This is from 1-2 days before the disaster:
And this after:
Export/import does not work. I'm not on site, so I'm backing up over a 100 Mbit link for the next 30 hours, one way.
FYI: I just encountered this same error ("internal error: Channel number out of range") while attempting to destroy snapshots on version 2.0.4:
I was able to free some space and work around the issue, though.
So, I have the exact same problem. I think it happened after upgrading my ZFS version. I had not used sparse volumes, and something happened after the upgrade; now I am unable to do anything other than read the datasets. I cannot null out a file, lower a refreservation, destroy a snapshot or dataset, or anything else. Is there any way to fix this other than destroying my pool? My pool is not full; refreservations are taking the space, and zpool list says I have 8TB of free space. I am running omnios-r151038-96eabf6ba4.
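For anyone triaging a pool in this state, a hedged diagnostic sketch; the pool name rpool below is a placeholder:

```sh
# Pool-level view: raw allocated vs. free space at the vdev level.
zpool list -o name,size,alloc,free,cap rpool

# Dataset-level view: how much space each dataset pins via its refreservation.
zfs list -r -o name,used,usedbyrefreservation,refreservation rpool
```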
```
zfs version
zfs destroy -f rpool/vm-202-disk-2
zfs destroy -r rpool/vm-202-disk-2
```
I'm pretty sure I encountered this on Saturday. I neglected to save all the pertinent information, but I'll post what I have FWIW. It happened in a single-vdev pool of 250GB where the vdev is a single LUKS device. The remaining free space was probably just a few gigabytes. I knew it was low on disk space, so I wanted to move some VM-related zvols to a different pool, and I ran a recursive snapshot of the root filesystem and the zvols. Most of my zvols accidentally had the default refreservation enabled, which I had not intended.

After running zfs-snapshot I tried to zfs-send the snapshots to the target pool, but I got either an internal error or "no space left on device" (I unfortunately can't remember which). Then I tried to remove some bigger files on the filesystem to free up space, but that failed as well, so I booted into a rescue environment, sent all the pool's datasets (not snapshots, which wasn't possible) to another pool, recreated the full pool, and sent everything back. The host is running Void Linux/musl.
Layout of the pool in question (this was run after recreating the pool, so it's not the original):
The pool's attributes (from after recreating it, but should be the same or very similar):
It was created from my rescue environment. While in the rescue environment, I also raised spa_slop_shift. It was pointed out on IRC that zpool-checkpoint could have made backtracking trivial, which I'm guessing is good advice for anyone doing maintenance on pools with very little space left.
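A short sketch of that checkpoint safety net; the pool name tank is assumed:

```sh
# Take a checkpoint before risky, space-constrained maintenance.
zpool checkpoint tank

# ...perform the maintenance...

# If something goes wrong, rewind the pool to the checkpointed state.
zpool export tank
zpool import --rewind-to-checkpoint tank

# If all went well, discard the checkpoint to reclaim the space it holds.
zpool checkpoint -d tank
```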
Just for future reference, I wanted to mention that as of the 2.1.3 release the low-space handling has been further improved (commit 145af48). Depending on how far over capacity your pool was you may still have encountered issues, but this should help in the future when trying to remove files to free up space.
I've managed to get into a similar situation where my zpool still has free space but the filesystem is full because of snapshots. The zvols are still writable because only the root dataset is full; the volumes themselves still have free space. Destroying snapshots fails with:
When the filesystem is this full, not even running a channel program manually works:
It fails with ECHRNG and the error message "Could not open pool: data". For ECHRNG, see https://github.com/openzfs/zfs/blob/master/lib/libzfs_core/libzfs_core.c#L1509
Furthermore, my zpool seemed to have run very low on slop space. Changing spa_slop_shift didn't change anything. I managed to remove the snapshots by changing the space requirement for ZCP and zfs.sync.destroy. Be careful here: it worked for me (or seemed to), but I'm not a ZFS developer and my insight into ZFS is very new! I don't know if there are further implications when running this low on slop space.
I had this happen to me when taking a snapshot of large, non-sparse zvols (i.e., zvols created with a full refreservation). Setting refreservation=none on them worked around it.
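A small sketch of avoiding the full refreservation in the first place; the dataset names and sizes are illustrative:

```sh
# Create the zvol sparse (-s), so no refreservation is set and snapshots
# don't have to reserve another full volume's worth of space.
zfs create -s -V 100G tank/vm-disk

# Or drop the reservation on an existing non-sparse zvol.
zfs set refreservation=none tank/existing-disk

# Confirm the result.
zfs get refreservation,volsize tank/vm-disk
```

The trade-off is that writes to a sparse zvol can fail with ENOSPC if the pool itself fills up.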
System information
Describe the problem you're observing
So I have a zpool that's degraded. I then tried to create snapshots to do a backup. I snapshotted the whole pool:
```
sudo zfs snapshot -r rpool@dump
```
I now have a pool that's wedged, and I cannot do anything.
I'm not sure how to reproduce this, and I currently have a server that's basically totally hosed.
I cannot do anything:
(vzdump-qemu-104-2019_04_22-23_31_57.vma is a large file I was hoping to delete to clear up space.)
I'm not sure what's going on. The underlying zpool apparently has 51% utilization, but all the zvols are 100% full. Additionally, there are no quotas that seem to be causing this.

I also have no idea where

internal error: Channel number out of range

is coming from; it doesn't appear to exist in the current source.
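As a starting point for the where-did-the-space-go question, a hedged diagnostic sketch; the pool name rpool matches the report above, but the commands are generic:

```sh
# Break each dataset's usage into snapshots, refreservation, the dataset
# itself, and children (USEDSNAP / USEDREFRESERV / USEDCHILD columns).
zfs list -r -o space rpool

# Compare against pool-level allocation, which counts raw vdev space.
zpool list -o name,size,alloc,free,cap rpool
```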