
Can't snapshot anymore after a "snapper rollback" #159

Open
moy opened this issue Mar 13, 2015 · 21 comments

moy commented Mar 13, 2015

Hi,

I messed up my system, did a "snapper rollback", and rebooted to get back to a consistent state. My system is repaired, but now my default subvolume is no longer the root, and snapper has stopped working.

$ sudo snapper --verbose -c root list  
Type   | # | Pre # | Date | User | Cleanup | Description | Userdata
-------+---+-------+------+------+---------+-------------+---------
single | 0 |       |      | root |         | current     |         
$ sudo snapper -c root create          
IO Error.
$ sudo tail -4 /var/log/snapper.log
2015-03-13 18:41:51 ERR libsnapper(22492) Snapshot.cc(initialize):366 - reading failed
2015-03-13 18:41:51 ERR libsnapper(22492) Btrfs.cc(openInfosDir):213 - .snapshots is not a btrfs snapshot
2015-03-13 18:42:04 ERR libsnapper(22492) Btrfs.cc(openInfosDir):213 - .snapshots is not a btrfs snapshot
2015-03-13 18:42:33 ERR libsnapper(22492) Btrfs.cc(openInfosDir):213 - .snapshots is not a btrfs snapshot
$ sudo ls /.snapshots/
$ 

The subvolumes are still there, but not visible in /.snapshots, since I'm now using a different root:

$ sudo btrfs subvolume list -a /
ID 268 gen 3943 top level 5 path <FS_TREE>/.snapshots
ID 269 gen 1766 top level 268 path <FS_TREE>/.snapshots/1/snapshot
ID 273 gen 1806 top level 268 path <FS_TREE>/.snapshots/5/snapshot
ID 274 gen 1808 top level 268 path <FS_TREE>/.snapshots/6/snapshot
...

I'm not sure what the expected behavior is here (nor how I'm supposed to get back to a system that both works and can continue snapshotting), but either what I'm seeing is a bug, or at the very least the documentation of "snapper rollback" should be updated to explain to the user that one consequence of a "snapper rollback" is that the system boots into a subvolume that doesn't show the other snapshots and can't take snapshots anymore.
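For reference, the changed default can be confirmed directly (the ID and path in the output below are illustrative):

$ sudo btrfs subvolume get-default /
ID 269 gen 1766 top level 268 path .snapshots/1/snapshot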

$ snapper --version
snapper 0.2.4
flags btrfs,lvm,ext4,xattrs,rollback,btrfs-quota

moy commented Mar 13, 2015

OK, I think I understand what happened: one needs a dedicated entry for /.snapshots in /etc/fstab, like

UUID=<id-of-the-volume> /.snapshots btrfs subvol=.snapshots 0 1
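To fill in and verify that entry, something like this should do (a sketch; /dev/sda2 is a placeholder for the device holding the btrfs root):

# Get the UUID of the filesystem (not of the .snapshots subvolume)
sudo blkid -s UUID -o value /dev/sda2
# After adding the fstab line:
sudo mount /.snapshots
sudo ls /.snapshots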

I'd suggest two things:

  • Make this clearer in the documentation. The man page currently says "Rollback [...] requires a properly configured system." and gives no hint about what "properly configured" means or where to find this information. The only hint I could find was "e.g. mountpoint for subvolumes must be included in /etc/fstab, esp. for /.snapshots" at http://snapper.io/2014/04/29/rollback.html, but it could be much clearer.
  • When the user is in the situation described above, snapper could issue a much better error message than "IO Error".

In any case, thanks for such a great piece of software!


alexduf commented Dec 18, 2015

Thanks @moy, this actually saved me hours of research!


lapsio commented Jan 21, 2016

I'm having the same error after a rollback on an external btrfs drive (not registered in fstab). How do I solve this problem without fstab? I can't add an entry for the drive to fstab, for a number of reasons.

I just want to permanently discard all changes to my external drive, especially the timestamps, which don't seem to be preserved by undochange; what I actually damaged was mostly the modification dates of files.
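(A one-off manual mount should serve in place of an fstab entry -- the UUID and mount point below are placeholders -- though it would be nicer if snapper coped without it:)

sudo mount -o subvol=.snapshots UUID=<id-of-the-volume> /path/to/drive/.snapshots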


Ra72xx commented Jun 9, 2016

As I ran into this problem on a relatively freshly installed openSUSE Tumbleweed system (installed about two weeks ago), I fear that this problem is still open. Is it possible that the much-famed snapshot system fails by default because of a missing mount point entry?! Is there a reason this mount point isn't set up automatically?


teotikalki commented Apr 15, 2017

I just experienced this after installing snapper on Debian Jessie. I reverted to a snapshot and it broke snapper, just as described above. "btrfs subvolume list -a /" returns a list of the snapshots that should be available. "snapper --version" gives:
snapper 0.4.1
flags btrfs,lvm,ext4,xattrs,rollback,btrfs-quota,no-selinux

I'd like to add that the rollback itself worked perfectly and un-broke my system after I broke it trying to satisfy dependencies, which is exactly what I wanted it to do. On that note, thanks.


Ra72xx commented Apr 15, 2017

I gave up on BTRFS as a whole. It's simply not everyday-usable for the average user. The half-finished snapshot feature described in this thread was only one symptom for me. (ext4 with rsnapshot and backups seems more reliable.)

@teotikalki

@Larx The problem here isn't with btrfs, it's with snapper.


Ra72xx commented Apr 16, 2017

Probably not even snapper, but the default configuration (and mountpoint/subvolume selection) of openSUSE. For me it was a great disappointment that the first time I really needed a rollback, I was completely on my own and searched for hours until I found in this thread how to get a running system back when "rolling back". (As the snapshot feature is much hyped, I had more or less expected it to work out of the box with some nice YaST frontend.)

@teotikalki

@Larx I can understand your frustration, and that is good to know. I had been thinking that snapper and rollbacks were a strong selling point for SUSE (I'm a Debian person myself).

With a little bit of clarification, the solution offered by @moy worked:
UUID=id-of-main-root-volume /.snapshots btrfs subvol=.snapshots 0 1

(I played around a bit at first, thinking that it wanted the UUID assigned by btrfs to the .snapshots subvolume. To make it work, enter the UUID of the partition containing the / filesystem.)
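If in doubt, the right value can be read straight off the mounted root (a small sketch; findmnt ships with util-linux):

# Print the filesystem UUID of whatever is mounted at /
findmnt -no UUID /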

@Larx I'm not using SUSE and have no YaST, so my expectations weren't the same as yours. I manually added snapper-gui and have been playing with it to make it work properly. I've been documenting my steps/results so that I can possibly make a .deb package out of it; I've never done this before, but I've wanted to learn, and I feel that the functionality added by btrfs/snapper is worth it. I actually strongly feel that if it were implemented smoothly, it would be a critical part of the Linux user experience, period. The ability to painlessly do rollbacks takes away the risk of installing unknown software, playing with dependencies, etc. (It even lets me undo browser updates that break my extensions and destroy my user experience. ;) )


amerlyq commented Aug 12, 2017

Actually, the necessity of the mountpoint was mentioned in the tutorial: http://snapper.io/2014/04/29/rollback.html
Why it's not in openSUSE's defaults -- who knows.

So far this only works with btrfs and the system must be properly configured, e.g. mountpoint for subvolumes must be included in /etc/fstab, esp. for /.snapshots.

On the other hand, there are two distinct approaches to the problem of the initial revert:

  1. The first is git-like: you have a staging subvolume, e.g. /root, and its history in a mounted /.snapshots.
    When the system is broken, you simply replace the staging subvolume with one of the snapshots.
    It's easy, but then you must manually manage (mount-rename-delete) your broken /root if you want to keep track of it later, as it lives outside of snapper's snapshots. Deleting it by default isn't great at all.

  2. The second, streamlined scheme is version-like: forget about using a normal staging /root and use the rollback scheme from the start, e.g. create your filesystem directly as a rw /root/.snapshots/0/snapshot. Then, on each snapshot, rename the current fs to the next number and copy a read-only version as the previous one. But this is impossible, because it requires an unmount/mount. Moreover, quota policies would be messed up by keeping the current staging subvolume inside the snapshots, so you could no longer limit the quota size of the snapshots alone.

Therefore, the current snapshot layout is a mess. A better variant, distilled from this thread https://bbs.archlinux.org/viewtopic.php?id=194491, would be:

  3. On rollback, copy-and-delete /root: snapshot it read-only under snapper management into /.snapshots/nextN/snapshot, attaching a default info.xml there, and then copy the chosen working snapshot back into /root. Why it wasn't done that way from the beginning is beyond my understanding.
umount /.snapshots                      # /.snapshots is mounted from /@snapshots
mount -o subvol=/ <device> /mnt         # mount the top-level subvolume; <device> is the btrfs device
btrfs subvolume snapshot -r /mnt/root /mnt/@snapshots/nextN/snapshot   # keep the broken root as a read-only snapshot
btrfs subvolume delete /mnt/root
btrfs subvolume snapshot /mnt/@snapshots/nextN-1/snapshot /mnt/root    # restore the chosen snapshot as the new root

Please, @aschnell, comment on why (3) wasn't adopted as the default variant.
Edit: this approach should be viable, since snapper rollback is run on a system already booted from some other snapshot and not from /root, so I'm interested in any non-obvious reasons to rule this variant out.


lapsio commented Aug 12, 2017

After setting up plenty of snapshotted btrfs systems, I came to the conclusion that the easiest, most reliable, and relatively fastest way to perform a full rollback is to boot from a liveCD, rm -R /snapshotted/dir, and then cp --preserve=all --reflink ./.snapshots/xxx/.... I saved plenty of server systems this way. I used a simplified snapshot scheme compared to the default openSUSE choice, but with more subvolumes it should only be a bit more complex. Basically, I'd recommend keeping all the stuff you want to snapshot/rollback in one subvolume and the rest in other subvolumes. It makes the described operation really easy.

In the easiest scenario, I kept everything apart from /var/log in one snapshotted subvolume, with /tmp in tmpfs. Full system recovery was accomplished with two commands:

rm -Rf /mnt/*                                                           # wipe the damaged root contents
cp -Rnv --preserve=all --reflink /mnt/.snapshots/xxxx/snapshot/* /mnt/  # reflink-copy the snapshot back (no data duplication)

In around 30 seconds the system was up and running again (after a power failure in the middle of a kernel update, fml).
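For completeness, those two commands assume the filesystem's top level is mounted at /mnt first, along the lines of (the device name is a placeholder):

# From the liveCD: mount the btrfs top level (subvolid 5) before the rm/cp above
mount -o subvolid=5 /dev/sdXn /mnt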


lapsio commented Aug 12, 2017

Ah, one note: it has to be done from a liveCD. I tried to do this on a live system, but it didn't survive and crashed in the middle of overwriting some critical files (still, after booting a liveCD and doing the above, it worked).


amerlyq commented Aug 12, 2017

@lapsio, this snapper rollback issue is about rolling back the whole / system from a snapshot, not some dedicated /snapshotted/dir, for which your method works best (though needing a liveCD -- it depends). And if we are talking about subvolumes, not dedicated dirs, then btrfs subvolume snapshot source [dest/]name is a faster and preferable solution to the rm/cp --reflink crutch. That is, of course, provided you read the recommendations and created subvol=root for your system's root instead of dumping all your files directly into subvol=/ for the sake of fstab convenience -- in which case you are out of options and must suffer, without advocating half-hearted solutions. Which still wouldn't explain why you couldn't use btrfs subvolume set-default subvolume-id / from the start and avoid the problems altogether.

Edit: not to mention that on a system with quotas enabled, doing rm/cp --reflink may temporarily hit the quotas, and they will hit you hard: http://www.spinics.net/lists/linux-btrfs/msg58385.html.
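For illustration, the set-default route mentioned above would look roughly like this (ID 273 is an example; read the real one from the listing first):

# Pick the ID of the snapshot to boot from, then make it the default
btrfs subvolume list /
btrfs subvolume set-default 273 /
# On the next boot that subvolume is mounted as / (unless fstab pins another subvol= explicitly)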

@thomas-merz

Will there ever be a solution for this issue? I also have this problem: "snapper cleanup" isn't functional anymore after a rollback :-(
https://www.opensuse-forum.de/thread/64330-snapper-cleanup-nach-rollback-nicht-mehr-m%C3%B6glich/


okurz commented Nov 30, 2021

We also observed the problem in https://progress.opensuse.org/issues/102942 in our SUSE-internal openQA infrastructure, on one of our machines after a rollback.

@thomasmerz

So this means that we might get a fix for this after being patient for so many years? 🙏🏻


okurz commented Nov 30, 2021

Sorry, but I am just reporting where we saw the very same problem; I am not speaking for the developers of snapper here. And at the very least, by adding our story we can learn more, at least about workarounds and best practices :)

@thomasmerz

My hope was/is:
If SUSE themselves ran into this problem and can now see and feel what some openSUSE users have been experiencing for a long time, the problem might get fixed "soon".


okurz commented Nov 30, 2021

Sure, but it's not like SUSE is one person ;)

@thomas-merz

I know 🤣 I've been to your office in Nuremberg a few times for the "SUSE Expert Days" 👍
But if this problem is no longer "far away" on the customer/community side, but inside SUSE's own infrastructure, the "pain" might be much bigger to finally get this fixed. Please don't take away my hope… 😉


Martchus commented Dec 1, 2021

That's my conclusion: https://progress.opensuse.org/issues/102942#note-15
So our problem was a different one after all (nested subvolumes were present, but those were created by docker).
