Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refuse systemd-tmpfiles --purge invocation without config file specified on cmdline #33349

Closed
jedenastka opened this issue Jun 14, 2024 · 21 comments · Fixed by #33350, #33383 or #33393
Closed

Comments

@jedenastka
Copy link

jedenastka commented Jun 14, 2024

systemd version the issue has been seen with

256

Used distribution

Debian Unstable

Linux kernel version used

6.8.12-amd64

CPU architectures issue was seen on

x86_64

Component

systemd-tmpfiles

Expected behaviour you didn't see

I noticed my /var/tmp directory grew quite a bit and wanted to manually clean it. Since I was about to log off for the day, I skimmed through the manual for systemd-tmpfiles(8).

--clean
If this command is passed, all files and directories with an age parameter configured will be cleaned up.

--remove
If this command is passed, the contents of directories marked with D or R, and files or directories themselves marked with r or R are removed unless an exclusive or shared BSD lock is taken on them (see flock(2)).

--purge
If this option is passed, all files and directories created by a tmpfiles.d/ entry will be deleted.

Added in version 256.

Not knowing much about the systemd-tmpfiles architecture, other than that it's being used to clean up temp files, --purge seemed like a good idea. So, I ran sudo systemd-tmpfiles --purge, expecting it to clean my temp files.

Unexpected behaviour you saw

A lot of warning messages started appearing, which included paths in /home (it couldn't restore modification times...?). What's a temp-cleaning tool doing in my home directory? That's no good. My heart started beating faster and I hit Ctrl-C as fast as I could.

Turns out, a good portion of my home directory got deleted. Thankfully, it seems like it started from the config files and not the actual data, though I am still not sure if I lost anything important. I turned off the machine to recover the data later with extundelete (it didn't work btw, it instantly crashes for some reason; I do have backups, but they are a bit outdated, I'm filling drives way too fast).

I talked about this on #debian-next, and seems that this is not a bug, but a feature, as systemd-tmpfiles also handles autocreating data directories such as /home, and the --purge flag is meant to wipe them clean. I am unsure what's the utility of that, but I assume there is a good reason.

What I do have an issue with is the documentation. It does explain what the options do, technically speaking (well, I assume so - I don't know a lot about the architecture here), but it doesn't explain why they do what they do. As someone just browsing the documentation, the three options seem to do the same thing - cleaning temp files, though --purge seems "more throughout" as it does mention it removes all files (including user data).

This kinda wraps back to the "Expected behaviour you didn't see" section, but sometimes the rigid form of the issue templates doesn't fit everything - so, what I would want to see is:

  1. An explanation of why a given option does something, for example:

    --clean
    Cleans temporary files. [More in depth explanation follows...]

    --remove
    Cleans temporary files that [I don't know what's the difference with --clean, but explain it]. [...]

    --purge
    Removes all user data. [...]

    As a user, I don't know what those are actually intended to be used for, but as the developers, you probably do, and can describe them better than I did ;) I don't think this is specific to systemd-tmpfiles, it probably should be used as a guideline on how to write manuals project-wide (and honestly, industry-wide).

  2. A huge warning next to --purge. This option is dangerous, so it should be made clear that it's dangerous. hdparm(8) is a great example:

    --dco-restore
    Reset all drive settings, features, and accessible capacities back to factory defaults and full capabilities. This command will fail if DCO is frozen/locked, or if a -Np maximum size restriction has also been set. This is EXTREMELY DANGEROUS and will very likely cause massive loss of data. DO NOT USE THIS COMMAND.

    --drq-hsm-error
    VERY DANGEROUS, DON'T EVEN THINK ABOUT USING IT. This option causes hdparm to issue an IDENTIFY command to the kernel, but incorrectly marked as a "non-data" command. This results in the drive being left with its DataReQust(DRQ) line "stuck" high. This confuses the kernel drivers, and may crash the system immediately with massive data loss. The option exists to help in testing and fortifying the kernel against similar real-world drive malfunctions. VERY DANGEROUS, DO NOT USE!!

For me, this would be sufficient, as I check the options I'm using in the manual. Not sure if it should have a confirmation or anything, I don't think it's necessary, even if you do have a confirmation, some people will blindly confirm anyways. Having a confirmation would also be API breakage I assume.

Steps to reproduce the problem

  1. Run systemd-tmpfiles --dry-run --purge (due to --dry-run this is safe to run).
  2. Notice how without --dry-run it would remove all your data.

Additional program output to the terminal or log subsystem illustrating the issue

No response

@jedenastka jedenastka added the bug 🐛 Programming errors, that need preferential fixing label Jun 14, 2024
@bluca
Copy link
Member

bluca commented Jun 14, 2024

Not knowing much about the systemd-tmpfiles architecture, other than that it's being used to clean up temp files, --purge seemed like a good idea.

So an option that is literally documented as saying "all files and directories created by a tmpfiles.d/ entry will be deleted", that you knew nothing about, sounded like a "good idea"? Did you even go and look what tmpfiles.d entries you had beforehand?

Maybe don't just run random commands that you know nothing about, while ignoring what the documentation tells you? Just a thought eh

@bluca bluca added documentation and removed bug 🐛 Programming errors, that need preferential fixing labels Jun 14, 2024
@bluca bluca linked a pull request Jun 14, 2024 that will close this issue
@bluca bluca changed the title systemd-tmpfiles can delete all your files Improve systemd-tmpfiles documentation Jun 14, 2024
@YHNdnzj
Copy link
Member

YHNdnzj commented Jun 14, 2024

Technically speaking, systemd-tmpfiles simply makes use of tmpfiles.d/ definitions to perform purge operations, so the tool itself is nothing wrong.

Yet I understand that people would think sd-tmpfiles as a tool that only manages temporary files, which hasn't been the case for a pretty long time. Our documentation also still states that it's a tool to

creates, deletes, and cleans up volatile and temporary files and directories

which is definitely something we should improve.

@jedenastka
Copy link
Author

jedenastka commented Jun 14, 2024

So an option that is literally documented as saying "all files and directories created by a tmpfiles.d/ entry will be deleted", that you knew nothing about, sounded like a "good idea"?

I know systemd-tmpfiles is (at least) used to clean temp files, I think it's not unreasonable to assume the config entries have to do with explaining which files should be considered "temp" and which ones are not (well, turns out they do, but they also do other things). Especially because the first line of the DESCRIPTION section told me that they indeed are, so I didn't feel the need to check tmpfiles.d(5). Why would I? I wasn't configuring anything, I just wanted to delete temp files.

Maybe don't just run random commands that you know nothing about, while ignoring what the documentation tells you? Just a thought eh

No need to be aggressive. I think the documentation simply depends too much on the knowledge of the subsystem, for users who just want to do something with it. If I read the apt(8) manpage, it doesn't rely on knowledge of how dpkg and Debian packaging works, it just tells me - this installs packages, this removes packages, this removes packages and cleans the configuration. Which is all I need to know. Sure, have in depth explanations on what it does, they are great if you decide to write your own config or you want to contribute to the project, but let people who just need to use it have understandable explanations on why it does that.

Technically speaking, systemd-tmpfiles simply makes use of tmpfiles.d/ definitions to perform purge operations, so the tool itself is nothing wrong.

Yeah, as I said, I realize this is not a bug. It's just a feature that's not well explained.

I did use the bug label because I do think it's a bug in the documentation, and it fits the template best.

which is definitely something we should improve.

Yes, at this point I'm still not sure why /home would be managed by systemd-tmpfiles, and what good use would --purge have. I am sure it's reasonable and useful in some case (more common than wanting to nuke your data), but I don't think it's well explained.

@jedenastka
Copy link
Author

jedenastka commented Jun 14, 2024

While #33350 does address point 2, somewhat, it does not address point 1. I would still like to see a explanation why the options do what they do (in other words, what they are supposed to be used for).

I also would emphasize the "Keep in mind" or replace it with emphasized "Warning:" (which I think sounds more direct) but oh well. The way it is now definitely would have stopped me from using it.

@poettering
Copy link
Member

I think we should fail --purge if no config file is specified on the command line. I see no world where an invocation without one would make sense, and it would have caught the problem here.

@poettering poettering reopened this Jun 15, 2024
@poettering poettering changed the title Improve systemd-tmpfiles documentation refuse systemd-tmpfiles --purge invocation without config file specified on cmdline Jun 15, 2024
@poettering poettering added this to the v257 milestone Jun 15, 2024
@bluca
Copy link
Member

bluca commented Jun 15, 2024

I think we should fail --purge if no config file is specified on the command line. I see no world where an invocation without one would make sense, and it would have caught the problem here.

The actual integration does not specify config files, as they are all gone by the time it is called, instead it passes input via stdin (ie, -), so that also needs to continue working at a minimum.
Also an option to do a full factory reset is something I want to use in the future, but it's ok if you want to add a separate --factory-reset or so switch for that - I'll note that this is exactly how it was first implemented, to be tied to the "system is being factory reset" special state rather than manual invocation, but it was changed as it was because you specifically asked for it ;-)

@istankovic
Copy link

Maybe don't just run random commands that you know nothing about, while ignoring what the documentation tells you? Just a thought eh

@bluca I would kindly ask you to reconsider this attitude. For instance, I happily went on to read the the documentation on tmpfiles.d(5), to clarify what the line

Q /home 0755 - - -

actually does.

And the man pages has this to say:

Q
   Create the subvolume or directory the same as v, but assign the new subvolume to a new leaf quota group. Instead of copying the higher-level quota group assignments from the parent as is done with q, the lowest quota group of the parent   
   subvolume is determined that is not the leaf quota group. Then, an "intermediary" quota group is inserted that is one level below this level, and shares the same ID part as the specified subvolume. If no higher-level quota group exists for
   the parent subvolume, a new quota group at level 255 sharing the same ID as the specified subvolume is inserted instead. This new intermediary quota group is then assigned to the parent subvolume's higher-level quota groups, and the       
   specified subvolume's leaf quota group is assigned to it.                                                                                                                                                                                      
                                                                                                                                                                                                                                                  
   Effectively, this has a similar effect as q, however introduces a new higher-level quota group for the specified subvolume that may be used to enforce limits and accounting to the specified subvolume and children subvolume created within  
   it. Thus, by creating subvolumes only via q and Q, a concept of "subtree quotas" is implemented. Each subvolume for which Q is set will get a "subtree" quota group created, and all child subvolumes created within it will be assigned to it.
   Each subvolume for which q is set will not get such a "subtree" quota group, but it is ensured that they are added to the same "subtree" quota group as their immediate parents.                                                               
                                                                                                                                                                                                                                                  
   It is recommended to use Q for subvolumes that typically contain further subvolumes, and where it is desirable to have accounting and quota limits on all child subvolumes together. Examples for Q are typically /home/ or /var/lib/machines/.
   In contrast, q should be used for subvolumes that either usually do not include further subvolumes or where no accounting and quota limits are needed that apply to all child subvolumes together. Examples for q are typically /var/ or       
   /var/tmp/.                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                  
   As with q, Q has no effect on the quota group hierarchy if the subvolume already exists, regardless of whether the subvolume already belong to a quota group or not.                                                                           
                                                                                                                                                                                                                                                  
   Added in version 228.                                                                                                                                                                                                                          

Now, it takes a lot of patience, re-reading and understanding (of a number o concepts!) to go from this to the real effects of the commands.

At least to me, that was quite far from intuitive. I urge you to consider how the current behaviour may affect thousands of ordinary Linux users and admins, and improve the behaviour so that it is harder to shoot oneself -- by default.

@keszybz
Copy link
Member

keszybz commented Jun 15, 2024

We need to rethink how --purge works. The principle of not ever destroying user data is paramount. There can be commands which do remove user data, but they need to be minimized and guarded. A command that can be easily invoked, by mistake or misunderstanding or some simple error, is something to be avoided. The case of factory reset is a very very niche thing, and the case where you really want to destroy all user data is again something that is so niche, that those features don't need to available with a short call and must be guarded properly. Also note that we have tab completion, so a user might mistype and press enter and call the command in other ways which are just too easy.

(For example, please consider how systemd-repart behaves. You need to specify --factory-reset=yes on the commandline and FactoryReset=yes in the config and --dry-run=no.)

Apart from how the command is invoked, I think the current design needs some rethinking. The command is advertised as "remove files and directories that tmpfiles creates", but the operation is asymmetrical. Tmpfiles does not manage /home, we just use it to "convert" the dircectory to a subvolume is possible. It's fine and expected that --purge removes the contents that it manages anyway, e.g. contents of /var/tmp which are subject to time-based cleanup specified in the tmpfiles.d config. But tmpfiles doesn't manage the contents of /home, so removing the subvolume when it's not empty seems accidental and unexpected.

As is, this is a footgun that we shouldn't provide to our users. I'll file a simple PR to hide the feature for now. We can re-enable it when other details have been figured out. In particular, the documentation needs to be updated in the style of hdparm(8) quoted above, and the option must be changed to be much harder to invoke unintentionally.

keszybz added a commit to keszybz/systemd that referenced this issue Jun 15, 2024
This effectively reverts 81a1838.
Note that the actual implementation was retained, but there is no way to invoke
it once the commandline switch is removed. I expect that the switch will be
restored in a slightly different form, so it doesn't make sense to create churn
by removing all the code.

As discussed in systemd#33349, --purge
provides a footgun for users by allowing them to easily remove contents of
/home. At least one user has done this, which is one too many. Let's disable
the command before this hits stable distributions and more users have a
chance to destroy their data.

Closes systemd#33349.
@keszybz
Copy link
Member

keszybz commented Jun 15, 2024

#33353 is intended as a patch to apply and backport to stable while we rethink how this should work.

YHNdnzj added a commit to YHNdnzj/systemd that referenced this issue Jun 15, 2024
Historically, systemd-tmpfiles was designed to manager temporary
files, but nowadays it has become a generic tool for managing
all kinds of files. To avoid user confusion, let's remove "temporary"
from the tool's description.

As discussed in systemd#33349
keszybz added a commit to keszybz/systemd that referenced this issue Jun 15, 2024
This effectively reverts 81a1838.
Note that the actual implementation was retained, but there is no way to invoke
it once the commandline switch is removed. I expect that the switch will be
restored in a slightly different form, so it doesn't make sense to create churn
by removing all the code.

As discussed in systemd#33349, --purge
provides a footgun for users by allowing them to easily remove contents of
/home. At least one user has done this, which is one too many. Let's disable
the command before this hits stable distributions and more users have a
chance to destroy their data.

Closes systemd#33349.
YHNdnzj added a commit to YHNdnzj/systemd that referenced this issue Jun 15, 2024
Historically, systemd-tmpfiles was designed to manager temporary
files, but nowadays it has become a generic tool for managing
all kinds of files. To avoid user confusion, let's remove "temporary"
from the tool's description.

As discussed in systemd#33349
keszybz pushed a commit that referenced this issue Jun 15, 2024
Historically, systemd-tmpfiles was designed to manager temporary
files, but nowadays it has become a generic tool for managing
all kinds of files. To avoid user confusion, let's remove "temporary"
from the tool's description.

As discussed in #33349
@flokli
Copy link
Contributor

flokli commented Jun 15, 2024

There's been a discussion around the tempfiles name in #12447 (comment), back then it was decided it's too much churn. Glad the "temporary" now at least is gone since #33354 :-)

@poettering
Copy link
Member

Apart from how the command is invoked, I think the current design needs some rethinking. The command is advertised as "remove files and directories that tmpfiles creates", but the operation is asymmetrical. Tmpfiles does not manage /home, we just use it to "convert" the dircectory to a subvolume is possible.

That's not really true. We do not "convert" anything (we can't do that, dirs cannot be convetred into subvols in btrfs). The file exists for environments where we boot up with an empty / (i.e. only /usr mounted in), and /home should hence be created on first boot.

@keszybz
Copy link
Member

keszybz commented Jun 17, 2024

I know that. That's why I put "convert" in quotes. From the users POV, without our configuration in place, they get /home which is a plain directory. With our configuration, and installation happening in the right order, they get a subvolume. Otherwise, the structure is the same.

@bluca
Copy link
Member

bluca commented Jun 17, 2024

At least on the Debian side of things, /home is shipped in base-files, so you really have to modify an image manually to be able to trigger the subvolume-on-first-boot logic, which means it really has to be intentional. base-files is the first thing (well, one of the couple of) that is installed, so you won't get an image without /home unless you are really trying to.

@poettering
Copy link
Member

mkosi images with only a usr partition are a thing. The package manager is only used for prepping the original image, but it is then booted up with only /usr/ populated, anything outside of it is not even part of the golden image,

@bluca
Copy link
Member

bluca commented Jun 17, 2024

Yeah, building such an image with mkosi is one of the things I meant when I said "really trying to" - home.conf is doing what it's supposed to in that case I think

@Betonhaus
Copy link

I may have some misunderstandings as English is only my first language, but I believe the definition of "temp files" are files that:

  • can be perfectly regenerated using existing information, with no significant losses of data
  • have been used to generate finalized data, and can be deleted once that finalized data has been saved
  • can be deleted with the sole consequences being that a process using the data will need to regenerate the tempfiles as needed, or requiring a restart of the process to resume regular usage.

Having a function that declares irreplaceable files - such as the contents of a home directory - to be temporary files that can be easily purged, is at best poor user interfacing design and at worst a severe design flaw.

poettering added a commit to poettering/systemd that referenced this issue Jun 18, 2024
…n --purge

Also, extend the man page explanation substantially, matching more
closely what --create says.

Fixes: systemd#33349
@poettering
Copy link
Member

. The case of factory reset is a very very niche thing, and the case where you really want to destroy all user data is again something that is so niche, that those features don't need to available with a short call and must be guarded properly.

I don't thinkt he usecase for --purge is really primarily factory reset. Instead it's about package removal: a systematic way how files created by some RPM/DEB xyz via tmpfiles.d/ can be removed when xyz is removed again. That usecase has nothing to do with factory reset.

Also note that we have tab completion, so a user might mistype and press enter and call the command in other ways which are just too easy.

I'd be entirely fine with dropping --purge from tab completion btw. similar the other destructive factory reset options.

@bluca
Copy link
Member

bluca commented Jun 18, 2024

Also note that we have tab completion, so a user might mistype and press enter and call the command in other ways which are just too easy.

I'd be entirely fine with dropping --purge from tab completion btw. similar the other destructive factory reset options.

There is not, and there never was, tab completion for --purge

@SleepingPanda

This comment was marked as off-topic.

@celesteking
Copy link

This seems to be turning into MS Recall drama. RIP devs inboxes.

As re the issue itself, /usr/lib/tmpfiles.d/home.conf is supplied by systemd package over here. It's really not my problem systemd (distro?) decided to put /home dir into something that I would assume to be purely and exclusively temporary.

Either rename tmpfiles chunk into sysfiles, or rename or drop the --purge option. Or have user specify --i-understand if there's a terminal present.

@systemd systemd locked as too heated and limited conversation to collaborators Jun 18, 2024
@bluca bluca closed this as completed in 41064a3 Jun 18, 2024
bluca pushed a commit to bluca/systemd that referenced this issue Jun 18, 2024
…n --purge

Also, extend the man page explanation substantially, matching more
closely what --create says.

Fixes: systemd#33349
(cherry picked from commit 41064a3)
@bluca bluca linked a pull request Jun 18, 2024 that will close this issue
keszybz pushed a commit that referenced this issue Jun 18, 2024
…n --purge

Also, extend the man page explanation substantially, matching more
closely what --create says.

Fixes: #33349
(cherry picked from commit 41064a3)
@bluca
Copy link
Member

bluca commented Jun 18, 2024

v256.1 has been released with the belt and braces and updated docs

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.