Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace SAGE_TMP by the system location in the sage library #33213

Closed
orlitzky opened this issue Jan 22, 2022 · 122 comments
Closed

Replace SAGE_TMP by the system location in the sage library #33213

orlitzky opened this issue Jan 22, 2022 · 122 comments

Comments

@orlitzky
Copy link
Contributor

Re: https://groups.google.com/g/sage-devel/c/zhjl_j6j_Qc

These days, using a SAGE_TMP that by default lives under $HOME is overly complex and often inefficient. In this ticket we implement the first phase of its removal, to be replaced by python's tempfile module. Specifically,

  1. We replace all direct uses of SAGE_TMP within the sage library and doctests.
  2. We update tmp_dir() and tmp_filename() to use the tempfile defaults.
  3. We remove SAGE_TMP.

Afterward, the custom functions tmp_dir() and tmp_filename() can be deprecated in favor of tempfile.TemporaryDirectory() and tempfile.NamedTemporaryFile().

Moreover when #8784 is done, we'll be able to remove sage-cleaner entirely.

Depends on #33829
Depends on #33790

CC: @slel @tscrim @kiwifb

Component: misc

Author: Michael Orlitzky

Branch: d94a1e6

Reviewer: Matthias Koeppe, Dima Pasechnik

Issue created by migration from https://trac.sagemath.org/ticket/33213

@orlitzky orlitzky added this to the sage-9.6 milestone Jan 22, 2022
@orlitzky
Copy link
Contributor Author

comment:1

I've posted a work-in-progress branch that raises a question. The python docs,

https://docs.python.org/3/library/tempfile.html

state that, in regards to a named temporary file,

Whether the name can be used to open the file a second time, while the named temporary
file is still open, varies across platforms (it can be so used on Unix; it cannot on
Windows NT or later)

So in many cases, where I would have liked to have done

sage: with tempfile.NamedTemporaryFile() as f:
....:     q.save_image(f.name)

I have instead done something like

sage: f = tempfile.NamedTemporaryFile(delete=False); f.close()
sage: q.save_image(f.name)
sage: os.remove(f.name)

or

sage: with tempfile.TemporaryDirectory() as d:
....:     filename = os.path.join(d, "foo.png") 
....:     q.save_image(filename)

to avoid the save_image call reopening the same name. However, the docs are unclear about what "on Windows" means. So the two examples above can be simplified if this is not a problem on cygwin.


Last 10 new commits:

dabb1c7Trac #33213: replace SAGE_TMP in repl doctests.
c292ea8Trac #33213: replace SAGE_TMP in magma interface doctests.
172e9c4Trac #33213: replace SAGE_TMP in dist script doctests.
24499f1Trac #33213: remove mentions of SAGE_TMP from the lazy_string module.
3c8c0e6Trac #33213: replace SAGE_TMP in session doctests.
3db8444Trac #33213: remove SAGE_TMP from tmp_dir() and tmp_filename().
5e7e080Trac #33213: don't clear SAGE_TMP upon exiting.
ea2fefdTrac #33213: remove SAGE_TMP from sage.misc.all.
b880571Trac #33213: replace SAGE_TMP in cleaner/quit interfaces.
ffd9aa6Trac #33213: drop SAGE_TMP from sage.misc.misc.

@orlitzky
Copy link
Contributor Author

Commit: ffd9aa6

@orlitzky

This comment has been minimized.

@orlitzky
Copy link
Contributor Author

Branch: u/mjo/ticket/33213

@orlitzky orlitzky changed the title Avoid SAGE_TMP for truly transient files and directories Replace SAGE_TMP by the system location in the sage library Feb 15, 2022
@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Feb 17, 2022

Branch pushed to git repo; I updated commit sha1. New commits:

03941c1Trac #33213: don't reset expect interfaces when starting them.
339ac94Trac #33213: simplify GAP workspace doctest.
f2b85e2Trac #33213: close an open file a bit sooner in gap interface.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Feb 17, 2022

Changed commit from ffd9aa6 to f2b85e2

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Feb 18, 2022

Changed commit from f2b85e2 to 0f90d98

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Feb 18, 2022

Branch pushed to git repo; I updated commit sha1. This was a forced push. Last 10 new commits:

976d88bTrac #33213: remove SAGE_TMP from tmp_dir() and tmp_filename().
8e09502Trac #33213: don't clear SAGE_TMP upon exiting.
a22fa71Trac #33213: remove SAGE_TMP from sage.misc.all.
810c855Trac #33213: replace SAGE_TMP in cleaner/quit interfaces.
ce5a76dTrac #33213: drop SAGE_TMP from sage.misc.misc.
955fdbaTrac #33213: remove the sage-cleaner program.
a9df953Trac #33213: remove "remote cleaner" option from expect interface.
9f504b5Trac #33213: remove sage.interfaces.cleaner.
92e84daTrac #33213: close an open file a bit sooner in gap interface.
0f90d98Trac #33213: use @cached_function for the spawned processes file.

@orlitzky
Copy link
Contributor Author

comment:4

Making #8784 a dependency because we can avoid many contortions on this branch if we don't have to keep sage-cleaner aware of where the spawned_processes file lives.

@orlitzky
Copy link
Contributor Author

Dependencies: #8784

@orlitzky

This comment has been minimized.

@williamstein
Copy link
Contributor

comment:5

How did you solve the problems that sage-cleaner solves? It's not like I wrote it for no reason. There are many situations in practice that people hit where temp files and subprocess get left around. Make sure that at least the following works:

  1. Start sage. Do something that creates a temp file and also something that spawns a subprocess.
  2. Segfault the sage that you started, e.g., by doing something stupid with Cython.

What happens? Are all temp files and spawned processes properly cleaned up?

It's an unfortunate reality that we make Cython relatively easy to use, hence it's easy from the command line or notebook to segfault a running Sage process.

Just curious. I never liked or wanted to write sage-cleaner, but I couldn't think of any better approach to make end users happy.

@orlitzky
Copy link
Contributor Author

comment:6

Replying to @williamstein:

How did you solve the problems that sage-cleaner solves? It's not like I wrote it for no reason. There are many situations in practice that people hit where temp files and subprocess get left around. Make sure that at least the following works:

  1. Start sage. Do something that creates a temp file and also something that spawns a subprocess.
  2. Segfault the sage that you started, e.g., by doing something stupid with Cython.

What happens? Are all temp files and spawned processes properly cleaned up?

  1. Now that all temporary files are created in /tmp or equivalent, they'll be cleaned up by the system eventually. In the event of a crash, I would say that this is preferable, because those temporary files can contain important debugging information.

  2. Sage doesn't segfault. More seriously, it depends on the process. In the simplest case, the subprocess will continue doing whatever it's doing until it completes, and will then terminate. Next would be subprocesses that communicate with the parent; eventually they'll be sent a SIGPIPE and commit suicide. If you go out of your way to launch something that is basically a daemon, though, it can indeed outlive the parent.

For (2) to be an issue, you have to launch a standalone process that never terminates, detach it from the parent, compile a C extension, load the extension into your current session, and then cause it to segfault. I don't think cleaning up after that situation should be the responsibility of any application; but especially not of a computer algebra system.

If you intend to do something like that frequently (or host a service for users who will...), there are better solutions these days -- cgroups on linux being the best example. Keeping in mind that sage-cleaner works by parsing a spawned_processes file that must be updated every time a process is launched, solving the problem at the OS level in particular has the advantage that it will catch sub-subprocesses. Sage-cleaner generally cannot (unless the subprocess is also sage), because the subprocess doesn't know that it's supposed to add a line item to spawned_processes.

@orlitzky
Copy link
Contributor Author

comment:7

I should also point out that it's not critical that we remove sage-cleaner at this point. The main hurdle wrt the removal of SAGE_TMP is that sage-cleaner and the sage library (register_spawned_process()) need to agree on a location for the spawned processes file. The current branch uses a temporary file in the library that lives for the duration of the sage session, and that's of no use to sage-cleaner due to the random name. We could still store the spawned processes file with a fixed name in DOT_SAGE, though, probably with the current sage PID tacked onto the path somewhere.

I saw this as a good time to remove it because its one crucial job -- to clean up SAGE_TMP under $HOME -- is now obsolete.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Feb 18, 2022

Branch pushed to git repo; I updated commit sha1. New commits:

de4a00eTrac #33213: ensure temporary gap workspace is created in a doctest.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Feb 18, 2022

Changed commit from 0f90d98 to de4a00e

@orlitzky
Copy link
Contributor Author

comment:9

I believe I've fixed all of the test failures so I'm setting this to needs_review to see what the patchbots think.

@dimpase
Copy link
Member

dimpase commented Feb 18, 2022

@williamstein
Copy link
Contributor

comment:11

Thanks @orlitzky for your explanation. Your responses don't necessarily apply in the context of users that got me to write the sage-cleaner in the first place, and they will probably be pretty annoyed if they ever realize what happened. Sage is different than some more less complicated computer algebra software like Magma, because Sage much more frequently spawns subprocesses. Fortunately, over the years the number of subprocesses Sage spawns has been reduced due to writing and using C library interfaces instead of pexpect, etc.

Regarding /tmp, unless you specifically set something up to periodically clean up files, unless things have changed a lot, don't most Linux distros not automatically delete files from there, possibly until a reboot? You're making a huge change from "temp files get cleaned up by Sage itself quickly" to "maybe someday the system will clean them up". Maybe at this point that is the right change, but it's important to acknowledge that it is a significant change.

I agree that when users are using Linux, have sufficient privileges to use containers, and also actually know how to use them, then these same problems can be solved in a more robust way. However, this is a small percentage of users of Sage. Many users use other operating systems or don't have admin permissions.

CoCalc is the main platform for running Sage that I care about these days, and every individual "project" is its own container, /tmp is extremely ephemeral, etc., so everything you say applies there. That said, users of CoCalc can't use Docker or other container mechanisms themselves within projects, due to security constraints.

In any case, I'm not personally worried about these changes having any adverse impact for me. However, I'm leaving this comment here so people who are impacted can maybe know they may have to revert these patches for their own install of Sage. Probably the way things will go is that this gets merged, and a year later certain environments start finding that Sage is leaving around clutter in a way that Sage didn't for the last decade, but users don't know why. Hopefully they google and find this comment.

@orlitzky
Copy link
Contributor Author

comment:12

Replying to @williamstein:

Regarding /tmp, unless you specifically set something up to periodically clean up files, unless things have changed a lot, don't most Linux distros not automatically delete files from there, possibly until a reboot? You're making a huge change from "temp files get cleaned up by Sage itself quickly" to "maybe someday the system will clean them up". Maybe at this point that is the right change, but it's important to acknowledge that it is a significant change.... Probably the way things will go is that this gets merged, and a year later certain environments start finding that Sage is leaving around clutter in a way that Sage didn't for the last decade, but users don't know why. Hopefully they google and find this comment.

Systemd comes with a timer (like a cron job) to clean up /tmp, but I'm not familiar with its out-of-the-box configuration on mainstream distros. I still use OpenRC myself, where a reboot is what triggers it.

On systems like mine, "leftover junk" is a fair criticism, but keep in mind the bigger picture: the python tempfile.TemporaryDirectory() and tempfile.NamedTemporaryFile() functions automatically remove the file/directory as soon as it's done being used. In this branch, I've changed a bunch of manual SAGE_TMP code to use those two functions, so a lot of what would have been leftover junk is now simply not created in the first place. Any remaining junk is due to sage's own tmp_filename() and tmp_dir(), which expect you to clean up the file yourself, or assume that sage-cleaner will do it eventually.

If this gets merged, we can deprecate tmp_filename() and tmp_dir() as obsolete in favor of the python functions. Over a hopefully-short amount of time, that should eliminate the leftover junk at the source.

@dimpase
Copy link
Member

dimpase commented Feb 18, 2022

comment:13

indeed, I am not at all worried about junk left at /tmp, as, at least as far as documentation of Python is correct, the facility of NamedTemporaryFile cleans up after itself automatically (I guess it uses POSIX's tmpfile(3) behind the scene).

I am a bit puzzled by the need to manually keep them open using delete=False the call to NamedTemporaryFile. Is it because one wants to keep them for the duration of a Sage session?

@orlitzky
Copy link
Contributor Author

comment:14

Replying to @dimpase:

I am a bit puzzled by the need to manually keep them open using delete=False the call to NamedTemporaryFile. Is it because one wants to keep them for the duration of a Sage session?

If you're referring to my new code (in many doctests), then this is simply a precaution for cygwin (see comment:1). I would love to simplify it if someone with a cygwin installation says that the "on Windows" warning doesn't apply there.

@orlitzky
Copy link
Contributor Author

comment:15

Replying to @orlitzky:

If you're referring to my new code (in many doctests), then this is simply a precaution for cygwin (see comment:1). I would love to simplify it if someone with a cygwin installation says that the "on Windows" warning doesn't apply there.

(If it DOES apply on cygwin, we could improve this in the future by updating the functions that now take a filename to also accept an open file handle; that will avoid re-open()ing it.)

@dimpase
Copy link
Member

dimpase commented Feb 18, 2022

comment:16

Cygwin is POSIX emulator, so on Windows doesn't really apply to it.

@orlitzky
Copy link
Contributor Author

comment:17

Replying to @dimpase:

Cygwin is POSIX emulator, so on Windows doesn't really apply to it.

You have two wishes remaining =)

I'm "happy" to go back and change all of those doctests, but I'd like confirmation before I spend half a day doing it.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 13, 2022

Changed commit from 9afa20b to d94a1e6

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 13, 2022

Branch pushed to git repo; I updated commit sha1. This was a forced push. Last 10 new commits:

c8c9d40Trac #33213: deprecate SAGE_TMP.
ec1a3cfTrac #33213: move sage.interfaces.cleaner.cleaner() to quit interface.
78db2fdTrac #33213: close an open file a bit sooner in gap interface.
56e2964Trac #33213: use @cached_function for the spawned processes file.
31986c1Trac #33213: ensure temporary gap workspace is created in a doctest.
c7ff254Trac #33213: use old SAGE_TMP path for spawned_processes file.
78f029bdo not use SAGE_TMP
7ae0e41add forgotten while rebasing deprecation in misc/dist.py
5d314d9pkgs/sage-setup, sagemath-{objects,categories,standard}: Add tox environment sagepython
d94a1e6src/doc/en/developer/packaging_sage_library.rst: Use sagepython instead of hardcoded py39

@dimpase
Copy link
Member

dimpase commented Jun 13, 2022

comment:93

rebased over #33829, #33790 (and 9.7.beta2), all good

@dimpase
Copy link
Member

dimpase commented Jun 13, 2022

Changed work issues from Rebase on top of the dependencies to none

@vbraun
Copy link
Member

vbraun commented Jun 21, 2022

Changed branch from u/dimpase/ticket/33213 to d94a1e6

@mkoeppe
Copy link
Member

mkoeppe commented Jun 25, 2022

comment:96

The branch includes a rebased version of #32716

@mkoeppe
Copy link
Member

mkoeppe commented Jun 25, 2022

Changed commit from d94a1e6 to none

@mkoeppe
Copy link
Member

mkoeppe commented Sep 27, 2022

comment:97

Follow-up discussion in https://groups.google.com/g/sage-devel/c/jpwUb8OCVVc

@jhpalmieri
Copy link
Member

comment:98

Follow-up ticket: #34593

kryzar pushed a commit to kryzar/sage that referenced this issue Feb 6, 2023
This is a followup to sagemath#33213. Some users prefer (or need) to control
where temporary files are created, and we should document this:

- as described at
https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir,
users can set the environment variable `TMPDIR`

Possible follow-up:
- Also, according to
https://manpages.ubuntu.com/manpages/bionic/man8/tmpreaper.8.html, at
least on some systems it would be helpful to create a non-writable file
`.tmpreaper` in the directory to prevent the system from doing automatic
cleanup on it.

URL: https://trac.sagemath.org/34593
Reported by: jhpalmieri
Ticket author(s): John Palmieri
Reviewer(s): Matthias Koeppe
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants