Implement atomic save #6269
Conversation
Ping @fperez, this should avoid issues with corrupted/lost notebooks when the disk is full, though I haven't worked out how to test it just yet. Closes ipythongh-6254
if 'b' in mode:
    encoding = None

with open(tmp_file, mode, encoding=encoding, **kwargs) as f:
Want to check that tmp_file exists?
Also, do you want to add a docstring, especially a warning that this will break hard links, since the file inode changes?
The test failures do not seem unrelated - maybe open should be imported from io?
I've reworked this to use mkstemp, so it should now be atomic even if two processes try to write one file at the same time (which the previous implementation wasn't). I've also added a test and a docstring, and fixed the test failure.
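For anyone following along, the mkstemp-based approach looks roughly like this. This is a sketch, not the PR's exact code, and it assumes Python 3.3+ for os.replace; the hypothetical atomic_writing name mirrors the one discussed in this thread:

```python
import io
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def atomic_writing(path, mode='w', encoding='utf-8', **kwargs):
    """Context manager that writes to a temporary file, then renames it
    over the target, so the target is never left half-written.

    Warning: this replaces the file's inode, so hard links will break.
    """
    dirname, basename = os.path.split(path)
    # mkstemp creates a uniquely named file, so two concurrent writers
    # cannot collide on the same temporary path.
    fd, tmp_path = tempfile.mkstemp(prefix=basename, dir=dirname or os.curdir)
    if 'b' in mode:
        encoding = None
    fileobj = io.open(fd, mode, encoding=encoding, **kwargs)
    try:
        yield fileobj
    except BaseException:
        # On failure (e.g. disk full), discard the temp file and leave
        # the original target untouched.
        fileobj.close()
        os.remove(tmp_path)
        raise
    fileobj.close()
    os.replace(tmp_path, path)  # atomic overwrite of the target
```

A failed write (the disk-full scenario from ipythongh-6254) leaves the previously saved file intact, because the target is only replaced after the temporary file has been written and closed successfully.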
+1
Looks solid to me, thanks a lot! @dgleich, in case you want to take a peek. Hopefully this will help prevent new occurrences of the disaster that hit you a few days ago.
|
On Windows, there is a small chink in the atomicity: the target file is
deleted before renaming the temporary file over it. This appears to be
unavoidable.
I've seen this problem when using Microsoft's own .Net API to overwrite files with other files too. But, there is a way to do it using the Windows API: http://msdn.microsoft.com/en-us/library/windows/desktop/aa363851(v=vs.85).aspx
I doubt it's worth the trouble to wrap if it's not already done so in something like http://sourceforge.net/projects/pywin32/ .
Thoughts?
Edit: Is the original problem even an issue on Windows?
I don't think it's worth the trouble - this should already fix the issue we saw with running out of disk space, even on Windows. The only scenario I can see where this would fail is if the write succeeds, it deletes the target file, and then another process creates a file with the same name before it can do the rename operation. In that bizarre case, the temporary file should be left around, so the data is still there.
If we did want to do that, I think MoveFileEx is the function we'd need, rather than CopyFile. Anything that copies a file suffers from the same problem - it may run out of space while writing a file.
MoveFileEx is wrapped in pywin32. If pywin32 is a dependency for IPython, it looks like

from win32file import MoveFileEx, MOVEFILE_REPLACE_EXISTING, MOVEFILE_COPY_ALLOWED
MoveFileEx(tmp_file, path, MOVEFILE_REPLACE_EXISTING | MOVEFILE_COPY_ALLOWED)

is what you'd want. (MOVEFILE_COPY_ALLOWED lets MoveFileEx fall back to copying the file when moving between volumes, where an atomic move is not possible.)
Thanks, but pywin32 is not a dependency. I don't think that's worth worrying about.
I think @dgleich is right that fsync() is necessary, and I've added that to atomic_writing. This will block the server while fsync() finishes. From what I've read, this typically takes milliseconds, but if the disk is very busy, it could stretch to a few seconds. Some battery-conserving power management settings also try to keep the hard drive spun down most of the time; explicitly fsync-ing presumably disrupts that, forcing the disk to spin up. With a single notebook open, the autosave interval is probably long enough not to be a problem, but with several open, it may start to be an issue.

Should we add an option to disable fsync, or atomic save in general? I started typing that we probably don't need one, because all editors must face this and I've never seen such an option. But now that I've gone looking, Geany, Sublime Text 3 and TextMate all do have options to disable atomic save - although they tend to be in 'advanced' sections of the config, understandably.
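To illustrate where the fsync call sits in the sequence (a sketch, not the PR's exact code): Python's flush() only empties the interpreter's userspace buffer, while os.fsync() asks the kernel to push its page cache for that file to the device, and it blocks until the kernel reports completion - which is where the latency discussed above comes from.

```python
import os
import tempfile

d = tempfile.mkdtemp()
tmp = os.path.join(d, 'nb.ipynb.tmp')
target = os.path.join(d, 'nb.ipynb')

with open(tmp, 'w') as f:
    f.write('{"cells": []}')
    f.flush()              # flush Python's userspace buffer to the OS
    os.fsync(f.fileno())   # block until the OS reports the data written
                           # to the device -- the step that can stall on
                           # a busy or spun-down disk
os.rename(tmp, target)     # only now does the new file become visible
```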
Discussed with @ivanov over lunch. We don't think an option to disable atomic save is necessary - people concerned about e.g. wear on flash storage can disable autosave in the JS. Also, it's apparently impossible to guarantee that data actually hits permanent storage, because of all the layers of caching involved, some of which are beyond application control. This is the best we can do from application code, and I think it should deal with the truncation on a full disk that we discussed. @ivanov plans to set up a docker container to check that.
my docker adventure went adrift on Friday, pushing on this now... |
@takluyver what type of overhead does this extra step have? |
@ellisonbg if someone wants to do benchmarks they can, but the added operations here are going to be totally negligible for normal use cases. It doesn't write data twice, it just adds fsync and rename, which are negligible under typical load. Since this potentially mitigates data loss, do we want to backport to 2.x, or not? Other than that question, 👍 to merge.
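For anyone who does want a rough number, a throwaway benchmark along these lines (a sketch with made-up file names and sizes, comparing a plain write against the temp-file + fsync + rename sequence) would show where the extra cost lands:

```python
import os
import tempfile
import timeit

d = tempfile.mkdtemp()
data = 'x' * (1 << 20)  # ~1 MB, on the order of a large notebook

def plain_write():
    with open(os.path.join(d, 'plain.ipynb'), 'w') as f:
        f.write(data)

def atomic_write():
    fd, tmp = tempfile.mkstemp(dir=d)
    with os.fdopen(fd, 'w') as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # the extra cost is almost entirely here
    os.replace(tmp, os.path.join(d, 'atomic.ipynb'))

for fn in (plain_write, atomic_write):
    # total time for 20 saves, scaled to milliseconds per save
    print(fn.__name__, '%.1f ms/save' % (timeit.timeit(fn, number=20) * 50))
```

Absolute numbers will depend heavily on the filesystem and whether the disk is busy or spun down, per the earlier discussion.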
My vote is to backport. While rare, data loss is such a severe issue that I'd backport. |
Yeah, merging this PR with master still leaves a Bad Request message, so #6303 doesn't help in this case. |
ok, thanks for checking. I think better messages can be a separate PR, if we want to merge/backport this one now. |
Ok, I created #6364 to keep track of that. Otherwise, I'm 👍 on this one. |
Backport PR ipython#6269
Backport PR ipython#6453
Backport PR ipython#6583