
Implement atomic save #6269

Merged
merged 3 commits into from Sep 2, 2014

Conversation

takluyver
Member

Ping @fperez, this should avoid issues with corrupted/lost notebooks when the disk is full, though I haven't worked out how to test it just yet.

Closes gh-6254

    if 'b' in mode:
        encoding = None

    with open(tmp_file, mode, encoding=encoding, **kwargs) as f:
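For context, the guard in this hunk exists because Python's built-in open() refuses an encoding argument in binary mode; a quick illustration (the file name is arbitrary):

```python
import os
import tempfile

# open() raises ValueError if an encoding is combined with a binary mode,
# which is why the hunk above sets encoding = None when 'b' is in mode.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'nb.ipynb')
    try:
        open(path, 'wb', encoding='utf-8')
    except ValueError as exc:
        print('rejected:', exc)

    # With the encoding cleared, binary mode works as expected.
    with open(path, 'wb') as f:
        f.write(b'{}')
```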
Member

Do you want to check that tmp_file exists?

Also, do you want to add a docstring, especially a warning that this will break hardlinks, since the file's inode changes?

@Carreau
Member

Carreau commented Aug 5, 2014

The failure does not seem unrelated; maybe open should be imported from io?

@takluyver
Member Author

I've reworked this to use mkstemp, so it should now be atomic even if two processes try to write one file at the same time (which the previous implementation wasn't). I've also added a test and a docstring, and fixed the test failure.
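For readers following along, a rough sketch of the approach described here (this is illustrative, not the exact code merged in the PR; this `atomic_writing` is a simplified stand-in):

```python
import io
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def atomic_writing(path, encoding='utf-8'):
    """Context manager to write `path` atomically.

    The data goes to a temporary file in the same directory, which is
    renamed over the target only after a successful write, so readers
    see either the old contents or the new, never a partial file.
    Warning: this replaces the inode, so hardlinks to the old file
    are broken.
    """
    dirname, basename = os.path.split(path)
    # mkstemp gives each writer its own unique file, so two processes
    # saving the same notebook at once cannot interleave their writes.
    fd, tmp_path = tempfile.mkstemp(prefix=basename + '-', dir=dirname)
    fileobj = io.open(fd, 'w', encoding=encoding)
    try:
        yield fileobj
        fileobj.flush()
        os.fsync(fileobj.fileno())  # make sure the data hits the disk first
    except BaseException:
        fileobj.close()
        os.remove(tmp_path)         # the original file is left untouched
        raise
    fileobj.close()
    os.replace(tmp_path, path)      # atomic rename (Python 3.3+)
```

On Python 3.3+, os.replace also overwrites an existing target on Windows in one call, which avoids the separate delete step discussed later in this thread.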

@Carreau
Member

Carreau commented Aug 5, 2014

+1

@fperez
Member

fperez commented Aug 5, 2014

Looks solid to me, thanks a lot!

@dgleich, in case you want to take a peek. Hopefully this will help prevent new occurrences of the disaster that hit you a few days ago.


On Windows, there is a small chink in the atomicity: the target file is
deleted before renaming the temporary file over it. This appears to be
unavoidable.
Member

I've seen this problem when using Microsoft's own .NET API to overwrite files with other files, too. But there is a way to do it using the Windows API: http://msdn.microsoft.com/en-us/library/windows/desktop/aa363851(v=vs.85).aspx

I doubt it's worth the trouble to wrap unless it's already wrapped in something like http://sourceforge.net/projects/pywin32/ .

Thoughts?

Edit: Is the original problem even an issue on Windows?

Member Author

I don't think it's worth the trouble - this should already fix the issue we saw with running out of disk space, even on Windows. The only scenario I can see where this would fail is if the write succeeds, it deletes the target file, and then another process creates a file with the same name before it can do the rename operation. In that bizarre case, the temporary file should be left around, so the data is still there.

If we did want to do that, I think MoveFileEx is the function we'd need, rather than CopyFile. Anything that copies a file suffers from the same problem - it may run out of space while writing a file.


MoveFileEx is wrapped in pywin32. If pywin32 were a dependency for IPython, it looks like

    from win32file import MoveFileEx, MOVEFILE_REPLACE_EXISTING, MOVEFILE_COPY_ALLOWED
    MoveFileEx(tmp_file, path, MOVEFILE_REPLACE_EXISTING | MOVEFILE_COPY_ALLOWED)

is what you'd want. (MOVEFILE_COPY_ALLOWED lets MoveFileEx fall back to making a copy when you're moving between volumes, where atomic moves aren't possible.)

Member Author

Thanks, but pywin32 is not a dependency. I don't think that's worth worrying about.

@takluyver
Member Author

I think @dgleich is right that fsync() is necessary, and I've added that to atomic_writing.

This will block the server while fsync() finishes. From what I've read, this is typically milliseconds, but if the disk is very busy, it could stretch to a few seconds. Some battery-conserving power management settings also try to keep the hard drive spun down most of the time. Explicitly fsync-ing presumably disrupts that, forcing the disk to spin up. With a single notebook open, the autosave interval is probably long enough not to be a problem, but if you have several open, it may start to be an issue.

Should we add an option to disable fsync, or atomic save in general? I started typing that we probably don't need one, because all editors must face this, and I've never seen an option for it. But now that I go looking, Geany, Sublime Text 3 and TextMate all do have options to disable atomic save - although they tend to be in 'advanced' sections of the config, understandably.
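The fsync being discussed amounts to a flush followed by os.fsync on the file descriptor, done before closing and renaming the temporary file; a minimal sketch (the helper name is my own, not from the PR):

```python
import os

def flush_to_disk(fileobj):
    """Push buffered data through to the storage device.

    flush() only moves Python's buffer into the OS page cache; fsync()
    asks the OS to write that cache out, so a crash right after the
    rename cannot leave an empty or truncated file. fsync() is the call
    that can block for milliseconds (or spin up a sleeping disk).
    """
    fileobj.flush()
    os.fsync(fileobj.fileno())
```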

@takluyver
Member Author

Discussed with @ivanov over lunch. We don't think an option to disable atomic save is necessary - people concerned about e.g. wear on flash storage can disable autosave in the JS.

Also, it's apparently impossible to guarantee that data actually hits permanent storage, because of all the layers of caching involved, some of which are beyond application control. This is the best we can do from application code, and I think it should deal with the truncation on a full disk that we discussed. @ivanov plans to set up a docker container to check that.

@ivanov
Member

ivanov commented Aug 11, 2014

my docker adventure went adrift on Friday, pushing on this now...

@ellisonbg
Member

@takluyver what type of overhead does this extra step have?

@minrk
Member

minrk commented Aug 26, 2014

@ellisonbg if someone wants to do benchmarks they can, but the added operations here are going to be totally negligible for normal use cases. It doesn't write data twice, it just adds fsync and rename, which are negligible under typical load.

Since this potentially mitigates data loss, do we want to backport to 2.x, or not?

Other than that question, 👍 to merge.

@fperez
Member

fperez commented Aug 26, 2014

My vote is to backport. While rare, data loss is such a severe issue that I'd backport.

@ivanov
Member

ivanov commented Aug 26, 2014

I can confirm that, unlike master, this doesn't do a partial write and fail in the middle of it (on master, it's easy to create invalid notebook files because of partial writes). However, in both cases, all the user sees in the notebook UI is a cryptic orange "Bad Request" message.
[screenshot: orange "Bad Request" notification]

The way I was finally able to create a disk-full scenario under Linux is as follows:

    sudo mkdir /mnt/tiny
    sudo mount -t tmpfs -o size=20k tmpfs /mnt/tiny

then start a notebook server in /mnt/tiny, and in a notebook run:

    !dd if=/dev/zero of=bigfile
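The tmpfs recipe above needs root; the same failure mode can also be simulated in pure Python by forcing the write to fail partway through (IOError here stands in for a real ENOSPC, and the function names are mine, not IPython's):

```python
import os
import tempfile

def naive_save(path, data):
    # Overwrite in place: the file is truncated first, so failing
    # mid-write destroys the old contents (the behaviour on master).
    with open(path, 'w') as f:
        f.write(data[:4])
        raise IOError('disk full')  # simulated ENOSPC

def atomic_save(path, data):
    # Write-to-temp-then-rename: a failure leaves the target untouched.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    try:
        with open(fd, 'w') as f:
            f.write(data[:4])
            raise IOError('disk full')  # simulated ENOSPC
        os.rename(tmp, path)  # never reached in this simulation
    except IOError:
        os.remove(tmp)
        raise

d = tempfile.mkdtemp()
nb = os.path.join(d, 'nb.ipynb')
with open(nb, 'w') as f:
    f.write('{"cells": []}')

try:
    atomic_save(nb, '{"cells": [1]}')
except IOError:
    pass
print(repr(open(nb).read()))  # '{"cells": []}' -- original intact

try:
    naive_save(nb, '{"cells": [1]}')
except IOError:
    pass
print(repr(open(nb).read()))  # '{"ce' -- truncated, data lost
```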

@minrk
Member

minrk commented Aug 26, 2014

@ivanov are you using a version that includes #6303? I know that improved on 'Bad Request' in some cases, but I don't think I tested save.

@ivanov
Member

ivanov commented Aug 26, 2014

Yeah, merging this PR with master still leaves a Bad Request message, so #6303 doesn't help in this case.

@minrk
Member

minrk commented Aug 26, 2014

ok, thanks for checking. I think better messages can be a separate PR, if we want to merge/backport this one now.

@ivanov
Member

ivanov commented Aug 26, 2014

Ok, I created #6364 to keep track of that. Otherwise, I'm 👍 on this one.

minrk added a commit that referenced this pull request Sep 2, 2014
@minrk minrk merged commit 15ecb84 into ipython:master Sep 2, 2014
@minrk minrk added this to the 2.3 milestone Sep 2, 2014
minrk added a commit that referenced this pull request Oct 1, 2014
takluyver added a commit to takluyver/ipython that referenced this pull request Oct 1, 2014
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
Successfully merging this pull request may close these issues.

data loss on notebook save
8 participants