Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disable 'fsync' by default, but force fsync() in more situations #8304

Merged
merged 6 commits into from Apr 24, 2018

Conversation

justinmk
Copy link
Member

@justinmk justinmk commented Apr 20, 2018

fsync() is very slow on my system. It randomly causes :write and :quit to take 3+ seconds.

This PR does several things:

  • Make shada and swapfile respect the 'fsync' option.
  • Force fsync() (regardless of 'fsync' option) in these cases:
    • Idle (CursorHold).
    • Exit caused by deadly signal.
    • SIGPWR signal.
    • Explicit :preserve command.
  • Disable the 'fsync' option by default.

Also mitigates #6725.

ref ludovicchabant/vim-gutentags#167

shada_write_file() is called on exit (:quit and friends), this can be
very slow.

Note: AFAICT Vim (do_viminfo()) does not appear to fsync() viminfo.
Vim has the 'swapsync' option which we removed in 62d137c.
Instead let 'fsync' control swapfile-fsync.

These cases ALWAYS force fsync (ignoring 'fsync' option):
- Idle (CursorHold).
- Exit caused by deadly signal.
- SIGPWR signal.
- Explicit :preserve command.
ref neovim#6725

fsync() is very slow on some systems.  And since the parent commit, Nvim
is smarter about flushing files at certain times (e.g. CursorHold),
regardless of whether 'fsync' is enabled.  So it's less risky to disable
'fsync'.

Profiling showed slow (2-4s) :write and :quit caused by fsync():

:quit
    shada_write_file(NULL, false);

:write + fsync
    0  0x00007f72da567b2d in fsync () at ../sysdeps/unix/syscall-template.S:84
    1  0x0000000000638970 in uv__fs_fsync (req=<optimized out>) at /home/vagrant/neovim/.deps/build/src/libuv/src/unix/fs.c:150
    2  uv__fs_work (w=<optimized out>) at /home/vagrant/neovim/.deps/build/src/libuv/src/unix/fs.c:953
    3  0x0000000000639a70 in uv_fs_fsync (loop=<optimized out>, req=<optimized out>, file=41, cb=0x7f72da567b2d <fsync+45>)
       at /home/vagrant/neovim/.deps/build/src/libuv/src/unix/fs.c:1094
    4  0x0000000000573694 in os_fsync (fd=41) at ../src/nvim/os/fs.c:631
    5  0x00000000004ec9dc in buf_write (buf=<optimized out>, fname=<optimized out>, sfname=<optimized out>, start=1, end=1997, eap=0x7fffc864c570,
       append=<optimized out>, forceit=<optimized out>, reset_changed=<optimized out>, filtering=<optimized out>) at ../src/nvim/fileio.c:3387
    6  0x00000000004b44ff in do_write (eap=0x7fffc864c570) at ../src/nvim/ex_cmds.c:1745
    ...

:write + nofsync
    0  0x00007f72da567b2d in fsync () at ../sysdeps/unix/syscall-template.S:84
    1  0x0000000000638970 in uv__fs_fsync (req=<optimized out>) at /home/vagrant/neovim/.deps/build/src/libuv/src/unix/fs.c:150
    2  uv__fs_work (w=<optimized out>) at /home/vagrant/neovim/.deps/build/src/libuv/src/unix/fs.c:953
    3  0x0000000000639a70 in uv_fs_fsync (loop=<optimized out>, req=<optimized out>, file=36, cb=0x7f72da567b2d <fsync+45>)
       at /home/vagrant/neovim/.deps/build/src/libuv/src/unix/fs.c:1094
    4  0x0000000000573694 in os_fsync (fd=36) at ../src/nvim/os/fs.c:631
    5  0x0000000000528f5a in mf_sync (mfp=0x7f72d8968d00, flags=5) at ../src/nvim/memfile.c:466
    6  0x000000000052d569 in ml_preserve (buf=0x7f72d890f000, message=0) at ../src/nvim/memline.c:1659
    7  0x00000000004ebadf in buf_write (buf=<optimized out>, fname=<optimized out>, sfname=<optimized out>, start=1, end=1997, eap=0x7fffc864c570,
       append=<optimized out>, forceit=<optimized out>, reset_changed=<optimized out>, filtering=<optimized out>) at ../src/nvim/fileio.c:3071
    8  0x00000000004b44ff in do_write (eap=0x7fffc864c570) at ../src/nvim/ex_cmds.c:1745
    ...
@justinmk justinmk force-pushed the fsync branch 2 times, most recently from af2a657 to 814e1e0 Compare April 23, 2018 08:43
Use it to verify fsync() behavior.
@justinmk justinmk merged commit ad60927 into neovim:master Apr 24, 2018
@justinmk justinmk deleted the fsync branch April 24, 2018 00:51
operations may sometimes take a few seconds.

Files are ALWAYS flushed ('fsync' is ignored) when:
- |CursorHold| event is triggered
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CursorHold is based on 'updatetime' but it looks like the decision is being based on 'updatecount'.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's 'updatetime'. before_blocking() calls updatescript(0) which is special-cased.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reflection, I think it should also happen for 'updatecount'.

@jamessan
Copy link
Member

Just to clarify, this change means that when :w is issued, the file will be flushed after 'updatecount' characters are typed? How does that affects things like https://github.com/tpope/tpope/blob/e8f5d80cf46df8b962cd947a6575bd0621fd6da6/.vimrc#L329-L330 ?

The save would cause 'swapfile' to be unset, but the file hasn't been fsync()d yet.

@justinmk
Copy link
Member Author

when :w is issued, the file will be flushed after 'updatecount' characters are typed?

The file itself is not fsync'd, only its "memfile" is, and only if the buffer is &modified.

It would be nice to fsync the file asynchronously or after idle, but (I would guess) that involves managing file descriptor lifetimes. (And that would be my preference, so then 'fsync' can be enabled by default.)

My assumption is that swapfiles are enough to protect against data loss (within 'updatetime'). Also this suggests that fsync() by itself is not always effective (maybe libuv takes extra measures?).

I left 9139bf8 as a separate commit if we want to revert it. But again, the goal should be to fsync asynchronously. #6725 also needs to be fixed.

references:

@jamessan
Copy link
Member

The file itself is not fsync'd, only its "memfile" is, and only if the buffer is &modified.

neovim/src/nvim/fileio.c

Lines 3378 to 3387 in ad60927

// On many journalling file systems there is a bug that causes both the
// original and the backup file to be lost when halting the system right
// after writing the file. That's because only the meta-data is
// journalled. Syncing the file slows down the system, but assures it has
// been written to disk and we don't lose it.
// For a device do try the fsync() but don't complain if it does not work
// (could be a pipe).
// If the 'fsync' option is FALSE, don't fsync(). Useful for laptops.
int error;
if (p_fs && (error = os_fsync(fd)) != 0 && !device) {

Now the file being saved is not fsync'd on write and the memfile is only sync'd after a delay.

It would be nice to fsync the file asynchronously or after idle,

Redis does something like that and there was some noise recently about issues PostgreSQL saw related to that.

but (I would guess) that involves managing file descriptor lifetimes.

Or, as tytso suggested in your reference, do the write and fsync async (together).

@justinmk
Copy link
Member Author

justinmk commented Apr 24, 2018

The file itself is not fsync'd, only its "memfile" is, and only if the buffer is &modified.

Now the file being saved is not fsync'd on write and the memfile is only sync'd after a delay.

Yes, if 'fsync' is not set. That's what I said, right? :)

Or, as tytso suggested in your reference, do the write and fsync async (together).

Seems higher risk.

  • write-now-without-fsync only risks data loss if the OS crashes.
  • delayed-write-and-fsync risks data loss if the application crashes.

Different idea: what if we called sync() (to "sync everything") on 'updatetime' (or some other "idle" timer)?

justinmk added a commit to justinmk/neovim that referenced this pull request May 6, 2018
This reverts commit ad60927, reversing
changes made to ffb8904.
@justinmk justinmk mentioned this pull request May 6, 2018
justinmk added a commit that referenced this pull request Jun 11, 2018
FEATURES:
3cc7ebf #7234 built-in VimL expression parser
6a7c904 #4419 implement <Cmd> key to invoke command in any mode
b836328 #7679 'startup: treat stdin as text instead of commands'
58b210e :digraphs : highlight with hl-SpecialKey #2690
7a13611 #8276 'startup: Let `-s -` read from stdin'
1e71978 events: VimSuspend, VimResume #8280
1e7d5e8 #6272 'stdpath()'
f96d99a #8247 server: introduce --listen
e8c39f7 #8226 insert-mode: interpret unmapped META as ESC
98e7112 msg: do not scroll entire screen (#8088)
f72630b #8055 let negative 'writedelay' show all redraws
5d2dd2e win: has("wsl") on Windows Subsystem for Linux #7330
a4f6cec cmdline: CmdlineEnter and CmdlineLeave autocommands (#7422)
207b7ca #6844 channels: support buffered output and bytes sockets/stdio

API:
f85cbea #7917 API: buffer updates
418abfc #6743 API: list information about all channels/jobs.
36b2e3f #8375 API: nvim_get_commands
273d2cd #8329 API: Make nvim_set_option() update `:verbose set …`
8d40b36 #8371 API: more reliable/descriptive VimL errors
ebb1acb #8353 API: nvim_call_dict_function
9f994bb #8004 API: nvim_list_uis
3405704 #7520 API/UI: forward option updates to UIs
911b1e4 #7821 API: improve nvim_command_output

WINDOWS OS:
9cefd83 #8084, #8516 build/win: support MSVC
ee4e1fd win: Fix reading content from stdin (#8267)

TUI:
ffb8904 #8309 TUI: add support for mouse release events in urxvt
8d5a46e #8081 TUI: implement "standout" attribute
6071637 TUI: support TERM=konsole-256color
67848c0 #7653 TUI: report TUI info with -V3 ('verbose' >= 3)
3d0ee17 TUI/rxvt: enable focus-reporting
d109f56 #7640 TUI: 'term' option: reflect effective terminal behavior

FIXES:
ed6a113 #8273 'job-control: avoid kill-timer race'
4e02f1a #8107 'jobs: separate process-group'
451c48a terminal: flush vterm output buffer on pty output #8486
5d6732f :checkhealth fixes #8335
53f11dc #8218 'Fix errors reported by PVS'
d05712f inccommand: pause :terminal redraws (#8307)
51af911 inccommand: do not execute trailing commands #8256
84359a4 terminal: resize to the max dimensions (#8249)
d49c1dd #8228 Make vim_fgets() return the same values as in Vim
60e96a4 screen: winhl=Normal:Background should not override syntax (#8093)
0c59ac1 #5908 'shada: Also save numbered marks'
ba87a2c cscope: ignore EINTR while reading the prompt (#8079)
b1412dc #7971 ':terminal Enter/Leave should not increment jumplist'
3a5721e TUI: libtermkey: force CSI driver for mouse input #7948
6ff13d7 #7720 TUI: faster startup
1c6e956 #7862 TUI: fix resize-related segfaults
a58c909 #7676 TUI: always hide cursor when flushing, never flush buffers during unibilium output
303e1df #7624 TUI: disable BCE almost always
249bdb0 #7761 mark: Make sure that jumplist item will not have zero lnum
6f41ce0 #7704 macOS: Set $LANG based on the system locale
a043899 #7633 'Retry fgets on EINTR'

CHANGES:
ad60927 #8304 default to 'nofsync'
f3f1970 #8035 defaults: 'fillchars'
a6052c7 #7984 defaults: sidescroll=1
b69fa86 #7888 defaults: enable cscopeverbose
7c4bb23 defaults: do :filetype stuff unless explicitly "off"
2aa308c #5658 'Apply :lmap in macros'
8ce6393 terminal: Leave 'relativenumber' alone (#8360)
e46534b #4486 refactor: Remove maxmem, maxmemtot options
131aad9 win: defaults: 'shellcmdflag', 'shellxquote' #7343
c57d315 #8031 jobwait(): return -2 on interrupt also with timeout
6452831 clipboard: macOS: fallback to tmux if pbcopy is broken #7940
300d365 #7919 Make 'langnoremap' apply directly after a map
ada1956 #7880 'lua/executor: Remove lightuserdata'

INTERNAL:
de0a954 #7806 internal statistics for list impl
dee78a4 #7708 rewrite internal list impl
@lewis6991
Copy link
Member

lewis6991 commented Apr 17, 2023

I'm quite confused by this change.

@justinmk is fsync still a problem for you?

For context, I think this change is causing me issues with #23152

@justinmk
Copy link
Member Author

justinmk commented Jul 16, 2023

@lewis6991 it depends on the filesystem. This PR isn't perfect, but in general we shouldn't need to eagerly fsync(), instead we should lazily fsync() at 'updatetime' (swapfiles at least, but ideally a queue of all writes).

@lewis6991
Copy link
Member

The default value of updatetime is 4 seconds though? That seems quite a lot.

justinmk added a commit to justinmk/neovim that referenced this pull request Dec 5, 2023
CI sometimes fails:

    FAILED   test/functional/core/fileio_spec.lua @ 52: fileio fsync() codepaths neovim#8304
    test/functional/core/fileio_spec.lua:87: Expected objects to be the same.
    Passed in:
    (number) 3
    Expected:
    (number) 2

    stack traceback:
            test/functional/core/fileio_spec.lua:87: in function <test/functional/core/fileio_spec.lua:52>
justinmk added a commit to justinmk/neovim that referenced this pull request Dec 5, 2023
Problem:
CI sometimes fails:

    FAILED   test/functional/core/fileio_spec.lua @ 52: fileio fsync() codepaths neovim#8304
    test/functional/core/fileio_spec.lua:87: Expected objects to be the same.
    Passed in:
    (number) 3
    Expected:
    (number) 2
    stack traceback:
            test/functional/core/fileio_spec.lua:87: in function <test/functional/core/fileio_spec.lua:52>

Solution:
Something is triggering an extra fsync, possibly timing-related after
the jobstart() call. To avoid the (speculated) race, rearrange the test
so that jobstart() is done last.
justinmk added a commit to justinmk/neovim that referenced this pull request Dec 5, 2023
Problem:
CI sometimes fails:

    FAILED   test/functional/core/fileio_spec.lua @ 52: fileio fsync() codepaths neovim#8304
    test/functional/core/fileio_spec.lua:87: Expected objects to be the same.
    Passed in:
    (number) 3
    Expected:
    (number) 2
    stack traceback:
            test/functional/core/fileio_spec.lua:87: in function <test/functional/core/fileio_spec.lua:52>

Solution:
Something is triggering an extra fsync, possibly timing-related after
the jobstart() call. To avoid the (speculated) race, rearrange the test
so that jobstart() is done last.
justinmk added a commit to justinmk/neovim that referenced this pull request Dec 5, 2023
Problem:
CI sometimes fails:

    FAILED   test/functional/core/fileio_spec.lua @ 52: fileio fsync() codepaths neovim#8304
    test/functional/core/fileio_spec.lua:87: Expected objects to be the same.
    Passed in:
    (number) 3
    Expected:
    (number) 2
    stack traceback:
            test/functional/core/fileio_spec.lua:87: in function <test/functional/core/fileio_spec.lua:52>

Solution:
Something is triggering an extra fsync, possibly timing-related after
the jobstart() call. Set 'updatecount' to a high value.
justinmk added a commit to justinmk/neovim that referenced this pull request Dec 5, 2023
Problem:
CI sometimes fails:

    FAILED   test/functional/core/fileio_spec.lua @ 52: fileio fsync() codepaths neovim#8304
    test/functional/core/fileio_spec.lua:87: Expected objects to be the same.
    Passed in:
    (number) 3
    Expected:
    (number) 2
    stack traceback:
            test/functional/core/fileio_spec.lua:87: in function <test/functional/core/fileio_spec.lua:52>

Solution:
Something is triggering an extra fsync, possibly timing-related after
the jobstart() call. Set 'updatecount' to a high value.
justinmk added a commit to justinmk/neovim that referenced this pull request Dec 5, 2023
Problem:
CI sometimes fails. Something is triggering an extra fsync().

    FAILED   test/functional/core/fileio_spec.lua @ 52: fileio fsync() codepaths neovim#8304
    test/functional/core/fileio_spec.lua:87: Expected objects to be the same.
    Passed in:
    (number) 3
    Expected:
    (number) 2
    stack traceback:
            test/functional/core/fileio_spec.lua:87: in function <test/functional/core/fileio_spec.lua:52>

Solution:
Relax the assertion to `fsync >= 2` instead of exactly 2.

(Note this is not a behavior change: the next assertion has always
checked `fsync == 4`, it's just that the intermediate 3rd fsync was
never explicitly asserted.)
justinmk added a commit that referenced this pull request Dec 5, 2023
Problem:
CI sometimes fails. Something is triggering an extra fsync().

    FAILED   test/functional/core/fileio_spec.lua @ 52: fileio fsync() codepaths #8304
    test/functional/core/fileio_spec.lua:87: Expected objects to be the same.
    Passed in:
    (number) 3
    Expected:
    (number) 2
    stack traceback:
            test/functional/core/fileio_spec.lua:87: in function <test/functional/core/fileio_spec.lua:52>

Solution:
Relax the assertion to `fsync >= 2` instead of exactly 2.

(Note this is not a behavior change: the next assertion has always
checked `fsync == 4`, it's just that the intermediate 3rd fsync was
never explicitly asserted.)
justinmk added a commit to justinmk/neovim that referenced this pull request Dec 6, 2023
Followup to 27501d3.

Problem:
CI sometimes fails. Something is triggering an extra fsync().

    FAILED   test/functional/core/fileio_spec.lua @ 52: fileio fsync() with 'nofsync' neovim#8304
    test/functional/core/fileio_spec.lua:100: Expected objects to be the same.
    Passed in:
    (number) 5
    Expected:
    (number) 4

Solution:
Relax the assertion.
justinmk added a commit to justinmk/neovim that referenced this pull request Dec 6, 2023
Followup to 27501d3.

Problem:
CI sometimes fails. Something is triggering an extra fsync().

    FAILED   test/functional/core/fileio_spec.lua @ 52: fileio fsync() with 'nofsync' neovim#8304
    test/functional/core/fileio_spec.lua:100: Expected objects to be the same.
    Passed in:
    (number) 5
    Expected:
    (number) 4

Solution:
Relax the assertion.
justinmk added a commit that referenced this pull request Dec 6, 2023
Followup to 27501d3.

Problem:
CI sometimes fails. Something is triggering an extra fsync().

    FAILED   test/functional/core/fileio_spec.lua @ 52: fileio fsync() with 'nofsync' #8304
    test/functional/core/fileio_spec.lua:100: Expected objects to be the same.
    Passed in:
    (number) 5
    Expected:
    (number) 4

Solution:
Relax the assertion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants