Support running without DIRECT IO #47

Closed
coffeemug opened this issue Nov 13, 2012 · 15 comments

@coffeemug
Contributor

Running with direct I/O makes sense in production, but developers trying out the product often have encrypted or journaled file systems that don't support direct I/O. We should implement an alternative code path that opens the files without direct I/O and warns developers that it is running in a compatibility mode.

@frank-trampe
Contributor

What was the original reason for having the software fail on lack of O_DIRECT? Was it merely to save time implementing the fallback, or are there things that fail when the file is opened without it?

@frank-trampe
Contributor

Let me rephrase that so that it doesn't sound like I don't know the importance of O_DIRECT for a database. What was the original reason for not allowing a bypass? Was it the effort required to implement the fallback in disk.cc, things that broke elsewhere, or fear of cache crowding and other serious system performance problems?

@coffeemug
Contributor Author

Basically, we just didn't think about it :)

@frank-trampe
Contributor

So the low-level support in disk.cc seems to be present after all, just undocumented. The is_really_direct argument of the constructor linux_file_t::linux_file_t (which is no longer Linux-specific) controls whether O_DIRECT or F_NOCACHE is requested. All we need to do is choose between the typedefs direct_file_t and nondirect_file_t in log_serializer.cc. We could pass that choice down from the command-line handling either through extra function arguments (a session object or a simple flag for this option) or through a global variable. Which would be preferable?
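A flag like is_really_direct could map onto open(2) and fcntl(2) roughly as follows. This is a hypothetical sketch, not the actual disk.cc code: the function name `open_maybe_direct` is invented for illustration, and only the flag name and the O_DIRECT / F_NOCACHE split come from the comment above.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical sketch, not the real disk.cc code: open `path` read/write and,
// when `is_really_direct` is true, ask the OS to bypass its page cache.
// Returns a file descriptor, or -1 with errno set on failure.
int open_maybe_direct(const char *path, bool is_really_direct) {
    int flags = O_RDWR | O_CREAT;
#ifdef O_DIRECT
    // Linux: O_DIRECT is an open(2) flag.
    if (is_really_direct) {
        flags |= O_DIRECT;
    }
#endif
    int fd = open(path, flags, 0644);
#ifdef F_NOCACHE
    // OS X has no O_DIRECT; caching is turned off per descriptor with fcntl(2).
    if (fd >= 0 && is_really_direct && fcntl(fd, F_NOCACHE, 1) == -1) {
        int saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
#endif
    return fd;
}
```

Passing `false` gives an ordinary buffered descriptor; as the thread goes on to discuss, durability then has to be handled separately.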

@coffeemug
Contributor Author

I haven't looked at this stuff in a while, but here are two problems that I can think of off the top of my head:

  • F_NOCACHE may or may not be supported on all platforms
  • O_DIRECT implies that when one writes a block, the function does not return until the disk driver responds that the block has been committed to disk. This isn't the case without O_DIRECT, and may or may not be the case with F_NOCACHE. We need this behavior for sane operation, so we'll have to make sure it's replicated when O_DIRECT is off (one way to do that is to throw in fsyncs).

@frank-trampe
Contributor

  • F_NOCACHE is just for the Macintosh (or whatever you call a sharp-edged aluminum computer that does not support O_DIRECT).
  • I thought that we were willing to allow the user to bypass the disk commit safeguards in this case since it was just for testing. I can examine options for verifying a commit to disk without O_DIRECT if you like.

@frank-trampe
Contributor

As it turns out, O_DIRECT does not provide the data-commit guarantees that O_SYNC does. Should we also provide an O_SYNC option?

@srh
Contributor

srh commented Jan 7, 2013

We want O_DSYNC (which, allegedly, is how Linux treats O_SYNC anyway).

We warn the user when O_DIRECT doesn't work and pass O_DSYNC in any case.

This is currently in code review 134.
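The approach described above (try O_DIRECT, warn if it is rejected, and pass O_DSYNC in any case) could look roughly like this. It is a hypothetical sketch, not the code from review 134: the function name and the warning text are invented for illustration. It relies on open(2) failing with EINVAL when the filesystem does not support O_DIRECT, which is how Linux reports that case.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>

// Hypothetical sketch: prefer O_DIRECT, fall back to buffered I/O with a
// warning, and always request O_DSYNC so each write waits for the data to
// reach the device. Sets *is_direct to report which mode was used.
// Returns a file descriptor, or -1 with errno set on failure.
int open_with_fallback(const char *path, bool *is_direct) {
    const int base_flags = O_RDWR | O_CREAT | O_DSYNC;
    *is_direct = false;
#ifdef O_DIRECT
    int fd = open(path, base_flags | O_DIRECT, 0644);
    if (fd >= 0) {
        *is_direct = true;
        return fd;
    }
    if (errno != EINVAL) {
        return -1;  // a genuine error (permissions, missing directory, ...)
    }
    // EINVAL: this filesystem rejects O_DIRECT, so switch to compatibility mode.
    std::fprintf(stderr,
                 "warning: %s does not support O_DIRECT; "
                 "running in compatibility mode\n",
                 path);
#endif
    // Platforms without O_DIRECT (e.g. OS X) always take this path.
    return open(path, base_flags, 0644);
}
```

Because O_DSYNC is passed on both paths, the compatibility mode still waits for each write's data to be committed; it only gives up the cache bypass.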

@coffeemug
Contributor Author

What do we do on OS X for syncing?

@srh
Contributor

srh commented Jan 8, 2013

Nothing yet. If we want proper syncing in OS X we can follow up each write() call with fcntl(fd, F_FULLFSYNC).

@coffeemug
Contributor Author

@srh -- presumably we'd only need to do it twice -- once before writing the metablock, and once after.
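One reading of the two-sync suggestion: flush the data blocks before writing the metablock, so the metablock never points at data that is not yet durable, then flush again so the metablock itself is durable. The sketch below assumes data and metablock live in the same file; `commit_with_metablock`, `full_sync` and `write_exact` are hypothetical names, not functions from the serializer.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <cstddef>

// Flush everything written to fd so far to stable storage.
static bool full_sync(int fd) {
#ifdef F_FULLFSYNC
    // OS X: also flush the drive's write cache; fall back to fsync() if the
    // filesystem rejects F_FULLFSYNC.
    if (fcntl(fd, F_FULLFSYNC) != -1) {
        return true;
    }
#endif
    return fsync(fd) == 0;
}

// Write exactly len bytes at offset; a short write is treated as failure.
static bool write_exact(int fd, const void *buf, size_t len, off_t offset) {
    return pwrite(fd, buf, len, offset) == static_cast<ssize_t>(len);
}

// Hypothetical commit routine: data blocks first, then the metablock that
// refers to them, with one sync before and one after the metablock write.
bool commit_with_metablock(int fd,
                           const void *data, size_t data_len, off_t data_offset,
                           const void *meta, size_t meta_len, off_t meta_offset) {
    if (!write_exact(fd, data, data_len, data_offset)) {
        return false;
    }
    // Sync #1: the data must be durable before the metablock points at it.
    if (!full_sync(fd)) {
        return false;
    }
    if (!write_exact(fd, meta, meta_len, meta_offset)) {
        return false;
    }
    // Sync #2: make the new metablock itself durable.
    return full_sync(fd);
}
```

With this ordering a crash can lose the latest commit but should not leave a metablock pointing at unwritten data; the cost is two flushes per commit rather than one per write.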

@srh
Contributor

srh commented Jan 8, 2013

@coffeemug - I'm going to do F_FULLFSYNC in the I/O layer after every write call. This allegedly replicates O_DSYNC behavior (except that it presumably also syncs file metadata). Also, there's no reason it should perform worse than a few targeted fsync calls made in the serializer, apart from the relatively negligible cost of an extra syscall on a thread in the I/O pool.
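A minimal sketch of the per-write approach, assuming a pwrite-based I/O layer; `write_and_sync` is a hypothetical name, not a function from the codebase. On OS X it issues `fcntl(fd, F_FULLFSYNC)` after each write; on other platforms it substitutes `fdatasync()`, which is what O_DSYNC does implicitly after every write.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <cstddef>

// Hypothetical helper: every write is immediately followed by a flush, so the
// call does not return until the data has been pushed to stable storage.
// Returns the number of bytes written, or -1 on error.
ssize_t write_and_sync(int fd, const void *buf, size_t len, off_t offset) {
    ssize_t n = pwrite(fd, buf, len, offset);
    if (n < 0) {
        return -1;
    }
#ifdef F_FULLFSYNC
    // OS X: plain fsync() may leave data in the drive's cache; F_FULLFSYNC
    // asks the drive to flush it too. Some filesystems reject it, so fall
    // back to fsync() in that case.
    if (fcntl(fd, F_FULLFSYNC) == -1 && fsync(fd) == -1) {
        return -1;
    }
#else
    // Elsewhere (e.g. Linux): fdatasync() flushes the data plus only the
    // metadata needed to read it back, matching O_DSYNC semantics.
    if (fdatasync(fd) == -1) {
        return -1;
    }
#endif
    return n;
}
```

Doing the flush in the I/O layer means the serializer does not need to know where the sync points are; the price is one extra system call per write.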

@srh
Contributor

srh commented Jan 18, 2013

This is fixed, with F_FULLFSYNC and O_DSYNC options enabled.

srh closed this as completed Jan 18, 2013

@coffeemug
Contributor Author

@srh -- could you please specify review number and commit number?

@coffeemug
Contributor Author

@srh -- ping -- could you specify review number and commit number? What is the warning message the users get if DIRECT_IO isn't available?
