ERL-831: mnesia:transaction not fsync'ing log, not entirely durable, which is unexpected based on documentation #3975

Closed
OTP-Maintainer opened this issue Jan 14, 2019 · 2 comments
Labels: not a bug (Issue is determined as not a bug by OTP), priority:medium, question, team:PS (Assigned to OTP team PS)

Comments

@OTP-Maintainer

Original reporter: hammeractually
Affected version: Not Specified
Component: mnesia
Migrated from: https://bugs.erlang.org/browse/ERL-831


Given the following scripts, I'm reliably able to observe mnesia transactions not being durable:

{code:title=write|borderStyle=solid}
#!/usr/bin/env escript
%% -*- erlang -*-
%%! -sname example
main([]) ->
  ok = mnesia:create_schema([node()]),
  ok = mnesia:start(),
  {atomic, ok} = mnesia:create_table(example, [{disc_copies, [node()]}, {type, set}, {attributes, [name, value]}]),
  {atomic, ok} = mnesia:transaction(fun() -> mnesia:write({example, foo, 1}) end),
  {atomic, [{example, foo, 1}]} = mnesia:transaction(fun() -> mnesia:read({example, foo}) end),
  io:format("Wrote foo = 1.~n").
{code}

{code:title=read|borderStyle=solid}
#!/usr/bin/env escript
%% -*- erlang -*-
%%! -sname example
main([]) ->
  ok = mnesia:start(),
  ok = mnesia:wait_for_tables([example], 200),
  {atomic, []} = mnesia:transaction(fun() -> mnesia:read({example, foo}) end),
  io:format("No records found for foo.~n").
{code}

If I execute these scripts sequentially, with the first writing an entry to the table and the second attempting to read it back, the entry is lost. From what I can tell (anecdotally, and from scanning the mnesia source code), the write script terminates without fsync'ing LATEST.LOG.

http://erlang.org/doc/apps/mnesia/Mnesia_chap4.html#durability implies that I should be able to trust that once mnesia:transaction/2 returns my entry has safely been written to disk:
{quote}
Once a transaction is committed, all changes made to the database are durable, that is, they are written safely to disc and do not become corrupted and do not disappear.
{quote}
Ostensibly, "committed" is marked by the return of the mnesia:transaction call. The documentation for mnesia:transaction (http://erlang.org/doc/man/mnesia.html#transaction-2) does not specifically state that it commits; however, the documentation for mnesia:sync_transaction does assert that it "waits until data has been committed and logged to disk". Unfortunately, it also doesn't appear to fsync the log file, and it produces the same result described above when used in place of mnesia:transaction for the write operation in the above script.

Calling mnesia:sync_log after the transaction does force an fsync and demonstrates the expected durability on the restart of mnesia. The documentation (http://erlang.org/doc/man/mnesia.html#sync_log-0) does imply that a window exists where data may not be durable. That detail seems buried in the documentation, where someone may only notice it after discovering durability issues the hard way. It also seems to describe a narrower set of cases where data may be lost than actually exists: in addition to power loss, the above scripts demonstrate that it can also happen when a process terminates normally. Though I have not tested it, I assume there are also crash scenarios where the same behavior would occur.
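
For reference, here is a minimal sketch of the write script with the mnesia:sync_log call added. It is the same script as above with one extra line; the final message text is changed only for illustration.

```erlang
#!/usr/bin/env escript
%% -*- erlang -*-
%%! -sname example
main([]) ->
  ok = mnesia:create_schema([node()]),
  ok = mnesia:start(),
  {atomic, ok} = mnesia:create_table(example, [{disc_copies, [node()]}, {type, set}, {attributes, [name, value]}]),
  {atomic, ok} = mnesia:transaction(fun() -> mnesia:write({example, foo, 1}) end),
  %% Explicitly force the transaction log to disk before the script exits.
  %% mnesia:sync_log/0 returns ok | {error, Reason}.
  ok = mnesia:sync_log(),
  io:format("Wrote foo = 1 and synced the log.~n").
```

With this variant, the read script's `{atomic, []}` match should fail with a badmatch, because the record is found after the restart.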

I imagine that there might be performance ramifications to fsync'ing the log on each transaction, so perhaps that was a decision made intentionally at the risk of possible durability issues during narrow windows. That doesn't match my expectation of ACID durability, so if it's a design choice for mnesia, then I would propose that documentation around that choice be made clearer so developers like myself don't assume a higher level of durability is provided without explicitly triggering an fsync of the log.
@OTP-Maintainer

dgud said:

That is intentional behavior: mnesia's durability assumes that you have several mnesia nodes running, and that not all nodes crash at the same time.

Erlang systems were most often designed as distributed systems.
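
A minimal sketch of what that multi-node assumption looks like for the table in this report: the disc_copies replicas are listed on several nodes instead of only node(). The module and function names are hypothetical, and the sketch assumes mnesia is already running with a disc-based schema on every node in Nodes.

```erlang
-module(replicated_example).
-export([create_table/1]).

%% Hypothetical helper, not part of mnesia.
%% Assumption: every node in Nodes already runs mnesia and shares a
%% disc-based schema (for example, one created with mnesia:create_schema/1).
%% Returns {atomic, ok} | {aborted, Reason}.
create_table(Nodes) ->
    mnesia:create_table(example,
                        [{disc_copies, Nodes},   %% one disc replica per node
                         {type, set},
                         {attributes, [name, value]}]).
```

With replicas on several nodes, a write is lost only if every node holding a replica loses it, which is the scenario the comment above treats as unlikely.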

Please send a PR on the documentation if you think that is necessary.

@OTP-Maintainer

ferd said:

Unless I'm remembering wrong, the default transaction mode for mnesia (transaction) does not even await commit confirmation from remote nodes before proceeding; only the sync_transaction mode does.

This would mean that by default, transactions provide no durability guarantee whatsoever, since they rely on other nodes to ensure write persistence but do not wait for those nodes to even confirm the writes before moving on.
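
Putting the points in this thread together, one possible sketch is a small wrapper that uses mnesia:sync_transaction to wait for the involved nodes and then calls mnesia:sync_log to force the local log to disk. The module and function names are hypothetical, and this is not an API that mnesia provides; it only combines the two existing calls discussed above.

```erlang
-module(durable_tx).
-export([sync_write/1]).

%% Hypothetical helper, not part of mnesia.
%% Writes Record in a synchronous transaction, then syncs the local
%% transaction log. Returns ok | {error, Reason}.
sync_write(Record) ->
    case mnesia:sync_transaction(fun() -> mnesia:write(Record) end) of
        {atomic, ok} ->
            %% The transaction committed; now force the local log to disk.
            mnesia:sync_log();
        {aborted, Reason} ->
            {error, Reason}
    end.
```

For example, `ok = durable_tx:sync_write({example, foo, 1})` could replace the mnesia:transaction call in the write script. Syncing the log on every write is likely to cost throughput, which matches the performance concern raised in the original report.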

OTP-Maintainer added the not a bug, question, team:PS, and priority:medium labels on Feb 10, 2021.