Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible crash consistency issue with link + unlink in metadata_csum mode #120

Closed
hayley-leblanc opened this issue Nov 12, 2021 · 1 comment

Comments

@hayley-leblanc
Copy link
Contributor

hayley-leblanc commented Nov 12, 2021

Hi,

I believe there is a crash consistency issue with link and unlink in metadata_csum mode. Suppose we run the following operations on a fresh NOVA file system mounted at /mnt/pmem:

creat /mnt/pmem/foo
ln /mnt/pmem/foo /mnt/pmem/bar
rm /mnt/pmem/bar

Based on the crash consistency tester I'm working on, it appears that if we crash during the remove operation, the unlink may not complete atomically. I have not dug into the root cause a huge amount, but it appears that the occurs at the beginning of nova_unlink(), if only a write to the primary inode in nova_append_link_change_entry() becomes persistent and none of the other preceding writes do. In the resulting crash state, /mnt/pmem/bar is present, and attempting to access it gives the following log message:

nova: nova_verify_entry_csum: entry replica 00000000d95b18c8 error, trying to repair using the primary
nova: nova_repair_entry: entry error repaired
nova: Failure recovery total recovered 2

However, the repaired version of /mnt/pmem/bar has a link count of 1, but /mnt/pmem/foo also exists with the same inode number.

I believe this issue can be fixed by just adding a store fence to the beginning of nova_append_link_change_entry() (or, if we don't want a store fence there, adding it directly before that call in nova_unlink()). Happy to make a PR if you think this is a bug. Thanks!

Edit: this issue feels similar to the one I reported in #119, but adding the store fence to nova_append_link_change_entry() does not fix that problem. I believe that issue stems from the linking logic, whereas this is just a PM programming mistake.

@hayley-leblanc
Copy link
Contributor Author

I believe this may be a false positive; I'll close this issue for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant