Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protection against metadata inconsistency and/or data loss #2519

Closed
zebjonsa opened this issue Aug 12, 2022 · 2 comments
Closed

protection against metadata inconsistency and/or data loss #2519

zebjonsa opened this issue Aug 12, 2022 · 2 comments
Labels
kind/question Further information is requested

Comments

@zebjonsa
Copy link

A small code review was performed on the Unlink() call in juicefs. There are constructs in the transaction handling that seem problematic.

Here is a pseudo-trace rundown of a file deletion from the FUSE layer.

func (fs *fileSystem) Unlink(cancel <-chan struct{}, header *fuse.InHeader, name string) (code fuse.Status) {
    func (v *VFS) Unlink(ctx Context, parent Ino, name string) (err syscall.Errno) {
        err = v.Meta.Unlink(ctx, parent, name)
        func (m *baseMeta) Unlink(ctx Context, parent Ino, name string) syscall.Errno {
            return m.en.doUnlink(ctx, m.checkRoot(parent), name)
            func (m *dbMeta) doUnlink(ctx Context, parent Ino, name string) syscall.Errno {
                m.fileDeleted(opened, n.Inode, n.Length) // outside tx
                func (m *baseMeta) fileDeleted(opened bool, inode Ino, length uint64) {
                    m.tryDeleteFileData(inode, length)
                        func (m *baseMeta) tryDeleteFileData(inode Ino, length uint64) {
                            m.en.doDeleteFileData(inode, length)
                                func (m *dbMeta) doDeleteFileData(inode Ino, length uint64) {
                                    err := m.deleteChunk(inode, c.Indx)
                                        func (m *dbMeta) deleteChunk(inode Ino, indx uint32) error {
                               A            err := m.txn(func(s *xorm.Session) error { UPDATE .. update jfs_chunk_ref set ...})
                               B            m.deleteSlice(s.id, s.size)
                                                func (m *baseMeta) deleteSlice(id uint64, size uint32) {
                               C                        if err := m.newMsg(DeleteSlice, id, size); err == nil || strings.Contains(err.Error(), "NoSuchKey") || strings.Contains(err.Error(), "not found") {
                                                            if err = m.en.doDeleteSlice(id, size); err != nil {
                               D                                    return m.txn(func(s *xorm.Session) error { _, err := s.Exec("delete from jfs_chunk_ref where chunkid=?", id) return err })
                               E    _ = m.txn(func(s *xorm.Session) error { _, err := s.Delete(delfile{Inode: inode}) return err })
  • A) deleteChunk() starts new tx, does changes to the database and commits it
  • B) deleteChunk() calls deleteSlice()
  • C) deleteSlice() calls m.newMsg(DeleteSlice) which runs the user's callback (which removes object data, as initialized by registerMetaMsg() in mount())
  • D) deleteSlice() starts new tx, updates the database and commits it
  • E) doDeleteFileData() starts a new tx, changes the database and commits it

Problem P1:

  • A proposal here is that if for example A) commits successfully but D) or E) fails then the resulting database state is invalid.
  • Alternatively, if for example D) commits successfully but E) fails then the resulting database state is invalid.

Problem P2:

  • A proposal here is that if C) runs successfully and the underlying object data is removed but D) and/or E) fail(s), then the user has lost object data. The next time the user tries to read this object, the metadata is found but the data is gone.

Summary:

  • P1. Statements that are part of the same logical operation do not execute in the same database transaction. This means that if one commit succeeds and the other fails, the properties of consistency and atomicity are violated.
  • P2. The coordination of object and metadata, which in itself can be seen as a transaction, may go out-of-sync, resulting in object data loss.

Proposed solutions:

  • P1. Perform all statements within the same transaction. If any statement fails, no changes take effect.
  • P2. It is not sufficient to first commit to DB and remove object later, because how then do you keep track of the need to remove the object if it fails? Nor is it sufficient to first remove the object and then commit to the database, because what if the commit fails (along with any other attempts to persist information about this situation). Therefore, commit intention (statements necessary for the operation) to database, remove object data and then perform the intention. If any step fails, it can be automatically re-tried.

Other information:

  • Running postgres SQL meta backend on commit fb191ed.
  • Only the Unlink() call was reviewed. The general issue of how transactional semantics are applied may potentially be a concern for other top-level calls as well.
@zebjonsa zebjonsa added the kind/bug Something isn't working label Aug 12, 2022
@davies
Copy link
Contributor

davies commented Aug 13, 2022

It's slow to delete the data in object storage, so we can't put all these inside single transaction. When a file is ready to delete (no more links), we put a record into database to track that, and start a background job to delete the data in background. The deletion can fail, we will retry that by scanning the tracking records. Until we delete all the data in object storage, we delete the tracking record.

@davies davies added kind/question Further information is requested and removed kind/bug Something isn't working labels Aug 13, 2022
@SandyXSD
Copy link
Contributor

You may follow the delfile table (the so-called tracking records) in sql.go to review more details.

@juicedata juicedata locked and limited conversation to collaborators Aug 15, 2022
@SandyXSD SandyXSD converted this issue into discussion #2523 Aug 15, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
kind/question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants