Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A bug in the FS logging layer of xv6? #123

Closed
Kristoff-starling opened this issue May 19, 2022 · 1 comment
Closed

A bug in the FS logging layer of xv6? #123

Kristoff-starling opened this issue May 19, 2022 · 1 comment

Comments

@Kristoff-starling
Copy link

Hello, I'm recently studying the principle of xv6 and benefits a lot from the explicit comments and the textbook. I think that there may be a bug in the logging layer in the file system. I'll describe the details below:

This is the code of install_trans() in /kernel/log.c :

static void
install_trans(void)
{
  int tail;

  for (tail = 0; tail < log.lh.n; tail++) {
    struct buf *lbuf = bread(log.dev, log.start+tail+1); // read log block
    struct buf *dbuf = bread(log.dev, log.lh.block[tail]); // read dst
    memmove(dbuf->data, lbuf->data, BSIZE);  // copy block to dst
    bwrite(dbuf);  // write dst to disk
    bunpin(dbuf);
    brelse(lbuf);
    brelse(dbuf);
  }
}

the line bunpin(dbuf) unpins dbuf in the buffer cache so that this block can be evicted out of the memory after brelse(dbuf). The code works correctly if everything goes normally: there is a bpin() in log_write() which pins the block in the buffer cache by incrementing its reference count and the bunpin() here decrements it.

However, if a crash happened during the installation of a transaction, after xv6 reboots, it would call recover_from_log() in initlog() and call install_trans to write logging blocks into their home blocks. At this moment, there isn't a bpin() before, so the bunpin() here will decrement the reference count to zero and in brelse() the reference count will experience an underflow and become UINT_MAX since the refcnt is a uint type variable. As a result, the block will stay in the buffer cache "forever".

We can expose this bug by adding two lines in xv6:

  • We emulate a crash by adding a panic() in install_trans() like this:

        brelse(dbuf);
      }
      
    +  if (log.lh.n != 0) panic("crash");
    }

    Here the condition log.lh.n != 0 ensures that the panic() won't be triggered during the recover procedure.

  • We add an assertion in brelse():

      acquire(&bcache.lock);
    + if (b->refcnt == 0) panic("bug");
      b->refcnt--;
      if (b->refcnt == 0) {
     

    Before decrementing the reference count, b->refcnt >= 1 should always hold.

After we fire up xv6 for the first time, we'll encounter a "crash", and after rebooting, xv6 will panic at panic("bug"). I made a disk image, fs-bug.img, which records a crashed xv6 using the first step. If we load this image, we'll directly get a buggy scene (and can be detected by the assertion in brelse()). The disk image is available here.

A possible solution for this problem is to add a bpin() in read_head() so that the bunpin() during the recover procedure will have corresponding bpin().

I hope that my suggestions will be helpful. Welcome for further discussions, thanks!

@Kristoff-starling
Copy link
Author

I'm sorry that in the latest version of xv6, this bug has been fixed by an if statement

if (recovering == 0)
    bunpin(dbuf);

I refered to an old version (2020) of xv6 and mistakenly proposed the issue. I apologize for my carelessness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant