0 byte dat file #5460

Closed
stevendanna opened this issue Aug 16, 2018 · 1 comment

@stevendanna
Contributor

We recently had a user report that a Habitat supervisor was failing to start. The supervisor log showed:

Aug 15 18:30:17 redacted.host hab[1242]: hab-sup(ER)[components/sup/src/error.rs:449:9]: Butterfly error: Error reading or writing to DatFile, /hab/sup/default/data/cab66ba4cf12423db7b6cce2a1d76c54.rst, failed to fill whole buffer

Further investigation showed that /hab/sup/default/data/cab66ba4cf12423db7b6cce2a1d76c54.rst existed but was a 0-byte file.
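"failed to fill whole buffer" is the message Rust's `std::io::Read::read_exact` reports (with `ErrorKind::UnexpectedEof`) when its source ends before the buffer is full. That is consistent with a reader trying to pull a fixed-size header out of an empty file. The sketch below reproduces the error. It assumes the DatFile reader uses `read_exact` on a fixed-size header; the 8-byte size and the `read_header_bytes` name are hypothetical and do not come from the Habitat source.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical header size; the real DatFile header layout is not shown in this issue.
const HEADER_LEN: usize = 8;

/// Hypothetical reader: reads a fixed-size header with `read_exact`.
fn read_header_bytes(path: &Path) -> io::Result<[u8; HEADER_LEN]> {
    let mut file = File::open(path)?;
    let mut buf = [0u8; HEADER_LEN];
    // On a 0-byte file this returns ErrorKind::UnexpectedEof
    // ("failed to fill whole buffer").
    file.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("empty-datfile-demo.rst");
    File::create(&path)?; // create (or truncate to) an empty file
    let err = read_header_bytes(&path).unwrap_err();
    println!("{:?}: {}", err.kind(), err);
    std::fs::remove_file(&path)?;
    Ok(())
}
```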

While we haven't been able to investigate the problem further yet, the code that handles the writes:

```rust
pub fn write(&self, server: &Server) -> Result<usize> {
    let mut header = Header::default();
    let tmp_path = self
        .path
        .with_extension(thread_rng().gen_ascii_chars().take(8).collect::<String>());
    {
        let file = OpenOptions::new()
            .create(true)
            .write(true)
            .truncate(true)
            .open(&tmp_path)
            .map_err(|err| Error::DatFileIO(tmp_path.clone(), err))?;
        let mut writer = BufWriter::new(file);
        self.init(&mut writer)?;
        header.member_len = self.write_member_list(&mut writer, &server.member_list)?;
        header.service_len = self.write_rumor_store(&mut writer, &server.service_store)?;
        header.service_config_len =
            self.write_rumor_store(&mut writer, &server.service_config_store)?;
        header.service_file_len =
            self.write_rumor_store(&mut writer, &server.service_file_store)?;
        header.election_len = self.write_rumor_store(&mut writer, &server.election_store)?;
        header.update_len = self.write_rumor_store(&mut writer, &server.update_store)?;
        header.departure_len = self.write_rumor_store(&mut writer, &server.departure_store)?;
        writer
            .seek(SeekFrom::Start(1))
            .map_err(|err| Error::DatFileIO(self.path.clone(), err))?;
        self.write_header(&mut writer, &header)?;
        writer
            .flush()
            .map_err(|err| Error::DatFileIO(self.path.clone(), err))?;
    }
    fs::rename(&tmp_path, &self.path).map_err(|err| Error::DatFileIO(self.path.clone(), err))?;
    Ok(0)
}
```

appears to be missing at least two calls to fsync if we want those writes to be durable. We likely need an fsync on the temporary file before the rename, to ensure that the file's contents have been written to disk, and an fsync on the enclosing directory after the rename, to ensure that the rename itself is flushed to disk. Without the first, a crash could persist the rename before the data, leaving a 0-byte file like the one above.
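The following is a minimal, `std`-only sketch of the write-then-rename pattern with both fsyncs. It flushes the buffered writer, calls `sync_all` on the temporary file, renames it over the target, and then calls `sync_all` on the parent directory. It is not the Habitat patch. The `write_durably` and `sync_parent_dir` names and the fixed `.tmp` extension are hypothetical; the quoted code uses a random extension via `rand`. The directory sync is compiled only on Unix, mirroring the Windows no-op described in the commit message.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical helper: atomically and durably replace `path` with `contents`.
fn write_durably(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Hypothetical fixed temp name; the original code uses a random extension.
    let tmp_path = path.with_extension("tmp");
    {
        let file = OpenOptions::new()
            .create(true)
            .write(true)
            .truncate(true)
            .open(&tmp_path)?;
        let mut writer = BufWriter::new(file);
        writer.write_all(contents)?;
        // Move buffered bytes from BufWriter into the OS...
        writer.flush()?;
        // ...then fsync so the data is on disk before the rename can be.
        writer.get_ref().sync_all()?;
    }
    fs::rename(&tmp_path, path)?;
    // fsync the directory so the rename itself is persisted.
    sync_parent_dir(path)?;
    Ok(())
}

/// fsync the directory that contains `path` (Unix only).
#[cfg(unix)]
fn sync_parent_dir(path: &Path) -> io::Result<()> {
    let dir = match path.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    File::open(dir)?.sync_all()
}

/// No-op on non-Unix targets, assuming directory handles cannot be synced there.
#[cfg(not(unix))]
fn sync_parent_dir(_path: &Path) -> io::Result<()> {
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("durable-demo.rst");
    write_durably(&path, b"member and rumor data")?;
    assert_eq!(fs::read(&path)?, b"member and rumor data");
    fs::remove_file(&path)?;
    Ok(())
}
```

The block around the `BufWriter` ends before the rename, so the temporary file's handle is closed only after its data has been synced. This matches the scoping already used in the quoted `write` method.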

stevendanna added a commit to stevendanna/habitat that referenced this issue Aug 16, 2018
This adds two calls to fsync in the DatFile write method:

1) On the temporary file before the rename
2) On the parent directory after the rename

The first fsync persists the content of the dat file to disk. I
think this might have prevented the 0-byte DatFile seen in habitat-sh#5460.

The second fsync persists the rename operation to disk. I've
no-oped it on Windows, since we found in b688d8a that calling
sync_all on a directory on Windows produces an error.

Signed-off-by: Steven Danna <steve@chef.io>
@mwrock
Contributor

mwrock commented Aug 21, 2018

Closed by #5461

mwrock closed this as completed Aug 21, 2018