Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor DatFile, again #6703

Merged
merged 6 commits into from Jul 3, 2019

Conversation

@raskchanky
Copy link
Member

commented Jul 2, 2019

It turns out that when DatFile owns the BufReader, it keeps the file
open and prevents AtomicWriter from renaming the file on Windows.
Previously, a single BufReader was created and passed around between all
the different functions and that is the solution I've gone back to here
as well.

At first, I thought it should be possible to just create a BufReader
whenever necessary, but it turns out that BufReader keeps track of where
it is in the file, and our current implementation is dependent on
reading the rumors in the same order that they're written in.

Since we store the offsets for all the different rumors, in theory it
should be possible to use that to calculate where to go and how much to
read, but in practice, it proved trickier to implement correctly than
I had originally anticipated.

In the interest of time and sanity, we now pass the same BufReader
around, like we used to. Eventually, when someone has the time and
energy, this whole module should be overhauled.

Signed-off-by: Josh Black raskchanky@gmail.com

@mwrock

This comment has been minimized.

Copy link
Contributor

commented Jul 2, 2019

This initially failed with the same error on my windows environment. I just added a small commit that fixes this by ensuring the file is closed in read_or_create_mlr before the call to write_mlr. With that, the supervisor starts without error.

baumanj added a commit that referenced this pull request Jul 2, 2019

An alternate approach to #6703
Signed-off-by: Jon Bauman <5906042+baumanj@users.noreply.github.com>
@mwrock

This comment has been minimized.

Copy link
Contributor

commented Jul 2, 2019

for the recrod @baumanj 's commit 2d05ec9 above starts on windows without failure.

@baumanj
Copy link
Member

left a comment

Just a couple comments here. Per offline discussion, I think we're better served by moving to a different abstraction where the reading and writing capabilities are encapsulated in different types.

If possible, getting some sort of automated testing in which can trigger the error we aim to fix with this change seems very important. It would only fail on Windows, but having it in CI rather than relying on manual validation to catch the issue would be huge.

header_size: 0,
header: Header::default(),
reader };
let size = {

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 2, 2019

Member

I suggest moving the file creation to a separate function with a clear comment about why the file must be closed before we write it.


dat_file.read_header()?;
Ok(dat_file)
}

pub fn path(&self) -> &Path { &self.path }

pub fn reader(&self) -> Result<BufReader<File>> {

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 2, 2019

Member

My original concern about passing BufReader around like this (and the suggestion to make it part of DatFile's state was that the reliance on consumers passing it around (but not doing anything else with it) is very brittle.

Assuming this is correct (it may not be), one way to make this a bit safer would be to wrap BufReader in a type that is public to the consumers, but make the BufReader field private. This accomplishes the goal of keeping BufReader outside DatFile's state, but without exposing it to meddling by consumers that could foul up its operation.

@baumanj

This comment has been minimized.

Copy link
Member

commented Jul 2, 2019

for the recrod @baumanj 's commit 2d05ec9 above starts on windows without failure.

Mammas, don't let your babies grow up to put Option fields in structs 😉

@raskchanky raskchanky changed the title DatFile can't own the BufReader after all. Refactor DatFile, again Jul 2, 2019

@raskchanky

This comment has been minimized.

Copy link
Member Author

commented Jul 2, 2019

With the latest commit, DatFile can indeed own the BufReader again, because now we've split the responsibilities for reading and writing into two separate entities. See commit messages for further details.

@raskchanky raskchanky force-pushed the jb/datfiles-are-the-worst branch from 79e10ee to e1039e7 Jul 2, 2019

@baumanj

baumanj approved these changes Jul 2, 2019

Copy link
Member

left a comment

I think this looks good. I think it can be even further simplified and have included a cut at that if you're amenable. Either way, let's make sure this passes the tests in #6707 before merging.

}

Ok(dat_file)
Self::reader_creation(dat_file)

This comment has been minimized.

Copy link
@baumanj

baumanj Jul 2, 2019

Member

I think this can be simplified to:

        if size == 0 {
            DatFileWriter::new(data_path.clone()).write_mlr(server)?;
        }

        Self::reader_creation(&data_path)

There's no need to create a DatFile at all.

I think I was a bit off in the original idea about what belongs in DateFileReader vs DatFileWriter. Poking at it a bit, I came to (derives omitted for clarity):

struct DatFile(PathBuf);

pub struct DatFileReader {
    header:   Header,
    dat_file: DatFile,
    reader:   BufReader<File>,
}

pub struct DatFileWriter(DatFile);

The write case doesn't need to keep the header around. it's re-created in write_mlr, so having it as a field is rather misleading. In fact, it would be possible to eliminate DatFile altogether and just have a PathBuf, but I think it's nice to have somewhere to hang that documentation and show the conceptual linkage between DateFileReader vs DatFileWriter.

This does require shuffling things around a bit, but it wasn't too much effort to get things to build. Here's a diff, so we don't have to duplicate effort.

This comment has been minimized.

Copy link
@raskchanky

raskchanky Jul 2, 2019

Author Member

Sounds good. I'll make the appropriate changes. Thanks for including a diff 😁

raskchanky and others added some commits Jul 2, 2019

DatFile can't own the BufReader after all.
It turns out that when DatFile owns the BufReader, it keeps the file
open and prevents AtomicWriter from renaming the file on Windows.
Previously, a single BufReader was created and passed around between all
the different functions and that is the solution I've gone back to here
as well.

At first, I thought it should be possible to just create a BufReader
whenever necessary, but it turns out that BufReader keeps track of where
it is in the file, and our current implementation is dependent on
reading the rumors in the same order that they're written in.

Since we store the offsets for all the different rumors, in theory it
should be possible to use that to calculate where to go and how much to
read, but in practice, it proved trickier to implement correctly than
I had originally anticipated.

In the interest of time and sanity, we now pass the same BufReader
around, like we used to. Eventually, when someone has the time and
energy, this whole module should be overhauled.

Signed-off-by: Josh Black <raskchanky@gmail.com>
scope file in dat_file creation so it closes before being written to
Signed-off-by: mwrock <matt@mattwrock.com>
fix broken dat_file test
Signed-off-by: mwrock <matt@mattwrock.com>
Split DatFile into multiple structs with specific purposes.
Now, instead of a general purpose DatFile that both reads and writes,
there is DatFileReader and DatFileWriter. DatFile itself is now private
and external consumers of this module only interact with the type
necessary for the work their doing.

Butterfly now owns a DatFileWriter, since that is primary what it does
with this file. It creates a DatFileReader at startup to ingest existing
rumors but doesn't hold onto this struct.

rst-reader uses a DatFileReader, since that matches its reason for
existing.

Using this technique, only DatFileReader needs to hold onto the
BufReader, which avoids the problem of atomically writing the file.

Signed-off-by: Josh Black <raskchanky@gmail.com>
Refactor all the DatFile structs for better readability
Signed-off-by: Josh Black <raskchanky@gmail.com>
Fix a failing test
Signed-off-by: Josh Black <raskchanky@gmail.com>

@raskchanky raskchanky force-pushed the jb/datfiles-are-the-worst branch from 4e619f6 to cafcce6 Jul 3, 2019

@raskchanky raskchanky merged commit 1dfbe16 into master Jul 3, 2019

5 checks passed

DCO This commit has a DCO Signed-off-by
Details
buildkite/habitat-sh-habitat-master-verify Build #2584 passed (30 minutes, 31 seconds)
Details
buildkite/habitat-sh-habitat-master-website Build #3154 passed (37 seconds)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
expeditor/config-validation Validated your Expeditor config file
Details

@raskchanky raskchanky deleted the jb/datfiles-are-the-worst branch Jul 3, 2019

chef-ci added a commit that referenced this pull request Jul 3, 2019

Update CHANGELOG.md with details from pull request #6703
Obvious fix; these changes are the result of automation not creative thinking.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.