Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osutil: generalize SyncDir with FileState interface #7484

Merged
merged 13 commits into from Sep 30, 2019

Conversation

zyga
Copy link
Collaborator

@zyga zyga commented Sep 18, 2019

The directory synchronization code grew out of the desire to have a set
of files described by a glob and short, in-memory contents be reflected
to the disk in an efficient and predictable way.

Recently this code has started to be used to install icon themes shipped
by snaps. This means it may be used to coerce snapd to read arbitrary
amount of data into memory.

This address this issue by generalizing the directory sync APIs to take
an interface instead of a concrete representation of the desired file.

There are now three concrete implementations, one that keeps the content
in memory, just like before, called MemoryFileState and two new ones:
FileReference and FileReferencePlusMode. Those both refer to an existing
file for content, opening up the possibility to refer to large files.
They only differ in the treatment of file mode, either mirroring the
mode of the file being referred or using a fixed mode, respectively.

Behind the scenes the EnsureFileState function will no longer read all
of the file into memory. Instead if will use FileReference to stream it,
chunk by chunk, in an attempt to see if the file is identical to what we
expected.

On top of that, if the file is not the same and the caller has provided
a FileReference or FileReferencePlusMode, the logic that writes a new
file and replaces the original is also using streaming, again saving a
in-memory copy.

This way we can now process files of arbitrary size using fixed amount
of memory. This involves the new icon wrapper which has been switched to
use FileReferencePlusMode.

The patch contains some verbose automatic changes around the code using
maps of FileState structure to replace them with maps of FileState
interface instead.

Signed-off-by: Zygmunt Krynicki me@zygoon.pl

@zyga zyga force-pushed the feature/sync-dir-file-ref branch 2 times, most recently from f4601fe to 5b7b34c Compare September 19, 2019 08:54
@zyga zyga changed the title osutil: generalize SyncDir with FileStater interface osutil: generalize SyncDir with FileState interface Sep 19, 2019
@zyga zyga marked this pull request as ready for review September 19, 2019 08:56
osutil/syncdir.go Outdated Show resolved Hide resolved
osutil/syncdir.go Outdated Show resolved Hide resolved
osutil/syncdir.go Outdated Show resolved Hide resolved
osutil/syncdir.go Outdated Show resolved Hide resolved
osutil/syncdir.go Outdated Show resolved Hide resolved
osutil/syncdir.go Show resolved Hide resolved
@zyga
Copy link
Collaborator Author

zyga commented Sep 23, 2019

@bboozzoo thank you for the review. I'll look at addressing this after the standup.

@pedronis pedronis self-requested a review September 25, 2019 09:09
The directory synchronization code grew out of the desire to have a set
of files described by a glob and short, in-memory contents be reflected
to the disk in an efficient and predictable way.

Recently this code has started to be used to install icon themes shipped
by snaps. This means it may be used to coerce snapd to read arbitrary
amount of data into memory.

This address this issue by generalizing the directory sync APIs to take
an interface instead of a concrete representation of the desired file.

There are now three concrete implementations, one that keeps the content
in memory, just like before, called MemoryBlob and two new ones:
FileReference and FileContentReference. Those both refer to an existing
file for content, opening up the possibility to refer to large files.
They only differ in the treatment of file mode, either mirroring the
mode of the file being referred or using a fixed mode, respectively.

Behind the scenes the EnsureFileState function will no longer read all
of the file into memory. Instead if will use FileReference to stream it,
chunk by chunk, in an attempt to see if the file is identical to what we
expected.

On top of that, if the file is not the same and the caller has provided
a FileReference or FileContentReference, the logic that writes a new
file and replaces the original is also using streaming, again saving a
in-memory copy.

This way we can now process files of arbitrary size using fixed amount
of memory. This involves the new icon wrapper which has been switched to
use FileContentReference.

The patch contains some verbose automatic changes around the code using
maps of FileState structure to replace them with maps of FileState
interface instead.

Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
There's already an implementation called StreamsEqual that is similar
but different. Ahead of unification let's move them closer.

Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of naming comments, I haven't looked myself at the equality code

osutil/syncdir.go Outdated Show resolved Hide resolved
osutil/syncdir.go Outdated Show resolved Hide resolved
I'm about to merge StreamEqual and StreamsEqual. They currently differ
in signature: one takes an buffer size and returns an error in addition
to boolean, the other does not.

I decided to drop the error value as we don't really need it in the
current scheme. Just the equal-or-not value.

Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
The private functions take an explicit buffer size and contain the real
implementation. The public functions just call the private functions
with chunk size 0, to indicate that the default should be used.

This brings us closer to merging StreamsEqual and StreamEqual

Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
The test suite for the extra implementation of StreamEqual handled the
case where the same stream is passed to the function twice. This was
missing in the original implementation. In advent of the unification
support this edge use case.

Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
Now that the two implementations have the same interface we can drop
my extra copy.

Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
This just removes the transitional function.

Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
The removed function was a subset of TestStreamsEqualChunked
that tested buffer size of one. It was a leftover from the development
phase when that variant failed.

Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
@zyga
Copy link
Collaborator Author

zyga commented Sep 27, 2019

I've addressed everything related to stream equality functions. I also applied the new suggested names.

The new name was suggested by Samuele

Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
@zyga zyga requested a review from chipaca September 27, 2019 16:59
osutil/cmp_test.go Show resolved Hide resolved
text := "marry had a little lamb"

// Passing the same stream twice is not mishandled.
readerA := bytes.NewReader([]byte(text))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or bytes.NewBufferString(text)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or strings.NewReader(text). We're spoiled for choice :-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or... what I did? :D

text := "marry had a little lamb"

// Passing the same stream twice is not mishandled.
readerA := bytes.NewReader([]byte(text))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or strings.NewReader(text). We're spoiled for choice :-)

@zyga
Copy link
Collaborator Author

zyga commented Sep 30, 2019

Per @mvo5's request I'll split this into two branches:

  • make FileState an interface, introducing MemoryFileState as a replacement
  • the rest of the changes, unless they are absolutely required by the change above

EDIT: After brief discussion on IRC I'll refrain from splitting this.

Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
Copy link
Collaborator

@bboozzoo bboozzoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zyga zyga merged commit f1f8175 into snapcore:master Sep 30, 2019
@zyga zyga deleted the feature/sync-dir-file-ref branch September 30, 2019 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants