New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osutil: generalize SyncDir with FileState interface #7484
Conversation
f4601fe
to
5b7b34c
Compare
@bboozzoo thank you for the review. I'll look at addressing this after the standup. |
The directory synchronization code grew out of the desire to have a set of files described by a glob and short, in-memory contents be reflected to the disk in an efficient and predictable way. Recently this code has started to be used to install icon themes shipped by snaps. This means it may be used to coerce snapd to read arbitrary amount of data into memory. This address this issue by generalizing the directory sync APIs to take an interface instead of a concrete representation of the desired file. There are now three concrete implementations, one that keeps the content in memory, just like before, called MemoryBlob and two new ones: FileReference and FileContentReference. Those both refer to an existing file for content, opening up the possibility to refer to large files. They only differ in the treatment of file mode, either mirroring the mode of the file being referred or using a fixed mode, respectively. Behind the scenes the EnsureFileState function will no longer read all of the file into memory. Instead if will use FileReference to stream it, chunk by chunk, in an attempt to see if the file is identical to what we expected. On top of that, if the file is not the same and the caller has provided a FileReference or FileContentReference, the logic that writes a new file and replaces the original is also using streaming, again saving a in-memory copy. This way we can now process files of arbitrary size using fixed amount of memory. This involves the new icon wrapper which has been switched to use FileContentReference. The patch contains some verbose automatic changes around the code using maps of FileState structure to replace them with maps of FileState interface instead. Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
There's already an implementation called StreamsEqual that is similar but different. Ahead of unification let's move them closer. Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of naming comments, I haven't looked myself at the equality code
I'm about to merge StreamEqual and StreamsEqual. They currently differ in signature: one takes an buffer size and returns an error in addition to boolean, the other does not. I decided to drop the error value as we don't really need it in the current scheme. Just the equal-or-not value. Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
The private functions take an explicit buffer size and contain the real implementation. The public functions just call the private functions with chunk size 0, to indicate that the default should be used. This brings us closer to merging StreamsEqual and StreamEqual Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
The test suite for the extra implementation of StreamEqual handled the case where the same stream is passed to the function twice. This was missing in the original implementation. In advent of the unification support this edge use case. Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
Now that the two implementations have the same interface we can drop my extra copy. Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
This just removes the transitional function. Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
The removed function was a subset of TestStreamsEqualChunked that tested buffer size of one. It was a leftover from the development phase when that variant failed. Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
5b7b34c
to
ad8254a
Compare
I've addressed everything related to stream equality functions. I also applied the new suggested names. |
The new name was suggested by Samuele Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
text := "marry had a little lamb" | ||
|
||
// Passing the same stream twice is not mishandled. | ||
readerA := bytes.NewReader([]byte(text)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or bytes.NewBufferString(text)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or strings.NewReader(text)
. We're spoiled for choice :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or... what I did? :D
text := "marry had a little lamb" | ||
|
||
// Passing the same stream twice is not mishandled. | ||
readerA := bytes.NewReader([]byte(text)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or strings.NewReader(text)
. We're spoiled for choice :-)
Per @mvo5's request I'll split this into two branches:
EDIT: After brief discussion on IRC I'll refrain from splitting this. |
Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The directory synchronization code grew out of the desire to have a set
of files described by a glob and short, in-memory contents be reflected
to the disk in an efficient and predictable way.
Recently this code has started to be used to install icon themes shipped
by snaps. This means it may be used to coerce snapd to read arbitrary
amount of data into memory.
This address this issue by generalizing the directory sync APIs to take
an interface instead of a concrete representation of the desired file.
There are now three concrete implementations, one that keeps the content
in memory, just like before, called MemoryFileState and two new ones:
FileReference and FileReferencePlusMode. Those both refer to an existing
file for content, opening up the possibility to refer to large files.
They only differ in the treatment of file mode, either mirroring the
mode of the file being referred or using a fixed mode, respectively.
Behind the scenes the EnsureFileState function will no longer read all
of the file into memory. Instead if will use FileReference to stream it,
chunk by chunk, in an attempt to see if the file is identical to what we
expected.
On top of that, if the file is not the same and the caller has provided
a FileReference or FileReferencePlusMode, the logic that writes a new
file and replaces the original is also using streaming, again saving a
in-memory copy.
This way we can now process files of arbitrary size using fixed amount
of memory. This involves the new icon wrapper which has been switched to
use FileReferencePlusMode.
The patch contains some verbose automatic changes around the code using
maps of FileState structure to replace them with maps of FileState
interface instead.
Signed-off-by: Zygmunt Krynicki me@zygoon.pl