-
Notifications
You must be signed in to change notification settings - Fork 14
Add a storage abstraction #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I don't think I'll need to make any further changes to this outside of review improvements, so I've removed the RFC tag from this. |
ttaylorr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, @bk2204! This is shaping up to be really great, and I'm excited that it's going to make things easier to work with in downstream consumers of gitobj.
I left a few larger comments, and a few smaller ones, too. My main thoughts revolve mostly around how we might break out some interfaces based on the ones that you introduced, particularly in package storage.
I'm curious of your thoughts:
storage/backend.go
Outdated
| // Storage returns a list of read sources and optionally one write | ||
| // source. Generally, the write location, if present, should also be a | ||
| // read location. | ||
| Storage() ([]ReadOnlyStorage, Storage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this is a great way to break apart the "places we read" from "the place we write". I have some other thoughts about the Storage interface, but they are scattered elsewhere throughout this review.
In regard to the []ReadOnlyStorage part of the return tuple, I'm not sure that the slice is a detail that we want to expose to caller. My thinking is that a caller should have to care only about the one place they should point reads, and not have to range through a slice, or similar.
One way that Go tackles this in the standard library is with the concrete type *io.multiReader, exposed by io.MultiReader. Its signature is:
package io
func MultiReader(rs ...io.Reader) io.Reader(Where ... is Go's way of spelling variadic arguments).
Since our ReadOnlyStorage is analogous to Go's io.Reader, what do you think about introducing a *storage.multiReadOnly that is itself a ReadOnlyStorage?
I think that this would make things a little more complicated for us, but easier for the caller to consume, but I'm really curious of your thoughts, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that idea. Thanks for the suggestion.
storage/storage.go
Outdated
| // Storage implements an interface for reading and writing objects in an | ||
| // object database. | ||
| type Storage interface { | ||
| ReadOnlyStorage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what it would look like to invert the two dependencies: i.e., that Storage reads, and ReadOnlyStorage is replaced with WriteStorage, such that the more common type has less it needs to implement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, sounds good.
pack/storage.go
Outdated
| return d.mr.Read(b) | ||
| } | ||
|
|
||
| func (d *delayedObjectReadWriter) Write(b []byte) (int, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think that we could change the signature of Object to not necessarily provide Write?
Perhaps Object doesn't need it at all, but we expect all concrete implementations of Object to provide it. Or, perhaps we introduce type Writeable (and WriteableObject which composes the two), and expect *Blob, *Commit and so on to implement it.
I think that this would allow us to get rid of this implementation which immediately errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because Open returns an io.ReadWriteCloser and we're reusing the Open method from the storer interface. I'm not completely clear on why we need an io.ReadWriteCloser instead of an io.ReadCloser, but let me spend some more time looking at it to see if I can adjust it to use io.ReadCloser instead.
| "strings" | ||
| ) | ||
|
|
||
| type Storage struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: let's add documentation to publicly exported types and functions.
pack/storage.go
Outdated
| return false | ||
| } | ||
|
|
||
| type delayedObjectReadWriter struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: let's move this to pack/delayed_object.go or pack/object_delayed.go since Go typically has one type per file.
errors/errors.go
Outdated
|
|
||
| func IsNoSuchObject(e error) bool { | ||
| _, ok := e.(*NoSuchObject) | ||
| return ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we might consider is also checking that e != nil, since (*NoSuchObject)(nil) (read: "cast nil to type *NoSuchObject)" is a valid thing to accept in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to nils of different types comparing unequal, I would have to do something like this:
err, ok := e.(*noSuchObject)
return ok && err != nilI've squashed that in.
errors/errors.go
Outdated
|
|
||
| // NoSuchObject is an error type that occurs when no object with a given object | ||
| // ID is available. | ||
| type NoSuchObject struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing we might think about is making this name NoSuchObject the constructor for an otherwise-private implementation, which I think would read nicely when returning an error.
For example, instead of:
return nil, &errors.NoSuchObject{Oid: oid}we could write:
return nil, errors.NoSuchObject(oid)
pack/set.go
Outdated
| } | ||
|
|
||
| return nil, errNotFound | ||
| return nil, &errors.NoSuchObject{Oid: name} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✨ ✨ ✨
object_db.go
Outdated
| return nil, err | ||
| if s.IsCompressed() { | ||
| return NewObjectReadCloser(f) | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: some linters will complain that an else block that follows an if with an unconditional return is redundant, and should be rewritten as:
if foo {
return bar
}
return bazThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, thanks for pointing this out. I always do this in C, but for some reason I haven't yet convinced myself to do it in Go.
In a future commit, we'll want to provide a storage interface that is read-only. If the Open method for storers returns a io.ReadWriteCloser, then we'll be forced to implement a useless Write method. Since we don't actually need the ability to call Write, simply make the storer interface return an io.ReadCloser instead, which lets us easily provide the read-only interface we want.
There are several places where we want to expose an abstraction for storage for the object database. Add read-write and read-only storage abstractions and a backend abstraction that encapsulates one or more of these abstractions as interfaces that can be implemented. Provide a method to determine whether the data read from the storage is compressed, since pack data will be uncompressed for us, but most other storage will not be.
The memory and file storers already almost completely implement the storage interface. Add Close methods to make them implement the interface completely.
Add a wrapper around a pack set that provides a storage interface as defined by the fs module. This allows us to emulate object storage from packs in an abstract way.
Depending on the storage backend used, we can return any of several errors if an object doesn't exist. For consistency, let's add an error specifically for this condition and place it into a new errors package so that it can be referred to from both the main package and the pack package without import cycles. Add a function in addition to quickly determine if the error is due to a missing object.
Now that we have a consistent type for reporting whether an error is due to an object being missing (as opposed to, say, a serious filesystem error), use that error consistently whenever we have a missing object. Update the tests to expect this new value as well.
Some storage backends need to wrap multiple storage implementations, such as the file backend: we need to read from both loose objects and packs. In order to make these two backends appear to be one unified implementation, add a wrapper implementation that reads from whichever implementation provides the object. Implement automatic decompression, since we may end up with some implementations which decompress automatically and others which do not.
Add two basic concrete backend types, one based on the filesystem using fileStorer and pack.Storage, and the other based on memory using memoryStorer. This provides a standard set of backends that we can use for reading and writing actual Git repositories or for testing using memory locations. Note that this code cannot live in the storage package, where one would logically expect it to be, because the concrete types implementing it are private to the package.
Now that we have a general storage abstraction, use that to read and write data from an arbitrary storage backend. Add a function that reads from an arbitrary backend; use this when updating the tests.
|
I think I've addressed all of the issues mentioned here, either through code changes or commit message updates. Do note that there are some additional commits in order to implement the suggestions made. |
ttaylorr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, and thanks for taking all of my feedback! 🚀
Currently we have a storer abstraction, but it isn't exposed, making it hard to mock the disk in unit tests that use this module. This series wraps both the existing storers and the pack piece in a storage abstraction that can be overridden and used elsewhere. It also provides a consistent error type so that we can tell when an object is missing versus when there's some more serious error.
This series is marked RFC because while I think it should be done, I'm finalizing the use of it in git-lfs/git-lfs and I want to be sure that if I need to make any changes that I do it before this series merges.