Skip to content

Conversation

@bk2204
Copy link
Member

@bk2204 bk2204 commented Sep 5, 2018

Currently we have a storer abstraction, but it isn't exposed, making it hard to mock the disk in unit tests that use this module. This series wraps both the existing storers and the pack piece in a storage abstraction that can be overridden and used elsewhere. It also provides a consistent error type so that we can tell when an object is missing versus when there's some more serious error.

This series is marked RFC because while I think it should be done, I'm finalizing the use of it in git-lfs/git-lfs and I want to be sure that if I need to make any changes that I do it before this series merges.

@bk2204 bk2204 changed the title RFC: add a storage abstraction Add a storage abstraction Sep 5, 2018
@bk2204
Copy link
Member Author

bk2204 commented Sep 5, 2018

I don't think I'll need to make any further changes to this outside of review improvements, so I've removed the RFC tag from this.

Copy link
Contributor

@ttaylorr ttaylorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @bk2204! This is shaping up to be really great, and I'm excited that it's going to make things easier to work with in downstream consumers of gitobj.

I left a few larger comments, and a few smaller ones, too. My main thoughts revolve mostly around how we might break out some interfaces based on the ones that you introduced, particularly in package storage.

I'm curious of your thoughts:

// Storage returns a list of read sources and optionally one write
// source. Generally, the write location, if present, should also be a
// read location.
Storage() ([]ReadOnlyStorage, Storage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is a great way to break apart the "places we read" from "the place we write". I have some other thoughts about the Storage interface, but they are scattered elsewhere throughout this review.

In regard to the []ReadOnlyStorage part of the return tuple, I'm not sure that the slice is a detail that we want to expose to caller. My thinking is that a caller should have to care only about the one place they should point reads, and not have to range through a slice, or similar.

One way that Go tackles this in the standard library is with the concrete type *io.multiReader, exposed by io.MultiReader. Its signature is:

package io

func MultiReader(rs ...io.Reader) io.Reader

(Where ... is Go's way of spelling variadic arguments).

Since our ReadOnlyStorage is analogous to Go's io.Reader, what do you think about introducing a *storage.multiReadOnly that is itself a ReadOnlyStorage?

I think that this would make things a little more complicated for us, but easier for the caller to consume, but I'm really curious of your thoughts, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea. Thanks for the suggestion.

// Storage implements an interface for reading and writing objects in an
// object database.
type Storage interface {
ReadOnlyStorage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what it would look like to invert the two dependencies: i.e., that Storage reads, and ReadOnlyStorage is replaced with WriteStorage, such that the more common type has less it needs to implement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sounds good.

pack/storage.go Outdated
return d.mr.Read(b)
}

func (d *delayedObjectReadWriter) Write(b []byte) (int, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that we could change the signature of Object to not necessarily provide Write?

Perhaps Object doesn't need it at all, but we expect all concrete implementations of Object to provide it. Or, perhaps we introduce type Writeable (and WriteableObject which composes the two), and expect *Blob, *Commit and so on to implement it.

I think that this would allow us to get rid of this implementation which immediately errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because Open returns an io.ReadWriteCloser and we're reusing the Open method from the storer interface. I'm not completely clear on why we need an io.ReadWriteCloser instead of an io.ReadCloser, but let me spend some more time looking at it to see if I can adjust it to use io.ReadCloser instead.

"strings"
)

type Storage struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: let's add documentation to publicly exported types and functions.

pack/storage.go Outdated
return false
}

type delayedObjectReadWriter struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: let's move this to pack/delayed_object.go or pack/object_delayed.go since Go typically has one type per file.

errors/errors.go Outdated

func IsNoSuchObject(e error) bool {
_, ok := e.(*NoSuchObject)
return ok
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we might consider is also checking that e != nil, since (*NoSuchObject)(nil) (read: "cast nil to type *NoSuchObject)" is a valid thing to accept in this function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to nils of different types comparing unequal, I would have to do something like this:

	err, ok := e.(*noSuchObject)
	return ok && err != nil

I've squashed that in.

errors/errors.go Outdated

// NoSuchObject is an error type that occurs when no object with a given object
// ID is available.
type NoSuchObject struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing we might think about is making this name NoSuchObject the constructor for an otherwise-private implementation, which I think would read nicely when returning an error.

For example, instead of:

return nil, &errors.NoSuchObject{Oid: oid}

we could write:

return nil, errors.NoSuchObject(oid)

pack/set.go Outdated
}

return nil, errNotFound
return nil, &errors.NoSuchObject{Oid: name}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✨ ✨ ✨

object_db.go Outdated
return nil, err
if s.IsCompressed() {
return NewObjectReadCloser(f)
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: some linters will complain that an else block that follows an if with an unconditional return is redundant, and should be rewritten as:

if foo {
        return bar
}
return baz

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thanks for pointing this out. I always do this in C, but for some reason I haven't yet convinced myself to do it in Go.

In a future commit, we'll want to provide a storage interface that is
read-only.  If the Open method for storers returns a io.ReadWriteCloser,
then we'll be forced to implement a useless Write method.  Since we
don't actually need the ability to call Write, simply make the storer
interface return an io.ReadCloser instead, which lets us easily provide
the read-only interface we want.
There are several places where we want to expose an abstraction for
storage for the object database.  Add read-write and read-only storage
abstractions and a backend abstraction that encapsulates one or more of
these abstractions as interfaces that can be implemented.

Provide a method to determine whether the data read from the storage is
compressed, since pack data will be uncompressed for us, but most other
storage will not be.
The memory and file storers already almost completely implement the
storage interface.  Add Close methods to make them implement the
interface completely.
Add a wrapper around a pack set that provides a storage interface as
defined by the fs module.  This allows us to emulate object storage from
packs in an abstract way.
Depending on the storage backend used, we can return any of several
errors if an object doesn't exist.  For consistency, let's add an error
specifically for this condition and place it into a new errors package
so that it can be referred to from both the main package and the pack
package without import cycles.  Add a function in addition to quickly
determine if the error is due to a missing object.
Now that we have a consistent type for reporting whether an error is due
to an object being missing (as opposed to, say, a serious filesystem
error), use that error consistently whenever we have a missing object.
Update the tests to expect this new value as well.
Some storage backends need to wrap multiple storage implementations,
such as the file backend: we need to read from both loose objects and
packs.  In order to make these two backends appear to be one unified
implementation, add a wrapper implementation that reads from whichever
implementation provides the object.  Implement automatic decompression,
since we may end up with some implementations which decompress
automatically and others which do not.
Add two basic concrete backend types, one based on the filesystem using
fileStorer and pack.Storage, and the other based on memory using
memoryStorer.  This provides a standard set of backends that we can use
for reading and writing actual Git repositories or for testing using
memory locations.

Note that this code cannot live in the storage package, where one would
logically expect it to be, because the concrete types implementing it
are private to the package.
Now that we have a general storage abstraction, use that to read and
write data from an arbitrary storage backend.  Add a function that reads
from an arbitrary backend; use this when updating the tests.
@bk2204
Copy link
Member Author

bk2204 commented Sep 6, 2018

I think I've addressed all of the issues mentioned here, either through code changes or commit message updates. Do note that there are some additional commits in order to implement the suggestions made.

Copy link
Contributor

@ttaylorr ttaylorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, and thanks for taking all of my feedback! 🚀

@bk2204 bk2204 merged commit 1e97572 into git-lfs:master Sep 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants