Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REVIEW: Refactor Git implementation, extract platform specific code #308

Merged
merged 31 commits into from Feb 23, 2015
Merged

REVIEW: Refactor Git implementation, extract platform specific code #308

merged 31 commits into from Feb 23, 2015

Conversation

adrienthebo
Copy link
Contributor

This pull request refactors the code interacting with Git. It pulls out the code that directly interacts with the Git binary (by shelling out, running commands, parsing the output, etc) into a separate set of classes.

These classes directly interact with Git:

  • BareRepository
  • WorkingRepository
  • ThinRepository (a WorkingRepository that uses Git alternates to perform shallow clones)

These classes wrap interactions with Git but aren't bound to the Git implementation:

  • Cache (wraps a BareRepository and handles updating that repository when necessary)
  • StatefulRepository (wraps a ThinRepository and acts to synchronize a Git repository with some desired state for that repository)

The goal of this refactor is to break up the low level logic that interacts with the Git binary into a distinct layer that can be swapped out, and adds a logic layer that actually provides the application logic to run everything.

@andersonmills
Copy link
Contributor

I'm currently getting two failures in the spec tests.

Failures:

  1) R10K::Git::ThinRepository cloning adds the cache repo to the alternates file
     Failure/Error: expect(subject.alternates.to_a).to eq [File.join(cacherepo.git_dir, 'objects')]

       expected: ["/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/d20150219-41431-b5gsi/-var-folders-sv-m4gkcsvd4sl5h895_xvvftbh0000gq-T-d20150219-41431-llg190-puppet-boolean.git/objects"]
            got: ["/private/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/d20150219-41431-b5gsi/-var-folders-sv-m4gkcsvd4sl5h895_xvvftbh0000gq-T-d20150219-41431-llg190-puppet-boolean.git/objects"]

       (compared using ==)
     # ./spec/integration/git/thin_repository_spec.rb:49:in `block (3 levels) in <top (required)>'

  2) R10K::Git::WorkingRepository cloning with a reference repository adds the reference repository to the alternates directory
     Failure/Error: expect(subject.alternates.to_a).to eq [File.join(remote, 'objects')]

       expected: ["/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/d20150219-41431-t7lsit/puppet-boolean.git/objects"]
            got: ["/private/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/d20150219-41431-t7lsit/puppet-boolean.git/objects"]

       (compared using ==)
     # ./spec/integration/git/working_repository_spec.rb:72:in `block (4 levels) in <top (required)>'

@andersonmills
Copy link
Contributor

Note: fixed with this

$ env | grep -i tmp
TMPDIR=/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/
$ export TMPDIR=/private${TMPDIR}
echo $TMPDIR
/private/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/
$ be rspec spec

@adrienthebo
Copy link
Contributor Author

The spec failures in travis are due to timing sensitive tests and were probably caused by the timing change on IO pumps.

@adrienthebo
Copy link
Contributor Author

IO pump timing moved to #315

@adrienthebo
Copy link
Contributor Author

@andersonmills I added 3a87e22 to dereference any symlinks in the fixture path, could you see if that makes the specs pass on your system?

@andersonmills
Copy link
Contributor

Still getting one failure.

 1) R10K::Git::ThinRepository cloning adds the cache repo to the alternates file
     Failure/Error: expect(subject.alternates.to_a).to eq [File.join(cacherepo.git_dir, 'objects')]

       expected: ["/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/d20150220-43886-gcgkym/-var-folders-sv-m4gkcsvd4sl5h895_xvvftbh0000gq-T-d20150220-43886-1k3ss3n-puppet-boolean.git/objects"]
            got: ["/private/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/d20150220-43886-gcgkym/-var-folders-sv-m4gkcsvd4sl5h895_xvvftbh0000gq-T-d20150220-43886-1k3ss3n-puppet-boolean.git/objects"]

       (compared using ==)
     # ./spec/integration/git/thin_repository_spec.rb:49:in `block (3 levels) in <top (required)>'

@adrienthebo
Copy link
Contributor Author

I added the File.realpath to the failing spec as well; hopefully that'll resolve that issue too.

@andersonmills
Copy link
Contributor

Yep. That fixed it.

end

return :insync
@repo.status
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you didn't use Forwardable and def_delegators for sync and status as in cache.rb above?

@andersonmills
Copy link
Contributor

These two unimportant comments would not keep my from merging, and I'm 👍 . I'll wait to do this on Monday morning after talking with you, though.

This commit adds fixture data and rspec helpers for performing
integration tests touching Git repositories. Trying to stub out every
single git command touching a git repository would be more work than
just running the commands and checking the result, so this should
simplify testing. Moreover this allows different Git implementations to
share the same tests to confirm they have the same behavior.
This commit adds a generic implementation of a bare Git repository. The
existing code that deals with bare repositories is tightly coupled with
how to cache Git repositories and is hard to test it, so this class was
extracted to just deal with how Git bare repositories behave.
The previous implementation of the Git caching code intermixed caching
logic and Git commands which made things hard to test and reason about.
This commit removes all of the Git implementation logic from the cache
class and reuses the BareRepository class, so the Cache only has to deal
with, well, caching.
Pathname objects are significantly more convenient to use for dealing with
files and file paths and I've been replacing String path members with
Pathname members. This commit updates the Alternates object to use a
Pathname instead of a string object as well.
The WorkingDir class implemented interactions with non-bare git
repositories, but it blended handling a non-bare repository, performing
sparse clones, and enforcing the state of a given repository. This
commit implements a class for just interacting with a non-bare
repository; the concerns of sparse clones and state enforcement will be
handled by separate classes.
This class extends a working repository to add sparse clones and
updates. All clone and fetch operations are done by a cached git
repository and the actual working repository simply reuses the cache
repo object database.
This adds the #resolve method to bare repositories to resolve git refs
to actual commits.
This adds the #ref_type method that takes a gif ref and will return the
type of the ref.
The remote of a bare repository is the property of an existing repo and
is not an intrinsic part of the object initialization, so it cannot be
supplied to the constructor.
This was another location where file paths were being treated as Strings
instead of Pathnames; converted this to a Pathname as well for
consistency.
This adds the #git_dir method to the WorkingRepository object, so that
ThinRepository, WorkingRepository, and BareRepository all implement the
same interface.
The BareRepository and WorkingRepository classes both need to run git
commands but the initial versions of these classes just copied the old
`#git` method. This commit extracts a base class and adds the #git
method to that class.
This commit removes the duplicate #tags implementation from the
BareRepository and WorkingRepository classes and moves it to the base
class.
This commit removes the duplicate #resolve implementation from the
BareRepository and WorkingRepository classes and moves it to the base
class.
This implements an abstract `#git_dir` method in the BaseRepository to
indicate that subclasses should implement it.
This commit removes the duplicate #branches implementation from the
BareRepository and WorkingRepository classes and moves it to the base
class.
The #origin method may need to be invoked before the repository is
initialized, so if the git command to retrieve the origin remote url
fails then just return nil.
Enumerating tags and branches requires invoking `git for-each-ref` with
slightly different args; this centralizes the shared behavior.
WorkingRepository instances could clone a repository and check out an
initial ref, but `git clone -b` can only take a branch or tag and not a
commit. In order to resolve this consistency and make
branches/tags/commit be equivalent this commit uses #checkout which
treats the three as equivalent.
The Cache object may be used as if it is a git repository itself; to
facilitate that we just forward the #resolve method to the backing bare
repository.
This class controls how Git repositories are created and updated. Class
instances take a desired Git remote and reference, compare those states
to the repository on disk, and can reconcile the two. It takes advantage
of cached Git repositories to deduplicate Git objects and speed up
operations.
Git branches are a bit strange in comparison to Git commit SHAs and tags
because they are mutable. If we've been given a Git ref that's a branch,
we have to make sure that the branch itself has been fetched so that we
actually track upstream changes to the branch.
If the desired ref cannot be resolved to a SHA, we cannot synchronize
the repository and must fail immediately.
The R10K::Environment::Git and R10K::Module::Git classes shared a lot of
behavior on how repositories are synchronized that have been subsumed
into the StatefulRepository class. We can replace the duplicate code
with calls into the stateful repo instead.
The R10K::Git::Ref classes and subclasses were an attempt to add some
optimizations to how r10k fetches and updates Git repos, but their
architecture was flawed and people weren't widely using them. The new
Git classes don't use these classes so we can just remove the classes
and use the string refs instead.
This method is common to all Git repositories; it can be shifted to the
Git base class.
OSX sets the TMPDIR path to `/var/folders/[randomstring]`, but `/var` is
a symlink to `/private/var`, and it looks like Git will dereference
paths given via `git clone --reference [filepath]`. To make sure the
specs pass on any platform we expand the fixture path as well.
Before this commit, the Git environment and module had moved most of
their behavior to the backing repo class, but did that delegation by
calling the delegated method. A number of other places were using
`def_delegators` to provide this behavior instead. While it's a small
thing, the environment and module classes were inconsistent.

This commit updates the Git environment and module classes to use
`def_delegator` for methods that are purely delegated to the backing
repo class.
andersonmills added a commit that referenced this pull request Feb 23, 2015
…it-implementation

REVIEW: Refactor Git implementation, extract platform specific code
@andersonmills andersonmills merged commit 06d43c7 into puppetlabs:master Feb 23, 2015
@adrienthebo adrienthebo deleted the issue/master/rk-16-extract-git-implementation branch February 23, 2015 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants