REVIEW: Refactor Git implementation, extract platform specific code #308

adrienthebo · 2015-02-17T23:06:36Z

This pull request refactors the code interacting with Git. It pulls out the code that directly interacts with the Git binary (by shelling out, running commands, parsing the output, etc) into a separate set of classes.

These classes directly interact with Git:

BareRepository
WorkingRepository
ThinRepository (a WorkingRepository that uses Git alternates to perform shallow clones)

These classes wrap interactions with Git but aren't bound to the Git implementation:

Cache (wraps a BareRepository and handles updating that repository when necessary)
StatefulRepository (wraps a ThinRepository and acts to synchronize a Git repository with some desired state for that repository)

The goal of this refactor is to break up the low level logic that interacts with the Git binary into a distinct layer that can be swapped out, and adds a logic layer that actually provides the application logic to run everything.

andersonmills · 2015-02-19T23:40:13Z

I'm currently getting two failures in the spec tests.

Failures:

  1) R10K::Git::ThinRepository cloning adds the cache repo to the alternates file
     Failure/Error: expect(subject.alternates.to_a).to eq [File.join(cacherepo.git_dir, 'objects')]

       expected: ["/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/d20150219-41431-b5gsi/-var-folders-sv-m4gkcsvd4sl5h895_xvvftbh0000gq-T-d20150219-41431-llg190-puppet-boolean.git/objects"]
            got: ["/private/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/d20150219-41431-b5gsi/-var-folders-sv-m4gkcsvd4sl5h895_xvvftbh0000gq-T-d20150219-41431-llg190-puppet-boolean.git/objects"]

       (compared using ==)
     # ./spec/integration/git/thin_repository_spec.rb:49:in `block (3 levels) in <top (required)>'

  2) R10K::Git::WorkingRepository cloning with a reference repository adds the reference repository to the alternates directory
     Failure/Error: expect(subject.alternates.to_a).to eq [File.join(remote, 'objects')]

       expected: ["/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/d20150219-41431-t7lsit/puppet-boolean.git/objects"]
            got: ["/private/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/d20150219-41431-t7lsit/puppet-boolean.git/objects"]

       (compared using ==)
     # ./spec/integration/git/working_repository_spec.rb:72:in `block (4 levels) in <top (required)>'

andersonmills · 2015-02-19T23:54:01Z

Note: fixed with this

$ env | grep -i tmp
TMPDIR=/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/
$ export TMPDIR=/private${TMPDIR}
echo $TMPDIR
/private/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/
$ be rspec spec

adrienthebo · 2015-02-20T00:29:10Z

The spec failures in travis are due to timing sensitive tests and were probably caused by the timing change on IO pumps.

adrienthebo · 2015-02-20T18:41:27Z

IO pump timing moved to #315

adrienthebo · 2015-02-20T18:50:13Z

@andersonmills I added 3a87e22 to dereference any symlinks in the fixture path, could you see if that makes the specs pass on your system?

andersonmills · 2015-02-20T21:52:00Z

Still getting one failure.

 1) R10K::Git::ThinRepository cloning adds the cache repo to the alternates file
     Failure/Error: expect(subject.alternates.to_a).to eq [File.join(cacherepo.git_dir, 'objects')]

       expected: ["/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/d20150220-43886-gcgkym/-var-folders-sv-m4gkcsvd4sl5h895_xvvftbh0000gq-T-d20150220-43886-1k3ss3n-puppet-boolean.git/objects"]
            got: ["/private/var/folders/sv/m4gkcsvd4sl5h895_xvvftbh0000gq/T/d20150220-43886-gcgkym/-var-folders-sv-m4gkcsvd4sl5h895_xvvftbh0000gq-T-d20150220-43886-1k3ss3n-puppet-boolean.git/objects"]

       (compared using ==)
     # ./spec/integration/git/thin_repository_spec.rb:49:in `block (3 levels) in <top (required)>'

adrienthebo · 2015-02-20T21:59:39Z

I added the File.realpath to the failing spec as well; hopefully that'll resolve that issue too.

andersonmills · 2015-02-20T22:03:40Z

Yep. That fixed it.

andersonmills · 2015-02-21T23:20:07Z

lib/r10k/module/git.rb

-    end
-
-    return :insync
+    @repo.status
  end



Any reason you didn't use Forwardable and def_delegators for sync and status as in cache.rb above?

andersonmills · 2015-02-22T00:06:57Z

These two unimportant comments would not keep my from merging, and I'm 👍 . I'll wait to do this on Monday morning after talking with you, though.

This commit adds fixture data and rspec helpers for performing integration tests touching Git repositories. Trying to stub out every single git command touching a git repository would be more work than just running the commands and checking the result, so this should simplify testing. Moreover this allows different Git implementations to share the same tests to confirm they have the same behavior.

This commit adds a generic implementation of a bare Git repository. The existing code that deals with bare repositories is tightly coupled with how to cache Git repositories and is hard to test it, so this class was extracted to just deal with how Git bare repositories behave.

The previous implementation of the Git caching code intermixed caching logic and Git commands which made things hard to test and reason about. This commit removes all of the Git implementation logic from the cache class and reuses the BareRepository class, so the Cache only has to deal with, well, caching.

Pathname objects are significantly more convenient to use for dealing with files and file paths and I've been replacing String path members with Pathname members. This commit updates the Alternates object to use a Pathname instead of a string object as well.

The WorkingDir class implemented interactions with non-bare git repositories, but it blended handling a non-bare repository, performing sparse clones, and enforcing the state of a given repository. This commit implements a class for just interacting with a non-bare repository; the concerns of sparse clones and state enforcement will be handled by separate classes.

This class extends a working repository to add sparse clones and updates. All clone and fetch operations are done by a cached git repository and the actual working repository simply reuses the cache repo object database.

This adds the #resolve method to bare repositories to resolve git refs to actual commits.

This adds the #ref_type method that takes a gif ref and will return the type of the ref.

The remote of a bare repository is the property of an existing repo and is not an intrinsic part of the object initialization, so it cannot be supplied to the constructor.

This was another location where file paths were being treated as Strings instead of Pathnames; converted this to a Pathname as well for consistency.

This adds the #git_dir method to the WorkingRepository object, so that ThinRepository, WorkingRepository, and BareRepository all implement the same interface.

The BareRepository and WorkingRepository classes both need to run git commands but the initial versions of these classes just copied the old `#git` method. This commit extracts a base class and adds the #git method to that class.

This commit removes the duplicate #tags implementation from the BareRepository and WorkingRepository classes and moves it to the base class.

This commit removes the duplicate #resolve implementation from the BareRepository and WorkingRepository classes and moves it to the base class.

This implements an abstract `#git_dir` method in the BaseRepository to indicate that subclasses should implement it.

This commit removes the duplicate #branches implementation from the BareRepository and WorkingRepository classes and moves it to the base class.

The #origin method may need to be invoked before the repository is initialized, so if the git command to retrieve the origin remote url fails then just return nil.

Enumerating tags and branches requires invoking `git for-each-ref` with slightly different args; this centralizes the shared behavior.

WorkingRepository instances could clone a repository and check out an initial ref, but `git clone -b` can only take a branch or tag and not a commit. In order to resolve this consistency and make branches/tags/commit be equivalent this commit uses #checkout which treats the three as equivalent.

The Cache object may be used as if it is a git repository itself; to facilitate that we just forward the #resolve method to the backing bare repository.

This class controls how Git repositories are created and updated. Class instances take a desired Git remote and reference, compare those states to the repository on disk, and can reconcile the two. It takes advantage of cached Git repositories to deduplicate Git objects and speed up operations.

Git branches are a bit strange in comparison to Git commit SHAs and tags because they are mutable. If we've been given a Git ref that's a branch, we have to make sure that the branch itself has been fetched so that we actually track upstream changes to the branch.

If the desired ref cannot be resolved to a SHA, we cannot synchronize the repository and must fail immediately.

The R10K::Environment::Git and R10K::Module::Git classes shared a lot of behavior on how repositories are synchronized that have been subsumed into the StatefulRepository class. We can replace the duplicate code with calls into the stateful repo instead.

The R10K::Git::Ref classes and subclasses were an attempt to add some optimizations to how r10k fetches and updates Git repos, but their architecture was flawed and people weren't widely using them. The new Git classes don't use these classes so we can just remove the classes and use the string refs instead.

This method is common to all Git repositories; it can be shifted to the Git base class.

OSX sets the TMPDIR path to `/var/folders/[randomstring]`, but `/var` is a symlink to `/private/var`, and it looks like Git will dereference paths given via `git clone --reference [filepath]`. To make sure the specs pass on any platform we expand the fixture path as well.

Before this commit, the Git environment and module had moved most of their behavior to the backing repo class, but did that delegation by calling the delegated method. A number of other places were using `def_delegators` to provide this behavior instead. While it's a small thing, the environment and module classes were inconsistent. This commit updates the Git environment and module classes to use `def_delegator` for methods that are purely delegated to the backing repo class.

…it-implementation REVIEW: Refactor Git implementation, extract platform specific code

andersonmills reviewed Feb 21, 2015
View reviewed changes

adrienthebo added 19 commits February 23, 2015 11:02

[git/thin_repository] (RK-16) Implement cache backed working repo

fcbfdb4

This class extends a working repository to add sparse clones and updates. All clone and fetch operations are done by a cached git repository and the actual working repository simply reuses the cache repo object database.

[git/bare_repository] (RK 16) Add ref resolution

cd0be7f

This adds the #resolve method to bare repositories to resolve git refs to actual commits.

[git/bare_repository] (RK-16) Determine what type a ref is

59e4b9c

This adds the #ref_type method that takes a gif ref and will return the type of the ref.

[git/bare_repository] (RK-16) Remove remote from initialization field

e5c9b25

The remote of a bare repository is the property of an existing repo and is not an intrinsic part of the object initialization, so it cannot be supplied to the constructor.

[git] (RK-16) convert #git_dir to always return Pathname

8881514

This was another location where file paths were being treated as Strings instead of Pathnames; converted this to a Pathname as well for consistency.

[git/working_repository] (RK-16) add #git_dir

751865a

This adds the #git_dir method to the WorkingRepository object, so that ThinRepository, WorkingRepository, and BareRepository all implement the same interface.

[git/base_repository] (RK-16) extract #tags

5bfe825

This commit removes the duplicate #tags implementation from the BareRepository and WorkingRepository classes and moves it to the base class.

[git/base_repository] (RK-16) extract #resolve method

cf303f3

This commit removes the duplicate #resolve implementation from the BareRepository and WorkingRepository classes and moves it to the base class.

[git/base_repository] (RK-16) add abstract #git_dir definition

69ace7e

This implements an abstract `#git_dir` method in the BaseRepository to indicate that subclasses should implement it.

[git/base_repository] (RK-16) centralize #branches method

0054345

This commit removes the duplicate #branches implementation from the BareRepository and WorkingRepository classes and moves it to the base class.

[working_repository] (RK-16) don't raise error on nil origin

61fd788

The #origin method may need to be invoked before the repository is initialized, so if the git command to retrieve the origin remote url fails then just return nil.

[git/base_repository] (RK-16) extract #for_each_ref helper

be0dce7

Enumerating tags and branches requires invoking `git for-each-ref` with slightly different args; this centralizes the shared behavior.

adrienthebo added 12 commits February 23, 2015 11:27

[git/cache] (RK-16) delegate #resolve to bare repo

f573fd2

The Cache object may be used as if it is a git repository itself; to facilitate that we just forward the #resolve method to the backing bare repository.

[git/stateful_repository] (RK-16) forward #head to backing repo

1283ac8

[git/cache] (RK-16) Forward ref_type call to bare repo

a22ed6e

[git/cache] (RK-16) Expose sync status, allow status reset

cfa2aef

[git/stateful_repository] (RK-16) Raise error on unresolvable ref

f659a09

If the desired ref cannot be resolved to a SHA, we cannot synchronize the repository and must fail immediately.

[git/base_repository] (RK-16) Move #ref_type to base repo

68e6284

This method is common to all Git repositories; it can be shifted to the Git base class.

andersonmills added a commit that referenced this pull request Feb 23, 2015

Merge pull request #308 from adrienthebo/issue/master/rk-16-extract-g…

06d43c7

…it-implementation REVIEW: Refactor Git implementation, extract platform specific code

andersonmills merged commit 06d43c7 into puppetlabs:master Feb 23, 2015

adrienthebo deleted the issue/master/rk-16-extract-git-implementation branch February 23, 2015 19:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

REVIEW: Refactor Git implementation, extract platform specific code #308

REVIEW: Refactor Git implementation, extract platform specific code #308

adrienthebo commented Feb 17, 2015

andersonmills commented Feb 19, 2015

andersonmills commented Feb 19, 2015

adrienthebo commented Feb 20, 2015

adrienthebo commented Feb 20, 2015

adrienthebo commented Feb 20, 2015

andersonmills commented Feb 20, 2015

adrienthebo commented Feb 20, 2015

andersonmills commented Feb 20, 2015

andersonmills Feb 21, 2015

andersonmills commented Feb 22, 2015

REVIEW: Refactor Git implementation, extract platform specific code #308

REVIEW: Refactor Git implementation, extract platform specific code #308

Conversation

adrienthebo commented Feb 17, 2015

andersonmills commented Feb 19, 2015

andersonmills commented Feb 19, 2015

adrienthebo commented Feb 20, 2015

adrienthebo commented Feb 20, 2015

adrienthebo commented Feb 20, 2015

andersonmills commented Feb 20, 2015

adrienthebo commented Feb 20, 2015

andersonmills commented Feb 20, 2015

andersonmills Feb 21, 2015

Choose a reason for hiding this comment

andersonmills commented Feb 22, 2015