Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gogitdiff: incremental build tool #21

Merged
merged 12 commits into from
May 12, 2018
Merged

gogitdiff: incremental build tool #21

merged 12 commits into from
May 12, 2018

Conversation

prateek
Copy link
Collaborator

@prateek prateek commented Apr 24, 2018

  • Add a new tool, ggd, to compute packages affected by git changes

NB: don't worry about test coverage, this tool uses e2e integration tests using bash to create a separate gopath/git repos (see build-tools/utilities/ggd/test.sh for more details).

@codecov
Copy link

codecov bot commented Apr 24, 2018

Codecov Report

Merging #21 into master will decrease coverage by 15.22%.
The diff coverage is 7.44%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master      #21       +/-   ##
===========================================
- Coverage   63.63%   48.41%   -15.23%     
===========================================
  Files           3        4        +1     
  Lines         506      694      +188     
===========================================
+ Hits          322      336       +14     
- Misses        169      342      +173     
- Partials       15       16        +1

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e8fd6b3...07664e2. Read the comment docs.

@prateek prateek changed the title [WIP] gogitdelta gogitdiff: incremental build tool Apr 24, 2018
@robskillington
Copy link

Nice.

There seems to be a binary file committed btw currently: utilities/gogitdelta/gogitdelta

for _, imp := range imports {
impPath := imp.Path()
if strings.HasSuffix(impPath, "_test") {
println(fmt.Sprintf(">> %+v", imp))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dafuq

@prateek prateek force-pushed the prateek/gogitdelta branch 3 times, most recently from 26f78a2 to e22a67b Compare April 25, 2018 05:39
@prateek
Copy link
Collaborator Author

prateek commented Apr 25, 2018

@jeromefroe ended up using your dag calculation code from #17, it had minor issues but handled the x_test better than https://github.com/kisielk/gotool.

misc cleanup
Copy link
Contributor

@jeromefroe jeromefroe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good to me, just had a few nits and questions

ggd - GoGitDiff
===============

This is a tool to compute the packages affected by the difference between two Git
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I found this first sentence kind of confusing. What do you think about simplifying to something like the following:

`ggd` is a tool to compute which packages have changed between two Git revisions and their respective dependencies.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

`ggd` is a tool to compute Go packages affected by changes between two Git revisions. It  does so by identifying the Go packages changed between two Git revisions, and computing all other packages which depend upon them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


visited := make(map[string]string)

// ensure we add the pkg to the graph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, good catch, yea definitely need to add the package here, as the steps below will only add the things which import it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay for tests :)

if !ok {
// It's okay for Import to return an error as not all packages that can be found in
// a package will necessarily be present. For example, packages imported only by test
// files in vendored packages will not be installed. In the case of an error, Import
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (self-nit?): "In the case of an error, Import always returns a non-nil *Package. In the case of an error it will only contain partial information." -> "In the case of an error, Import always returns a non-nil *Package that contains partial information."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

edges[to] = struct{}{}
}

// Closure returns all the transitive closure of all packages reachable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/returns all the/returns the

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return res, nil
}

// CwgIsDirty returns whether the current directory is dirty.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: CwgIsDirty -> CWDIsDirty ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

debug("changed packages: %v", packageChanges)

baseFullChangeSet := make(lib.ImportSet)
if !*includePrechanges {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the intention here? Why compute the graph and changed packages with the base reference?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original thought was we'd need to consider both DAGs to answer the question of all affected changes but I'm not convinced that's right anymore. I need to draw some graphs to convince myself before I would be willing to delete the code. Will put down a TODO for now,

debug("affected packages (including transitive changes): %v", baseFullChangeSet)
}

currentSha1, err := sha1Fn()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try to checkout the current SHA again before we return?

also, super nit: currentSha1 -> currentSHA1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what you mean, that we should retry or ensure we're on the right SHA1.

Re: retry - why? it'll only fail when the directory is dirty, at which point, retries buy you nothing.

Re: the right sha1 - we do that.

if _, ok := visited[node]; ok {
return
}
visited[node] = struct{}{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we can unconditionally add the package. For example, if we delete a package in our branch then it will be a changed package even though it no longer exists and go test will complain that it cant find it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point. Lemme add a test and ensure we handle the situation correctly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added testcase7 for the case you're describing, here's the mini-diff: f5b1fa0

===============

This is a tool to compute the packages affected by the difference between two Git
revisions. It's primary goal is to help speed up CI jobs by figuring out the packages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"It was deSigned to speed up C.I jobs by determining which packages depend on packages that were changed in the target branch vs. the base branch" or something like that.

Also would be cool to link to Digital Ocean's blog post on the topic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: nevermind you did link to it

affected by git changes, and transitively looking through the Go Imports. It takes
inspiration from `gta`, and the work done at [DigitalOcean].

NB: debug mode (`-d`) allows users to inspect how the tool arrives at the decisions it does.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"allows users to inspect the tool's decision making process"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"golang.org/x/tools/go/buildutil"
)

// ImportGraph is a map from a package -> all the packages that import it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: -> a set of all the packages that import (depend?) on it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworded.

)

// ImportGraph is a map from a package -> all the packages that import it.
type ImportGraph map[string]map[string]struct{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use the ImportSet as the value for this map?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

func (g ImportGraph) maybeAddEdge(
ctx *build.Context, buildPkg *build.Package, visited map[string]string, pkg, path string,
) {
if path == "C" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, what is this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//
// also filters any `vendor/` directory paths
func filterChanges(input string) bool {
if strings.Contains(input, "/vendor/") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why filter the vendor? Can't we use the DAG to determine exactly what packages depend on them if we strip the vendor prefix?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, will make it optional.

I'm thinking of the case for the repos where we include vendor (internally) v the ones we don't (OSS/GH). To your point, it'll be helpful for the former, don't want to include it in the computation for the latter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

func (g ImportGraph) Closure(paths ...string) (ImportSet, error) {
// add all the paths we're starting at, as they are by definition
// reachable.
for _, p := range paths {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of including the logic to determine if a package was deleted so we can add all packages here would it make sense to instead only add the package if it exists in the graph since otherwise we know it was deleted?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"only add the package if it exists in the graph since otherwise we know it was deleted"

I actually tried to do this the first. Unfortunately, it doesn't work in certain cases.

Can repro yourself, steps:

$ cd $GOPATH/src/github.com/m3db/build-tools
$ git checkout prateek/ggd/demo-testdata-issue
$ cd ./utilities/ggd
$ ./test.sh
# testcase6 will fail
# can see output for the run in ./utilities/ggd/bin/testcase6.out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that seems to fail because we unconditionally panic if we can't find a package, but what if we first strip the root packages which don't appear in the DAG before walking the graph? I'm thinking something like the following:

func (g ImportGraph) Closure(roots ...string) ImportSet {
	closure := make(ImportSet)
	for _, r := range roots {
                if _, ok := g[r]; !ok {
                         log.Debugf("root package '%s' is not in the import graph, it has been removed", r)
                }
		g.walk(r, closure)
	}
	return closure
}

func (g ImportGraph) walk(node string, visited ImportSet) {
	if _, ok := visited[node]; ok {
		return
	}
	if _, ok := g[node]; !ok {
		panic(fmt.Sprintf("node (%s) doesn't exist in the graph", node))
	}
	visited[node] = struct{}{}
	for to := range g[node] {
		g.walk(to, visited)
	}
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

talked offline: if we detect changes in */testdata/*, run everything; otherwise - do ^^ some version of what Jerome's suggesting.

Copy link
Collaborator Author

@prateek prateek Apr 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. There's one more edge case we didn't talk about: a package that's not imported by any other packages, and gets deleted (testcase7). It isn't present the import graph either. I throw an error for this case and run tests on all non-filtered packages.

//
// also filters any `vendor/` directory paths
func filterChanges(input string, filterDirPatterns []string) bool {
for _, ptrn := range filterDirPatterns {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this mean we won't run tests if we just change vendor?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an option passed in from the CLI, so up to the user's discretion.

We'd use it in the following modes:
(a) CI in our internal repo which has vendor/ checked in, we would not filter vendor
(b) CI in our OSS repos (where we do not checkin vendor/), we would filter vendor/`

Local development for both would mirror the choices made in the CI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't we then be relying on local CI to verify changes to vendor? I'm kinda of the opinion that if you don't vendor you're dependencies in your repo you need to run tests in CI each time when you pull them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

talked offline: we're going to the change the default to include vendor/. Standard usage for us is going to be run all tests on master, and if we see any glide.yaml/lock changes; otherwise rely on ggd.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

}
if _, ok := g[node]; !ok {
// NB(prateek): this happens in the case of un-used deleted packages. Look at testcase7.
return DanglingDeleteError{node}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why return an error here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to indicate to main this happened so we can act on it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I guess my question is why do we need to act on it? At least, it doesn't seem like an error to me if I remove a package that wasn't imported.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not, but I'm being paranoid and rerun the entire test suite if this situation occurs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, maybe worth just calling that out?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

defaultCatchallPatterns = []string{
"/testdata/", // go list ./... is unable to gauge impact of changes to this directory
}
defaultFilterPatterns = []string{"/vendor/"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the default for this be empty?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Going to stick with this as the default because it's typically what people want when using the tool by hand. I prefer to optimise for that over what the CI invocation for this will be.

go test does this too so it's not unexpected behaviour.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but if working locally and I update a vendored dependency wouldn't it be preferable to test the packages which depend on the changed vendor package?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. But in that case, you've updated glide.yaml/glide.lock (added to catchall patterns) and we'd run all tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or, they can alternatively run with ggd -f='' and get the same behaviour (w/o using glide changes as catch all)

func ChangedPackages(changedFiles []string) []string {
changedPkgsMap := make(map[string]struct{}, len(changedFiles))
for _, f := range changedFiles {
dir := filepath.Dir(f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok to take the whole directory path? Was thinking we might need to potentially remove GOPATH if it's in the prefix so we can ensure we have the correct import path.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changed files are relative paths already. Will add a comment to indicate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so then the packages in the import statements are relative as well? For example, if I do a git diff in statsdex the paths of files which have changed will be relative to the base of the repo (e.g. services/collector/etc), but if I were to import one of those packages they would use the full Git path (e.g. <hostname>/infra/statsdex/services/collector/etc).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I were to import one of those packages they would use the full Git path (e.g. /infra/statsdex/services/collector/etc

I don't follow. Go imports are always GOPATH or repo/vendor/ relative, what situation are they not going to be?

Copy link
Contributor

@jeromefroe jeromefroe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@prateek
Copy link
Collaborator Author

prateek commented May 12, 2018

Boo yah! Many thanks for the reviews @jeromefroe!

@prateek prateek merged commit ec41198 into master May 12, 2018
@prateek prateek deleted the prateek/gogitdelta branch May 12, 2018 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants