New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding in-process tests #525

Merged
merged 18 commits into from May 9, 2017

Conversation

Projects
None yet
3 participants
@jmank88
Copy link
Collaborator

jmank88 commented May 6, 2017

Fixes #510

This change adds in-process tests. Most of the churn is around refactoring logging away from global functions and state, and replacing various os package references (Stdout,StdErr,Getenv,Getwd,etc.).

@googlebot

This comment has been minimized.

Copy link
Collaborator

googlebot commented May 6, 2017

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

@googlebot googlebot added the cla: no label May 6, 2017

}

// LogDepfln logs a formatted line, prefixed with `dep: `.
func (l *Logger) LogDepfln(format string, args ...interface{}) {

This comment has been minimized.

@jmank88

jmank88 May 6, 2017

Collaborator

This method could use a better name. Or we could just inline the format adjustments and call Logf. There are 13 usages.

This comment has been minimized.

@jmank88

jmank88 May 6, 2017

Collaborator

This minimal logger is racy, so I'm going to try dissolving this logger package completely and using the std lib log.Logger instead, so this helper will move or be in-lined.

@jmank88

This comment has been minimized.

Copy link
Collaborator

jmank88 commented May 6, 2017

bump googlebot: I signed it!

@googlebot

This comment has been minimized.

Copy link
Collaborator

googlebot commented May 6, 2017

CLAs look good, thanks!

@googlebot googlebot added cla: yes and removed cla: no labels May 6, 2017

jmank88 added some commits May 6, 2017

@sdboyer
Copy link
Member

sdboyer left a comment

just some quick comments while you're on it

testProj.CompareImportPaths()
testCase.CompareVendorPaths(testProj.GetVendorPaths())
t.Run("-external", testIntegration(testName, wd, execCmd))
t.Run("-internal", testIntegration(testName, wd, runMain))

This comment has been minimized.

@sdboyer

sdboyer May 6, 2017

Member

no dash needed, the system automatically separates with a slash

t.Fatal("rm with one arg not in manifest should have failed")
}
t.Run(testName+"-external", removeErrors(testName, wd, execCmd))
t.Run(testName+"-internal", removeErrors(testName, wd, runMain))

This comment has been minimized.

@sdboyer

sdboyer May 6, 2017

Member

ibid - no need for dash

o string
g *graphviz
p *dep.Project
logger *log.Logger

This comment has been minimized.

@sdboyer

sdboyer May 6, 2017

Member

I think it's probably preferable to have this remain as an io.Writer, then map that to the *log.Logger in some intermediate step.

This comment has been minimized.

@jmank88

jmank88 May 6, 2017

Collaborator

Yeah I ended up having to do that for the tabwriter anyways.

@@ -216,7 +218,7 @@ func (sm *SourceMgr) HandleSignals(sigch chan os.Signal) {

opc := sm.suprvsr.count()
if opc > 0 {
fmt.Printf("Signal received: waiting for %v ops to complete...\n", opc)
sm.logf("Signal received: waiting for %v ops to complete...\n", opc)

This comment has been minimized.

@sdboyer

sdboyer May 6, 2017

Member

sorry, this one has to stay as-is. yes, it's obnoxious to print directly in this way, but the interface here is carefully designed; we're not injecting a logger for this one, narrow case. if it's that bad, we'll just delete this logging output entirely.

This comment has been minimized.

@jmank88

jmank88 May 6, 2017

Collaborator

It would clean up a lot to remove it, but how would it affect the user experience?
On the other hand, is there a chance that leaving it to log as before could result in arbitrary test failures?

This comment has been minimized.

@sdboyer

sdboyer May 6, 2017

Member

No, there's no chance of it bunking up tests; the only possible way for it to occur is if a signal is sent. It's a very specific case, which is why it exists as it does in the first place.

The proper solution here involves moving direct signal handling out of gps, and into the tool-level code - not changing gps' interface for this sole logging statement.

@@ -105,12 +105,9 @@ type SolveParameters struct {
// typical case.
Downgrade bool

// Trace controls whether the solver will generate informative trace output
// as it moves through the solving process.
Trace bool

This comment has been minimized.

@sdboyer

sdboyer May 6, 2017

Member

Let's just leave everything underneath gps/ out of this refactor, unless it absolutely must change.

This comment has been minimized.

@jmank88

jmank88 May 6, 2017

Collaborator

I can put this back, but it's not doing anything, and I suspect it will confuse others in the future as it did me. (Maybe a separate issue if it's just clouding the water on this PR)

If it was set to true, then there was a validation check for a non-nil TraceLogger (even though nil checking the logger is the actual mechanism that triggers the trace logging internally).

If it was set to false, then it wasn't used. So you could pass a non-nil logger, and it would be passed along and tracing will take place, in spite of the false trace value.

So in it's current state it really amounts to "double check that TraceLogger is non-nil", but the only usages are setting a constant true value directly adjacent a call to set TraceLogger non-nil.

This comment has been minimized.

@sdboyer

sdboyer May 6, 2017

Member

Yep, I know that it looks like it amounts to a double non-nil check, and it may actually be that (I vaguely recall, from when I wrote it a year ago, there being an actual reason for having the extra bool). But that needs its own PR - as you said, the issue is about muddying the water.

@@ -35,8 +35,6 @@ func TestIntegration(t *testing.T) {
parse := strings.Split(path, string(filepath.Separator))
testName := strings.Join(parse[2:len(parse)-1], "/")
t.Run(testName, func(t *testing.T) {
t.Parallel()

This comment has been minimized.

@sdboyer

sdboyer May 6, 2017

Member

It's a major speedup to run these tests in parallel; there'd need to be a justification for dropping back to running them in serial.

This comment has been minimized.

@jmank88

jmank88 May 6, 2017

Collaborator

Yeah this was an attempt to mitigate the race popping up on the integration test buffers. Definitely trying to keep this in.

jmank88 added some commits May 6, 2017

@jmank88

This comment has been minimized.

Copy link
Collaborator

jmank88 commented May 6, 2017

There are still currently race conditions blocking re-parallelization of the integration tests. It appears to be orphaned goroutines continuing to log after the initial call has returned, which explains why the external-process tests are not affected. It may be possible to quick-fix by swapping out the logger when returning, so any lingering logging will go to ioutil.Discard, but really each test should be waiting for all spawned goroutines to complete.

@jmank88

This comment has been minimized.

Copy link
Collaborator

jmank88 commented May 6, 2017

Sometimes, but not always, the osx tests fail: https://travis-ci.org/golang/dep/jobs/229474199
Hopefully this is a race to set working dir or an env var that can be fixed, but I've not been able to reproduce it (and don't have OSX available).

@jmank88

This comment has been minimized.

Copy link
Collaborator

jmank88 commented May 8, 2017

Resolved the race conditions with a few sync.WaitGroups. There are still some remaining background goroutines that aren't synced up before completion (e.g. lots of SyncSourceFor called downstream from Solve), but they don't log so they aren't racing for the stderr buffer.

Resolved the OSX issue as well - the existing hack needed only be applied to the external process tests.

@sdboyer

This comment has been minimized.

Copy link
Member

sdboyer commented May 8, 2017

awesome, thanks for keeping at this! 🎉 🎉 i'm working through a review. meantime, to this:

(e.g. lots of SyncSourceFor called downstream from Solve)

If gps' internals are working correctly, then these should all be cleaned up/terminated by calling sourceMgr.Release(), though the current implementation may not wait until all of the outstanding calls terminate before it returns. (If it does not, it's a bug.)

A note - the current philosophy of test design is, when the SUT is not the SourceManager itself, but some functionality that relies on it, to treat the underlying sourceMgr as safe for [heavily, aggressively] concurrent reuse across tests. This is partly for performance reasons, but also a way of casting a broad net to catch possible concurrency bugs in sourceMgr. We have tests that attempt to do this directly, as well, but the possibility space here is large, and to the extent that all of our other tests represent more realistic workloads (and they often do), they add quite a bit of value.

@jmank88

This comment has been minimized.

Copy link
Collaborator

jmank88 commented May 8, 2017

I don't think that what you are describing is fundamentally a problem. I was primarily concerned with reliance on the process exiting to terminate background goroutines (either by design, or from a race). I think that some of these calls, like firing off an un-monitored go s.b.SyncSourceFor(dep.Ident) or go sm.DeduceProjectRoot(arg) to pre-fetch, are technically racy since nothing waits for them to complete. They are likely no problem under normal execution (the critical path has more work), but these parallel tests are ripe for irregular scheduling scenarios, so if we have issues in the future, I'd look to these first.

@sdboyer
Copy link
Member

sdboyer left a comment

this is looking pretty good - still chewing on it a little, but here are some comments while i do 😄

@@ -104,6 +110,8 @@ func testWriteDepTree(t *testing.T) {
if _, err = os.Stat(filepath.Join(tmp, "bitbucket.org", "sdboyer", "withbm")); err != nil {
t.Errorf("Directory for bitbucket.org/sdboyer/withbm does not exist")
}

wg.Wait()

This comment has been minimized.

@sdboyer

sdboyer May 9, 2017

Member

Mostly for my curiosity - was this causing an actual problem, or is the change here prophylactic?

The implementation of WriteDepTree() should entail that all of the SyncSourceFor() goroutines finish before WriteDepTree() does. It's possible that they wouldn't if one of the items encountered an error (and it's generally not great test design to rely on internal details of the SUT to resolve concurrency issues established by the test), but I'd definitely like to know if that entailment doesn't hold.

This comment has been minimized.

@jmank88

jmank88 May 9, 2017

Collaborator

Only the logging goroutines were causing actual race failures. I fixed this one the easy way when I started chasing these down, but I will revert it.

// this way we get the default GOPATH that was added in 1.8
buildContext := build.Default
wd, err := os.Getwd()
func NewContext(wd string, env []string) (*Ctx, error) {

This comment has been minimized.

@sdboyer

sdboyer May 9, 2017

Member

I'm glad you went to injecting this without me having to suggest it 😄 You went further than I'd envisioned by passing in a whole env []string; I was imagining just GOPATH string. But that's great - it affords us flexibility for the future, in the event that we have more env vars we want to incorporate into dep.Ctx. Well done 😄 🎉

This comment has been minimized.

@jmank88

jmank88 May 9, 2017

Collaborator

Then my unfamiliarity with the code base worked to my advantage, because I started by targeting general parity with the exec.Cmd API in use since I didn't know the minimum input requirements 😆


func testIntegration(name, wd string, externalProc bool, run test.RunFunc) func(t *testing.T) {
return func(t *testing.T) {
// Set up environment

This comment has been minimized.

@sdboyer

sdboyer May 9, 2017

Member

Should probably also t.Parallel() here.

@@ -436,43 +437,43 @@ fail:
return failerr
}

func (sw *SafeWriter) PrintPreparedActions() error {
func (sw *SafeWriter) PrintPreparedActions(stdout *log.Logger) error {

This comment has been minimized.

@sdboyer

sdboyer May 9, 2017

Member

nit: let's name the local var output instead of stdout. Even if we always (currently) end up attaching stdout here, I don't think we need to suggest that via the var name.

@sdboyer

This comment has been minimized.

Copy link
Member

sdboyer commented May 9, 2017

I think that some of these calls, like firing off an un-monitored go s.b.SyncSourceFor(dep.Ident) or go sm.DeduceProjectRoot(arg) to pre-fetch, are technically racy since nothing waits for them to complete.

Right, this is what I was trying to address with the comment about sm.Release() - that call should wait for all them to complete (but currently may not, and if it doesn't it's a bug). It also does more than just wait - if there are any outstanding calls, it should also send a cancelation signal. I'm sure it does that much.

So, as long as the source manager is released, we shouldn't be relying on process termination to end errant goroutines.

Are we saying the same thing?

@jmank88

This comment has been minimized.

Copy link
Collaborator

jmank88 commented May 9, 2017

Are we saying the same thing?

Yes I think so, we were both just a bit ambiguous. I expect these tests to be fine, since no remaining cases fail the race detector, and none resemble the fixed cases. The Release process as it stands can short-circuit background work (so it returns an error), but there is no explicit mechanism to block the routine requesting the release all the way until that return takes place. doRelease blocks on sm.suprvsr.wait(), which does book keeping and cancels/waits for calls launched through suprvsr.do (and e.g. go SyncSourceFor calls sg.suprvsr.do internally), but the doRelease caller will be unblocked after those internal dos and before SynceSourceFor has fully completed. However, I don't see any internal state problems, since the remaining work would only be relevant if something was waiting for the returned value.

So basically I think that this PR and general program correctness should be fine. I only suspect technical incorrectness by definition, due to the orphaned top-level routines (w/o side affects. Of course the ones that were logging as well were a problem). Perhaps a separate issue to audit these is in order - some usages look like good candidates for translation into a simpler fork-join model, and others will be removed soon (e.g. as part of the remove command).

jmank88 added some commits May 9, 2017

@sdboyer

This comment has been minimized.

Copy link
Member

sdboyer commented May 9, 2017

doRelease blocks on sm.suprvsr.wait(), which does book keeping and cancels/waits for calls launched through suprvsr.do (and e.g. go SyncSourceFor calls sg.suprvsr.do internally)...

Ah good, I did do this when I did the latest big source manager refactor; I couldn't remember if I did or not.

some usages look like good candidates for translation into a simpler fork-join model

Ultimately, it's important to keep in mind that these initial goroutines are only spawned as helpers, potentially achieving some up-front parallelism gains on work we happen to know the SourceManager will need to do. I'd be open to an argument for pushing these down into gps directly, either into the solver and/or a new SourceManager method.

That said, what are you envisioning, and what value do you think it would add?

I only suspect technical incorrectness by definition, due to the orphaned top-level routines

Ehh...I mean, yeah, if the rule is that "logic that starts a goroutine is directly responsible for ensuring its appropriate termination." Not a bad rule, though idk if it rises to the level of correctness - it seems like mostly a question of whether it's good design to delegate that responsibility to guarantees provided by SourceManager.Release(). Which...

...but the doRelease caller will be unblocked after those internal dos and before SynceSourceFor has fully completed. However, I don't see any internal state problems, since the remaining work would only be relevant if something was waiting for the returned value.

The old system, prior to the refactor, kept an atomic counter of the number of running processes, which it incremented on ingress into one of the top-level SourceManager methods, and decremented only immediately prior to returning. I opted to remove that in favor of this internal checking, with the goal of adding the cancelation semantics. The OK-ness of that change was generally built on the assumption that any operations that occur between the supervised action and are safe to lose.

There's logic that, ostensibly, runs counter to this. In three places, we update the cache after the supervised segment ends: in getManifestAndLock(), listPackages(), and revisionPresentIn() (which actually has a totally unsupervised call into the src layer, ugh - that does need to be fixed).

This is clearly not a problem now, as the "cache" is memory-only and ephemeral - but we do plan to move to a persistent caching layer - #431. When that change happens, we could have a situation where the SourceManager releases while writing to disk. The Release() call will return, and the caller will presumably proceed to process termination immediately - but the goroutine attempting to update the cache will still be running. Arguably, this would be a reason for restoring the outermost-boundary call counters and blocking in Release() until they all clear. However, my initial tendency is to follow the principles of crash-only design, and place our focus instead on (correctly) using a (reasonably correct) transactional persistence lib, rather than trying to handle just this one relatively narrow case of graceful shutdown.

@sdboyer

This comment has been minimized.

Copy link
Member

sdboyer commented May 9, 2017

oh, also, i think we're good to go here 😄

@sdboyer sdboyer merged commit a919470 into golang:master May 9, 2017

4 checks passed

cla/google All necessary CLAs are signed
codeclimate no new or fixed issues
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jmank88

This comment has been minimized.

Copy link
Collaborator

jmank88 commented May 9, 2017

"logic that starts a goroutine is directly responsible for ensuring its appropriate termination."

That's slightly stricter than I was thinking (in that I'm not worried about ownership, so much as any eventual sync/join back into the main gouroutine), but I was flipping it around and observing:
"Any goroutine that doesn't join/sync back with the main routine directly or indirectly either (1) has observable side affects and thus is a race condition, or (2) doesn't have observable side affects, so why execute it at all?".

If nothing is waiting for the returned values from these routines, then why is there work queued or a stack to unwind and value to return in the first place? Practically speaking, likely because the method was already written and available, but can we use a similar method that doesn't queue up any extra work, and only requests what is necessary? Or maybe the async routine can be spawned internally, rather than at the current level. (Hopefully these approaches wouldn't require a messy trade-off like a bunch of parallel helper methods, but I haven't explored)

In any case, some of the current usages seem brittle, since you only have to try and log the error returned from go SyncSourceFor... to introduce the same sort of races fixed here. Maybe I'm over thinking it, and this is something that a few blank identifiers and comments can sufficiently document, or we can just trust the race detector.

@jmank88 jmank88 deleted the jmank88:in_process_tests branch May 9, 2017

@sdboyer

This comment has been minimized.

Copy link
Member

sdboyer commented May 9, 2017

(1) has observable side affects and thus is a race condition

I'm not sure this necessarily follows; if the effect is idempotent (as it is in this case), then there's no race. Though perhaps it doesn't apply here simply because the goroutine is indirectly joined back with the main goroutine.

then why is there work queued or a stack to unwind and value to return in the first place?

To initiate work in parallel that would have been serially initiated later anyway (by the solver), with the understanding that the later serial call will pick up the same error as was unwound and returned but discarded here.

Practically speaking, likely because the method was already written and available, but can we use a similar method that doesn't queue up any extra work, and only requests what is necessary?

If we're comparing to another hypothetical method that's still invoked in its own goroutine, but has no return value, then the only additional work that's being queued up is returning an error value through one, maybe two stack frames.

Or maybe the async routine can be spawned internally, rather than at the current level.

They could, yes - such a thing is what I was referring to with "I'd be open to an argument for pushing these down into gps directly, either into the solver and/or a new SourceManager method."

Internally, the mechanism would still be pretty much the same: initiate a bunch of goroutines to do the sync work in the background so that that work is done perhaps a bit sooner (wall time) by the time the main serial thread needs that data. And it would still be implicitly supervised and managed in the same way via Release(). The difference is that it would all be behind the SourceManager interface boundary, rather than crossing it. That's the bit of coupling here that I do agree is uncomfortable.

A possible signature for such a method would be SourceManager.PrefetchLock(gps.Lock).

Maybe I'm over thinking it

I think so 😄

@jmank88

This comment has been minimized.

Copy link
Collaborator

jmank88 commented May 9, 2017

if the effect is idempotent (as it is in this case), then there's no race.

I would consider idempotent actions as not-observable by definition in this case, since the observer can't distinguish them.

To initiate work in parallel that would have been serially initiated later anyway (by the solver)

If we're comparing to another hypothetical method that's still invoked in its own goroutine, but has no return value, then the only additional work that's being queued up is returning an error value through one, maybe two stack frames.

Yes, I'm referring strictly to the work done on the orphaned goroutines after doRelease is unblocked. Sure, in some cases this is as simple and trivial as preparing and returning a variable to a routine that immediately finishes (perhaps with a few calls on the stack to unwind first), but other cases looked more complicated (like some locks and async calls in order to prep the return value) but I didn't dissect any further.

The larger point is just that they are doing more work than they need to, which strikes me as analogous to something like leaving an unused variable hanging around - it's just something to clean up that has no effect, but otherwise can be semantically distracting or misleading.

@sdboyer

This comment has been minimized.

Copy link
Member

sdboyer commented May 9, 2017

I'm happy to look at issues/PRs for any particular issues that you identify. My general rubric will be whether the more tailored calls/logic merit both the added code weight and (if applicable) the expanded API surface area that the user has to choose from.

ibrasho pushed a commit to ibrasho-forks/dep that referenced this pull request May 10, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment