-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Closed
Labels
Milestone
Description
by alexandre.normand@opower.com:
I'm hoping someone can shed some light on this one but I, unfortunately, can't provide that much details nor an example that could be tried outside of our organisation. Here's the story: I have a small go program that uses encoding/json and net/http to submit jobs to one of our apps. After submission, the program polls that same app (using http/json) until it sees the jobs going to completion (or failure). This has been implemented using channels and go routines (one go routine sees that a job gets submitted and waits for its completion). It gets fancier with the notion of checkpoints that have been implemented by simply waiting for some jobs to complete before starting other jobs (again, via new go routines). At any given time, there are never more than 10 go routines running in parallel (which sleep for 1 minute between each http call). The program runs fine when executed on my MacBook Pro but since it can run for 10+ hours before completion, I've been trying to run that on one of the many Linux servers using {{{nohup}}}. It looks like when starting the process with nohup and logging out of the system, the program never runs to completion. The stdout/stderr is redirected to a file and it looks like the process vanishes after a few hours at most (sometimes after less than 1 hour). I can't really tell what the exit code is either since not having an open session when that happens, I'm not seeing it. To add to the mystery, when keeping a session open on that host running the program and tailing the log file, it seems to _then_ be able to run to completion. I can't rule out something killing that process but it I'm very confident that it's not the case since I'm running other scripts on those machines which have been running for days without interruption. What steps will reproduce the problem? 1. Use a go program that's a mix of http/json/log statements/sleeps/go routines. 2. Start it using {{{nohup program > output.log 2>&1 &}}} 3. Wait for it...and it's gone before it completes. What is the expected output? The program should either go to completion or print a panic. What do you see instead? Nothing, the process just ends abruptly with no trace. This looks like I'm running through some os.Exit() call somewhere in the go packages I'm using but it's not clear what that would be and why. Which compiler are you using (5g, 6g, 8g, gccgo)? 6g Which operating system are you using? Linux 2.6.32-220.7.1.el6.x86_64 Which version are you using? (run 'go version') go1.0.3 Please provide any additional information below. I can provide my code (one main go source and an additional package) but I would have to remove urls and obfuscate some of it which would render it pretty much usable except for visual inspection.