Skip to content

runtime: program re-enables SIGHUP #4491

@gopherbot

Description

@gopherbot

by alexandre.normand@opower.com:

I'm hoping someone can shed some light on this one but I, unfortunately, can't provide
that much details nor an example that could be tried outside of our organisation. 

Here's the story:
I have a small go program that uses encoding/json and net/http to submit jobs to one of
our apps. After submission, the program polls that same app (using http/json) until it
sees the jobs going to completion (or failure). This has been implemented using channels
and go routines (one go routine sees that a job gets submitted and waits for its
completion). It gets fancier with the notion of checkpoints that have been implemented
by simply waiting for some jobs to complete before starting other jobs (again, via new
go routines). At any given time, there are never more than 10 go routines running in
parallel (which sleep for 1 minute between each http call). 

The program runs fine when executed on my MacBook Pro but since it can run for 10+ hours
before completion, I've been trying to run that on one of the many Linux servers using
{{{nohup}}}. It looks like when starting the process with nohup and logging out of the
system, the program never runs to completion. The stdout/stderr is redirected to a file
and it looks like the process vanishes after a few hours at most (sometimes after less
than 1 hour). I can't really tell what the exit code is either since not having an open
session when that happens, I'm not seeing it. 

To add to the mystery, when keeping a session open on that host running the program and
tailing the log file, it seems to _then_ be able to run to completion. I can't rule out
something killing that process but it I'm very confident that it's not the case since
I'm running other scripts on those machines which have been running for days without
interruption.  

What steps will reproduce the problem?
1. Use a go program that's a mix of http/json/log statements/sleeps/go routines.
2. Start it using {{{nohup program > output.log 2>&1 &}}}
3. Wait for it...and it's gone before it completes. 

What is the expected output?
The program should either go to completion or print a panic.

What do you see instead?
Nothing, the process just ends abruptly with no trace. This looks like I'm running
through some os.Exit() call somewhere in the go packages I'm using but it's not clear
what that would be and why. 

Which compiler are you using (5g, 6g, 8g, gccgo)?
6g

Which operating system are you using?
Linux  2.6.32-220.7.1.el6.x86_64

Which version are you using?  (run 'go version')
go1.0.3

Please provide any additional information below.
I can provide my code (one main go source and an additional package) but I would have to
remove urls and obfuscate some of it which would render it pretty much usable except for
visual inspection.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions