Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Process death is very quiet #104

Closed
simonmar opened this Issue Dec 20, 2012 · 11 comments

Comments

Projects
None yet
2 participants
Contributor

simonmar commented Dec 20, 2012

When a process dies due to an exception, we don't see anything on stdout or stderr at all. This makes debugging very difficult - typically the result is a deadlock. The remote framework used to issue a SYS message to the logger, which was much more helpful. I've been doing a lot of printf debugging lately that would have been helped by some built-in error reporting.

I know there are plans to move the logger out of the core, which is good, but we should think about what to do here.

Owner

hyperthunk commented Dec 20, 2012

I know there are plans to move the logger out of the core, which is good, but we should think about what to do here.

It sounds like the node controller (and maybe one or two other places) should perform internal logging under some circumstances. Is this something that the backends should have access to as well, or is that a separate concern?

I'm not overly excited about using a Process based logger for this. Is there anything wrong with using something that simply calls putStrLn or uses Debug.Trace in those places? I do realise buffering is out of our hands, but there's little we can do about that anyway. Or should we be looking at System.Log.Logger perhaps?

Owner

hyperthunk commented Dec 20, 2012

Also, come to think of it.... In OTP, you tend to see process deaths logged by supervisors, but ordinarily process death will go unnoticed (unless you're linked to it of course!) - that's fine for something that's running in production for the most part. So perhaps printing process death info to stdio should be something that is configured on a per-node basis, so you can turn it on when you're developing/debugging only.

Contributor

simonmar commented Dec 20, 2012

Optional debug tracing would be fine by me.

Owner

hyperthunk commented Jan 23, 2013

So what I've done is put some traceEvent calls into the node controller. This can be printed to stdio right? Though I've not figured out what flags to put in the cabal test definitions to make it work. -eventlog -Dx doesn't appear to work.....

Contributor

simonmar commented Jan 24, 2013

Yes, traceEvent should work. You need to compile with -eventlog and run with +RTS -v.

If traceEvent is going to be the standard way to do debug logging in distributed-process, then that should probably be documented somewhere...

Owner

hyperthunk commented Jan 24, 2013

If traceEvent is going to be the standard way to do debug logging in distributed-process, then that should probably be documented somewhere...

Well I think it's probably a good option, but that does introduce a lot of other noise that has to be filtered out, so.... What I'm thinking of doing is introducing a trace option for node creation that provides a plug-able API for tracing. This would be fairly simple, disabled by default (to avoid evaluating when we're not in debug mode) and only dependent on -eventlog if traceEvent was the chosen trace function. A choice of pre-built tracers will be available in the node creation/setup API so you can choose an existing one or wire in your own.

You're right of course that this will need to be documented clearly. I'll push these changes later on tonight if I get time to finish them off.

Owner

hyperthunk commented Jan 24, 2013

Right. I haven't wired this into the node controller yet - that requires adding a field for the tracer to the LocalNodeState which is easy enough - but the initial stab at providing either console or eventlog based tracing is comitted here: https://github.com/haskell-distributed/distributed-process/compare/debug-trace.

I'll tidy this up and integrate it tonight (or tomorrow morning) so if we do get a release out soon, it'll be included.

@ghost ghost assigned hyperthunk Jan 25, 2013

Owner

hyperthunk commented Jan 25, 2013

Ok so that branch is a bit more complete now. I will look at documenting it a bit more clearly, and pointing out that it's not a logging feature and shouldn't be used in production, yada yada.

Contributor

simonmar commented Jan 25, 2013

Looks good - how does this interact with the stuff in simplelocalnet that redirects log messages to the master node?

Owner

hyperthunk commented Jan 25, 2013

Looks good -

Cool, I'll roll it into 0.4.1 then.

how does this interact with the stuff in simplelocalnet that redirects log messages to the master node?

That's a good question. The event-log based logging should just work, as based on the 'traceEventIO' documentation (and the GHC 'eventlog' docs too) it should as though writing to the event log is thread safe, with the caveat that on SMP enabled systems you might get some duplication.

If code in (or using) simplelocalnet uses 'say' and you're printing the event log to stdout with @+RTS -v@ then the output is going to interleave. I guess folks will need to write to a binary log and/or use ghc-events or threadscope to alleviate that problem.

In terms of logging to a file, that's a little different. I'm not overly familiar with the GHC runtime's IO subsystem, but the documentation for openFile seems to indicate that opening a handle to a file that is "already in use" will raise an error. Quite how that works isn't clear to me, but if that happens when the slave nodes begin to start up then the catch in defaultTracer will revert to event-log anyway. I think this means that the only sane way to use the debug tracing facility with simplelocalnet is to stick to the event log.

We could add another option to use say for tracing. That would make life easier for simplelocalnet users, but I don't want to add any code to Node.hs for configuring tracing on node start (as that would make the API messy and overly complicated) and especially not after starting - I tried that using the sendCtlMsg primitive and the only way to do it safely is to store the Tracer in an (T)MVar which is going to hurt performance every time we trace something. Plus with the current approach, a regular function in the Process monad can use the trace primitive without having to synchronise with other threads, which seems quite handy.

Owner

hyperthunk commented Jan 25, 2013

Ok, so I've just pushed an update to the debug-trace branch that supports doing tracing to the system logger process if the DISTRIBUTED_PROCESS_TRACE_FILE environment variable is not given (or the operation to open the relevant file handle fails) and the DISTRIBUTED_PROCESS_TRACE_CONSOLE is non-empty, then trace messages are sent transparently to the registered logger process. Otherwise they fall back to traceEventIO which of course will do nothing if event logging isn't enabled.

Take a quick look and let me know if that'll do the trick for simplelocalnet, then I'll merge - all the tests still pass with all three tracing options enabled (or not). I haven't checked the benchmarks to see what the performance implications are - this laptop wouldn't do much for the figures anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment