Debugging Eucalyptus C language components
The following information may be of interest to developers working on Eucalyptus components written in C.
Using client binaries
CC and NC can be queried using the CCclient_full and NCclient programs, respectively. These programs, located in the source directories of the respective components, allow command-line invocation of the component's API functions. Thus, CCclient_full impersonates the CLC and NCclient impersonates a CC. Before invoking the programs, the dynamic library search path must be set to include several Axis2 libraries, and the path to the root of the Eucalyptus installation (which is the system root / for a package-based installation) must be set so that the cryptographic credentials can be found:
    export EUCALYPTUS=/opt/eucalyptus
    export AXIS2C_HOME=/opt/eucalyptus/packages/axis2c-1.6.0/
    export LD_LIBRARY_PATH=$AXIS2C_HOME/lib:$AXIS2C_HOME/modules/rampart/
(NOTE: for most packaged installs, AXIS2C_HOME will be /usr/lib64/axis2c)
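Because a missing or mistyped variable usually shows up only as an obscure loader or credential error, it can help to check the environment before running a client. The following is a minimal sketch; check_client_env is a hypothetical helper name, not something shipped with Eucalyptus, and it only tests the three variables mentioned above.

```shell
# Sketch: sanity-check the environment before running CCclient_full or NCclient.
# check_client_env is a hypothetical helper, not part of Eucalyptus.
check_client_env() {
    status=0
    # EUCALYPTUS must point at the installation root (/ for packaged installs).
    if [ -z "${EUCALYPTUS:-}" ] || [ ! -d "$EUCALYPTUS" ]; then
        echo "EUCALYPTUS is unset or not a directory" >&2
        status=1
    fi
    # AXIS2C_HOME/lib must exist, since LD_LIBRARY_PATH points into it.
    if [ -z "${AXIS2C_HOME:-}" ] || [ ! -d "$AXIS2C_HOME/lib" ]; then
        echo "AXIS2C_HOME/lib not found" >&2
        status=1
    fi
    # The Axis2 library directory should be on the dynamic library search path.
    case ":${LD_LIBRARY_PATH:-}:" in
        *":${AXIS2C_HOME:-}/lib:"*) ;;
        *) echo "LD_LIBRARY_PATH does not include \$AXIS2C_HOME/lib" >&2
           status=1 ;;
    esac
    return $status
}

# Example use: check_client_env && $EUCALYPTUS_SRC/cluster/CCclient_full localhost:8774 describeNetworks
```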
Here is an example invocation of CCclient_full on a CC host where the Eucalyptus source tree in $EUCALYPTUS_SRC has been compiled:
    $EUCALYPTUS_SRC/cluster/CCclient_full localhost:8774 describeNetworks
    describenetworks returned status 1
    useVlans: 1 mode: MANAGED addrspernet: 32 addrIndexMin: 9 addrIndexMax: 30 vlanMin: 2 vlanMax: 127
    found 0 active nets
Here is an example invocation of NCclient on the same CC host (note the slight change in syntax relative to CCclient_full: the endpoint is specified with the -n option, which defaults to localhost:8775 if not specified):
    grep NODES $EUCALYPTUS/etc/eucalyptus/eucalyptus.conf
    NODES="192.168.51.165"
    $EUCALYPTUS_SRC/node/NCclient -n 192.168.51.165:8775 describeResource
    2012-10-10 14:20:36 DEBUG 000010036 ncStubCreate | DEBUG: requested URI http://192.168.51.165:8775/axis2/services/EucalyptusNC
    node status=[OK] memory=7792/7792 disk=2/2 cores=4/4 subnets=[none]
CC and NC can be debugged with gdb, which can be:
- used to analyze a core dump,
- attached to a live Apache process hosting CC or NC,
- used to start CC or NC under a debugger from the very beginning.
Each approach will be discussed in turn.
The commands below assume that $EUCALYPTUS is set to the root of the Eucalyptus installation: typically just / for package-based installs and often /opt/eucalyptus for from-source installations.
Core dumps are useful when a SEGFAULT is difficult to trigger manually, especially on CC, which does a lot of forking. You know your CC or NC is segfaulting when httpd-[cc|nc]_error_log contains lines similar to:
    [Wed Aug 29 14:41:07 2012] [notice] child pid 22520 exit signal Segmentation fault (11)
    [Wed Aug 29 14:41:13 2012] [notice] child pid 22555 exit signal Segmentation fault (11)
    [Wed Aug 29 14:41:19 2012] [notice] child pid 22579 exit signal Segmentation fault (11)
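Scanning the error log by eye gets tedious when a component crashes repeatedly. As a rough sketch, the two hypothetical helpers below count the segfault notices in a log file and list the PIDs of the children that died. The log file path is passed as an argument because its location depends on the installation.

```shell
# Sketch: summarize segfault notices in an Apache error log.
# Both function names are illustrative, not part of Eucalyptus.

# count_segfaults LOGFILE: print the number of segfault notices.
count_segfaults() {
    grep -c 'exit signal Segmentation fault' "$1"
}

# segfault_pids LOGFILE: print the PID of each child that segfaulted.
segfault_pids() {
    sed -n 's/.*child pid \([0-9][0-9]*\) exit signal Segmentation fault.*/\1/p' "$1"
}

# Example use, with the path to the log as the only argument:
#   count_segfaults /path/to/httpd-cc_error_log
#   segfault_pids /path/to/httpd-cc_error_log
```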
To ensure that CC produces a core dump, add the following line

    echo "CoreDumpDirectory /tmp" >>$EUCALYPTUS/etc/eucalyptus/httpd-cc.conf

at the end of the create_httpd_config() function in $EUCALYPTUS/etc/init.d/eucalyptus-cc. For NC, do the same with 'nc' instead of 'cc' in the paths above. For the change to take effect, stop the component, raise the core file size limit (in case it is too low), and start the component again:
    $EUCALYPTUS/etc/init.d/eucalyptus-cc stop
    ulimit -c unlimited
    $EUCALYPTUS/etc/init.d/eucalyptus-cc start
After that the error in the log should change to:
[Wed Aug 29 15:39:53 2012] [notice] child pid 6926 exit signal Segmentation fault (11), possible coredump in /tmp
The /tmp directory should now contain the core dump, which can be loaded into gdb:

    gdb /usr/sbin/httpd /tmp/core.9895
    ....
    Core was generated by `/usr/sbin/httpd -f /opt/eucalyptus/etc/eucalyptus/httpd-nc.conf'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x00007f73bf7a357c in vfprintf () from /lib64/libc.so.6
    Missing separate debuginfos, use: debuginfo-install httpd-2.2.15-15.el6.centos.1.x86_64
    (gdb)
If there is no coredump (after all, the message only said it was "possible"), you may want to try the method described in section 'Run Eucalyptus component under gdb' below.
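When a component crashes repeatedly, several core files may accumulate, and a backtrace is often all that is needed from each. The sketch below assumes core files are named core.PID in the dump directory (as in the example above) and that gdb is installed; both helper names are hypothetical. It relies on the standard gdb options -batch and -ex to run a command non-interactively.

```shell
# Sketch: pick the newest core file and print a backtrace without an
# interactive gdb session. Helper names are illustrative only.

# newest_core DIR: print the most recently modified core.* file in DIR,
# or nothing if there is none.
newest_core() {
    ls -t "$1"/core.* 2>/dev/null | head -n 1
}

# core_backtrace BINARY CORE: print backtraces of all threads in the core.
core_backtrace() {
    gdb -batch -ex 'thread apply all bt' "$1" "$2"
}

# Example use:
#   core=$(newest_core /tmp)
#   [ -n "$core" ] && core_backtrace /usr/sbin/httpd "$core"
```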
Attach gdb to a Eucalyptus component
Attaching to a running instance of the component is often sufficient to examine its memory state or to catch a reproducible SEGFAULT with the debugger attached.
The main difficulty is deciding which process to attach to and ensuring that the debugger follows the forks you want. The component log files (cc.log and nc.log) may reveal the PID of the thread of control that you are looking for.
NC is easier to debug because in steady state it consists of only two heavyweight processes: the core Apache daemon (running as root) and the Apache daemon with the Eucalyptus shared library loaded (running as an unprivileged user, shown as 500 in the listing below):

    # ps aux | grep eucalyptus/httpd
    root 22526 0.0 0.0 55168 1452 ? Ss 16:00 0:00 /usr/sbin/httpd -f /opt/eucalyptus/etc/eucalyptus/httpd-nc.conf
    500 22528 0.2 1.4 2105452 114548 ? Sl 16:00 0:00 /usr/sbin/httpd -f /opt/eucalyptus/etc/eucalyptus/httpd-nc.conf

Attaching gdb to the latter allows one to pause its execution, possibly set a breakpoint or inspect the state of its threads, and then either detach or let it run under the debugger. It is important not to pause the component for too long, since network request timeouts on the upstream component may eventually put the system into an unusual state:
    # gdb --pid=22528
    ....
    (gdb) info thread
      3 Thread 0x7f601f224700 (LWP 22537) 0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
      2 Thread 0x7f6018116700 (LWP 22541) 0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
    * 1 Thread 0x7f6093de77e0 (LWP 22528) 0x00007f6092423fff in accept4 () from /lib64/libc.so.6
    (gdb) cont
    Continuing.
    ^C
    Program received signal SIGINT, Interrupt.
    0x00007f6092423fff in accept4 () from /lib64/libc.so.6
    (gdb) detach
    Detaching from program: /usr/sbin/httpd, process 22528
    (gdb) quit
NC uses multiple threads, which can be examined interactively to identify them:
    (gdb) info thread
      3 Thread 0x7f601f224700 (LWP 22537) 0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
      2 Thread 0x7f6018116700 (LWP 22541) 0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
    * 1 Thread 0x7f6093de77e0 (LWP 22528) 0x00007f6092423fff in accept4 () from /lib64/libc.so.6
    (gdb) thread 2
    [Switching to thread 2 (Thread 0x7f6018116700 (LWP 22541))]#0 0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
    (gdb) bt
    #0  0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
    #1  0x00007f60923e6fd0 in sleep () from /lib64/libc.so.6
    #2  0x00007f608e0494c6 in monitoring_thread (arg=0x7f608e304520) at handlers.c:620
    #3  0x00007f60926d37f1 in start_thread () from /lib64/libpthread.so.0
    #4  0x00007f6092421ccd in clone () from /lib64/libc.so.6
    (gdb) thread 3
    [Switching to thread 3 (Thread 0x7f601f224700 (LWP 22537))]#0 0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
    (gdb) bt
    #0  0x00007f60923e715d in nanosleep () from /lib64/libc.so.6
    #1  0x00007f609241b124 in usleep () from /lib64/libc.so.6
    #2  0x00007f608e07fe40 in sensor_bottom_half () at sensor.c:54
    #3  0x00007f608e07fecb in sensor_thread (arg=0x0) at sensor.c:76
    #4  0x00007f60926d37f1 in start_thread () from /lib64/libpthread.so.0
    #5  0x00007f6092421ccd in clone () from /lib64/libc.so.6
    (gdb)
One can discern from the above that thread 2 is the monitoring_thread and thread 3 is the sensor_thread. If there were instances in the process of being started up, rebooted, or bundled, you would also see bundling_thread in the list.
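The same identification can be done without an interactive session by running gdb in batch mode and reading the entry function of each thread off the backtraces. The sketch below is one possible way to do it, assuming the output format of the gdb command thread apply all bt; thread_entry_points is a hypothetical helper name. Threads whose stack does not pass through start_thread (such as the main thread) are not reported.

```shell
# Sketch: name each pthread by the function that start_thread called.
# Reads the output of "thread apply all bt" on stdin.
thread_entry_points() {
    awk '
        # A header such as "Thread 3 (Thread 0x7f60... (LWP 22537)):" starts a new thread.
        /^Thread [0-9]+ / { tid = $2; prev = ""; next }
        /^#[0-9]+ / {
            # Frame lines usually read "#N  0xADDR in FUNCTION (...)"; the
            # innermost frame may omit the address, as in "#0  FUNCTION (...)".
            fn = $2
            for (i = 2; i < NF; i++)
                if ($i == "in") { fn = $(i + 1); break }
            # The frame listed just before start_thread is the entry function.
            if (fn == "start_thread" && prev != "")
                print "thread " tid ": " prev
            prev = fn
        }
    '
}

# Example use against a live NC process (PID taken from the ps listing):
#   gdb -batch -p 22528 -ex 'thread apply all bt' | thread_entry_points
```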
Run Eucalyptus component under gdb
Several environment variables must be set when starting a Eucalyptus component under gdb from the beginning:
    export EUCALYPTUS=/opt/eucalyptus
    export AXIS2C_HOME=/opt/eucalyptus/packages/axis2c-src-1.6.0/
    export LD_LIBRARY_PATH=$AXIS2C_HOME/lib:$AXIS2C_HOME/modules/rampart
    export PATH=$PATH:$EUCALYPTUS/usr/lib/eucalyptus
The first two are critical for any invocation; the last two may be needed, depending on the execution path of the component. Any running instance of the component must be shut down before invoking the component under the debugger. The name of the Apache binary depends on the distribution; the example below uses /usr/sbin/httpd:
    # gdb /usr/sbin/httpd
    ...
    Reading symbols from /usr/sbin/httpd...(no debugging symbols found)...done.
    Missing separate debuginfos, use: debuginfo-install httpd-2.2.15-15.el6.centos.1.x86_64
    (gdb) break monitoring_thread
    Function "monitoring_thread" not defined.
    Make breakpoint pending on future shared library load? (y or [n]) y
    Breakpoint 1 (monitoring_thread) pending.
    (gdb) run -X -f $EUCALYPTUS/etc/eucalyptus/httpd-nc.conf >/dev/null
    Starting program: /usr/sbin/httpd -X -f $EUCALYPTUS/etc/eucalyptus/httpd-nc.conf >/dev/null
    [Thread debugging using libthread_db enabled]
    [New Thread 0x7fff833d4700 (LWP 382)]
    Detaching after fork from child process 383.
    Detaching after fork from child process 385.
    [New Thread 0x7fff7c2c6700 (LWP 386)]
    [Switching to Thread 0x7fff7c2c6700 (LWP 386)]

    Breakpoint 1, monitoring_thread (arg=0x7ffff24b4520) at handlers.c:498
    498 logprintfl (EUCADEBUG, "spawning monitoring thread\n");
    (gdb) cont
    Continuing.
Note how setting a breakpoint before the Eucalyptus component shared library is loaded results in a 'not defined' message, after which gdb offers to make the breakpoint pending. Take care to type the breakpoint information accurately. For NC, the default policy of the debugger staying with the parent process is sufficient. For CC, which uses forks extensively, you may be able to reach the desired process by issuing set follow-fork-mode child at the gdb prompt.
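To avoid retyping these settings in every session, they can be kept in a gdb command file and loaded with the -x option. The sketch below writes such a file; write_cc_gdbinit and the file name are hypothetical, and the breakpoint line is left as a placeholder because the function of interest depends on what is being debugged. The two set commands are standard gdb settings.

```shell
# Sketch: generate a gdb command file for debugging a forking component.
# write_cc_gdbinit is an illustrative name, not part of Eucalyptus.
write_cc_gdbinit() {
    cat > "$1" <<'EOF'
# Accept breakpoints on symbols from shared libraries that are not loaded yet,
# without the interactive "Make breakpoint pending" question.
set breakpoint pending on
# After a fork, keep debugging the child instead of the parent.
set follow-fork-mode child
# Add a breakpoint here, for example: break <function of interest>
EOF
}

# Example use:
#   write_cc_gdbinit /tmp/cc.gdb
#   gdb -x /tmp/cc.gdb --args /usr/sbin/httpd -X -f $EUCALYPTUS/etc/eucalyptus/httpd-cc.conf
```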
Obtaining stack traces with pstack or gstack
Stack traces are useful indicators of what a process is doing at a point in time. For instance, analysis of the locks held by threads may help identify the cause of a deadlocked process. Although gdb can be attached to a Eucalyptus process to obtain stack traces, doing so is tedious when many processes are involved, as in the case of the CC. Using pstack (for CC) or gstack (for a threaded process, like NC) is a faster alternative, especially in combination with a bash for-loop. (On RHEL-family distributions such as the CentOS host used in these examples, the two commands are available as part of the gdb package.)
- For CC, the following command will print the top 10 stack frames of each process that makes up the CC:
for pid in `ps aux | grep euca | grep cc | cut -c 10-15 | xargs` ; do echo; echo $pid; pstack $pid | head -10 ; done | less
- For NC, the following command will do the same for stack state of both processes and threads that make up the NC:
for pid in `ps aux | grep euca | grep nc | cut -c 10-15 | xargs` ; do echo; echo $pid; gstack $pid | head -10 ; done | less
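The grep and cut pipelines above depend on the column layout of ps aux and can also match unrelated processes whose command line happens to contain "euca" and "cc" or "nc". A somewhat more selective variant, sketched below under the assumption that each component is started as httpd -f .../httpd-cc.conf or .../httpd-nc.conf (as in the ps listing earlier), picks PIDs by the configuration file passed with -f. The helper name pids_for_conf is hypothetical.

```shell
# Sketch: select PIDs by the Apache configuration file given after -f.
# Reads "PID COMMAND ARGS..." lines on stdin, as printed by: ps -eo pid=,args=
pids_for_conf() {
    awk -v conf="$1" '{
        n = length(conf)
        for (i = 2; i < NF; i++) {
            arg = $(i + 1)
            # Match only when the argument following -f ends with the file name.
            if ($i == "-f" && substr(arg, length(arg) - n + 1) == conf) {
                print $1
                break
            }
        }
    }'
}

# Example use for NC (replace gstack with pstack and nc with cc for the CC):
#   for pid in $(ps -eo pid=,args= | pids_for_conf httpd-nc.conf); do
#       echo; echo "$pid"; gstack "$pid" | head -10
#   done | less
```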
Sniffing CC's or NC's network traffic
Sniffing control network traffic between Eucalyptus components can help diagnose many problems, especially those related to syntax, signing, or timing of communication messages. Since the messages are in a semi-human-readable format (XML) and are not encrypted or compressed (only signed), not much processing is required to make sense of them.
Two important parameters for sniffing are the Ethernet device:
- lo for communication between co-located components (CLC and CC)
- eth0 for communication between distributed components (CC and NC)
and the TCP port:
- 8774 for CLC-CC communication
- 8775 for CC-NC communication
Even the most commonly available Unix tool, tcpdump, produces readable output with just a few flags:
tcpdump -i eth0 -Als0 port 8775
The output of such a command can be piped into less for searchable, paged output, or saved in raw format with the -w filename.dump option for later analysis, either with tcpdump or with other tools that can read the tcpdump format.
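The device and port choices listed above can be wrapped in a small helper so that the right pair is always used together. This is only a sketch: sniff_filter is a hypothetical name, and the interface names lo and eth0 are the ones assumed in the text (a co-located CLC and CC, and an NC reachable over eth0), which may differ on a given host.

```shell
# Sketch: map a component pair to tcpdump interface and port arguments.
sniff_filter() {
    case "$1" in
        clc-cc) echo "-i lo port 8774" ;;    # co-located CLC and CC
        cc-nc)  echo "-i eth0 port 8775" ;;  # CC talking to a remote NC
        *) echo "usage: sniff_filter clc-cc|cc-nc" >&2
           return 1 ;;
    esac
}

# Example use (capturing usually requires root):
#   tcpdump -s0 -w cc-nc.dump $(sniff_filter cc-nc)   # save raw packets
#   tcpdump -A -r cc-nc.dump | less                   # read them back as ASCII
```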
Also commonly available on Unix systems is ngrep, which is designed for searching for strings in network traffic. For instance, the following expression looks for packets containing a describe message (such as the DescribeResource, DescribeInstances, and DescribeSensors queries that periodically traverse the system):
ngrep -d eth0 -qi describe port 8775
For extracting the content of specific TCP flows (i.e., data flowing in one direction on a connection), a tool called tcpflow can be useful. It is not commonly available in package repositories, but it is easy enough to install from source:
    pushd /tmp
    wget https://github.com/downloads/simsong/tcpflow/tcpflow-1.3.0.tar.gz
    tar zxvf tcpflow-1.3.0.tar.gz
    cd tcpflow-1.3.0
    yum -y install gcc-c++ libpcap-devel
    ./configure
    make
    sudo make install
    popd
After running the tool for a while to capture packets, one can examine individual flows with less or with tools capable of pretty-printing the XML that makes up SOAP messages in Eucalyptus:
    mkdir tcpflows
    cd tcpflows
    tcpflow -i eth0 port 8775
    ^C
    less *
Here we select the messages containing the string DescribeInstance, concatenate them, wrap them in a top-level element <trace> (a name that is as good as any for the purpose), and pass the result to xmlstarlet:
    yum install xmlstarlet
    echo '<trace>'`grep -li describeinstance * | xargs grep --no-filename soapenv`'</trace>' | xmlstarlet fo | less