Monitoring Java Servers On Mac OS Lion Causes Monitored Server To Segfault #277

Closed
giampaolo opened this issue May 23, 2014 · 5 comments

@giampaolo
Owner

From shane.c....@gmail.com on June 10, 2012 05:16:51

Running the following code to monitor a Java-based server on Mac OS Lion causes 
the monitored server to segfault. I tried this with several industrial-grade 
servers (ActiveMQ, Tomcat, etc.), and each time the server crashed after several 
minutes when the script was run in a five-second loop. 

I want to be clear - it is not the Python script that fails - it is the Java 
server process being monitored that consistently segfaults. I tried with 
non-Java processes (Firefox, etc.) and did not observe the same behavior. 

    #!/usr/bin/env python

    import psutil
    import sys

    proc = None

    # Find the server we are looking for.
    for ps in psutil.process_iter():
        try:
            if ps.name == "java":
                for cmd in ps.cmdline:
                    if "apache-activemq-5.4.2" in cmd:
                        proc = ps
                        break
            if proc is not None:
                break

        except Exception:
            # Skip processes that vanish or deny access while we iterate.
            pass

    if not proc:
        print "SERVER NOT RUNNING..."
        sys.exit(1)


    print " CPU:    {0:15.1f}%".format(proc.get_cpu_percent())
    print " U Time: {0:15.1f}s".format(proc.get_cpu_times().user)
    print " S Time: {0:15.1f}s".format(proc.get_cpu_times().system)
    print " Memory: {0:15.1f}%".format(proc.get_memory_percent())
    print " Threads:{0:13d}".format( proc.get_num_threads() )
    print " Files:  {0:13d}".format( len(proc.get_open_files()) )
    print " INET:   {0:13d}".format( len(proc.get_connections()) )
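
For anyone trying to reproduce this on a newer psutil: the script above uses the property-style API of the time (`ps.name`, `ps.cmdline`, `get_*` methods). Below is a minimal sketch of the same lookup written against the method-style API of recent releases. The helper name `find_process` is made up for illustration, and it assumes the iterable yields objects with an `info` dict, which is what `psutil.process_iter(["name", "cmdline"])` provides in recent versions.

```python
def find_process(procs, name, needle):
    """Return the first process named `name` whose command line mentions `needle`.

    `procs` is any iterable of objects exposing an `info` dict with "name"
    and "cmdline" keys (assumption: this is what
    psutil.process_iter(["name", "cmdline"]) yields).
    """
    for p in procs:
        info = p.info
        if info.get("name") != name:
            continue
        # cmdline can be None when access to that process was denied.
        cmdline = info.get("cmdline") or []
        if any(needle in arg for arg in cmdline):
            return p
    return None


# Intended use (not run here, requires psutil to be installed):
#   import psutil
#   proc = find_process(psutil.process_iter(["name", "cmdline"]),
#                       "java", "apache-activemq-5.4.2")
```

This only locates the process; it says nothing about whether the segfault still occurs on newer versions.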



What is the expected output?  
The service being monitored should continue to run.



What do you see instead?  
Segmentation fault: 11

Original issue: http://code.google.com/p/psutil/issues/detail?id=277

@giampaolo
Owner Author

From jlo...@gmail.com on June 09, 2012 20:57:35

Hi Shane,

Since you're able to reproduce the problem easily on your system, can you try 
narrowing the reproduction steps down to the smallest test case? For example, 
does the problem come from one of the specific calls below? 

    print " CPU:    {0:15.1f}%".format(proc.get_cpu_percent())
    print " U Time: {0:15.1f}s".format(proc.get_cpu_times().user)
    print " S Time: {0:15.1f}s".format(proc.get_cpu_times().system)
    print " Memory: {0:15.1f}%".format(proc.get_memory_percent())
    print " Threads:{0:13d}".format( proc.get_num_threads() )
    print " Files:  {0:13d}".format( len(proc.get_open_files()) )
    print " INET:   {0:13d}".format( len(proc.get_connections()) )

It would be very helpful to determine specifically which feature of psutil 
seems to be causing a problem for the Java process. If you are getting a 
HotSpot crash dump from the JVM, it would also be helpful to include it here. 

Thanks
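
One way to do that narrowing is to run a single call (or a chosen few) in the same loop and watch whether the server survives. The following is a rough sketch, not a tested reproduction: the `PROBES` table and `run_probes` helper are hypothetical names, the method names are the current psutil spellings of the calls listed above, and the connections call is left out because its name differs between psutil versions.

```python
import time

# Hypothetical probe table: the labels are illustrative; each lambda wraps one
# of the calls from the script, using current psutil method names.
PROBES = {
    "cpu_percent": lambda p: p.cpu_percent(),
    "cpu_times": lambda p: p.cpu_times(),
    "memory_percent": lambda p: p.memory_percent(),
    "num_threads": lambda p: p.num_threads(),
    "open_files": lambda p: len(p.open_files()),
}


def run_probes(proc, names, rounds, interval=5.0, sleep=time.sleep):
    """Call only the named probes against `proc`, once per round.

    Returns the number of completed rounds. Stops early as soon as
    proc.is_running() reports that the monitored process is gone.
    """
    completed = 0
    for _ in range(rounds):
        if not proc.is_running():
            break
        for name in names:
            PROBES[name](proc)
        completed += 1
        sleep(interval)
    return completed
```

For example, `run_probes(proc, ["open_files"], rounds=120)` would exercise only the open-files call for about ten minutes at the five-second interval. A return value below `rounds` means the server went away before the run finished.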

@giampaolo
Owner Author

From shane.c....@gmail.com on June 09, 2012 21:20:50

Yes. I had actually been doing this in the background - I ran each of these 
calls individually for ~10 minutes and could not reproduce the segfault. Within 
minutes of running them all together again, the segfault came back. So it does 
not appear to be a single call, but some combination of several. I will try 
combining them and see what I can come up with.
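
Since single calls do not trigger it, the search space becomes subsets of calls. A small sketch of enumerating those subsets smallest-first, so that the first subset that crashes the server is also a minimal one. The helper name `probe_subsets` and the labels in the usage note are illustrative only.

```python
from itertools import combinations


def probe_subsets(names, min_size=2):
    """Yield every subset of `names` as a tuple, smallest subsets first.

    Starts at `min_size` because single calls were already ruled out.
    """
    for size in range(min_size, len(names) + 1):
        for subset in combinations(names, size):
            yield subset
```

With the seven calls from the script this gives 120 subsets of size two or more, so at ~10 minutes per run an exhaustive search is slow; trying the pairs first (21 of them) is probably the practical starting point.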

@giampaolo
Owner Author

From shane.c....@gmail.com on June 09, 2012 22:37:29

OK - I have caused it to happen with this combination:

    print " S Time: {0:15.1f}s".format(proc.get_cpu_times().system)
    print " Memory: {0:15.1f}%".format(proc.get_memory_percent())
    print " Threads:{0:13d}".format( proc.get_num_threads() )
    print " Files:  {0:13d}".format( len(proc.get_open_files()) )

This was the smallest combination that reproduced it. Is it possible that this 
is a timing issue - that it does not really depend on which calls we make, but 
on how long we spend making them (i.e., the longer I spend working with the 
proc object, the greater the chance that the error will occur)? If so, I could 
reduce that time by building the string first and then printing it all at 
once - but I don't like the idea that the thing I am using to monitor my 
applications is the one that is murdering them :)

I will turn debugging on in the JVM and see if I can get more information there.
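
A sketch of the "collect first, print once" idea, which also records how long each call takes so the timing theory can be checked against real numbers. The names `timed_snapshot` and `format_snapshot` are made up for this example, and the method names are the current psutil spellings of the four calls above; nothing here is confirmed to avoid the crash.

```python
import time


def timed_snapshot(proc, clock=time.perf_counter):
    """Gather the four metrics up front, timing each call separately.

    Returns (values, elapsed): two dicts keyed by metric label.
    """
    calls = [
        ("system_time", lambda: proc.cpu_times().system),
        ("memory_percent", proc.memory_percent),
        ("num_threads", proc.num_threads),
        ("open_files", lambda: len(proc.open_files())),
    ]
    values, elapsed = {}, {}
    for label, call in calls:
        start = clock()
        values[label] = call()
        elapsed[label] = clock() - start
    return values, elapsed


def format_snapshot(values):
    """Build the whole report as one string so it can be printed in one go."""
    return "\n".join([
        " S Time: {0:15.1f}s".format(values["system_time"]),
        " Memory: {0:15.1f}%".format(values["memory_percent"]),
        " Threads:{0:13d}".format(values["num_threads"]),
        " Files:  {0:13d}".format(values["open_files"]),
    ])
```

If the `elapsed` figures for a crashing run look no different from a clean one, that would count against the pure timing explanation.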

@giampaolo
Owner Author

From g.rodola on February 24, 2013 13:59:45

Any news about this?

@giampaolo
Owner Author

Closing as outdated.
