Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIX support #1123

Merged
merged 9 commits into from Sep 26, 2017
Merged

AIX support #1123

merged 9 commits into from Sep 26, 2017

Conversation

wiggin15
Copy link
Collaborator

@wiggin15 wiggin15 commented Sep 5, 2017

Following #605, this is the proposed branch for AIX support.

The following methods and methods are not supported in AIX in this branch:

psutil.Process.memory_maps
psutil.Process.num_ctx_switches

Not all unit tests pass on AIX at this time (I didn't make any changes to unit tests yet).

Copy link
Owner

@giampaolo giampaolo left a comment

Here's an initial review. Overall, really great work!
There are more issues but the ones I gave thus far are enough to keep you busy for quite a while.

psutil/TODO.aix Outdated
Process.io_counters read count is always 0


TestSystemAPIs.test_pid_exists_2 there are pids in /proc that don't really exist
Copy link
Owner

@giampaolo giampaolo Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean, say, a /proc/54 directory exists but there's no such PID?

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, AIX is weird like that.

Copy link
Owner

@giampaolo giampaolo Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's really unfortunate. How about doing this then?

def pids():
    """Returns a list of PIDs currently running on the system."""
    return [int(x) for x in os.listdir('/proc') if x.isdigit() \
            and  _psposix.pid_exists(int(x))]

Does this test pass?

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes test_pid_exists_2 but causes test_pids to break because ps still returns the processes that "don't really exist" and then we get this:

  File "/root/psutil/psutil/tests/test_posix.py", line 297, in test_pids
     self.fail("difference: " + str(difference))
AssertionError: difference: [131076, 196614, 1048608, 1114146, 1179684, 1376298, 1638568]

psutil/TODO.aix Outdated


TestSystemAPIs.test_pid_exists_2 there are pids in /proc that don't really exist
TestProcess.test_name isolated python calls execve which changes process name
Copy link
Owner

@giampaolo giampaolo Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On our environment we're using a custom version of Python, which on AIX uses a trick utilizing execve. This means the test will work on regular environments but not on ours specifically. I'm not sure this file should really be included in the repo (I'm also not sure it's up-to-date), this file was created for note taking when debugging the tests.

Copy link
Owner

@giampaolo giampaolo Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by this then. Simply make sure name() returns the expected string (in case of a Python process it should be "python".

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

execve causes the name to change so sys.executable and name don't match.

AssertionError: ('python2.7', 'python2.7.bin')

This is implementation detail in our environment and can be ignored.
I found out that the set of tests that fail is very different from before (it's more stable now too, i.e. less "flaky" tests). This file will be updated in the next commit.

psutil/TODO.aix Outdated

TestSystemAPIs.test_pid_exists_2 there are pids in /proc that don't really exist
TestProcess.test_name isolated python calls execve which changes process name
TestProcess.test_num_fds opening a socket doesn't create fd in /proc/pid/fd (until data is sent??)
Copy link
Owner

@giampaolo giampaolo Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean num_fds() is unreliable?

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I guess so

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently this doesn't fail any more

psutil/TODO.aix Outdated
TestSystemAPIs.test_pid_exists_2 there are pids in /proc that don't really exist
TestProcess.test_name isolated python calls execve which changes process name
TestProcess.test_num_fds opening a socket doesn't create fd in /proc/pid/fd (until data is sent??)
TestProcess.test_open_files /dev/null shows in open_files but it isn't a file
Copy link
Owner

@giampaolo giampaolo Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you just skip it?

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. There are probably many changes to the tests I can do to make them pass. I deliberately did not change the unit tests for now to see what breaks - but I should before this will be really ready to be merged.

psutil/TODO.aix Outdated
TestProcess.test_name isolated python calls execve which changes process name
TestProcess.test_num_fds opening a socket doesn't create fd in /proc/pid/fd (until data is sent??)
TestProcess.test_open_files /dev/null shows in open_files but it isn't a file
TestProcess.test_pid_0 pid 0 doesn't have a name on AIX
Copy link
Owner

@giampaolo giampaolo Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does ps list it? If not is there some other task manager on AIX that does? If not I would advice to return "kernel".

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"ps" calls PID 0 "swapper". I can return this name specifically for the case of pid=0.

Copy link
Owner

@giampaolo giampaolo Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Also make sure pids() return PID 0 and pid_exists(0) return True. If they don't by default then you can simply hard-code it. ps should be used as the leading reference and psutil should behave the same as ps.

error:
Py_XDECREF(py_cputime);
Py_DECREF(py_retlist);
free(cpu);
Copy link
Owner

@giampaolo giampaolo Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should free() it only if it hasn't been malloc()'ed before. You can set cpu = NULL; earlier and do if (cpu != NULL) {free(cpu);} here

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't get to error without allocating cpu, so this is safe.

diskt = (perfstat_disk_t *)calloc(disk_count,
sizeof(perfstat_disk_t));
if (diskt == NULL) {
PyErr_SetFromErrno(PyExc_OSError);
Copy link
Owner

@giampaolo giampaolo Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyErr_NoMemory

);
if (py_disk_info == NULL)
goto error;
if (PyDict_SetItemString(py_retdict, diskt[i].name,
Copy link
Owner

@giampaolo giampaolo Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -0,0 +1,25 @@
/*
* Copyright (c) 2009, Giampaolo Rodola'. All rights reserved.
Copy link
Owner

@giampaolo giampaolo Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add your name. Also do the same in other new files (copy license and add your name).

@@ -688,7 +692,7 @@ void init_psutil_posix(void)
PyObject *module = Py_InitModule("_psutil_posix", PsutilMethods);
#endif

#if defined(PSUTIL_BSD) || defined(PSUTIL_OSX) || defined(PSUTIL_SUNOS)
#if defined(PSUTIL_BSD) || defined(PSUTIL_OSX) || defined(PSUTIL_SUNOS) || defined(_AIX)
Copy link
Owner

@giampaolo giampaolo Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should define PSUTIL_AIX in setup.py and use it in here

@wiggin15
Copy link
Collaborator Author

wiggin15 commented Sep 7, 2017

I added a new commit. Still no AIX-specific tests and no unicode string handling, but I fixed some broken functions and started changing the unit tests a bit. I think this looks better.

# an empty list means there were no connections for process or
# process is no longer active so we force NSP in case the PID
# is no longer there.
if not ret:
Copy link
Owner

@giampaolo giampaolo Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to remove the if not ret line. There may be some connections but the process may have disappeared beforehand so we basically want to always call os.stat

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is again the same as in _pssunos.py

psutil/_psaix.py Outdated
@wrap_exceptions
def terminal(self):
psinfo = self._proc_basic_info()
ttydev = psinfo[-1]
Copy link
Owner

@giampaolo giampaolo Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you better use a "fixed" integer (5, 6 or whatever) because if you add a new element to the returned list this will break

psutil/_psaix.py Outdated
if PY3:
stdout, stderr = [x.decode(sys.stdout.encoding)
for x in (stdout, stderr)]
if "no such process" in stderr:
Copy link
Owner

@giampaolo giampaolo Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps do stderr.lower()?

psutil/_psaix.py Outdated
path = path.strip()
if path.startswith("//"):
path = path[1:]
if path == "Cannot be retrieved":
Copy link
Owner

@giampaolo giampaolo Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps do path.lower()?

Py_XDECREF(py_tuple);
Py_DECREF(py_retlist);
if (ut != NULL)
endutent();
Copy link
Owner

@giampaolo giampaolo Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You use endutxent(); on line 288, not this. Why?

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you're right.
Again, this is the same as in the sunos implementation.

}
}
endutxent();
if (boot_time != 0.0) {
Copy link
Owner

@giampaolo giampaolo Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there cases where time can be 0.0? Perhaps we need a comment here?

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. This too is copied from _psutil_sunos.c.

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boot_time will be 0.0 if the loop breaks without finding BOOT_TIME. We can add a comment, but again this is common implementation with Solaris.

Copy link
Owner

@giampaolo giampaolo Sep 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah. I now realize that sunos has different "minor" issues like this one (and others I mentioned here). I will stop being picky and just focus on more important aspects. =)

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are definitely some comments that are relevant and should be fixed in Solaris too. There are also a number of tests that fail in Solaris as well as on AIX that can be fixed in both. I'm keeping track of your comments and may submit a separate PR to fix them after this one.

cpu.syscall,
cpu.devintrs,
cpu.softintrs,
cpu.traps
Copy link
Owner

@giampaolo giampaolo Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In TODO.aix you state all of these numbers are 0, correct?

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all of them, only syscall and traps. I'm not sure why.

Copy link
Owner

@giampaolo giampaolo Sep 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK then. If it's an OS problem there's nothing we can do.

Copy link
Owner

@giampaolo giampaolo Sep 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just add 2 XXX comments to signal that syscall and traps fields are always 0.

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh. My bad, there was a bug here (I don't like it when I have bugs :/). The values are ulonglong, so should be built with K instead of I. The values are not 0.

Copy link
Owner

@giampaolo giampaolo Sep 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect. Just update the doc/TODO.aix then

inet_ntop(fam, raddr, raddr_str, sizeof(raddr_str));
return Py_BuildValue("(iii(si)(si)ii)", fd, fam,
s.so_type, laddr_str, lport, raddr_str,
rport, state, pid);
Copy link
Owner

@giampaolo giampaolo Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect returning 2 tuples in a single shot like this ((si)) will leak memory. You need 3 tuples, 2 per addresses and 1 per connection. It's tedious work but you can pretty much copy & paste from here:

psutil/psutil/_psutil_osx.c

Lines 1332 to 1349 in f435c2b

py_laddr = Py_BuildValue("(si)", lip, lport);
if (!py_laddr)
goto error;
if (rport != 0)
py_raddr = Py_BuildValue("(si)", rip, rport);
else
py_raddr = Py_BuildValue("()");
if (!py_raddr)
goto error;
// construct the python list
py_tuple = Py_BuildValue(
"(iiiNNi)", fd, family, type, py_laddr, py_raddr, state);
if (!py_tuple)
goto error;
if (PyList_Append(py_retlist, py_tuple))
goto error;
Py_DECREF(py_tuple);

Copy link
Owner

@giampaolo giampaolo Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: you can use make test-memleaks to make sure of this.

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the result of make test-memleaks if there is a memory leak? Running this command returned 3 failed test:

test_memory_leaks.TestProcessObjectLeaks.test_num_ctx_switches
test_memory_leaks.TestTerminatedProcessLeaks.test_io_counters
test_memory_leaks.TestTerminatedProcessLeaks.test_num_ctx_switches

The failures don't seem to be due to memory leaks.

Copy link
Owner

@giampaolo giampaolo Sep 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's think about memory leaks later, after you fixed unicode strings and /proc fs location.

msz = (size_t)(PROCSIZE * PROCINFO_INCR);
processes = (struct procentry64 *)malloc(msz);
if (!processes) {
PyErr_SetFromErrno(PyExc_OSError);
Copy link
Owner

@giampaolo giampaolo Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyErr_NoMemory()

msz = (size_t)(PROCSIZE * (Np + PROCINFO_INCR));
processes = (struct procentry64 *)realloc((char *)processes, msz);
if (!processes) {
PyErr_SetFromErrno(PyExc_OSError);
Copy link
Owner

@giampaolo giampaolo Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyErr_NoMemory()

if (!fds) {
fds = (struct fdsinfo64 *)malloc((size_t)FDSINFOSIZE);
if (!fds) {
PyErr_SetFromErrno(PyExc_OSError);
Copy link
Owner

@giampaolo giampaolo Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyErr_NoMemory()

setup.py Outdated
@@ -254,6 +264,8 @@ def get_ethtool_macro():
if platform.release() == '5.10':
posix_extension.sources.append('psutil/arch/solaris/v10/ifaddrs.c')
posix_extension.define_macros.append(('PSUTIL_SUNOS10', 1))
if AIX:
Copy link
Owner

@giampaolo giampaolo Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif is better

@giampaolo
Copy link
Owner

giampaolo commented Sep 8, 2017

Good work on addressing the first set of comments! I'm gonna add another big request for change.
Right now "/proc" is hard coded both in the py and the C files. It should be possible to change that via psutil.PROCFS_PATH (it's already possible to do this on Linux and SunOS). Here's how to do that.

Copy this in _psaix.py:
https://github.com/Infinidat/psutil/blob/cac89d522d64571a53011cd0914a654a9c5e2869/psutil/_pslinux.py#L215
If you prefer (even better) you can move that function in _common.py so that you can share it between all the 3 platforms.

Then define a new attr in the Process class like this:
https://github.com/Infinidat/psutil/blob/cac89d522d64571a53011cd0914a654a9c5e2869/psutil/_pslinux.py#L1392
...and pass that string to the underlying C functions that need it.

On the C side of things you'll now have to accept 2 args, the PID and /proc path prefix, like this:
https://github.com/Infinidat/psutil/blob/cac89d522d64571a53011cd0914a654a9c5e2869/psutil/_psutil_sunos.c#L96-L101

Not sure if I've been clear (I hope =)).

@wiggin15
Copy link
Collaborator Author

wiggin15 commented Sep 10, 2017

What's the reason for making /proc changeable? Is this configurable on the operating system?

@giampaolo
Copy link
Owner

giampaolo commented Sep 10, 2017

What's the reason for making /proc changeable?

#558

@wiggin15
Copy link
Collaborator Author

wiggin15 commented Sep 10, 2017

I think, instead of moving the get function to _common, we can move this function to the main __init__.py file and then we won't have to use the sys.modules trick. Something like this in __init__:

if LINUX or SOLARIS or AIX:
    # This is a public writable constant which is read from platform-specific code
    PROCFS_PATH = "/proc"

    def get_procfs_path():
        return PROCFS_PATH

What do you think?
In addition, I noticed that the BSD code is still using hard-coded /proc. Is it not relevant in that platform? Maybe we can change to if POSIX instead of Linux/Solaris/AIX if the BSD module is changed.

@giampaolo
Copy link
Owner

giampaolo commented Sep 10, 2017

How can you use that from _ps*.py modules if it's defined in __init__.py? It's __init__ which imports those module not vice versa. Anyway it looks like something which can be dealt with in another PR so for the time being you can also define it in _psaix.py and then we can move it later.

In addition, I noticed that the BSD code is still using hard-coded /proc. Is it not relevant in that platform?

It's not, BSD systems do not rely on /proc.

@wiggin15
Copy link
Collaborator Author

wiggin15 commented Sep 10, 2017

How can you use that from _ps*.py modules if it's defined in init.py?

It's a matter of import order. If you define it in __init__ before importing the platform-specific modules, they will succeed in importing from __init__. Maybe not the best way to do it though.

BSD systems do not rely on /proc.

_psbsd.py has a few readlinks and opens from /proc (the C extension doesn't, but still..)
EDIT: I see it's only on NetBSD

@wiggin15
Copy link
Collaborator Author

wiggin15 commented Sep 10, 2017

Added more commits :)

Copy link
Owner

@giampaolo giampaolo left a comment

Great work with unicode handling and /proc changes!
I added some comments about the C code.
When those are fixed I suppose we can start looking at the the output of make test and make test-memleaks.

psutil/TODO.aix Outdated
test_procinfo missing API
test_cpu_stats cpu_stats always returns ctx_switches=0
test_disk_usage df returns "-" for all procfs fields but API returns something real
test_pid_exists_2 there are pids in /proc that don't really exist
Copy link
Owner

@giampaolo giampaolo Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still worries me and should be discussed. I suppose I'll have a better idea of what this means once I look at the test results.

switches performed by this process.
"""
return self._proc.num_ctx_switches()
if not AIX:
Copy link
Owner

@giampaolo giampaolo Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest using if hasattr(_psplatform.Process, "num_ctx_switches"): instead

@@ -2110,7 +2120,7 @@ Constants
.. _const-procfs_path:
.. data:: PROCFS_PATH

The path of the /proc filesystem on Linux and Solaris (defaults to
The path of the /proc filesystem on Linux, Solaris and AIX (defaults to
``"/proc"``).
You may want to re-set this constant right after importing psutil in case
your /proc filesystem is mounted elsewhere or if you want to retrieve
Copy link
Owner

@giampaolo giampaolo Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add the new AIX constant in the doc and mark it as .. versionadded:: 5.4.0.

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs have a section called .. _const-oses:. Is it ok if I add AIX to this section with:

.. versionchanged:: 5.4.0 added AIX

in order to keep this const in the same group as the rest?

fam = ifr->ifr_addr.sa_family;

if (fam == AF_INET || fam == AF_INET6) {
cifa = (struct ifaddrs *) calloc(1, sizeof(struct ifaddrs));
Copy link
Owner

@giampaolo giampaolo Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you need to check calloc return value

if (lseek64(Kd, (off64_t)addr, L_SET) == (off64_t)-1)
return(1);
br = read(Kd, buf, len);
return((br == len) ? 0 : 1);
Copy link
Owner

@giampaolo giampaolo Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly you return 1 (meaning error) in case of len/size mismatch. Whereas the previous lseek and read calls set errno, this line (53) won't, so I think you need to do this:

if (br != len) {
    PyErr_SetString(PyExc_RuntimeError, 
                    "size mismatch when reading kernel memory fd");
    return 1;
}
return 0;

Also, since you set the exception in here, you should do the same for the 2 calls above (lseek and read) and do PyErr_SetFromErrno(PyExc_OSError);.
Of course the callers of this function should just return NULL.

if (fam == AF_INET || fam == AF_INET6) {
/* Read protocol control block */
if (!s.so_pcb
|| psutil_kread(Kd, (KA_T) s.so_pcb, (char *) &inp, sizeof(inp))) {
Copy link
Owner

@giampaolo giampaolo Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here

return NULL;
}
if ((KA_T) f.f_data != (KA_T) unp.unp_socket) {
PyErr_SetFromErrno(PyExc_OSError);
Copy link
Owner

@giampaolo giampaolo Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmm I suppose errno will not be set here, so this will fail with an "empty" exception. Perhaps you want RuntimeError?

size_t msz;
pid32_t requested_pid;
pid32_t pid;
int Np = 0; /* number of processes */
Copy link
Owner

@giampaolo giampaolo Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not call it num_procs? It would be less confusing also because there's another np variable.

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from lsof code, which is pretty terrible:
https://fossies.org/dox/lsof_4.89_src/aix_2dproc_8c_source.html
Interestingly both np and Np are for the number of processes in processes. Np is the number allocated and np is the number read. I know these are not good names but I prefer to stick to the original names for comparing to the original code (it helped just now).

Copy link
Owner

@giampaolo giampaolo Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. OK then. And yes, I remember lsof source code: I remember I gave up because I didn't understand it. =)


Kd = open(KMEM, O_RDONLY, 0);
if (Kd < 0) {
PyErr_SetFromErrno(PyExc_OSError);
Copy link
Owner

@giampaolo giampaolo Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you can use PyErr_SetFromErrnoWithFilename instead

}

if (i > 0)
np += i;
Copy link
Owner

@giampaolo giampaolo Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this on purpose?

Copy link
Collaborator Author

@wiggin15 wiggin15 Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also from lsof code, line 476. This code is tricky but it's correct. We're reading the processes into p in PROCINFO_INCR increments. We realloc the next "block" for processes every time we read a full PROCINFO_INCR block. The last call can return less than PROCINFO_INCR processes, but we still read that block and need to go over it, so np must be increased by the number of processes in the last block - which is in i. See the docs for getprocs64
Confusing, I know.

@wiggin15
Copy link
Collaborator Author

wiggin15 commented Sep 12, 2017

Added a new commit and uploaded the latest test results to https://gist.github.com/wiggin15/1d76d1dc1ad50e6665fe4dc5c6fa898a
please ignore 2 tests that fail with python2.7.bin, this is caused by a trick we're using on our environment.

@giampaolo
Copy link
Owner

giampaolo commented Sep 13, 2017

  File "/root/psutil/scripts/procinfo.py", line 228, in run
    print_("ctx-switches", str_ntuple(pinfo['num_ctx_switches']))
KeyError: 'num_ctx_switches'

This is easy: just change scripts/procinfo.py so that it won't print this info on AIX

  File "/root/psutil/psutil/tests/test_posix.py", line 368, in df
    total = int(fields[1]) * 1024
ValueError: invalid literal for int() with base 10: '-'

This means df -k output is different on AIX compared to other POSIXes. You can simply skip thi test on AIX.

  File "/root/psutil/psutil/tests/test_contracts.py", line 541, in cwd
    self.assertIsInstance(ret, str)
AssertionError: None is not an instance of <type 'str'>

Just return an empty string in cwd() method of _psaix.py instead of None.

======================================================================
FAIL: psutil.tests.test_misc.TestMisc.test_setup_script
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/psutil/psutil/tests/test_misc.py", line 369, in test_setup_script
    self.assertRaises(SystemExit, module.setup)
AssertionError: SystemExit not raised

Just skip this one.

======================================================================
FAIL: psutil.tests.test_misc.TestProcessUtils.test_create_zombie_proc
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/psutil/psutil/tests/test_misc.py", line 950, in test_create_zombie_proc
    zpid = create_zombie_proc()
  File "/root/psutil/psutil/tests/__init__.py", line 366, in create_zombie_proc
    call_until(lambda: zproc.status(), "ret == psutil.STATUS_ZOMBIE")
  File "/root/psutil/psutil/tests/__init__.py", line 593, in wrapper
    return fun(*args, **kwargs)
  File "/root/psutil/psutil/tests/__init__.py", line 643, in call_until
    assert eval(expr)
AssertionError

OK, you said create_zombie_proc() doesn't work on AIX. Have you checked whether ps is able to detect it though? I just want to be sure it's not status() method which is broken. If we make sure it's not and we don't find a way to correctly create a zombie process via unit tests then let' just skip this.


  File "/root/psutil/psutil/tests/test_process.py", line 720, in test_cmdline
    ' '.join(cmdline))
AssertionError: '/root/python/bin/python2.7 -c' != '/root/python/bin/python2.7 -c import time; time.sleep(60)'

I would just change the test and do a comparison of the first 30 chars if you're on AIX, else just do what we're currently doing.

  File "/root/psutil/psutil/tests/test_process.py", line 322, in test_io_counters
    self.assertGreater(io2.read_count, io1.read_count)
AssertionError: 0L not greater than 0L

Same here. If we know io_counters()'s read_count is always zero just skip this specific test line when on AIX.

======================================================================
FAIL: psutil.tests.test_process.TestProcess.test_prog_w_funky_name
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/psutil/psutil/tests/test_process.py", line 755, in test_prog_w_funky_name
    self.assertEqual(p.cmdline(), cmdline)
AssertionError: Lists differ: [''] != ['/root/psutil/$testfnfoo bar ...

This is bad and should be fixed. The test may also not pass but we should never have [''] returned.


======================================================================
FAIL: psutil.tests.test_system.TestSystemAPIs.test_cpu_stats
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/psutil/psutil/tests/test_system.py", line 746, in test_cpu_stats
    self.assertGreater(value, 0)
AssertionError: 0 not greater than 0

Again, if we know 'ctx_switches' and 'interrupts' are always 0 just skip this test on solaris (just add a comment / note)

======================================================================
FAIL: psutil.tests.test_system.TestSystemAPIs.test_pid_exists_2
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/psutil/psutil/tests/test_system.py", line 238, in test_pid_exists_2
    self.fail(pid)
AssertionError: 260

This is bad. It means pid_exists() is broken. Does AIX have /proc/{pid}/status? Linux uses a specific logic for pid_exists() as it takes the Tid (thread ID) into account. I wonder if AIX does the same.

What about memory leak tests?

@giampaolo
Copy link
Owner

giampaolo commented Sep 13, 2017

Note: it may also be pids() returning a PID which does not exist, not necessarily pid_exists which is broken. Anyway, it's either one or the other. ps is your friend and should tell you who's wrong.

@wiggin15
Copy link
Collaborator Author

wiggin15 commented Sep 14, 2017

Added a new commit

Just return an empty string in cwd() method of _psaix.py instead of None.

I noticed that the test for cwd already checks whether the return value is None (which can also happen on BSD and Solaris), but the test checks if it's None after making sure it's a string, which means it's a bug in the test. I fixed the test instead. I also fixed some other failures in test_fetch_all

test_setup_script

I realized why it failed. It's because I used python setup.py test instead of make test to run the unit tests. Running the former makes argv ok for setup.setup() so it doesn't raise SystemExit with usage. I'll use make test from now on.

OK, you said create_zombie_proc() doesn't work on AIX. Have you checked whether ps is able to detect it though? I just want to be sure it's not status()

OK, so it turns out you can create zombie processes and there was indeed a bug in status. It's not simple to determine actual process state from /proc, but I think I added the code required.

I would just change the test and do a comparison of the first 30 chars if you're on AIX

The problem isn't that we cut to 30 characters (according to the struct in /proc the max should be 80), but that there is a single argument import time; time.sleep(60) which is too long so it's somehow truncated from psinfo. If I add another argument to argv after this argument, it will show in psinfo while "skipping" the long argument.

It means pid_exists() is broken

The PIDs in question do exist, in ps and in /proc. They have /proc/<pid>/psinfo, and even psutil.Process(pid) works on them. Yet pid_exists returns False, because the "send a signal" check, even when trying with Python's os.kill, raises OSError: no such process. I noticed these processes are all named wait and it appears that there is 1 per logical CPU. It's an AIX thing...

The test may also not pass but we should never have [''] returned.

I traced this back to an issue with our environment, again. create_exe used to compile C code for the test executable but now it's using sys.executable instead (also, there's a bug in that function where the default code is never used). Apparently sys.executable is not copy-able on our environment (again the execve python2.7.bin trick) so when running it it immediately exits, and that's why we get the empty cmdline. When changing create_exe (not committed, of course), those tests pass except test_prog_w_funky_name which, again, doesn't pass on Solaris either.

What about memory leak tests?

Those tests pass now after a few small changes.

New test results in: https://gist.github.com/wiggin15/d1054f0a4a172a572dd610aad649dfa7

@giampaolo
Copy link
Owner

giampaolo commented Sep 14, 2017

The PIDs in question do exist, in ps and in /proc. They have /proc//psinfo, and even psutil.Process(pid) works on them. Yet pid_exists returns False, because the "send a signal" check, even when trying with Python's os.kill, raises OSError: no such process. I noticed these processes are all named wait and it appears that there is 1 per logical CPU. It's an AIX thing...

OK, so I suppose the os.kill trick is not reliable. What about return os.path.exitsts("/proc/{pid}")? Does it work? Last possibility is doing return pid in pids() but that's slow.... :-
I will reply to other comments later (need to sleep =)).

Copy link
Owner

@giampaolo giampaolo left a comment

Some more comments. We're almost there.

psutil/TODO.aix Outdated
@@ -0,0 +1,14 @@
AIX support is experimental and incomplete at this time.
Copy link
Owner

@giampaolo giampaolo Sep 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just say it's experimental at this point, but not incomplete.

psutil/_psaix.py Outdated
avail = free * PAGE_SIZE
used = inuse * PAGE_SIZE
percent = usage_percent((total - avail), total, _round=1)
return svmem(total, avail, percent, used, free)
Copy link
Owner

@giampaolo giampaolo Sep 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to stress this a bit more because I really think it's important. Why can't we check for "close" values? How much do the two values differ? Note, the idiom current used in tests is this one (taken from BSD):


    @unittest.skipIf(not MUSE_AVAILABLE, "muse not installed")
    @retry_before_failing()
    def test_muse_vmem_active(self):
        num = muse('Active')
        self.assertAlmostEqual(psutil.virtual_memory().active, num,
                               delta=MEMORY_TOLERANCE)

I wuld like to have something like this at least for total memory (that value should NOT change).

total, free, sin, sout = cext.swap_mem()
used = total - free
percent = usage_percent(used, total, _round=1)
return _common.sswap(total, used, free, percent, sin, sout)
Copy link
Owner

@giampaolo giampaolo Sep 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe thi