Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Process.environ() on *BSD family #1800

Merged
merged 6 commits into from Aug 13, 2020

Conversation

ArminGruner
Copy link
Contributor

Hi, I've added support for FreeBSD almost two years ago, but forgot to create a pull request ;-)
While I've been there, I also added support for NetBSD and OpenBSD lately.

Also, there are some fixes to make the C extension compile again on older FreeBSD, and recent NetBSD versions.

This closes #893

Greetings from Munich,
Armin

@giampaolo
Copy link
Owner

giampaolo commented Aug 1, 2020

Oh nice! I see some errors (errno 0):
https://github.com/giampaolo/psutil/pull/1800/checks?check_run_id=935540843
Since kvm is available from FreeBSD 8 and libprocstat requires FreeBSD9, why don't avoid using libprocstat and only rely on kvm? (+ we already use kvm extensively)

@ArminGruner
Copy link
Contributor Author

ArminGruner commented Aug 1, 2020

Oh nice! I see some errors (errno 0):
https://github.com/giampaolo/psutil/pull/1800/checks?check_run_id=935540843

Hi,
actually I had some issues with the KVM interface with later FreeBSD versions. But I don't remember why I used the procstat interface (probably because I've checked both "ps" and "procstat" utils and copied code from there..).
But you're right, it looks the KVM interfaces with some minor tweaks also works for later FreeBSD versions.
Let me check whether it doesn't break OpenBSD or NetBSD, I will update the pull request soon!
Thanks!

@giampaolo
Copy link
Owner

Sounds good. ;)

@ArminGruner ArminGruner force-pushed the master branch 6 times, most recently from 0a768c7 to 1dc6963 Compare August 1, 2020 22:05
@ArminGruner ArminGruner force-pushed the master branch 3 times, most recently from 09fc282 to 5a20c69 Compare August 2, 2020 00:38
Copy link
Owner

@giampaolo giampaolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial review. Little style request - please use:

if (something) {
    ...
}

Instead of:

if (something) 
{
    ...
}

kd = kvm_openfiles(NULL, NULL, NULL, KVM_NO_FILES, errbuf);
#endif
if (!kd) {
PyErr_SetString(PyExc_RuntimeError, errbuf);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a utility function, currently exposed for OpenBSD only, which can convert this kvm error:

static void
convert_kvm_err(const char *syscall, char *errbuf) {
char fullmsg[8192];
sprintf(fullmsg, "(originated from %s: %s)", syscall, errbuf);
if (strstr(errbuf, "Permission denied") != NULL)
AccessDenied(fullmsg);
else if (strstr(errbuf, "Operation not permitted") != NULL)
AccessDenied(fullmsg);
else
PyErr_Format(PyExc_RuntimeError, fullmsg);
}

I think it makes sense to move convert_kvm_err out of psutil/arch/openbsd/specific.c and put it into psutil/_psutil_common.c (guarded by an #ifdef BSD...) so that you can invoke it from this file, like this:

    if (! kd) {
        convert_kvm_err("kvm_openfiles", errbuf);
        return NULL;
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, nice, I've moved it.


if (!p)
{
PyObject *ret = NoSuchProcess(kvm_geterr(kd));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not completely sure this is the correct action.
kvm_getprocs doc is not clear on whether you can/should use kvm_geterr, but I suppose you can. What I'm worried about here is that the error may not necessarily be a string like "no such process". Perhaps it can also be "access denied". Or something else (in which case we want RuntimeError). So you should probably parse the string and decide what to do.

You could try a test: call this function for a PID which does not exist. If kvm_geterr returns a string indicating that, then it means you can rely on kvm_geterr.

In that case, what I recommend, is reusing and extending convert_kvm_error utility function (check if string is something like "no such process"), after you move it into psutil/_psutil_common.c (see my comment above).

The resulting code here should be something like this:

    if (!p) {
        convert_kvm_err("kvm_getprocs", kvm_geterr(kd));
        kvm_close(kd);
        return NULL;
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked the libkvm sources and /usr/src/bin/ps/ps.c on all three BSD variants; they all use kvm_err() if there's a failing kvm_*() call.

I've written a tiny standalone test program to check for all return values (on all three platforms).
Fortunately, they all behave the same, they set errno.
Unfortunately, OpenBSD returns different errnos than NetBSD and FreeBSD.

  • kvm_open* isn't likely to fail with EACCESS, as it doesn't open /dev/[k]mem as in the old days.
    It will only fail if the library cannot allocate internal memory, or cannot open the kernel boot file. errno will be set appropriate.
    The question is which exception should be raised - RuntimeError or OSError?

  • kvm_getprocs isn't likely to fail itself.
    If the search for the PID fails, cnt is set to 0. If the function itself returns NULL, it means that it was not able to allocate memory for the result, or the call parameters were invalid due to a programming error (&cnt pointing to invalid storage, kd invalid or NULL, meaning a previous kvm_open* did not suceed).
    We can raise RuntimeError or OSError if the call itself failed.
    If the call reports cnt == 0, this maps to ESRCH ("no such process"). This will also happen if the process with the given PID exists, but sysctl security.bsd.see_other_uids == 0.
    Unfortunately, the call behaviour is not consistent in all test cases of psutil/tests/test_process.py.
    I had to tweak error behaviour a bit so test results are consistent which what other platforms do, though I'm not perfect happy with that. On the other hand, a Python application which uses psutil has generic error handling and should not need platform specific error handling for the same error situation.
    E.g. TestProcess::test_halfway_terminated_process results in different behaviour between *BSD and Linux platforms. I've tweaked return results in a way that the unit test passes.

  • kvm_getenvv may return EPERM if the process belongs to another user. It may return ENOMEM if it cannot allocate room for the environment strings.
    It may also return EINVAL (see below)

I understand your concern mapping system errors to different Python errors;
now from what I've tested, errno is set for all failing libkvm calls, thus mapping EINVAL and ESRCH to NoSuchProcess() seems fine.

There is just one race which comes to my mind: When calling kvm_getprocs(), it may return success.
But before our call to kvm_getenvv(), we might have been preempted by the kernel, and the process in question might have terminated.
In this case, our later call to kvm_getenvv() will return EINVAL, and we probably should remap this to ESRCH?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kvm_open* isn't likely to fail with EACCESS, as it doesn't open /dev/[k]mem as in the old days.
It will only fail if the library cannot allocate internal memory, or cannot open the kernel boot file. errno will be set appropriate.
The question is which exception should be raised - RuntimeError or OSError?

If errno is set then it's easy: just use PyErr_SetFromErrno(PyExc_OSError). My understanding though is that kvm_ functions do not set errno, so we need to parse kvm_geterr. As a general rule, if errno is not set, parse kvm_geterr result and:

  • set ESRCH if the string indicates the process is gone (use NoSuchProcess("...");)
  • set EPERM if the string indicates some lack of privileges (use AccessDenied("...");)
  • raise/call PyErr_NoMemory() if the string indicates it could not allocate memory
  • raise RuntimeError in all other circumstances; if you can, also include the result of kvm_geterr

This logic (3 ifs, 1 else) should be handled by convert_kvm_err utility function, and (if you feel like it, while you're at it =)) reused by all functions that use kvm_* APIs.
Maybe convert_kvm_err could have another if condition, which is executed first:

if (errno != 0)
       PyErr_SetFromErrno(PyExc_OSError)

...but this assumes that one of the previous kvm_ calls did set errno, but again, my understanding is that they usually don't (the various man pages never mention errno).

kvm_getprocs isn't likely to fail itself. If the search for the PID fails, cnt is set to 0 [...]

Then raise NoSuchProcess("...");

[...] If the function itself returns NULL, it means that it was not able to allocate memory for the result, or the call parameters were invalid due to a programming error (&cnt pointing to invalid storage, kd invalid or NULL, meaning a previous kvm_open* did not suceed). We can raise RuntimeError or OSError if the call itself failed.

Mmmm... if we have a way to differentiate with kvm_geterr (is it ENOMEM? is it EINVAL? ....) then I would say to just stick with convert_kvm_err utility fun.

If the call reports cnt == 0, this maps to ESRCH ("no such process"). This will also happen if the process with the given PID exists, but sysctl security.bsd.see_other_uids == 0.

Damn... This I don't know. =)

There is just one race which comes to my mind: When calling kvm_getprocs(), it may return success.
But before our call to kvm_getenvv(), we might have been preempted by the kernel, and the process in question might have terminated. In this case, our later call to kvm_getenvv() will return EINVAL, and we probably should remap this to ESRCH?

Mmm... hard call. I would say yes, let's map it to ESRCH.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest update of the pull requests contains the changes I've mentioned so far.

Now psutil/tests/test_process.py passes all Process().environ() related stuff on all *BSD variants.

Unfortunately, CirrusCI is now not happy with these latest changes.
Interestingly, the version yesterday passed the checks, which makes me a bit puzzled.

ERROR: psutil.tests.test_contracts.TestFetchAllProcesses.test_all

File "/tmp/cirrus-ci-build/psutil/_psbsd.py", line 674, in environ
    return cext.proc_environ(self.pid)
OSError: [Errno 22] Invalid argument

Is it possible that there is some incosistency what test_process.py checks, and what test_contracts.py checks? It looks that someone is expecting an OSError exception where the other one expects a NoSuchProcess exception?

I will have to check whether OSError should not be raised altogether. With Linux as reference platform in mind, it will never raise this exception anyway, as the system interface is completely different (and open(/proc/<pid>/...) will raise IOError, which is probably remapped later)

The whole test suite does not pass altogether, though it looks that it doesn't pass on the 5.7.2 release tag either?
That happens on all *BSD platforms here..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding though is that kvm_ functions do not set errno, so we need to parse kvm_geterr

Actually, almost all library functions do some system calls, especially the kvm* stuff.
So, after any system call which failed, there will be errno set.

Usually library utility functions do not clear errno if they return an error condition themself. They probably return addition higher-level error information, like kvm_geterr(), which in this case is of no useful information, judging from the source code review of the kvm library code: "cannot read zombproc", "nprocs corrupt", "cannot read nprocs", "out of memory", "cannot allocate memory", "unsupported libelf", "exec filename too long", "bad flags arg", "unsupported architecture", "invalid address", "kernel is not an ELF file". ;-)

This avoids inventing a dozen more library error codes if most stuff may be mapped on typical errno codes anyway.

Copy link
Owner

@giampaolo giampaolo Aug 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, almost all library functions do some system calls, especially the kvm* stuff. So, after any system call which failed, there will be errno set.

Got it. Then I think it makes sense to do this:

#if defined(PSUTIL_FREEBSD) || defined(PSUTIL_OPENBSD) || defined(PSUTIL_NETBSD)
void
convert_kvm_err(const char *syscall, char *errbuf) {
    char fullmsg[8192];

    sprintf(fullmsg, "(originated from %s: %s)", syscall, errbuf);
    if (errno != 0)                                             // <- new
        PyErr_SetFromOSErrnoWithSyscall(syscall);               // <- new
    else if (strstr(errbuf, "Permission denied") != NULL)
        AccessDenied(fullmsg);
    else if (strstr(errbuf, "Operation not permitted") != NULL)
        AccessDenied(fullmsg);
    else
        PyErr_Format(PyExc_RuntimeError, fullmsg);
}
#endif

As for EINVAL, it's not clear to me where it originates from in psutil_proc_environ function. You should put printf()s to figure that out. At that point I'd say you'll have to decide what to do: whether to raise NoSuchProcess or AccessDenied.
From C you can use psutil_pid_exists to check if the PID exists. If it does raise AD, else NSP. It's done elsewhere in the C code: grep for psutil_pid_exists.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...e.g. here:

if (sysctl(mib, 3, procargs, &argmax, NULL, 0) < 0) {
// In case of zombie process we'll get EINVAL. We translate it
// to NSP and _psosx.py will translate it to ZP.
if ((errno == EINVAL) && (psutil_pid_exists(pid)))
NoSuchProcess("sysctl");
else
PyErr_SetFromErrno(PyExc_OSError);
goto error;
}

Even if this is OSX, it may be that on BSD kvm_getenvv may raise EINVAL in case of zombie process.

psutil/_psutil_bsd.c Show resolved Hide resolved
psutil/_psutil_bsd.c Outdated Show resolved Hide resolved
psutil/_psutil_bsd.c Outdated Show resolved Hide resolved
psutil/_psutil_bsd.c Outdated Show resolved Hide resolved
psutil/_psutil_bsd.c Outdated Show resolved Hide resolved
psutil/arch/freebsd/specific.c Show resolved Hide resolved
setup.py Outdated
Comment on lines 192 to 194
os_major = int(platform.release().split(".")[0])
if os_major >= 8:
libraries.append("kvm")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really necessary? It wasn't thus far. I'm worried about the string parsing. If it fails, the whole installation breaks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder.
As mentioned, I actually more or less rebased the changes I made years ago, and for whatever reason it seemed neccessary then - probably I used a .0 FreeBSD version which had some library glitches.

With all the (FreeBSD) versions mentioned in the commit log, it is not neccessary to specific libkvm at all, as libdevstat pulls it already!

@@ -73,7 +74,11 @@
#include <utmp.h> // system users
#else
#include <utmpx.h>
#endif
#endif
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to indent =)

@ArminGruner ArminGruner force-pushed the master branch 4 times, most recently from 83b0db3 to c08876b Compare August 3, 2020 08:25
@giampaolo
Copy link
Owner

giampaolo commented Aug 3, 2020

Judging from the latest test failure it seems you cannot rely on errno after kvm_getenvv, or at least not all the time.
I took a peek at FreeBSD source code for kvm_getenvv. It seems sometimes you can rely on errno:
https://github.com/freebsd/freebsd/blob/b4d009417b7bed8b076f0f5637885ce165ed3b02/lib/libkvm/kvm_proc.c#L755
...sometimes you don't:
https://github.com/freebsd/freebsd/blob/b4d009417b7bed8b076f0f5637885ce165ed3b02/lib/libkvm/kvm_proc.c#L757
...and sometimes you can only rely on the error string:
https://github.com/freebsd/freebsd/blob/b4d009417b7bed8b076f0f5637885ce165ed3b02/lib/libkvm/kvm_proc.c#L723

So maybe, something like this (not tested)?

#if defined(PSUTIL_NETBSD)
    envs = kvm_getenvv2(kd, p, 0);
#else
    envs = kvm_getenvv(kd, p, 0);
#endif
    if (!envs) {
        err = kvm_geterr(kd);
        if (err) {  // probably wrong, want to check if `strlen() > 0`
            convert_kvm_err("kvm_getenvv", err);
            goto error;
        }
        if (psutil_pid_exists(pid))
            py_err = AccessDenied("kvm_getenvv");
        else
            py_err = NoSuchProcess("kvm_getenvv");
        goto error;
    }

@ArminGruner
Copy link
Contributor Author

ArminGruner commented Aug 3, 2020

Well, let's have some fun ;-)

For a moment, let's assume that the code is correct,

  • probably because we have a gut feeling it must be right
    (and the code is so short anyway, it uses system interfaces which exist stable for multiple decades),
  • because our standalone test code says so and works exactly what we expect,
  • and probably also because code review from other sources indicate that it should work how it's written
    (namely, /usr/src/bin/ps/ps.c, /usr/src/usr.bin/procstat/*.c).

So, why there are failures then? Well, probably because of good old [nightmare] reasons:

  • unstable build environment
  • unknown run environment
  • unstable test suite
  • wrong tests, assuming same standard behavior for all supported platforms, but platforms don't implement the same POSIX version

For example:

PLATFORM A:
FreeBSD ikarus.fritz.box 12.1-RELEASE-p5 FreeBSD 12.1-RELEASE-p5 GENERIC amd64
(my notebook, Thinkpad T530 i7-3520M, Standard FreeBSD installation)
Python 3.7.8 (default, Jul 11 2020, 01:22:11)
Sources: origin https://github.com/giampaolo/psutil.git
Head: a096704 - (3 weeks ago) pre-release - Giampaolo Rodola (HEAD -> master, tag: release-5.7.2, origin/master, origin/HEAD)

> gmake clean install test
======================================================================
FAIL: psutil.tests.test_posix.TestSystemAPIs.test_disk_usage
Ran 542 tests in 7.023s
FAILED (failures=1, skipped=220)

Oh well, one test probably broken. Again:

> gmake test
======================================================================
FAIL: psutil.tests.test_posix.TestSystemAPIs.test_disk_usage
AssertionError: 0.0 != 100.0 within 1 delta (100.0 difference)
======================================================================
FAIL: psutil.tests.test_bsd.BSDTestCase.test_disks
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ag/repos/git-repos/giampaolo-psutil/psutil/tests/test_bsd.py", line 116, in test_disks
    self.assertEqual(usage.total, total)
AssertionError: 17994129408 != 17993093120
----------------------------------------------------------------------
Ran 542 tests in 6.367s
FAILED (failures=2, skipped=220)

Hm, well, now two test failing. Interesting! So let's just run tests in a loop to see what's happening!

> while :
do
gmake test 2>&1 | fgrep "FAILED "
done
FAILED (failures=3, skipped=220)
FAILED (failures=2, skipped=220)
FAILED (failures=2, skipped=220)
FAILED (failures=3, errors=1, skipped=220)
FAILED (failures=3, skipped=220)

Okay, so now we have at least one more test constantly failing.. uhm, probably related to our redirected stdout?!
(I will send a fix for this when I'm back in swing mood..)

Now let's be more brave, another platform with the same OS and Python version!

PLATFORM B:
FreeBSD magnus.fritz.box 12.1-RELEASE-p3 FreeBSD 12.1-RELEASE-p3 GENERIC amd64
(my server, tons of disk space, Core i7, Standard FreeBSD installation)
Python 3.7.7 (default, Apr 28 2020, 01:32:17)
So, nearly identical (OS some patch levels behind, Python one minor version behind).

Sources: origin https://github.com/giampaolo/psutil.git
Head: a096704 - (3 weeks ago) pre-release - Giampaolo Rodola (HEAD -> master, tag: release-5.7.2, origin/master, origin/HEAD)

gmake test 2>&1  | fgrep FAIL
psutil.tests.test_contracts.TestFetchAllProcesses.test_all ... FAIL
psutil.tests.test_posix.TestSystemAPIs.test_disk_usage ... FAIL
psutil.tests.test_process.TestProcess.test_terminal ... FAIL
psutil.tests.test_bsd.BSDTestCase.test_disks ... FAIL
psutil.tests.test_bsd.FreeBSDSystemTestCase.test_cpu_stats_ctx_switches ... FAIL
psutil.tests.test_bsd.FreeBSDSystemTestCase.test_cpu_stats_interrupts ... FAIL
FAIL
FAIL: psutil.tests.test_contracts.TestFetchAllProcesses.test_all
    FAIL: test_exe pid=1426, ret='/usr/local/sbin/cupsd'
FAIL: psutil.tests.test_posix.TestSystemAPIs.test_disk_usage
FAIL: psutil.tests.test_process.TestProcess.test_terminal
FAIL: psutil.tests.test_bsd.BSDTestCase.test_disks
FAIL: psutil.tests.test_bsd.FreeBSDSystemTestCase.test_cpu_stats_ctx_switches
FAIL: psutil.tests.test_bsd.FreeBSDSystemTestCase.test_cpu_stats_interrupts
FAIL: psutil.tests.test_bsd.FreeBSDSystemTestCase.test_cpu_stats_syscalls
FAILED (failures=7, errors=2, skipped=223)

Same platform, same OS versions, same python, a server with some more long-running processes, but now 7 failures? Oh heck..

Okay, well. Now let's try a different BSD!

OpenBSD openbsd6.localdomain 6.7 GENERIC.MP#182 amd64
(vagrant virtualbox)
Python 3.7.7 (default, May 8 2020, 13:05:15)

Sources: origin https://github.com/giampaolo/psutil.git
Head: a096704 - (3 weeks ago) pre-release - Giampaolo Rodola (HEAD -> master, tag: release-5.7.2, origin/master, origin/HEAD)

gmake clean all install
psutil.tests.test_process.TestProcess.test_terminal ... FAIL
psutil.tests.test_system.TestCpuAPIs.test_cpu_times_comparison ... FAIL
psutil.tests.test_connections.TestSystemWideConnections.test_it ... FAIL
FAIL
psutil.tests.test_contracts.TestSystemAPITypes.test_net_connections ... FAIL
FAIL: psutil.tests.test_process.TestProcess.test_terminal
FAIL: psutil.tests.test_system.TestCpuAPIs.test_cpu_times_comparison
FAIL: psutil.tests.test_connections.TestSystemWideConnections.test_it
FAIL: psutil.tests.test_contracts.TestFetchAllProcesses.test_all
    FAIL: test_connections pid=0
FAIL: psutil.tests.test_contracts.TestSystemAPITypes.test_net_connections
FAILED (failures=5, errors=2, skipped=266)

Okay, 5 failures on recent OpenBSD version. Now let's see if the failure rate keeps stable:

while :; do make test 2>&1 | fgrep --line-buffered "FAIL "; done

No output anymore! Output redirection now causes the test suite to hang.. oh well.
Let's check it other way:

script -c "while :; do gmake test; done" /tmp/make.out
~~~ wait 5 minutes ~~~
CTRL-C
fgrep "FAILED " /tmp/make.out
FAILED (failures=4, errors=2, skipped=266)
FAILED (failures=4, errors=2, skipped=266)
FAILED (failures=4, errors=2, skipped=266)
FAILED (failures=4, errors=2, skipped=266)
FAILED (failures=4, errors=2, skipped=266)

Okay, on this platform there seems to be no timing or other race in either the code under test, or the test suite.

Now let's move on! Back to the laptop, now with the new sources:

Let's check what's Cirrus CI says! (no idea why it's labelled "freebsd_13_py2", actually we shall not fool ourself, actually it's
12.1 and py3!

 Cirrus CI / freebsd_13_py3 failed 14 hours ago in 2m 59s
Task Summary

Instruction main failed in 00:09
Details

00:08 clone
02:39 install
x 00:09 main

    return fun(self, *args, **kwargs)
  File "/tmp/cirrus-ci-build/psutil/_psbsd.py", line 673, in environ
    return cext.proc_environ(self.pid)
ProcessLookupError: [Errno 3] No such process (originated from kvm_getenvv)
During handling of the above exception, another exception occurred:
psutil.NoSuchProcess: psutil.NoSuchProcess process no longer exists (pid=1, name='init')
During handling of the above exception, another exception occurred:
AssertionError: NoSuchProcess not raised by Process

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/cirrus-ci-build/psutil/tests/test_contracts.py", line 397, in test_all
----------------------------------------------------------------------
Ran 619 tests in 5.078s

FAILED (failures=1, errors=2, skipped=240)
FAILED
*** Error code 1

Okay, then now let's run this test suite locally!

Sources: origin git@github.com:ArminGruner/psutil.git
Head: c08876b - (2 days ago) Implement Process.environ() on *BSD family - Armin Gruner (HEAD -> master, origin/master, origin/HEAD)

gmake clean install
(.venv)git-repos/psutil - [master●] » py psutil/tests/test_contracts.py
psutil.tests.test_contracts.TestAvailConstantsAPIs.test_PROCFS_PATH ... OK
psutil.tests.test_contracts.TestAvailConstantsAPIs.test_linux_ioprio_linux ... OK
psutil.tests.test_contracts.TestAvailConstantsAPIs.test_linux_ioprio_windows ... OK
psutil.tests.test_contracts.TestAvailConstantsAPIs.test_linux_rlimit ... OK
psutil.tests.test_contracts.TestAvailConstantsAPIs.test_win_priority ... OK
psutil.tests.test_contracts.TestAvailProcessAPIs.test_cpu_affinity ... OK
psutil.tests.test_contracts.TestAvailProcessAPIs.test_cpu_num ... OK
psutil.tests.test_contracts.TestAvailProcessAPIs.test_environ ... OK
psutil.tests.test_contracts.TestAvailProcessAPIs.test_gids ... OK
psutil.tests.test_contracts.TestAvailProcessAPIs.test_io_counters ... OK
psutil.tests.test_contracts.TestAvailProcessAPIs.test_ionice ... OK
psutil.tests.test_contracts.TestAvailProcessAPIs.test_memory_maps ... OK
psutil.tests.test_contracts.TestAvailProcessAPIs.test_num_fds ... OK
psutil.tests.test_contracts.TestAvailProcessAPIs.test_num_handles ... OK
psutil.tests.test_contracts.TestAvailProcessAPIs.test_rlimit ... OK
psutil.tests.test_contracts.TestAvailProcessAPIs.test_terminal ... OK
psutil.tests.test_contracts.TestAvailProcessAPIs.test_uids ... OK
psutil.tests.test_contracts.TestAvailSystemAPIs.test_battery ... OK
psutil.tests.test_contracts.TestAvailSystemAPIs.test_cpu_freq ... OK
psutil.tests.test_contracts.TestAvailSystemAPIs.test_sensors_fans ... OK
psutil.tests.test_contracts.TestAvailSystemAPIs.test_sensors_temperatures ... OK
psutil.tests.test_contracts.TestAvailSystemAPIs.test_win_service_get ... OK
psutil.tests.test_contracts.TestAvailSystemAPIs.test_win_service_iter ... OK
psutil.tests.test_contracts.TestFetchAllProcesses.test_all ... OK
psutil.tests.test_contracts.TestProcessWaitType.test_negative_signal ... OK
psutil.tests.test_contracts.TestSystemAPITypes.test_boot_time ... OK
psutil.tests.test_contracts.TestSystemAPITypes.test_cpu_count ... OK
psutil.tests.test_contracts.TestSystemAPITypes.test_cpu_freq ... OK
psutil.tests.test_contracts.TestSystemAPITypes.test_cpu_percent ... OK
psutil.tests.test_contracts.TestSystemAPITypes.test_cpu_times ... OK
psutil.tests.test_contracts.TestSystemAPITypes.test_cpu_times_percent ... OK
psutil.tests.test_contracts.TestSystemAPITypes.test_disk_io_counters ... OK
psutil.tests.test_contracts.TestSystemAPITypes.test_disk_partitions ... OK
psutil.tests.test_contracts.TestSystemAPITypes.test_net_connections ... OK
psutil.tests.test_contracts.TestSystemAPITypes.test_net_if_addrs ... OK
psutil.tests.test_contracts.TestSystemAPITypes.test_net_if_stats ... OK
psutil.tests.test_contracts.TestSystemAPITypes.test_net_io_counters ... OK
psutil.tests.test_contracts.TestSystemAPITypes.test_sensors_fans ... skipped: not supported
psutil.tests.test_contracts.TestSystemAPITypes.test_sensors_temperatures ... OK
psutil.tests.test_contracts.TestSystemAPITypes.test_users ... OK

----------------------------------------------------------------------
Ran 40 tests in 1.276s

OK (skipped=1)
SUCCESS


(.venv)git-repos/psutil - [master●] » py psutil/tests/test_process.py
----------------------------------------------------------------------
Ran 80 tests in 3.280s

OK (skipped=10)
SUCCESS

(.venv)git-repos/psutil - [master●] » py3 --version
Python 3.7.8
(.venv)git-repos/psutil - [master●] » uname -a
FreeBSD ikarus.fritz.box 12.1-RELEASE-p5 FreeBSD 12.1-RELEASE-p5 GENERIC  amd64

Okay. So, where do we go from here? (-: To be continued... (the only thing which comes to my mind: We don't know the exact FreeBSD platform on Cirrus CI; actually they state in their docs it's some standard google cloud FreeBSD box, but actually I have no access to one to check; probably Cirrus CI is just confused because I've updated master multiple times when updating the pull request)

@giampaolo
Copy link
Owner

Hehe. Keeping psutil unit tests stable has always been like a never ending battle that I never really managed to win. There's always something that breaks occasionally. The "matrix" (different OSes, OS versions, CI services, python versions, etc.) is very complex, and 95% of the times I'm on Linux (my laptop) so I'm not sure how stable tests are on other platforms.
FreeBSD is definitively the platform which received more love amongst *BSDs, but I'm not sure when was the last time I turned on my virtualized OpenBSD and NetBSD boxes, so it's very likely that some tests on those platforms are broken and require a separate fix (either in the implementation or in the test).

With that said, for this specific PR feel free to ignore any failure which is unrelated with this change (memory, disk, etc...).
Judging from the latest test failures:
https://cirrus-ci.com/task/5945661432528896
https://cirrus-ci.com/task/4819761525686272
I see 2 problems:

  • it seems we cannot determine environ for PID 0; this is common also on other platforms. For this I suggest to add a check at the top of the function and return an empty dict if PID == 0
  • as for the second error:
======================================================================
FAIL: psutil.tests.test_contracts.TestFetchAllProcesses.test_all
----------------------------------------------------------------------
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/tmp/cirrus-ci-build/psutil/_psbsd.py", line 551, in wrapper
    return fun(self, *args, **kwargs)
  File "/tmp/cirrus-ci-build/psutil/_psbsd.py", line 673, in environ
    return cext.proc_environ(self.pid)
ProcessLookupError: [Errno 3] No such process (originated from kvm_getenvv)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/tmp/cirrus-ci-build/psutil/tests/test_contracts.py", line 367, in proc_info
    info[fun_name] = fun()
  File "/tmp/cirrus-ci-build/psutil/__init__.py", line 862, in environ
    return self._proc.environ()
  File "/tmp/cirrus-ci-build/psutil/_psbsd.py", line 556, in wrapper
    raise NoSuchProcess(self.pid, self._name)
psutil.NoSuchProcess: psutil.NoSuchProcess process no longer exists (pid=1, name='init')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/tmp/cirrus-ci-build/psutil/tests/test_contracts.py", line 369, in proc_info
    check_exception(exc, proc, name, ppid)
  File "/tmp/cirrus-ci-build/psutil/tests/test_contracts.py", line 344, in check_exception
    tcase.assertProcessGone(proc)
  File "/tmp/cirrus-ci-build/psutil/tests/__init__.py", line 902, in assertProcessGone
    self.assertRaises(psutil.NoSuchProcess, psutil.Process, proc.pid)
  File "/usr/local/lib/python3.6/unittest/case.py", line 733, in assertRaises
    return context.handle('assertRaises', args, kwargs)
  File "/usr/local/lib/python3.6/unittest/case.py", line 178, in handle
    callable_obj(*args, **kwargs)
  File "/usr/local/lib/python3.6/unittest/case.py", line 201, in __exit__
    self.obj_name))
  File "/usr/local/lib/python3.6/unittest/case.py", line 135, in _raiseFailure
    raise self.test_case.failureException(msg)
AssertionError: NoSuchProcess not raised by Process

I'm pretty sure the test is legitimate (aka not broken). What is broken is the implementation. I think that's because this part is not correct:

    if (!envs) {
        if (errno == EPERM)
            py_err = AccessDenied("kvm_getenvv");
        else
            py_err = NoSuchProcess("kvm_getenvv");
        goto error;
    }

I think a possible solution is what I suggested in my previous comment #1800 (comment). I didn't try it myself so I can be wrong (but I can try myself after pulling your branch if you're still in trouble - not sure when though). I think we're basically there though.

@ArminGruner ArminGruner force-pushed the master branch 4 times, most recently from 7d1128e to 2378208 Compare August 4, 2020 07:15
@ArminGruner
Copy link
Contributor Author

ArminGruner commented Aug 4, 2020

I'm pretty sure the test is legitimate (aka not broken). What is broken is the implementation. I think that's because this part is not correct:

if (!envs) {
    if (errno == EPERM)
        py_err = AccessDenied("kvm_getenvv");
    else
        py_err = NoSuchProcess("kvm_getenvv");
    goto error;
}

You're surely right.
In languages without support for promises or similar stuff one should never assume preconditions are met,
if they are actually not explicitly checked ;-)

Judging from the latest test failure it seems you cannot rely on errno after kvm_getenvv, or at least not all the time.

You know that I'm a great lover of *BSD.
I usually get into defend BSD mode if I read sentences like this ;-)
Mostly, but not always, this trigger is correct as *BSD library code/release engineering quality justifies it... ;-)

So:

I took a peek at FreeBSD source code for kvm_getenvv. It seems sometimes you can rely on errno:
https://github.com/freebsd/freebsd/blob/b4d009417b7bed8b076f0f5637885ce165ed3b02/lib/libkvm/kvm_proc.c#L755

Well, no, and ... yes ;-)

If kvm_getenvv() returns NULL, but errno == 0, this means the result did not fit into the buffer which was allocated.

You're right, my code assumed that it always fits, because kenv_getenvv() is called with nchr=0.

This actually indicates to the library that it should allocate whatever how much it takes (quoting Mario Draghi ;-)),
but it may fail in the third code branch you are mentioning (see below).
So my calling code must also assume (and handle) different errno codes than just ESRCH or EPERM.

...sometimes you don't:
https://github.com/freebsd/freebsd/blob/b4d009417b7bed8b076f0f5637885ce165ed3b02/lib/libkvm/kvm_proc.c#L757

No, this is clearly stated in the man page.

It states if the result does not fit into the allocated buffer, it will be truncated.
That's what the code is doing, and obviously it has to clear errno here,
because sysctl() has set it to indicate that the returned result
did not fit into the supplied buffer and actually was truncated.

...and sometimes you can only rely on the error string:
https://github.com/freebsd/freebsd/blob/b4d009417b7bed8b076f0f5637885ce165ed3b02/lib/libkvm/kvm_proc.c#L723

No, this is not true, because if bufp is NULL, this means that bufp = malloc(sizeof(char *) * argc) failed,
and thus malloc() has set errno = ENOMEM. ;-)

... and that's exactly what Cirrus CI shows with the latest pull request update.

I will have to investigate later, now for some less serious work, but vital to earn some money... ;-)

Thanks for all your feedback so far!

FreeBSD is definitively the platform which received more love amongst *BSDs,
but I'm not sure when was the last time I turned on my virtualized OpenBSD and NetBSD boxes,
so it's very likely that some tests on those platforms are broken and require a separate fix
(either in the implementation or in the test).

Actually the experience I reported clearly show how vital it is to do continuous testing on all supported platforms.
Perhaps I can help here and setup some public (well, for the psutil project) NetBSD and OpenBSD runners... will have to check how they may fit into GitHubs support for this!

@ArminGruner ArminGruner force-pushed the master branch 2 times, most recently from 64bc731 to 7aae715 Compare August 4, 2020 20:10
@ArminGruner ArminGruner force-pushed the master branch 5 times, most recently from 945a7a8 to 68c65a8 Compare August 5, 2020 23:52
@ArminGruner
Copy link
Contributor Author

Good evening .. well, what do you think? -> https://cirrus-ci.com/task/5900109814693888

@giampaolo
Copy link
Owner

giampaolo commented Aug 6, 2020

Hey Armin. Mmm... since it seems we cannot avoid ENOMEM for certain PIDs, and since it's a critical error which would inevitably break code using psutil.process_iter(), I think the best compromise is to turn ENOMEM into AccessDenied. Something like this (not tested):

    if (!envs) {
        /* Map the most obvious stuff to general high-level exceptions exported by our "psutil" */
        if (errno == EPERM)
            AccessDenied("kvm_getenvv");
        else if (errno == ESRCH)
            NoSuchProcess("kvm_getenvv");
        else if (errno == ENOMEM) {
            psutil_debug("kvm_getenvv -> ENOMEM, turned into EPERM");
            AccessDenied("kvm_getenvv -> ENOMEM, turned into EPERM");
        }
        else {
            sprintf(errbuf, "kvm_getenvv(pid=%d)", pid);
            PyErr_SetFromOSErrnoWithSyscall(errbuf);
        }
        goto error;
    }

@ArminGruner ArminGruner force-pushed the master branch 2 times, most recently from 5038c40 to f761464 Compare August 8, 2020 14:10
@ArminGruner
Copy link
Contributor Author

ArminGruner commented Aug 8, 2020

Hi Giampaolo,
I've added some workarounds to avoid this error so far.

I've filed a bug report on FreeBSD (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248537), let's see if they confirm that there is a problem or how to handle the situation [**] (meanwhile I'm pretty sure there is; I also find stuff like

» fgrep getenvv /var/log/messages
Jul 23 15:03:47 ikarus console-kit-daemon[1625]: WARNING: kvm_getenvv failed: 

in syslog files)

I need to check the other *BSD platforms if they are still ok. The CI/CD builds/tests are all green now (well some macosx build seems to be flaky, but it's not related to my changes)

[**]
... okay, I've got feedback from a FreeBSD kernel maintainer, who explains what's happening. I understand the situation, and I see that there is a dilemma altogether. Just for curiousity, I will check how Linux actually implements the /proc/<pid>/environ interface, because it's practically impossible to retrieve a modified environ for a foreign process, if stuff like setenv() is actually implemented in userspace (i.e. just by [libc] library code for the application itself, which is the typical implementation), and not via a system call.

» cat clearenv.c
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int
main(int argc, char **argv, char **env)
{
    fprintf(stderr, "pid=%d\n", getpid());
    *env = NULL;
    sleep(9999);
}
» ./clearenv &
pid=40980
» python -c 'from psutil import Process; print(Process(40980).environ())'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ag/repos/git-repos/psutil/psutil/__init__.py", line 862, in environ
    return self._proc.environ()
  File "/home/ag/repos/git-repos/psutil/psutil/_psbsd.py", line 551, in wrapper
    return fun(self, *args, **kwargs)
  File "/home/ag/repos/git-repos/psutil/psutil/_psbsd.py", line 673, in environ
    return cext.proc_environ(self.pid)
OSError: [Errno 0] No error: 0 (originated from kvm_getenvv(pid=40980))
 » cat overflowenv.c
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

int
main(int argc, char **argv, char **env)
{
    int i;
    int arg_max = sysconf(_SC_ARG_MAX);
    char *v = malloc(arg_max * 2);

    assert(v != NULL);
    for (i = 0; i < arg_max * 2; i++)
        v[i] = 'z';

    *env = v;
    fprintf(stderr, "pid=%d\n", getpid());
    sleep(9999);
}
» ./overflowenv &
pid=48755

» procstat -e 48755
  PID COMM             ENVIRONMENT                                          
procstat: sysctl(kern.proc.env): Cannot allocate memory
48755 overflowenv      -

@ArminGruner ArminGruner force-pushed the master branch 3 times, most recently from 1e783a3 to c090ca1 Compare August 8, 2020 17:34
@giampaolo
Copy link
Owner

Hey Armin. Thanks for investigating this issue so thoroughly. I seem to remember there was a similar problem with "corrupted env" on other platforms, but I don't remember the exact details (or maybe it was the cmdline(), anyway, I remember a similar problem occurring when a process env or cmdline was changed after the process was started).

AFAICT, given how thorny this issue appears to be, I think it would be fine to just raise NoSuchProcess if PID does not exist, else give up and do AccessDenied, without additional logic which tries to determine what went wrong exactly.
In the end the result is the same: we won't get the information, so I think it's better to fail with AccessDenied rather than RuntimeError.

Copy link
Owner

@giampaolo giampaolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Armin. I added a couple more (minor) comments. The rest LGTM.

#include <sys/param.h>
#include <sys/sysctl.h>
#include <sys/user.h>
#define PSUTIL_PROC_SKIP(p) (!((p)->ki_flag & P_INMEM) || ((p)->ki_flag & P_SYSTEM))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? If it is, I would rather call it PSUTIL_PROC_ENV_SKIP

@@ -83,6 +89,7 @@
#include <sys/file.h>
#undef _KERNEL
#include <sys/sched.h> // for CPUSTATES & CP_*
#define PSUTIL_PROC_SKIP(p) ((p)->p_flag & P_SYSTEM)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

@@ -93,6 +100,7 @@
#ifndef DTYPE_VNODE
#define DTYPE_VNODE 1
#endif
#define PSUTIL_PROC_SKIP(p) ((p)->p_stat == SZOMB)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

p = kvm_getproc2(kd, KERN_PROC_PID, pid, sizeof(*p), &cnt);
#endif
if (!p) {
py_err = NoSuchProcess("kvm_getprocs");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can avoid using py_err = .... Just Calling NSP/AD will set the exception.

* On NetBSD, we cannot call kvm_getenvv2() for a zombie process.
*
* To make unittest suite happy, return an empty environment.
*/
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor style complaint: could you please use // comments and limit the length of the string to 80 chars?

Py_XDECREF(py_value);
Py_XDECREF(py_retdict);
kvm_close(kd);
return py_err;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return NULL; and get rid of py_err

@giampaolo
Copy link
Owner

Are we adding support for FreeBSD 7 and/or other BSD platforms? (asking for the HISTORY.rst file entry)

Tested-on: FreeBSD-13-CURRENT (r363681), FreeBSD-12.1,
           FreeBSD-11.4, FreeBSD-11.3, FreeBSD-10.4,
           FreeBSD-9.3, FreeBSD-8.4, FreeBSD-7.1,
           OpenBSD-6.7, OpenBSD-5.7, NetBSD-9.0, NetBSD-8.2
@ArminGruner
Copy link
Contributor Author

ArminGruner commented Aug 12, 2020

Are we adding support for FreeBSD 7 and/or other BSD platforms? (asking for the HISTORY.rst file entry)

Well, the code compiles under FreeBSD 7.1, and:

> uname -a
FreeBSD FreeBSD-7.1-RELEASE-amd64 7.1-RELEASE FreeBSD 7.1-RELEASE #0: Thu Jan  1 08:58:24 UTC 2009     root@driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
> make PYTHON=python2.7 test
----------------------------------------------------------------------
Ran 559 tests in 7.601s
FAIL: psutil.tests.test_system.TestMiscAPIs.test_users
FAIL: psutil.tests.test_misc.TestMisc.test_serialization
FAIL: psutil.tests.test_process.TestProcess.test_long_cmdline
FAIL: psutil.tests.test_contracts.TestSystemAPITypes.test_users
FAIL: psutil.tests.test_posix.TestSystemAPIs.test_users
FAILED (failures=5, errors=30, skipped=241)

For NetBSD and OpenBSD, I was able to run the test suite, and the Process.environ() related changes work; though there are some tests failing which are unrelated to Process.environ().

@giampaolo giampaolo merged commit b09f5a2 into giampaolo:master Aug 13, 2020
giampaolo added a commit that referenced this pull request Aug 13, 2020
@giampaolo
Copy link
Owner

Merged. Thanks a lot for the great work Armin. It was a good patch. If you want to fix other BSD-related stuff you're very welcome. ;-) BSD* definitively need more maintenance.

@ArminGruner
Copy link
Contributor Author

ArminGruner commented Aug 13, 2020

Hi Giampaolo, thanks for merging!

I'm still in evaluation for a CI runner and automated CI/CD test setup for the other *BSD family members.
Unfortunately, the renowned candidates such as Appveyor or even GitHub CI require a dotnet 3.x environment (sic!).
There has been some effort to bring dotnet to both OpenBSD and NetBSD, I will check this after my vacation to Denmark ;-)

@giampaolo
Copy link
Owner

To be honest, having / maintaining extra CI for Open/NetBSD would be an additional effort which I'd prefer to avoid. I'd rather fix whatever is broken on those platforms and then leave them alone. The code for Open/NetBSD changes pretty rarely anyway.
Have a good time in Denmark. I'm staying in Italy this time, and will spend occasional weekends camping in the nearby (north) mountains. =)

@giampaolo giampaolo added the bsd label Nov 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Process.environ() on FreeBSD
2 participants