Incorrect calls to getrusage() #217

Open
PeterCoghlan opened this issue Apr 11, 2017 · 41 comments

Comments

@PeterCoghlan

I had a problem with the CPU timer (SPT and STPT instructions). I found the error, which was one I had made myself, and fixed it. In the course of investigating the problem, I came across code like this in two locations in clock.c:

    rc = getrusage((int)sysblk.cputid[regs->cpuad], &rusage);
    if (unlikely(rc == -1))
        result = host_tod();
    else
        result  = timeval2etod(&rusage.ru_utime);

The first argument appears to be the thread-id of a CPU thread; however, all the documentation I can find for getrusage() suggests that the first argument is supposed to be a small integer constant and that a thread-id is not valid in this position.

I inserted some debugging code to output the value of rc and errno and found that (under Linux on a Raspberry Pi at least), getrusage() fails every time it is called with rc = -1 and errno = EINVAL. This is in accordance with the expected behaviour documented in 'man getrusage' on the same platform. If nothing else, this means the use of unlikely() is inappropriate here.

It seems that there is no getrusage() in Windows and a getrusage() replacement is provided in w32util.c. However, this variant appears to have been coded to cope with a thread-id in the first argument position. (I don't have anything running Windows so I can't test whether this theory is correct.) My feeling is that this is not a valid implementation of getrusage() and if it is to be retained the way it is, it should be renamed to something else to avoid confusion.

There is a similar call to getrusage() in hsccmd.c for the qproc command:

            if (getrusage((int)sysblk.cputid[i], &rusage) == 0)
            {
                ...
            }

As no fallback method is provided in this case, it appears to me that under Linux anyway, getrusage() will always fail here and the block of code will never get executed.

There are two more calls to getrusage() in diagmssf.c like this:

            if (IS_CPU_ONLINE(i))
            {
                /* Get CPU times in microseconds */
                getrusage((int)sysblk.cputid[i], &usage);
                uCPU[i] = timeval2us(&usage.ru_utime);
                tCPU[i] = uCPU[i] + timeval2us(&usage.ru_stime);
            }

There is no check here to see whether getrusage() failed, and it looks like it always will (on Linux at least), so the results are likely to be incorrect.

I think it may be possible to rework the code in clock.c so that getrusage() is called correctly but I don't see how to do this in the other cases.

My suggestion is to remove the calls to getrusage() in clock.c, hsccmd.c and diagmssf.c and replace them with calls to a new function implemented elsewhere which takes a thread-id as an argument and calls platform specific code depending on a macro set in hostopts.h.
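
A minimal sketch of what such a wrapper might look like on a POSIX host (the function name, the guard macro and the fallback are illustrative only, not actual Hercules code):

    #include <pthread.h>
    #include <time.h>

    /* Return the CPU time consumed by thread 'tid' in microseconds,
       or -1 if the host provides no per-thread CPU clock.            */
    static long long thread_cputime_us( pthread_t tid )
    {
    #if defined( HAVE_PTHREAD_GETCPUCLOCKID )
        clockid_t        clkid;
        struct timespec  ts;

        if (pthread_getcpuclockid( tid, &clkid ) == 0
            && clock_gettime( clkid, &ts ) == 0)
            return (long long) ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
    #endif
        return -1;   /* caller falls back to wall-clock or process time */
    }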

@jphartmann
Contributor

Originally, getrusage() got the CPU usage of the process or the process and its children.

As of the linux 2.6.26 kernel, you can also obtain thread usage by:

getrusage(RUSAGE_THREAD_SELF, ...

The getrusage you quote above is not likely to give a meaningful answer on Linux. How about checking the return code? EINVAL means that you are asking for something you cannot get.

Thread usage was ported to/from FreeBSD 8.1. Other UNIXen must be investigated; it is not clear to me that they would return EINVAL, but it is likely that RUSAGE_THREAD_SELF is undefined on such a beast.
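
For illustration, a sketch of the thread-self form with the return code checked as suggested (on Linux the constant is spelled RUSAGE_THREAD, as noted below, and it reports usage for the calling thread only):

    #include <sys/resource.h>

    /* Fill 'ru' with the calling thread's own CPU usage; -1 if unsupported */
    static int thread_self_usage( struct rusage* ru )
    {
    #if defined( RUSAGE_THREAD )
        if (getrusage( RUSAGE_THREAD, ru ) == 0)
            return 0;             /* ru->ru_utime / ru->ru_stime are valid  */
        /* EINVAL here means this 'who' value is not supported on this host */
    #endif
        return -1;                /* caller falls back to another method    */
    }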

@ivan-w
Contributor

ivan-w commented Apr 11, 2017

I read RUSAGE_THREAD (not RUSAGE_THREAD_SELF) in my documentation for getrusage() - but it's a detail.

I'm interested in seeing this through because I'm getting some issues running z/VM under z/VM which MAY be related to a CPU Timer issue (but I haven't finished my analysis yet).

@dasdman
Contributor

dasdman commented Apr 11, 2017 via email

@dasdman
Contributor

dasdman commented Apr 11, 2017 via email

@jphartmann
Contributor

Neither the FreeBSD manpage nor the Linux one mentions specifying a thread ID. If it "works", that is by the UNIX weenie's definition of "works". Stay away from that.

@dasdman
Contributor

dasdman commented Apr 11, 2017 via email

@ivan-w
Contributor

ivan-w commented Apr 11, 2017

I think there are multiple situations :

  • Getting CPU/Wait time usage from within the thread (for SPT/STPT)
    -- The OS provides a facility for getting thread time info from within the thread (Linux, Windows, ?)
    -- The OS does not provide such facility
  • Getting CPU/Wait time from another thread (diagmss, panel command, panel display,...)
    -- The OS provides a facility for getting thread time info from another thread (Windows)
    -- The OS provides a facility for getting thread time info from within the thread only (Linux)
    -- The OS does not provide such facility

If there is no facility to retrieve thread run time information, there is not much that can be done other than falling back to process times or wall times.
For systems that only provide thread time info to self, it's probably possible, for a CPU thread, to use some exchange mechanism to request the information from the thread (this is already done for other purposes such as CPU synchronization, Broadcast PTLB, etc.).
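
A rough sketch of such an exchange mechanism (illustrative names only, not Hercules code, and it assumes the CPU thread polls regularly, which a CPU sitting in a wait state would not):

    #include <pthread.h>
    #include <sys/resource.h>

    /* One request slot per emulated CPU */
    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  done;
        int             pending;   /* request flag polled by the CPU thread */
        struct rusage   usage;     /* filled in by the CPU thread itself    */
    } USAGE_REQ;

    /* Called by the CPU thread at a convenient point in its execution loop */
    static void cpu_thread_poll_usage_request( USAGE_REQ* r )
    {
        pthread_mutex_lock( &r->lock );
        if (r->pending)
        {
    #if defined( RUSAGE_THREAD )
            getrusage( RUSAGE_THREAD, &r->usage );
    #endif
            r->pending = 0;
            pthread_cond_signal( &r->done );
        }
        pthread_mutex_unlock( &r->lock );
    }

    /* Called by any other thread that wants that CPU thread's usage */
    static void request_cpu_usage( USAGE_REQ* r, struct rusage* result )
    {
        pthread_mutex_lock( &r->lock );
        r->pending = 1;
        while (r->pending)
            pthread_cond_wait( &r->done, &r->lock );
        *result = r->usage;
        pthread_mutex_unlock( &r->lock );
    }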

I'm personally more concerned about the 1st case (and its sub-cases)... The rest is usually only needed for information purposes, not for anything nanosecond accurate !

@dasdman
Contributor

dasdman commented Apr 12, 2017 via email

@jphartmann
Contributor

You'll find the standard here:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/getrusage.html

Note that it does not mention RUSAGE_THREAD.

Also note that the contents of the returned structure are not specified.

@ivan-w
Contributor

ivan-w commented Apr 12, 2017

John,

Of course it doesn't. RUSAGE_THREAD is a linuxism (and possibly a GNUism). But I cannot find in posix (overall) or posix threads a capability to retrieve thread specific run time information.

The layout of the "rusage" structure isn't defined per se by getrusage(), but the fields the structure should include are defined in :
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_resource.h.html

@jphartmann
Contributor

getrusage() has been there as long as I remember. It is both in SVr4 and 4.3BSD and thus AIX, but not the thread support, of course.

@PeterCoghlan
Author

I have one more suggestion.

It appears to me that at the moment the fallback method in the event that thread specific CPU time information is not available is to use wall clock time for the CPU timer. My suggestion is that the fallback method should be to use wall clock time less whatever time the particular Hercules CPU in question has spent in a wait state. This ought to be a much better approximation than assuming the CPU has been running all the time.

@ivan-w
Contributor

ivan-w commented Apr 13, 2017

Peter,
That would be odd since the CPU timer is supposed to keep running when the CPU is in a wait state.

@PeterCoghlan
Author

Well, now I'm confused!

Do the times returned by getrusage() include the time that a CPU thread spends waiting, I wonder? I assumed that they don't and that the reason for having the calls to getrusage() was that the CPU timer only ran when the CPU was executing instructions.

If the CPU timer runs all the time then perhaps using wall clock time is the right thing to do and there is no reason to be calling getrusage() at all?

@ivan-w
Contributor

ivan-w commented Apr 13, 2017

Peter,

No.. Because the wall time may not reflect the time the CPU thread is actually executing instructions. The thread may be preempted by the OS to execute another thread, some paging may occur, the CPU instruction execution code may make system calls which may suspend the thread, or the system running hercules may itself be virtualized.

Using the 'process' getrusage() would, on the other hand, mean that for one CPU thread you are counting the time spent running ALL threads in Hercules between two points in time.
The only solution for an accurate CPU Timer maintenance is to be able to get the actual usage time for the thread while it is executing instructions.

@PeterCoghlan
Author

Ok. It sounds like the time spent executing instructions by the CPU thread would then need to have the time spent by the same CPU in a wait state added to it to come up with an accurate figure.

@ivan-w
Contributor

ivan-w commented Apr 13, 2017

Peter..

Yeah... We know that ! The issue (as reported) is exactly that : Getting the time spent by the CPU thread executing instructions. The rest is housekeeping.

@ivan-w
Contributor

ivan-w commented Apr 13, 2017

For info... I tried that : (just to see) :


#if defined(RUSAGE_THREAD)
    if(pthread_equal(sysblk.cputid[regs->cpuad],pthread_self()))
    {
        rc = getrusage(RUSAGE_THREAD, &rusage);
    }
    else
#endif
    {
        rc = getrusage((int)sysblk.cputid[regs->cpuad], &rusage);
    }
    if (unlikely(rc == -1))
        result = host_tod();
    else
        result  = timeval2etod(&rusage.ru_utime);

    return (result);

It doesn't work.. AT ALL ! (the CPU Timer gets really crazy values).

(Example : a program running in a loop, disabled...)

(Wait ~16 seconds)
17:45:04 HHC02324I CP00: PSW=0008000080042010 INST=B208C42A SPT 1066(12) set_cpu_timer
17:45:04 HHC02326I CP00: R:00042430:K:06=00000010 00000000 00000000 00000000 ................
...
(1st STPT)
17:45:04 HHC02324I CP00: PSW=0008100080042418 INST=B209C432 STPT 1074(12) store_cpu_timer
17:45:04 HHC02326I CP00: R:00042438:K:06=00000000 00000000 00000010 00000001 ................
(0 ?????)
....
(2nd STPT)
17:45:04 HHC02324I CP00: PSW=0008100080042418 INST=B209C432 STPT 1074(12) store_cpu_timer
17:45:04 HHC02326I CP00: R:00042438:K:06=2D9E184C 009EB600 00000010 00000001 ...<............
(What the ????)
....
(3rd STPT)
17:45:04 HHC02324I CP00: PSW=0008100080042418 INST=B209C432 STPT 1074(12) store_cpu_timer
17:45:04 HHC02326I CP00: R:00042438:K:06=2D9E184B FB0B2700 00000010 00000001 ................
(We're decrementing alright...)

--Ivan

@ivan-w
Contributor

ivan-w commented Apr 13, 2017

Side comment... Apparently Posix Threads have a per thread clocking mechanism

pthread_getcpuclockid() used in conjunction with clock_gettime()... Could you use this ?

--Ivan

@Fish-Git
Contributor

Fish-Git commented Apr 13, 2017

(sigh!) Sorry about that. Here are the corrected (non-framed) links:

@dasdman
Contributor

dasdman commented Apr 13, 2017

From the comments, it appears that folks are repeating the research already done to even post what I already posted (2017-04-12 11:32:40 UTC) -- well BEFORE the current barrage of responses! Would it be possible to actually move the comments and research ahead?

@ivan-w
Contributor

ivan-w commented Apr 13, 2017

True ! My bad - missed your previous comment about pthread_getcpuclockid() et al.

So..
Question 1 : Do we need to invoke the pthread_getcpuclockid() function every time or just at thread creation (and store it in an array - one for each CPU - like we do for the thread id) ?
Question 2 (for Fish) : What would it take to implement a fthread__getcpuclockid() and clock_gettime() ?

--Ivan
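
For concreteness, the "ask once at thread creation and store it" variant from Question 1 might look something like this (names and array size are illustrative only):

    #include <pthread.h>
    #include <time.h>

    #define MAX_CPUS 8                          /* illustrative size only     */

    static clockid_t cpu_clockid[ MAX_CPUS ];   /* one per emulated CPU       */

    /* Called once by each CPU thread as it starts up */
    static void remember_cpu_clockid( int cpu )
    {
        if (pthread_getcpuclockid( pthread_self(), &cpu_clockid[ cpu ] ) != 0)
            cpu_clockid[ cpu ] = CLOCK_REALTIME;    /* fallback: wall clock   */
    }

    /* Any thread can later read that CPU thread's accumulated CPU time */
    static int cpu_thread_time( int cpu, struct timespec* ts )
    {
        return clock_gettime( cpu_clockid[ cpu ], ts );
    }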

@jphartmann
Contributor

Q1: Yes.

@ivan-w
Contributor

ivan-w commented Apr 13, 2017

John,

Why ? Are there any conditions that would cause the clock id returned by the pthread_getcpuclockid() to change within the lifespan of the thread ?

@jphartmann
Contributor

Sorry. Perhaps you could ask one question only?

Q1: No, Yes.

@Fish-Git
Contributor

Fish-Git commented Apr 13, 2017

Question 2 (for Fish) : What would it take to implement a fthread__getcpuclockid() and clock_gettime()?

Coding identical functionality in Windows should be trivial, I should think. We're basically almost already doing the same thing with our current getrusage() implementation.
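
For reference, a hypothetical sketch of the Windows side using GetThreadTimes() (this is not the actual w32util.c code):

    #include <windows.h>

    /* Sum of user + kernel time for a thread handle, in microseconds */
    static long long w32_thread_cputime_us( HANDLE hThread )
    {
        FILETIME       ftCreate, ftExit, ftKernel, ftUser;
        ULARGE_INTEGER k, u;

        if (!GetThreadTimes( hThread, &ftCreate, &ftExit, &ftKernel, &ftUser ))
            return -1;

        k.LowPart  = ftKernel.dwLowDateTime;
        k.HighPart = ftKernel.dwHighDateTime;
        u.LowPart  = ftUser.dwLowDateTime;
        u.HighPart = ftUser.dwHighDateTime;

        /* FILETIME values count 100-nanosecond intervals */
        return (long long) ((k.QuadPart + u.QuadPart) / 10);
    }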

@Peter-J-Jansen
Contributor

I think I have fixed this problem.

The issue was first described by Peter Coghlan (as Issue #217 in the official Hyperion GitHub), and has now been solved along the lines as initially described by Mark "dasdman" L. Gaubatz, and commented on by Ivan, John, and Fish (all of which can be read up on GitHub).

The solution as described by Mark needed some amendments, and I would like to receive feedback on the choices I made in the process. The extra problems to solve were these :

(1) Mark's original getrusage() produced two statistics : thread user time, and thread system time.  Fish's Windows equivalent implementation for getrusage() supplies the Windows thread user and kernel times for these.  Only when running under Windows does the Hercules qproc command correctly show these CPU thread user and kernel times.  This qproc behavior has been retained as is.

(2) The clock_gettime() replacement for getrusage() has been mentioned on the 'net to produce the sum of both user and system time.  I have assumed that this is correct, even though I'd like further confirmation that it is.  The Windows clock_gettime() equivalent for thread CPU time therefore returns the sum of the Windows user and kernel times.

(3) The original "diagmssf.c" code for DIAG 204 delivered the VM virtual CPU time as the getrusage() user time (uCPU); the VM total CPU time (tCPU) was computed as the sum of the getrusage() user+system time; the VM wait time (wCPU) was computed as the difference tCPU-uCPU.  This would imply that the VM wait time equals the getrusage() system time.  I don't think that is correct.  In order to correct this I have added and computed regs->waittime_accumulated in "timer.c" to set wCPU. (This requires OPTION_MIPS_COUNTING to be #defined, but after inspection, I think that is already implicitly required anyway in other places.)  With uCPU set to clock_gettime() thread user+system time, tCPU is now computed as uCPU+wCPU. I believe this delivers the more accurate DIAG 204 data.

(4) The clock_gettime() for thread user+system CPU time needs the thread clockid.  Following the thinking as voiced by Ivan and John, I've added sysblk.cpuclockid[cpu] so that only one pthread_getcpuclockid() call per CPU thread is needed.  

(5) The Windows equivalent for thread clockid simulates this as the negative of the thread_id, and the clock_gettime() equivalent then uses this negative to call the existing Windows getrusage() to obtain and sum the Windows thread user and kernel times.  Fish may wish to improve or beautify this simple but straightforward concept.   

I'd welcome feedback on the above.

Cheers,

Peter J.

Peter-J-Jansen added a commit that referenced this issue Jan 23, 2018
The issue was first described by Peter Coghlan (as Issue #217 in the official Hyperion GitHub), and has now been solved along the lines as initially described by Mark "dasdman" L. Gaubatz, and commented on by Ivan, John, and Fish (all of which can be read up on GitHub).

The solution as described by Mark needed some amendments, and required some extra problems to be solved.  These included :

 (1) Mark's original getrusage() produced two statistics : thread user time, and thread system time.  Fish's Windows equivalent implementation for getrusage() supplies the Windows thread user and kernel times for these.  Only when running under Windows does the Hercules qproc command correctly show these CPU thread user and kernel times.  This qproc behavior has been retained as is.

 (2) The clock_gettime() replacement for getrusage() has been mentioned on the 'net to produce the sum of both user and system time.  I have assumed that this is correct, even though I'd like further confirmation that it is.  The Windows clock_gettime() equivalent for thread CPU time therefore returns the sum of the Windows user and kernel times.

 (3) The original "diagmssf.c" code for DIAG 204 delivered the VM virtual CPU time as the getrusage() user time (uCPU); the VM total CPU time (tCPU) was computed as the sum of the getrusage() user+system time; the VM wait time (wCPU) was computed as the difference tCPU-uCPU.  This would imply that the VM wait time equals the getrusage() system time.  I don't think that is correct.  In order to correct this I have added and computed regs->waittime_accumulated in "timer.c" to set wCPU. (This requires OPTION_MIPS_COUNTING to be #defined, but after inspection, I think that is already implicitly required anyway in other places.)  With uCPU set to clock_gettime() thread user+system time, tCPU is now computed as uCPU+wCPU. I believe this delivers the more accurate DIAG 204 data.

 (4) The clock_gettime() for thread user+system CPU time needs the thread clockid.  Following the thinking as voiced by Ivan and John, I've added sysblk.cpuclockid[cpu] so that only one pthread_getcpuclockid() call per CPU thread is needed.

 (5) The Windows equivalent for thread clockid simulates this as the negative of the thread_id, and the clock_gettime() equivalent then uses this negative to call the existing Windows getrusage() to obtain and sum the Windows thread user and kernel times.
Peter-J-Jansen added a commit that referenced this issue Jan 23, 2018
There are (were) problems when running VM 2nd level under VM, in that the 2nd level VM would suffer from ABENDs (EXT105, DSP005 and PRG009).  They would occur intermittently, even on idling systems.  When left that way overnight, the 2nd level VM system would automatically restart, but in the morning the CPDUMP files would show these problems.  Additionally, it was observed that the 2nd level VM system would intermittently go into a CPU loop, exhibiting 100% CPU but on a single CPU only.  The ABENDs were observed to occur at that time.  The analysis of the ABENDs showed a probably incorrect CPU Timer emulation in Hyperion.

This was originally believed to be related to Hyperion issue #217, but this was proven to not be the case.  Rather, the problem has now been determined to be caused by the CPU Timer implementation being based on thread_cputime().  That CPU Timer emulation relied on three variables which lacked the protection of a critical section.  Also, basing the CPU Timer emulation on thread_cputime() would imply that a running mainframe CPU Timer would be stopped whenever the Hyperion host pre-empted the host CPU thread emulating the mainframe CPU, which is incorrect.  Additionally, the Windows emulation of thread_cputime() suffers from a poor resolution (1/16th of a second).

This commit reverts the CPU Timer emulation to the earlier one, to not rely on thread_cputime(), but on hw_clock() and hw_tod.  The processing around DIAG 204 and CPU MAXRATES has not been altered and still uses thread_cputime_us().
@superdave

There seem to be a few Unix implementations (including OS X 10.11, which I'm running, but also at least Solaris 11) that lack pthread_getcpuclockid(), clock_gettime(), or both.

@superdave

On earlier OS X, it looks like Mach's thread_info() is the way to go (yuck).

https://github.com/st3fan/osx-10.9/blob/master/xnu-2422.1.72/osfmk/mach/thread_info.h

On Solaris, I'm not quite sure what the answer is; it's surmised that getrusage() with RUSAGE_LWP is probably correct, since a thread is an LWP in Solaris.

I don't think there's a consistent solution for every Unix OS for this since threads arrived well after the various paths of Unix diverged and GNU/Linux does its own thing anyway.
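
For what it's worth, a rough sketch of the Mach thread_info() route mentioned above (untested; it assumes THREAD_BASIC_INFO reports the calling thread's user and system time):

    #include <mach/mach.h>

    /* Calling thread's user + system CPU time in microseconds, or -1 */
    static long long mach_thread_cputime_us( void )
    {
        thread_basic_info_data_t info;
        mach_msg_type_number_t   count = THREAD_BASIC_INFO_COUNT;
        mach_port_t              self  = mach_thread_self();
        kern_return_t            kr;

        kr = thread_info( self, THREAD_BASIC_INFO,
                          (thread_info_t) &info, &count );
        mach_port_deallocate( mach_task_self(), self );

        if (kr != KERN_SUCCESS)
            return -1;

        return (long long) (info.user_time.seconds
                          + info.system_time.seconds) * 1000000LL
             + info.user_time.microseconds
             + info.system_time.microseconds;
    }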

@PeterCoghlan
Author

Thanks to everyone for their contributions to this issue.

I originally reported this issue only because I found that the calls to getrusage() were failing all the time resulting in inefficiency due to falling back to an alternative method of calculating the CPU timer value.

Having read the many responses and studied the issue some more, I have become convinced that trying to use thread specific CPU usage time to calculate the value of the Hercules CPU timer is not the right way to go and will never yield the correct results.

Given that the CPU timer is to run both when the CPU in question is running and when the CPU in question is waiting, I believe that the correct solution is to use a CPU specific offset from wall clock time to calculate the current value of the CPU timer except when the CPU in question is stopped, in which case its CPU timer is also stopped.

That methods of calculating thread specific CPU usage are variably implemented and system specific is all the more reason not to use them, on top of their not being suitable for the task at hand.
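
For concreteness, a minimal sketch of the offset-from-wall-clock scheme described above (names and structure are illustrative, not the Hercules code; because the CPU timer decrements, the stored value is simply an epoch minus the current wall clock):

    #include <stdint.h>

    typedef struct {
        int64_t epoch;    /* wall-clock value at which the timer would read zero */
        int64_t frozen;   /* value saved when the CPU was manually stopped       */
        int     stopped;  /* nonzero while the CPU is manually stopped           */
    } CPUTIMERCTX;

    /* 'now' is the current wall-clock time in CPU-timer units */

    static void set_cpu_timer_value( CPUTIMERCTX* t, int64_t now, int64_t value )
    {
        t->epoch = now + value;                     /* SPT                       */
    }

    static int64_t store_cpu_timer_value( CPUTIMERCTX* t, int64_t now )
    {
        return t->stopped ? t->frozen               /* frozen while stopped      */
                          : t->epoch - now;         /* else decrements in step   */
    }                                               /* with the wall clock       */

    static void stop_cpu( CPUTIMERCTX* t, int64_t now )
    {
        t->frozen  = t->epoch - now;                /* freeze the current value  */
        t->stopped = 1;
    }

    static void start_cpu( CPUTIMERCTX* t, int64_t now )
    {
        t->epoch   = now + t->frozen;               /* resume from frozen value  */
        t->stopped = 0;
    }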

@dasdman
Contributor

dasdman commented Feb 7, 2018 via email

@PeterCoghlan
Author

The understanding I have come to is that "CPU" in the title "CPU timer" is there purely to indicate that each CPU has one. It is not there to suggest that this timer has anything to do with measuring anything associated with the CPU. It is just a timer. Its purpose is to measure time, like the clock on the wall. It should keep going at the same rate whether the CPU is running or waiting and it only stops when the CPU is manually stopped.

It could be argued that it is a more realistic emulation of hardware by Hercules to slow a particular CPU timer when the associated Hercules CPU emulation thread is not getting much execution time, but I disagree. A real hardware CPU timer is not going to diverge from running at wall clock speed as long as the associated CPU is not manually stopped. Attempting to measure CPU usage of CPU threads may lead to more realistic emulation in one respect but it is being traded for a loss of accuracy elsewhere. I feel the smaller gain in accuracy does not justify the larger loss of accuracy.

Regarding CP emulation of CPU timers, the CPU that CP is emulating has the same characteristics, architecture and instruction set as the real CPU running the emulation, so perhaps it may be feasible for CP to adjust its emulated CPU timers based on what its emulated CPUs are doing. However, in Hercules, the CPU(s) running the emulation is/are completely unknown quantities with characteristics, architectures and instruction sets which are almost always completely different to the CPUs being emulated. I feel it is unrealistic and prone to error to equate usage of CPU time by an emulation thread in the Hercules environment with emulated CPU time made available by that thread.

@jphartmann
Contributor

In an overcommitted LPAR environment, the CPU timer will not follow the wall clock.

Incidentally, there is no longer a CPU timer register; the value is computed by millicode on demand.

@PeterCoghlan
Author

Can I put it another way and say that when we are not emulating an LPAR environment, the CPU timer will follow the wall clock?

If it is desirable to be able to very accurately emulate an overcommitted LPAR environment, I think we need a way to measure emulated CPU time made available by Hercules CPU emulation threads. I don't think measuring the CPU usage time of the unknown quantity that is the emulating CPU or CPUs is the way to do this. Perhaps it would be possible to assign a theoretical time taken by each instruction emulated by Hercules, individually sum the times required for all the instructions emulated by each particular CPU thread, add the time spent waiting by each emulated CPU (should that be wall clock time waiting or some other definition of time waiting, and based on what?) and use that figure to drive the CPU timer for each Hercules emulated CPU? Will this produce a more accurate result in the case of an overcommitted LPAR environment? Perhaps, depending on the accuracy of the instruction timings that were assigned, but noting that in real hardware, differing operand data will likely result in differing timings and we have not accounted for that. Will it produce a more accurate result in a non-LPAR environment? If it is more accurate than using the wall clock, I'll eat my hat.

@ivan-w
Contributor

ivan-w commented Feb 7, 2018

In an overcommitted LPAR environment, the CPU timer will not follow the wall clock.

That's not what the POO says...

When the TOD-clock-steering facility is not installed,
and both the CPU timer and the TOD clock are running,
the stepping rates are synchronized such that
both are stepped at the same rate.

So unless someone issued a PTFF instruction (to account for a leap second or a NTP update..)

And no matter what...

How can the CPU timer run SLOWER when executing instructions rather than being in a wait state ?

--Ivan

@jphartmann
Contributor

In an LPAR environment, a partition can be suspended; in that state its CPU timer is not decremented.

As there are no longer CPU timer registers in the hardware, it is futile to argue about what something that isn't there does or does not do.

@PeterCoghlan
Author

There is not necessarily a conflict there. Perhaps the CPU timers (all of them) and TOD clock will all run equally slow in an overcommitted LPAR environment when compared to the wall clock?

Thinking some more, if all the CPU timers in a particular environment - LPAR, Hercules instance or whatever don't run at the same rate all the time (excluding the stopped CPU case), timing similar events on different (emulated) CPUs using the CPU timer will yield different results. This is unlikely to be the desired effect.

Here's another case to chew on. Consider Hercules emulating two CPUs while running on single CPU hardware. Using the thread CPU usage model, each emulated CPU timer could end up running at less than half wall clock speed. Exactly the same setup running on a four CPU host would likely have its emulated CPU timers running much faster.

@ivan-w
Contributor

ivan-w commented Feb 7, 2018

In an LPAR environment, a partition can be suspended; in that state its CPU timer is not decremented.

That's the equivalent of a manual stopped state... Nothing to do with a CPU overcommit in a PR/SM or z/VM hypervisor...

As there no longer are CPU timer registers in the hardware, it is futile to argue what something that isn't there does or does not.

That's an implementation detail of some three letter vendor - nothing to do with what this implementation does !

@Fish-Git
Contributor

Fish-Git commented Feb 7, 2018

For those of us with little or no experience with LPAR hardware, what is "an overcommitted LPAR environment"?

@ivan-w
Contributor

ivan-w commented Feb 7, 2018

For those of us with little or no experience with LPAR hardware, what is "an overcommitted LPAR environment"?

That's a good question. I understand memory overcommit (where more memory is allocated than what is available) - which may entail paging/swapping if the hypervisor allows it - and permanent storage overcommit in the case of thin-provisioned storage resources - but CPU ?

I guess it would mean the sum of the LPAR absolute shares exceeds 100% of the available resources in the assigned resource pool... But that's dubious !

@Peter-J-Jansen
Contributor

Am I correct in assuming that when an LPAR partition is in the suspended state, all its CPUs are in the stopped state ? If not, in what state would all its CPUs be otherwise ?

The PoP specifies that a CPU timer is decremented except when its CPU is in the stopped or check-stop state, and I believe that this is being catered for in the Hercules implementation.

Fish-Git referenced this issue in SDL-Hercules-390/hyperion Apr 4, 2018
Fish-Git pushed a commit to SDL-Hercules-390/hyperion that referenced this issue Apr 4, 2018
Closes https://github.com/Fish-Git/hyperion/pull/83

gappelman pushed a commit to gappelman/hyperion that referenced this issue Apr 10, 2020
Various fixes for local 3270 devices connected using tn3270: