Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auxiliary threads eating up CPU time #589

Closed
ivan-w opened this issue Aug 2, 2023 · 25 comments
Closed

Auxiliary threads eating up CPU time #589

ivan-w opened this issue Aug 2, 2023 · 25 comments
Assignees
Labels
BUG The issue describes likely incorrect product functionality that likely needs corrected. TXF Bug related to, or likely caused by, our current Transaction-Execution Facility implementation (Unknown) Unresolved. It might be a bug. It might not. We don't know. We couldn't reproduce it.

Comments

@ivan-w
Copy link
Member

ivan-w commented Aug 2, 2023

On my ARM system (4 cores) I have 1 instance of hercules running (z/arch)

I see 2 threads running 100% of a core :

  • panel_display
  • rubato_thread

I did issue a timerint 100000 - but still the same

I understand the rubato_thread is transaction facility related.. but is it normal they (timer and.. rubato ?) both are is taking a full core ?

--Ivan

@Fish-Git Fish-Git changed the title Auxialry threads eating up up CPU time Auxiliary threads eating up CPU time Aug 2, 2023
@Fish-Git Fish-Git added the Researching... The issue is being looked into or additional information is being gathered/located. label Aug 2, 2023
@Fish-Git
Copy link
Member

Fish-Git commented Aug 2, 2023

Ivan (@ivan-w)

Just out of curiosity, please apply the following research patch,
and tell me if your system crashes or not:

 
(EDIT: below patch was edited from original posting)

--- hyperion-git/timer.c	2023-06-11 11:04:10.508749500 -0700
+++ hyperion-1/timer.c	2023-08-02 10:21:04.960290000 -0700
@@ -322,16 +322,36 @@
         } /* end if (intv_secs >= one_sec) */
 
         /* Sleep for another timer update interval... */
-
+        {
+            int usecs, rc;
+            bool txf;
 #if defined( _FEATURE_073_TRANSACT_EXEC_FACILITY )
         /* Do we need to temporarily reduce the frequency of timer
            interrupts? (By waiting slightly longer than normal?)
         */
         if (txf_PPA)
-            usleep( sysblk.txf_timerint );
+        {
+            txf = true;
+            usecs = sysblk.txf_timerint;
+        }
         else
 #endif
-            usleep ( sysblk.timerint );
+        {
+            txf = false;
+            usecs = sysblk.timerint;
+        }
+
+        rc = usleep( usecs );
+
+        if (rc != 0)
+        {
+            LOGMSG("+++ TIMER_THREAD: usleep( %d ) ==> %d!! (usecs %d from %s)\n", usecs, rc,
+                usecs, txf ? "sysblk.txf_timerint" : "sysblk.timerint" );
+            SLEEP(1);
+            CRASH();
+        }
+
+        }
 
     } /* end while */
 
@@ -429,7 +449,16 @@
             */
             release_lock( &sysblk.rublock );
             {
-                usleep( sysblk.txf_timerint );
+                int rc;
+                rc = usleep( sysblk.txf_timerint );
+
+                if (rc != 0)
+                {
+                    LOGMSG("+++ RUBATO_THREAD: usleep( %d ) ==> %d!!\n",
+                        sysblk.txf_timerint, rc );
+                    SLEEP(1);
+                    CRASH();
+                }
             }
             obtain_lock( &sysblk.rublock );
         }

If it crashes, please report the LOGMSG that got logged just before it did.

Thanks.

@Fish-Git
Copy link
Member

Fish-Git commented Aug 2, 2023

I did issue a timerint 100000 - but still the same

We shouldn't be allowing a timerint value of 1,000,000 (one million). According to everything I'm reading, the number of microseconds that is specified for usleep must be less than 1,000,000 (one million).

We should probably define MAX_TOD_UPDATE_USECS (in featall.h) to 999,999 instead (and update our documentation accordingly).

@Fish-Git
Copy link
Member

Fish-Git commented Aug 2, 2023

@ivan-w: Please note the updated research patch: I added a test in the rubato_thread as well.

Fish-Git added a commit that referenced this issue Aug 5, 2023
The root cause of the bug was in my incorrect #define of MAX_TOD_UPDATE_USECS in featall.h, which was #defined to 1000000 (one million) instead of 999999, thereby causing the TXF rubato_thread to do a usleep for 1000000 (one million) microseconds, which, being invalid, was causing it to not sleep at all and instead return immediately with an error, causing it to loop continuously without sleeping, consuming all available host CPU due to the timer threads being the highest priority thread in Hercules.

comm3705.c also had a similar bug wherein it was using "min(1000000,delay)" for one of its usleep().

In addition to fixing the previously mentioned bug, I have also introduced a new USLEEP() macro which calls a new "herc_usleep()" function that will report any usleep() error that might occur. (I purposely did not code it to fix (hide) the problem but rather to just report it so that the caller can thus be fixed to not make invalid usleep calls.)

This should close GitHub Issue #589 "Auxiliary threads eating up CPU time" as well as the long running "CPU utilization on Raspberry Pi" thread that took place in the main Hercules group recently.
@Fish-Git Fish-Git added BUG The issue describes likely incorrect product functionality that likely needs corrected. and removed Researching... The issue is being looked into or additional information is being gathered/located. labels Aug 5, 2023
@Fish-Git
Copy link
Member

Fish-Git commented Aug 5, 2023

Fixed by commit 87d04ed.

Closing.

@Fish-Git Fish-Git closed this as completed Aug 5, 2023
@Peter-J-Jansen
Copy link
Collaborator

Peter-J-Jansen commented Aug 6, 2023

I presume this is an important enough bug to start using this latest commit for it. But unfortunately the Hercules log now shows numerous HHC00075E Error in function usleep(): Interrupted system call error messages, or can I just ignore this for the time being?

Cheers,
Peter

@Peter-J-Jansen Peter-J-Jansen reopened this Aug 6, 2023
@Fish-Git
Copy link
Member

Fish-Git commented Aug 6, 2023

Should hopefully be fixed now by commit c962c5d

Re-closing.

@Fish-Git Fish-Git closed this as completed Aug 6, 2023
@ivan-w
Copy link
Member Author

ivan-w commented Aug 6, 2023

And now the threads are using DOUBLE of what they used to (on an intel xeon system).. and even more on my aarch64 system!

@Fish-Git
Copy link
Member

Fish-Git commented Aug 6, 2023

And now the threads are using DOUBLE of what they used to (on an intel xeon system).. and even more on my aarch64 system!

Are you sure you're using commit c962c5d?

Are there any HHC00075E messages being issued? If so, what are they saying? They should be reporting where the problematic USLEEP() is occurring.

If you ARE using commit c962c5d, AND you are not seeing any HHC00075E messages however, then I don't know what the fuck is going on, and I give up.

In fact, I'm strongly considering resigning completely from Hercules development altogether. I'm serious. This Linux shit is getting old REAL fucking fast.

You're a fucking Hercules developer. YOU figure it out and fucking fix it.

@Peter-J-Jansen
Copy link
Collaborator

After reading up on "usleep" for Ubuntu and some other Linux variants, I think that automatically retrying following a non-zero RC should not be done. Even the error reporting could perhaps be useful only in the case of errno!=EINTR. If I had known this prior to my "HHC00075E" message, some frustrations could have been avoided. I'm sorry about that, with apologies to both Fish and Ivan.

Cheers,

Peter

@Fish-Git
Copy link
Member

Fish-Git commented Aug 7, 2023

I think that automatically retrying following a non-zero RC should not be done.

Then what should be done?! How should EINTR be handled?

There is no difference between us automatically retrying it, and the caller not sleeping and calling us again. Either way, the usleep will be immediately retried. Over and over and over again. Continuously. Causing host CPU to be unnecessarily consumed.

Same with EINVAL too.

For EINVAL, we could report it, and then attempt a valid usleep, but even if we do a valid usleep (i.e. even if we fix the caller's error so that EINVAL does not occur), we are still left with the possibility that EINTR could occur!

What do we do if/when EINTR occurs?!

If retrying it keeps continuing to fail, then we're screwed! We're fucked! We can't call nanosleep instead, because the exact same thing will occur! Should we change it to a sleep call instead (which sleeps for a number of WHOLE SECONDS)? Based on what I'm reading, it's guaranteed to always succeed and never fail. Why does usleep and nanosleep fail? How can we RELIABLY sleep for less than one second?!

@wrljet
Copy link
Member

wrljet commented Aug 7, 2023

What do I need to do to test/try this?

With the latest commit, on a Debian x86_64 VM, I don't see any error msgs in the Hercules log, and just sitting there not IPLed, it's using very little CPU.

Bill

@Fish-Git
Copy link
Member

Fish-Git commented Aug 8, 2023

What do I need to do to test/try this?

I have no freaking idea.

@ivan-w never attached a Hercules log.

Peter attached a log to his comment that reports his version as:

14:34:41 HHC01417I Built with: Clang 10.0.0
14:34:41 HHC01417I Build type: GNU/Linux x86_64 host architecture build
[...]
14:34:41 HHC01417I Machine dependent assists: cmpxchg1 cmpxchg4 cmpxchg8 cmpxchg16 hatomics=C11
14:34:41 HHC01417I Running on: pjamdz3a (Linux-5.15.0-78-generic x86_64) MP=16

Why it works fine on some systems but not on others I have no freaking idea. All I know is shit like this makes me scream and want to pull out what little gray hair I have left, and after dealing with it for the past 20 years I'm getting damn sick and tired of it.

@ivan-w
Copy link
Member Author

ivan-w commented Aug 8, 2023

Ok ok. Let's be calm about all this !

It is a problem that affects less than 1% of people.. I have a PI 4B and - well.. It sill works just fine !

The entire thing is about forensic (or external observation) of threads that seem to be consuming more than the usual CPU cycles..

It's an oddity.. it's a strange behavior..

I'd like to get to the bottom of this (because of personal pride), but it doesn't need the immediate and urgent attention of anyone as far as I understand (except me maybe)...

I'll keep the issue open and assign it to mysef.. Again there is no immediate or urgent action needed (classify this as "odd behavior, no important or immediate action needed")..

@ivan-w ivan-w self-assigned this Aug 8, 2023
@ivan-w ivan-w added Researching... The issue is being looked into or additional information is being gathered/located. (Unknown) Unresolved. It might be a bug. It might not. We don't know. We couldn't reproduce it. TXF Bug related to, or likely caused by, our current Transaction-Execution Facility implementation and removed BUG The issue describes likely incorrect product functionality that likely needs corrected. labels Aug 8, 2023
@Fish-Git
Copy link
Member

Fish-Git commented Aug 8, 2023

Maybe the problem is how CPU utilization is being measured or similar type bug?

@Peter-J-Jansen
Copy link
Collaborator

Peter-J-Jansen commented Aug 8, 2023

How should EINTR be handled?

I propose that the current implementation behavior not be changed, i.e. do nothing. The usleep man pages (like here), state that EINTR is a signal interruption and that usleep's behavior is then unspecified. As the problem as reported is clearly not EINTR related, I propose to not attempting to correct a non-existing problem. I suspect that an EINTR interrupted usleep may effectively result in a shorter than requested sleep time. Just retrying the usleep, until one is not EINTR interrupted, will cause effective sleep times significantly longer than the originally requested sleep time. This would explain the reported doubled thread times.

The man page also mentions that usleep is obsolete and should be replaced with nanosleep, which does have the ability to restart after an EINTR interruption for the remainder of the time that the thread was not yet sleeping. When replacing usleep with re-started nanosleep for the remaining sleep times, the result can guarantee that the requested sleep time is being observed, but in general the sleep time will be larger that the actually requested sleep time. Also usleep when not EINTR interrupted (i.e. ending with rc==0) behaves that way, but a multiple times interrupted nanosleep may end up with an effective sleep time which is a lot more than the originally requested sleep time.

To me it is clear that neither usleep, nor even (repetitive) nanosleep, can guarantee exact sleep times: I suspect real time required accuracy sleep implementations would need to resort to more elaborate or different implementations.

In short: I propose to not alter the current usleep behavior when an EINTR occurs, as we currently have no known or reported problem with it.

I have therefore tested the following fix, which should get rid of the doubled thread times:

--- SDL-hyperion/hscutl.c	2023-08-08 10:39:00.713698533 +0200
+++ hscutl_pjj_fix.c	2023-08-08 10:36:26.469081326 +0200
@@ -609,13 +609,9 @@
 {
     int  rc, save_errno;
 
-    while (1
+    if (1
         && (rc = usleep( usecs )) != 0
-        && (save_errno = errno) == EINTR
-    )
-        continue;
-
-    if (rc != 0)
+        && (save_errno = errno) != EINTR)
     {
         char fnc[128], msg[128];

The above has been working fine on my Ubuntu 20.04 system: I will try it next on my Raspberri PI 4 - 8GB Raspbian OS 64-bit system (i.e. Debian based).

Us trying to support so many different OS's is bound to cause difficulties. Even the original bug of an usleep time exceeding 999999 micro seconds apparently will not be problem on all Linux or Unix systems. In retrospect, we could've just fixed that bug, without adding the extra macro to check for EINVAL invalid sleep times, as the vast majority of the usleep() invocations are with hardcoded numbers or constants less than 1000000.

Cheers,

Peter

@Fish-Git
Copy link
Member

Fish-Git commented Aug 8, 2023

In short: I propose to not alter the current usleep behavior when an EINTR occurs, as we currently have no known or reported problem with it.

I have therefore tested the following fix, which should get rid of the doubled thread times:

<snip patch>

The above has been working fine on my Ubuntu 20.04 system: I will try it next on my Raspberri PI 4 - 8GB Raspbian OS 64-bit system (i.e. Debian based).

So basically ignore EINTR altogether and only report any error that is not EINTR? Sounds good to me! Unless someone else objects or comes up with a better solution, I say go ahead and make the change, Peter! Thanks!

Us trying to support so many different OS's is bound to cause difficulties.

No shit!  :)

Even the original bug of an usleep time exceeding 999999 micro seconds apparently will not be problem on all Linux or Unix systems.

Yes, I saw that too. But we of course cannot rely on that. In my own quick testing (on Windows) before I fixed MAX_TOD_UPDATE_USECS for example, the rubato_thread would set sysblk.txf_timerint to 1000000 (one million), resulting in a EINVAL error and no sleep occurring. And of course the timer_thread would end up using that bad sysblk.txf_timerint value in its usleep too, causing the exact same problem to occur, thereby compounding the problem, leading to a hard CPU loop in both threads, consuming all available host CPU time for each of them.

Thus the reason for reporting all EINVAL (i.e. non-EINTR) errors, so such bugs could be identified and eventually fixed.

But I figured EINTR was only a temporary error that likely would not occur on retry. If however, it continually occurs (i.e. if the cause cannot be "cleared" somehow), then yes, we should probably ignore it I guess (which is what your proposed patch does and is what I agree we should probably do instead).

When replacing usleep with re-started nanosleep for the remaining sleep times, the result can guarantee that the requested sleep time is being observed, but in general the sleep time will be larger that the actually requested sleep time.

I would rather have the thread sleep for slightly longer than requested than for it to not sleep at all! I believe it is the "not sleeping at all" that is causing the unusually high CPU time to be consumed. With that in mind, I believe we should strongly consider changing our herc_usleep function to call nanosleep instead. At least we can then guarantee that the thread will sleep for at least the requested amount of time.

So... which should we do? Quick fix herc_usleep to not retry EINTR? (i.e. your proposed change/fix), or change it to do a restartable nanosleep instead?

Me? I vote for the nanosleep approach (with the same check for, and reporting of, EINVAL).

How do others feel?

(Yes, I've calmed down a bit now. I've just been having a lot of bad luck with Linux lately, which was greatly affecting me mood-wise. I think the recent threads on the main Hercules group these past few weeks got to me. I apologize to you and Ivan for blowing up like that, but I find myself becoming upset much more easily as I get older, so if it happens again, just remind me to take a few deep breaths.)  ;-)

@ivan-w
Copy link
Member Author

ivan-w commented Aug 8, 2023

We all have our bad hair day - no ,biggy !

Peter-J-Jansen added a commit that referenced this issue Aug 8, 2023
usleep() should not be retried when rc!=0, and only issue HHC00075E
when errno!=EINTR.  This approach may be improved upon alltogether
by replacing the usleep() call in herc_usleep with nanosleep().
@Peter-J-Jansen
Copy link
Collaborator

Peter-J-Jansen commented Aug 8, 2023

Couldn't agree with you more Ivan!

And I also agree with Fish in that the nanosleep() approach is the better solution, also in order to stop using the deprecated obsolete usleep(). Nevertheless I just committed my proposed stop-gap solution as my tests were successful, so that Ivan can check whether the doubled thread times indeed disappear. My tests neither showed HHC00075E's or increased CPU times.

When nanosleep() calls need to be repeated due to EINTR interrupts, I think one could perhaps generate a warning message, or even just an informational messages, when such repeats occur more than say 3 times or so for a given herc_usleep call.

Cheers,

Peter

@Fish-Git
Copy link
Member

Fish-Git commented Aug 8, 2023

And I also agree with Fish in that the nanosleep() approach is the better solution, also in order to stop using the deprecated obsolete usleep().

Good point. I remember seeing that too. usleep was originally defned as a legitimate POSIX function, but was later removed and replaced with a recommendation to use nanosleep instead.

When nanosleep() calls need to be repeated due to EINTR interrupts, I think one could perhaps generate a warning message, or even just an informational messages, when such repeats occur more than say 3 times or so for a given herc_usleep call.

Another excellent idea. The EINTR could occur more than once for a single call, and we don't want to issue a (warning? informational?) message each time. A message should only be issued if the EINTR occurs more than 'n' number of times in a row (there 'n' is some value such as 3 as you suggested), and even then, it should be issued only once. (If it occurs, say, 10 times in a row on a given call, the (warning? informational?) message should only be issued once.)

Since you suggested it, may I presume you will implement that suggested change, Peter, after Ivan has confirmed your temporary stop-gap solution resolves his double CPU issue?

I should also mention that, depending on how often EINTR occurs during normal Hercules processing, that such a "restartable" (retry) nanosleep approach will of course more than likely cause a noticeable increase in CPU time! How much of an increase it will cause is unknown, but there will of course be an increase. Hopefully it will only be a slight (insignificant) increase, but we won't know that until we actually give it a try.

Maybe it can be coded to skip the restart (retry) if the remaining number of nanoseconds is less than a certain threshold value? What should that threshold value be? 1000? (i.e. one microsecond?) Or some slightly larger value?

@Peter-J-Jansen
Copy link
Collaborator

Peter-J-Jansen commented Aug 11, 2023

Even whist awaiting feedback on my last stop-gap commit bf1a5d7 I tried a nanosleep() replacement for usleep() but one that was not good, as it left two threads running at 100% even after performing a shutdown command on my Ubuntu 20.04 system.

The nanosleep() implementation I tried went as follows:

--- SDL-hyperion/hscutl.c       2023-08-11 15:55:40.673462497 +0200
+++ hscutl.c    2023-08-11 12:54:50.518908396 +0200
@@ -607,24 +607,54 @@
 /* Hercules microsecond sleep */
 DLL_EXPORT int herc_usleep( useconds_t usecs, const char* file, int line )
 {
-    int  rc, save_errno;
 
-    if (1
-        && (rc = usleep( usecs )) != 0
-        && (save_errno = errno) != EINTR)
+#define USLEEP_MIN 1
+#define NANOSLEEP_EINTR_RETRY_WARNING_TRESHOLD 3
+
+    int  rc, save_errno, eintr_retrys = 0;
+
+    struct timespec usecs_left;
+    usecs_left.tv_nsec += usecs * 1000L;
+    usecs_left.tv_sec += usecs_left.tv_nsec / 1000000000;
+    usecs_left.tv_nsec %= 1000000000;
+
+    while ( 1
+        && ( rc = nanosleep( &usecs_left, &usecs_left ) != 0 )
+        && ( 0 
+           || usecs_left.tv_sec != 0  
+           || usecs_left.tv_nsec > ( USLEEP_MIN * 1000 ) )                 
+        && ( eintr_retrys++ ) 
+        && ( save_errno = errno ) != EINTR );
+
+    if ( 0
+       || rc != 0  
+       || eintr_retrys > NANOSLEEP_EINTR_RETRY_WARNING_TRESHOLD )            
     {
         char fnc[128], msg[128];
 
         MSGBUF( fnc, "USLEEP() at %s(%d)",
             TRIMLOC( file ), line);
+         
+        if ( save_errno != EINTR )
+        {
+            MSGBUF( msg, "rc=%d, errno=%d: %s",
+                rc, save_errno, strerror( save_errno ));
+
+            // "Error in function %s: %s"
+            WRMSG( HHC00075, "E", fnc, msg );
+            errno = save_errno;
+        }
 
-        MSGBUF( msg, "rc=%d, errno=%d: %s",
-            rc, save_errno, strerror( save_errno ));
+        if ( eintr_retrys > NANOSLEEP_EINTR_RETRY_WARNING_TRESHOLD )
+        {
+            MSGBUF( msg, "rc=%d, EINTR retrys count=%d",
+                rc, eintr_retrys );
 
-        // "Error in function %s: %s"
-        WRMSG( HHC00075, "E", fnc, msg );
-        errno = save_errno;
-    }
+#define HHC00092 "Warning in function %s: %s" 
+            // "Warning in function %s: %s"           
+            WRMSG( HHC00092, "W", fnc, msg );
+        }
+    }     
     return rc;
 }

But as stated, not good. (The #define statements above would need to be placed in the correct files.)

Cheers,

Peter


P.S. I will be off-line in a few days time for about 5 weeks (only reading s-mails now and then), but before that I'll still open an Issue which is now perfectly reproducible, but one I have not able to been able to fix. So in case you're bored, please check out GitHub Issue #590: "Guest IPL under z/VM causes FRE016 Hard Abend".

@Fish-Git
Copy link
Member

Fish-Git commented Aug 11, 2023

The nanosleep() implementation I tried went as follows:

Surprisingly, when I compile the above with clang, I get no warnings whatsoever!

When I compile it with gcc however, I correctly get the following warnings:

hscutl.c: In function ‘herc_usleep’:
hscutl.c:617:24: warning: ‘usecs_left.tv_nsec’ is used uninitialized [-Wuninitialized]
hscutl.c:616:21: note: ‘usecs_left’ declared here
hscutl.c:618:23: warning: ‘usecs_left.tv_sec’ is used uninitialized [-Wuninitialized]
hscutl.c:616:21: note: ‘usecs_left’ declared here
hscutl.c:638:12: warning: ‘save_errno’ may be used uninitialized [-Wmaybe-uninitialized]

Trying to compile it on Windows also correctly fails with:

hscutl.c(616) : error C2220: warning treated as error - no 'object' file generated
hscutl.c(616) : warning C4700: uninitialized local variable 'usecs_left' used
NMAKE : fatal error U1077: '"cl.EXE"' : return code '0x2'

(it still misses the "may be used uninitialized" for save_erro though, but then I think it has unfortunately always done that.)

Perhaps the above bug(s) are what was causing your problem Peter?

@Fish-Git
Copy link
Member

Fish-Git commented Aug 11, 2023

FYI:  The following seems to work well for me on Kubuntu 21.10 under VMware:

--- hyperion-1/hscutl.c	2023-08-08 09:25:22.238314000 -0700
+++ hyperion-0/hscutl.c	2023-08-11 15:41:02.086933900 -0700
@@ -607,22 +607,54 @@
 /* Hercules microsecond sleep */
 DLL_EXPORT int herc_usleep( useconds_t usecs, const char* file, int line )
 {
-    int  rc, save_errno;
+#define USLEEP_MIN                              1
+#define NANOSLEEP_EINTR_RETRY_WARNING_TRESHOLD  3
 
-    if (1
-        && (rc = usleep( usecs )) != 0
-        && (save_errno = errno) != EINTR)
+    int  rc, save_errno = 0, eintr_retrys = 0;
+    struct timespec usecs_left;
+
+    usecs_left.tv_nsec = usecs * 1000L;
+    usecs_left.tv_sec  = usecs_left.tv_nsec / ONE_BILLION;
+    usecs_left.tv_nsec %=                     ONE_BILLION;
+
+    while (1
+        && (rc = nanosleep( &usecs_left, &usecs_left )) != 0
+        && (save_errno = errno) == EINTR
+        && (0
+            || usecs_left.tv_sec != 0
+            || usecs_left.tv_nsec > (USLEEP_MIN * 1000)
+           )
+    )
+        ++eintr_retrys; // (keep retrying until done or min reached)
+
+    if (0
+        || rc != 0
+        || eintr_retrys > NANOSLEEP_EINTR_RETRY_WARNING_TRESHOLD
+    )
     {
         char fnc[128], msg[128];
 
         MSGBUF( fnc, "USLEEP() at %s(%d)",
             TRIMLOC( file ), line);
 
-        MSGBUF( msg, "rc=%d, errno=%d: %s",
-            rc, save_errno, strerror( save_errno ));
+        if (save_errno != EINTR)
+        {
+            MSGBUF( msg, "rc=%d, errno=%d: %s",
+                rc, save_errno, strerror( save_errno ));
+
+            // "Error in function %s: %s"
+            WRMSG( HHC00075, "E", fnc, msg );
+        }
+
+        if (eintr_retrys > NANOSLEEP_EINTR_RETRY_WARNING_TRESHOLD)
+        {
+            MSGBUF( msg, "rc=%d, EINTR retrys count=%d",
+                rc, eintr_retrys );
+
+            // "Warning in function %s: %s"
+            WRMSG( HHC00092, "W", fnc, msg );
+        }
 
-        // "Error in function %s: %s"
-        WRMSG( HHC00075, "E", fnc, msg );
         errno = save_errno;
     }
     return rc;
@@ -2317,12 +2349,12 @@
 DLL_EXPORT const char* perc2name( BYTE perc, char* buf, size_t bufsiz )
 {   /*
            Hex    Bit   PER Event
-            80     0    Successful-branching 
+            80     0    Successful-branching
             40     1    Instruction-fetching
             20     2    Storage-alteration
             10     3    Storage-key-alteration
-            08     4    Store-using-real-address 
-            04     5    Zero-address-detection 
+            08     4    Store-using-real-address
+            04     5    Zero-address-detection
             02     6    Transaction-end
             01     7    Instruction-fetching nullification (PER-3) */
 
@@ -3325,7 +3357,7 @@
 }
 
 //---------------------------------------------------------------------
-//                    I/O Interrupt 
+//                    I/O Interrupt
 //            (handles both HHC00805 and HHC00806)
 //---------------------------------------------------------------------
 DLL_EXPORT bool tf_0806( REGS* regs, U32 ioid, U32 ioparm, U32 iointid )
@@ -4114,7 +4146,7 @@
 
     if (max_recsize < sizeof( TF01321 ))
         max_recsize = sizeof( TF01321 );
-    
+
     if (max_recsize < sizeof( TF01329 ))
         max_recsize = sizeof( TF01329 );
 
--- hyperion-1/msgenu.h	2023-07-09 17:33:35.000000000 -0700
+++ hyperion-0/msgenu.h	2023-08-11 14:56:41.453140900 -0700
@@ -287,7 +287,8 @@
 #define HHC00089 "The are no HAO rules defined"
 #define HHC00090 "HAO thread waiting for logger facility to become active"
 #define HHC00091 "Logger facility now active; HAO thread proceeding"
-//efine HHC00092 - HHC00099 (available)
+#define HHC00092 "Warning in function %s: %s"
+//efine HHC00093 - HHC00099 (available)
 
 // reserve 100-129 thread related
 #define HHC00100 "Thread id "TIDPAT", prio %d, name '%s' started"

(The non-related changes are simply removal of trailing blanks)

@Peter-J-Jansen
Copy link
Collaborator

Peter-J-Jansen commented Aug 12, 2023

Yes, thanks for catching my errors. This works for me fine as well (under 1st level (no VMware) Lubunto 20.04). I got a few HHC00092W's, perhaps a slightly increased NANOSLEEP_EINTR_RETRY_WARNING_TRESHOLD 6 would be more appropriate?

Cheers,

Peter

@Fish-Git
Copy link
Member

I got a few HHC00092W's, perhaps a slightly increased NANOSLEEP_EINTR_RETRY_WARNING_TRESHOLD 6 would be more appropriate?

Sounds good to me!

(And I would suggest moving the #defines to hscutil.h where the herc_usleep function is declared too.)

My goal today is to try and reproduce your issue #590 problem.

@Fish-Git Fish-Git added BUG The issue describes likely incorrect product functionality that likely needs corrected. and removed Researching... The issue is being looked into or additional information is being gathered/located. labels Aug 24, 2023
@Fish-Git
Copy link
Member

Fish-Git commented Aug 24, 2023

FYI:

I have removed the "Researching..." label and have added the "BUG" label as I believe we may have more than likely identified the root cause of the excessive host CPU consumption by the timer thread.

Since I am not 100% certain of this however (we will have to monitor our "fix" to see whether it does indeed fix the problem or not), I am leaving the "Unknown" label for now.

Once we have enough evidence (feedback) to reliably conclude the problem has indeed been fixed, I will then remove the "Uknown" label at that time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BUG The issue describes likely incorrect product functionality that likely needs corrected. TXF Bug related to, or likely caused by, our current Transaction-Execution Facility implementation (Unknown) Unresolved. It might be a bug. It might not. We don't know. We couldn't reproduce it.
Projects
None yet
Development

No branches or pull requests

4 participants