Race condition in wtp #1274

Merged
merged 1 commit into from Nov 24, 2016

tomassykora

The shutdown sequence in wtp relies on signals and cancelation cleanup handlers. Previously, trying to shut down a thread that had not yet been properly initialized could lead to a deadlock. This change makes wtpStartWrkr() synchronous with respect to the initialization of the newly created thread.

This issue extends an older one here: #966

Thanks to Rado Sroka for the analysis and an initial patch.
@rgerhards rgerhards added the bug label Nov 24, 2016
@rgerhards rgerhards added this to the v8.24 milestone Nov 24, 2016
@rgerhards
Member

Note: the Debian CI failure is due to a change in the CI environment, for which your version of master did not have the necessary patch (we changed CI to detect a previously undetected error condition). In other words, you do not need to care about that failure. All is well.

@radosroka
Contributor

I did an analysis for this race.

ShutdownWorkers(qqueue_t *pthis)
    tryShutdownWorkersWithinQueueTimeout(pThis)
        wtpSetState(pThis->pWtpDA, wtpState_SHUTDOWN_IMMEDIATE);
        wtpAdviseMaxWorkers(pThis->pWtpDA, 1);
            wtpStartWrkr(pThis)
                // find free spot
                if(i == 0 || pThis->toWrkShutdown == -1) wtiSetAlwaysRunning(pThis->pWrkr[i]); ???
                wtiSetState(pWti, WRKTHRD_RUNNING);
                <---------- start of race
                pthread_create(&(pWti->thrdID), &pThis->attrThrd, wtpWorker, (void*) pWti);  ----------> wtpWorker()

        wtpSetState(pThis->pqDA->pWtpReg, wtpState_SHUTDOWN);
        wtpAdviseMaxWorkers(pThis->pqDA->pWtpReg, 1);
            wtpStartWrkr(pThis)
                // find free spot
                if(i == 0 || pThis->toWrkShutdown == -1) wtiSetAlwaysRunning(pThis->pWrkr[i]); ???
                wtiSetState(pWti, WRKTHRD_RUNNING);
                <---------- start of race
                pthread_create(&(pWti->thrdID), &pThis->attrThrd, wtpWorker, (void*) pWti);  ----------> wtpWorker()

    cancelWorkers(pthis)
        wtpCancelAll(pThis->pWtpReg);
            wtiCancelThrd(pThis->pWrkr[i])
                if(wtiGetState(pThis)) pthread_kill(pThis->thrdID, SIGTTIN)
                while(wtiGetState(pThis)) srSleep(0, 10000)

        wtpCancelAll(pThis->pqDA->pWtpReg);
            wtiCancelThrd(pThis->pWrkr[i]);
                if(wtiGetState(pThis)) pthread_kill(pThis->thrdID, SIGTTIN)
                while(wtiGetState(pThis)) srSleep(0, 10000)

static void *
  wtpWorker(void *arg) /* the arg is actually a wti object, even though we are in wtp! */
  {
          wti_t *pWti = (wti_t*) arg;
          wtp_t *pThis;
          sigset_t sigSet;
  #       if HAVE_PRCTL && defined PR_SET_NAME
          uchar *pszDbgHdr;
          uchar thrdName[32] = "rs:";
  #       endif
  
          BEGINfunc
          ISOBJ_TYPE_assert(pWti, wti);
          pThis = pWti->pWtp;
          ISOBJ_TYPE_assert(pThis, wtp);
  
          /* block all signals */
          sigfillset(&sigSet);
          pthread_sigmask(SIG_BLOCK, &sigSet, NULL);
  
          /* but ignore SIGTTN, which we (ab)use to signal the thread to shutdown -- rgerhards, 2009-07-20 */
          sigemptyset(&sigSet);
          sigaddset(&sigSet, SIGTTIN);
          pthread_sigmask(SIG_UNBLOCK, &sigSet, NULL);
  
  #       if HAVE_PRCTL && defined PR_SET_NAME
          /* set thread name - we ignore if the call fails, has no harsh consequences... */
          pszDbgHdr = wtpGetDbgHdr(pThis);
          ustrncpy(thrdName+3, pszDbgHdr, 20);
          if(prctl(PR_SET_NAME, thrdName, 0, 0, 0) != 0) {
                  DBGPRINTF("prctl failed, not setting thread name for '%s'\n", wtpGetDbgHdr(pThis));
          }
          dbgOutputTID((char*)thrdName);
  #       endif
  <------------------------------------------------------------------------------------------------------------------------------- end of race
          pthread_cleanup_push(wtpWrkrExecCancelCleanup, pWti);
          wtiWorker(pWti);
          pthread_cleanup_pop(0);
          wtpWrkrExecCleanup(pWti);
  
          ENDfunc
          /* NOTE: we must call ENDfunc FIRST, because otherwise the schedule may activate the main
           * thread after the broadcast, which could destroy the debug class, resulting in a potential
           * segfault. So we need to do the broadcast as actually the last action in our processing
           */
          pthread_cond_broadcast(&pThis->condThrdTrm); /* activate anyone waiting on thread shutdown */
          pthread_exit(0);
  }
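
To make the window marked above concrete, here is a minimal, self-contained sketch. It is not rsyslog code: every name in it (demo_worker_t, demo_cleanup, demo_thread, demo_still_running_after_cancel) is hypothetical, and it assumes that a pthread_cancel() request reaches the thread inside the window, which the trace above does not show explicitly. The point it illustrates is only this: if the "running" flag is set before pthread_create() but the cleanup handler that resets it has not been pushed yet, a cancel that acts in between leaves the flag set, and a loop like while(wtiGetState(pThis)) would then never terminate.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a worker; "running" plays the role of the wti state. */
typedef struct {
    atomic_bool running;      /* set by the starter BEFORE pthread_create() */
    atomic_bool cancel_sent;  /* set by the canceller AFTER pthread_cancel() */
    bool push_handler_first;  /* true: handler is installed before the cancel acts */
} demo_worker_t;

/* Cleanup handler: the only place where "running" is reset in this sketch. */
static void demo_cleanup(void *arg)
{
    demo_worker_t *w = arg;
    atomic_store(&w->running, false);
}

static void *demo_thread(void *arg)
{
    demo_worker_t *w = arg;

    if (w->push_handler_first) {
        pthread_cleanup_push(demo_cleanup, w);
        /* plain spin: contains no cancellation point */
        while (!atomic_load(&w->cancel_sent)) { }
        pthread_testcancel();           /* cancel acts here, handler runs */
        pthread_cleanup_pop(1);
    } else {
        while (!atomic_load(&w->cancel_sent)) { }
        pthread_testcancel();           /* cancel acts here, no handler exists yet */
        pthread_cleanup_push(demo_cleanup, w);
        pthread_cleanup_pop(1);         /* never reached once the cancel is pending */
    }
    return NULL;
}

/* Returns the value of "running" after the cancelled thread has been joined. */
static bool demo_still_running_after_cancel(bool push_handler_first)
{
    demo_worker_t w;
    pthread_t tid;
    int rc;

    atomic_init(&w.running, true);      /* mirrors setting the state before creation */
    atomic_init(&w.cancel_sent, false);
    w.push_handler_first = push_handler_first;

    rc = pthread_create(&tid, NULL, demo_thread, &w);
    assert(rc == 0);
    rc = pthread_cancel(tid);           /* deferred cancel becomes pending */
    assert(rc == 0);
    atomic_store(&w.cancel_sent, true); /* let the thread reach pthread_testcancel() */
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;

    return atomic_load(&w.running);
}
```

Calling demo_still_running_after_cancel(false) models the racy ordering and reports the flag as still set, while demo_still_running_after_cancel(true) models a thread that finished its setup first and reports it as cleared. Build with -pthread.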

@radosroka
Contributor

radosroka commented Nov 24, 2016

What fix do you have, @rgerhards?

This causes rsyslog to hang on system restart. We had this problem in RHEL long ago, and it is reproducible with newer versions of rsyslog.

@rgerhards
Member

The problem on the Debian Buildbot Slave is unrelated. This is the issue: #1271

I need to review this PR, but a quick look made me think it is good ... and I appreciate this fix, especially as we were quite puzzled by the original issue.

Member

@rgerhards rgerhards left a comment

looks good (actually excellent ;-))

@rgerhards rgerhards merged commit 547380d into rsyslog:master Nov 24, 2016
/* wait for the new thread to initialize its signal mask and
* cancelation cleanup handler before proceeding
*/
d_pthread_cond_wait(&pThis->condThrdInitDone, &pThis->mutWtp);
Contributor

So I realize that I am a bit late to the game here, but the condition wait call does not appear to be in a loop. Perhaps I am missing the loop? POSIX allows spurious wakeups for callers of pthread_cond_wait(), so the recommended handshake is:

WAITER:
lock
    loop
        check predicate (memory location value)
        if not expected state of predicate
            then call condition wait (which unlocks mutex and reacquires on wakeup)
            else exit loop
unlock

SIGNALER:
lock
    change wait predicate
    signal or broadcast
unlock

This code does not appear to do that, so it is likely wide open to race conditions.
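
The handshake outlined above can be sketched in plain pthreads as follows. This is an illustration only, not the merged patch: start_sync_t, sync_worker, start_worker_sync and the init_done flag are all hypothetical names, and the sketch uses pthread_cond_wait() directly rather than rsyslog's d_pthread_cond_wait() wrapper. The key detail is the boolean predicate that is changed and checked under the mutex, so a spurious wakeup, or a signal sent before the waiter starts waiting, cannot cause a wrong result.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state shared between the starter and the new thread. */
typedef struct {
    pthread_mutex_t mut;
    pthread_cond_t  cond_init_done;
    bool            init_done;      /* the wait predicate, protected by mut */
} start_sync_t;

/* SIGNALER: finish initialization, then publish that fact under the lock. */
static void *sync_worker(void *arg)
{
    start_sync_t *s = arg;

    /* ... signal mask and cleanup handler setup would go here ... */

    pthread_mutex_lock(&s->mut);
    s->init_done = true;                        /* change the predicate */
    pthread_cond_signal(&s->cond_init_done);    /* then wake the waiter */
    pthread_mutex_unlock(&s->mut);
    return NULL;
}

/* WAITER: start the thread and block until it reports that it is initialized.
 * Returns true if the thread was created and completed its initialization. */
static bool start_worker_sync(void)
{
    start_sync_t s;
    pthread_t tid;
    bool ok;

    pthread_mutex_init(&s.mut, NULL);
    pthread_cond_init(&s.cond_init_done, NULL);
    s.init_done = false;

    pthread_mutex_lock(&s.mut);
    if (pthread_create(&tid, NULL, sync_worker, &s) != 0) {
        pthread_mutex_unlock(&s.mut);
        pthread_cond_destroy(&s.cond_init_done);
        pthread_mutex_destroy(&s.mut);
        return false;
    }
    /* Re-check the predicate after every wakeup, spurious or not. */
    while (!s.init_done)
        pthread_cond_wait(&s.cond_init_done, &s.mut);
    ok = s.init_done;
    pthread_mutex_unlock(&s.mut);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&s.cond_init_done);
    pthread_mutex_destroy(&s.mut);
    return ok;
}
```

Because the starter holds the mutex across pthread_create() and the loop, the worker cannot set init_done and signal until the starter is either already waiting or about to re-check the flag, so the wakeup cannot be lost. Build with -pthread.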

@lock

lock bot commented Dec 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Dec 28, 2019