Await Barrier never releases #10

Closed
imavroukakis opened this issue Jun 12, 2020 · 9 comments

Comments

@imavroukakis

Hello! I'm trying to wait for a barrier within a script section, but it seems to hang there forever, even when I set a timeout. My code looks something like this, nested within a declarative pipeline steps{} block.

def testBarrier = createBarrier count: numberOfTestNodes; // 5 in my case
def testGroups = [:]
script {
    for (int i = 0; i < numberOfTestNodes; i++) {
        def num = i
        testGroups["node $num"] = {
            node('workers') {
                def javaHome = tool name: 'openjdk-11'
                // do some prep work
                awaitBarrier barrier: testBarrier, timeout: 10, unit: 'SECONDS'
                // main work goes here
                stash name: "node $num", includes: '**/simulation.log'
            }
        }
    }
    parallel testGroups
}

In the Jenkins build console log, I can see awaitBarrier being printed out, but it hangs on the last one forever. I counted them and there are definitely 5 instances of awaitBarrier printed out.

I'm using version 1.0.0 of the plugin in Jenkins 2.219

Thanks!

@imavroukakis
Author

FYI, I also tried your sample, and it displays the exact same behaviour

[Pipeline] Start of Pipeline
[Pipeline] createBarrier
[Pipeline] parallel
[Pipeline] { (Branch: await1)
[Pipeline] { (Branch: await2)
[Pipeline] { (Branch: await3)
[Pipeline] awaitBarrier
[Pipeline] awaitBarrier
[Pipeline] awaitBarrier

@topikachu
Contributor

@imavroukakis I will look at this issue.

@topikachu added the bug label (Something isn't working) on Jun 27, 2020
@topikachu
Contributor

@imavroukakis
I can't reproduce this issue. Below is the script I used.
I will close this issue anyway; you can reopen it or create a new one if you still hit this.

def numberOfTestNodes = 5
def testBarrier = createBarrier count: numberOfTestNodes; // 5 in my case
def testGroups = [:]

Random rand = new Random()



script {
    for (int i = 0; i < numberOfTestNodes; i++) {
        def num = i
        testGroups["node $num"] = {
            //node('workers') {
                random_num = rand.nextInt(20+1)
                sleep random_num
                // do some prep work
                awaitBarrier barrier: testBarrier, timeout: 5, unit: 'SECONDS'
                // main work goes here
                
            //}
        }
    }
    parallel testGroups
}

@topikachu added the can't reproduce label and removed the bug label (Something isn't working) on Jul 8, 2020
@gobos

gobos commented Jul 27, 2020

I have the same issue with the basic examples from the README. It worked at first, then stopped working after about a dozen runs. Restarting Jenkins helps.
Version 1.0.0 of the plugin on Jenkins 2.235.1.

@eczernikowski

eczernikowski commented Aug 27, 2020

I'm seeing similar behavior. If I set the barrier parties to greater than 3, the barrier never registers the additional waiting parties and all threads block until the timeout.

(note: I cloned the repo and amended BarrierTest and the jenkinsfile fixture to test 8 parties and it worked fine)

plugin version 1.0.0, Jenkins ver. 2.222.3

Two-file pipelineJob example that reproduces this:
test_barrier_dsl.groovy:

pipelineJob('tests/test-barrier') {
  
  description('Testing Concurrent Step Plugin Barrier')

  parameters {
    stringParam('NUM_THREADS', '8', 'Number of threads to launch and wait for')
  }

  environmentVariables(
    JOB_PATH: 'jobs/tests'
  )

  definition {
    cpsScm {
      lightweight(true)
      scm {
        git {
          branch('master')
          remote {
            url('[redacted github repo url]')
            credentials('[redacted github access token]')
          }
        }
      }
      scriptPath('${JOB_PATH}/test_barrier.groovy')
    }
  }
}

test_barrier.groovy:

#!groovy
  
def log(msg) {
  def timeStamp = (new Date()).format("yyyy-MM-dd HH:mm:ss")
  print "[${timeStamp}] ${msg}"
}

def logWaiting(threadNum, barrier) {
  log "[Thread ${threadNum}] BarrierRef object: ${barrier.dump()} - CyclicBarrier object: ${barrier.cyclicBarrier}"
  log "[Thread ${threadNum}] total waiting now: ${barrier.cyclicBarrier.numberWaiting} out of ${barrier.cyclicBarrier.parties} required"
}

def numThreads = Integer.parseInt(params.NUM_THREADS)
def testBarrier = createBarrier count: numThreads
testGroups = [:]

for (int i = 0; i < numThreads; i++) {
  def num = i
  testGroups["node ${num}"] = {
    sleep (num * 2)
    logWaiting num, testBarrier
    awaitBarrier barrier: testBarrier, timeout: 5, unit: 'MINUTES'
    log "Thread ${num} is done"
  }
}
parallel testGroups

Jenkins console logs with NUM_THREADS=3:
[screenshot: Screen Shot 2020-08-26 at 11 01 20 PM]

Jenkins console logs with NUM_THREADS=8 (each thread times out waiting after 5 minutes, numberWaiting never gets higher than 3):
[screenshot: Screen Shot 2020-08-26 at 11 07 12 PM]

@eczernikowski

eczernikowski commented Aug 27, 2020

Could it be a thread-related limitation of CyclicBarrier? I have no such issues when running a similar test with CountDownLatch.

@MohsineF

Same issue: I can't create a barrier with more than 3 parties. The barrier never registers the additional parties and all threads block. A large number of threads also gets created, and I can't create any new jobs until the Jenkins server is restarted!

Is another plugin causing the problem, or does the maximum number of threads need to be increased?

@ijrandom

ijrandom commented Feb 8, 2022

For everyone who is still facing this issue: it is caused by the limited number of threads in the ForkJoinPool on the Jenkins master node.
If the Jenkins master node has N CPUs, only N simultaneous awaitBarrier calls will work.
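
To illustrate the mechanism described above outside Jenkins, here is a minimal standalone sketch. It is not the plugin's code: the class name BarrierStarvationDemo, the run helper, the pool sizes, and the 2-second timeout are all made up for the demo. It assumes, as this comment suggests, that each awaitBarrier call blocks a ForkJoinPool worker thread. If that holds, a CyclicBarrier with more parties than the pool has threads can never trip, because the extra waiters stay queued behind the blocked ones.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class BarrierStarvationDemo {

    /**
     * Submits `parties` blocking waiters to a ForkJoinPool with `poolSize`
     * threads and returns how many of them passed the barrier.
     * (Hypothetical helper for this demo, not part of any plugin API.)
     */
    static int run(int poolSize, int parties) {
        CyclicBarrier barrier = new CyclicBarrier(parties);
        AtomicInteger released = new AtomicInteger();
        ForkJoinPool pool = new ForkJoinPool(poolSize);
        for (int i = 0; i < parties; i++) {
            pool.execute(() -> {
                try {
                    // Blocks the worker thread until all parties arrive or the timeout hits.
                    barrier.await(2, TimeUnit.SECONDS);
                    released.incrementAndGet();
                } catch (InterruptedException | BrokenBarrierException | TimeoutException e) {
                    // Timed out, or another waiter timed out and broke the barrier.
                }
            });
        }
        pool.shutdown(); // already submitted tasks still run
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return released.get();
    }

    public static void main(String[] args) {
        System.out.println("pool=3 parties=3 released=" + run(3, 3));
        System.out.println("pool=3 parties=5 released=" + run(3, 5));
    }
}
```

With 3 threads and 3 parties every waiter should get through. With 3 threads and 5 parties only 3 waiters can be blocked at once, so the barrier times out and nobody is released, which looks a lot like the "numberWaiting never gets higher than 3" reports earlier in this thread.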

@moshavnik

@ijrandom
Do you think that setting the Java parameter "java.util.concurrent.ForkJoinPool.common.parallelism" to a number higher than the number of processors could help with this problem? (Not solve it, of course...)
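
For anyone who wants to experiment with that property, here is a small sketch that prints what the JVM currently uses. The class name CommonPoolParallelism and the describe helper are made up for illustration. It only inspects the JDK's common ForkJoinPool; whether the plugin actually uses the common pool (rather than a pool of its own) is not confirmed anywhere in this thread, so treat this as a diagnostic aid, not a fix.

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolParallelism {

    /** Summarizes CPU count, effective common-pool parallelism, and any override. */
    static String describe() {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Effective target parallelism of the JDK's common pool.
        int parallelism = ForkJoinPool.getCommonPoolParallelism();
        // null unless the JVM was started with -Djava.util.concurrent.ForkJoinPool.common.parallelism=...
        String override = System.getProperty("java.util.concurrent.ForkJoinPool.common.parallelism");
        return "cpus=" + cpus + " commonPoolParallelism=" + parallelism + " override=" + override;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The property is read when the common pool is initialized, so it needs to be passed on the JVM command line, for example `java -Djava.util.concurrent.ForkJoinPool.common.parallelism=16 CommonPoolParallelism` (16 is an arbitrary value). Without an override, the JDK defaults the common pool to one less than the number of available processors, with a minimum of 1.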
