Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with big logs, causing the browser to freeze (JENKINS-56377) #164

Closed
wants to merge 8 commits into from
Closed

Conversation

zhekoff316
Copy link

@zhekoff316 zhekoff316 commented May 26, 2020

See JENKINS-56377.

We've had this issue for a couple of months, where big logs would freeze the browser and the whole thing would load for ~2-3 minutes. I've addressed this issue and optimized the code in several places:

  • Moved "hide/show" link generation functionality in the back-end
  • Removed an unnecessary loop, which would loop over every element, while it was looping through each element. This was the main cause of the performance issues.
  • I've added the setTimeout() function, so it prevents browser freezing in especially big logs
  • I've also added some documentation, so it will be easier in the future for everyone to work on this script and improve it

I've tested it on a pipeline, which has 5-10 thousand steps, and my load time has been in the span of 2-5 seconds, compared to 90-150s before (including a browser crash/freeze). Also, this has been tested on Chrome and Mozilla.

@zhekoff316
Copy link
Author

I feel like there is an issue with the build process, as a higher version of Jenkins is required to run the tests (the POM states 2.176.4 as the minimum Jenkins version, while the tests ran at 2.164.1). I've ran the tests locally and they all pass,

@dwnusbaum
Copy link
Member

Thanks for the PR!

I feel like there is an issue with the build process

Yeah there is, I just filed #165 to hopefully fix the issue.

@dwnusbaum dwnusbaum self-requested a review May 26, 2020 13:18
Copy link
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. Have not reviewed in detail.

if (id == null) {
break

// The CSS rule for branch names only needs to be added once per node, so we
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW https://github.com/jenkinsci/workflow-job-plugin/pull/164/files?diff=unified&w=1 improves reviewability due to indentation changes.

var startId = e.getAttribute('startId')
if (startId == null || startId == nodeId) {
e.innerHTML = e.innerHTML.replace(/.+/, '$&<span class="pipeline-show-hide"> (<a href="#" onclick="showHidePipelineSection(this); return false">hide</a>)</span>')
// TODO automatically hide second and subsequent branches: namely, in case a node has the same parent as an earlier one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is already filed in JIRA somewhere.

Copy link
Author

@zhekoff316 zhekoff316 May 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this part, because it would probably make more sense to generate this inside the back-end.

EDIT: upon further inspection, I will leave it as it was before due to its performance impact being way smaller than I thought (almost not noticeable).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have clarified that my review comment was in reference to the removal of the one-line code comment, which is properly an RFE that should be tracked in Jira, not to behavioral changes in your patch.

Copy link
Member

@dwnusbaum dwnusbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhekoff316 Thanks for the PR! I am requesting changes to fix the issue I noted with parallel branch labels, but I would really like to see this merged once that is fixed!

I've tested it on a pipeline, which has 5-10 thousand steps

Is that a real Pipeline, or just one you created for testing?

Comment on lines 30 to 38
var nodes = $$('.pipeline-new-node')
var enclosings = new Map() // id → enclosingId
var labels = new Map() // id → label
for (var i = 0; i < nodes.length; i++) {
var node = nodes[i]
var oid = node.getAttribute('nodeId')
enclosings.set(oid, node.getAttribute('enclosingId'))
labels.set(oid, node.getAttribute('label'))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this loop causes branch names to no longer be shown when log output changes from one branch to another in many cases.

The problem is that in the while loop right after this, we need to walk through all of this node's parents to find what parallel branch it is a part of. Because the loop is gone, now we only walk through the immediate parents of each node, so if there is any nesting inside of the parallel branch, for example because of a block-scope step, we never find the enclosing parallel branch, and so the while loop exits before finding a label.

The old code is not optimal because we create a fresh enclosings and labels map for every node even though most of the content will be the same. I would try to come up with a good way to store those as page-level state so that you can keep the approach you have now of only updating enclosings and labels once per node, and then split the while loop below out somehow and only run it after the enclosings and labels have data for every node (not sure of a good way to do that).

Here is a Pipeline you can run to compare the behavior with and without this PR to see what I mean:

stage('Outer') {
    parallel(branchOne: {
        stage('Stage One') {
            node() {
                sh 'echo "from one"'
                sleep(time: 1)
                echo 'from one again'
            }
        }
    }, branchTwo: {
        stage('Stage Two') {
            node() {
                sh 'echo "from two"'
                sleep(time: 1)
                echo 'from two again'
            }
        }
    })
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I will try to find a solution to this problem.

break
}
}
}, 5000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 seconds is a long time to wait. Is 1 second (or maybe even less) not good enough just to get it out of the critical path for the main page load?

@zhekoff316
Copy link
Author

Hello again, thanks for the feedback.

First of all, sorry about the delay.

I have done some big changes in the latest commit. They resolve the issue with parallel pipelines (branch names are shown), and also greatly improve the performance with big logs.

  • I have removed the usage of the Behaviour.js library, as we can use basic JavaScript to accomplish the same results with greater performance. Also, this library is deprecated.
  • I have split most of the logic into separate functions. In my opinion, this enhances readability and maintainability.
  • More technically speaking, the problematic for loop, which generates the enclosings now only executes once, and if the pipeline is parallel it will go through another function, which will append the branch names. I feel like this is the most optimal way to do it while preserving all current functionality.

}
var isParallel = false
var nodes = $$('.pipeline-new-node')
var pipelineElements = document.getElementsByClassName("pipeline-new-node")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, those two variables are left-over after I made some changes to the code. Will remove them.

Comment on lines 194 to 195
isParallelBuild($$('.pipeline-new-node'));
generatePipelineLinks($$('.pipeline-new-node'));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing this only once on page load seems like it will have issues if someone opens the log for an in-progress build, leaves it open while the build continues, and then comes back and looks at the logs. For example, there may not be a parallel step when the page is opened initially, but one may execute later on. I think that in-progress builds are supported today by Behaviour.specify and the old code, but I haven't specifically tested it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will test this use case with the current code and will let you know. It is also a variant to remove the parallel check all-together - the performance impact will be minimal, as the main performance issue was fixed.

@jglick jglick removed their request for review June 17, 2020 20:41
@oleg-nenashev oleg-nenashev self-requested a review June 23, 2020 07:35
@oleg-nenashev
Copy link
Member

Added it to my review queue. Thanks to @zhekoff316 for this pull request, it is a very important improvement!

Copy link
Member

@dwnusbaum dwnusbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick look, and the logic seems a little off; I added some comments. Have you checked to see whether we can count on Behaviour.specify processing the nodes in any particular order? That could help us figure out a consistent way to build up the enclosings and labels maps without having to recreate them for every node.

Comment on lines 42 to 43
var enclosings = new Map() // id → enclosingId
var labels = new Map() // id → label
Copy link
Member

@dwnusbaum dwnusbaum Aug 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to create a local version of enclosings and labels here, or were you wanting to reuse and modify the global variables and use them as a kind of cache or something?

The way things are now, it doesn't look like generateEnclosing (no s) actually does anything because the global variables aren't used.

Comment on lines 1 to 2
var enclosings = new Map() // id → enclosingId
var labels = new Map() // id → label
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if scripts like this are encapsulated by Jenkins beyond standard JavaScript behavior, but having global vars like this seems like a bad idea. I would probably find the <pre> element with class console-output and attach these objects to that element instead, but I'm not a JavaScript expert, so I'm not sure what the preferred approach is for this kind of thing.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be easier to simply wrap the whole script in an IIFE

(function() {
    // ...
})();

This way those variables are not in the global scope but easily accessible from anywhere in this script.

enclosings.set(oid, node.getAttribute('enclosingId'))
labels.set(oid, node.getAttribute('label'))
appendBranchNames(oid, enclosings, labels)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is the only place where appendBranchNames is called, which seems wrong, since this method is only called for the parallel step itself. What if we start executing the first branch in a parallel step, and then that branch stops and we start executing the other branch? The first branch might get labelled correctly, but I think that the second branch wouldn't be labeled (unless later on in the Pipeline there was another parallel step).

@ngg
Copy link

ngg commented Sep 7, 2020

Hi! Is there any update regarding this PR? We hit this bug as well :(

@bitwiseman
Copy link
Contributor

@ngg
We're waiting for updates from @zhekoff316, I believe.

@bitwiseman bitwiseman self-requested a review September 9, 2020 03:06
@zhekoff316
Copy link
Author

I've been on a vacation, but I will take a look at the latest review comments as soon as I can, and will let you know if I can fix it.

@zhekoff316
Copy link
Author

Hello :) , sorry for taking so much time to come back to this.

I've noticed that @jglick has made a change (#112) that is mentioned in this PR. I haven't looked it in detail yet, but I can confirm that the problem does not appear anymore on big console logs on our Jenkins server (no browser crashes or freezes, it takes much faster for the logs to load).

Do you still have this problem on your side, as it seems the main issues have been fixed?

@ngg
Copy link

ngg commented Oct 22, 2021

That change was made 3 years ago, I don't see how that can be relevant. The issue is still not fixed, it's still horribly slow to open the console log. An 1mb log file takes several minutes to load and the browser stays unresponsive until then.

@zhekoff316
Copy link
Author

zhekoff316 commented Oct 22, 2021

That change was made 3 years ago, I don't see how that can be relevant. The issue is still not fixed, it's still horribly slow to open the console log. An 1mb log file takes several minutes to load and the browser stays unresponsive until then.

My bad on this one. I thought it was a newer change, as I can no longer re-produce the problem on our Jenkins server and I am talking about a 7mb log, which loads for 5-6 seconds maximum. The 1mb logs are loading almost instantly. The hardware or general configuration has not been changed.

Edit: I found some of the older jobs (they have a lot of steps) I used for testing this issue. I can still re-produce it, so I will start work on this again.

@zhekoff316
Copy link
Author

Hello,

I've tried to address the latest comments from before I stopped working on this PR.

(1) The global variables were indeed useless there, so I removed them and replaced them with variables in the function scope
(2) Regarding your comments @dwnusbaum about the appendBranchNames:

It looks like this is the only place where appendBranchNames is called, which seems wrong, since this method is only called for the parallel step itself. What if we start executing the first branch in a parallel step, and then that branch stops and we start executing the other branch? The first branch might get labelled correctly, but I think that the second branch wouldn't be labeled (unless later on in the Pipeline there was another parallel step).

Could you please give me an example pipeline where I can test this for example, because I can not reproduce anything wrong currently with the parallel builds. This function is only called if a parallel stage exists in the whole pipeline, and will go through all elements (from what I can see). Maybe I am doing something wrong, so it would be very helpful if you can provide me a very short example :) If there is something wrong there, I will fix it.


Overall, I tested again with a job with a lot of Pipeline Steps and the log would open very fast. Parallel builds with multiple stages also looked okay, and the show/hide buttons are working. I am ready to provide examples if you need them.

Will be happy if you can review this, as it's quite an important improvement.

var enclosings = new Map() // id → enclosingId
var labels = new Map() // id → label
var nodes = $$(".pipeline-new-node")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general, document.querySelectorAll will be much faster than $$ as one is a native browser API, and the other would be a prototypes scanner. its not a drop in replacement, but it doesn't seem like there's any prototype specific apis being used for this, so should work.

@jessejoe
Copy link

jessejoe commented Feb 3, 2022

I've also been hitting this issue as part of testing upgrading our old Jenkins server (v2.150.1) to current LTS, but this issue makes /console and /consoleFull pretty much unusable for us (I've had to use workflow-job-plugin v.2.36 or earlier so far). I've documented a lot of my troubleshooting in https://issues.jenkins.io/browse/JENKINS-62241.

I just tested the changes in this PR with a build log from an example pipeline in the ticket I mentioned above which is simply generating a log from parallel branches:

def genText(lines){
    (1..lines).each{
       println "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur interdum fringilla interdum"
    }
    return true
}
parallel firstBranch: {
    genText(20000)
}, secondBranch: {
    genText(20000)
}

I wanted to give some benchmarks of trying to load the different consoles for the the result of a build of the above pipeline, which winds up being a ~16MB log file:

Jenkins Version workflow-job-plugin Version /console /consoleFull /consoleText
2.150.1 (our existing server) 2.28 <1 second 5-7 seconds <1 second
2.319.2 (local container) 2.36 <1 second Never (5 min timeout) - browser slowdown warning <1 second
2.319.2 (local container) 1145.v7f2433caa07f Never (5 min timeout) - browser slowdown warning Never (5 min timeout) - browser slowdown warning <1 second
2.319.2 (local container) zhekoff316:master:43c58 5-7 seconds 65-75 seconds <1 second

The changes in this PR are definitely an improvement over the more current versions, but still not as efficient as our old Jenkins server 2.150.1 with workflow-job-plugin 2.28. Definitely at least make it usable though!

@dwnusbaum
Copy link
Member

The difference between 1145.v7f2433caa07f and 2.36 is that prior to #150 (landed in 2.37), the relevant Javascript essentially never executes on /console for large log files due to the bug described in that PR, only on /consoleFull (this is why they have the same behavior on /consoleFull). The difference between 2.36 and 2.28 is that that #111 (landed in 2.31) added Javascript to try to show the names of parallel branches in the console UI, and this Javascript is slow because it traverses the Pipeline flow graph for each flow node that produced log output to figure out what branch each node belongs to (IIRC this PR (#164) tries to improve the performance by caching the intermediate steps of the traversal). Prior to #27 (landed in 2.26) the branch names were written directly into the logs, so there was no Javascript needed to show this data at all.

So in 2.28, there is no Javascript that tries to show branch names in the log, which is why things are fast.

@jglick
Copy link
Member

jglick commented Feb 3, 2022

Possibly we could just detect a large log file and disable the JS (compared to consoleText you would still see rendered console notes)? Probably needs bigger redesigns to make the display usable on a large build, especially when there are parallel branches involved.

@jessejoe
Copy link

jessejoe commented Feb 3, 2022

After spending the week discussing this in the Jenkins Gitter, it seems like most large log files do not hit this issue (or I suspect there'd be a lot more complaints). The magic ingredient seems to be the multiple parallel branches logging (could detect that instead?). IMO the biggest issue is that even /console is affected, which really doesn't even show a lot of the log, and it is linked to from the job build history and build executor status. As a user, I think /console should load as fast as possible since its such a common endpoint and should not take multiple seconds to load a little bit of log.

@wblair-ebsco
Copy link

I hit this problem with large console logs quite often with version 2.293. Haven't complained but that does not means it's not a problem.
Behaves like a memory leak because I have to restart the controller to display large console logs.

@zhekoff316 zhekoff316 closed this by deleting the head repository Feb 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants