Disable stacktraces from ExportTable entries by default #441

res0nance · 2021-02-17T12:14:13Z

Not certain if this is a good idea but the idea spawned from JENKINS-63975 as stacktraces seem to take a considerable amount of memory usage in Jenkins and as far as I understand it, it is entirely for debugging.

Although I'm not certain how often this workflow is used in the real world, if its very rare this ever happens perhaps its better to disable by default.

jeffret-b · 2021-02-17T16:34:12Z

This looks like a good idea. I don't think we get much value from retaining these stack traces. I'll have to look at it more deeply.

basil · 2021-02-19T17:58:38Z

While I have not used it myself, this infrastructure smells like what Brendan Gregg would call a crisis tool: a tool which you would need to get to root cause in a crisis situation and which you would not be able to easily install or enable in that same crisis situation. If such tools are not installed or enabled by default, then these crises often pass without getting to root cause. Installing or enabling the crisis tool after the fact may not help if the problem is not easily reproducible, or if inducing another failure for the purposes of root cause analysis is prohibitively expensive. The philosophy of debugging from first failure is best served by having such crisis tools installed and enabled by default.

res0nance · 2021-02-21T17:31:26Z

While I have not used it myself, this infrastructure smells like what Brendan Gregg would call a crisis tool: a tool which you would need to get to root cause in a crisis situation and which you would not be able to easily install or enable in that same crisis situation. If such tools are not installed or enabled by default, then these crises often pass without getting to root cause. Installing or enabling the crisis tool after the fact may not help if the problem is not easily reproducible, or if inducing another failure for the purposes of root cause analysis is prohibitively expensive. The philosophy of debugging from first failure is best served by having such crisis tools installed and enabled by default.

That was an informative watch, although in this case compared to having tools installed this has a cost associated in terms of memory which directly affects scalability. In some controllers StacktraceElement can be one of the largest consumers of memory. One of my instances has it pegged at the second highest consumer of memory. Depending on how useful this information is to remoting perhaps an argument can be made to disable it by default.

jeffret-b · 2021-02-22T17:51:43Z

Yes, I'm sure this was intended as some sort of debugging tool in problematic situations. But, I don't know that there was really a justification for it or someone years ago thought it might be useful sometime. I'm not personally aware of it every being useful.

Most of the failures associated with Remoting these days originate from outside -- system, environment, or networking issues that cause the channel to close. Or weird plugin interactions that end up looking like something in Remoting. The stack traces from inside rarely provide interesting information in these cases.

Unless we can come up with a meaningful scenario where these stack traces help, removing them would be better given the cost.

basil · 2021-02-25T20:13:06Z

I'm not personally aware of it ever being useful.

That doesn't necessarily mean this functionality is pointless, just that nobody has used it recently (which could be for any number of reasons). If this functionality is now pointless and has not been recently used, then it would indeed be clearly desirable to disable it by default or even to remove it entirely. If this functionality still has a legitimate purpose but has not been recently used, then making it harder to use during crisis situations would be a small regression in debuggability. Whether that small regression is worth the gain in scalability seems hard to quantify without fully understanding the original use case, something I don't personally understand and which isn't explained in this PR.

I think it would be helpful to research the original use case from which this functionality originated and to reason explicitly about whether or not that use case remains legitimate, regardless of whether or not anyone has used this functionality recently. Ideally a solution could be found that improves scalability without decreasing debuggability. It is unclear to me whether such a solution exists, but I think it would be worth thinking about. At minimum, going through this thought exercise would give us more clarity about the cost (to be weighed against the obvious scalability benefit).

If we have looked for such a solution but have not found one, then we have to choose which of the two is more important in this situation: scalability or debuggability. That decision is in part based on the cost-benefit calculus described above but also comes down to the community's values. I personally value debuggability over scalability, but I recognize that I am on the periphery of this community and that the community's values may not exactly match my own (which is fine).

jeffret-b · 2021-02-25T23:38:08Z

This diagnostic capability was added 7 years ago in an attempt to gather more information for tracking down some specific information. I'm not seeing any indication that it was helpful in diagnosing the problem or that the problem has retained any importance. While it's hard to say that it no longer exists, I can't find any further reports. There are only 4 people watching it.

This PR does include a way to reintroduce the logging, if it is needed. Given the thin history of this capability, even that may be excessive.

Given the cost of this feature and the very dubious value, it's probably best to at least disable it.

jeffret-b

I'm inclined to go with this cleanup. I haven't seen cases where this information has been useful. If it does prove useful somewhere this change includes a flag to turn it back on.

jeffret-b · 2021-03-29T23:22:01Z

Additional reviews and opinions on this one are definitely welcome. I've taken basil's into account and am still inclined to accept it as mentioned in my approval.

oleg-nenashev

It should be documented as a system property. No objections w.r.t disabling it by default, but maybe we need some means to enable it in runtime for the controller and agents. Otherwise debugging will become even more complicated

Will request reviews from @jenkinsci/core to get more feedback

jglick

Probably OK. If heap rather than CPU is the main issue, might be preferable to retain traces for recently created entries, but I am not sure if that would even be appropriate.

src/main/java/hudson/remoting/ExportTable.java

jglick · 2021-04-05T22:19:08Z

src/test/java/hudson/remoting/ExportTableTest.java

@@ -11,6 +11,7 @@
 */
 public class ExportTableTest extends TestCase {
    public void testDiagnosis() throws Exception {
+        ExportTable.EXPORT_TRACES = true;


Restore previous value in a finally block, or @After etc. (See FlagRule for JTH-based tests.)

timja

lgtm apart from Jesse's comments

Co-authored-by: Jesse Glick <jglick@cloudbees.com>

jeffret-b · 2021-04-06T23:26:20Z

Looks like this has received enough approvals so I marked it as ready-to-merge. Other reviews are still welcome. We'll plan on getting this into an upcoming release.

oleg-nenashev

Looks good

src/main/java/hudson/remoting/ExportTable.java

Disable stacktraces from exporttable entries by default

3457b46

jeffret-b approved these changes Mar 29, 2021

View reviewed changes

oleg-nenashev reviewed Apr 5, 2021

View reviewed changes

oleg-nenashev requested a review from a team April 5, 2021 21:36

jglick approved these changes Apr 5, 2021

View reviewed changes

timja reviewed Apr 6, 2021

View reviewed changes

res0nance and others added 2 commits April 6, 2021 19:51

Apply suggestions from code review

4791f80

Co-authored-by: Jesse Glick <jglick@cloudbees.com>

Restore variable after test

0a428a5

oleg-nenashev self-requested a review April 6, 2021 12:00

timja approved these changes Apr 6, 2021

View reviewed changes

jeffret-b added ready-to-merge removed Functionality removed or otherwise cleaned up labels Apr 6, 2021

oleg-nenashev approved these changes Apr 17, 2021

View reviewed changes

oleg-nenashev merged commit 6f30821 into jenkinsci:master Apr 17, 2021

daniel-beck reviewed Apr 19, 2021

View reviewed changes

src/main/java/hudson/remoting/ExportTable.java Show resolved Hide resolved

jeffret-b added the enhancement For changelog: An enhancement providing new capability. label Apr 25, 2021

olamy mentioned this pull request Aug 23, 2021

Fix test which will fail when upgrading to remoting 4.8 (i.e core 2.294) jenkinsci/support-core-plugin#304

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disable stacktraces from ExportTable entries by default #441

Disable stacktraces from ExportTable entries by default #441

res0nance commented Feb 17, 2021

jeffret-b commented Feb 17, 2021

basil commented Feb 19, 2021

res0nance commented Feb 21, 2021

jeffret-b commented Feb 22, 2021

basil commented Feb 25, 2021

jeffret-b commented Feb 25, 2021

jeffret-b left a comment

jeffret-b commented Mar 29, 2021

oleg-nenashev left a comment

jglick left a comment

jglick Apr 5, 2021

timja left a comment

jeffret-b commented Apr 6, 2021

oleg-nenashev left a comment

Disable stacktraces from ExportTable entries by default #441

Disable stacktraces from ExportTable entries by default #441

Conversation

res0nance commented Feb 17, 2021

jeffret-b commented Feb 17, 2021

basil commented Feb 19, 2021

res0nance commented Feb 21, 2021

jeffret-b commented Feb 22, 2021

basil commented Feb 25, 2021

jeffret-b commented Feb 25, 2021

jeffret-b left a comment

Choose a reason for hiding this comment

jeffret-b commented Mar 29, 2021

oleg-nenashev left a comment

Choose a reason for hiding this comment

jglick left a comment

Choose a reason for hiding this comment

jglick Apr 5, 2021

Choose a reason for hiding this comment

timja left a comment

Choose a reason for hiding this comment

jeffret-b commented Apr 6, 2021

oleg-nenashev left a comment

Choose a reason for hiding this comment