-
-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable stacktraces from ExportTable entries by default #441
Disable stacktraces from ExportTable entries by default #441
Conversation
This looks like a good idea. I don't think we get much value from retaining these stack traces. I'll have to look at it more deeply. |
While I have not used it myself, this infrastructure smells like what Brendan Gregg would call a crisis tool: a tool which you would need to get to root cause in a crisis situation and which you would not be able to easily install or enable in that same crisis situation. If such tools are not installed or enabled by default, then these crises often pass without getting to root cause. Installing or enabling the crisis tool after the fact may not help if the problem is not easily reproducible, or if inducing another failure for the purposes of root cause analysis is prohibitively expensive. The philosophy of debugging from first failure is best served by having such crisis tools installed and enabled by default. |
That was an informative watch, although in this case compared to having tools installed this has a cost associated in terms of memory which directly affects scalability. In some controllers StacktraceElement can be one of the largest consumers of memory. One of my instances has it pegged at the second highest consumer of memory. Depending on how useful this information is to remoting perhaps an argument can be made to disable it by default. |
Yes, I'm sure this was intended as some sort of debugging tool in problematic situations. But, I don't know that there was really a justification for it or someone years ago thought it might be useful sometime. I'm not personally aware of it every being useful. Most of the failures associated with Remoting these days originate from outside -- system, environment, or networking issues that cause the channel to close. Or weird plugin interactions that end up looking like something in Remoting. The stack traces from inside rarely provide interesting information in these cases. Unless we can come up with a meaningful scenario where these stack traces help, removing them would be better given the cost. |
That doesn't necessarily mean this functionality is pointless, just that nobody has used it recently (which could be for any number of reasons). If this functionality is now pointless and has not been recently used, then it would indeed be clearly desirable to disable it by default or even to remove it entirely. If this functionality still has a legitimate purpose but has not been recently used, then making it harder to use during crisis situations would be a small regression in debuggability. Whether that small regression is worth the gain in scalability seems hard to quantify without fully understanding the original use case, something I don't personally understand and which isn't explained in this PR. I think it would be helpful to research the original use case from which this functionality originated and to reason explicitly about whether or not that use case remains legitimate, regardless of whether or not anyone has used this functionality recently. Ideally a solution could be found that improves scalability without decreasing debuggability. It is unclear to me whether such a solution exists, but I think it would be worth thinking about. At minimum, going through this thought exercise would give us more clarity about the cost (to be weighed against the obvious scalability benefit). If we have looked for such a solution but have not found one, then we have to choose which of the two is more important in this situation: scalability or debuggability. That decision is in part based on the cost-benefit calculus described above but also comes down to the community's values. I personally value debuggability over scalability, but I recognize that I am on the periphery of this community and that the community's values may not exactly match my own (which is fine). |
This diagnostic capability was added 7 years ago in an attempt to gather more information for tracking down some specific information. I'm not seeing any indication that it was helpful in diagnosing the problem or that the problem has retained any importance. While it's hard to say that it no longer exists, I can't find any further reports. There are only 4 people watching it. This PR does include a way to reintroduce the logging, if it is needed. Given the thin history of this capability, even that may be excessive. Given the cost of this feature and the very dubious value, it's probably best to at least disable it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to go with this cleanup. I haven't seen cases where this information has been useful. If it does prove useful somewhere this change includes a flag to turn it back on.
Additional reviews and opinions on this one are definitely welcome. I've taken basil's into account and am still inclined to accept it as mentioned in my approval. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be documented as a system property. No objections w.r.t disabling it by default, but maybe we need some means to enable it in runtime for the controller and agents. Otherwise debugging will become even more complicated
Will request reviews from @jenkinsci/core to get more feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably OK. If heap rather than CPU is the main issue, might be preferable to retain traces for recently created entries, but I am not sure if that would even be appropriate.
@@ -11,6 +11,7 @@ | |||
*/ | |||
public class ExportTableTest extends TestCase { | |||
public void testDiagnosis() throws Exception { | |||
ExportTable.EXPORT_TRACES = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restore previous value in a finally
block, or @After
etc. (See FlagRule
for JTH-based tests.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm apart from Jesse's comments
Co-authored-by: Jesse Glick <jglick@cloudbees.com>
Looks like this has received enough approvals so I marked it as ready-to-merge. Other reviews are still welcome. We'll plan on getting this into an upcoming release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
Not certain if this is a good idea but the idea spawned from JENKINS-63975 as stacktraces seem to take a considerable amount of memory usage in Jenkins and as far as I understand it, it is entirely for debugging.
Although I'm not certain how often this workflow is used in the real world, if its very rare this ever happens perhaps its better to disable by default.