Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-39150] expose diagnostics across all the channels #122

Merged
merged 8 commits into from Oct 21, 2016

Conversation

@kohsuke
Copy link
Member

commented Oct 21, 2016

Continuing from #120

No code change since then, just re-targeting stable-2.x

kohsuke added 8 commits Oct 20, 2016
To be used by support-core, we need to be able to enumerate all active
channels. We do this via WeakHashMap so that references get
automatically garbage collected.

Unclosed channel will remain in memory forever, which also helps us find
those leaks.
... and see how they can be interpreted to aid diagnostics
JLS (http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.7) states
that a write  to non-volatile long variable can go in 32bit batch, so
without it, read could retrieve a completely bogus value.

There's no risk of writer contention here because that is serialized by
the context in which it gets invoked.
@kohsuke

This comment has been minimized.

Copy link
Member Author

commented Oct 21, 2016

Merging based on 🐝 given in #120

@kohsuke kohsuke merged commit f911511 into stable-2.x Oct 21, 2016
@kohsuke kohsuke deleted the JENKINS-39150 branch Oct 21, 2016
* that the other side has made to us but not yet returned yet.
*
* @since 2.26.3
*/

This comment has been minimized.

Copy link
@amuniz

amuniz Oct 21, 2016

Member

Thanks for the Javadocs @kohsuke 👍

This comment has been minimized.

Copy link
@oleg-nenashev

oleg-nenashev Oct 24, 2016

Member

Does not work in such way

@oleg-nenashev

This comment has been minimized.

Copy link
Member

commented Oct 22, 2016

Well, the Javadoc for the stable branch is not ideal. We are not supposed to provide new APIs in the stabilization branch, because it may cause binary incompatibilities for people upgrading from Jenkins LTS to later weekly releases. I'll fix it later before the release

@@ -494,7 +515,8 @@ protected Channel(ChannelBuilder settings, CommandTransport transport) throws IO

transport.setup(this, new CommandReceiver() {
public void handle(Command cmd) {
updateLastHeard();
commandsReceived++;

This comment has been minimized.

Copy link
@rsandell

rsandell Oct 24, 2016

Member

Risk of overflow?

@rsandell

This comment has been minimized.

Copy link
Member

commented Oct 24, 2016

Would it be worth adding JMX beans for this?

@oleg-nenashev

This comment has been minimized.

Copy link
Member

commented Oct 24, 2016

#120 (comment) in the original PR has been also addressed improperly. Needs fix

oleg-nenashev added a commit to oleg-nenashev/remoting that referenced this pull request Oct 25, 2016
oleg-nenashev added a commit to oleg-nenashev/jenkins that referenced this pull request Nov 9, 2016
The change introduces one serious bugfix (JENKINS-39596) and a bunch of various diagnostics improvements.

Bugfixes:

* [JENKINS-39596](https://issues.jenkins-ci.org/browse/JENKINS-39596) -
Jenkins URL in `hudson.remoting.Engine` was always `null` since `3.0`.
It was causing connection failures of Jenkins JNLP agents when using Java Web Start.
([PR jenkinsci#131](jenkinsci/remoting#131))
* [JENKINS-39617](https://issues.jenkins-ci.org/browse/JENKINS-39617) -
`hudson.remoting.Engine` was failing to establish connection if one of the URLs parameter in parameters was malformed.
([PR jenkinsci#131](jenkinsci/remoting#131))

Improvements:

* [JENKINS-39150](https://issues.jenkins-ci.org/browse/JENKINS-39150) -
Add logic for dumping diagnostics across all the channels.
([PR jenkinsci#122](jenkinsci/remoting#122), [PR jenkinsci#125](jenkinsci/remoting#125))
* [JENKINS-39543](https://issues.jenkins-ci.org/browse/JENKINS-39543) -
Improve the caller/callee correlation diagnostics in thread dumps.
([PR jenkinsci#119](jenkinsci/remoting#119))
* [JENKINS-39290](https://issues.jenkins-ci.org/browse/JENKINS-39290) -
Add the `org.jenkinsci.remoting.nio.NioChannelHub.disabled` flag for disabling NIO (mostly for debugging purposes).
([PR jenkinsci#123](jenkinsci/remoting#123))
* [JENKINS-38692](https://issues.jenkins-ci.org/browse/JENKINS-38692) -
Add extra logging to help diagnosing `IOHub` Thread spikes.
([PR jenkinsci#116](jenkinsci/remoting#116))
* [JENKINS-39289](https://issues.jenkins-ci.org/browse/JENKINS-39289) -
 When a proxy fails, report what caused the channel to go down.
([PR jenkinsci#128](jenkinsci/remoting#128))
oleg-nenashev added a commit to jenkinsci/jenkins that referenced this pull request Nov 10, 2016
The change introduces one serious bugfix (JENKINS-39596) and a bunch of various diagnostics improvements.

Bugfixes:

* [JENKINS-39596](https://issues.jenkins-ci.org/browse/JENKINS-39596) -
Jenkins URL in `hudson.remoting.Engine` was always `null` since `3.0`.
It was causing connection failures of Jenkins JNLP agents when using Java Web Start.
([PR #131](jenkinsci/remoting#131))
* [JENKINS-39617](https://issues.jenkins-ci.org/browse/JENKINS-39617) -
`hudson.remoting.Engine` was failing to establish connection if one of the URLs parameter in parameters was malformed.
([PR #131](jenkinsci/remoting#131))

Improvements:

* [JENKINS-39150](https://issues.jenkins-ci.org/browse/JENKINS-39150) -
Add logic for dumping diagnostics across all the channels.
([PR #122](jenkinsci/remoting#122), [PR #125](jenkinsci/remoting#125))
* [JENKINS-39543](https://issues.jenkins-ci.org/browse/JENKINS-39543) -
Improve the caller/callee correlation diagnostics in thread dumps.
([PR #119](jenkinsci/remoting#119))
* [JENKINS-39290](https://issues.jenkins-ci.org/browse/JENKINS-39290) -
Add the `org.jenkinsci.remoting.nio.NioChannelHub.disabled` flag for disabling NIO (mostly for debugging purposes).
([PR #123](jenkinsci/remoting#123))
* [JENKINS-38692](https://issues.jenkins-ci.org/browse/JENKINS-38692) -
Add extra logging to help diagnosing `IOHub` Thread spikes.
([PR #116](jenkinsci/remoting#116))
* [JENKINS-39289](https://issues.jenkins-ci.org/browse/JENKINS-39289) -
 When a proxy fails, report what caused the channel to go down.
([PR #128](jenkinsci/remoting#128))
oleg-nenashev added a commit that referenced this pull request Nov 8, 2017
* [JENKINS-37566] - FindBugs: Unclosed stream in hudson.remoting.Capability

* [JENKINS-37566] - FindBugs: Stream not closed on Exception path in InitializeJarCacheMain#copyFile

* [JENKINS-37566] - Fix Util#makeResource and remove obsolete hack

* Extra Util#makeResource() polishing

* Deprecate obsolete Util#mkdirs()

* Exceptional case in setLastModifiedTime

* Handle exception case during temp file deletion in FileSystemJarCache

* Synchronize ProxyWriter#closed in write()

* Synchronization of ProxyWriter#channel

* Get rid of the obsolete collection in Channel#properties, fix synchronization

* Checksum#calculateFor() - no need to put if absent since there is a check above

* UWF_UNWRITTEN_FIELD in ExportTable$Entry

* NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE in JnlpAgentEndpointResolver.
Now we use a more agressive check

* UWF_FIELD_NOT_INITIALIZED_IN_CONSTRUCTOR in org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort

* UPM_UNCALLED_PRIVATE_METHOD - hudson.remoting.ChunkedOutputStream.frameSize()

* FindBugs: Revert the stream closure at emoting Capability, just a design to be reworked

* Unrealistic NP_NULL_ON_SOME_PATH in NioChannelHub

* EI_EXPOSE_REP in DiagnosedStreamCorruptionException (diagnostics code)

* SE_TRANSIENT_FIELD_NOT_RESTORED in PreloadJarTask, RemoteClassLoader, RemoteInvocationHandler and UserRequest

* Leftover false-positive UG_SYNC_SET_UNSYNC_GET in Channel

* EI_EXPOSE_REP in ChannelBuilder#properties

* Suppress DMI_NONSERIALIZABLE_OBJECT_WRITTEN in ClassLoaderHolder#writeObject()

* Better handling of Streams in hudson.remoting.Capability (still weird)

* Fix the Util#makeResource() behavior for nested folders

* FileSystemJarCache: tmp file may be renamed

* JENKINS-37566 - Modification of Channel#dumpDiagnostics() actually is not required after the merge of #122

* Disable DP_DO_INSIDE_DO_PRIVILEGED as per feeback from @kohsue in #118

* Fix the FindBugs filter

* [NP_NULL_ON_SOME_PATH] - Prevent NPE in ExportTable#diagnoseInvalidObjectId()

* [RV_RETURN_VALUE_IGNORED_BAD_PRACTICE] - Propagate exceptions when FileSystemJarCache#retrieve() fails to retrieve the target

* [DP_CREATE_CLASSLOADER_INSIDE_DO_PRIVILEGED] - Ignore the warning as per discussion in #118

* [VO_VOLATILE_INCREMENT] - Suppress warning in Channel#send()

* FindBugs - Which#jarFile() should not suppress all exceptions

* FindBugs - Suppress URF_UNREAD_FIELD in Pipewindow#Real

* [JENKINS-37566] - Channel#properties should be a ConcurrentHashMap

Reason - prevent possible delays and deadlocks in the getter method.
Addresses the feedback from @stephenc

* [JENKINS-37566] - DiagnosedStreamCorruptionException readAhead/readBack handlers should use a more efficient clone() command

* [JENKINS-37566] - Suppress UG_SYNC_SET_UNSYNC_GET after switching to the ConcurrentHashMap

* [JENKINS-37566] - hudson.remoting.Util should be tolerant against InvalidPathException

* [JENKINS-37566] - make ProxyWriter more asynchronous (follow-up to @stephenc’s review)

* [JENKINS-37566] - Add a comment about race condition as suggested by @stephenc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.