Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-70103] Clear websocket sessions on connection close or error #7406

Closed
wants to merge 1 commit into from

Conversation

amuniz
Copy link
Member

@amuniz amuniz commented Nov 18, 2022

This is causing a memory leak, as sessions pile up there and the GC does not collect them.

See JENKINS-70103 for more information.

I'll file a follow up PR in remoting to slow down connection re-try at agent side. There is no point on trying hundreds of times per second.

Testing done

AFAICT the current test setup is not ready to test an agent running on Java 8 trying to connect to Jenkins on Java 11, so I did some manual tests:

  1. Run Jenkins on Java 11.
  2. Create an agent using websocket connection.
  3. Run the agent (an older version that supports Java 8) on Java 8.
  4. The agent connects and immediately disconnects (because of the Java version mismatch). There are hundreds of connection re-tries per second.
  5. Kill the agent process.
  6. Using the Jenkins script console verify that jenkins.websocket.Jetty10Provider.sessions is clear.

Proposed changelog entries

  • [JENKINS-70103] Clear websocket sessions on agent connection close or error, otherwise sessions are accumulated causing a memory leak.

Proposed upgrade guidelines

N/A

Submitter checklist

  • The Jira issue, if it exists, is well-described.
  • The changelog entries and upgrade guidelines are appropriate for the audience affected by the change (users or developers, depending on the change) and are in the imperative mood (see examples).
    • Fill in the Proposed upgrade guidelines section only if there are breaking changes or changes that may require extra steps from users during upgrade.
  • There is automated testing or an explanation as to why this change has no tests.
  • New public classes, fields, and methods are annotated with @Restricted or have @since TODO Javadocs, as appropriate.
  • New deprecations are annotated with @Deprecated(since = "TODO") or @Deprecated(forRemoval = true, since = "TODO"), if applicable.
  • New or substantially changed JavaScript is not defined inline and does not call eval to ease future introduction of Content Security Policy (CSP) directives (see documentation).
  • For dependency updates, there are links to external changelogs and, if possible, full differentials.
  • For new APIs and extension points, there is a link to at least one consumer.

Desired reviewers

@jenkinsci/core-pr-reviewers

Maintainer checklist

Before the changes are marked as ready-for-merge:

  • There are at least two (2) approvals for the pull request and no outstanding requests for change.
  • Conversations in the pull request are over, or it is explicit that a reviewer is not blocking the change.
  • Changelog entries in the pull request title and/or Proposed changelog entries are accurate, human-readable, and in the imperative mood.
  • Proper changelog labels are set so that the changelog can be generated automatically.
  • If the change needs additional upgrade steps from users, the upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the pull request title (see example).
  • If it would make sense to backport the change to LTS, a Jira issue must exist, be a Bug or Improvement, and be labeled as lts-candidate to be considered (see query).

This is causing a memory leak, as sessions pile up there and the GC does
not collect them. See [JENKINS-70103](https://issues.jenkins.io/browse/JENKINS-70103) for more information.
@res0nance res0nance added the bug For changelog: Minor bug. Will be listed after features label Nov 18, 2022
Copy link
Member

@Vlatombe Vlatombe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine so far as it goes but you are leaving

private static final Map<Listener, Session> sessions = Collections.synchronizedMap(new WeakHashMap<>());
which is wrong. Either the value refers to the key (please provide a reference chain dump), which would explain the bug, in which case please switch to a plain HashMap or IdentityHashMap for clarity; or it does not, in which case the bug has not been explained.

@@ -147,6 +147,7 @@ public void onWebSocketText(String message) {
@Override
public void onWebSocketClose(int statusCode, String reason) {
listener.onWebSocketClose(statusCode, reason);
sessions.remove(listener);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you know this will be called before

public void close() throws IOException {
session().close();
?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. Do you have some insight?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. You would need to test it.

@amuniz
Copy link
Member Author

amuniz commented Nov 18, 2022

the value refers to the key

AFAICT by inspecting a heap dump, that's the case. Reference chain is:
reference-chain

The referent (key) in the WeakHashMap#Entry is referenced down the chain from the value (a WebSocketSession)

@jglick
Copy link
Member

jglick commented Nov 18, 2022

Then cease use of WeakHashMap as it cannot work.

Ideally we could reproduce this, say in JNLPLauncherRealTest.webSocket, and use MemoryAssert to verify that the session map is cleared as expected.

@jglick
Copy link
Member

jglick commented Nov 18, 2022

The problem is

// TODO does not seem possible to use HttpServletRequest.get/setAttribute for this
req.setAttribute(ATTR_LISTENER, listener);
Listener listener = (Listener) req.getHttpServletRequest().getAttribute(ATTR_LISTENER);
If we cannot use get/setAttribute to look up the Session for a given HttpServletRequest, perhaps it works to use a WeakHashMap keyed by HttpServletRequest?

@jglick
Copy link
Member

jglick commented Nov 18, 2022

Nope:

storing session in org.eclipse.jetty.websocket.server.internal.DelegatedServerUpgradeRequest@5ebb565a with attrs [org.kohsuke.stapler.compression.CompressionServletResponse, jenkins.websocket.Jetty10Provider.listener, org.eclipse.jetty.server.HttpTransport.UPGRADE, org.kohsuke.stapler.EvaluationTrace, org.kohsuke.stapler.compression.CompressionFilter]

vs.

looking up session in org.kohsuke.stapler.RequestImpl@7b73cc65 with attrs []

Still looking into options.

@jglick
Copy link
Member

jglick commented Nov 18, 2022

I think I found a better approach.

@timja timja added the proposed-for-close There is no consensus about feasibility of this PR, and maintainers do not see a way forward for it label Nov 20, 2022
@basil
Copy link
Member

basil commented Nov 20, 2022

Closing in favor of #7412.

@basil basil closed this Nov 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug For changelog: Minor bug. Will be listed after features proposed-for-close There is no consensus about feasibility of this PR, and maintainers do not see a way forward for it
Projects
None yet
6 participants