
constrain an agent's memory #380

Merged
merged 2 commits into jenkinsci:master from constrain-agent-memory on Jan 26, 2022

Conversation

jtnord (Member) commented Jan 19, 2022

Noticed when looking at the memory usage of pipeline-model-definition
that the agents created in the tests did not have a constrained memory
limit set. This means that the tests would be platform dependent, or
flaky depending on what GC happens at various times.

The number 512 is relatively arbitrary, but k8s agents use 256Mi as a
request, so 512 is double that, and the agent process is supposed to be
lightweight.

Whilst this could reveal a few issues in plugins when they upgrade, it
seems to me that any such issue would be a bug in the plugin, and so I
have not implemented a way to create an online slave with a specific
amount of memory. If that is indeed required, it can be added later, as
JTH updates are not forced on plugins.
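
For context, here is a minimal sketch of the kind of harness-created agent this affects, using the existing JenkinsRule.createOnlineSlave API (the test class and its contents are illustrative only, not part of this PR):

```java
import hudson.slaves.DumbSlave;
import org.junit.Rule;
import org.junit.Test;
import org.jvnet.hudson.test.JenkinsRule;

public class AgentMemoryExampleTest {

    @Rule
    public JenkinsRule j = new JenkinsRule();

    @Test
    public void buildRunsOnHarnessCreatedAgent() throws Exception {
        // With this change, the agent JVM launched here starts with a constrained
        // heap (512 MB) rather than inheriting a platform-dependent default.
        DumbSlave agent = j.createOnlineSlave();
        // ... run whatever builds the plugin under test needs on this agent
    }
}
```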

Here is a before from a pipeline-model-definition test (nearly 3 GB claimed for the heap, and there are 4 agents in this test!):

[screenshot: JVM heap usage before the change]

And after:

[screenshot: JVM heap usage after the change]


jglick added the bug label Jan 19, 2022
jglick enabled auto-merge January 19, 2022 13:15
basil (Member) commented Jan 19, 2022

I think a realistic test of this PR would involve the following:

  1. (Regression test) Running the core test suite with the incremental from this PR
  2. (Regression test) Running a plugin's test suite with the incremental from this PR and a Jenkinsfile, assuming that:
    a) The plugin has test cases that create 1-2 agents
    b) The plugin is running on our standard buildPlugin(useContainerAgent: true) setup
    c) The plugin's tests currently pass
  3. (Fix test) Find a plugin with a test case that had to be disabled on Windows ACI container agents due to the fact that it created too many agents and used up too much memory (ExtendedEmailPublisherMatrixTest is a good one), enable the test on Windows ACI container agents to observe that it still fails, and then confirm that with this fix the test case now uses less memory and passes on Windows ACI container agents

basil disabled auto-merge January 19, 2022 13:19
jglick (Member) commented Jan 19, 2022

Yes we could do it that way rather than just waiting to see if there are any regressions and reverting if so. Any volunteers?

jtnord (Member Author) commented Jan 19, 2022

> Yes we could do it that way rather than just waiting to see if there are any regressions and reverting if so. Any volunteers?

I am creating some PRs

timja (Member) commented Jan 19, 2022

> Yes we could do it that way rather than just waiting to see if there are any regressions and reverting if so. Any volunteers?

> I am creating some PRs

good luck while jenkins-infra/helpdesk#2733 is a thing

jtnord (Member Author) commented Jan 19, 2022

The CI build failed due to a timeout (50+ minutes before even starting the tests, which looks like the Artifactory issue), so there is no incremental to attempt to use at the moment 😢

jtnord added a commit to jtnord/jenkins that referenced this pull request Jan 19, 2022
jtnord (Member Author) commented Jan 19, 2022

I was thinking this would most likely break the maven-plugin, as that does horrible things with Maven in the agent process, but we are in luck - it copies the JTH code 🥳
/cc @aheritier in case he wants to make similar changes in the job.

jtnord (Member Author) commented Jan 21, 2022

@basil

> (Regression test) Running the core test suite with the incremental from this PR

jenkinsci/jenkins#6192 exists and has 2 failures; both have been observed before this change.
Also, given jenkinsci/jenkins#6198 (review), where the previous release has been reverted, it is unclear how to demonstrate this further in Jenkins core given the current state.

> (Regression test) Running a plugin's test suite with the incremental from this PR and a Jenkinsfile, assuming that:

pipeline-model-definition does that and there is a PR that fixes a lot of tests.

Also, given the extended issue with Artifactory, it is not clear whether I can demonstrate this any further before getting a release.

timja (Member) commented Jan 21, 2022

Shouldn't we identify the cause of the regression and revert the fix forward before adding more changes to this module?

jtnord (Member Author) commented Jan 21, 2022

> Shouldn't we identify the cause of the regression and revert the fix forward before adding more changes to this module?

I assumed, given Basil reverted it, that he is working on it somewhere. There are so few things in the change that the only thing that looked applicable was the JUnit 5 stuff (#375).

The Jenkins build is also no longer happy regardless of the JTH version (Windows ACI agent issue), and given this does not seem to impact everyone, merging this whilst the above investigation goes on should not be a blocker, should it? (No one is forcing this update on anyone - it is purely opt-in and, given the changelog, should not impact any reverting or fixing due to conflicts.)

Also of note: at least one plugin (that is pretty important) is getting much less test coverage than it should because of this issue: jenkinsci/pipeline-model-definition-plugin#477

Do we really think an agent should have more than 512 MB in production, let alone in testing (save the Maven project type?), such that we think this is too risky to accept?

basil (Member) commented Jan 21, 2022

> it is unclear how to demonstrate this further in Jenkins core given the current state.

You could demonstrate this PR in core by rebasing it on top of the older version of jenkins-test-harness that core is currently using.

You have also not addressed my third point about Windows ACI tests like ExtendedEmailPublisherMatrixTest.

> Shouldn't we identify the cause of the regression and revert the fix forward before adding more changes to this module?

Agreed. The reason jenkins-test-harness is in a bad state is that previous PRs were merged without sufficient testing. The bad state should be corrected by reverting the problematic PR, and we should avoid getting into a bad state going forward by doing more thorough testing of PRs (like this one) prior to merge.

> I assumed, given Basil reverted it, that he is working on it somewhere.

Do not make assumptions about what I am working on. I am not working on stabilizing jenkins-test-harness.

jtnord (Member Author) commented Jan 21, 2022

> You have also not addressed my third point about Windows ACI tests like ExtendedEmailPublisherMatrixTest.

which was:

> (Fix test) Find a plugin with a test case that had to be disabled on Windows ACI container agents due to the fact that it created too many agents and used up too much memory (ExtendedEmailPublisherMatrixTest is a good one), enable the test on Windows ACI container agents to observe that it still fails, and then confirm that with this fix the test case now uses less memory and passes on Windows ACI container agents

See jenkinsci/pipeline-model-definition-plugin#480 - the tests were disabled on Windows and it uses ACI agents. I did not choose the plugin you suggested, but this does, as far as I am aware, cover the ask.
Granted, I did not re-enable the tests to show they failed first, but you only disabled them in a PR that was merged the same day, so I figured that would be overkill.
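
For reference, the usual pattern by which such a test is skipped on Windows looks roughly like the sketch below (illustrative only; the pipeline-model-definition and email-ext tests may use a different mechanism, and the test name and message are made up):

```java
import static org.junit.Assume.assumeFalse;

import hudson.Functions;
import hudson.slaves.DumbSlave;
import org.junit.Rule;
import org.junit.Test;
import org.jvnet.hudson.test.JenkinsRule;

public class WindowsSkipSketchTest {

    @Rule
    public JenkinsRule j = new JenkinsRule();

    @Test
    public void heavyMultiAgentScenario() throws Exception {
        // Skip on Windows ACI agents, where several agent JVMs used to exhaust the
        // container's memory; with the 512 MB heap cap this skip may become unnecessary.
        assumeFalse("multiple agents exhaust memory on Windows ACI containers",
                Functions.isWindows());
        DumbSlave agent = j.createOnlineSlave();
        // ... create further agents and run the memory-hungry scenario here
    }
}
```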

jtnord closed this Jan 21, 2022
jtnord reopened this Jan 21, 2022
jtnord merged commit 6b1e34c into jenkinsci:master Jan 26, 2022
jtnord deleted the constrain-agent-memory branch January 26, 2022 12:20