Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-19022] add extension to stop storing buildsByBranch history #568

Closed

Conversation

Projects
None yet
3 participants
@jacob-keller
Copy link

commented Jan 19, 2018

We store a new copy of the entire history buildsByBranch every build, as
part of BuildData. This is used for a few cases of determining which
revisions already built, and how to select the branch. Unfortunately, it
bloats the build.xml and takes time to process, as it grows indefinitely
under some use-cases, such as using the GerritTrigger plugin, in which
every build is essentially run on a different branch.

To avoid this, add an extension which allows disabling the saving of
this buildsByBranch history, only maintaining the current data for each
build.

Add tests to make sure that buildsByBranchName tracks history by
default, and doesn't track history when the extension is enabled.

Signed-off-by: Jacob Keller jacob.e.keller@intel.com

@MarkEWaite

This comment has been minimized.

Copy link

commented Jan 19, 2018

The change looks reasonable.

Shouldn't there also be a trait added in src/main/java/jenkins/plugins/git/traits/ to allow the extension to be selected when defining a Git branch source for a multi-branch pipeline?

Should the trait be enabled by default for multi-branch pipelines so that we reduce the bloat by default?

@stephenc is the maintainer of that code. I'd love to have his comments if we should allow Freestyle projects and matrix projects to optionally limit the amount of BuildData they track, or if there is some reason to not allow this extension.

@jacob-keller

This comment has been minimized.

Copy link
Author

commented Jan 19, 2018

I can take a look at adding a trait tomorrow.

Personally, I'd vote that we look for a path towards deprecation and finding alternative solutions for things that need this info (possibly such as computing it on the fly themselves instead of storing it for EVERY build.

Ideally we'd find a way to migrate to per-project data, possibly by looking up the project from the build. However, even if we do this, we still end up storing at least one copy per project, (which is still a large time waste in the build processing for gerrit triggered projects at least), and we additionally have to handle any parallel builds issues that crop up.

@jacob-keller

This comment has been minimized.

Copy link
Author

commented Jan 19, 2018

Should the trait be enabled by default for multi-branch pipelines so that we reduce the bloat by default?

In terms of this, I do not know. Possibly we should, because that would ensure that no administrator would run into the issue. The downside would be if there is a use-case for needing the build data in this setup we'd have to have a way to disable it.

@jacob-keller jacob-keller force-pushed the jacob-keller:jk-build-by-branch-history branch from f508e2b to 79226dc Jan 19, 2018

@jacob-keller

This comment has been minimized.

Copy link
Author

commented Jan 19, 2018

Ok, I added a trait for this as well.

@jacob-keller

This comment has been minimized.

Copy link
Author

commented Jan 19, 2018

For SCMSource there should be zero requirements for it. Branch-api ignores it because it has its own tracking of the last revision and only ever builds one branch on any job.

just in case there is a crazy user, we should make it possible for the crazy user to restore the behaviour... but the crazy user needs to pay a cost for their crazy... namely the crazy user will have to write an extension plugin that defines a trait to restore the crazy.

The git plugin should not encourage crazy by exposing silly options in the UI. But it’s API can enable users to do silly if they really want it where that enablement is facile and low risk - which is the case here imho

I'm not sure how to do that. @MarkEWaite thoughts?

@jacob-keller jacob-keller force-pushed the jacob-keller:jk-build-by-branch-history branch 2 times, most recently from dd2df7d to b51819e Jan 19, 2018

Jacob Keller
[JENKINS-19022] add extension to stop storing buildsByBranch history
We store a new copy of the entire history buildsByBranch every build, as
part of BuildData. This is used for a few cases of determining which
revisions already built, and how to select the branch. Unfortunately, it
bloats the build.xml and takes time to process, as it grows indefinitely
under some use-cases, such as using the GerritTrigger plugin, in which
every build is essentially run on a different branch.

To avoid this, add an extension which allows disabling the saving of
this buildsByBranch history, only maintaining the current data for each
build.

Add tests to make sure that buildsByBranchName tracks history by
default, and doesn't track history when the extension is enabled.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

@jacob-keller jacob-keller force-pushed the jacob-keller:jk-build-by-branch-history branch from b51819e to 194ad0a Mar 16, 2018

@jacob-keller

This comment has been minimized.

Copy link
Author

commented Mar 16, 2018

@stephenc suggested that we should instead do an inverse trait where we only save build-by-branch history if we have an extension but avoid actually exposing such an option to the user.

Does this makes sense to you, @MarkEWaite? I'm ok with that solution, and I definitely prefer an opt-in since it means we don't have to tell everyone to "opt-out" of crazy.

@jacob-keller

This comment has been minimized.

Copy link
Author

commented Mar 16, 2018

@stephenc you said

We shouldn’t store the build branch history in multibranch. Enable putting an inverse trait in an extension plugin for people that really need it is the most we should do here

I sort of get the idea of what you mean by this, but I have no clue what you mean for how to implement in this way?

Like do you mean define the behavior as an extension point of the git plugin?

@MarkEWaite

This comment has been minimized.

Copy link

commented Mar 16, 2018

That makes sense to me, especially considering the number of times we've failed in our attempts to remove the storage of that history in every build record and the complaints about the memory bloat from the git plugin as a result of those records.

We'd likely need to provide the additional plugin immediately which allows that trait, since there will be some set of users that complain and want there existing behavior restored (even as bloated as it is)

@jacob-keller

This comment has been minimized.

Copy link
Author

commented Mar 16, 2018

I'm going to see if I can remove the data and provide extension points for adding it back in. I'm definitely not sure how to actually go about creating the additional plugin (my Jenkins extension fu is not very strong here...) Help or pointers to resource that might help (or an example of something similar) would be appreciated.

@MarkEWaite

This comment has been minimized.

Copy link

commented Mar 16, 2018

This one is further complicated by my lack of a good set of use cases which depend on the branch history stored in the build records. I suspect I'm actively using one or more of those cases, but am blissfully unaware that I'm depending on bloated build records.

@jacob-keller

This comment has been minimized.

Copy link
Author

commented Mar 16, 2018

This one is further complicated by my lack of a good set of use cases which depend on the branch history stored in the build records. I suspect I'm actively using one or more of those cases, but am blissfully unaware that I'm depending on bloated build records.

I agree. I looked at trying to just remove the build data completely as a first step, and I found that we have a ton of references to it.

Then I realized: why store a map per build when we could generate the map by looking at the Job's getBuilds() and grabbing each action and re-populating the build.

However, this breaks down because we pass BuildData into things like the buildChooser()s which (presumably) depend on that data...

I don't know the right way to deprecate and move on from that API...

I really like the idea of removing the build data contents and instead just rebuilding the map from scratch, but I don't know how to handle the cases like build choosers.

I want to get this cleaned up at the git plugin level entirely so that we can move on from the mistake...

@jacob-keller

This comment has been minimized.

Copy link
Author

commented Mar 16, 2018

Ok, I have an alternative solution I'm going to push: Instead of storing the complete BuildData() object as an action, store just a BuildDetails which only tracks one build.

Then, we can generate the BuildData by looping over all the builds and building up the buildsByBranchName map. I am not sure the exact performance loss from re-building the map every time, but that should prevent the bloat of saving multiple copies of that map every single build.

@jacob-keller

This comment has been minimized.

Copy link
Author

commented Mar 16, 2018

I don't believe this is the correct approach anymore, I've submitted a separate pull request with an alternative solution.

@tsuna

This comment has been minimized.

Copy link

commented Mar 19, 2018

For those of us anxiously waiting for this long standing problem to be resolved and looking for that separate PR, it's here: #578

Thanks for helping fix this issue!

@MarkEWaite MarkEWaite added this to the 4.0 milestone Dec 11, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.