
"Opbeans" stage of release pipeline fails #2728

Closed
trentm opened this issue May 26, 2022 · 12 comments · Fixed by #2763

@trentm
Member

trentm commented May 26, 2022

Recently in #2625 we automated releases: when a version tag ("vN.N.N") is pushed, a Jenkins "Release" stage builds and publishes the Lambda layer, creates a GitHub release, runs npm publish, and attempts to update opbeans-node.git to use the new APM agent release.

That "Opbeans" stage is flaky (or perhaps fails every time), as discussed here: #2625 (comment)
This issue is about making the release process reliable by doing something about this stage.

The Opbeans stage effectively does this: #2723 (comment)

Options

Option 1: npm publish early and hope

Do the 'npm publish' step earlier in the pipeline and hope that the Lambda layer publishing steps take long enough for the new version to be installable by the time the Opbeans stage runs.

I don't love this idea because relying on "hope" means it may still fail sometimes, just less frequently, which only makes for a more subtle bug. Also see the "timeout" discussion below.

Option 2: wait for npm install to work

Add a retry loop, with a timeout, at the start of the Opbeans stage that retries the npm install whenever it fails with ETARGET, to account for the stage running soon after a publish.

The "ETARGET" is referring to the specific error you get from npm install when this issue happens:

[2022-05-25T21:23:57.820Z] + CI=true npm install --ignore-scripts elastic-apm-node@3.34.0
[2022-05-25T21:23:59.440Z] npm ERR! code ETARGET
[2022-05-25T21:23:59.440Z] npm ERR! notarget No matching version found for elastic-apm-node@3.34.0.
[2022-05-25T21:23:59.440Z] npm ERR! notarget In most cases you or one of your dependencies are requesting
[2022-05-25T21:23:59.440Z] npm ERR! notarget a package version that doesn't exist.

Theoretically this option would be straightforward to implement, but what should that timeout be? Granted, the issue is old (from 2018), but user reports from npm/npm#20574 suggest that the time for all npm servers to update could be an hour or more. That's too long to have as a timeout in a release process.
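A minimal sketch of Option 2, assuming a Bash-compatible shell on the Jenkins agent. The `retry_on_etarget` helper is hypothetical: it reruns a command while its output contains "ETARGET", returns any other failure immediately, and gives up once a caller-chosen timeout expires. It does not answer the open question of what that timeout should be.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the existing release pipeline.
# Runs a command, retrying while its output mentions ETARGET (npm's
# "No matching version found" error). Any other failure is returned at once;
# ETARGET failures are retried every INTERVAL_SECS until TIMEOUT_SECS elapse.
#
# Usage: retry_on_etarget TIMEOUT_SECS INTERVAL_SECS command [args...]
retry_on_etarget() {
  local timeout_secs=$1 interval_secs=$2
  shift 2
  local deadline=$(( $(date +%s) + timeout_secs ))
  local out rc
  while :; do
    rc=0
    # Capture stdout and stderr so the error code can be inspected.
    out=$("$@" 2>&1) || rc=$?
    if [ "$rc" -eq 0 ]; then
      printf '%s\n' "$out"
      return 0
    fi
    case $out in
      *ETARGET*) ;;  # presumably the new version is not visible yet: keep waiting
      *)
        printf '%s\n' "$out" >&2
        return "$rc"
        ;;
    esac
    if [ "$(date +%s)" -ge "$deadline" ]; then
      printf '%s\n' "$out" >&2
      echo "retry_on_etarget: gave up after ${timeout_secs}s" >&2
      return "$rc"
    fi
    sleep "$interval_secs"
  done
}
```

For example, `retry_on_etarget 600 30 env CI=true npm install --ignore-scripts "elastic-apm-node@$VERSION"` would keep retrying for about ten minutes. The 600 and 30 second values are arbitrary placeholders, not tested recommendations.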

Option 3: use dependabot to update opbeans

Configure dependabot to look for an agent update daily.

Some issues with this:

  • The current "bump-version.sh" script also updates a label in the repo's Dockerfile, which dependabot will not update. So either we drop that label, or we add a separate lint GitHub check that fails the dependabot PR until someone manually updates the Dockerfile as well. This is pretty indirect and laborious.
  • There is no way to have this process create a git tag on the opbeans repo, which the current process does. I am not sure those git tags are being used. They do result in tagged builds of the opbeans Docker image (see https://hub.docker.com/r/opbeans/opbeans-node/tags). However, I'm not sure anyone uses anything but the "latest" of those Docker images.

Option 4: use a Jenkins pipeline in the opbeans repo

Add a stage to the Jenkinsfile in the opbeans repo(s) that runs on a cron (@daily) to look for a new agent version and, if there is one, do the update, commit, and tag.

I don't see any issues with this approach other than:

  • It means that a new opbeans update (and Docker image build) will take up to a day after an agent release.
  • It will take some dev effort to make this work.

This is my current preferred option.

@elastic/observablt-robots @astorm Thoughts?

@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label May 26, 2022
@pazone
Contributor

pazone commented May 30, 2022

@trentm I'd propose using wait-on, targeting the version currently being released, e.g. https://www.npmjs.com/package/elastic-apm-node/v/3.34.0.
If you agree, I can do it this week.

@trentm
Member Author

trentm commented May 30, 2022

@pazone The package version showing up at https://www.npmjs.com/package/elastic-apm-node/v/VERSION doesn't guarantee that all the npm mirrors will be updated such that npm install elastic-apm-node@VERSION will work -- at least I don't believe so.

Also from above:

user reports from npm/npm#20574 suggest that the time for all npm servers to update could be an hour or more. That's too long to have as a timeout in a release process.

Perhaps that is outdated and in practice the wait time is shorter. However, I think it would be unfortunate for the agent release process to fail because of some issue or slowness with one or more npm package mirrors/CDN nodes. It feels architecturally cleaner for the responsibility of updating opbeans-node to live with the opbeans-node repo.

@pazone
Contributor

pazone commented May 31, 2022

We can run npm install elastic-apm-node@VERSION repeatedly, at some interval and with a limit on attempts, until the package installs successfully.

If it still takes ~1h we can trigger the opbeans job after 1h timeout.

@trentm
Member Author

trentm commented May 31, 2022

I'm willing to try that, and thanks very much for offering to implement that. However, if we do find it takes up to 1h for the "Opbeans" stage of the agent release pipeline to complete, then I think we should revisit this and remove the "Opbeans" stage.

If it still takes ~1h we can trigger the opbeans job after 1h timeout.

What "opbeans job" do you mean here?

@estolfo estolfo added this to the 8.4 milestone Jun 1, 2022
@elastic-apm-tech elastic-apm-tech added this to Planned in APM-Agents (OLD) Jun 1, 2022
@Mpdreamz
Member

Mpdreamz commented Jun 2, 2022

I'd personally prefer Option 4 as well.

Opbeans is not a public artifact that is tied to this repository. It should not influence our ability to execute the release of the agent IMO. Moving the opbeans update completely out of band seems appropriate.

@pazone
Contributor

pazone commented Jun 2, 2022

OK, let's consider it plan A.

@trentm
Member Author

trentm commented Jun 2, 2022

@pazone Is this something you or your team will have time to work on soon? If not, please let me know and I can take a stab at it.

Given that the current opbeans-node Jenkinsfile (https://github.com/elastic/opbeans-node/blob/main/.ci/Jenkinsfile) is just a call to an apm-pipeline-library function, I'm not sure what the preferred approach to supporting this would be.

@cachedout
Contributor

@trentm Hi Trent. Looking at this team's workload, it would be hard for us to get this in within the next month or so. We're happy to do it, of course, but if your needs are more urgent, you might want to take a swing at it. Happy to chat more to help get this prioritized correctly, though. LMK.

@trentm
Member Author

trentm commented Jun 8, 2022

@cachedout Thanks and understood. I'll take a stab at it and get review from y'all.

As a sanity check, my plan is to add an optional stage('Update Agent Dep') { to opbeansPipeline here: https://github.com/elastic/apm-pipeline-library/blob/main/vars/opbeansPipeline.groovy#L193 that will handle updating the APM agent dep if there is a new one available. It will be off by default so the opbeans-FOO.git repos that are using opbeansPipeline() can opt into it. It will expect a new .ci/avail-agent-update-ver.sh script (beside the existing .ci/bump-version.sh script) in each opbeans repo that will use it. Please let me know if this sounds crazy. :)
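A sketch of the comparison that a script like `.ci/avail-agent-update-ver.sh` might perform, assuming plain MAJOR.MINOR.PATCH versions (pre-release tags are not handled). The function names are hypothetical. Versions are compared numerically, component by component, because a plain string comparison would wrongly rank 3.9.0 above 3.10.0.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching an "is there an agent update?" check.
# Assumes plain MAJOR.MINOR.PATCH versions; pre-release tags are not handled.

# Exit status 0 if version $1 is strictly newer than version $2.
version_gt() {
  local a=$1 b=$2 a_part b_part i
  for i in 1 2 3; do
    a_part=${a%%.*}   # leading numeric component
    b_part=${b%%.*}
    if [ "$a_part" -gt "$b_part" ]; then return 0; fi
    if [ "$a_part" -lt "$b_part" ]; then return 1; fi
    a=${a#*.}         # drop the component just compared
    b=${b#*.}
  done
  return 1            # equal versions are not "newer"
}

# Prints LATEST if it is newer than CURRENT; prints nothing otherwise.
# In CI the inputs might come from commands such as (an assumption):
#   latest=$(npm view elastic-apm-node version)
#   current=$(node -p "require('./package-lock.json').packages['node_modules/elastic-apm-node'].version")
# where the "packages" key assumes lockfileVersion 2 or later.
avail_update_ver() {
  local latest=$1 current=$2
  if version_gt "$latest" "$current"; then
    printf '%s\n' "$latest"
  fi
}
```

An empty result means "no update available", so a caller could skip the bump, commit, and tag steps in that case.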

trentm added a commit that referenced this issue Jun 9, 2022
Responsibility for updating the elastic-apm-node dep in
opbeans-node will move to *opbeans-node*'s CI.

Refs: elastic/opbeans-node#164
Fixes: #2728
@trentm trentm self-assigned this Jun 9, 2022
@trentm trentm moved this from Planned to In Progress in APM-Agents (OLD) Jun 9, 2022
@v1v
Member

v1v commented Jun 13, 2022

Hi all,

If Option 4 (use a Jenkins pipeline in the opbeans repo) is the one chosen, and it needs to run every X hours, then I'd suggest a similar but slightly different approach:

  1. Create a new pipeline in the opbeans-node.git
  2. This pipeline runs on a cron schedule, only for the main branch
  3. The pipeline looks for a new version and, if there is one:
    a) Bump the version in the main branch
    b) Create the tag.
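A rough sketch of steps 3a and 3b, assuming a checkout of the opbeans-node main branch. The `commit_and_tag_bump` function is hypothetical: it delegates the actual bump to a command passed in (for instance the repo's existing `.ci/bump-version.sh`, although how that script takes its argument is an assumption here), commits the result, and tags the commit `v$VERSION`. Pushing is left to the caller so the sketch can run against a local repo.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of steps 3a/3b: bump the agent dependency on the
# current branch (expected to be main), commit, and tag the commit "v$VERSION".
# Pushing is left to the caller, e.g. (an assumption):
#   git push origin main "v$VERSION"
#
# Usage: commit_and_tag_bump VERSION bump_command [args...]
#   (bump_command is invoked with VERSION appended as its last argument)
commit_and_tag_bump() {
  local version=$1
  shift
  # Step 3a: update the dependency (and anything else the bump touches).
  "$@" "$version" || return
  # "-a" stages modified tracked files only; new files would need "git add".
  git commit -a -m "bump elastic-apm-node to $version" || return
  # Step 3b: tag the bump commit. Pushing this tag is what would trigger the
  # usual opbeans pipeline to build and publish Docker images.
  git tag "v$version"
}
```

Running this from its own cron-triggered job, as proposed, keeps the bump logic out of `opbeansPipeline` entirely.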

By splitting the bump out of the main pipeline, there is no need to change the opbeansPipeline step, and we can refactor it if needed in the future. Otherwise, the same pipeline would be looking after two different concerns.

What do you think?

APM-Agents (OLD) automation moved this from In Progress to Done Jun 13, 2022
trentm added a commit that referenced this issue Jun 13, 2022
@trentm
Member Author

trentm commented Jun 13, 2022

What do you think?

@v1v That sounds fine to me.

A question about Jenkinsfile syntax: Can a Jenkinsfile have multiple top-level pipeline { ... } blocks? E.g. can I have something like this for https://github.com/elastic/opbeans-node/blob/main/.ci/Jenkinsfile

#!/usr/bin/env groovy
@Library('apm@current') _

opbeansPipeline()

pipeline {
  agent { label 'linux && immutable' }
  // ... my new pipeline for doing the update and tagging
}

?

@trentm
Member Author

trentm commented Jun 13, 2022

@v1v elastic/opbeans-node#163 is my attempt at doing this.

trentm added a commit to elastic/opbeans-node that referenced this issue Jun 15, 2022
…gent dep (#163)

This adds a second "Opbeans Node Bump" Jenkins job that runs weekly on the
"main" branch. It checks for an available agent update (a published version
newer than what is in the current package-lock file), and if there is one it:
bumps to that ver, pushes, and git tags with "v$VERSION". The push and tag
will trigger the usual opbeans Jenkins pipeline to publish docker images.

Refs: elastic/apm-agent-nodejs#2728
Fixes: #164
Co-authored-by: Victor Martinez <VictorMartinezRubio@gmail.com>