"Opbeans" stage of release pipeline fails #2728

trentm · 2022-05-26T23:05:45Z

Recently in #2625 we automated releases: when a version tag ("vN.N.N") is pushed, a Jenkins "Release" stage will build and publish the Lambda layer, do a GitHub release, npm publish, and attempt to update opbeans-node.git to use this new APM agent release.

That "Opbeans" stage is flaky (or perhaps fails every time), as discussed here: #2625 (comment)
This issue is about making the release process reliable by doing something about this stage.

The Opbeans stage effectively does this: #2723 (comment)

Options

Option 1: npm publish early and hope

Do the 'npm publish' step earlier in the pipeline and hope that the lambda layer publishing steps take enough time that the Opbeans stage will work then.

I don't love this idea because relying on "hope" means the stage can still fail, just less frequently, which makes for a more subtle bug. Also see the "timeout" discussion below.

Option 2: wait for npm install to work

Add a retry loop at the start of the Opbeans stage that retries the npm install whenever it fails with ETARGET, up to a timeout, to account for the stage running soon after a publish.

"ETARGET" refers to the specific error you get from npm install when this issue happens:

[2022-05-25T21:23:57.820Z] + CI=true npm install --ignore-scripts elastic-apm-node@3.34.0
[2022-05-25T21:23:59.440Z] npm ERR! code ETARGET
[2022-05-25T21:23:59.440Z] npm ERR! notarget No matching version found for elastic-apm-node@3.34.0.
[2022-05-25T21:23:59.440Z] npm ERR! notarget In most cases you or one of your dependencies are requesting
[2022-05-25T21:23:59.440Z] npm ERR! notarget a package version that doesn't exist.

Theoretically this option would be straightforward to implement, but what should that timeout be? Granted, the issue is old (from 2018), but user reports in npm/npm#20574 suggest that the time for all npm servers to update could be an hour or more. That's too long to have as a timeout in a release process.
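
For illustration, a minimal sketch of such a retry loop, assuming a POSIX-like shell. The function name `retry_on_etarget` and its timeout and interval parameters are hypothetical and not part of the existing pipeline. It retries only when the command's output mentions ETARGET and fails fast on any other error.

```shell
#!/usr/bin/env bash
# Sketch only: retry a command while it fails with ETARGET, up to a timeout.
# Usage: retry_on_etarget <timeout_secs> <interval_secs> <command> [args...]
# The function name and both parameters are hypothetical, not part of the pipeline.
retry_on_etarget() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline log
  deadline=$(( $(date +%s) + timeout ))
  log="$(mktemp)"
  while true; do
    # Capture all output so the failure reason can be inspected.
    if "$@" >"$log" 2>&1; then
      cat "$log"
      rm -f "$log"
      return 0
    fi
    cat "$log" >&2
    # Any failure other than ETARGET is not a propagation delay: fail fast.
    if ! grep -q 'ETARGET' "$log"; then
      rm -f "$log"
      return 1
    fi
    # Stop once the deadline has passed.
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "gave up after ${timeout}s waiting for: $*" >&2
      rm -f "$log"
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical call in the Opbeans stage (VERSION is assumed to be set by the caller):
#   retry_on_etarget 600 30 npm install --ignore-scripts "elastic-apm-node@${VERSION}"
```

The 600 and 30 second values in the usage comment are placeholders. Picking the real timeout is exactly the open question above.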

Option 3: use dependabot to update opbeans

Configure dependabot to look for an agent update daily.

Some issues with this:

The current "bump-version.sh" script also updates a label in the repo's Dockerfile, which dependabot will not update. So either we drop that label, or we add a separate lint GitHub check that fails the dependabot PR until it is manually updated to tweak the Dockerfile as well. That is pretty indirect and laborious.
There is no way to have this process create a git tag on the opbeans repo, which the current process does. I am not sure those git tags are being used. They do result in tagged builds of the opbeans Docker image (see https://hub.docker.com/r/opbeans/opbeans-node/tags). However, I'm not sure if anyone uses anything but the "latest" of those Docker images.

Option 4: use a Jenkins pipeline in the opbeans repo

Add a stage to the Jenkinsfile in the opbeans repo(s) that runs on a daily cron (@daily), looks for a new agent version, and then does the update, commit, and tag.

I don't see any issues with this approach other than:

It means that a new opbeans update (and Docker image build) will take up to a day after an agent release.
It will take some dev effort to make this work.

This is my current preferred option.

@elastic/observablt-robots @astorm Thoughts?

pazone · 2022-05-30T15:54:28Z

@trentm I'd propose using wait-on and pointing it at the page for the version being released, like this one: https://www.npmjs.com/package/elastic-apm-node/v/3.34.0.
If you agree to proceed, I can do it this week.

trentm · 2022-05-30T18:44:02Z

@pazone The package version showing up at https://www.npmjs.com/package/elastic-apm-node/v/VERSION doesn't guarantee that all the npm mirrors will be updated such that npm install elastic-apm-node@VERSION will work -- at least I don't believe so.

Also from above:

> user reports from npm/npm#20574 suggest that the time for all npm servers to update could be an hour or more. That's too long to have as a timeout in a release process.

Perhaps that is outdated and in practice the wait time is less than that. However, I think it would be unfortunate for the agent release process to fail because there is some issue or slowness with one or more npm package mirrors/CDN nodes. It feels architecturally cleaner to have the responsibility of updating opbeans-node be something that lives with the opbeans-node repo.

pazone · 2022-05-31T10:01:15Z

We can retry npm install elastic-apm-node@VERSION at some interval, up to a limit, until the package installs successfully.

If it still takes ~1h we can trigger the opbeans job after a 1h timeout.

trentm · 2022-05-31T15:37:58Z

I'm willing to try that, and thanks very much for offering to implement that. However, if we do find it takes up to 1h for the "Opbeans" stage of the agent release pipeline to complete, then I think we should revisit this and remove the "Opbeans" stage.

> If it still takes ~1h we can trigger the opbeans job after a 1h timeout.

What "opbeans job" do you mean here?

Mpdreamz · 2022-06-02T09:44:10Z

I'd personally prefer Option 4 as well.

Opbeans is not a public artifact that is tied to this repository. It should not influence our ability to execute the release of the agent IMO. Moving the opbeans update completely out of band seems appropriate.

pazone · 2022-06-02T10:18:01Z

OK, let's consider it plan A.

trentm · 2022-06-02T16:11:33Z

@pazone Is this something you or your team will have time to work on soon? If not, please let me know and I can take a stab at it.

Given the current opbeans-node Jenkinsfile (https://github.com/elastic/opbeans-node/blob/main/.ci/Jenkinsfile) is just a call to an apm-pipeline-library function, I'm not sure what the preferred approach would be to support this.

cachedout · 2022-06-07T16:29:15Z

@trentm Hi Trent. In looking at this team's workload, it would be hard for us to get this in within the next month or so. We're happy to do it of course but if your needs are more urgent, you might want to take a swing at it. Happy to chat more to help get this prioritized correctly though. LMK.

trentm · 2022-06-08T23:43:49Z

@cachedout Thanks and understood. I'll take a stab at it and get review from y'all.

As a sanity check, my plan is to add an optional stage('Update Agent Dep') { to opbeansPipeline here: https://github.com/elastic/apm-pipeline-library/blob/main/vars/opbeansPipeline.groovy#L193 that will handle updating the APM agent dep if there is a new one available. It will be off by default so the opbeans-FOO.git repos that are using opbeansPipeline() can opt into it. It will expect a new .ci/avail-agent-update-ver.sh script (beside the existing .ci/bump-version.sh script) in each opbeans repo that will use it. Please let me know if this sounds crazy. :)
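
To make the "is there a new one available" check concrete, here is a rough sketch of the comparison such a script might do. The helper name `version_is_newer` is hypothetical, and it assumes plain numeric "N.N.N" versions with no pre-release suffixes.

```shell
#!/usr/bin/env bash
# Sketch only: succeed (exit 0) when the candidate version is strictly newer
# than the current one. Assumes plain numeric "N.N.N" versions.
# Usage: version_is_newer <current> <candidate>
version_is_newer() {
  local current="$1" candidate="$2"
  # Equal versions are not an update.
  [ "$current" != "$candidate" ] || return 1
  local highest
  # Sort numerically on each dot-separated field; the last line is the highest.
  highest="$(printf '%s\n%s\n' "$current" "$candidate" \
    | sort -t. -k1,1n -k2,2n -k3,3n | tail -n 1)"
  [ "$highest" = "$candidate" ]
}

# Hypothetical use inside .ci/avail-agent-update-ver.sh, where $current is the
# agent version currently pinned in the opbeans repo (reading it is left out):
#   latest="$(npm view elastic-apm-node version)"
#   if version_is_newer "$current" "$latest"; then echo "$latest"; fi
```

Printing the new version only when there is one would let the calling stage treat empty output as "nothing to do". That contract is an assumption of this sketch, not something decided in this thread.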

Responsibility for updating the elastic-apm-node dep in opbeans-node will move to *opbeans-node*'s CI. Refs: elastic/opbeans-node#164 Fixes: #2728

v1v · 2022-06-13T08:05:07Z

Hi all,

If "Option 4: use a Jenkins pipeline in the opbeans repo" is the one chosen, and it needs to run every X hours, then I'd suggest a similar but slightly different approach:

Create a new pipeline in opbeans-node.git.
This pipeline runs on a cron schedule, only for the main branch.
The pipeline looks for a new version and, if there is one:
a) Bumps the version in the main branch.
b) Creates the tag.

Splitting the bump out of the main pipeline means there is no need to change the opbeansPipeline step, and we can refactor later if needed. Otherwise, the same pipeline would be looking after two different concerns.

What do you think?

trentm · 2022-06-13T18:49:54Z

> What do you think?

@v1v That sounds fine to me.

A question about Jenkinsfile syntax: Can a Jenkinsfile have multiple top-level pipeline { ... } blocks? E.g. can I have something like this for https://github.com/elastic/opbeans-node/blob/main/.ci/Jenkinsfile:

#!/usr/bin/env groovy
@Library('apm@current') _

opbeansPipeline()

pipeline {
  agent { label 'linux && immutable' }
  // ... my new pipeline for doing the update and tagging
}

?

trentm · 2022-06-13T23:52:11Z

@v1v elastic/opbeans-node#163 is my attempt at doing this.

…gent dep (#163) This adds a second "Opbeans Node Bump" Jenkins job that runs weekly on the "main" branch. It checks for an available agent update (a published version newer than what is in the current package-lock file), and if there is one it: bumps to that ver, pushes, and git tags with "v$VERSION". The push and tag will trigger the usual opbeans Jenkins pipeline to publish docker images. Refs: elastic/apm-agent-nodejs#2728 Fixes: #164 Co-authored-by: Victor Martinez <VictorMartinezRubio@gmail.com>

trentm added the 8.4-candidate label May 26, 2022

github-actions bot added the agent-nodejs Make available for APM Agents project planning. label May 26, 2022

trentm mentioned this issue May 26, 2022

automate releases #2625

Closed

estolfo added this to the 8.4 milestone Jun 1, 2022

elastic-apm-tech added this to Planned in APM-Agents (OLD) Jun 1, 2022

estolfo removed the 8.4-candidate label Jun 1, 2022

trentm mentioned this issue Jun 8, 2022

chore: make it the responsibility of opbeans-node to update its APM agent dep elastic/opbeans-node#163

Merged

trentm mentioned this issue Jun 9, 2022

opbeans CI should handle updating its elastic-apm-node dep elastic/opbeans-node#164

Closed

trentm added a commit that referenced this issue Jun 9, 2022

chore: drop Opbeans update stage of the CI Release pipeline

56e87e2

Responsibility for updating the elastic-apm-node dep in opbeans-node will move to *opbeans-node*'s CI. Refs: elastic/opbeans-node#164 Fixes: #2728

trentm mentioned this issue Jun 9, 2022

chore: drop Opbeans update stage of the CI Release pipeline #2763

Merged

trentm self-assigned this Jun 9, 2022

trentm moved this from Planned to In Progress in APM-Agents (OLD) Jun 9, 2022

trentm closed this as completed in #2763 Jun 13, 2022

APM-Agents (OLD) automation moved this from In Progress to Done Jun 13, 2022

trentm added a commit that referenced this issue Jun 13, 2022

chore: drop Opbeans update stage of the CI Release pipeline (#2763)

fc6428c

Responsibility for updating the elastic-apm-node dep in opbeans-node will move to *opbeans-node*'s CI. Refs: elastic/opbeans-node#164 Fixes: #2728
