Feature/allow upgrade to snapshots #2752

Merged
merged 9 commits into from Jun 7, 2023

Conversation

Contributor

@pchila pchila commented May 31, 2023

What does this PR do?

This change allows upgrading to a specific snapshot build by specifying its build ID.
For example, we can now target the 8.8.0 snapshot build with ID 49c28bdb by running:
elastic-agent upgrade 8.8.0-SNAPSHOT+49c28bdb

The form without a build ID is still valid; in that case the agent still queries the artifact API to find the latest available snapshot build:
elastic-agent upgrade 8.8.0-SNAPSHOT

We can have a look at the builds available for a specific snapshot by checking the artifact API at https://artifacts-api.elastic.co/v1/versions/8.8.0-SNAPSHOT/builds/

A parsed semver version object has been introduced (and tested) in pkg/version. We already have at least one direct dependency on a semver library and an indirect dependency on another, but since the code is small I thought it was useful to have our own implementation so that we can easily add format changes or utility methods.
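To illustrate what such a parsed version object might look like, here is a minimal sketch. The type name semVer, the function parseSemVer, and the regular expression are assumptions made for this example; they are not the code added to pkg/version in this PR.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// semVer is a hypothetical stand-in for the parsed version object
// described above; it is not the type from pkg/version.
type semVer struct {
	major, minor, patch int
	prerelease          string // e.g. "SNAPSHOT"
	buildMetadata       string // e.g. "49c28bdb"
}

// semVerRegexp accepts MAJOR.MINOR.PATCH with optional "-PRERELEASE"
// and "+BUILD" suffixes. It is deliberately simpler than the full
// semver grammar.
var semVerRegexp = regexp.MustCompile(
	`^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$`)

// parseSemVer splits a version string into its components.
func parseSemVer(s string) (*semVer, error) {
	m := semVerRegexp.FindStringSubmatch(s)
	if m == nil {
		return nil, fmt.Errorf("%q is not a valid version string", s)
	}
	var nums [3]int
	for i := range nums {
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return nil, fmt.Errorf("parsing numeric part %q: %w", m[i+1], err)
		}
		nums[i] = n
	}
	return &semVer{
		major:         nums[0],
		minor:         nums[1],
		patch:         nums[2],
		prerelease:    m[4],
		buildMetadata: m[5],
	}, nil
}

// isSnapshot reports whether the prerelease part marks a snapshot.
func (v *semVer) isSnapshot() bool { return v.prerelease == "SNAPSHOT" }

func main() {
	v, err := parseSemVer("8.8.0-SNAPSHOT+49c28bdb")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("core=%d.%d.%d prerelease=%s build=%s snapshot=%t\n",
		v.major, v.minor, v.patch, v.prerelease, v.buildMetadata, v.isSnapshot())
}
```

Keeping the build ID in its own field is what lets callers tell "any 8.8.0 snapshot" apart from "exactly this 8.8.0 snapshot build".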

This parsed semver version is used in step_download.go and in the snapshot downloader and verifier. With it we can detect whether an exact snapshot build was already specified and, if so, take a shortcut in the snapshot downloader's snapshotURI() function without querying the Artifact API.
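The shortcut can be sketched roughly as follows. This is only an illustration of the decision, not the snapshotURI() implementation: resolveBuildID and latestBuildFunc are hypothetical names, the Artifact API call is replaced by a callback, and the build ID returned by the fake lookup is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// latestBuildFunc is a hypothetical hook standing in for the
// Artifact API query that finds the latest snapshot build.
type latestBuildFunc func(snapshotVersion string) (string, error)

// resolveBuildID returns the build ID to download for a snapshot
// version string. When the version already carries "+<build id>",
// the lookup is skipped entirely.
func resolveBuildID(version string, queryLatest latestBuildFunc) (string, error) {
	base, build, hasBuild := strings.Cut(version, "+")
	if !strings.HasSuffix(base, "-SNAPSHOT") {
		return "", fmt.Errorf("%q is not a snapshot version", version)
	}
	if hasBuild && build != "" {
		// Exact build requested: no API round trip needed.
		return build, nil
	}
	return queryLatest(base)
}

func main() {
	// Fake lookup used only for this demo; the returned ID is invented.
	fakeLookup := func(v string) (string, error) {
		fmt.Println("querying latest build for", v)
		return "deadbeef", nil
	}

	for _, v := range []string{"8.8.0-SNAPSHOT+49c28bdb", "8.8.0-SNAPSHOT"} {
		id, err := resolveBuildID(v, fakeLookup)
		fmt.Println(v, "->", id, err)
	}
}
```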

For the download and the verification themselves we use the VersionWithPrerelease() method to calculate the correct file name that we have to download.
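As a rough sketch of the idea: the file name is derived from the version with its prerelease part but without the build metadata. The helper names and the file name pattern below are assumptions for illustration; the real VersionWithPrerelease() method lives on the parsed version object in pkg/version.

```go
package main

import (
	"fmt"
	"strings"
)

// versionWithPrerelease drops the "+<build id>" suffix and keeps
// "MAJOR.MINOR.PATCH-PRERELEASE". Hypothetical string-based helper.
func versionWithPrerelease(version string) string {
	withoutBuild, _, _ := strings.Cut(version, "+")
	return withoutBuild
}

// packageFileName builds a package file name. The
// "elastic-agent-<version>-<platform>.<ext>" pattern is an assumption
// made for this example.
func packageFileName(version, platform, ext string) string {
	return fmt.Sprintf("elastic-agent-%s-%s.%s",
		versionWithPrerelease(version), platform, ext)
}

func main() {
	fmt.Println(packageFileName("8.8.0-SNAPSHOT+49c28bdb", "linux-x86_64", "tar.gz"))
}
```

Under these assumptions, 8.8.0-SNAPSHOT and 8.8.0-SNAPSHOT+49c28bdb map to the same file name; only the location the file is fetched from differs.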

I left the localremote downloader and http.Downloader pretty much untouched so that the previous defaults remain in place and we can still upgrade using an 8.9.0-SNAPSHOT stack that specifies '8.9.0' as the version (have a look at the localremote downloader for more info).

Why is it important?

We want users to be able to test unreleased versions (snapshots) of Elastic Agent while keeping control over the exact version of the software, that is, being able to upgrade to a specific build.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@pchila pchila added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent Label for the Agent team labels May 31, 2023
@pchila pchila self-assigned this May 31, 2023
@mergify
Contributor

mergify bot commented May 31, 2023

This pull request does not have a backport label. Could you fix it @pchila? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@pchila pchila force-pushed the feature/allow-upgrade-to-snapshots branch from 8e30eac to 3cb5734 Compare May 31, 2023 10:25
@elasticmachine
Collaborator

elasticmachine commented May 31, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-06-07T07:05:38.816+0000

  • Duration: 23 min 3 sec

Test stats 🧪

Test Results
Failed 0
Passed 5991
Skipped 19
Total 6010

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Collaborator

elasticmachine commented May 31, 2023

🌐 Coverage report

Name | Metrics % (covered/total) | Diff
Packages | 98.667% (74/75) | 👍
Files | 68.605% (177/258) | 👍
Classes | 67.423% (327/485) | 👍
Methods | 53.677% (1000/1863) | 👎 -0.029
Lines | 39.614% (11473/28962) | 👎 -0.022
Conditionals | 100.0% (0/0) | 💚

@pchila pchila requested a review from cmacknz May 31, 2023 14:59
@pchila pchila force-pushed the feature/allow-upgrade-to-snapshots branch 2 times, most recently from ad86036 to 3164b49 Compare May 31, 2023 17:17
@mergify
Contributor

mergify bot commented May 31, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feature/allow-upgrade-to-snapshots upstream/feature/allow-upgrade-to-snapshots
git merge upstream/main
git push upstream feature/allow-upgrade-to-snapshots

@pchila pchila force-pushed the feature/allow-upgrade-to-snapshots branch from 3164b49 to 0852262 Compare June 1, 2023 15:12
@pchila pchila marked this pull request as ready for review June 1, 2023 15:12
@pchila pchila requested a review from a team as a code owner June 1, 2023 15:12
@pchila pchila requested a review from michalpristas June 1, 2023 15:13
@pchila pchila mentioned this pull request Jun 1, 2023
5 tasks
@ycombinator
Contributor

/test

@@ -82,13 +88,19 @@ func snapshotConfig(config *artifact.Config, versionOverride string) (*artifact.
}, nil
}

-func snapshotURI(versionOverride string, config *artifact.Config) (string, error) {
+func snapshotURI(versionOverride *agtversion.ParsedSemVer, config *artifact.Config) (string, error) {
// do we support upgrade without specifying a target version ?
Contributor

What is this question here for? Does it need to be answered before this is truly ready?

Contributor Author

@pchila pchila Jun 5, 2023

It was a comment I added because it seemed that the version is optional...
I later realized that the localremote downloader implementation calls the snapshot implementation with a zero value ("" before the change, nil now) during the upgrade process.
Will remove the comment.
Edit: I actually replaced the comment with an explanation of why we test for an empty override version (could be useful to the next lost soul that wanders into this part of the code 🤣)

return "", fmt.Errorf("error parsing version %q: %w", version, err)
}

fetcher, err := newDownloader(parsedVersion, u.log, &settings)
if err != nil {
return "", errors.New(err, "initiating fetcher")
Contributor

While you are here, I would just go ahead and change from errors.New(err...) to fmt.Errorf. I mean that through the whole file.

Contributor Author

Does that mean that usage of the package github.com/elastic/elastic-agent/internal/pkg/agent/errors is deprecated? (I am more than happy if it is and we use the Go stdlib error wrapping 🎉)

Contributor

Oh yes, I want it gone!
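For reference, the stdlib style suggested in this thread looks roughly like the sketch below. The names newFetcher, startUpgrade, and errNoDownloader are hypothetical and exist only for this example; only fmt.Errorf with %w and errors.Is are standard library behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoDownloader is a hypothetical sentinel error for this example.
var errNoDownloader = errors.New("no downloader available")

// newFetcher is a hypothetical constructor that always fails, so the
// wrapping below has something to wrap.
func newFetcher() error { return errNoDownloader }

// startUpgrade wraps the underlying error with context using the
// standard library's %w verb instead of a custom errors package.
func startUpgrade() error {
	if err := newFetcher(); err != nil {
		return fmt.Errorf("initiating fetcher: %w", err)
	}
	return nil
}

func main() {
	err := startUpgrade()
	fmt.Println(err)
	// The wrapped error stays inspectable through errors.Is.
	fmt.Println(errors.Is(err, errNoDownloader))
}
```

Because %w keeps the original error in the chain, callers can still match on it with errors.Is or errors.As after the context string has been added.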

return localremote.NewDownloader(log, settings)
}

// TODO since we know if it's a snapshot or not, shouldn't we add EITHER the snapshot downloader OR the release one ?
Contributor

I think so.
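One way to read the TODO is sketched below: decide once, from the version string, which single remote downloader to use. The downloader interface, the namedDownloader type, and remoteDownloaderFor are all hypothetical and much simpler than the real downloader types in the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// downloader is a hypothetical, minimal interface for this sketch.
type downloader interface {
	Name() string
}

// namedDownloader is a placeholder implementation that only carries
// a label, so the selection logic can be shown in isolation.
type namedDownloader string

func (n namedDownloader) Name() string { return string(n) }

// remoteDownloaderFor picks exactly one remote downloader: the
// snapshot one for "-SNAPSHOT" versions, the release one otherwise.
func remoteDownloaderFor(version string) downloader {
	base, _, _ := strings.Cut(version, "+")
	if strings.HasSuffix(base, "-SNAPSHOT") {
		return namedDownloader("snapshot")
	}
	return namedDownloader("release")
}

func main() {
	for _, v := range []string{"8.8.0", "8.8.0-SNAPSHOT", "8.8.0-SNAPSHOT+49c28bdb"} {
		fmt.Printf("%s -> %s downloader\n", v, remoteDownloaderFor(v).Name())
	}
}
```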


s.Require().FileExists(updateMarkerFile)

// The checks of the update marker makes the test time out since it runs for more than 10 minutes :(
Contributor

Yeah, I think we need to figure out if we can shorten this somehow for our testing purposes. Waiting 10 minutes is not something any developer wants to do.

Contributor Author

@pchila pchila Jun 5, 2023

For this PR I will leave this part commented so that the integration test verifies that the new agent is built from the specific commit included in the specified build, that the update marker is created and that the new agent is HEALTHY.

For a broader scope however I still think that we should test that the agent completes the upgrade by removing the update marker after the normal grace period (it's the only way we can confirm that an upgrade a->b really works, not just for this particular upgrade scenario) so we may need longer integration test runs...

Contributor

Yeah I just wonder if we should see if we can shorten the time windows somehow just for the test.

Contributor

Probably not for this PR, but we could enhance the watcher to try and look for an optional watcher.properties or watcher.test.properties file. This file could hold an override value for the gracePeriodDuration that's used by the watcher (as well as any other similar settings used by the error checker and crash checker). Then, (only) tests could create this file before kicking off the upgrade.
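A minimal sketch of that idea, assuming a simple key=value file format and a 10 minute default taken from the discussion above. Neither the format nor the function name gracePeriodFrom exists in the watcher today; the function takes the file contents as a string to keep the example self-contained.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"time"
)

// defaultGracePeriod is an assumed default based on the "10 minutes"
// mentioned in this thread.
const defaultGracePeriod = 10 * time.Minute

// gracePeriodFrom scans "key=value" lines and returns the value of
// gracePeriodDuration if present, or the default otherwise.
func gracePeriodFrom(properties string) (time.Duration, error) {
	scanner := bufio.NewScanner(strings.NewReader(properties))
	for scanner.Scan() {
		key, value, ok := strings.Cut(strings.TrimSpace(scanner.Text()), "=")
		if !ok || strings.TrimSpace(key) != "gracePeriodDuration" {
			continue
		}
		d, err := time.ParseDuration(strings.TrimSpace(value))
		if err != nil {
			return 0, fmt.Errorf("parsing gracePeriodDuration: %w", err)
		}
		return d, nil
	}
	if err := scanner.Err(); err != nil {
		return 0, fmt.Errorf("reading properties: %w", err)
	}
	return defaultGracePeriod, nil
}

func main() {
	// A test would write this content to the properties file before
	// kicking off the upgrade.
	d, err := gracePeriodFrom("gracePeriodDuration=30s\n")
	fmt.Println(d, err)

	// Without an override the default applies.
	d, err = gracePeriodFrom("")
	fmt.Println(d, err)
}
```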

Contributor Author

Opened #2796

Contributor

@pchila Could you create an issue to discuss how to shorten the watcher's grace period duration and possibly other similar duration-related settings and link it from the commented out code in this PR? That way we don't forget about this conversation here. After that, I'm good to LGTM this PR.

Contributor

@pchila Looks like you created the issue just as I was typing my previous comment 🙂. Thanks. Just reference it from the commented out code in the test here and I'll LGTM this PR.

@pchila pchila force-pushed the feature/allow-upgrade-to-snapshots branch from 18e386f to 0765501 Compare June 5, 2023 18:29
@mergify
Contributor

mergify bot commented Jun 5, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feature/allow-upgrade-to-snapshots upstream/feature/allow-upgrade-to-snapshots
git merge upstream/main
git push upstream feature/allow-upgrade-to-snapshots

@pchila pchila force-pushed the feature/allow-upgrade-to-snapshots branch from 0765501 to 2e0855f Compare June 6, 2023 07:18
@pchila pchila requested a review from blakerouse June 6, 2023 15:19
ycombinator previously approved these changes Jun 6, 2023
Contributor

@ycombinator ycombinator left a comment

LGTM.

Left a couple of suggestions: one is just a minor comment cleanup, the other is probably for a follow-up PR.

…remote/downloader.go

Co-authored-by: Shaunak Kashyap <ycombinator@gmail.com>
Labels
backport-skip enhancement New feature or request Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

4 participants