New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/allow upgrade to snapshots #2752
Feature/allow upgrade to snapshots #2752
Conversation
This pull request does not have a backport label. Could you fix it @pchila? 🙏
NOTE: |
8e30eac
to
3cb5734
Compare
🌐 Coverage report
|
ad86036
to
3164b49
Compare
This pull request is now in conflicts. Could you fix it? 🙏
|
3164b49
to
0852262
Compare
/test |
@@ -82,13 +88,19 @@ func snapshotConfig(config *artifact.Config, versionOverride string) (*artifact. | |||
}, nil | |||
} | |||
|
|||
func snapshotURI(versionOverride string, config *artifact.Config) (string, error) { | |||
func snapshotURI(versionOverride *agtversion.ParsedSemVer, config *artifact.Config) (string, error) { | |||
// do we support upgrade without specifying a target version ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this question here for? Does this need to be answered before this is truely ready?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a comment I added because it seems that version is optional...
I later realized that the localremote
downloader implementation calls the snapshot impl with a zero value (""
before the change, nil
now) during the upgrade process.
Will remove the comment.
Edit: I actually changed the comment with the explanation of why we test for empty override version (could be useful to next lost soul that wanders this part of the code 🤣 )
return "", fmt.Errorf("error parsing version %q: %w", version, err) | ||
} | ||
|
||
fetcher, err := newDownloader(parsedVersion, u.log, &settings) | ||
if err != nil { | ||
return "", errors.New(err, "initiating fetcher") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you are here, I would just go ahead and change from errors.New(err...)
to fmt.Errorf
. I mean that through the whole file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean that the usage of package github.com/elastic/elastic-agent/internal/pkg/agent/errors
is deprecated ? (I am more than happy if it is and we use the go stdlib error wrapping 🎉 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, I want it gone!
return localremote.NewDownloader(log, settings) | ||
} | ||
|
||
// TODO since we know if it's a snapshot or not, shouldn't we add EITHER the snapshot downloader OR the release one ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so.
|
||
s.Require().FileExists(updateMarkerFile) | ||
|
||
// The checks of the update marker makes the test time out since it runs for more than 10 minutes :( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think we need to figure out if we can shorten this somehow for out testing purposes. Waiting 10 minutes is not really something that any developer wants to wait on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this PR I will leave this part commented so that the integration test verifies that the new agent is built from the specific commit included in the specified build, that the update marker is created and that the new agent is HEALTHY.
For a broader scope however I still think that we should test that the agent completes the upgrade by removing the update marker after the normal grace period (it's the only way we can confirm that an upgrade a->b really works, not just for this particular upgrade scenario) so we may need longer integration test runs...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I just wonder if we should see if we can shorten the time windows somehow just for the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not for this PR, but we could enhance the watcher to try and look for an optional watcher.properties
or watcher.test.properties
file. This file could hold an override value for the gracePeriodDuration
that's used by the watcher (as well as any other similar settings used by the error checker and crash checker). Then, (only) tests could create this file before kicking off the upgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #2796
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pchila Could you create an issue to discuss how to shorten the watcher's grace period duration and possibly other similar duration-related settings and link it from the commented out code in this PR? That way we don't forget about this conversation here. After that, I'm good to LGTM this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pchila Looks like you created the issue just as I was typing my previous comment 🙂. Thanks. Just reference it from the commented out code in the test here and I'll LGTM this PR.
18e386f
to
0765501
Compare
This pull request is now in conflicts. Could you fix it? 🙏
|
0765501
to
2e0855f
Compare
internal/pkg/agent/application/upgrade/artifact/download/localremote/downloader.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…remote/downloader.go Co-authored-by: Shaunak Kashyap <ycombinator@gmail.com>
…race period expires
What does this PR do?
This change allows to upgrade to a specific snapshot version by specifying the build id.
For example, we can now target a specific 8.8.0 snapshot build with id
49c28bdb
withelastic-agent upgrade 8.8.0-SNAPSHOT+49c28bdb
Of course the form without a build id is still valid and the agent will still query the artifact API to figure out what is the latest snapshot version available
elastic-agent upgrade 8.8.0-SNAPSHOT
We can have a look at the builds available for a specific snapshot by checking the artifact API at https://artifacts-api.elastic.co/v1/versions/8.8.0-SNAPSHOT/builds/
A parsed semver version object has been introduced (and tested) in
pkg/version
(we already have at least 1 direct dependency to a semver library and an indirect one to another semver library, but since the code is small I thought it was useful to have our own version so we can implement format changes or utility methods easily)We are using this parsed semver version in the
step_download.go
and the snapshot downloader and verifier: using this we are able to detect if we have an exact snapshot version already and we can take a shortcut through thesnapshotURI()
function of the snapshot downloader without querying the Artifact API.For the download and the verification themselves we use the
VersionWithPrerelease()
method to calculate the correct file name that we have to download.I left the
localremote
and thehttp.Downloader
pretty much untouched so that the previous defaults are still in place and we can still upgrade using a 8.9.0-SNAPSHOT stack that specifies '8.9.0' as version (have a look at thelocalremote
downloader for more info)Why is it important?
We want users to be able to test unreleased versions (snapshots) of elastic agent but with some control over the exact version of the software (that is we want to upgrade to a specific build)
Checklist
./changelog/fragments
using the changelog toolAuthor's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs
Questions to ask yourself