Skip to content
This repository has been archived by the owner on Feb 9, 2024. It is now read-only.

[7.0.x] custom planet containers #1982

Merged
merged 14 commits into from Oct 1, 2020

Conversation

a-palchikov
Copy link
Contributor

@a-palchikov a-palchikov commented Aug 7, 2020

Description

Forward port of #1962.
Also fixes an issue with the upgrade unable to determine existing etcd server version if the planet container did not expose this information.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Regression fix (non-breaking change which fixes a regression)
  • This change has a user-facing impact

Linked tickets and other PRs

TODOs

  • Self-review the change
  • Perform manual testing
  • Address review feedback

Testing done

Prerequisites

  1. Custom planet container replacing the default package:
systemOptions:
  baseImage: quay.io/gravitational/planet:<custom-build>
  1. Custom planet container for a (subset of) node profile
nodeProfiles:
  - name: profile
    systemOptions:
      baseImage: quay.io/gravitational/planet:<custom-build>

Clusters

  • Installation, 3-node cluster, all control plane nodes
  • Installation, 3-node cluster, one control plane, 2 workers
  • Upgrade from a previous version (6.1.33), 3-node cluster
  • Upgrade from a previous version, run gc and expand (both controller plane and workers)
  • Installation, 1-node cluster, then expand to 3-nodes (both controller plane and workers)

@a-palchikov a-palchikov requested review from a team, r0mant and knisbet August 7, 2020 19:08
@a-palchikov a-palchikov changed the title Forward port https://github.com/gravitational/gravity/pull/1962. [7.0.x] custom planet containers Aug 7, 2020
@@ -164,7 +164,7 @@ type healthExecutor struct {
func (p *healthExecutor) Execute(ctx context.Context) error {
p.Progress.NextStep("Waiting for the planet to start")
p.Info("Waiting for the planet to start.")
err := utils.Retry(defaults.RetryInterval, defaults.RetryAttempts,
err := utils.RetryWithInterval(ctx, defaults.ExponentialBackOff(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found this issue that prevents timely abort (e.g. when explicitly aborted by a user) of the installer during the planet status wait phase.

@@ -391,113 +391,50 @@ func (s *PlanSuite) TestPlanWithIntermediateRuntimeUpdate(c *check.C) {
})
}

func (s *PlanSuite) TestUpdatesEtcdFromManifestWithoutLabels(c *check.C) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the test about assumed etcd version since it is expected that we have both installed and upgrade versions available.

Jenkinsfile Outdated Show resolved Hide resolved
lib/builder/builder.go Outdated Show resolved Hide resolved
@wadells wadells mentioned this pull request Aug 12, 2020
3 tasks
a-palchikov added a commit that referenced this pull request Aug 15, 2020
Otherwise, builds are subject to environment changes regarding the tele
login state.
a-palchikov added a commit that referenced this pull request Aug 15, 2020
* Cherry-pick changes from #1982 to fix builds.
Otherwise, builds are subject to environment changes regarding the tele
login state.

* Bump e
Copy link
Contributor

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this massive builder refactoring makes this a pretty risky change esp. for an LTS release. My worry is we'll break one of our subtle tele build use-cases as we did in the past.

I would either see if it's possible to fix the custom planet problem with less of unnecessary refactoring, or extensively test all tele build use-cases (for both enterprise and open-source where applicable) to make sure nothing regresses:

  • That when logged out, it pulls stuff from default hub.
  • That when logged into a hub (both via "tele login" and "tsh login"), it uses that hub.
  • That when --hub/--token/--repository provided, it uses the indicated hub.
  • Behavior of --state-dir flag, should be source for both packages and credentials (when logged in via tele login --state-dir), and when used in conjunction with --hub/--token/--repository flags.

Maybe I'm forgetting something else...

@a-palchikov
Copy link
Contributor Author

a-palchikov commented Sep 3, 2020

Checked the following for enterprise tele builds:

  • That when logged out, it pulls stuff from default hub.
  • That when logged into a hub (both via "tele login" and "tsh login"), it uses that hub.
  • [!] That when --hub/--token/--repository provided, it uses the indicated hub. This did not work when using --token/--repository combination - seems like --hub is always required.
  • Behavior of --state-dir flag, should be source for both packages and credentials (when logged in via tele login --state-dir), and when used in conjunction with --hub/--token/--repository flags.

@dorsany
Copy link

dorsany commented Sep 13, 2020

Hi @a-palchikov, do you think that this pr will be a part of the next release version of 7.0.x ?

@a-palchikov
Copy link
Contributor Author

@dorsany I have a couple more things to take care as part of this PR that were discovered in the neighbor branch, but I hope it will.

Copy link
Contributor

@r0mant r0mant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of question, otherwise lgtm.

lib/pack/localpack/packageserver.go Show resolved Hide resolved
lib/update/cluster/plan.go Outdated Show resolved Hide resolved
lib/update/cluster/steps.go Outdated Show resolved Hide resolved
@a-palchikov a-palchikov force-pushed the dmitri/7.0.x/1940-baseimage branch 2 times, most recently from d84e4a8 to b8c750c Compare September 28, 2020 12:27
lib/schema/manifest.go Outdated Show resolved Hide resolved
lib/update/cluster/plan.go Outdated Show resolved Hide resolved
lib/update/cluster/plan.go Outdated Show resolved Hide resolved
lib/update/cluster/plan.go Outdated Show resolved Hide resolved
@a-palchikov a-palchikov force-pushed the dmitri/7.0.x/1940-baseimage branch 2 times, most recently from 7c19daa to 847294c Compare September 30, 2020 12:55
@a-palchikov a-palchikov merged commit 018b843 into version/7.0.x Oct 1, 2020
@a-palchikov a-palchikov deleted the dmitri/7.0.x/1940-baseimage branch October 1, 2020 11:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants