This repository has been archived by the owner on Jul 23, 2019. It is now read-only.

Rebasing kni-installer on openshift/installer #36

Merged
112 commits merged into openshift-metal3:master from latest-upstream, Apr 4, 2019

Conversation

stbenjam (Member) commented Apr 3, 2019

No description provided.

squeed and others added 30 commits March 14, 2019 10:52
This has already been merged in the operator; just need to update
the installer's cache of the CRD.

Someday we can get rid of this, but not yet.
…rces"

Bring the docs up to speed after 05f7e0d (create cluster: change
Creating cluster to Creating infrastructure resources,
2019-03-14, #1417).
This PR adds back support for creating machines and machinesets
with trunk support enabled.
The description of the wildcard DNS entry and the example did not match.
In 200f0c9 (pkg/destroy/aws: Remove some lastError local-variable
masking, 2019-03-18, #1434), I removed some local lastError variables
that masked the function-level variable.  But Matthew points out that
we were still clobbering that function-level variable with the
loop-level value [1], so a successful loop iteration might silently
clear a previously-set lastError.  This commit goes through and uses
'err' consistently for the loop-level variable.  When we have an
error, we log any previous lastError value before clobbering that
value with the new error.  It's up to the caller to decide how they
want to handle any final lastError; they can log it or not as they see
fit.

I've demoted the lastError logging from Info to Debug, because the
destroy logic usually uses debug for errors (e.g. DependencyViolation
errors), and I don't see a point to trying to classify errors as
expected or unexpected.

[1]: openshift/installer#1434 (comment)
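
A minimal sketch of the error-tracking pattern described above, assuming a
logrus logger; the function names are illustrative stand-ins rather than the
installer's actual API:

```
package main

import (
	"errors"
	"fmt"

	"github.com/sirupsen/logrus"
)

// deleteAll keeps the most recent error in lastError without letting a
// later successful iteration clear it.
func deleteAll(ids []string, logger logrus.FieldLogger) error {
	var lastError error
	for _, id := range ids {
		// Loop-level 'err': a nil result here never clobbers lastError.
		if err := deleteOne(id); err != nil {
			if lastError != nil {
				// Log the previous error at debug level before replacing it.
				logger.Debug(lastError)
			}
			lastError = err
		}
	}
	// The caller decides how (or whether) to report the final lastError.
	return lastError
}

// deleteOne stands in for the per-resource deletion call.
func deleteOne(id string) error {
	if id == "bad" {
		return errors.New("DependencyViolation: " + id)
	}
	return nil
}

func main() {
	fmt.Println(deleteAll([]string{"bad", "ok", "bad"}, logrus.New()))
}
```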
Originally we installed the nss_wrapper package from epel-testing,
I think because it wasn't available in the epel repo (I'm not 100% sure).
We can now install from the stable epel repo, so we no longer need the
epel-testing repo.  That's good, because epel-testing is no longer
configured in the base image (the build was failing until I removed
it, and I realized we no longer needed it).
This adds some Terraform that can be used to create the infrastructure
for an OpenShift cluster on vSphere.

See upi/vsphere/README.md for some instructions on how to perform an
install. The process is very rough and not streamlined at the moment,
but it mostly works.
pkg/destroy/aws: Remove lastError value masking
As reported in openshift/installer#1341, the credential validation errors out when you try to run iam:SimulatePrincipalPolicy on IAM creds that belong to the AWS account's root user. Vendor in an updated cloud-credential-operator with the changes to detect and allow the root creds through (with a stern warning printed out).

dep ensure -update github.com/openshift/cloud-credential-operator
update vendor of cloud-credential-operator (allow use of root creds)
upi/vsphere: Add initial support for vSphere UPI
Since fd1349c (cmd/openshift-install/create: Log progressing
messages, 2019-03-18, #1432), we log progress messages while waiting.
But I'd forgotten to log them in the timeout message, which could lead
to logs like:

  $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/5971/artifacts/e2e-aws/installer/.openshift_install.log | grep -B3 level=fatal
  time="2019-03-21T02:17:31Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.ci-2019-03-21-015242: 95% complete"
  time="2019-03-21T02:36:11Z" level=debug msg="Still waiting for the cluster to initialize: Could not update rolebinding \"openshift-cluster-storage-operator/cluster-storage-operator\" (231 of 310): the server has forbidden updates to this resource"
  time="2019-03-21T02:36:41Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.ci-2019-03-21-015242: 98% complete"
  time="2019-03-21T02:42:24Z" level=fatal msg="failed to initialize the cluster: timed out waiting for the condition"

That's not very helpful if you're looking at stderr and our default
info level.  With this commit, we'll get:

  failed to initialize the cluster: Working towards 4.0.0-0.alpha-2019-03-21-015242: 98% complete: timed out waiting for the condition

Even better would be to get the "forbidden updates to this resource"
message, but that's up to the cluster-version operator to set more
helpful failing/progressing messages.
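
A rough sketch of the idea, assuming a simplified wait loop in place of the
installer's ClusterVersion polling; the channel and function names here are
illustrative only:

```
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitForInitialization remembers the most recent progress message so it
// can be folded into the timeout error instead of being dropped.
func waitForInitialization(ctx context.Context, progress <-chan string) error {
	var lastMessage string
	for {
		select {
		case msg := <-progress:
			lastMessage = msg
		case <-ctx.Done():
			err := errors.New("timed out waiting for the condition")
			if lastMessage != "" {
				// e.g. "Working towards 4.0.0-...: 98% complete: timed out ..."
				return fmt.Errorf("%s: %v", lastMessage, err)
			}
			return err
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	progress := make(chan string, 1)
	progress <- "Working towards 4.0.0: 98% complete"
	fmt.Println(waitForInitialization(ctx, progress))
}
```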
docs/user/customization: Catch up with "Creating infrastructure resources"
Add support for an rhcos template that uses thin provisioning.

Update terraform.tfvars.example to give details about the rhcos-latest
template.

This also removes some unused variables and comments out the code
around setting static IP addresses for the machines. Static IP
addresses are not working yet.
upi/vsphere: support rhcos-latest template
In RHEL 8, journald switched to `DynamicUser=yes`, so we can't reference
the user at Ignition time.  Let's hack around this by adding a fixed
version of the user and doing the chown.
It builds an image containing binaries like jq, terraform, awscli, oc, etc. to allow bringing up UPI infrastructure.
It also contains the `upi` directory, which holds the various Terraform and CloudFormation templates that are used to create infrastructure resources.
cmd/openshift-install/create: Log progress on timeout too
bootstrap: Work around systemd-journal-gateway DynamicUser=yes
images/installer: add image that can be used to install UPI platforms
The cluster object was necessary for the machine controllers to function. This dep is being removed, and there's no reason for this object to exist at the moment.
Remove cluster-api cluster object dependency
As part of our release process we build container images for installer
that are added to the release image (which has a cryptographic
relationship to the images it contains, giving strong integrity). A
consumer should be able to download and install a locked installer
binary that uses the payload. However, we would prefer to not
rebuild the binary outside of the image, but instead have:

1. a source for the binary from the payload
2. the binary be locked to the payload it comes from

This commit allows a build system above the payload to extract the
installer binary for linux from the image (other platforms later)
and perform a replacement on the binary itself, patching:

```
_RELEASE_IMAGE_LOCATION_XXXXXXXXX...
```

with

```
quay.io/openshift/ocp-release:v4.0\x00
```

without requiring a recompilation of the binary. The internal code
checks the constant and verifies bounds (panicking if necessary)
and then returns the updated constant. This allows a simpler
replacement process to customize a binary for both external use
and offline use that is locked to a payload.
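
A rough sketch of how such a patchable placeholder can be read at runtime;
the variable name, marker, and helper below are assumptions for
illustration, not the installer's actual symbols:

```
package main

import (
	"fmt"
	"strings"
)

// The linker keeps this literal in the binary, where a build system can
// overwrite it in place with a real pull spec terminated by a NUL byte.
// (The real placeholder is much longer than shown here.)
var releaseImageLocation = "_RELEASE_IMAGE_LOCATION_XXXXXXXXXXXXXXXXXXXXXXXX"

// pinnedReleaseImage returns the patched value, or false if the placeholder
// was never substituted.
func pinnedReleaseImage() (string, bool) {
	// If the marker prefix is still present, the binary was never patched.
	if strings.HasPrefix(releaseImageLocation, "_RELEASE_IMAGE_LOCATION_") {
		return "", false
	}
	// A patched value must be NUL-terminated so it fits inside the
	// placeholder; anything else means the replacement overran its bounds.
	nul := strings.IndexByte(releaseImageLocation, 0)
	if nul < 0 {
		panic("release image replacement is missing its NUL terminator")
	}
	return releaseImageLocation[:nul], true
}

func main() {
	if img, ok := pinnedReleaseImage(); ok {
		fmt.Println("pinned release image:", img)
	} else {
		fmt.Println("placeholder not substituted; using the default release image")
	}
}
```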
…o_have_replacement

release: Allow release image to be directly substituted into binary
Through 0d891e1 (Merge pull request #1446 from staebler/vsphere_tf,
2019-03-21).
openshift-merge-robot and others added 19 commits April 1, 2019 08:59
openstack: add image for openstack ci
Bug 1659970: data/aws/route53: Block private Route 53 zone on public record
BUG 1670700: data/data/bootstrap: add --etcd-metric-ca to MCO bootstrap
image: Add a production "installer-artifacts" image for Mac binary
pkg/asset/manifests/infrastructure: Set InfrastructureName
Assets are required to build, but hack/build-go.sh cannot handle
cross-architecture asset generation. Explicitly generate them before
invoking the script.

Failed when CI tried to build:

+ go build ... ./cmd/openshift-install
data/unpack.go:12:15: undefined: Assets
pkg/destroy/aws: Destroy NAT gateways by VPC too
The kubeadmin user should only be used to temporarily access the
console, and only until an admin configures a proper Identity Provider.
There is no reason to log in as the kubeadmin user via the CLI.  If OAuth
is broken in a cluster, an admin can still access via the CLI as system:admin,
but will not be able to access via kube:admin.
Modify kubeadmin usage message, admins should not use kubeadmin via CLI
Through 58a2767 (Merge pull request #1497 from
vrutkovs/upi-multistage-cli, 2019-03-29).
Catching up with c734361 (Remove cluster-api object as this is not
needed anymore, 2019-03-22, #1449), 1408d8a (*: use
kube-etcd-cert-signer release image, 2019-03-27, #1477), and possibly
others.  Generated with:

  $ openshift-install graph | dot -Tsvg >docs/design/resource_dep.svg

using:

  $ dot -V
  dot - graphviz version 2.30.1 (20170916.1124)
Catching up with a31e12f (release: Allow release image to be
directly substituted into binary, 2019-03-15, #1422).
The bootstrap node is failing to uncompress the BIOS image with `t1.small` capacity,
and Packet also seems to be failing to deploy servers in SJC1. Using `any` allows Terraform to create the servers in any available datacenter.
CHANGELOG: Document changes since 0.15.0
fix: Gopkg.lock after running dep ensure on pkg/terraform/exec
upi/metal: update the instance location and size
hack/build: Update release-pin location to defaultReleaseImageOriginal
stbenjam (Member, Author) commented Apr 3, 2019

This rebase pulls in a newer version of RHCOS (410.8.20190325.0) in hack/build.sh. I was running into issues with coredns not starting, but that turned out to be that I needed the same fixes from openshift-metal3/dev-scripts@401ba61.

I did that, but the bootstrap node is still not bringing up the k8s API. There are no errors in bootkube, but all progress seems to stop.

markmc (Contributor) commented Apr 3, 2019

Note that you're likely seeing the bootstrap launched with an ootpa image whereas the masters are being given a maipo image by dev-scripts. See #37 and openshift-metal3/dev-scripts#287

derekhiggins (Collaborator) commented
Build FAILURE, see build http://10.8.144.11:8080/job/dev-tools/516/

markmc (Contributor) commented Apr 4, 2019

Ok, the rebase/merge looks good mechanically to me, and our path to fixing everything will be based on this rebase ... so I'm going ahead and merging

markmc merged commit 9e54579 into openshift-metal3:master on Apr 4, 2019
stbenjam deleted the latest-upstream branch on April 4, 2019 at 12:23