Fix containerized 3.9 -> 3.10 upgrade #8239

Merged
sdodson merged 3 commits into openshift:master from containerized-3.9-to-3.10 on May 4, 2018

Member

@vrutkovs vrutkovs commented May 2, 2018

  • Node is now a system container
  • Rework a few etcd parts to properly back up and migrate to a static pod

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1571724

TODO:

  • Fix etcd backup and update to static pods for Atomic
  • The upgrade currently ends with this failure:
TASK [fail] ***************************************************************************************************************************************************
task path: /code/openshift-ansible/playbooks/common/openshift-cluster/upgrades/pre/config.yml:63
Wednesday 02 May 2018  20:48:53 +0000 (0:00:00.040)       0:23:37.982 ********* 
fatal: [vrutkovs_4d58cb0034c4-master-1]: FAILED! => {
    "changed": false, 
    "failed": true, 
    "msg": "Master running 3.9.0 must be upgraded to 3.10 before node upgrade can be run."
}

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 2, 2018
@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 2, 2018
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vrutkovs

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 2, 2018
@@ -89,7 +91,7 @@ etcd_listen_client_urls: "{{ etcd_url_scheme }}://{{ etcd_ip }}:{{ etcd_client_p
#etcd_peer: 127.0.0.1
etcdctlv2: "{{ r_etcd_common_etcdctl_command }} --cert-file {{ etcd_peer_cert_file }} --key-file {{ etcd_peer_key_file }} --ca-file {{ etcd_peer_ca_file }} -C https://{{ etcd_peer }}:{{ etcd_client_port }}"

-etcd_service: etcd
+etcd_service: "{{ (r_etcd_common_etcd_runtime == 'docker') | ternary('etcd_container', 'etcd') }}"
Contributor

I don't think we need this. We're brute-force removing the services in favor of pods, except for stand-alone rpm etcd.

Member Author

Yep, this currently breaks, so I'll remove the etcd commits.

@@ -13,5 +14,14 @@
- reload systemd units
- restart node

- block:
- name: Remove existing systemd service
Contributor

I'm not sure if this is what we want?

If the node is already a system container (which was possible previously in all types of deploys) and we remove this file on a host that was already upgraded (i.e., this run is a no-op there), I'm not sure that the atomic command will place the service unit again.

Member Author

If the node is already a system container (which was possible previously in all types of deploys)

On Atomic installs this was a different node service. As a result, atomic install would crash, complaining that the service file already exists. Atomic install is not very smart, though: it won't create a service file if it's missing during a second run.

This part is still WIP.

Member Author

Reworked this to remove the services only when the node is not yet bootstrapped.

@vrutkovs vrutkovs force-pushed the containerized-3.9-to-3.10 branch 2 times, most recently from 400a23d to 7daae2a Compare May 3, 2018 11:04
@vrutkovs vrutkovs force-pushed the containerized-3.9-to-3.10 branch from e5ee8b8 to 7010f57 Compare May 3, 2018 14:31
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 3, 2018
@vrutkovs vrutkovs force-pushed the containerized-3.9-to-3.10 branch from 4b08c5c to b3dd1f8 Compare May 3, 2018 21:06
@vrutkovs vrutkovs force-pushed the containerized-3.9-to-3.10 branch from b3dd1f8 to 713b401 Compare May 3, 2018 21:51
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 3, 2018
@vrutkovs vrutkovs force-pushed the containerized-3.9-to-3.10 branch 6 times, most recently from 54087ea to 9a893cf Compare May 4, 2018 12:14
@vrutkovs vrutkovs changed the title WIP Fix containerized 3.9 -> 3.10 upgrade Fix containerized 3.9 -> 3.10 upgrade May 4, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 4, 2018
@vrutkovs vrutkovs force-pushed the containerized-3.9-to-3.10 branch from 9a893cf to c5666cd Compare May 4, 2018 12:17
openshift-ci-robot commented May 4, 2018

@vrutkovs: The following test failed, say /retest to rerun them all:

Test name | Commit | Details | Rerun command
ci/openshift-jenkins/system-containers | 4b08c5c | link | /test system-containers

@vrutkovs vrutkovs force-pushed the containerized-3.9-to-3.10 branch from 373a881 to 7019c2c Compare May 4, 2018 13:24
@vrutkovs vrutkovs force-pushed the containerized-3.9-to-3.10 branch 2 times, most recently from 9c6c5b3 to bc70748 Compare May 4, 2018 14:26
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 4, 2018
* Fix etcd runtime detection when setting up etcd
* During upgrade, detect the etcd runtime from the status of the systemd services
* Mask, disable and stop services before removing service files
* Remove the system container, as stopping the service doesn't seem to cut it
* Make the etcd cluster health check wait for the etcd static pod container to start
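
The ordering in the third bullet (mask, disable and stop the service, and only then delete its unit file) can be illustrated with a small Python sketch. This is not the playbook's Ansible code: the `retire_service` helper, the `run_ignoring_errors` default and the injectable `run` parameter are hypothetical names introduced here, and the unit-file location is left to the caller. Only the `systemctl` subcommands themselves are real.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

# Hypothetical type alias: anything that can execute one argv list.
Runner = Callable[[Sequence[str]], object]


def run_ignoring_errors(argv: Sequence[str]) -> object:
    # check=False: a given host may not have the unit at all, so a
    # non-zero exit status is tolerated in this sketch.
    return subprocess.run(argv, check=False)


def retire_service(
    name: str, unit_file: Path, run: Runner = run_ignoring_errors
) -> List[List[str]]:
    """Quiesce a systemd service, then delete its unit file.

    Returns the systemctl commands that were issued, in order.
    """
    steps = [
        ["systemctl", "mask", name],
        ["systemctl", "disable", name],
        ["systemctl", "stop", name],
    ]
    for argv in steps:
        run(argv)
    # The unit file is removed only after the service has been quiesced.
    if unit_file.exists():
        unit_file.unlink()
    # Make systemd forget the unit that was just deleted.
    reload_cmd = ["systemctl", "daemon-reload"]
    run(reload_cmd)
    return steps + [reload_cmd]
```

Passing a recording callable as `run` (for example `calls.append`) gives a dry run that shows the command order without touching systemd.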
@vrutkovs vrutkovs force-pushed the containerized-3.9-to-3.10 branch from bc70748 to f202390 Compare May 4, 2018 14:37
Member

@sdodson sdodson left a comment

Also, I'd discussed with Clayton copying all the variables from /etc/sysconfig/{{ openshift_service_type }}-api to /etc/origin/master/master.env, since at least in free-int we lost AWS configuration because that was added manually to /etc/sysconfig/atomic-openshift-master-api.

What do you think?

state: absent
with_items:
- /etc/sysconfig/origin-master-api
- /etc/sysconfig/origin-master-controllers
Member

- /etc/sysconfig/{{ openshift_service_type }}-api
- /etc/sysconfig/{{ openshift_service_type }}-controllers

Member Author

Fixed

Member

sdodson commented May 4, 2018

BTW, we can look at migrating anything from /etc/sysconfig/atomic-openshift-master-* to master.env in a follow-up.

Member Author

vrutkovs commented May 4, 2018

we lost AWS configuration because that was added manually to /etc/sysconfig/atomic-openshift-master-api

This seems to be set in the current master.env. Let's fix the missing vars later on, as I'm not sure which ones need to be migrated and which are no longer required (e.g. OPTIONS and OPENSHIFT_DEFAULT_REGISTRY are probably not needed anymore).
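
For the follow-up, the migration discussed here could look roughly like the Python sketch below. It assumes both files are plain KEY=VALUE lines (no quoting or `export` handling), never overwrites a variable that master.env already defines, and lets the caller skip variables that are no longer wanted. The helper names `parse_env` and `merge_missing` and the `skip` parameter are hypothetical, not part of openshift-ansible.

```python
from typing import Dict, Iterable, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    variables: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, value = line.split("=", 1)
        variables[key.strip()] = value.strip()
    return variables


def merge_missing(
    sysconfig_text: str, master_env_text: str, skip: Iterable[str] = ()
) -> str:
    """Return master.env content with legacy variables appended.

    A legacy variable is appended only if master.env does not already
    define it and it is not listed in `skip`.
    """
    legacy = parse_env(sysconfig_text)
    current = parse_env(master_env_text)
    skipped = set(skip)
    additions: List[str] = [
        f"{key}={value}"
        for key, value in legacy.items()
        if key not in current and key not in skipped
    ]
    if not additions:
        return master_env_text
    base = master_env_text
    if base and not base.endswith("\n"):
        base += "\n"
    return base + "\n".join(additions) + "\n"
```

Calling `merge_missing(legacy_text, env_text, skip=("OPTIONS", "OPENSHIFT_DEFAULT_REGISTRY"))` would carry over manually added settings such as the AWS ones while leaving out the two variables that are probably obsolete.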

@vrutkovs vrutkovs force-pushed the containerized-3.9-to-3.10 branch from f202390 to 7dc9e74 Compare May 4, 2018 14:57
Member

@sdodson sdodson left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 4, 2018
@sdodson sdodson dismissed michaelgugino’s stale review May 4, 2018 15:04

all concerns seem to have been addressed

Member Author

vrutkovs commented May 4, 2018

bot, retest this please

Member Author

vrutkovs commented May 4, 2018

bot, retest this please

@sdodson sdodson merged commit 17155a2 into openshift:master May 4, 2018
Member

sdodson commented May 4, 2018

We need to dig into the flake we're seeing here; discussion on that is at #8264.

@vrutkovs vrutkovs deleted the containerized-3.9-to-3.10 branch May 30, 2018 08:01