New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

openshift_release / version / upgrade improvements #1945

Merged
merged 97 commits into from Jul 14, 2016

Conversation

Projects
None yet
6 participants
@dgoodwin
Contributor

dgoodwin commented May 25, 2016

These changes bring the addition of a new openshift_release="3.2" variable for inventory, giving customers a more generic version they can use to install a specific release instead of having to know 3.1.1.6.

Includes a significant refactor for how we handle versions throughout all roles, and safer behavior.

Changes for users:

  • Can now specify openshift_release="3.2" to install a generic release.
    • This value is treated as a seed, we only really use it for cluster setup to lookup a more specific version (i.e. 3.2.0.44), and then use that as the exact version to configure on all hosts.
    • This is the recommended path for choosing which version to configure.
    • RPM installations on RHEL still require you to configure the correct repositories beforehand for this to work correctly.
  • Once a host is installed we try to never modify the running version, but there are some exceptions.
  • openshift_image_tag and openshift_pkg_version are treated as overrides, if you specify these values in your inventory, we will make sure that is the version configured when you install and/or maintain your environment with the config playbooks. If you modify those values and re-run the playbooks, this could cause an upgrade and downtime.
  • Added new "docker_version=1.9.1" inventory variable for specifying the version of Docker you wish to install, or upgrade to. Default will be to install the latest, and then leave it alone. This value is respected both in the normal config playbooks, as well as the 3.2 and Docker upgrade playbooks.
  • Added new "docker_upgrade=False" inventory variable for performing a 3.2 upgrade without upgrading Docker.
  • Docker upgrade playbook is now less 1.10 centric and can be re-used to upgrade to an arbitrary Docker version. Images will only be cleared on the system when crossing the Docker 1.10 barrier.

Changes within the roles/playbooks:

  • Introduced new openshift_version role, responsible for making sure all related version variables are always set for a host.
    • Role's default behavior is to protect the installed version if there is one, however a playbook can override this with a specific version, a flag to ignore the current version, and by specifying openshift_release, image_tag, and pkg_version.
    • If no version is installed already and no version variables were specified, the role will lookup the latest version available.
  • Improved openshift.common.openshift_version fact, should now be a more reliable way to always detect the version of openshift installed on a host, if there is one.
    • Removed other unnecessary and redundant version facts.
  • Roles use openshift_pkg_version for RPM installations, and openshift_image_tag for pulling images. We now can assume that these values are always set.
  • Playbooks do version initialization on first master only, then pass this value along to all other hosts.
  • Upgrade has become dramatically simpler, no longer requires service file updates and unsafe restarts during the "pre" stage of the upgrade, effectively upgrading a containerized machine just to check if we can proceed with upgrade.
  • Old upgrade paths are removed. (3.0 -> 3.1, 3.1 minor)
  • Virtually all instances of pulling latest, or pulling images possibly causing unexpected side effects, have been eliminated. The openshift_version role makes sure all required variables are set, and after that we always pull and configure the use of images with an exact tag, never latest.
  • Fixed unsafe Docker restarts on nodes. (without evac) Docker upgrade is now handled together with node upgrade so we only evac once for both operations. Each step is run only if the host is actually a node or a host running docker.
@gravis

This comment has been minimized.

Show comment
Hide comment
@gravis

gravis May 27, 2016

Guys, this is really a must-have, as the current playbook is installing 1.3.0, which is in alpha.
Thanks

gravis commented May 27, 2016

Guys, this is really a must-have, as the current playbook is installing 1.3.0, which is in alpha.
Thanks

@detiber

This comment has been minimized.

Show comment
Hide comment
@detiber

detiber May 27, 2016

Contributor

@gravis #1685 is a better short term fix. I don't want to rush this PR and risk have to tackle another major refactoring for this in another couple of months

Contributor

detiber commented May 27, 2016

@gravis #1685 is a better short term fix. I don't want to rush this PR and risk have to tackle another major refactoring for this in another couple of months

@dgoodwin dgoodwin closed this May 30, 2016

@dgoodwin dgoodwin reopened this May 30, 2016

dgoodwin added some commits May 30, 2016

Refactor openshift_version behavior.
Very early in playbooks we must init the openshift_version for each
host. First we determine it for the master, logic now is pushed into the
openshift_docker role which we run only on first master via
openshift_cli. Facts are reloaded leaving us with a first master with
openshift.common.version fact we can then re-use on all other hosts. The
correct version of docker should be installed as well.

We then set openshift_version for all other hosts by re-using the master
fact.
Fix error with containerized etcd install.
Role was assuming it could successfully disable the rpm etcd service
without checking if it was actuall present.
@@ -881,14 +881,13 @@ def oo_image_tag_to_rpm_version(version, include_dash=False):
"""
if not isinstance(version, basestring):
raise errors.AnsibleFilterError("|failed expects a string or unicode")
# TODO: Do we need to make this actually convert v1.2.0-rc1 into 1.2.0-0.rc1
# We'd need to be really strict about how we build the RPM Version+Release

This comment has been minimized.

@detiber

detiber Jun 1, 2016

Contributor

Man, the new origin tagging is going to throw a wrench into this... Maybe we can tackle that in packaging, where we if upstream specifies something that would look like a 'release' we use it as the 'release' in the subsequent packaging.

I can't think of a reason that we would need to track a separate release than upstream, and if so, maybe we can just tack on a second "release", so 1.2.0-rc1-0 ? I must say that I'm pretty ignorant of t he edge cases of RPM versioning, though.

@sdodson @tdawson thoughts?

@detiber

detiber Jun 1, 2016

Contributor

Man, the new origin tagging is going to throw a wrench into this... Maybe we can tackle that in packaging, where we if upstream specifies something that would look like a 'release' we use it as the 'release' in the subsequent packaging.

I can't think of a reason that we would need to track a separate release than upstream, and if so, maybe we can just tack on a second "release", so 1.2.0-rc1-0 ? I must say that I'm pretty ignorant of t he edge cases of RPM versioning, though.

@sdodson @tdawson thoughts?

This comment has been minimized.

@dgoodwin

dgoodwin Jun 1, 2016

Contributor

I have not gotten to the origin use case much here, I assumed you can specify openshift_release="1.2", and get "v1.2", whatever that is tagged at. If you want to install container tags for an rc you'd specify openshift_image_tag=v1.2.0-rc1-0.

@dgoodwin

dgoodwin Jun 1, 2016

Contributor

I have not gotten to the origin use case much here, I assumed you can specify openshift_release="1.2", and get "v1.2", whatever that is tagged at. If you want to install container tags for an rc you'd specify openshift_image_tag=v1.2.0-rc1-0.

This comment has been minimized.

@dgoodwin

dgoodwin Jun 21, 2016

Contributor

I believe this concern is now covered, I've tested with the latest 1.3 alpha tags and openshift_image_tag, and it should be installable.

@dgoodwin

dgoodwin Jun 21, 2016

Contributor

I believe this concern is now covered, I've tested with the latest 1.3 alpha tags and openshift_image_tag, and it should be installable.

@dgoodwin

This comment has been minimized.

Show comment
Hide comment
@dgoodwin

dgoodwin Jul 13, 2016

Contributor

@abutcher ok thanks, I just pushed my last fixes for known issues.

Contributor

dgoodwin commented Jul 13, 2016

@abutcher ok thanks, I just pushed my last fixes for known issues.

@abutcher

This comment has been minimized.

Show comment
Hide comment
@abutcher

abutcher Jul 13, 2016

Member

@dgoodwin There are also a few pylint warnings.

This one is fixed by #2146

E:1621,32: No value for argument 'gather_subset' in function call (no-value-for-parameter)
Member

abutcher commented Jul 13, 2016

@dgoodwin There are also a few pylint warnings.

This one is fixed by #2146

E:1621,32: No value for argument 'gather_subset' in function call (no-value-for-parameter)
@dgoodwin

This comment has been minimized.

Show comment
Hide comment
@dgoodwin

dgoodwin Jul 13, 2016

Contributor

On it.

Contributor

dgoodwin commented Jul 13, 2016

On it.

@dgoodwin

This comment has been minimized.

Show comment
Hide comment
@dgoodwin

dgoodwin Jul 13, 2016

Contributor

@abutcher I hope that will take care of them, probably some ones coming in future pylint versions but I tried not to get into other files I didn't hit in the PR.

Contributor

dgoodwin commented Jul 13, 2016

@abutcher I hope that will take care of them, probably some ones coming in future pylint versions but I tried not to get into other files I didn't hit in the PR.

@detiber

This comment has been minimized.

Show comment
Hide comment
@detiber

detiber Jul 13, 2016

Contributor

@dgoodwin looks like there is still an outstanding pylint error: E: 63, 0: Bad option value 'redefined-variable-type' (bad-option-value)

Contributor

detiber commented Jul 13, 2016

@dgoodwin looks like there is still an outstanding pylint error: E: 63, 0: Bad option value 'redefined-variable-type' (bad-option-value)

@dgoodwin

This comment has been minimized.

Show comment
Hide comment
@dgoodwin

dgoodwin Jul 13, 2016

Contributor

aos-ci-test

Contributor

dgoodwin commented Jul 13, 2016

aos-ci-test

@dgoodwin

This comment has been minimized.

Show comment
Hide comment
@dgoodwin

dgoodwin Jul 13, 2016

Contributor

Trying a fix, caused by fixing pylint errors locally that are for a newer version of pylint than CI uses, leading to unknown keys I was trying to disable.

Contributor

dgoodwin commented Jul 13, 2016

Trying a fix, caused by fixing pylint errors locally that are for a newer version of pylint than CI uses, leading to unknown keys I was trying to disable.

@openshift-bot

This comment has been minimized.

Show comment
Hide comment
Member

openshift-bot commented Jul 13, 2016

@dgoodwin

This comment has been minimized.

Show comment
Hide comment
@dgoodwin

dgoodwin Jul 14, 2016

Contributor

aos-ci-test

Contributor

dgoodwin commented Jul 14, 2016

aos-ci-test

@openshift-bot

This comment has been minimized.

Show comment
Hide comment
Member

openshift-bot commented Jul 14, 2016

@dgoodwin

This comment has been minimized.

Show comment
Hide comment
@dgoodwin

dgoodwin Jul 14, 2016

Contributor

Fixed a couple small errors reported by QE. Two bugs pending that I cannot reproduce and waiting for more info, I don't think either would be a blocker to merging though.

Contributor

dgoodwin commented Jul 14, 2016

Fixed a couple small errors reported by QE. Two bugs pending that I cannot reproduce and waiting for more info, I don't think either would be a blocker to merging though.

@sdodson

This comment has been minimized.

Show comment
Hide comment
@sdodson

sdodson Jul 14, 2016

Member

Remove WIP from title? So far my testing looks good. Working on containerized now.

Member

sdodson commented Jul 14, 2016

Remove WIP from title? So far my testing looks good. Working on containerized now.

@abutcher abutcher changed the title from WIP: openshift_release / version / upgrade improvements to openshift_release / version / upgrade improvements Jul 14, 2016

@sdodson

This comment has been minimized.

Show comment
Hide comment
@sdodson

sdodson Jul 14, 2016

Member

aos-ci-test-extended

Member

sdodson commented Jul 14, 2016

aos-ci-test-extended

@openshift-bot

This comment has been minimized.

Show comment
Hide comment
Member

openshift-bot commented Jul 14, 2016

@sdodson sdodson merged commit 65ffae3 into openshift:master Jul 14, 2016

3 of 6 checks passed

aos-ci-jenkins/OS_3.2_containerized "Internal Jenkins error during provisioning an 3.2 cluster"
Details
aos-ci-jenkins/OS_3.2_NOT_containerized_e2e_tests "e2e tests in progress"
Details
aos-ci-jenkins/Overall Internal aos Jenkins job has been triggered.
Details
aos-ci-jenkins/OS_3.2_NOT_containerized "openshift-ansible install passed"
Details
aos-ci-jenkins/OS_unit_tests "all unit tests passed"
Details
default
Details
@sdodson

This comment has been minimized.

Show comment
Hide comment
@sdodson

sdodson Jul 14, 2016

Member

Manual testing checks out.

Member

sdodson commented Jul 14, 2016

Manual testing checks out.

@dgoodwin dgoodwin deleted the dgoodwin:upgrade33 branch Jul 15, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment