Stop allowing oc cluster up version X to instantiate a server version X-1 #15731

jim-minter · 2017-08-10T15:45:28Z

Although it's a cool idea to be able to instantiate previous versions of origin using a more recent oc cluster up, as complexity/distribution increases I think it's becoming untenable/expensive. Example: #15677. Without test cases, I don't see how we can guarantee correctness. I think we should draw a clear line in the sand after which oc cluster up will only instantiate servers matching its version.

@openshift/devex wdyt?

bparees · 2017-08-10T15:54:39Z

@hferentschik @LalatenduMohanty @gbraad (I don't think this would affect minishift since they would be expected to align the client binary and the openshift version, but we should make sure).

@jorgemoralespou as a consumer of oc cluster up, any feelings on this one way or another? Do you use a single oc binary to launch different versions? would it be a burden for you to have to use different oc client binaries to do so? (anyone else you think might have an opinion?)

gabemontero · 2017-08-10T17:34:14Z

So would this prevent an openshift developer from:

building his openshift/origin local snapshot from scratch, including all images via hack/build-local-images.py
running oc cluster up --version=latest and pick up those images built via hack/build-local-images.py

Or would part of this issues' implementation allow the equivalent by either:

simply removing the need for --version=latest, or
continue to support --version=latest, or
support by supplying some special case value for --version to that local images are picked up
or make changes to hack/build-local-images.py such that it tags the images with the local git commit level, and then that must be supplied to oc cluster up

bparees · 2017-08-10T17:39:16Z

@gabemontero it wouldn't prevent that. the local oc binary they just built, and the openshift binary in the images they just built, would have the same version.

passing "--version=latest" doesn't matter, we check the version by actually running the binary.

gabemontero · 2017-08-10T17:42:42Z

On Thu, Aug 10, 2017 at 1:39 PM, Ben Parees ***@***.***> wrote: @gabemontero <https://github.com/gabemontero> it wouldn't prevent that. the local oc binary they just built, and the openshfit binary in the images they just built, would have the same version. passing "--version=latest" doesn't matter, we check the version by actually running the binary.

cool deal

…

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#15731 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADbadAjQCXJQXd3ISOlUiqIofSlhOS8cks5sW0BPgaJpZM4Ozng-> .

hferentschik · 2017-08-10T19:52:00Z

I don't think this would affect minishift since they would be expected to align the client binary and the openshift version, but we should make sure.

Well, per default we align the oc version with the baseline OpenShift version used by a given Minishift release. We used to do the same for provisioning earlier version of OpenShift. However, we switched to an approach where we use the "baseline" oc to provision earlier versions of OpenShift. There were for example issues provisioning say OpenShift 1.4 with oc 1.4. However, it works when provisioning OpenShift 1.4 with oc 1.5. So this change would indeed affect us. Let me refresh my memory on what issues we had.

hferentschik · 2017-08-10T20:01:15Z

For what its worth, the use case for us is to give users who for example run OpenShift 1.4 or 1.5 in production the ability to match their OpenShift version when using Minishift.

bparees · 2017-08-10T20:39:44Z

@hferentschik going forward, what's the use case for someone getting the minishift release from 3.7 and wanting to run 3.6 with it, instead of just getting the minishift 3.6 VM?

LalatenduMohanty · 2017-08-10T21:00:11Z

I think for users it wont be a blocker if we provision v3.6 from v3.6 oc binary and v3.7 from 3.7 oc binary. However I remember @jorgemoralespou mentioned that he use latest oc binary to provision older OpenShift version and I would let him explain the use-case.

praveenkumar · 2017-08-11T02:32:38Z

@bparees Moving forward if I download a released oc client binary after this implementation then what would happen if I use oc cluster up --version 3.6.0 assuming I have client binary as 3.7.0. Does oc client binary just ignore --version and provision 3.7 or it will just abort by saying this is not support?

In past we had issue when provisioning origin-1.4 with oc-1.4 because it didn't have proxy support and also there was a gap between the way registry expose. And quite recently we had issue provision 1.5.1 on RHEL-7.4 because of container network check and that works for v3.6.0 oc binary so in future how will address those kind of issues?

Then there is also something from CDK side which will start a ML thread.

bparees · 2017-08-11T04:23:33Z

For what its worth, the use case for us is to give users who for example run OpenShift 1.4 or 1.5 in production the ability to match their OpenShift version when using Minishift.

@hferentschik I don't understand why those customers can't use a version of minishift that aligns with their cluster version?

However, we switched to an approach where we use the "baseline" oc to provision earlier versions of OpenShift. There were for example issues provisioning say OpenShift 1.4 with oc 1.4. However, it works when provisioning OpenShift 1.4 with oc 1.5. So this change would indeed affect us. Let me refresh my memory on what issues we had.

If there were issues provisioning openshift 1.4 using oc 1.4, what was shipped in the "1.4" version of the CDK/minishift? That sounds like something that should have been fixed in the 1.4 stream.

Moving forward if I download a released oc client binary after this implementation then what would happen if I use oc cluster up --version 3.6.0 assuming I have client binary as 3.7.0. Does oc client binary just ignore --version and provision 3.7 or it will just abort by saying this is not support?

@praveenkumar I think we'd want to have it tell you its not supported. Frankly the version field would be a bit of a misnomer at that point. It's really indicating what image tag to pull+use (or look for locally).

In past we had issue when provisioning origin-1.4 with oc-1.4 because it didn't have proxy support and also there was a gap between the way registry expose.

and why by the time this was fixed in oc-1.5, was it still important to go back and bring up a 1.4 cluster? At that point, why wouldn't people just bring up a 1.5 cluster with the oc-1.5 binary?

Essentially by saying "you can use a oc-1.5 to get proxy configuration support on an origin-1.4 cluster" we're stealth backporting features to 1.4 that we otherwise aren't interested in backporting. If we were interested in doing that, we'd have backported the capability into the oc-1.4 binary. I realize it's a nice workaround/escape-hatch, but I don't think it's a pattern we should be encouraging. (or can afford to maintain).

And quite recently we had issue provision 1.5.1 on RHEL-7.4 because of container network check and that works for v3.6.0 oc binary so in future how will address those kind of issues?

A few options.

we don't move the CDK to a version of rhel(or other dependency) that doesn't work for oc next time. If it's too late to get the fix into oc (as it was this time) then the CDK doesn't move up.
again, once there is a 3.6 binary available, you use it to run 3.6 clusters. If you want to run 1.5, you use the 1.5 binary, which would be part of the v1.5 CDK/minishift, presumably, no?

Today there's no guarantee that the 3.6 oc binary we just shipped, will actually work to bring up a 1.4 cluster. As @jim-minter noted, no one is testing that, and frankly we can't afford to test all those permutations (which grow each release). If it's not broken today, it almost certainly will be someday.

The current implied promise that any given oc binary can bring up an oc cluster on any version (newer or older than the binary itself) is simply not sustainable.

gbraad · 2017-08-11T04:30:07Z

I agree with @bparees, as provisioning an older or even a newer version, can cause very unexpected side effects. This is not a use-case you want to support. I understand how this can affect CDK, but in that case they would have to accept a download (related to 2. as they binary for a previous version is not embedded/included).

hferentschik · 2017-08-11T07:41:36Z

I don't understand why those customers can't use a version of minishift that aligns with their cluster version?

Minishift progresses quite fast. We release every three weeks, usually with several new features. We are adding new commands (new ways to mount host folders, improvements to add-on mechanism, bug fixes, etc). For us it is nice to be able to provide this new features to the user while he still uses OpenShift in an older version.

Also, assuming that being able to provision 1.4 while it is still supported, we would have to start doing patch releases for various Minishift version to fix important bugs. We would have to keep multiple dev lines. This is nothing uncommon and most software is developed and maintained this way, but as said, for now we had the advantage of being able to tell users to just upgrade to the latest Minishift version to get all bug fixes.

I actually agree with @jim-minter, @bparees and @gbraad that in the long run it is just too hard to keep this compatibility matrix up. We just need to find a way forward as well for Minishift.

I don't think in this case that Minishift should allow to provision earlier version (than its current baseline) of OpenShift in this case either. Going back to an approach where we provision version X of OpenShift with oc version X is not going to work. A concrete example is the switch from 1.5 to 3.6. Pre 3.6 the nodes provisioned used the VM IP as name for the node (in the OpenShift config and in Kube config). As of 3.6, the node for the single node cluster is always 127-0-0-1. This is something we rely on in our code. Using oc 3.6 to provision 1.5 will work. Using oc 1.5 to provision 1.5 will not work unless we put some conditional logic into the code (note this is not a OpenShift or cluster up issue, but relates to additional things we configure). In the end we would be in a similar situation oc is now and which lead to this issue.

If there were issues provisioning openshift 1.4 using oc 1.4, what was shipped in the "1.4" version of the CDK/minishift? That sounds like something that should have been fixed in the 1.4 stream.

In the case of 1.4 it is more of a missing feature. From the CDK perspective it is important to allow the use of HTTP proxies. cluster up of OpenShift 1.4 did not support this, 1.5 does.

I think we'd want to have it tell you its not supported. Frankly the version field would be a bit of a misnomer at that point. It's really indicating what image tag to pull+use (or look for locally).

+1

and why by the time this was fixed in oc-1.5, was it still important to go back and bring up a 1.4 cluster? At that point, why wouldn't people just bring up a 1.5 cluster with the oc-1.5 binary?

Because they might be running an OpenShift 1.4 cluster in production and want to match this version in development. See also minishift/minishift#667

jorgemoralespou · 2017-08-11T14:21:56Z

@bparees, @jim-minter -1 as this breaks a feature we use. In any case, I don't think you've been providing QA to validate this feature and I think that is Ben always up to the user. I would keep the feature and keep not supporting it.
@GrahamDumpleton you're also affected

bparees · 2017-08-11T14:24:52Z

@bparees, @jim-minter -1 as this breaks a feature we use

Please elaborate on why the alternatives (use the right binary for the cluster version you want) are not sufficient for your use case. The minishift team has provided some valid concerns, what are yours?

I would keep the feature and keep not supporting it.

that's a terrible user experience. And you're not going to put up with us telling you "sorry, that scenario is unsupported, we're not fixing it." Not to mention how terrible it will look for us when people have a reasonable expectation of it working and then run into a wall.

jorgemoralespou · 2017-08-12T03:04:27Z

@bparees, for every different version I want to try I might need the corresponding client. In Enterprise when a version is released there's for a version multiple minor releases, which means that people will need to constantly update their clients.
Also, in Enterprise environments, there's customers with multiple clusters in different versions, so when a developer wants to test an app against the production specific version he'll need to switch to the appropriate client, which is not a good user experience. I know because I suffer it, I have a script to select different what client I want (for other reasons). Bare with me that I consider this a power user feature, that really takes a lot of time from me. In any case, I think developers should use minishift, and for it there will be no issue.

To your comment of terrible user experience, I'm not sure why you say that since it's what we currently have. Do we currently test every permutation? Do we even currently support it?
If the plans is to start supporting oc in a way and this feature gets in the middle, then go ahead.
But I envision the push back you'll get from the many people that currently exploit this feature. You can validate the idea in the internal openshift--sme mail alias.

In any case, I think you should do what's best and more sustainable.

bparees · 2017-08-12T03:28:30Z

@bparees, for every different version I want to try I might need the corresponding client. In Enterprise when a version is released there's for a version multiple minor releases, which means that people will need to constantly update their clients.

i think we can safely assume/provide minor version compatibility. this is about 3.4->3.5 and such, not 3.5.1 to 3.5.2.

To your comment of terrible user experience, I'm not sure why you say that since it's what we currently have. Do we currently test every permutation? Do we even currently support it?

We don't test them, at least not explicitly. As for what our support statement is, oc cluster up is now a supported part of the product. It's not entirely clear what our statement would be if someone using the oc 3.6 client tried to bring up a 3.4 cluster and hit an issue, but certainly the user would come from a starting point of assuming that's a supported usage today.

Regarding the terrible user experience, @csrwng has done a heroic job of supporting users and providing fixes, but it is only getting more complex as things like logging+metrics, and service catalog support, get added to what oc cluster up can do for you. And the fact that users have to open issues+get fixes is part of the bad experience. How those features gets configured can change from release to release, which means the oc cluster up logic has to understand what cluster level its talking to, and what the right way is to configure those features on that cluster. This is the part I am worried about sustaining.

Essentially oc cluster up is an installer w/ some basic default config options. I don't think many products ship a single installer that can install/configure any version of the product. The fact that it's difficult for users to get a hold of multiple versions of the oc client binary seems like the problem to solve.

jorgemoralespou · 2017-08-12T03:44:57Z

@bparees, I understand everything you say and can agree with you. There's installers that support multiple versions, and I've been in a customer meeting today where they have explicitly called out that fact. Maybe the approach is wrong. Maybe all the logic should not be in the client. Maybe the installer should have been a specific privileged image that the client would pull down. That would have alleviated the problem. My concern is that with oc cluster the scope never has been clear and I think still is not, and hence in every release there's a tremendous work that @csrwng need to do to keep what comes up. We can deprecate or remove this feature, but what will come next, will we deprecate it as well at any point? Like the service catalog. We've added a technology preview as first class citizen of the client, and how is set up will change from version to version. I would rather ask you to change how oc cluster does the work so that is not in the client itself but in a versionable installer. I think @csrwng at some point mentioned ansible in a container. Whatever it takes to make things easier. But don't rush to remove features per-se, but to find the correct design. But, as I said before, I agree with you and support you although I might complain in between. Whatever makes"oc cluster" more sustainable for the goal for which has been created.

bparees · 2017-08-12T03:54:07Z

Maybe the approach is wrong. Maybe all the logic should not be in the client. Maybe the installer should have been a specific privileged image that the client would pull down. That would have alleviated the problem.

yes, we definitely need to move oc cluster up to a model where it leverages the same paths the official ansible installer leverages instead of baking in its own logic. @csrwng and I have discussed this and it would help w/ these issues, though i do not think it would resolve all of them (the configuration of the ansible install can still change from release to release, after all). A model where the config logic came from the origin image that's being bootstrapped, instead of the oc client would also help. These are things that should be considered regardless of whether the client binary is going to continue to support bootstrapping multiple (major) versions.

Anyway the point of the issue was to have the conversation, not to decree the capability is being removed, and at a minimum to ensure we can provide a "soft landing" if we do remove it.. options for that include maybe we could allow 1-2 previous versions to be supported since that would still constrain the test matrix, though i'm wary of even that since there is still significant overhead as soon as you go from "one path" to "more than one path", and it ensures having to update it every release to change the compatibility list.

jorgemoralespou · 2017-08-12T11:56:45Z

If this is a pain, let's remove the feature completely.

LalatenduMohanty · 2017-09-21T19:07:48Z

Anyway the point of the issue was to have the conversation, not to decree the capability is being removed, and at a minimum to ensure we can provide a "soft landing" if we do remove it.. options for that include maybe we could allow 1-2 previous versions to be supported since that would still constrain the test matrix,

@bparees It is has been some time since we last discussed this. Just wanted to know if you any new update on this issue. Also are you planning to enforce this for 3.7 or 3.8?

bparees · 2017-09-21T19:35:35Z

we have not yet moved forward with any plans on this.

bparees · 2017-12-11T22:30:33Z

at this point we're going to move forward with N-1 support. For the moment nothing enforces this(you want get warned/errored if you stand up an older cluster), but the oc client doesn't guarantee anything beyond N-1 compatibility, so if you use a v3.x client to stand up a v3.(x-2) cluster, there is no guarantee your oc client will even be able to talk to that cluster.

jim-minter added the component/cluster-up label Aug 10, 2017

pweil- added kind/enhancement priority/P2 labels Aug 10, 2017

pweil- assigned bparees Aug 10, 2017

hferentschik mentioned this issue Aug 19, 2017

Spike failure of service catalog provisioning minishift/minishift#1252

Closed

coolbrg mentioned this issue Sep 20, 2017

Minishift should use oc binary of 3.6 or greater, depending upon the OpenShift version needs to be provisioned minishift/minishift#1417

Closed

jorgemoralespou mentioned this issue Nov 20, 2017

cluster up registry route exposure #16614

Closed

bparees closed this as completed Dec 11, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stop allowing oc cluster up version X to instantiate a server version X-1 #15731

Stop allowing oc cluster up version X to instantiate a server version X-1 #15731

jim-minter commented Aug 10, 2017

bparees commented Aug 10, 2017

gabemontero commented Aug 10, 2017

bparees commented Aug 10, 2017 •

edited

gabemontero commented Aug 10, 2017 via email

hferentschik commented Aug 10, 2017

hferentschik commented Aug 10, 2017

bparees commented Aug 10, 2017

LalatenduMohanty commented Aug 10, 2017 •

edited

praveenkumar commented Aug 11, 2017

bparees commented Aug 11, 2017 •

edited

gbraad commented Aug 11, 2017 •

edited

hferentschik commented Aug 11, 2017

jorgemoralespou commented Aug 11, 2017

bparees commented Aug 11, 2017

jorgemoralespou commented Aug 12, 2017

bparees commented Aug 12, 2017

jorgemoralespou commented Aug 12, 2017 via email

bparees commented Aug 12, 2017

jorgemoralespou commented Aug 12, 2017

LalatenduMohanty commented Sep 21, 2017 •

edited

bparees commented Sep 21, 2017

bparees commented Dec 11, 2017

Stop allowing oc cluster up version X to instantiate a server version X-1 #15731

Stop allowing oc cluster up version X to instantiate a server version X-1 #15731

Comments

jim-minter commented Aug 10, 2017

bparees commented Aug 10, 2017

gabemontero commented Aug 10, 2017

bparees commented Aug 10, 2017 • edited

gabemontero commented Aug 10, 2017 via email

hferentschik commented Aug 10, 2017

hferentschik commented Aug 10, 2017

bparees commented Aug 10, 2017

LalatenduMohanty commented Aug 10, 2017 • edited

praveenkumar commented Aug 11, 2017

bparees commented Aug 11, 2017 • edited

gbraad commented Aug 11, 2017 • edited

hferentschik commented Aug 11, 2017

jorgemoralespou commented Aug 11, 2017

bparees commented Aug 11, 2017

jorgemoralespou commented Aug 12, 2017

bparees commented Aug 12, 2017

jorgemoralespou commented Aug 12, 2017 via email

bparees commented Aug 12, 2017

jorgemoralespou commented Aug 12, 2017

LalatenduMohanty commented Sep 21, 2017 • edited

bparees commented Sep 21, 2017

bparees commented Dec 11, 2017

bparees commented Aug 10, 2017 •

edited

LalatenduMohanty commented Aug 10, 2017 •

edited

bparees commented Aug 11, 2017 •

edited

gbraad commented Aug 11, 2017 •

edited

LalatenduMohanty commented Sep 21, 2017 •

edited