Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop allowing oc cluster up version X to instantiate a server version X-1 #15731

Closed
jim-minter opened this issue Aug 10, 2017 · 22 comments
Closed

Comments

@jim-minter
Copy link
Contributor

Although it's a cool idea to be able to instantiate previous versions of origin using a more recent oc cluster up, as complexity/distribution increases I think it's becoming untenable/expensive. Example: #15677. Without test cases, I don't see how we can guarantee correctness. I think we should draw a clear line in the sand after which oc cluster up will only instantiate servers matching its version.

@openshift/devex wdyt?

@bparees
Copy link
Contributor

bparees commented Aug 10, 2017

@hferentschik @LalatenduMohanty @gbraad (I don't think this would affect minishift since they would be expected to align the client binary and the openshift version, but we should make sure).

@jorgemoralespou as a consumer of oc cluster up, any feelings on this one way or another? Do you use a single oc binary to launch different versions? would it be a burden for you to have to use different oc client binaries to do so? (anyone else you think might have an opinion?)

@gabemontero
Copy link
Contributor

So would this prevent an openshift developer from:

  • building his openshift/origin local snapshot from scratch, including all images via hack/build-local-images.py
  • running oc cluster up --version=latest and pick up those images built via hack/build-local-images.py

Or would part of this issues' implementation allow the equivalent by either:

  • simply removing the need for --version=latest, or
  • continue to support --version=latest, or
  • support by supplying some special case value for --version to that local images are picked up
  • or make changes to hack/build-local-images.py such that it tags the images with the local git commit level, and then that must be supplied to oc cluster up

@bparees
Copy link
Contributor

bparees commented Aug 10, 2017

@gabemontero it wouldn't prevent that. the local oc binary they just built, and the openshift binary in the images they just built, would have the same version.

passing "--version=latest" doesn't matter, we check the version by actually running the binary.

@gabemontero
Copy link
Contributor

gabemontero commented Aug 10, 2017 via email

@hferentschik
Copy link
Contributor

I don't think this would affect minishift since they would be expected to align the client binary and the openshift version, but we should make sure.

Well, per default we align the oc version with the baseline OpenShift version used by a given Minishift release. We used to do the same for provisioning earlier version of OpenShift. However, we switched to an approach where we use the "baseline" oc to provision earlier versions of OpenShift. There were for example issues provisioning say OpenShift 1.4 with oc 1.4. However, it works when provisioning OpenShift 1.4 with oc 1.5. So this change would indeed affect us. Let me refresh my memory on what issues we had.

@hferentschik
Copy link
Contributor

For what its worth, the use case for us is to give users who for example run OpenShift 1.4 or 1.5 in production the ability to match their OpenShift version when using Minishift.

@bparees
Copy link
Contributor

bparees commented Aug 10, 2017

@hferentschik going forward, what's the use case for someone getting the minishift release from 3.7 and wanting to run 3.6 with it, instead of just getting the minishift 3.6 VM?

@LalatenduMohanty
Copy link
Member

LalatenduMohanty commented Aug 10, 2017

I think for users it wont be a blocker if we provision v3.6 from v3.6 oc binary and v3.7 from 3.7 oc binary. However I remember @jorgemoralespou mentioned that he use latest oc binary to provision older OpenShift version and I would let him explain the use-case.

@praveenkumar
Copy link
Contributor

@bparees Moving forward if I download a released oc client binary after this implementation then what would happen if I use oc cluster up --version 3.6.0 assuming I have client binary as 3.7.0. Does oc client binary just ignore --version and provision 3.7 or it will just abort by saying this is not support?

In past we had issue when provisioning origin-1.4 with oc-1.4 because it didn't have proxy support and also there was a gap between the way registry expose. And quite recently we had issue provision 1.5.1 on RHEL-7.4 because of container network check and that works for v3.6.0 oc binary so in future how will address those kind of issues?

Then there is also something from CDK side which will start a ML thread.

@bparees
Copy link
Contributor

bparees commented Aug 11, 2017

For what its worth, the use case for us is to give users who for example run OpenShift 1.4 or 1.5 in production the ability to match their OpenShift version when using Minishift.

@hferentschik I don't understand why those customers can't use a version of minishift that aligns with their cluster version?

However, we switched to an approach where we use the "baseline" oc to provision earlier versions of OpenShift. There were for example issues provisioning say OpenShift 1.4 with oc 1.4. However, it works when provisioning OpenShift 1.4 with oc 1.5. So this change would indeed affect us. Let me refresh my memory on what issues we had.

If there were issues provisioning openshift 1.4 using oc 1.4, what was shipped in the "1.4" version of the CDK/minishift? That sounds like something that should have been fixed in the 1.4 stream.

Moving forward if I download a released oc client binary after this implementation then what would happen if I use oc cluster up --version 3.6.0 assuming I have client binary as 3.7.0. Does oc client binary just ignore --version and provision 3.7 or it will just abort by saying this is not support?

@praveenkumar I think we'd want to have it tell you its not supported. Frankly the version field would be a bit of a misnomer at that point. It's really indicating what image tag to pull+use (or look for locally).

In past we had issue when provisioning origin-1.4 with oc-1.4 because it didn't have proxy support and also there was a gap between the way registry expose.

and why by the time this was fixed in oc-1.5, was it still important to go back and bring up a 1.4 cluster? At that point, why wouldn't people just bring up a 1.5 cluster with the oc-1.5 binary?

Essentially by saying "you can use a oc-1.5 to get proxy configuration support on an origin-1.4 cluster" we're stealth backporting features to 1.4 that we otherwise aren't interested in backporting. If we were interested in doing that, we'd have backported the capability into the oc-1.4 binary. I realize it's a nice workaround/escape-hatch, but I don't think it's a pattern we should be encouraging. (or can afford to maintain).

And quite recently we had issue provision 1.5.1 on RHEL-7.4 because of container network check and that works for v3.6.0 oc binary so in future how will address those kind of issues?

A few options.

  1. we don't move the CDK to a version of rhel(or other dependency) that doesn't work for oc next time. If it's too late to get the fix into oc (as it was this time) then the CDK doesn't move up.
  2. again, once there is a 3.6 binary available, you use it to run 3.6 clusters. If you want to run 1.5, you use the 1.5 binary, which would be part of the v1.5 CDK/minishift, presumably, no?

Today there's no guarantee that the 3.6 oc binary we just shipped, will actually work to bring up a 1.4 cluster. As @jim-minter noted, no one is testing that, and frankly we can't afford to test all those permutations (which grow each release). If it's not broken today, it almost certainly will be someday.

The current implied promise that any given oc binary can bring up an oc cluster on any version (newer or older than the binary itself) is simply not sustainable.

@gbraad
Copy link
Contributor

gbraad commented Aug 11, 2017

I agree with @bparees, as provisioning an older or even a newer version, can cause very unexpected side effects. This is not a use-case you want to support. I understand how this can affect CDK, but in that case they would have to accept a download (related to 2. as they binary for a previous version is not embedded/included).

@hferentschik
Copy link
Contributor

I don't understand why those customers can't use a version of minishift that aligns with their cluster version?

Minishift progresses quite fast. We release every three weeks, usually with several new features. We are adding new commands (new ways to mount host folders, improvements to add-on mechanism, bug fixes, etc). For us it is nice to be able to provide this new features to the user while he still uses OpenShift in an older version.

Also, assuming that being able to provision 1.4 while it is still supported, we would have to start doing patch releases for various Minishift version to fix important bugs. We would have to keep multiple dev lines. This is nothing uncommon and most software is developed and maintained this way, but as said, for now we had the advantage of being able to tell users to just upgrade to the latest Minishift version to get all bug fixes.

I actually agree with @jim-minter, @bparees and @gbraad that in the long run it is just too hard to keep this compatibility matrix up. We just need to find a way forward as well for Minishift.

I don't think in this case that Minishift should allow to provision earlier version (than its current baseline) of OpenShift in this case either. Going back to an approach where we provision version X of OpenShift with oc version X is not going to work. A concrete example is the switch from 1.5 to 3.6. Pre 3.6 the nodes provisioned used the VM IP as name for the node (in the OpenShift config and in Kube config). As of 3.6, the node for the single node cluster is always 127-0-0-1. This is something we rely on in our code. Using oc 3.6 to provision 1.5 will work. Using oc 1.5 to provision 1.5 will not work unless we put some conditional logic into the code (note this is not a OpenShift or cluster up issue, but relates to additional things we configure). In the end we would be in a similar situation oc is now and which lead to this issue.

If there were issues provisioning openshift 1.4 using oc 1.4, what was shipped in the "1.4" version of the CDK/minishift? That sounds like something that should have been fixed in the 1.4 stream.

In the case of 1.4 it is more of a missing feature. From the CDK perspective it is important to allow the use of HTTP proxies. cluster up of OpenShift 1.4 did not support this, 1.5 does.

I think we'd want to have it tell you its not supported. Frankly the version field would be a bit of a misnomer at that point. It's really indicating what image tag to pull+use (or look for locally).

+1

and why by the time this was fixed in oc-1.5, was it still important to go back and bring up a 1.4 cluster? At that point, why wouldn't people just bring up a 1.5 cluster with the oc-1.5 binary?

Because they might be running an OpenShift 1.4 cluster in production and want to match this version in development. See also minishift/minishift#667

@jorgemoralespou
Copy link

@bparees, @jim-minter -1 as this breaks a feature we use. In any case, I don't think you've been providing QA to validate this feature and I think that is Ben always up to the user. I would keep the feature and keep not supporting it.
@GrahamDumpleton you're also affected

@bparees
Copy link
Contributor

bparees commented Aug 11, 2017

@bparees, @jim-minter -1 as this breaks a feature we use

Please elaborate on why the alternatives (use the right binary for the cluster version you want) are not sufficient for your use case. The minishift team has provided some valid concerns, what are yours?

I would keep the feature and keep not supporting it.

that's a terrible user experience. And you're not going to put up with us telling you "sorry, that scenario is unsupported, we're not fixing it." Not to mention how terrible it will look for us when people have a reasonable expectation of it working and then run into a wall.

@jorgemoralespou
Copy link

@bparees, for every different version I want to try I might need the corresponding client. In Enterprise when a version is released there's for a version multiple minor releases, which means that people will need to constantly update their clients.
Also, in Enterprise environments, there's customers with multiple clusters in different versions, so when a developer wants to test an app against the production specific version he'll need to switch to the appropriate client, which is not a good user experience. I know because I suffer it, I have a script to select different what client I want (for other reasons). Bare with me that I consider this a power user feature, that really takes a lot of time from me. In any case, I think developers should use minishift, and for it there will be no issue.

To your comment of terrible user experience, I'm not sure why you say that since it's what we currently have. Do we currently test every permutation? Do we even currently support it?
If the plans is to start supporting oc in a way and this feature gets in the middle, then go ahead.
But I envision the push back you'll get from the many people that currently exploit this feature. You can validate the idea in the internal openshift--sme mail alias.

In any case, I think you should do what's best and more sustainable.

@bparees
Copy link
Contributor

bparees commented Aug 12, 2017

@bparees, for every different version I want to try I might need the corresponding client. In Enterprise when a version is released there's for a version multiple minor releases, which means that people will need to constantly update their clients.

i think we can safely assume/provide minor version compatibility. this is about 3.4->3.5 and such, not 3.5.1 to 3.5.2.

To your comment of terrible user experience, I'm not sure why you say that since it's what we currently have. Do we currently test every permutation? Do we even currently support it?

We don't test them, at least not explicitly. As for what our support statement is, oc cluster up is now a supported part of the product. It's not entirely clear what our statement would be if someone using the oc 3.6 client tried to bring up a 3.4 cluster and hit an issue, but certainly the user would come from a starting point of assuming that's a supported usage today.

Regarding the terrible user experience, @csrwng has done a heroic job of supporting users and providing fixes, but it is only getting more complex as things like logging+metrics, and service catalog support, get added to what oc cluster up can do for you. And the fact that users have to open issues+get fixes is part of the bad experience. How those features gets configured can change from release to release, which means the oc cluster up logic has to understand what cluster level its talking to, and what the right way is to configure those features on that cluster. This is the part I am worried about sustaining.

Essentially oc cluster up is an installer w/ some basic default config options. I don't think many products ship a single installer that can install/configure any version of the product. The fact that it's difficult for users to get a hold of multiple versions of the oc client binary seems like the problem to solve.

@jorgemoralespou
Copy link

jorgemoralespou commented Aug 12, 2017 via email

@bparees
Copy link
Contributor

bparees commented Aug 12, 2017

Maybe the approach is wrong. Maybe all the logic should not be in the client. Maybe the installer should have been a specific privileged image that the client would pull down. That would have alleviated the problem.

yes, we definitely need to move oc cluster up to a model where it leverages the same paths the official ansible installer leverages instead of baking in its own logic. @csrwng and I have discussed this and it would help w/ these issues, though i do not think it would resolve all of them (the configuration of the ansible install can still change from release to release, after all). A model where the config logic came from the origin image that's being bootstrapped, instead of the oc client would also help. These are things that should be considered regardless of whether the client binary is going to continue to support bootstrapping multiple (major) versions.

Anyway the point of the issue was to have the conversation, not to decree the capability is being removed, and at a minimum to ensure we can provide a "soft landing" if we do remove it.. options for that include maybe we could allow 1-2 previous versions to be supported since that would still constrain the test matrix, though i'm wary of even that since there is still significant overhead as soon as you go from "one path" to "more than one path", and it ensures having to update it every release to change the compatibility list.

@jorgemoralespou
Copy link

If this is a pain, let's remove the feature completely.

@LalatenduMohanty
Copy link
Member

LalatenduMohanty commented Sep 21, 2017

Anyway the point of the issue was to have the conversation, not to decree the capability is being removed, and at a minimum to ensure we can provide a "soft landing" if we do remove it.. options for that include maybe we could allow 1-2 previous versions to be supported since that would still constrain the test matrix,

@bparees It is has been some time since we last discussed this. Just wanted to know if you any new update on this issue. Also are you planning to enforce this for 3.7 or 3.8?

@bparees
Copy link
Contributor

bparees commented Sep 21, 2017

we have not yet moved forward with any plans on this.

@bparees
Copy link
Contributor

bparees commented Dec 11, 2017

at this point we're going to move forward with N-1 support. For the moment nothing enforces this(you want get warned/errored if you stand up an older cluster), but the oc client doesn't guarantee anything beyond N-1 compatibility, so if you use a v3.x client to stand up a v3.(x-2) cluster, there is no guarantee your oc client will even be able to talk to that cluster.

@bparees bparees closed this as completed Dec 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants