Create a minimal release process for our ksonnet configs #215

Closed
jlewi opened this issue Feb 7, 2018 · 13 comments
@jlewi
Contributor

jlewi commented Feb 7, 2018

Kubeflow was broken for quite a bit because the ksonnet configs got out of sync with the Docker images.

To avoid this we should start pinning/tagging known good releases.

Context:
kubeflow/training-operator#339
Discussion in #208

If we create a tag or branch, we can point ksonnet at that ref when adding the registry:

ks registry add kubeflow github.com/kubeflow/kubeflow/tree/0.1/kubeflow
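
As a rough sketch of how pinning might work, the snippet below builds the registry URI for a given tag or branch and the matching `ks registry add` arguments. The helper names (`registry_uri`, `registry_add_command`) are hypothetical; only the URI layout and the `ks registry add` invocation come from the command above.

```python
from typing import List

REPO = "github.com/kubeflow/kubeflow"  # repository taken from the command above
REGISTRY_DIR = "kubeflow"              # directory holding the ksonnet registry


def registry_uri(ref: str) -> str:
    """Return the registry URI pinned to a git tag or branch (hypothetical helper)."""
    if not ref:
        raise ValueError("ref must be a non-empty tag or branch name")
    return f"{REPO}/tree/{ref}/{REGISTRY_DIR}"


def registry_add_command(ref: str, name: str = "kubeflow") -> List[str]:
    """Build the argument list for `ks registry add`; nothing is executed here."""
    return ["ks", "registry", "add", name, registry_uri(ref)]


# Print the command for the 0.1 ref used in the example above.
print(" ".join(registry_add_command("0.1")))
```

Swapping `"0.1"` for another tag or branch name is all a user would need to change to follow a different release.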

At a minimum it would be good to start defining a process for creating releases to figure out what we will do for our 0.1 milestone.

Even better would be to think about how to automate the toil of releases, e.g.:
- Running tests to qualify the release
- Publishing release notes

Are there existing tools we can use?

@gaocegege
Member

> Publishing release notes

https://github.com/skywinder/github-changelog-generator could generate the changelog automatically.

@jlewi
Contributor Author

jlewi commented Feb 21, 2018

@willb Can I assign this to you for now since you are working on a proposal for the release process?

@jlewi
Contributor Author

jlewi commented Feb 23, 2018

We had another break at head (#283), this time affecting serving.

I need to push a new Docker image for TFJob to fix #322.

I'm thinking as part of this we should start documenting the steps it takes to do a release and then use that to inform a proposal.

I chatted briefly with @willb and it sounds like he's still researching various options.

In brief, here's what I think this will look like:

  • Build and test the TFJob operator by launching an Argo workflow.
  • Create a PR updating the ksonnet configs to use the new image.
  • Tag the kubeflow/kubeflow commit "STABLE".

I think we should start using the tag STABLE to refer to the most recent stable commit of our registry.
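
One way the tagging step could be scripted is sketched below. It only builds the git command lines for moving a `STABLE` tag to a given commit and publishing it; the function name and the example commit hash are hypothetical, and the assumption that the tag is moved with a forced push is ours, not something the thread settles.

```python
from typing import List


def stable_tag_commands(commit: str, remote: str = "origin",
                        tag: str = "STABLE") -> List[List[str]]:
    """Return git commands that move `tag` to `commit` and publish it.

    The commands are only constructed, never run. Moving an existing tag
    needs --force both locally and when pushing.
    """
    if not commit:
        raise ValueError("commit must be a non-empty revision")
    return [
        ["git", "tag", "--force", tag, commit],
        ["git", "push", "--force", remote, f"refs/tags/{tag}"],
    ]


# "abc1234" is a placeholder commit, not a real Kubeflow revision.
for cmd in stable_tag_commands("abc1234"):
    print(" ".join(cmd))
```

Users who already fetched an older `STABLE` would need to re-fetch tags to pick up the move, which is one reason a versioned tag or branch may be preferable alongside it.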

@jlewi
Contributor Author

jlewi commented Feb 24, 2018

We now have an E2E test for TFServing.
But the E2E test builds a new Docker image for TFServing and then uses that. This is great if we're building a new Docker image and want to verify it works, but it's not quite what we want when we need to verify that the current ksonnet prototypes work.

When we release our ksonnet configs the flow will be something like:

  • Update the ksonnet components (e.g. update Docker images)
  • Run the tests to verify everything works
  • Submit a PR
  • Tag the commit stable

A simple thing to do would be to add another workflow which runs the E2E test for TFServing but without building a new image and without overriding the image used by the ksonnet component.
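
The idea of one test with two modes can be sketched roughly as follows. All step names are hypothetical placeholders rather than the steps of the actual Argo workflow; the point is only that the build and image-override steps are skipped when validating the checked-in ksonnet component.

```python
from typing import List


def e2e_steps(build_image: bool) -> List[str]:
    """Plan the E2E steps for TFServing (step names are illustrative only)."""
    steps = ["checkout"]
    if build_image:
        # Image-validation mode: build a fresh image and point the component at it.
        steps += ["build-image", "override-component-image"]
    # Both modes deploy the ksonnet component and run the same serving test.
    steps += ["deploy-component", "run-serving-test", "teardown"]
    return steps


print(e2e_steps(build_image=True))   # validates a newly built image
print(e2e_steps(build_image=False))  # validates the ksonnet configs as released
```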

/cc @lluunn

jlewi added a commit that referenced this issue Mar 1, 2018
Start putting instructions together for how one could release Kubeflow.

The instructions are rather ad hoc and reflect what one could do today.

They do not reflect what we think our release process should look like.

The current instructions only cover building and releasing the TFJob operator and the TFServing images.

Refactor the TFServing image E2E workflow so we can use it to build release images.

  • We want to run a different cluster in a different project for releases and publish to a different GCR registry.
  • We move the default parameters into the .libsonnet file. The number of parameters was getting large, so I think it makes sense to pass them as a dictionary and not as a list of arguments.
  • Make buildTemplate a local variable so we automatically inherit the values of the parameters and don't have to pass a bunch of arguments every time we build it.
  • Update the actual image used by TFServing to the newly built image.

@willb and some others are putting together a proposal for our release process. Hopefully this PR will help inform that work and help us figure out how to iterate.

Related to #215
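
The "defaults as a dictionary" idea from the commit message can be illustrated with a small sketch. It is written in Python rather than jsonnet, and every parameter name and value below is a made-up placeholder, not one from the actual workflow.

```python
from typing import Any, Dict, Mapping, Optional

# Hypothetical defaults; the real workflow keeps its own in a .libsonnet file.
DEFAULT_PARAMS: Dict[str, Any] = {
    "project": "example-ci-project",
    "cluster": "example-ci-cluster",
    "registry": "gcr.io/example-ci",
}


def workflow_params(overrides: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    """Merge caller overrides onto the defaults, rejecting unknown keys."""
    overrides = dict(overrides or {})
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**DEFAULT_PARAMS, **overrides}


# A release run overrides only what differs from the test setup.
release = workflow_params({"project": "example-release-project",
                           "registry": "gcr.io/example-release"})
print(release)
```

Passing a single mapping means a release run states only what differs from the defaults, which is the benefit the commit message describes.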
@jlewi
Contributor Author

jlewi commented Mar 1, 2018

To elaborate on the above I think we should have the following

  • An Argo workflow that builds all the Docker images
    • e.g., TFJob operator, TFServing, HTTP Proxy, Jupyter images
    • I think the only tests we need to run are tests which won't be covered when we run the presubmits on the PR modifying the ksonnet configs.
  • A binary that can be invoked to update the parameters of various ksonnet prototypes to change the images to the new values
  • Create a PR with the modified ksonnet configs
  • The PR should trigger presubmits, which should verify the ksonnet configs by running appropriate tests
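
A minimal sketch of what the image-updating binary might do is shown below. It assumes prototype parameters are declared in comment lines of the form `// @optionalParam <name> string <default> <description>`; that layout, the parameter name, and the image names are assumptions made for illustration and are not taken from this thread.

```python
import re
from typing import Mapping

# Matches an assumed prototype header line:
#   // @optionalParam <name> string <default> <description>
PARAM_LINE = re.compile(
    r"^(?P<head>\s*//\s*@optionalParam\s+(?P<name>\S+)\s+string\s+)"
    r"(?P<value>\S+)(?P<tail>.*)$"
)


def update_images(text: str, new_images: Mapping[str, str]) -> str:
    """Replace the default value of each named parameter with a new image."""
    lines = []
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match and match.group("name") in new_images:
            line = match.group("head") + new_images[match.group("name")] + match.group("tail")
        lines.append(line)
    result = "\n".join(lines)
    # Preserve a trailing newline if the input had one.
    return result + "\n" if text.endswith("\n") else result


# Hypothetical prototype header and image names.
prototype = (
    "// @optionalParam tf_job_image string gcr.io/example/tf_operator:old The operator image.\n"
    "// @optionalParam name string tf-job Name of the component.\n"
)
print(update_images(prototype, {"tf_job_image": "gcr.io/example/tf_operator:new"}))
```

Only parameters listed in `new_images` are touched, so the resulting diff in the PR stays limited to the image bumps that the presubmits then verify.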

@jlewi
Contributor Author

jlewi commented Mar 7, 2018

There are some instructions here:
https://github.com/kubeflow/kubeflow/blob/master/releasing.md

I think the next steps would be

  1. Cut a 0.0.1 branch
  2. Update our user guide and instructions to pull ksonnet from the release branch
  3. Update the instructions with the above.

@willb Any chance you could take a stab at this? Preferably this week?

@willb
Contributor

willb commented Mar 7, 2018

@jlewi definitely assign this to me; I’m in offsite meetings all week but will be able to give this contiguous free time while traveling home or early next week.

@jlewi
Contributor Author

jlewi commented Mar 8, 2018

Thank you so much @willb. Consider it assigned.

@jlewi
Contributor Author

jlewi commented Mar 19, 2018

@willb Any update? Do you think we could cut an initial release this week with whatever's at head?

@mhausenblas
Member

Let me know where I can be of assistance …

@willb
Contributor

willb commented Mar 20, 2018

@jlewi I think so! I burned some time trying to set up a local sandbox environment but am going to bail on that for now. I'll ping you if I need help.

@jbottum
Contributor

jbottum commented Sep 30, 2018

/area 0.4.0

@jlewi
Contributor Author

jlewi commented Oct 1, 2018

This has already been fixed for a while; closing this issue.

@jlewi jlewi closed this as completed Oct 1, 2018
yanniszark pushed a commit to arrikto/kubeflow that referenced this issue Nov 1, 2019
elenzio9 pushed a commit to arrikto/kubeflow that referenced this issue Oct 31, 2022