
Consider keeping serving->CI/CD control #29

Closed
sixolet opened this issue Mar 2, 2019 · 7 comments


sixolet (Contributor) commented Mar 2, 2019

This proposal is a counterpoint to #25.

It's also drawing heavily from @mattmoor and his thoughts about our next Serving API rev, even if it disagrees about this one point.

Motivation

Live your truth.

When you start using Knative, your day 1 experience probably doesn't involve a GitOps repository. On day one, you're using kn to deploy directly, and using the etcd of your Kubernetes cluster as your source of truth about what should be serving, and at what percentages, and under what subdomains.

As you grow, you may end up setting up a GitHub repository and transitioning your source of truth over to that.

Both of these are valid workflows for orgs at various maturities. Transitioning between them should be smooth, and should not require a phase change (@evankanderson's point 1 in #25).

Well I do declare!

The serving API is a language for describing what programs you want to run, accessible under what names, with what traffic split, configured how. Knative also contains a way of interpreting that language into action on a Kubernetes cluster, but the language itself is useful even without the particular controllers in the serving repo. The PodSpec-able nature of @mattmoor's proposed v1 API is great for when you want to declare the thing you run as a fully-formed image, but sometimes the image is just a side effect; the actual description of the program you want to be running is a particular commit to a source git repo.

A CI/CD system, on the other hand, describes a process, not a declaration of what you want running. Each run of the CI/CD pipeline might build a thing to run and set some percentages, but what you actually end up running is something like the most recent of those runs to complete successfully, or, even worse, something ill-defined.

In your declarative serving-API language, you should declare exactly what code you want to be running. It should be up to your CI system to make a tested deployable artifact for you, safely.

In your declarative serving-API language, you should declare exactly what percent of traffic should be going to your program. It should be up to your CD system to figure out how to make that so, safely.

Therefore, the serving API specifies what a CI/CD system should accomplish. The serving API knows what, the CI/CD system knows how.

Concrete v1 API suggestions

(with reference to @mattmoor's v1 API principles/opinions)

Specifying source to run in the serving API language

Let Service inline the inputs and outputs parts of TaskRun/PipelineRun from the Pipelines API. This is for specifying source instead of an image to run, and is how the CI system knows what to build and test. (Subresource specs can be inlined vs. embedded; use known idioms to evoke API familiarity.)

When a Service specifies source to run, the outputs section can be considered, by default, to include an image in an operator-configured image repo, named something sensible based on the service name and tagged with the commit hash; the image of the container to run also defaults to that image. (Use defaulting to graduate complexity.)
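For concreteness, here is a minimal sketch of what that defaulted outputs section might look like, borrowing the resourceSpec shape from the Pipelines API; the repo, service name, and tag values are illustrative, matching the example further down:

spec:
  outputs:
    resources:
    - resourceSpec:
        type: image
        params:
        - name: url
          # defaulted: <operator-configured repo>/<service name>:<commit hash>
          value: gcr.io/stuff/thingservice:lolcommithash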

Integrating with CI/CD

This is the part where I'm suggesting we NOT invert the flow control, or at least allow a non-inverted flow control.

Service can also inline the PipelineRef/TaskRef field from the Pipelines API, but generalized so you could point to anything (it shouldn't have to be a Pipeline or Task, because we shouldn't limit ourselves to Pipelines as a CI/CD system). (Henceforth when I say Pipeline please read Pipeline, Task, or other CI/CD primitive; they should be pluggable)

When you specify no PipelineRef, you get today's behavior for images, and an error if you attempt to specify source. If you specify a particular traffic split, Knative will immediately make the relevant route(s) reflect it.

When you specify a PipelineRef, that pipeline is in charge of all subresource manipulations. That means the Service is treated as the declarative goal for what should be deployed; the Pipeline is in charge of manipulating the Service's Configuration and Route(s) to match. They're still owned by the Service, but the Service's controller won't touch them directly. Instead, it'll instantiate that Pipeline, and the Pipeline will do its thing.

Regarding the pipeline doing its thing: that thing can even be manipulations of a separate canary Knative service, or of another three Knative clusters. (Embrace GitOps, or die: we must anticipate workflows where the same resource definitions may be applied across multiple clusters, including release scenarios.)

Knative comes with a special ExternalOperation pipeline. When you give your Service that pipeline, Knative does nothing itself, and expects a human or system outside Knative to manipulate the child Configuration and Route(s) of the service. This is the Manual Mode missing from @mattmoor's presentation.
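A minimal sketch of opting into that manual mode, assuming ExternalOperation is addressed like any other pipeline (the exact shape here is my guess; only the ExternalOperation name comes from this proposal):

apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: thingservice
spec:
  # container or source specification as usual
  pipelineRef:
    name: ExternalOperation  # no-op: an external human or system drives the Configuration and Route(s)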

Example

Here's a source-based service using @mattmoor's suggestions for v1 as a jumping-off point:

apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: thingservice
spec:
  inputs:
    resources:
    - resourceSpec:
        type: git
        params:
        - name: url
          value: https://github.com/wizzbangcorp/thing.git
        - name: revision
          value: lolcommithash
  pipelineRef:
    name: build-it-and-roll-it-out-slowly
    kind: Pipeline
    apiVersion: tekton.dev/v1

This specifies the source. The cluster has gcr.io/stuff configured as the image repo for this namespace, so the image defaults to gcr.io/stuff/thingservice:lolcommithash. The pipeline build-it-and-roll-it-out-slowly is invoked with the source input, the image output, and a reference to this Service; it will build the relevant image and then roll it out slowly on this service (or even several others!).

Transitioning from etcdops to full-on gitops

You download your services from etcd into Git. You set up your same relevant pipelines to trigger off commits to the git repo, with the extra initial Task of kubectl apply-ing the directory. You change all the pipelineRef fields to ExternalOperation.
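Concretely, that last step is a one-line edit per Service in the repo, something like:

spec:
  pipelineRef:
    name: ExternalOperation  # was: build-it-and-roll-it-out-slowly; the repo's own pipelines now drive rollout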


I plan to edit the above as the discussion unfolds, if it surfaces ways to make this idea better. Please comment on anything that is either a bad idea or unclear.

sixolet (Contributor, Author) commented Mar 2, 2019

Highlight to summon @mattmoor @duglin @cppforlife @evankanderson

evankanderson (Member) commented

I have thoughts here, but I also have a lot of peer feedback to write for Google performance reviews, so I'll have better thoughts on Wednesday or Thursday.

mattmoor (Member) commented Mar 8, 2019

So the way we inline builds today includes TypeMeta for extensibility:

apiVersion: serving.knative.dev/v1alpha1
kind: Configuration
metadata:
  name: foo
spec:
  build:
    # We support Pipelines / Task here today, as well as random
    # other stuff (e.g. I have a test Build we test in e2e for extensibility)
    apiVersion: build.knative.dev/v1alpha1
    kind: Build
    spec:
      ...
  revisionTemplate:
    spec:
      container:
        image: mattmoor/foo:bar

Creates Revisions with:

apiVersion: serving.knative.dev/v1alpha1
kind: Revision
metadata:
  name: foo
spec:
  buildRef:
    apiVersion: build.knative.dev/v1alpha1
    kind: Build
    name: ...
  container:
    image: mattmoor/foo:bar

Where we expect the resource embedded into Configuration as spec.build that we reference from Revision as spec.buildRef to culminate in a Succeeded: True condition. Given this, I think I'd be inclined to position this as setup[Ref]: instead of build[Ref]: (if we were to keep them) since there's nothing about this contract that necessitates that this be used to produce the image.


However, drilling into your proposal a bit, it sounds like there are a couple key departures from this which raise questions:

  1. No TypeMeta, so what are the extensibility implications?
  2. No TypeMeta, so how this is implemented is unclear (would we practically have an internal dependency on Tekton?)
  3. There are a number of pieces you list that are implicit (e.g. image output, svc ref), which imply that we have a strong enough sense for the schema that we can make alterations.

The cluster has gcr.io/stuff configured as the image repo for this namespace, so the image is going to be by default gcr.io/stuff/thingservice:lolcommithash.

Configured how?

The pipeline build-it-and-roll-it-out-slowly is invoked with the source input, the image output, and a reference to this Service

I'm curious about the "invoked" part. Is this a hard dependency on Tekton? I'm curious how that's appreciably different from us simply invoking a Pipeline that does stuff?

which will build the relevant image and then roll it out slowly on this service (or even several others!).

I believe this means that the instantiated Pipeline will have an ObjectReference to the Service, and post back to it to start and progress a particular rollout? What happens if while this is running I post another new commit? Do the two Pipelines race?

sixolet (Contributor, Author) commented Mar 8, 2019 via email

sixolet (Contributor, Author) commented Mar 8, 2019

The cluster has gcr.io/stuff configured as the image repo for this namespace, so the image is going to be by default gcr.io/stuff/thingservice:lolcommithash.

Configured how?

I was imagining setting namespace-scoped or cluster-scoped defaults for "where do you go and agree to put images?" in a configuration, mostly so you can get good defaulting.
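For example, those defaults could live in an operator-managed ConfigMap; the name and keys below are entirely hypothetical, just to sketch the shape:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-image-defaults  # hypothetical name
  namespace: knative-serving
data:
  # images default to <default-repository>/<service name>:<commit hash>
  default-repository: gcr.io/stuff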

I'm curious about the "invoked" part. Is this a hard dependency on Tekton? I'm curious how that's appreciably different from us simply invoking a Pipeline that does stuff?

It's pretty similar to what we have now, but this introduces two ideas:

  • The Service is the input to a Pipeline or other CI/CD system
  • The Pipeline or other CD system can take the place of the Service controller for orchestrating traffic rollout, by controlling the Route.

which will build the relevant image and then roll it out slowly on this service (or even several others!).

Soft dependency on Tekton, or a system like it. A dependency on the interface, and the ability to go instantiate a FooRun object of the relevant type. A thing I'm a bit unsure of: how you get from a reference to a Pipeline to knowing you have to instantiate a PipelineRun, etc.
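To make the instantiation concrete, here is a sketch of the run the Service controller might create, assuming Tekton's PipelineRun shape and a generated name:

apiVersion: tekton.dev/v1alpha1
kind: PipelineRun
metadata:
  name: thingservice-run-00001  # generated; owned by the Service
spec:
  pipelineRef:
    name: build-it-and-roll-it-out-slowly
  resources:
  - name: source  # bound from the Service's inputs
    resourceSpec:
      type: git
      params:
      - name: url
        value: https://github.com/wizzbangcorp/thing.git
      - name: revision
        value: lolcommithash
  - name: image  # bound from the defaulted outputs
    resourceSpec:
      type: image
      params:
      - name: url
        value: gcr.io/stuff/thingservice:lolcommithash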

I believe this means that the instantiated Pipeline will have an ObjectReference to the Service, and post back to it to start and progress a particular rollout? What happens if while this is running I post another new commit? Do the two Pipelines race?

This is like any case where you go and invoke two PipelineRuns that are competing over the same resources. At least when you have the Service to reference back to, you know what the user intended the end state to be, unlike trying to invoke the Pipelines directly.

sixolet (Contributor, Author) commented Mar 8, 2019

One more thought that occurs to me: another possible valuable arrangement is one where a POST, PUT, or PATCH of a changed Service object doesn't directly reference a Pipeline, but the Pipeline is triggered by that POST, PUT, or PATCH just like it would be if you made a git commit that caused an event to trigger a PipelineRun. In that scenario it'd be some kind of event binding beastie that would be in charge of associating the Pipeline with the Service, but you'd get a lot of the same benefits.

This assumes you can still reference source in the Serving API even if you're not referencing a Pipeline, and that's a really weird thing to do if the default controller isn't going to touch it.

You'd need a different way than "specify a pipeline in the Service" to indicate that "there is a Pipeline in charge of managing rollout here", though.
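Purely to illustrate that event-binding arrangement, a hypothetical binding resource; nothing like this exists, and every name below is made up:

apiVersion: example.dev/v1alpha1
kind: ServicePipelineBinding  # the hypothetical "event binding beastie"
metadata:
  name: thingservice-rollout
spec:
  serviceRef:
    name: thingservice  # watch this Service for POST/PUT/PATCH
  pipelineRef:  # trigger this Pipeline on each change
    name: build-it-and-roll-it-out-slowly
    kind: Pipeline
    apiVersion: tekton.dev/v1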

sixolet (Contributor, Author) commented Mar 8, 2019

And finally, before I leave my computer for the night, Knative could maintain some stock Tekton Task implementations that do a few pleasant classic styles of rollout, so you can get your custom pipelines up and running with a few good choices quickly.
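For instance, a stock gradual-rollout Task might look roughly like this; the Task name, params, and image are illustrative, not a real deliverable:

apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: gradual-rollout  # hypothetical stock Task
spec:
  inputs:
    params:
    - name: service  # the Service whose Route gets manipulated
    - name: stepPercent  # traffic shifted toward the new Revision per step
      default: "10"
  steps:
  - name: shift-traffic
    image: gcr.io/cloud-builders/kubectl  # any image with kubectl works
    command: ["/bin/sh", "-c"]
    args:
    - |
      # Sketch: repeatedly patch the Route's traffic block, bumping the new
      # Revision's percent by ${inputs.params.stepPercent} and checking
      # health between steps.
      echo "rolling out ${inputs.params.service}"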

@sixolet sixolet closed this as completed Jul 9, 2019
coryrc pushed a commit to coryrc/client that referenced this issue May 14, 2020