New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate the pod template from replicationController #170

Open
bgrant0607 opened this Issue Jun 19, 2014 · 75 comments

Comments

@bgrant0607
Member

bgrant0607 commented Jun 19, 2014

We should separate the pod template from replicationController, to make it possible to create pods from template without replicationController (e.g., for cron jobs, for deferred execution using hooks). This would also make updates cleaner.

@jbeda jbeda added the enhancement label Jun 19, 2014

@bgrant0607

This comment has been minimized.

Member

bgrant0607 commented Jul 9, 2014

If we were to remove replicationController from the core apiserver into a separate service, I'd leave the pod template in the core.

@bgrant0607 bgrant0607 changed the title from Consider separating the pod template from replicationController to Separate the pod template from replicationController Jul 11, 2014

@erictune

This comment has been minimized.

Member

erictune commented Jul 23, 2014

I'll take a shot at this.

@bgrant0607
@lavalamp
Please share any other thoughts on podTemplates.

Brian mentioned cron. This makes me think he wants to use podTemplates for delegation.
That is, is there some mechanism where principal A can define a /podTemplate, and then grant principal B permission to create /pods which derive from a certain /podTemplate, but which run as A. (I guess a replicationController is effectively "another principal" which A can delegate to?)

Will the delegation of power be part of the /podTemplate message, or will that be stored in some sideband ACL?
Does the PUT /pods method get extended to allow creating a /pod from a template instead of using the desiredState? Or is there a new non-REST method?

@lavalamp

This comment has been minimized.

Member

lavalamp commented Jul 23, 2014

PUT /pods should only take pods, IMO. We should potentially offer something that "fills out" a pod template, but I think that should work like our current replication controller, which is an external component.

@bgrant0607

This comment has been minimized.

Member

bgrant0607 commented Jul 24, 2014

The point of this proposal is to further narrow the responsibility and API of the replicationController to its bare essentials. The replicationController should just spawn new replicas.

Right now, essentially a full copy of the pod API is embedded in the replicationController API. As an external, independently versioned API, it would be challenging to keep synchronized with the core API. Additionally, with the replicationController creating pods by value rather than by reference, it needs to be delegated the authority to create ~arbitrary pods as ~arbitrary users (once we support multi-tenancy) -- this would mean it could do anything as anybody. This is even more of an issue once we introduce an auto-scaler layered on the replicationController.

It's very difficult to develop a signature-based approach layered on a literal pod creation API that could be made both usable and secure. OTOH, if the replicationController could only spawn instances from templates owned by the core, then it's power could be restricted. Think of this as the principle of least privilege and separation of concerns.

Including the template by reference in the replicationController API would also facilitate rollbacks to previous pod configurations. A standalone template could be used for cron and other forms of deferred execution.

Even if we were to add a more general templating/configuration-generation mechanism in the future, we can't have turtles all the way down. A pod template would be useful for spawning the config generator, among other things.

As with the current replicationController API, pods would have no relationship to the template from which they were generated other than their labels and any other provenance information we kept. Changes to the template would have no effect on pods created from it previously.

I'd be fine with a separate API endpoint for creating pods from a template, just as we have for replicationController today.

@erictune

This comment has been minimized.

Member

erictune commented Jul 24, 2014

Okay, putting together above comments and my own thoughts...

For the initial PR, we just need to have a pod template type which unambiguously and completely defines a /pod. Later PRs can extend /podTemplate as needed to support authorization of delegated use.

Here are a few examples with delegation and the types of expansion of templates that might occur:
- cron service runs a /pod, but passes to pod's environment a string identifying the datecode for this run.
- third-party auto-scaler makes more of a pod, or makes pods that request more or less resources.
- map-reduce service makes several pods, setting environment variables that control input and output file paths.
- ABTester service makes pods with two different values for an Environment variable that controls a new feature, and two different values of another Environment variable that controls a tag added to the logs of these pods (e.g. experiment_27354_mode_a)

Considerations for podTemplate:

  1. Ease and succinctness of definition of podTemplate
  2. Ease of reasoning about the security implications of giving a user permission to instantiate pods from a podTemplate
  3. Work with YAML as well as json.
  4. allow templates to generate many different kinds of pods.
    Item 4 seems much less important than 1 and 2. Therefore, this rules out a podTemplate which holds a schema definition, or jpath expression, or anything else which allows fully general manipulation of json-type data. Item 3 above further reinforces this.

Therefore, a /podTemplate will look something like this:

{ "id": "awesomePodTemplate",
  "pod": "<object exactly following /pod schema>", 
   "allowExtraEnvVars": [
     "MOTD": 
       "Today's pod brought to you by a replication controller."],
   "allowModifiedResourcesRequestsAndLimits": 1,
   "delegatedPodMakers": ["alice@example.com", "replicationcontroller@kubernetes.io"],
}

Note the specific, capability-like descriptions of allowed modifications to the /pod object.

However, the first PR will just have:

{ "id": "myPodTemplate",
  "pod": "<object exactly following /pod schema>", 
}

The /pod schema will get a new member "actsAsUser". This affects which user the pod acts as.
Initially, this will have no affect. As we add authentication (#443), the following authorization code can be added to the apiserver:

if authenticatedUser == request.pod.actAsUser { return auth.Authorized }
return auth.notAuthorized

In a later PRs, the /pod schema will be extended to have a "fromPodTemplateId" member which references the id of the /podTemplate that this /pod is modeled on. This adds an interesting twist: we can't use the user-provided name alone to identify the /podTemplate. We need to specify which user's namespace the name lies in. Maybe "actAsUser" identifies this or maybe we need a globally unique id for a podTemplate.

With that member added, the authorization check for creating a /pod would look like this:

if authenticatedUser == request.pod.actAsUser { return Authorized }
if auth.Can(authenticatedUser, auth.MakePodsFor, request.pod.actAsUser) {
    tpl := findPodTemplate(request.fromPodTemplateId)
    if tpl != nil {
      if tpl.Generates(request.pod) {
         return auth.Authorized
      }
    }
  }
}
return auth.NotAuthorized
@erictune

This comment has been minimized.

Member

erictune commented Jul 24, 2014

Other use case: pod's port can come from a range, to allow duplicate pods on the same host. Would this go in the template?

@lavalamp

This comment has been minimized.

Member

lavalamp commented Jul 24, 2014

I wonder if we should maybe add an "owner" field to the JSONBase, so that all objects in the system could have an owning user. If so, no need to specifically add that field to the PodTemplate.

In a later PRs, the /pod schema will be extended to have a "fromPodTemplateId" member which references the id of the /podTemplate that this /pod is modeled on. This adds an interesting twist: we can't use the user-provided name alone to identify the /podTemplate.

This could be done with a label, which is what our current replicationController does.

I think a step that should come shortly after adding PodTemplate as a resource is changing the replication controller struct to take a podTemplateID instead of a hardcoded PodTemplate.

Port shouldn't be dynamic.

May want @brendanburns to take a look at this when he gets back.

@erictune

This comment has been minimized.

Member

erictune commented Jul 24, 2014

On Thu, Jul 24, 2014 at 10:24 AM, Daniel Smith notifications@github.com
wrote:

I wonder if we should maybe add an "owner" field to the JSONBase, so that
all objects in the system could have an owning user. If so, no need to
specifically add that field to the PodTemplate.

In a later PRs, the /pod schema will be extended to have a
"fromPodTemplateId" member which references the id of the /podTemplate that
this /pod is modeled on. This adds an interesting twist: we can't use the
user-provided name alone to identify the /podTemplate.

This could be done with a label, which is what our current
replicationController does.

can a label selector select a different user's objects?

I think a step that should come shortly after adding PodTemplate as a
resource is changing the replication controller struct to take a
podTemplateID instead of a hardcoded PodTemplate.

Okay, but again the namespace/user issue is unresolved.

Port shouldn't be dynamic.

May want @brendanburns https://github.com/brendanburns to take a look
at this when he gets back.


Reply to this email directly or view it on GitHub
#170 (comment)
.

@bgrant0607

This comment has been minimized.

Member

bgrant0607 commented Jul 25, 2014

Thanks, @erictune .

First of all, while security is part of the motivation for this, I'd drop all user / identity / auth / delegation stuff until we figure out auth[nz] more generally. That said, we'll want to namespace label keys by project implicitly by default to prevent conflicts and overlap across users.

Second, we should leave out most/all forms of substitution and computation more generally. A more general config mechanism is a separate issue. I was thinking of take what replicationController supports today and moving it to a separate object, which we might want to garbage collect after some amount of time in the case that it hasn't been used.

However, I think it's not too early to think about the override model, and whether we want one eventually, even though we wouldn't implement it initially. Env. vars. and resources are good examples.

It would be useful to think about how splitting out the template (and overrides) would interact with updates driven by declarative configuration. Does the replicationController change to a new template, or does one update its template? How does one update pods controlled by the replicationController? Some ideas were discussed in #492 .

Duplicate pods on the same host: We implement IP per pod, so no port allocation range is necessary.

fromPodTemplateId: The template must behave as a cookie cutter -- once a pod is created from a template, it has no relationship to the template. The template may be changed or deleted without affecting pods created from it, and pods created from it may be modified independently. We probably do want to record provenance information for debugging and/or auditing, though. It would include information like the template id, time, replication controller id (if created by one), user, etc.

@smarterclayton

This comment has been minimized.

Contributor

smarterclayton commented Jul 29, 2014

@bgrant0607 Can you describe config generators a bit more as mentioned in 146? Haven't heard you mention that yet, but I suspect it matches use cases we are looking to solve as well.

@bgrant0607

This comment has been minimized.

Member

bgrant0607 commented Aug 28, 2014

Created #1007 to start the broader config discussion.

#503 contains another example that could use the pod template: job controller.

I'd like a bulk-creation operation to go with the pod template, so that a replication controller could send one operation to create N pods. This will eventually be important for performance, gang scheduling, usage analytics, etc.

bgrant0607 referenced this issue in smarterclayton/kubernetes Sep 12, 2014

Proposal for v1beta3 API
* Separate metadata from objects
* Identify current state of objects consistently
* Introduce BoundPod(s) as distinct from Pod to represent pods
  scheduled onto a host
* Use "spec" instead of "state"
* Rename Minion -> Node
* Add UID and Annotations on Metadata
* Treat lists differently from resources
* Remove ContainerManifest
@bgrant0607

This comment has been minimized.

Member

bgrant0607 commented Sep 19, 2014

@smarterclayton @erictune @lavalamp

Trying to make this concrete.

Standalone Pod Template

From #1225:

type PodTemplate struct {
    ObjectType `json:",inline" yaml:",inline"`
    Metadata   ObjectMetadata `json:"metadata,omitempty" yaml:"metadata,omitempty"`

    // Spec describes what a pod should look like.
    Spec PodSpec `json:"spec,omitempty" yaml:"spec,omitempty"`
}

It should also have a Status PodTemplateStatus, for consistency with all other API objects. I could imagine recording status data like timestamp of last pod created (e.g., if we wanted to put a TTL on template objects).

There is the question of whether we want metadata like labels and annotations to come from the template or to be provided at pod instantiation time or both. Taking metadata from the template is easier to use and more secure. Providing metadata at instantiation time would allow more flexible template reuse. I'm going to declare that flexible template is a problem for the config system, not the PodTemplate, so I recommend we should take metadata from the PodTemplate.

The Metadata in the struct above is the PodTemplate's metadata. Typically, the pods created from the template will have the same labels and at least some of the same annotations, but may have additional annotations, such as to record the template from which they were created. However, for cleanliness (and flexibility, also), I recommend a separate field for pod metadata, PodMetadata ObjectMetadata.

I could also foresee us adding more fields in the future, such as authorization info, TTL, etc.

Therefore, I propose a PodTemplateSpec, which includes PodMetadata, PodSpec, and whatever other desired state fields we need.

type PodTemplateSpec struct {
        // Metadata of the pods created from this template.
    Metadata   ObjectMetadata `json:"metadata,omitempty" yaml:"metadata,omitempty"`

    // Spec describes what a pod should look like.
    Spec PodSpec `json:"spec,omitempty" yaml:"spec,omitempty"`
}

type PodTemplate struct {
    ObjectType `json:",inline" yaml:",inline"`
    Metadata   ObjectMetadata `json:"metadata,omitempty" yaml:"metadata,omitempty"`

    // Spec describes what a pod should look like.
    Spec PodTemplateSpec `json:"spec,omitempty" yaml:"spec,omitempty"`

    // Status represents the current information about a PodTemplate. 
    Status PodTemplateStatus `json:"status,omitempty" yaml:"status,omitempty"`
}

Bulk Pod Creation

By themselves, PodTemplates don't do anything. They are there to be used. I propose to extend POST /pods with 2 URL parameters:

  1. number=<int>: The number of pods to create. When number > 1, the Name, if provided, is treated as a prefix, to which some uniquifying characters are appended for each pod. If Name is not provided, it is auto-generated according to our general approach to this (e.g., autosetName=true might be necessary).
  2. template=<reference to PodTemplate>: Take the pod's metadata and spec from the specified PodTemplate rather than from the json payload. If the specified PodTemplate doesn't exist, that's an error. The client should be able to use resourceVersion preconditions to ensure they're using a sufficiently up-to-date PodTemplate.

More about the format of object references below.

Replication Controller

Currently (even in the v1beta3 proposal), ReplicationControllerSpec contains an inline Template PodTemplate. It's awkward to nest a full-fledged object in another object, so at minimum this should be PodTemplateSpec instead.

There are 3 alternative approaches to using a PodTemplate in ReplicationController:

  1. Use a POST URL parameter template=<reference to PodTemplate>, similar to pod creation, which would be copied into the PodTemplateSpec in the ReplicationControllerSpec at creation time.
  2. Replace the inline PodTemplateSpec with a reference to a PodTemplate in the ReplicationControllerSpec.
  3. Support both (i.e., one of) the inline PodTemplateSpec and reference to a PodTemplate, the former for simplicity and the latter for all the other reasons we'd like to do this. We could also support (1) in this case.

In order to produce the decoupling and security properties I was looking for when I proposed this issue, the replication controller service needs to be able to utilize the template at pod creation time rather than at the time the replication controller is created. Therefore, the ReplicationControllerSpec needs a reference to the PodTemplate. This has the disadvantage of creating a hard dependency between two objects -- the replication controller could fail if its pod template were deleted -- but we could disallow deletion of PodTemplates that were in use. We already have another creation-order dependency -- services must be created before their clients -- so that wouldn't be a new issue.

I'm tempted to recommend (3), so we could support both simple and more sophisticated use cases, but (A) I'm concerned that inline PodTemplateSpecs in ReplicationControllerSpec will create problems down the road for auth and for API refactoring and (B) kubecfg could paper over the complexity of dealing with multiple objects for now and a Real Config solution or higher-level API should be able to deal with it later.

So, I recommend (2): PodTemplate by reference only.

Inter-object references

This could (and probably should) be forked into its own issue if there's a lot of debate.

We don't currently have any cross-references between objects in our API. We just have indirect references via label selectors.

Possible options:

  1. Label selector.
  2. UID.
  3. JSON of identifying metadata: Kind, Namespace, Name.
  4. Partial object URL (e.g., path only, or path only without version).
  5. Full object URL.
  6. All of the above.
  7. Something else?

Using label selectors would require adding a unique label to facilitate unique references, which is sort of contrary to what labels are for, or a non-label tie-breaking field to select the correct one from the set. Additionally, the consistency model would be more complex -- after adding a new template, users would want to ensure that the replication controller would use it before performing an action that would cause new pods to be created, such as killing pods or increasing the replica count. This seems overly complex for only a small benefit.

UID has the problems that it isn't even indexed currently, would be hard for users to reason about, and couldn't be specified without additional communication with the apiserver and processing in the client. In particular, it would be hostile to configuration.

JSON would require another encoding for URL parameters.

Therefore, I suggest consistency with API object references from outside the system, so either (4) or (5). The reason to not use (5) is because the domain name and version are not necessarily stable (esp. if we replicate and/or self-host apiserver), so I recommend (4), path without API version. This form would be used both in URL parameters (pod creation from template) and in object fields (replication controller). This form also happens to be the most concise.

@smarterclayton

This comment has been minimized.

Contributor

smarterclayton commented Sep 24, 2014

Re: references, I agree with reasoning about 1, 2, 3, and 5. I would also agree with 4 as better than the alternatives. One problem with 4 is when we rename resources (minions -> nodes).

Re: templates, Some potential problems that could crop up with referenced pods:

  1. How does the replication controller validate that the provided pod template is safe if it can't read the template? We've got a few cases of that - label selector of controller needs to select the pods it creates, and RestartPolicy=always is the only allowed type. Does the replication controller need a new API endpoint of GET /pods that it can use as a validating oracle (return 1 if this attribute matches this label selector)?

  2. If the problem is replication controllers being able to create arbitrary pods, couldn't we also address problem by having replication controller controllers (the code that creates pods) pass a reference to the controller resource (/resourceControllers/1), and have the /pods endpoint handle reading the template out of the replication controller?

    Admittedly that means that the pods endpoint has to be able to decompose a resourceControllers response, and access it, which opens the door to other forms of injection style attacks. But it could also mean that any object which has a field "Spec" with type "PodTemplateSpec" can clone pods (assuming that we have a general solution for doing a reference -> endpoint GET, which I'm not positive we're at yet).

@kargakis

This comment has been minimized.

Member

kargakis commented Dec 16, 2016

so it appears I can create pod templates, but If I cannot use them in pods, what's their use?

DaemonSets and StatefulSets may use PodTemplates as a way of retaining history thus enabling rollbacks

@smarterclayton

This comment has been minimized.

Contributor

smarterclayton commented Dec 16, 2016

@bgrant0607

This comment has been minimized.

Member

bgrant0607 commented Jun 22, 2017

An alternative to fully moving the pod template out and replacing it with a reference to an external resource:

Create a pod template subresource for all the controllers (and even for CRDs eventually).

This would address the following use cases:

  • Create an API endpoint that could have different permissions
  • Create a uniform polymorphic API for pod template mutations, which could be used by kubectl set commands, for example
  • Facilitate rollouts of new ConfigMaps (#22368) by making it easier to update references from controller pod templates
  • Facilitate rollouts upon PodPreset changes, by enabling it to target the pod template subresource rather than pods directly

@pmorie @pwittrock @erictune

@bgrant0607

This comment has been minimized.

Member

bgrant0607 commented Jun 30, 2017

If we had API support for apply (#17333), PodPreset could mutate the template spec without changing the user's intent.

@fejta-bot

This comment has been minimized.

fejta-bot commented Dec 31, 2017

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@bgrant0607

This comment has been minimized.

Member

bgrant0607 commented Jan 23, 2018

/remove-lifecycle stale
/lifecycle frozen

@enisoc

This comment has been minimized.

Member

enisoc commented Apr 5, 2018

I'm not sure if admission plugins were already discussed as part of the motivation for potentially making use of the top-level PodTemplate resource, but here is a concrete use case where such a design would have been helpful:

#61886 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment