Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate the pod template from replicationController #170

Closed
bgrant0607 opened this issue Jun 19, 2014 · 78 comments
Closed

Separate the pod template from replicationController #170

bgrant0607 opened this issue Jun 19, 2014 · 78 comments
Labels
area/api Indicates an issue on api area. area/app-lifecycle kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/service-catalog Categorizes an issue or PR as relevant to SIG Service Catalog.

Comments

@bgrant0607
Copy link
Member

We should separate the pod template from replicationController, to make it possible to create pods from template without replicationController (e.g., for cron jobs, for deferred execution using hooks). This would also make updates cleaner.

@bgrant0607
Copy link
Member Author

If we were to remove replicationController from the core apiserver into a separate service, I'd leave the pod template in the core.

@bgrant0607 bgrant0607 changed the title Consider separating the pod template from replicationController Separate the pod template from replicationController Jul 11, 2014
@erictune
Copy link
Member

I'll take a shot at this.

@bgrant0607
@lavalamp
Please share any other thoughts on podTemplates.

Brian mentioned cron. This makes me think he wants to use podTemplates for delegation.
That is, is there some mechanism where principal A can define a /podTemplate, and then grant principal B permission to create /pods which derive from a certain /podTemplate, but which run as A. (I guess a replicationController is effectively "another principal" which A can delegate to?)

Will the delegation of power be part of the /podTemplate message, or will that be stored in some sideband ACL?
Does the PUT /pods method get extended to allow creating a /pod from a template instead of using the desiredState? Or is there a new non-REST method?

@lavalamp
Copy link
Member

PUT /pods should only take pods, IMO. We should potentially offer something that "fills out" a pod template, but I think that should work like our current replication controller, which is an external component.

@bgrant0607
Copy link
Member Author

The point of this proposal is to further narrow the responsibility and API of the replicationController to its bare essentials. The replicationController should just spawn new replicas.

Right now, essentially a full copy of the pod API is embedded in the replicationController API. As an external, independently versioned API, it would be challenging to keep synchronized with the core API. Additionally, with the replicationController creating pods by value rather than by reference, it needs to be delegated the authority to create ~arbitrary pods as ~arbitrary users (once we support multi-tenancy) -- this would mean it could do anything as anybody. This is even more of an issue once we introduce an auto-scaler layered on the replicationController.

It's very difficult to develop a signature-based approach layered on a literal pod creation API that could be made both usable and secure. OTOH, if the replicationController could only spawn instances from templates owned by the core, then it's power could be restricted. Think of this as the principle of least privilege and separation of concerns.

Including the template by reference in the replicationController API would also facilitate rollbacks to previous pod configurations. A standalone template could be used for cron and other forms of deferred execution.

Even if we were to add a more general templating/configuration-generation mechanism in the future, we can't have turtles all the way down. A pod template would be useful for spawning the config generator, among other things.

As with the current replicationController API, pods would have no relationship to the template from which they were generated other than their labels and any other provenance information we kept. Changes to the template would have no effect on pods created from it previously.

I'd be fine with a separate API endpoint for creating pods from a template, just as we have for replicationController today.

@erictune
Copy link
Member

Okay, putting together above comments and my own thoughts...

For the initial PR, we just need to have a pod template type which unambiguously and completely defines a /pod. Later PRs can extend /podTemplate as needed to support authorization of delegated use.

Here are a few examples with delegation and the types of expansion of templates that might occur:
- cron service runs a /pod, but passes to pod's environment a string identifying the datecode for this run.
- third-party auto-scaler makes more of a pod, or makes pods that request more or less resources.
- map-reduce service makes several pods, setting environment variables that control input and output file paths.
- ABTester service makes pods with two different values for an Environment variable that controls a new feature, and two different values of another Environment variable that controls a tag added to the logs of these pods (e.g. experiment_27354_mode_a)

Considerations for podTemplate:

  1. Ease and succinctness of definition of podTemplate
  2. Ease of reasoning about the security implications of giving a user permission to instantiate pods from a podTemplate
  3. Work with YAML as well as json.
  4. allow templates to generate many different kinds of pods.
    Item 4 seems much less important than 1 and 2. Therefore, this rules out a podTemplate which holds a schema definition, or jpath expression, or anything else which allows fully general manipulation of json-type data. Item 3 above further reinforces this.

Therefore, a /podTemplate will look something like this:

{ "id": "awesomePodTemplate",
  "pod": "<object exactly following /pod schema>", 
   "allowExtraEnvVars": [
     "MOTD": 
       "Today's pod brought to you by a replication controller."],
   "allowModifiedResourcesRequestsAndLimits": 1,
   "delegatedPodMakers": ["alice@example.com", "replicationcontroller@kubernetes.io"],
}

Note the specific, capability-like descriptions of allowed modifications to the /pod object.

However, the first PR will just have:

{ "id": "myPodTemplate",
  "pod": "<object exactly following /pod schema>", 
}

The /pod schema will get a new member "actsAsUser". This affects which user the pod acts as.
Initially, this will have no affect. As we add authentication (#443), the following authorization code can be added to the apiserver:

if authenticatedUser == request.pod.actAsUser { return auth.Authorized }
return auth.notAuthorized

In a later PRs, the /pod schema will be extended to have a "fromPodTemplateId" member which references the id of the /podTemplate that this /pod is modeled on. This adds an interesting twist: we can't use the user-provided name alone to identify the /podTemplate. We need to specify which user's namespace the name lies in. Maybe "actAsUser" identifies this or maybe we need a globally unique id for a podTemplate.

With that member added, the authorization check for creating a /pod would look like this:

if authenticatedUser == request.pod.actAsUser { return Authorized }
if auth.Can(authenticatedUser, auth.MakePodsFor, request.pod.actAsUser) {
    tpl := findPodTemplate(request.fromPodTemplateId)
    if tpl != nil {
      if tpl.Generates(request.pod) {
         return auth.Authorized
      }
    }
  }
}
return auth.NotAuthorized

@erictune
Copy link
Member

Other use case: pod's port can come from a range, to allow duplicate pods on the same host. Would this go in the template?

@lavalamp
Copy link
Member

I wonder if we should maybe add an "owner" field to the JSONBase, so that all objects in the system could have an owning user. If so, no need to specifically add that field to the PodTemplate.

In a later PRs, the /pod schema will be extended to have a "fromPodTemplateId" member which references the id of the /podTemplate that this /pod is modeled on. This adds an interesting twist: we can't use the user-provided name alone to identify the /podTemplate.

This could be done with a label, which is what our current replicationController does.

I think a step that should come shortly after adding PodTemplate as a resource is changing the replication controller struct to take a podTemplateID instead of a hardcoded PodTemplate.

Port shouldn't be dynamic.

May want @brendanburns to take a look at this when he gets back.

@erictune
Copy link
Member

On Thu, Jul 24, 2014 at 10:24 AM, Daniel Smith notifications@github.com
wrote:

I wonder if we should maybe add an "owner" field to the JSONBase, so that
all objects in the system could have an owning user. If so, no need to
specifically add that field to the PodTemplate.

In a later PRs, the /pod schema will be extended to have a
"fromPodTemplateId" member which references the id of the /podTemplate that
this /pod is modeled on. This adds an interesting twist: we can't use the
user-provided name alone to identify the /podTemplate.

This could be done with a label, which is what our current
replicationController does.

can a label selector select a different user's objects?

I think a step that should come shortly after adding PodTemplate as a
resource is changing the replication controller struct to take a
podTemplateID instead of a hardcoded PodTemplate.

Okay, but again the namespace/user issue is unresolved.

Port shouldn't be dynamic.

May want @brendanburns https://github.com/brendanburns to take a look
at this when he gets back.


Reply to this email directly or view it on GitHub
#170 (comment)
.

@bgrant0607
Copy link
Member Author

Thanks, @erictune .

First of all, while security is part of the motivation for this, I'd drop all user / identity / auth / delegation stuff until we figure out auth[nz] more generally. That said, we'll want to namespace label keys by project implicitly by default to prevent conflicts and overlap across users.

Second, we should leave out most/all forms of substitution and computation more generally. A more general config mechanism is a separate issue. I was thinking of take what replicationController supports today and moving it to a separate object, which we might want to garbage collect after some amount of time in the case that it hasn't been used.

However, I think it's not too early to think about the override model, and whether we want one eventually, even though we wouldn't implement it initially. Env. vars. and resources are good examples.

It would be useful to think about how splitting out the template (and overrides) would interact with updates driven by declarative configuration. Does the replicationController change to a new template, or does one update its template? How does one update pods controlled by the replicationController? Some ideas were discussed in #492 .

Duplicate pods on the same host: We implement IP per pod, so no port allocation range is necessary.

fromPodTemplateId: The template must behave as a cookie cutter -- once a pod is created from a template, it has no relationship to the template. The template may be changed or deleted without affecting pods created from it, and pods created from it may be modified independently. We probably do want to record provenance information for debugging and/or auditing, though. It would include information like the template id, time, replication controller id (if created by one), user, etc.

@smarterclayton
Copy link
Contributor

@bgrant0607 Can you describe config generators a bit more as mentioned in 146? Haven't heard you mention that yet, but I suspect it matches use cases we are looking to solve as well.

@bgrant0607
Copy link
Member Author

Created #1007 to start the broader config discussion.

#503 contains another example that could use the pod template: job controller.

I'd like a bulk-creation operation to go with the pod template, so that a replication controller could send one operation to create N pods. This will eventually be important for performance, gang scheduling, usage analytics, etc.

@bgrant0607 bgrant0607 added area/api Indicates an issue on api area. kind/design Categorizes issue or PR as related to design. labels Sep 19, 2014
@bgrant0607
Copy link
Member Author

@smarterclayton @erictune @lavalamp

Trying to make this concrete.

Standalone Pod Template

From #1225:

type PodTemplate struct {
    ObjectType `json:",inline" yaml:",inline"`
    Metadata   ObjectMetadata `json:"metadata,omitempty" yaml:"metadata,omitempty"`

    // Spec describes what a pod should look like.
    Spec PodSpec `json:"spec,omitempty" yaml:"spec,omitempty"`
}

It should also have a Status PodTemplateStatus, for consistency with all other API objects. I could imagine recording status data like timestamp of last pod created (e.g., if we wanted to put a TTL on template objects).

There is the question of whether we want metadata like labels and annotations to come from the template or to be provided at pod instantiation time or both. Taking metadata from the template is easier to use and more secure. Providing metadata at instantiation time would allow more flexible template reuse. I'm going to declare that flexible template is a problem for the config system, not the PodTemplate, so I recommend we should take metadata from the PodTemplate.

The Metadata in the struct above is the PodTemplate's metadata. Typically, the pods created from the template will have the same labels and at least some of the same annotations, but may have additional annotations, such as to record the template from which they were created. However, for cleanliness (and flexibility, also), I recommend a separate field for pod metadata, PodMetadata ObjectMetadata.

I could also foresee us adding more fields in the future, such as authorization info, TTL, etc.

Therefore, I propose a PodTemplateSpec, which includes PodMetadata, PodSpec, and whatever other desired state fields we need.

type PodTemplateSpec struct {
        // Metadata of the pods created from this template.
    Metadata   ObjectMetadata `json:"metadata,omitempty" yaml:"metadata,omitempty"`

    // Spec describes what a pod should look like.
    Spec PodSpec `json:"spec,omitempty" yaml:"spec,omitempty"`
}

type PodTemplate struct {
    ObjectType `json:",inline" yaml:",inline"`
    Metadata   ObjectMetadata `json:"metadata,omitempty" yaml:"metadata,omitempty"`

    // Spec describes what a pod should look like.
    Spec PodTemplateSpec `json:"spec,omitempty" yaml:"spec,omitempty"`

    // Status represents the current information about a PodTemplate. 
    Status PodTemplateStatus `json:"status,omitempty" yaml:"status,omitempty"`
}

Bulk Pod Creation

By themselves, PodTemplates don't do anything. They are there to be used. I propose to extend POST /pods with 2 URL parameters:

  1. number=<int>: The number of pods to create. When number > 1, the Name, if provided, is treated as a prefix, to which some uniquifying characters are appended for each pod. If Name is not provided, it is auto-generated according to our general approach to this (e.g., autosetName=true might be necessary).
  2. template=<reference to PodTemplate>: Take the pod's metadata and spec from the specified PodTemplate rather than from the json payload. If the specified PodTemplate doesn't exist, that's an error. The client should be able to use resourceVersion preconditions to ensure they're using a sufficiently up-to-date PodTemplate.

More about the format of object references below.

Replication Controller

Currently (even in the v1beta3 proposal), ReplicationControllerSpec contains an inline Template PodTemplate. It's awkward to nest a full-fledged object in another object, so at minimum this should be PodTemplateSpec instead.

There are 3 alternative approaches to using a PodTemplate in ReplicationController:

  1. Use a POST URL parameter template=<reference to PodTemplate>, similar to pod creation, which would be copied into the PodTemplateSpec in the ReplicationControllerSpec at creation time.
  2. Replace the inline PodTemplateSpec with a reference to a PodTemplate in the ReplicationControllerSpec.
  3. Support both (i.e., one of) the inline PodTemplateSpec and reference to a PodTemplate, the former for simplicity and the latter for all the other reasons we'd like to do this. We could also support (1) in this case.

In order to produce the decoupling and security properties I was looking for when I proposed this issue, the replication controller service needs to be able to utilize the template at pod creation time rather than at the time the replication controller is created. Therefore, the ReplicationControllerSpec needs a reference to the PodTemplate. This has the disadvantage of creating a hard dependency between two objects -- the replication controller could fail if its pod template were deleted -- but we could disallow deletion of PodTemplates that were in use. We already have another creation-order dependency -- services must be created before their clients -- so that wouldn't be a new issue.

I'm tempted to recommend (3), so we could support both simple and more sophisticated use cases, but (A) I'm concerned that inline PodTemplateSpecs in ReplicationControllerSpec will create problems down the road for auth and for API refactoring and (B) kubecfg could paper over the complexity of dealing with multiple objects for now and a Real Config solution or higher-level API should be able to deal with it later.

So, I recommend (2): PodTemplate by reference only.

Inter-object references

This could (and probably should) be forked into its own issue if there's a lot of debate.

We don't currently have any cross-references between objects in our API. We just have indirect references via label selectors.

Possible options:

  1. Label selector.
  2. UID.
  3. JSON of identifying metadata: Kind, Namespace, Name.
  4. Partial object URL (e.g., path only, or path only without version).
  5. Full object URL.
  6. All of the above.
  7. Something else?

Using label selectors would require adding a unique label to facilitate unique references, which is sort of contrary to what labels are for, or a non-label tie-breaking field to select the correct one from the set. Additionally, the consistency model would be more complex -- after adding a new template, users would want to ensure that the replication controller would use it before performing an action that would cause new pods to be created, such as killing pods or increasing the replica count. This seems overly complex for only a small benefit.

UID has the problems that it isn't even indexed currently, would be hard for users to reason about, and couldn't be specified without additional communication with the apiserver and processing in the client. In particular, it would be hostile to configuration.

JSON would require another encoding for URL parameters.

Therefore, I suggest consistency with API object references from outside the system, so either (4) or (5). The reason to not use (5) is because the domain name and version are not necessarily stable (esp. if we replicate and/or self-host apiserver), so I recommend (4), path without API version. This form would be used both in URL parameters (pod creation from template) and in object fields (replication controller). This form also happens to be the most concise.

@smarterclayton
Copy link
Contributor

Re: references, I agree with reasoning about 1, 2, 3, and 5. I would also agree with 4 as better than the alternatives. One problem with 4 is when we rename resources (minions -> nodes).

Re: templates, Some potential problems that could crop up with referenced pods:

  1. How does the replication controller validate that the provided pod template is safe if it can't read the template? We've got a few cases of that - label selector of controller needs to select the pods it creates, and RestartPolicy=always is the only allowed type. Does the replication controller need a new API endpoint of GET /pods that it can use as a validating oracle (return 1 if this attribute matches this label selector)?

  2. If the problem is replication controllers being able to create arbitrary pods, couldn't we also address problem by having replication controller controllers (the code that creates pods) pass a reference to the controller resource (/resourceControllers/1), and have the /pods endpoint handle reading the template out of the replication controller?

    Admittedly that means that the pods endpoint has to be able to decompose a resourceControllers response, and access it, which opens the door to other forms of injection style attacks. But it could also mean that any object which has a field "Spec" with type "PodTemplateSpec" can clone pods (assuming that we have a general solution for doing a reference -> endpoint GET, which I'm not positive we're at yet).

@bgrant0607
Copy link
Member Author

An alternative to fully moving the pod template out and replacing it with a reference to an external resource:

Create a pod template subresource for all the controllers (and even for CRDs eventually).

This would address the following use cases:

  • Create an API endpoint that could have different permissions
  • Create a uniform polymorphic API for pod template mutations, which could be used by kubectl set commands, for example
  • Facilitate rollouts of new ConfigMaps (Facilitate ConfigMap rollouts / management #22368) by making it easier to update references from controller pod templates
  • Facilitate rollouts upon PodPreset changes, by enabling it to target the pod template subresource rather than pods directly

@pmorie @pwittrock @erictune

@bgrant0607
Copy link
Member Author

If we had API support for apply (#17333), PodPreset could mutate the template spec without changing the user's intent.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 31, 2017
@bgrant0607
Copy link
Member Author

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 23, 2018
@spiffxp spiffxp removed the triaged label Mar 16, 2018
@enisoc
Copy link
Member

enisoc commented Apr 5, 2018

I'm not sure if admission plugins were already discussed as part of the motivation for potentially making use of the top-level PodTemplate resource, but here is a concrete use case where such a design would have been helpful:

#61886 (comment)

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/enhancement labels Jun 5, 2018
b3atlesfan pushed a commit to b3atlesfan/kubernetes that referenced this issue Feb 5, 2021
backend/awsvpc: allow RBAC for instances
@MadhavJivrajani
Copy link
Contributor

/remove-kind design
/kind feature

kind/design will soon be removed from k/k in favor of kind/feature. Relevant discussion can be found here: kubernetes/community#5641

@k8s-ci-robot k8s-ci-robot removed the kind/design Categorizes issue or PR as related to design. label Jun 29, 2021
@bgrant0607
Copy link
Member Author

This was the original design (sunit prototype = pod template, replicate = replica set):
https://twitter.com/bgrant0607/status/1121058263959654400

But it can be addressed through a combination of type imports and duck typing.

I don't think this will happen, so closing.
/close

@k8s-ci-robot
Copy link
Contributor

@bgrant0607: Closing this issue.

In response to this:

This was the original design (sunit prototype = pod template, replicate = replica set):
https://twitter.com/bgrant0607/status/1121058263959654400

But it can be addressed through a combination of type imports and duck typing.

I don't think this will happen, so closing.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Indicates an issue on api area. area/app-lifecycle kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/service-catalog Categorizes an issue or PR as relevant to SIG Service Catalog.
Projects
None yet
Development

No branches or pull requests