
Shared Pod Storage (e.g. Config Storage) #6923

Closed
invino4 opened this issue Apr 16, 2015 · 11 comments
Labels
area/extensibility, priority/backlog, sig/api-machinery

Comments

@invino4
Contributor

invino4 commented Apr 16, 2015

I am working on a Kubernetes extension that would run in its own pod but needs to store some information (e.g. configuration) with requirements similar to the data stored in the API Server. In particular, it would be nice if the storage had the following properties (a sketch of how a client might use such a service follows the list):

  • Durable: Survives pod restarts, migrations, and upgrades.
  • Consistent: Even if there are multiple pods writing the same resource concurrently, consistency is maintained.
  • Addressable: When a pod starts it can find the storage allocated for its application.
  • Isolated: Different applications don't accidentally write over each other's data even if they use the same names, paths, keys, etc.
  • Semi-structured: Something like JSON-store would be fine.
  • Service Oriented: Ideally this is provided as a simple RESTful service such that it is accessible from any operating environment and doesn't require linking any specific client library.
  • Timestamped: Each resource has a logical-clock timestamp (e.g. an ETag) that can be used for optimistic concurrency control, ideally without imposing any particular versioning scheme or strategy on the resources themselves.
  • Watch Support: It is possible to watch or be notified of updates to resources (or even a small set of resources, e.g. subfolders).
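
To make the requirements concrete, here is a minimal sketch of how an extension might talk to such a service. The endpoint, paths, and payload are entirely hypothetical (no such Kubernetes API exists); the point is only to show the Timestamped/Consistent requirements expressed as an ETag plus an If-Match conditional write.

```go
// Hypothetical client for the storage service described above.
// The endpoint and paths are illustrative only.
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

const base = "http://config-store.example/v1/apps/my-extension" // hypothetical

func main() {
	// Read the current config and remember its logical-clock timestamp (ETag).
	resp, err := http.Get(base + "/settings")
	if err != nil {
		panic(err)
	}
	etag := resp.Header.Get("ETag")
	resp.Body.Close()

	// Write back conditionally: the server rejects the update (412) if
	// someone else modified the resource since we read it.
	req, _ := http.NewRequest(http.MethodPut, base+"/settings",
		bytes.NewBufferString(`{"logLevel":"debug"}`))
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("If-Match", etag)

	resp, err = http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusPreconditionFailed {
		fmt.Println("conflict: re-read and retry")
		return
	}
	fmt.Println("updated, status:", resp.Status)
}
```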

Possibilities:

Use Volumes

Applications can always mount a volume and write their own storage to files (see the sketch after this list). This is not ideal for a few reasons:

  • It requires a lot of reimplementation for each application.
  • Disks that survive cluster and pod restarts are not a guaranteed part of all Kubernetes deployments, making apps more difficult to deploy and move between Kubernetes environments.
  • Persistent disks (PDs) can have undesirable limitations (e.g. limits on how many pods can mount them simultaneously).
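
As a concrete (hypothetical) example of the reimplementation burden, here is a minimal Go sketch of an app persisting its own JSON config to a volume mounted at an assumed path. The atomic write is easy enough; durability of the disk and cross-pod consistency are still unsolved.

```go
// Minimal sketch of the "bring your own storage on a volume" approach:
// the app serializes its config as JSON under a mounted path and writes
// it atomically (write to a temp file, then rename). The mount path is
// an assumption for illustration.
package main

import (
	"encoding/json"
	"os"
	"path/filepath"
)

type Config struct {
	LogLevel string `json:"logLevel"`
	Replicas int    `json:"replicas"`
}

const dir = "/mnt/my-extension-state" // hypothetical volume mount

func save(cfg Config) error {
	data, err := json.MarshalIndent(cfg, "", "  ")
	if err != nil {
		return err
	}
	tmp := filepath.Join(dir, ".config.json.tmp")
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	// Rename is atomic on the same filesystem, so readers never see a
	// half-written file. Concurrent writers in other pods are not handled.
	return os.Rename(tmp, filepath.Join(dir, "config.json"))
}

func main() {
	if err := save(Config{LogLevel: "info", Replicas: 3}); err != nil {
		panic(err)
	}
}
```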

Have applications run their own storage as a pod

Running your own etcd cluster is an obvious choice. However, running storage as a pod is actually quite tricky. Durability and consistency are difficult to provide and are often better maintained by a dedicated team that knows about backup and recovery. Durability itself requires some off-pod storage, which brings you back to volumes. Storage pods that are restarted need to rediscover and reattach the persistent storage used by a previous instance even though their identities are not related in any way (see Nominal Services).
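
For illustration, the application side of this option might look roughly like the following. It uses the etcd v3 Go client (the v2 HTTP API was current when this issue was filed); the service name and key prefix are assumptions, and none of the durability concerns above are solved by the client code.

```go
// Sketch of an extension talking to its own etcd pod via the etcd v3 Go
// client (go.etcd.io/etcd/client/v3). The service name is hypothetical.
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://my-extension-etcd:2379"}, // hypothetical service
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx := context.Background()
	if _, err := cli.Put(ctx, "/my-extension/config/logLevel", "debug"); err != nil {
		panic(err)
	}

	// Watch the extension's key prefix for changes.
	for resp := range cli.Watch(ctx, "/my-extension/config/", clientv3.WithPrefix()) {
		for _, ev := range resp.Events {
			fmt.Printf("%s %s = %s\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```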

API Server's Etcd Cluster

The etcd cluster used by the API Server provides almost all of these features, with the exception of isolation. One possibility would be to expose a thin shim service at the API Server that wraps its underlying etcd cluster and re-exposes scoped portions of its namespace to pods. The shim would enforce scoping and authorization. The shim could also adapt the etcd watch interface to be more consistent with the semantics exposed by the API Server itself.
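
A rough sketch of what the scoping part of such a shim could look like follows, with a placeholder store interface and a placeholder authentication helper. This is not an actual Kubernetes component; it only illustrates forcing every key under a per-namespace prefix so one application cannot name its way into another's data.

```go
// Rough sketch of the "thin shim" idea: an HTTP layer that rewrites each
// request onto a namespace-scoped key prefix before it reaches the
// backing store. The store interface and auth helper are placeholders.
package main

import (
	"errors"
	"io"
	"log"
	"net/http"
	"path"
)

// KV is a placeholder for whatever backs the shim (e.g. the apiserver's
// etcd); only the scoping logic is the point of this sketch.
type KV interface {
	Get(key string) ([]byte, error)
	Put(key string, value []byte) error
}

type shim struct {
	store KV
}

// namespaceFor stands in for real authentication/authorization; it would
// map the caller's credentials to the namespace they may use.
func namespaceFor(r *http.Request) (string, bool) {
	ns := r.Header.Get("X-Namespace") // placeholder credential
	return ns, ns != ""
}

func (s *shim) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	ns, ok := namespaceFor(r)
	if !ok {
		http.Error(w, "unauthorized", http.StatusUnauthorized)
		return
	}
	// Every key is forced under /extensions/<namespace>/..., regardless of
	// what the client asked for: this is the isolation property.
	key := path.Join("/extensions", ns, path.Clean("/"+r.URL.Path))

	switch r.Method {
	case http.MethodGet:
		val, err := s.store.Get(key)
		if err != nil {
			http.Error(w, err.Error(), http.StatusNotFound)
			return
		}
		w.Write(val)
	case http.MethodPut:
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if err := s.store.Put(key, body); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

// memKV is a throwaway in-memory store so the sketch runs; it is not
// safe for concurrent use.
type memKV map[string][]byte

func (m memKV) Get(key string) ([]byte, error) {
	v, ok := m[key]
	if !ok {
		return nil, errors.New("not found")
	}
	return v, nil
}

func (m memKV) Put(key string, value []byte) error {
	m[key] = value
	return nil
}

func main() {
	log.Fatal(http.ListenAndServe(":8080", &shim{store: memKV{}}))
}
```

Timestamps and watch support would ride on whatever the backing store provides, which is part of why reusing the API Server's machinery is attractive.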

Pros: This possibility unifies the problems of durability, consistency, and addressability with the API Server's nearly identical requirements. Since any deployment must already solve these issues for the API Server, it doesn't create any additional burden. In hosted environments (e.g. GKE) the host provides etcd and its durable storage in a way that is independent of the Kubernetes model, allowing backup, restore, and survivability to be implemented in ways that would be difficult if run directly as a pod. The API Server already implements a sophisticated RESTful web service endpoint with authentication, authorization, timestamping, and watch support. This logic could be shared by the shim without additional complexity or duplication.

Cons: Opening up the API Server's etcd cluster to third-party applications (even if the shim correctly implements isolation) will create additional load on both the API Server and etcd. This could affect the API Server's responsiveness and scalability in ways that may be difficult to predict. Bugs in the shim or etcd might expose data from the master or from other applications to corruption or deletion.

@erictune
Member

@smarterclayton can you comment on this?

@smarterclayton
Contributor

Will do


@smarterclayton
Contributor

We've talked about exposing "Etcd-as-a-service" - allowing clients to request an endpoint with which they can interact using etcd client tools directly, but secured and provisioned dynamically. A subset of the keyspace would be carved up for each use case and offered on a distinct endpoint. One advantage would be the ability to scale that out - for simple systems it could reuse the apiserver, for larger systems it could be sharded and decoupled.
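
For illustration, the "carve up a subset of the keyspace per use case" part of this could be expressed with etcd role-based access control. The sketch below uses the current etcd v3 Go client and its auth API (which postdate this discussion); the endpoint, tenant name, and password handling are all made up, and it assumes auth is enabled on the cluster.

```go
// Sketch of provisioning a scoped etcd endpoint: one role and one user
// per tenant, limited to that tenant's key prefix, created by a broker
// with admin access. All names are hypothetical.
package main

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Admin connection used by the provisioner.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://etcd-broker:2379"}, // hypothetical
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx := context.Background()
	prefix := "/tenants/my-extension/"

	// One role per tenant, limited to that tenant's key prefix.
	if _, err := cli.RoleAdd(ctx, "my-extension"); err != nil {
		panic(err)
	}
	if _, err := cli.RoleGrantPermission(ctx, "my-extension",
		prefix, clientv3.GetPrefixRangeEnd(prefix),
		clientv3.PermissionType(clientv3.PermReadWrite)); err != nil {
		panic(err)
	}

	// One user per tenant; its credentials would be handed to the pod
	// (e.g. via a secret) along with the endpoint.
	if _, err := cli.UserAdd(ctx, "my-extension", "generated-password"); err != nil {
		panic(err)
	}
	if _, err := cli.UserGrantRole(ctx, "my-extension", "my-extension"); err != nil {
		panic(err)
	}
}
```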

Once volume-as-a-service lands (persistent volumes), it should be progressively easier to do things like run Gluster and use shared, secured mounts (service accounts and security contexts will let you allocate unique Unix UIDs for each namespace as needed, for sharing remote storage).

Single-server etcd should become much easier when PDs are in place - you should easily be able to run small etcd servers that are resilient to failure (perhaps not clustered) at low cost (40-50 MB each?)

Trying to think of other ideas we've covered...

@thockin
Member

thockin commented Apr 17, 2015

The nascent config API object?


@smarterclayton
Contributor

I think it would be an exact match for the etcd API, and be targeted at etcd. I feel something is wrong about mutating the etcd API to be even more generic, mostly because then people need custom clients in every language.


@bgrant0607
Member

Lots of related threads: #991, #1553, #1627, #2068, #6477. If it's truly an extension of Kubernetes, we do plan to provide generic storage for API plugins (#991).

Exposing etcd directly would be problematic if we ever were to support another storage backend: #1957.

@thockin
Member

thockin commented Apr 17, 2015

Yeah, if we expose etcd it should be as a service, not because it happens to already be present.


@smarterclayton
Contributor

Yeah, I'm thinking of an optional API endpoint, either provisioned automatically for clients or available on request.

In the long term, this is effectively the service broker pattern - request an instance of X (where X can be anything) provisioned and attached to your namespace (external service created, environment set on an RC, secrets set in the namespace that grant access). You can then talk to your local service to access a remote resource without having to know anything about the implementation of that resource. This is common in PaaS environments.


@bgrant0607
Member

PaaSes give me the impression that there's a hard distinction between Apps and Services -- the former are run on the PaaS while the latter are run using an underlying IaaS orchestration layer. I think it's great if we can support that, but even better if applications that would be run as Services in a PaaS environment could be run on Kubernetes. In this case, we definitely want to be able to run etcd clusters on Kubernetes, using features such as nominal services (#260).

That said, I'd like extensions/plugins to be on equal footing with core objects. I don't see a compelling reason to not allow them to use the same etcd instance (via the apiserver) so long as we apply proper per-user (not per-plugin) storage quotas and request rate limits. In my experience, users can't really do more damage than they'd do with just pods alone: out-of-control replication, crash loops, enormous (multi-megabyte) command line args or env var lists, ...

As for watch scaling, I fully expect that eventually virtually every container running in the cluster will have multiple watches active simultaneously. We're going to need a replicated fanout layer to handle that.

@smarterclayton
Contributor


Agree - it should be easy to bridge the distinction, and we should have concepts that blur the line between:

  • Allocate me an app/service/instance outside the cluster
  • Allocate me an app/service/instance inside the cluster (but I don't want to know about the details)
  • I want to allocate a new thing inside the cluster

The tools and interaction for something running in the cluster to do all of the above should be identical - only the provisioning changes.


@davidopp davidopp self-assigned this Apr 21, 2015
@thockin thockin added kind/support and removed kind/support, priority/support labels May 19, 2015
@bgrant0607 bgrant0607 added area/app-lifecycle, area/extensibility, sig/api-machinery, kind/enhancement, priority/backlog and removed kind/support labels Jun 30, 2015
@ghost ghost removed the team/master label Aug 20, 2015
@bgrant0607 bgrant0607 added this to the v1.2-candidate milestone Sep 12, 2015
@thockin
Member

thockin commented Apr 25, 2016

Closing for lack of activity and because we have ConfigMap now.
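
For readers landing here later, a rough sketch of the ConfigMap route using client-go: an extension reads and watches its own ConfigMap through the API server, which covers the durability, addressability, and watch requirements from the original request. The namespace and ConfigMap name are made up.

```go
// Sketch of the ConfigMap approach that closed this issue: read and
// watch a ConfigMap via the apiserver with client-go. Names are made up.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster config: uses the pod's service account.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	ctx := context.Background()
	cm, err := client.CoreV1().ConfigMaps("my-extension").Get(ctx, "settings", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("logLevel:", cm.Data["logLevel"])

	// Watch for changes; resourceVersion provides the optimistic-concurrency
	// timestamp asked for in the original request.
	w, err := client.CoreV1().ConfigMaps("my-extension").Watch(ctx, metav1.ListOptions{
		FieldSelector: "metadata.name=settings",
	})
	if err != nil {
		panic(err)
	}
	for ev := range w.ResultChan() {
		fmt.Println("event:", ev.Type)
	}
}
```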
