
Pod migration #3949

Open · bgrant0607 opened this issue Jan 29, 2015 · 19 comments

@bgrant0607 (Member) commented Jan 29, 2015

Filing this issue for discussion and tracking, since it has come up a number of times.

Starting with background:

Pods are scheduled, started, and eventually terminate. They are replaced with new pods by a replication controller (or some other controller, once we add more controllers). That's both reality and the model. Today pods are replaced reactively, but eventually controllers will replace pods proactively for planned moves. We currently do not preempt pods in order to schedule other pods, and likely won't for some time.

Currently, new pods have no obvious relationship to the pods they replace. They have different names, different uids, different IP addresses, different hostnames (since we set the pod hostname to pod name), and newly initialized volumes.

Replication controllers themselves are not durable objects. They are tied to deployments. New deployments create new replication controllers. This simplifies sophisticated deployment and rollout strategies without making simple scenarios complex. Both rollout tools/components and auto-scaling will deal with groups of replication controllers.

Naming/discovery is addressed using services, DNS, and the Endpoints API. The evolution of these mechanisms is being discussed in #2585.

This is a flexible model that facilitates transparency, simplifies handling of inevitable distributed-systems scenarios, facilitates high availability, and facilitates dynamic deployment and scaling.

But the model is not without issues. The main ones are:

  1. Data durability
  2. Self-discovery
  3. Work/role assignment

Data durability is being discussed in the persistent storage proposal #3318. We will also need to address it for local storage, but local storage is less relevant to "migration" anyway, since it can't feasibly be migrated. For remote storage, it will be possible to detach the devices and reattach them to new pods/hosts.

Self-discovery: Pods know their IP addresses, but currently do not know the names nor IPs of services targeting them. This will be solved by the service redesign #2585 and downward API #386.

Work/role assignment: We encourage dynamic role assignment: master election, fine-grained locking, sharding, task queues, pubsub, etc. That said, some servers are "pet-like", particularly those requiring large amounts of persistent storage. Many of these are replicated and/or sharded, with application-specific clustering implementations that tie together names/addresses and persistent data. We've discussed a concept tentatively called "nominal services" #260 to stably assign names and IP addresses to individual pods, and we aim to address that in the service redesign #2585.

So, do we need "pod migration", and, if so, what should it mean? I think it minimally should mean that the replacement pod has the same hostname, IP address, and storage.

We should aim to minimize disruption for high-availability servers. What could we do, besides the things planned above?

  • Don't use the pod name as the host name. Associate a name with the IP address instead (e.g., by hashing the address; see the sketch after this list). Pod names created by replication controllers aren't currently predictable, so this wouldn't be a regression.
  • Migrate the pod IP address. Currently pod IP addresses are statically partitioned among hosts and are not migratable. This would likely be problematic on some cloud providers with the way we're currently configuring routing, but could be done with an overlay network.
  • Lifecycle hooks pre- and post-migration.
  • Actual live state transfer via CRIU (http://criu.org/Docker)
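A minimal sketch of the first option above, deriving a stable hostname from the pod IP. The helper name and the "pod-" prefix are made up for illustration; nothing in Kubernetes defines them:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hostnameForIP derives a stable, DNS-safe hostname from a pod IP.
// Any pod (original or replacement) that ends up with the same IP gets
// the same hostname, so the hostname no longer leaks the pod's object name.
func hostnameForIP(ip string) string {
	sum := sha256.Sum256([]byte(ip))
	// A short prefix of the digest keeps the name readable while making
	// collisions unlikely within a single cluster's pod CIDR.
	return "pod-" + hex.EncodeToString(sum[:4])
}

func main() {
	fmt.Println(hostnameForIP("10.244.1.17")) // prints "pod-" plus 8 hex characters
}
```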

With respect to Kubernetes objects, the "migrated" pod would still be a new pod with a new name and uid. The orchestration of the migration would be performed by a controller -- possibly an enhanced replication controller, perhaps in collaboration with a network controller to move the address, similar to the separation of concerns in the persistent storage proposal. During the migration process, the old and new pods would coexist, and that coexistence would be visible to clients of the Kubernetes API, but the application being migrated and its clients would not need to be aware of the migration.
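Purely to illustrate that separation of concerns, a sketch of the sequencing in Go. Every interface and method below is hypothetical; no such controllers exist in Kubernetes:

```go
package migrationsketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
)

// podController and networkController are hypothetical interfaces standing
// in for an enhanced replication controller and a network controller.
type podController interface {
	CreateReplacement(ctx context.Context, old *corev1.Pod, targetNode string) (*corev1.Pod, error)
	TransferState(ctx context.Context, old, replacement *corev1.Pod) error // e.g. CRIU checkpoint/restore
	Delete(ctx context.Context, pod *corev1.Pod) error
}

type networkController interface {
	MoveAddress(ctx context.Context, ip string, to *corev1.Pod) error
}

// migratePod sketches the flow described above: the old and new pods
// coexist while state and the address move, then the old pod is deleted.
// The new pod keeps its new name and UID throughout.
func migratePod(ctx context.Context, pc podController, nc networkController, old *corev1.Pod, targetNode string) error {
	replacement, err := pc.CreateReplacement(ctx, old, targetNode)
	if err != nil {
		return err
	}
	if err := pc.TransferState(ctx, old, replacement); err != nil {
		return err
	}
	if err := nc.MoveAddress(ctx, old.Status.PodIP, replacement); err != nil {
		return err
	}
	return pc.Delete(ctx, old)
}
```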

/cc @smarterclayton @thockin @alex-mohr

@thockin (Member) commented Jan 30, 2015

Fixing the hostname seems like an obvious change. We should consider where else that information might pop up. In a different issue we are discussing allowing users to expose their own pod UID and name as custom env vars. This would still be safe as long as we don't do live migration. We solved this internally by allocating a virtual UID at pod creation time which travels across migrations (live or otherwise), but is allocated as a UUID. When a migrating controller knows that a new pod is a migration, it sets the VUID of the new pod.

I don't know if we will ever really get to pervasive live migration, but I hope so.
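A rough sketch of that virtual-UID idea, assuming a made-up pod annotation as the carrier. The annotation key and the helper are illustrative only; the UUID helper from apimachinery is real:

```go
package migrationsketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/uuid"
)

// vuidAnnotation is a hypothetical annotation key; nothing in Kubernetes
// defines it today.
const vuidAnnotation = "pod.alpha.kubernetes.io/virtual-uid"

// ensureVUID assigns newPod a virtual UID. If oldPod is non-nil, the new
// pod is a migration of it and the VUID is carried over; otherwise a fresh
// UUID is allocated. The real pod UID still changes across migrations.
func ensureVUID(newPod, oldPod *corev1.Pod) {
	if newPod.Annotations == nil {
		newPod.Annotations = map[string]string{}
	}
	if oldPod != nil {
		if v, ok := oldPod.Annotations[vuidAnnotation]; ok {
			newPod.Annotations[vuidAnnotation] = v
			return
		}
	}
	newPod.Annotations[vuidAnnotation] = string(uuid.NewUUID())
}
```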


@bgrant0607 (Member, Author) commented Jan 30, 2015

I'd make it a requirement that anything using the Kubernetes API to introspect or manage its own pods must be migration-aware. Such applications could use the post-migration hook to get the pod's new name.

Why would someone want the uid? What issue is that? (Other than #386.)

@smarterclayton (Contributor) commented Feb 26, 2015

They want the uid as a unique instance identifier. We could give them anything, but self-registration into an external system is one use case ("I'm pod foo, serving X, here's my unique identifier").

@gaocegege (Contributor) commented Apr 19, 2017

Hi, I'm interested in pod live migration.

Is there any progress? Docker has built-in checkpoint/restore (C/R) operations in experimental mode; could we develop a feature based on that?
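For reference, a minimal sketch of what Docker's experimental checkpoint/restore looks like from the Go SDK (option types vary a bit across SDK versions; the container IDs and checkpoint name are placeholders, and copying the checkpoint data between hosts is left out):

```go
package main

import (
	"context"
	"log"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		log.Fatal(err)
	}

	// Checkpoint the running container (requires the daemon's experimental
	// mode and CRIU installed on the host).
	err = cli.CheckpointCreate(ctx, "old-container", types.CheckpointCreateOptions{
		CheckpointID: "migrate-1",
		Exit:         true, // stop the container once the checkpoint is written
	})
	if err != nil {
		log.Fatal(err)
	}

	// ...transfer the checkpoint data to the destination host out of band...

	// Restore into a container created from the same image on the
	// destination host.
	err = cli.ContainerStart(ctx, "new-container", types.ContainerStartOptions{
		CheckpointID: "migrate-1",
	})
	if err != nil {
		log.Fatal(err)
	}
}
```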

@thockin (Member) commented Apr 19, 2017

@warmchang (Contributor) commented Aug 18, 2017

Pod live migration? I don't think it's necessary. You can use a Service to shield clients from the destruction and recreation of pods.

@ktosiek commented Aug 18, 2017

@warmchang Live migration would let one reduce downtime for services that don't support failover and have a long startup time.
JIRA and Jenkins would be examples of such services.

You can run them under a single-instance StatefulSet (not under an RC, as RCs do not guarantee stopping the old instance before starting a new one), but then you'll see those slow restarts (potentially multiple ones) on a rolling cluster reboot.
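For illustration, a single-replica StatefulSet along those lines, written against the client-go API types. The names, image, and pre-existing PVC are placeholders:

```go
package example

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// singleInstanceStatefulSet returns a one-replica StatefulSet for a
// service like Jenkins: the pod keeps a stable identity (jenkins-0) and
// reuses an existing volume, but a node drain still means a full restart.
func singleInstanceStatefulSet() *appsv1.StatefulSet {
	replicas := int32(1)
	labels := map[string]string{"app": "jenkins"}
	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: "jenkins"},
		Spec: appsv1.StatefulSetSpec{
			ServiceName: "jenkins", // headless Service assumed to exist
			Replicas:    &replicas,
			Selector:    &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "jenkins",
						Image: "jenkins/jenkins:lts",
						VolumeMounts: []corev1.VolumeMount{{
							Name:      "home",
							MountPath: "/var/jenkins_home",
						}},
					}},
					Volumes: []corev1.Volume{{
						Name: "home",
						VolumeSource: corev1.VolumeSource{
							PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
								ClaimName: "jenkins-home", // pre-created PVC, placeholder name
							},
						},
					}},
				},
			},
		},
	}
}
```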

@fejta-bot commented Jan 3, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@craigbox commented Jan 3, 2018

/lifecycle frozen

@bgrant0607 (Member, Author) commented Jan 23, 2018

/remove-lifecycle stale

@WeiZhang555 commented Aug 8, 2018

Hi everyone, is there any progress or more discussion around this? I am a developer from kata-containers, and I am currently investigating live migration for K8s + Kata Containers.

With Kata Containers, we can run K8s on bare metal and run serverless containers, because each container is also a VM/server. In this design we need to support live migration of the VM-based containers for stateful workloads; one example scenario is an OS upgrade, where we need to minimize downtime for the workloads.

I know that this touches a lot of things, including the runtime (dockershim, CRI-O, or cri-containerd), networking, and storage.
From kata-containers' perspective, do you have a preference for how migration should be defined in the CRI?

PS: I am trying to draw the data flow diagram for live migration, but I want to hear your ideas first, as I'm not as familiar with K8s design and implementation as you are.

Thank you!

cc @bgrant0607
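To make the question concrete, one hypothetical shape such a CRI extension could take. None of these methods exist in the CRI today; this is only a sketch of the kind of interface being asked about:

```go
package criext

import "context"

// ContainerMigration is a hypothetical extension to the CRI. The idea is
// that the kubelet drives checkpoint on the source node and restore on the
// destination, while the runtime (e.g. Kata Containers via VM migration,
// or runc via CRIU) decides how state is actually captured and moved.
type ContainerMigration interface {
	// CheckpointContainer dumps the container's state to checkpointPath
	// and optionally stops the container afterwards.
	CheckpointContainer(ctx context.Context, containerID, checkpointPath string, stop bool) error

	// RestoreContainer creates a container inside podSandboxID from a
	// previously written checkpoint instead of starting it fresh.
	RestoreContainer(ctx context.Context, podSandboxID, checkpointPath string) (containerID string, err error)
}
```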

@rektide commented Jul 23, 2019

More so than migration, I'm interested in radically lowering my workload's start time by using checkpoints of pre-initialized instances. This would enable additional function-as-a-service use cases in Kubernetes. It's something more like cloning than migrating.

@manishbansal8843 commented Aug 7, 2019

I am also quite interested in this feature for lowering my pod start-up time. I have a service that receives unscheduled bursts of requests. When that happens, the HPA does scale up my application, but since it takes 2-3 minutes for a new pod to become ready, my requests time out in the meantime. If Kubernetes were able to load pre-initialized pods, start-up time would come down considerably.

@lenhattan86 commented Nov 3, 2019

I am interested in this issue.
I believe that live pod migration is important too. Some users run long-lived services in pods, for which live migration would be very useful. At the least, it needs to support storage migration.

@vutuong commented Dec 19, 2019

Hi everyone, is there any active project about pod migration? I am a Master's student from South Korea, currently investigating live migration for K8s. If there is any project about this, please let me know; I want to join it.
Why is this needed:
One of the key challenges for edge computing is keeping quality-of-service guarantees better than traditional cloud services while offloading services to the end user's nearest edge server. However, when the end user moves away from the nearby edge server, the quality of service decreases significantly.
Therefore, efficient live migration is vital to enable the mobility of edge services in the edge computing environment.
Live migration of Docker containers is achievable using the checkpoint/restore tool CRIU.
This is why K8s should support a checkpoint/restore tool, or pod migration.

@sedwards commented Jan 20, 2020

This page has some useful information on the subject.

https://kubernetes.io/blog/2015/07/how-did-quake-demo-from-dockercon-work/

@schrej commented Jul 6, 2020

Hi all. I've recently chosen this issue as the topic of my master's thesis. While I'll mostly be working on this for my thesis, I'd love to contribute to the project if I manage to find a good solution.

After reading a lot of code and getting a very rough grasp of the internals of the kubelet, I have to decide how to integrate migration into the API. As mentioned earlier in this issue, this is one of the hardest parts of implementing pod migration. I've come up with a few ideas and would greatly appreciate any feedback or alternative suggestions you may have.

As it's too long for an issue comment, and because you can't add inline comments here, I've put it into a Google Doc:
https://docs.google.com/document/d/13gjez8EACjhlQ7QxelkAVlPjzAAA2_K7-oDFBLbLP8g/edit

Thanks!

@tedyu (Contributor) commented Jul 7, 2020

I tried commenting on the above proposal.

It seems the handling of multi-paragraph comments there is not as good as in a Google Doc.

Depending on other people's feedback, you could consider moving it to a Google Doc.

@schrej commented Jul 7, 2020

I've moved the proposal to Google Docs. That should also make it easier to start a conversation, as Docs sends notifications when someone replies to a comment.
@tedyu I've copied your comments (are they yours?) over as well and responded to them.
