gnt-instance migrate endless copying memory #950

Closed
Ganeti-Issues-Migrator opened this issue Jun 24, 2017 · 18 comments · Fixed by #1529

Comments

@Ganeti-Issues-Migrator

Originally reported on Google Code with ID 894.

What software version are you running? Please provide the output of "gnt-cluster --version", "gnt-cluster version", and "hspace --version".

<b>What distribution are you using?</b>
# gnt-cluster --version
gnt-cluster (ganeti v2.11.3) 2.11.3
# gnt-cluster version
Software version: 2.11.3
Internode protocol: 2110000
Configuration format: 2110000
OS api version: 20
Export interface: 0
VCS version: (ganeti) version v2.11.3
# hspace --version
hspace (ganeti) version v2.11.3
compiled with ghc 7.4
running on linux x86_64
# cat /etc/debian_version
7.6
# apt-cache policy ganeti
ganeti:
  Installed: 2.11.3-2~bpo70+1
  Candidate: 2.11.3-2~bpo70+1
  Package pin: 2.11.3-2~bpo70+1
  Version table:
 *** 2.11.3-2~bpo70+1 990
        100 http://debian.xxxxxxxx.de/debian/ wheezy-backports/main amd64 Packages
        100 /var/lib/dpkg/status
     2.10.5-1~bpo70+1 990
        100 http://debian.xxxxxxxx.de/debian/ wheezy-backports/main amd64 Packages
     2.9.5-1~bpo70+1 990
        100 http://debian.xxxxxxxx.de/debian/ wheezy-backports/main amd64 Packages


<b>What steps will reproduce the problem?</b>
gnt-instance migrate -f vmX


<b>What is the expected output? What do you see instead?</b>
- Tue Jul 22 20:07:41 2014 * memory transfer progress: 600.64 %
- Tue Jul 22 20:07:53 2014 * memory transfer progress: 618.39 % 
+ "Timeout" which cancels the migration if the memory transfer progress reaches a certain limit


<b>Please provide any additional information below.</b>
The VM has a gaming server running, which seems to be modifying too much memory.
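
The requested limit could look roughly like the following. This is a minimal Python sketch, not Ganeti code: it assumes progress lines have the form shown in the report above, and the names `parse_progress` and `should_abort` as well as the 300 % default limit are hypothetical.

```python
import re

# Matches job output lines such as
# "Tue Jul 22 20:07:41 2014 * memory transfer progress: 600.64 %"
_PROGRESS_RE = re.compile(r"memory transfer progress:\s*([0-9]+(?:\.[0-9]+)?)\s*%")


def parse_progress(line):
    """Return the transfer progress in percent, or None if the line has none."""
    match = _PROGRESS_RE.search(line)
    return float(match.group(1)) if match else None


def should_abort(line, limit_percent=300.0):
    """Hypothetical policy: give up once progress exceeds limit_percent."""
    progress = parse_progress(line)
    return progress is not None and progress > limit_percent
```

A caller would still have to cancel the migration in the hypervisor and clean up the job; the sketch only covers the decision.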

Originally added on 2014-07-22 18:15:38 +0000 UTC.

@Ganeti-Issues-Migrator
Author

-- Empty comment --

Originally added on 2014-07-24 16:50:02 +0000 UTC.

@Ganeti-Issues-Migrator
Author

Hi!

This is not actually a Ganeti problem, but a common problem with live migrations. I guess the instance you are moving is very busy, so the hypervisor cannot copy the memory deltas fast enough before the memory changes again. As far as I remember, there are hypervisor parameters to tune that.

Marking this as invalid, as there is nothing Ganeti can do about it.

Helga

Originally added on 2014-07-25 07:37:21 +0000 UTC.

Changed State: Invalid

@Ganeti-Issues-Migrator
Author

Well, if Ganeti aborted the migration after a certain limit, one could increase the migration downtime and then migrate the instance.

As it is, you have a job that locks your system until you somehow kill it in an untidy way.

Originally added on 2014-07-25 08:12:15 +0000 UTC.

@Ganeti-Issues-Migrator
Author

Hm, that's a good point. I'll schedule it for 2.13, as it should be easier to implement once jobs have become processes.

Originally added on 2014-07-25 08:14:36 +0000 UTC.

Changed State: Accepted

@Ganeti-Issues-Migrator
Author

-- Empty comment --

Originally added on 2014-07-25 08:14:55 +0000 UTC.

Added Labels: Type-Enhancement Priority-Medium
Added to Milestone: Release2.13

@Ganeti-Issues-Migrator
Author

We have a similar issue. Instance migration hangs forever, but we don't get any "memory transfer progress:" output.

After running these commands, the migration works without problems:

echo 'migrate_set_capability xbzrle on' |  /usr/bin/socat STDIO UNIX-CONNECT:/var/run/ganeti/kvm-hypervisor/ctrl/wserver1.monitor

echo 'migrate_set_cache_size 1024m' |  /usr/bin/socat STDIO UNIX-CONNECT:/var/run/ganeti/kvm-hypervisor/ctrl/wserver1.monitor

Is it possible to implement these KVM features in Ganeti somehow?
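
For scripting, the two socat one-liners above can be approximated in Python. This is only a sketch under stated assumptions: the function names `send_hmp_command` and `enable_xbzrle` are hypothetical, the monitor path is assumed to follow the pattern used in the commands above, and the code does not read the monitor's banner or reply.

```python
import socket


def send_hmp_command(sock, command):
    """Write one HMP command line to an already connected monitor socket."""
    sock.sendall(command.encode("ascii") + b"\n")


def enable_xbzrle(monitor_path, cache_size="1024m"):
    """Send the two commands from the comment above to a QEMU monitor socket.

    monitor_path is assumed to look like
    /var/run/ganeti/kvm-hypervisor/ctrl/<vm-name>.monitor
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(monitor_path)
        send_hmp_command(sock, "migrate_set_capability xbzrle on")
        send_hmp_command(sock, "migrate_set_cache_size " + cache_size)
```

Keeping `send_hmp_command` separate from the connection handling makes it reusable for other monitor commands.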

Originally added on 2014-09-08 08:03:00 +0000 UTC.

@Ganeti-Issues-Migrator
Author

Support for the 'migrate_set_capability' HMP command has been implemented in v2.12 (commit 937ff98)

Originally added on 2014-09-08 08:46:14 +0000 UTC.

@Ganeti-Issues-Migrator
Author

Killing a job forcefully has been recently implemented, targeted for 2.13, see issue #938.

Originally added on 2014-09-25 13:34:55 +0000 UTC.

@Ganeti-Issues-Migrator
Author

Well, killing the job will stop the migration, but it should also revert the changes it has made; otherwise you may end up with an inconsistent cluster.

Originally added on 2014-09-25 22:24:16 +0000 UTC.

@Ganeti-Issues-Migrator
Author

Ganeti currently doesn't support interruptible jobs (op-codes).
This would be a useful, but significant change and would need its own design.

For the KVM parameter tuning we will welcome patches.

Originally added on 2015-06-03 13:56:47 +0000 UTC.

Removed Labels: Priority-Medium
Changed State: PatchesWelcome
Added to Milestone: Unplanned
Removed from Milestone: Release2.13

@Ganeti-Issues-Migrator
Author

Another useful situation for being able to interrupt jobs would be `replace-disks`.

Originally added on 2015-06-03 15:04:11 +0000 UTC.

@Ganeti-Issues-Migrator
Author

Since this issue was the first thing I found on Google: you can abort migrations with this command:

echo "migrate_cancel" |  /usr/bin/socat STDIO UNIX-CONNECT:/var/run/ganeti/kvm-hypervisor/ctrl/<vm-name>.monitor

see http://www.linux-kvm.org/page/Migration

Originally added on 2016-07-22 09:46:56 +0000 UTC.

@neufeind

neufeind commented Jun 6, 2019

Great - the "migrate_cancel" worked for me. Through gnt-job there seems to have been no way to cancel the running job, as far as I see.

@kasimon

kasimon commented Jun 6, 2019

@neufeind You can also temporarily increase the migration parameters, as can be seen here: https://ahwhattheheck.wordpress.com/2014/12/16/live-migrating-busy-vms-in-a-ganeti-cluster/

We use that a lot for VMs with heavy I/O load.

@bpfoley
Contributor

bpfoley commented Jun 6, 2019 via email

@saschalucas
Member

There is another solution: post-copy migration. It should be available as of qemu-2.6, but it also works for me with qemu-2.5 on Ubuntu 16.04.

$ gnt-cluster modify -H kvm:migration_caps=postcopy-ram # (or x-postcopy-ram on qemu-2.5)

After one cycle of memory transfer (100%) run on the source node:

$ echo "migrate_start_postcopy" | socat STDIO UNIX-CONNECT:/var/run/ganeti/kvm-hypervisor/ctrl/some.vm.monitor

I've observed that the migrate_start_postcopy command must be timed right so as not to confuse Ganeti. It is best to run it right after Ganeti updates its migration status. The migration then finishes before Ganeti issues the next "info migrate". If a post-copy migration is still running at that point, Ganeti parses the status as failed.

I've tested this with ganeti-2.14 and 2.15.

This feature was added to Ganeti in some development branch, but it seems it was never released?

@saschalucas
Member

saschalucas commented Jun 19, 2019

Postcopy migration is implemented in commits 1aecc0c and 041ed56 in the attic/master branch, but was never released:

It checks whether the postcopy migration capability is enabled as a hypervisor parameter and, if so, switches to postcopy migration after two cycles of memory transfer (200%).
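
The decision described above can be illustrated as follows. This is a hedged sketch and not the code from those commits: the function name `should_start_postcopy`, the list form of the capabilities, and the parameter names are assumptions made for illustration.

```python
def should_start_postcopy(migration_caps, progress_percent, threshold_percent=200.0):
    """Return True when the switch to postcopy described above would trigger.

    migration_caps: capability names enabled via the hypervisor parameter,
    e.g. ["postcopy-ram"] (or ["x-postcopy-ram"] on qemu-2.5).
    progress_percent: memory transfer progress as reported during migration.
    """
    postcopy_enabled = any(
        cap in ("postcopy-ram", "x-postcopy-ram") for cap in migration_caps
    )
    return postcopy_enabled and progress_percent >= threshold_percent
```

When it returns True, the caller would issue `migrate_start_postcopy` on the source node's monitor.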

@saschalucas
Member

Just for the record: since qemu-2.11 [1], the postcopy-ram capability must be set on both sides (the sender and the receiver).

More precisely, the two sides must match. For example, the capability set by a previous migration with "migrate_set_capability postcopy-ram on" is remembered for the next migration. If the HV parameter migration_caps has since been changed so that postcopy-ram is removed, the capability must be turned off explicitly on the sender before the next migration: "migrate_set_capability postcopy-ram off". Otherwise the migration will fail.

This means that, regardless of the Ganeti settings, qemu's postcopy-ram state must be synchronized between sender and receiver.

[1] https://patchwork.kernel.org/patch/10202687/
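
One way to picture the synchronization requirement is the sketch below. It assumes the caller already knows whether QEMU on the sender still has postcopy-ram enabled from an earlier migration; the function name `postcopy_sync_command` and its parameters are hypothetical.

```python
def postcopy_sync_command(desired_caps, qemu_postcopy_on):
    """Return the HMP command needed on the sender, or None if already in sync.

    desired_caps: capability names currently configured for the migration.
    qemu_postcopy_on: whether QEMU still has postcopy-ram enabled from earlier.
    """
    want_postcopy = "postcopy-ram" in desired_caps
    if want_postcopy == qemu_postcopy_on:
        return None
    state = "on" if want_postcopy else "off"
    return "migrate_set_capability postcopy-ram " + state
```

For the case described above (capability removed from migration_caps but still remembered by QEMU), this yields the explicit "off" command.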

@saschalucas saschalucas linked a pull request Oct 12, 2020 that will close this issue