Add pre-copy migration support to LXD #4072

Merged
stgraber merged 9 commits into lxc:master on Dec 5, 2017

4 participants
@adrianreber
Contributor

adrianreber commented Dec 4, 2017

These patches add pre-copy migration support to LXD, building on the existing pre-copy migration support in LXC (LXC needs the patches in lxc/lxc#1950).

Pre-copy migration is based on CRIU and CRIU uses the kernel's dirty memory tracking support:

https://www.kernel.org/doc/Documentation/vm/soft-dirty.txt
https://criu.org/Memory_changes_tracking

Pre-copy migration defaults to off for now, as there is no way to ask LXC whether the architecture/kernel/CRIU combination supports dirty memory tracking. Adding the necessary feature check to LXC has been discussed, but it is not yet implemented.

To use pre-copy migration, set 'migration.pre_copy.enabled' to 'true'.

LXD queries whether the destination supports pre-copy migration and only then performs multiple pre-copy migration steps. The maximum number of pre-copy iterations is controlled with 'migration.pre_copy.max' and defaults to 10.

To avoid unnecessary pre-copy iterations, one more parameter was introduced: 'migration.pre_copy.pre_migrated_pages', a percentage of pre-copied pages that defaults to 70%. If more than 'migration.pre_copy.pre_migrated_pages' percent of the pages have been transferred by the last pre-dump, pre-dumping stops early even if 'migration.pre_copy.max' has not been reached.

This provides the user with two options to control the number of pre-copy migration steps.
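
The interplay of the two options can be sketched in Go as below. This is only an illustration of the rule described above, not the code from these patches: the struct, its field names, and the function name are hypothetical, and only the defaults (10 and 70) are taken from the description.

```go
package main

import "fmt"

// preCopyConfig holds the two knobs described above. The type and field
// names are hypothetical; only the defaults come from the description.
type preCopyConfig struct {
	MaxIterations    int // 'migration.pre_copy.max'
	PreMigratedPages int // 'migration.pre_copy.pre_migrated_pages', in percent
}

// lastPreDump reports whether the pre-dump that just finished should be the
// last one before the final dump. iteration counts from 1.
func lastPreDump(cfg preCopyConfig, iteration int, preMigratedPercent int) bool {
	// Enough memory was already pre-copied: stop early.
	if preMigratedPercent > cfg.PreMigratedPages {
		return true
	}
	// Otherwise stop once the maximum number of iterations is reached.
	return iteration >= cfg.MaxIterations
}

func main() {
	cfg := preCopyConfig{MaxIterations: 10, PreMigratedPages: 70}
	fmt.Println(lastPreDump(cfg, 1, 40))  // below threshold, below max
	fmt.Println(lastPreDump(cfg, 2, 85))  // above threshold
	fmt.Println(lastPreDump(cfg, 10, 40)) // max reached
}
```

Whichever condition is met first ends the pre-dump phase, so the threshold only ever shortens the sequence that the maximum allows.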

lxc-jenkins commented Dec 4, 2017

This pull request didn't trigger Jenkins as its author isn't in the whitelist.

An organization member must perform one of the following:

  • To have this branch tested by Jenkins, use the "ok to test" command.
  • To have a one time test done, use the "test this please" command.

Those commands are simple Github comments of the format: "jenkins: COMMAND"

Member

stgraber commented Dec 5, 2017

Other than bikeshedding about key names and missing bash completion, this looks good to me.

Member

brauner commented Dec 5, 2017

jenkins: ok to test

Member

stgraber commented Dec 5, 2017

@brauner @adrianreber merge conflict

adrianreber and others added some commits Nov 29, 2017

migrate: prepare for pre-copy migration
The upcoming pre-copy migration support needs additional parameters to
the Migrate() function. In order to have a cleaner interface this patch
modifies the Migrate() function to use one struct as parameter instead
of currently five (and more in the future).

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
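
The refactoring described in "migrate: prepare for pre-copy migration" follows a common Go pattern: replace a growing list of positional parameters with a single struct. The sketch below only illustrates that pattern. The type, field, and function names are made up for the example and are not the ones used in the LXD source.

```go
package main

import "fmt"

// migrateArgs bundles what would otherwise be separate positional
// parameters. All names are illustrative, not taken from LXD.
type migrateArgs struct {
	Container string
	StateDir  string
	Stop      bool
	// PreDump stands for a field added later: existing callers that do
	// not set it keep compiling and get the zero value (false).
	PreDump bool
}

// describeMigrate stands in for a Migrate()-style function that takes
// one struct instead of several parameters.
func describeMigrate(args migrateArgs) string {
	mode := "final dump"
	if args.PreDump {
		mode = "pre-dump"
	}
	return fmt.Sprintf("%s of %s into %s (stop=%t)",
		mode, args.Container, args.StateDir, args.Stop)
}

func main() {
	// A caller written before PreDump existed.
	fmt.Println(describeMigrate(migrateArgs{
		Container: "c1", StateDir: "/tmp/state", Stop: true,
	}))
	// A caller using the newer field.
	fmt.Println(describeMigrate(migrateArgs{
		Container: "c1", StateDir: "/tmp/state", PreDump: true,
	}))
}
```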
migration: prepare for pre-copy migration (part 2)
In addition to the previous pre-copy migration prepare commit which
changed the parameters of the Migrate() function this extends the
protocol between migration source and destination to detect if
pre-copy migration should be used.

Currently pre-copy migration defaults to off and can be enabled by
setting 'migration.incremental.memory' to 'true'. If it is 'true' and the
migration destination side also acknowledges that it supports pre-copy
migration, the variable 'use_pre_dumps' is set to 'true'. The upcoming
patches use this variable to decide whether to use pre-copy migration.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
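
The negotiation in "part 2" amounts to a logical AND of the local setting and the destination's acknowledgement. A minimal sketch follows. It assumes the acknowledgement arrives as an optional boolean that an older destination simply never sets, modelled here as a nil pointer. The function and parameter names are hypothetical.

```go
package main

import "fmt"

// usePreDumps decides whether pre-copy migration is used. destinationAck
// is nil when the destination did not send the field at all, which is how
// this sketch models a destination that predates pre-copy support.
func usePreDumps(localEnabled bool, destinationAck *bool) bool {
	return localEnabled && destinationAck != nil && *destinationAck
}

func main() {
	yes, no := true, false
	fmt.Println(usePreDumps(true, &yes)) // both sides agree
	fmt.Println(usePreDumps(true, &no))  // destination declines
	fmt.Println(usePreDumps(true, nil))  // destination does not know the field
	fmt.Println(usePreDumps(false, &yes)) // not enabled locally
}
```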
migrate: prepare for pre-copy migration (part 3)
This introduces a protocol for the rsync transfers to support a
variable number of pre-dumps. The number of pre-dumps depends on the
workload, the threshold and the maximum number of pre-dumps.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
migrate: implement pre-copy migration
This uses CRIU's pre-dump support to implement pre-copy migration in
LXD. On the CRIU side this relies on the soft-dirty bit in the pagemap:

 https://www.kernel.org/doc/Documentation/vm/soft-dirty.txt
 https://criu.org/Memory_changes_tracking

As long as there is no feature detection in LXC the pre-copy migration
needs to be enabled explicitly as not all architecture/kernel/criu
combinations support dirty pages tracking.

To enable pre-copy migration in LXD the following parameter needs to be set
to 'true':

 migration.incremental.memory

Currently it defaults to 'false'. As LXD does not yet know at which
point it has done enough pre-copying the following parameter exists:

 migration.incremental.memory.iterations

This defaults to '10'. This means that LXD will do 10 pre-dumps before
doing the final dump.

Thanks to protobuf the whole pre-copy migration should be transparent
enough that, if the receiving side does not know anything about pre-copy
migration (and especially the additional rsync transfers), it just does
the non-optimized migration even if 'migration.incremental.memory' is set
to 'true'.

The following things will be improved in follow up commits:

 * read out CRIU's 'stats-dump' file to know how many pages are
   part of the previous dump
 * make pre-copy migration the default if LXC tells us the
   architecture/kernel/criu combination supports it
 * update documentation
 * update test cases

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
migrate.proto: silence protobuf compiler warning
The protobuf compiler complains that it defaults to 'proto2':

  [libprotobuf WARNING google/protobuf/compiler/parser.cc:546]
  No syntax specified for the proto file: lxd/migrate.proto.
  Please use 'syntax = "proto2";' or 'syntax = "proto3";'
  to specify a syntax version. (Defaulted to proto2 syntax.)

This just explicitly sets the default.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
migrate: older than lxc 2.0.4 will fail
Testing the older than liblxc 2.0.4 code paths does not seem to work
anymore. This adds a debug message for that code path:

 liblxc version is older than 2.0.4 and the live migration will probably fail

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
migrate: add option to specify max pre-dump iterations
This adds the configuration option to specify the maximum number of
pre-dump iterations:

 migration.incremental.memory.iterations (defaults to 10 right now)

As no threshold on the remaining memory pages exists yet, the migration
will end, once the maximum is reached, by doing one final dump and
transfer before restoring the container on the destination.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
migrate: finish pre-dumping if threshold reached
This adds support to stop the pre-dumping if a certain threshold has
been reached. Instead of blindly relying on the 'max' iterations this
commit makes it possible to finish the pre-dumping if enough memory
has already been migrated using the pre-copy migration optimization.

The configuration option is:

 migration.incremental.memory.goal (defaults to 70%)

If the current pre-dump has migrated a higher percentage of pages using
pre-dumping than defined in the above variable, the pre-copy migration is
finished even if the 'max' iterations have not been reached.

The lower the number the sooner the pre-copy migration will finish, but
the container downtime during migration will be longer as a result of
the time required to transfer the memory pages.

The higher the number is, the shorter the container downtime during
migration.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
doc: add "migration_pre_copy" api extension
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>

Member

brauner commented Dec 5, 2017

@stgraber updated.

@stgraber stgraber merged commit 6c0baf8 into lxc:master Dec 5, 2017

3 of 5 checks passed

  • Testsuite: Test started
  • continuous-integration/travis-ci/pr: The Travis CI build is in progress
  • Branch target: Branch target is correct
  • Signed-off-by: All commits signed-off
  • continuous-integration/appveyor/pr: AppVeyor build succeeded