core: Add support for zero-copy migrations #606

Merged: 2 commits, Sep 8, 2022

mz-pdm
Member

@mz-pdm mz-pdm commented Aug 19, 2022

Zero-copy migrations help with migrating very large VMs (>1 TB) or
busy large VMs.

Zero-copy migrations are added using a new migration policy. Since it
is a new feature, there may be unanticipated problems, so the policy is
intended to be used only when the already available migration policies
are insufficient to migrate a VM and the user is willing to accept the
risk.

Zero-copy can be used only with parallel migrations, so it requires
parallel migrations to be enabled in Engine config. If they are disabled
in VM/cluster configuration, they are still used when the zero-copy
policy is selected.

Zero-copy migrations cannot currently be used with encrypted
migrations. If encrypted migrations are requested, the policy
can still be used but zero-copy is disabled.
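
The rules in the two paragraphs above can be summarized as a small decision function. This is only a sketch of the described behaviour, not the Engine code from this PR: the class, method, and parameter names are hypothetical, and it assumes that selecting the zero-copy policy forces parallel migrations even when encryption turns zero-copy itself off.

```java
// Hypothetical sketch; none of these names come from the Engine code base.
final class ZeroCopyDecision {
    private ZeroCopyDecision() {
    }

    // Zero-copy is used only when the zero-copy policy is selected, parallel
    // migrations are enabled in Engine config, and encryption is not requested.
    static boolean useZeroCopy(boolean zeroCopyPolicySelected,
                               boolean parallelEnabledInEngineConfig,
                               boolean encryptionRequested) {
        return zeroCopyPolicySelected
                && parallelEnabledInEngineConfig
                && !encryptionRequested;
    }

    // Parallel migrations must be enabled in Engine config. The zero-copy policy
    // overrides a VM/cluster setting that disables them (assumption: this also
    // holds when zero-copy itself ends up disabled because of encryption).
    static boolean useParallel(boolean zeroCopyPolicySelected,
                               boolean parallelEnabledInEngineConfig,
                               boolean parallelEnabledForVmOrCluster) {
        return parallelEnabledInEngineConfig
                && (parallelEnabledForVmOrCluster || zeroCopyPolicySelected);
    }
}
```

For example, `ZeroCopyDecision.useZeroCopy(true, true, true)` is false because encryption was requested, while `ZeroCopyDecision.useParallel(true, true, false)` is true because the policy overrides the VM/cluster setting.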

The policy uses longer downtimes since it is aimed at very large VMs,
and short downtimes are unlikely to be of any use for such VMs.
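
To make the "longer downtimes" concrete, the sketch below encodes the `setDowntime` steps of the "Very large VMs" policy as they appear in the `MigrationPolicies` value in this PR (initial 300, then 400, 500, 700, 1000, 1500 at stalling limits 1, 2, 3, 4, 6). The lookup logic is an assumption about how such a schedule might be read (the step with the greatest `stallingLimit` not exceeding the current stall count applies), the values are assumed to be milliseconds, and the class and method names are hypothetical. The policy's `lastItems` (downtime 5000, then abort) are not modelled here.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the lookup semantics are an assumption, not Vdsm's implementation.
final class VeryLargeVmsDowntime {
    // "initialItems" value of the "Very large VMs" policy.
    static final int INITIAL_DOWNTIME = 300;

    // stallingLimit -> setDowntime parameter, copied from the policy's "convergenceItems".
    private static final TreeMap<Integer, Integer> STEPS = new TreeMap<>();

    static {
        STEPS.put(1, 400);
        STEPS.put(2, 500);
        STEPS.put(3, 700);
        STEPS.put(4, 1000);
        STEPS.put(6, 1500);
    }

    private VeryLargeVmsDowntime() {
    }

    // Returns the downtime in effect after the given number of stalled iterations.
    static int downtimeAfterStalls(int stalls) {
        Map.Entry<Integer, Integer> step = STEPS.floorEntry(stalls);
        return step == null ? INITIAL_DOWNTIME : step.getValue();
    }
}
```

By comparison, the existing "Minimal downtime" policy starts at 100 and tops out at 500 over the same stalling limits.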

It is assumed that Vdsm on the host supports zero-copy migrations,
since they are used only when parallel migrations are enabled,
i.e. with cluster version 4.7 by default, and it is assumed that the
hosts run an up-to-date OS. If the host doesn’t support zero-copy
migrations, a migration API violation will be logged there and
zero-copy won’t be enabled for the migration.

See also
https://ovirt.org/develop/release-management/features/virt/zerocopy-migrations.html.

Bug-Url: https://bugzilla.redhat.com/2089434

@mz-pdm
Member Author

mz-pdm commented Aug 19, 2022

Not much tested yet, but the basic idea works.

@@ -789,7 +789,8 @@ select fn_db_add_config_value('DefaultMigrationCompression','false','general');
select fn_db_add_config_value('DefaultMigrationEncryption','false','general');

-- Keep the ids and names in sync with front end LocalizedMigrationPolicies.properties. The descriptions have been moved there
select fn_db_add_config_value_for_versions_up_to('MigrationPolicies','[{"id":{"uuid":"80554327-0569-496b-bdeb-fcbbf52b827b"},"maxMigrations":2,"autoConvergence":true,"migrationCompression":false,"enableGuestEvents":true,"name":"Minimal downtime","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}},{"stallingLimit":3,"convergenceItem":{"action":"setDowntime","params":["300"]}},{"stallingLimit":4,"convergenceItem":{"action":"setDowntime","params":["400"]}},{"stallingLimit":6,"convergenceItem":{"action":"setDowntime","params":["500"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"abort","params":[]}]}},{"id":{"uuid":"80554327-0569-496b-bdeb-fcbbf52b827c"},"maxMigrations":1,"autoConvergence":true,"migrationCompression":true,"enableGuestEvents":true,"name":"Suspend workload if needed","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}},{"stallingLimit":3,"convergenceItem":{"action":"setDowntime","params":["300"]}},{"stallingLimit":4,"convergenceItem":{"action":"setDowntime","params":["400"]}},{"stallingLimit":6,"convergenceItem":{"action":"setDowntime","params":["500"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"setDowntime","params":["5000"]},{"action":"abort","params":[]}]}},{"id":{"uuid":"a7aeedb2-8d66-4e51-bb22-32595027ce71"},"maxMigrations":2,"autoConvergence":true,"migrationCompression":false,"enableGuestEvents":true,"name":"Post-copy 
migration","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"postcopy","params":[]},{"action":"abort","params":[]}]}}]','4.7');
select fn_db_add_config_value_for_versions_up_to('MigrationPolicies','[{"id":{"uuid":"80554327-0569-496b-bdeb-fcbbf52b827b"},"maxMigrations":2,"autoConvergence":true,"migrationCompression":false,"enableGuestEvents":true,"name":"Minimal downtime","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}},{"stallingLimit":3,"convergenceItem":{"action":"setDowntime","params":["300"]}},{"stallingLimit":4,"convergenceItem":{"action":"setDowntime","params":["400"]}},{"stallingLimit":6,"convergenceItem":{"action":"setDowntime","params":["500"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"abort","params":[]}]}},{"id":{"uuid":"80554327-0569-496b-bdeb-fcbbf52b827c"},"maxMigrations":1,"autoConvergence":true,"migrationCompression":true,"enableGuestEvents":true,"name":"Suspend workload if needed","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}},{"stallingLimit":3,"convergenceItem":{"action":"setDowntime","params":["300"]}},{"stallingLimit":4,"convergenceItem":{"action":"setDowntime","params":["400"]}},{"stallingLimit":6,"convergenceItem":{"action":"setDowntime","params":["500"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"setDowntime","params":["5000"]},{"action":"abort","params":[]}]}},{"id":{"uuid":"a7aeedb2-8d66-4e51-bb22-32595027ce71"},"maxMigrations":2,"autoConvergence":true,"migrationCompression":false,"enableGuestEvents":true,"name":"Post-copy 
migration","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"postcopy","params":[]},{"action":"abort","params":[]}]}}]','4.6');
select fn_db_add_config_value_for_versions_up_to('MigrationPolicies','[{"id":{"uuid":"80554327-0569-496b-bdeb-fcbbf52b827b"},"maxMigrations":2,"autoConvergence":true,"migrationCompression":false,"enableGuestEvents":true,"name":"Minimal downtime","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}},{"stallingLimit":3,"convergenceItem":{"action":"setDowntime","params":["300"]}},{"stallingLimit":4,"convergenceItem":{"action":"setDowntime","params":["400"]}},{"stallingLimit":6,"convergenceItem":{"action":"setDowntime","params":["500"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"abort","params":[]}]}},{"id":{"uuid":"80554327-0569-496b-bdeb-fcbbf52b827c"},"maxMigrations":1,"autoConvergence":true,"migrationCompression":true,"enableGuestEvents":true,"name":"Suspend workload if needed","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}},{"stallingLimit":3,"convergenceItem":{"action":"setDowntime","params":["300"]}},{"stallingLimit":4,"convergenceItem":{"action":"setDowntime","params":["400"]}},{"stallingLimit":6,"convergenceItem":{"action":"setDowntime","params":["500"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"setDowntime","params":["5000"]},{"action":"abort","params":[]}]}},{"id":{"uuid":"a7aeedb2-8d66-4e51-bb22-32595027ce71"},"maxMigrations":2,"autoConvergence":true,"migrationCompression":false,"enableGuestEvents":true,"name":"Post-copy 
migration","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"postcopy","params":[]},{"action":"abort","params":[]}]}},{"id":{"uuid":"57237b82-b8c2-425f-b425-114b35219626"},"maxMigrations":1,"autoConvergence":true,"migrationCompression":false,"enableGuestEvents":true,"zerocopy":true,"name":"Very large VMs","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["400"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["500"]}},{"stallingLimit":3,"convergenceItem":{"action":"setDowntime","params":["700"]}},{"stallingLimit":4,"convergenceItem":{"action":"setDowntime","params":["1000"]}},{"stallingLimit":6,"convergenceItem":{"action":"setDowntime","params":["1500"]}}],"initialItems":[{"action":"setDowntime","params":["300"]}],"lastItems":[{"action":"setDowntime","params":["5000"]},{"action":"abort","params":[]}]}}]','4.7');

Member

same ids?

Member Author

These are ids of the particular migration policies, not of the list of the migration policies.

@@ -1462,6 +1463,9 @@ select fn_db_update_default_config_value('ServerRebootTimeout', '300', '600', 'g
-- Increase the lifetime of VDS certificates from 398 to 3650 days
select fn_db_update_default_config_value('VdsCertificateValidityInDays', '398', '1827', 'general', false);

-- Add Very large VMs migration policy
select fn_db_update_default_config_value('MigrationPolicies', '[{"id":{"uuid":"80554327-0569-496b-bdeb-fcbbf52b827b"},"maxMigrations":2,"autoConvergence":true,"migrationCompression":false,"enableGuestEvents":true,"name":"Minimal downtime","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}},{"stallingLimit":3,"convergenceItem":{"action":"setDowntime","params":["300"]}},{"stallingLimit":4,"convergenceItem":{"action":"setDowntime","params":["400"]}},{"stallingLimit":6,"convergenceItem":{"action":"setDowntime","params":["500"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"abort","params":[]}]}},{"id":{"uuid":"80554327-0569-496b-bdeb-fcbbf52b827c"},"maxMigrations":1,"autoConvergence":true,"migrationCompression":true,"enableGuestEvents":true,"name":"Suspend workload if needed","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}},{"stallingLimit":3,"convergenceItem":{"action":"setDowntime","params":["300"]}},{"stallingLimit":4,"convergenceItem":{"action":"setDowntime","params":["400"]}},{"stallingLimit":6,"convergenceItem":{"action":"setDowntime","params":["500"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"setDowntime","params":["5000"]},{"action":"abort","params":[]}]}},{"id":{"uuid":"a7aeedb2-8d66-4e51-bb22-32595027ce71"},"maxMigrations":2,"autoConvergence":true,"migrationCompression":false,"enableGuestEvents":true,"name":"Post-copy 
migration","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"postcopy","params":[]},{"action":"abort","params":[]}]}}]', '[{"id":{"uuid":"80554327-0569-496b-bdeb-fcbbf52b827b"},"maxMigrations":2,"autoConvergence":true,"migrationCompression":false,"enableGuestEvents":true,"name":"Minimal downtime","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}},{"stallingLimit":3,"convergenceItem":{"action":"setDowntime","params":["300"]}},{"stallingLimit":4,"convergenceItem":{"action":"setDowntime","params":["400"]}},{"stallingLimit":6,"convergenceItem":{"action":"setDowntime","params":["500"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"abort","params":[]}]}},{"id":{"uuid":"80554327-0569-496b-bdeb-fcbbf52b827c"},"maxMigrations":1,"autoConvergence":true,"migrationCompression":true,"enableGuestEvents":true,"name":"Suspend workload if 
needed","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}},{"stallingLimit":3,"convergenceItem":{"action":"setDowntime","params":["300"]}},{"stallingLimit":4,"convergenceItem":{"action":"setDowntime","params":["400"]}},{"stallingLimit":6,"convergenceItem":{"action":"setDowntime","params":["500"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"setDowntime","params":["5000"]},{"action":"abort","params":[]}]}},{"id":{"uuid":"a7aeedb2-8d66-4e51-bb22-32595027ce71"},"maxMigrations":2,"autoConvergence":true,"migrationCompression":false,"enableGuestEvents":true,"name":"Post-copy migration","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"postcopy","params":[]},{"action":"abort","params":[]}]}},{"id":{"uuid":"57237b82-b8c2-425f-b425-114b35219626"},"maxMigrations":1,"autoConvergence":true,"migrationCompression":false,"enableGuestEvents":true,"zerocopy":true,"name":"Very large VMs","description":"","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["400"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["500"]}},{"stallingLimit":3,"convergenceItem":{"action":"setDowntime","params":["700"]}},{"stallingLimit":4,"convergenceItem":{"action":"setDowntime","params":["1000"]}},{"stallingLimit":6,"convergenceItem":{"action":"setDowntime","params":["1500"]}}],"initialItems":[{"action":"setDowntime","params":["300"]}],"lastItems":[{"action":"setDowntime","params":["5000"]},{"action":"abort","params":[]}]}}]', '4.7', false);

Member

Why change the 4.7 MigrationPolicies only when they have the default value? Should this value always be updated? Or, if this value can be customized by users, how will users who have already created their own migration policies in 4.7 receive this new zerocopy policy?

Member Author

Why change the 4.7 MigrationPolicies only when they have the default value? Should this value always be updated?

It would overwrite any customizations users may have made.

Or, if this value can be customized by users, how will users who have already created their own migration policies in 4.7 receive this new zerocopy policy?

I don't know. Do we have a mechanism for that? If not then I'd say that:

  • Having the new policy is not important for all users.
  • Users who change their policies take responsibility for incorporating updates from upstream.
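
The "update only when the value is still the default" behaviour discussed in this thread can be illustrated with a small in-memory model. This is a sketch of the semantics only, not the implementation of `fn_db_update_default_config_value`; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a config table; not the Engine's database code.
final class DefaultOnlyConfigUpdate {
    private final Map<String, String> values = new HashMap<>();

    void put(String name, String value) {
        values.put(name, value);
    }

    String get(String name) {
        return values.get(name);
    }

    // Replaces the stored value only if it still equals the old default.
    // Returns true when the value was updated, false when a customized
    // (or missing) value was left untouched.
    boolean updateDefault(String name, String oldDefault, String newDefault) {
        return values.replace(name, oldDefault, newDefault);
    }
}
```

With this model, a store that still holds the old default receives the new policy list, while a store holding a user-edited list is left as it is, which is why customized installations would not pick up the new policy automatically.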

Member

I agree with @mz-pdm.

Member

@mwperina mwperina Aug 23, 2022

So this smells like bad design, there should probably be 2 options:

  • DefaultMigrationPolicies, which would include our default migration policies and which wouldn't be modifiable by users
  • MigrationPolicies, which would be modifiable by users

Having those 2 options and merging them together inside the engine backend would allow us to easily update internal migration policies without affecting custom migration policies.

Unfortunately, as things stand, we would need to add a description of how to add the zerocopy migration policy in version 4.5.3 if you already have your own custom policies.
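
A rough sketch of the two-option design suggested above, under stated assumptions: the `Policy` class and the `merge` method are hypothetical, policies are identified by their UUID string, and a built-in policy wins when a custom policy reuses its id. None of this exists in the Engine; it only illustrates how the merge might keep built-in updates and user customizations separate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of merging built-in and user-defined policy lists.
final class PolicyMerge {
    static final class Policy {
        final String id;
        final String name;

        Policy(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private PolicyMerge() {
    }

    // Built-in policies come first; custom policies are appended unless their id
    // collides with a built-in one (assumption: the built-in policy wins).
    static List<Policy> merge(List<Policy> builtIn, List<Policy> custom) {
        Map<String, Policy> byId = new LinkedHashMap<>();
        for (Policy policy : builtIn) {
            byId.put(policy.id, policy);
        }
        for (Policy policy : custom) {
            byId.putIfAbsent(policy.id, policy);
        }
        return new ArrayList<>(byId.values());
    }
}
```

With such a split, adding "Very large VMs" to the built-in list would reach every installation without touching the user-modifiable option.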

Member

I do not believe there is a person on this earth that defined a custom migration policy.

Member Author

Considering what's written in the comments above, I think splitting it into DefaultMigrationPolicies and MigrationPolicies wouldn't bring a real benefit. If we were going to add more policies in the future, I'd be for reworking it in some way (the option content is indeed horrible), but as things currently stand it's not worth it.

@@ -1522,6 +1522,7 @@ public void refreshMigrationPolicies() {
selectedPolicyId = getMigrationPolicies().getSelectedItem().getId();
}

// TODO: Check/disable the new policy when parallel migrations are not enabled in Engine config

Contributor

You probably cannot disable an item in the dropdown, but if the policy is not supported, you can remove it from the list before setting the items:

policies.remove(policy);
getMigrationPolicies().setItems(policies);
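
A slightly fuller, self-contained sketch of that suggestion follows. The `Policy` class, its `zeroCopy` flag, and the `selectable` method are hypothetical stand-ins for the real model classes; the only assumption taken from this PR is that a zero-copy policy should not be offered when parallel migrations are disabled in Engine config.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the real UI model classes are not reproduced here.
final class PolicyFilter {
    static final class Policy {
        final String name;
        final boolean zeroCopy;

        Policy(String name, boolean zeroCopy) {
            this.name = name;
            this.zeroCopy = zeroCopy;
        }
    }

    private PolicyFilter() {
    }

    // Returns a copy of the list without zero-copy policies when parallel
    // migrations are not enabled in Engine config; the input is not modified.
    static List<Policy> selectable(List<Policy> all, boolean parallelEnabledInEngineConfig) {
        List<Policy> result = new ArrayList<>(all);
        if (!parallelEnabledInEngineConfig) {
            result.removeIf(policy -> policy.zeroCopy);
        }
        return result;
    }
}
```

The filtered list would then be passed to the dropdown in place of the full list, as in the `setItems` call above.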

Member Author

Good idea, thanks for the advice.

Member Author

Done.

@mz-pdm
Member Author

mz-pdm commented Sep 2, 2022

Many changes; it looks like it is working now.

@mz-pdm mz-pdm added the verified label Sep 2, 2022
@mz-pdm
Member Author

mz-pdm commented Sep 6, 2022

Database value updated for the camelCase change.

@ljelinkova
Contributor

/ost

@smelamud smelamud merged commit 2554965 into oVirt:master Sep 8, 2022