
Wrong allocation of MIPS on source and target Host after VM migration #247

Closed
ehsankhodayar opened this issue Jul 31, 2020 · 17 comments · Fixed by #248
Labels: bug, critical

Comments
@ehsankhodayar

ehsankhodayar commented Jul 31, 2020

When a new migration map is returned by the getOptimizedAllocationMap() method, the DatacenterSimple class calls the requestVmMigration() method to send the VM_MIGRATE event to itself, so that the VM migration is finished after a delay. However, before sending the event, requestVmMigration() calls targetHost.addMigratingInVm(sourceVm) to add the VM to the migrationIn list of the destination host.

Rather than just adding the VM to the migrationIn list of the destination host, the addMigratingInVm(sourceVm) method also tries to allocate resources for the new incoming VM. That is intended to reserve the requested resources for the VM, which is not created yet. When the data center receives the event, it requests the destination host to allocate resources for the new incoming VM again. The destination host does not take into account the resources previously reserved for that VM, and this is where the conflict occurs.

I spent a lot of time tracking down this problem, because I saw many warnings with the message "Creation of VM x on host x failed" during the migration process. At first I thought the problem was in my own VmAllocationPolicy, but after many hours I diagnosed it as described above.

For example, let's consider a simple data center with 2 hosts, each with 4 PEs, and a VM with 2 PEs. After some time, the VM on the first host is selected for migration to the second host. At that moment, the second host has just 2 free PEs.

As mentioned above, the data center first adds the VM to the migratingIn list of the second host by calling addMigratingInVm() on that host. The host then allocates resources for the VM (2 PEs in this example). Next, the data center sends the VM_MIGRATE event to itself.
Finally, when the data center receives the event and asks the destination host to create the VM, the destination host is not able to allocate any resources to it, because the resources were already allocated before.

My CloudSim Plus version is 5.4.1.
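The failure can be modeled with a self-contained toy example (the ToyHost class below is purely hypothetical, not the CloudSim Plus API; it only mimics the double allocation described above):

```java
public class DoubleAllocationDemo {
    /** A hypothetical host that only tracks PE usage. */
    static class ToyHost {
        private final int totalPes;
        private int usedPes;
        ToyHost(int totalPes) { this.totalPes = totalPes; }
        boolean allocatePes(int pes) {
            if (usedPes + pes > totalPes) return false; // not enough free PEs
            usedPes += pes;
            return true;
        }
    }

    public static void main(String[] args) {
        ToyHost host2 = new ToyHost(4);
        host2.allocatePes(2); // 2 PEs already in use, 2 free

        int vmPes = 2;
        // addMigratingInVm(): reserves the VM's 2 PEs -> true, 0 PEs free.
        System.out.println("reserve at migration start:   " + host2.allocatePes(vmPes));
        // VM_MIGRATE event: allocates the same 2 PEs again, ignoring the
        // reservation -> false ("Creation of VM x on host x failed").
        System.out.println("allocate at migration finish: " + host2.allocatePes(vmPes));
    }
}
```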

@ehsankhodayar
Author

ehsankhodayar commented Jul 31, 2020

I worked around the problem by adding a destroyVmInternal(vm); call to the removeMigratingInVm(final Vm vm) method in the HostSimple class. I hope it does not cause problems in other parts of the program.
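For reference, the change might look roughly like this (only a sketch: the destroyVmInternal(vm) line is the described workaround, while the rest of the method body is my assumption about HostSimple 5.4.1, not the actual source):

```java
@Override
public void removeMigratingInVm(final Vm vm) {
    // Release the resources reserved by addMigratingInVm(), so the
    // subsequent VM creation does not try to allocate them twice.
    destroyVmInternal(vm);        // the added workaround line
    vmsMigratingIn.remove(vm);    // assumed pre-existing behaviour
    vm.setInMigration(false);     // assumed pre-existing behaviour
}
```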

@manoelcampos
Collaborator

Hello @ehsankhodayar.
Thanks for your contribution. I've been really busy with my PhD.
Could you send a Pull Request?
You can update your fork following the instructions here.

@ehsankhodayar
Author

Hi @manoelcampos .
Thanks for your response. At the moment I'm also quite busy with my M.Sc., and I'm implementing a heavy simulation with this useful and excellent toolkit. I'll do it as soon as possible.

@manoelcampos manoelcampos self-assigned this Aug 27, 2020
@manoelcampos manoelcampos added the in-progress label Aug 27, 2020
@manoelcampos manoelcampos changed the title from "A big bug in VM migration process" to "VM migration tries to allocate resources for a VM twice in the target Host" Aug 27, 2020
@manoelcampos manoelcampos removed the in-progress label Aug 27, 2020
@manoelcampos
Collaborator

Hello @ehsankhodayar
I just fixed the issue. All tests are passing and I couldn't find any unexpected behaviour in my simulation scenarios.
Could you please download and compile the sources for version 5.4.4 (it's not on Maven Central yet) and try it with your simulations?

This is a tricky issue and is related to #94, which is inherited from CloudSim; I don't have time to address that one in the short term.

For this current issue, I just made sure the resources are not allocated twice: when the VM starts migrating to a target Host, the resources are reserved, and when the migration finishes, no attempt is made to allocate them again.

Thanks for reporting, and I'd appreciate it if you could give me some feedback.

@manoelcampos
Collaborator

In fact, it causes a side effect: the MIPS allocation is not restored on the target Host.
If the migrating VM has 1000 MIPS for each PE, all of those MIPS remain allocated on the source Host, where the VM keeps running during the migration.
During the migration, 10% of this capacity is allocated on the target Host to perform the VM migration process.
After the migration finishes, the full 1000 MIPS per PE should be allocated on the target Host.
After the change, however, the capacity allocated on the target Host remains at 10% of the 1000 MIPS.
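Working through the numbers above: during the migration the target Host allocates 0.1 × 1000 = 100 MIPS per PE as migration overhead; once the migration finishes, it should allocate the full 1000 MIPS per PE, but it keeps allocating only the 100 MIPS.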

@manoelcampos manoelcampos reopened this Aug 27, 2020
@manoelcampos
Collaborator

I checked that, although Host.allocateResourcesForVm() is called both when the migration starts and when it finishes, resources are allocated just once, since any previously allocated resources are destroyed and the newly requested capacity is allocated.

I'm wondering if you are able to provide a minimal simulation example that shows the issue, which would make it easier to find the root cause.

@ehsankhodayar
Author

ehsankhodayar commented Aug 28, 2020

Hello @manoelcampos
Thanks for your fast responses, and I'm sorry that I couldn't find free time to send the pull request.

In fact, besides the resource allocation problem I mentioned at first, I found another issue related to the migration process, which I described in #250.

Now, should I check the new version, or wait and test after the above problem is fixed?

@manoelcampos
Collaborator

I'm reworking it and will let you know when I have some news.
Could you please copy your last comment into a new issue?

@manoelcampos
Collaborator

About this issue, the work is being done in PR #248.

@ehsankhodayar
Author

ehsankhodayar commented Aug 28, 2020

Of course. I've created a new issue titled Wrong resource allocation on dynamic resource utilization during VM migration (#250).

manoelcampos added a commit that referenced this issue Aug 28, 2020
- Version bump to 5.4.4

Signed-off-by: Manoel Campos <manoelcampos@gmail.com>
manoelcampos added a commit that referenced this issue Aug 28, 2020
Signed-off-by: Manoel Campos <manoelcampos@gmail.com>
@manoelcampos manoelcampos changed the title from "VM migration tries to allocate resources for a VM twice in the target Host" to "Wrong allocation of MIPS on source and target Host after VM migration" Aug 28, 2020
@manoelcampos
Collaborator

manoelcampos commented Aug 28, 2020

In fact, the issue isn't that the MIPS capacity is allocated twice, but that it is allocated incorrectly on the source and target Hosts after the VM migration. You can check the updated MigrationExample1 on the master branch, where we have 3 Hosts and 4 VMs. The configuration of the Hosts that participate in the migration process is as follows:

  • Host 1/DC 1 with 1000 MIPS x 5 PEs (5000 total MIPS)
  • Host 2/DC 1 with 1000 MIPS x 5 PEs (5000 total MIPS)

The initial allocation for VMs on those Hosts is as follows (every VM requests 1000 MIPS per PE):

  • 0.10: Vm 1 in Host 1: total allocated 2000 MIPS (divided across 2 PEs)
  • 0.10: Vm 2 in Host 1: total allocated 2000 MIPS (divided across 2 PEs)
  • 0.10: Vm 3 in Host 2: total allocated 1000 MIPS (1 PE)

The initial total allocated MIPS capacity for those Hosts is:

  • Host 1 allocated 4000 MIPS from 5000 capacity
  • Host 2 allocated 1000 MIPS from 5000 capacity

The migration of VM 1 from Host 1 to Host 2 starts at time 2.10. At this time, the MIPS allocation for that VM on the target Host is as follows:

  • Vm 1 in Host 2: total allocated 200 MIPS (divided across 2 PEs)

That is 10% of the MIPS requested by the VM and represents the migration overhead imposed on the target Host.

The current allocation for the source and target Hosts when the migration starts is:

  • Host 1 allocated 4000 MIPS from 5000 capacity - OK (it doesn't change on the source host during the migration, because the VM is still running there)
  • Host 2 allocated 3000 MIPS from 5000 capacity - OK (the capacity for the migrating VM was reserved)

However, the issue happens when the migration finishes at time 12.10: the total allocated MIPS capacity on the source and target Hosts doesn't match the expected values. The allocated MIPS should have been reduced on the source Host 1 and increased on the target Host 2.

The actual and expected allocated MIPS after migration are as follows:

Host     Actual Allocation   Expected Allocation
Host 1   4222.22 MIPS        2000 MIPS
Host 2   1200 MIPS           3000 MIPS
  • Host 1: the allocation should have been reduced to 2000 MIPS, since VM 1 was migrated away, but it instead increased by about 10%
  • Host 2: the allocation should have been kept at 3000 MIPS, since the resources for VM 1 were already reserved, but it was reduced instead
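A quick way to inspect these numbers in a simulation is to print each Host's total allocated MIPS when the migration finishes. This is only a sketch assuming the Host#getVmList, VmScheduler#getTotalAllocatedMipsForVm and Host#getTotalMipsCapacity methods available in CloudSim Plus 5.x (check the exact signatures in your version):

```java
import org.cloudbus.cloudsim.hosts.Host;

/** Helper to compare a Host's actual allocated MIPS with its capacity. */
final class AllocationPrinter {
    // Sums the MIPS allocated to every VM placed on the Host; call it,
    // e.g., from a clock-tick listener right after the migration finishes.
    static void printAllocatedMips(final Host host) {
        final double allocated = host.getVmList().stream()
                .mapToDouble(vm -> host.getVmScheduler().getTotalAllocatedMipsForVm(vm))
                .sum();
        System.out.printf("%s: %.2f MIPS allocated of %.0f total%n",
                host, allocated, host.getTotalMipsCapacity());
    }
}
```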

manoelcampos added a commit that referenced this issue Aug 28, 2020
Signed-off-by: Manoel Campos <manoelcampos@gmail.com>
manoelcampos added a commit that referenced this issue Aug 28, 2020
This reverts commit 4452eb9.
manoelcampos added a commit that referenced this issue Aug 28, 2020
Signed-off-by: Manoel Campos <manoelcampos@gmail.com>
manoelcampos added a commit that referenced this issue Aug 28, 2020
- The new VM migration listeners introduced in #249
  helped find the root cause.

Signed-off-by: Manoel Campos <manoelcampos@gmail.com>
manoelcampos added a commit that referenced this issue Aug 28, 2020
- The new VM migration listeners introduced in #249 helped find the root cause.

Signed-off-by: Manoel Campos <manoelcampos@gmail.com>
@manoelcampos manoelcampos added the critical label Aug 28, 2020
@manoelcampos
Collaborator

@ehsankhodayar

As I said, this issue occurs due to the messy code inherited from CloudSim, as thoroughly described in #94.
After introducing some new features (see #249 and #251) to help track the issue down, the solution was simple, as you can check in PR #248.

Thanks for taking the time to report the issue and provide such helpful comments.

@manoelcampos
Collaborator

Could you please try the new 5.5.0 release on the master branch and let me know how it goes?
Thanks.

@ehsankhodayar
Author

ehsankhodayar commented Aug 28, 2020

Thank you very much for your new updates. I tested the new version, spent a lot of time in the IntelliJ IDEA debugger, and traced the code multiple times. Fortunately, the problem is completely fixed. At first I got a little confused and was heading towards the wrong conclusion, but in the end I confirmed the problem has been fixed.

However, I think modelling the VM migration overhead is unnecessary, because the total capacity stays reserved for the source VM the whole time; the overhead is only applied in a single line of the finishVmMigration method in the DatacenterSimple class, and nothing that happens at that moment is affected by it. In my own research, similar to Buyya2012, I simply assume the PDM metric is about 10% of the VM's total MIPS capacity across all its migrations.

Finally, I should note that there is a bug in the logic of the VmAllocationPolicyMigrationAbstract class, due to its createTemporaryVm approach. In my own research I completely rewrote this class and did not use the createTemporaryVm method. Instead, I used a temporary VM list for each host and evaluated the suitability of each host for the target VM with a customized isSuitableForVm method that works on that temporary VM list (see the sketch below). The main problem with VmAllocationPolicyMigrationAbstract occurs when it uses createTemporaryVm: after the migration check-up process finishes, it does not return the temporarily allocated resources to the target host. In other words, the restoreAllocation method in this class does not work correctly. For example, after some migration check-ups, the available resources of the checked hosts, such as PEs, are decreased without any VM being placed. You can check it with the debugger just before line 850 in the DatacenterSimple class, when VM 1 is about to be requested to migrate to Host 2: at that moment, Host 2 has only about one available PE, even though it is only serving VM 3.
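A minimal sketch of that alternative suitability check, assuming the Host#getFreePesNumber and Vm#getNumberOfPes methods of CloudSim Plus 5.x (the TemporaryPlacementChecker class and its method names are hypothetical, not the actual rewritten policy):

```java
import org.cloudbus.cloudsim.hosts.Host;
import org.cloudbus.cloudsim.vms.Vm;

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TemporaryPlacementChecker {
    // VMs tentatively assigned to each host while evaluating a migration map.
    private final Map<Host, List<Vm>> temporaryVms = new HashMap<>();

    /** Checks suitability against the host's free PEs minus the tentative
     *  assignments, without actually allocating anything on the host
     *  (so there is nothing to restore afterwards). */
    public boolean isSuitableForVm(final Host host, final Vm vm) {
        final long tentativePes = temporaryVms
                .getOrDefault(host, Collections.emptyList())
                .stream().mapToLong(Vm::getNumberOfPes).sum();
        return host.getFreePesNumber() - tentativePes >= vm.getNumberOfPes();
    }

    /** Records a tentative assignment instead of calling createTemporaryVm(). */
    public void addTemporaryVm(final Host host, final Vm vm) {
        temporaryVms.computeIfAbsent(host, h -> new ArrayList<>()).add(vm);
    }

    /** Discards all tentative assignments after the evaluation finishes. */
    public void clear() {
        temporaryVms.clear();
    }
}
```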

@ehsankhodayar
Author

ehsankhodayar commented Aug 28, 2020

In fact, given the problem I mentioned in the previous comment, an error should occur due to the lack of available PEs (or other resources) on the destination host for the source VM. But that is a problem of the VmSchedulerAbstract class: at line 113, it only considers the free PE list.

@manoelcampos
Collaborator

manoelcampos commented Aug 29, 2020

> I think modelling the VM migration overhead is unnecessary, because the total capacity stays reserved for the source VM the whole time; the overhead is only applied in a single line of the finishVmMigration method in the DatacenterSimple class, and nothing that happens at that moment is affected by it.

I agree with that. The overhead is useless if all the requested VM capacity is reserved in advance on the target Host. But changing that would require a lot of work, for which I unfortunately don't have time because I'm finishing my Ph.D.
In the short and medium term, I'm only able to fix bugs. They usually aren't as tricky as this one, so I can quickly fix most of them, but changing the current behaviour for this kind of issue requires a lot of time. Sorry for that.

> In my own research, similar to Buyya2012, I simply assume the PDM metric is about 10% of the VM's total MIPS capacity across all its migrations. Finally, I should note that there is a bug in the logic of the VmAllocationPolicyMigrationAbstract class, due to its createTemporaryVm approach. In my own research I completely rewrote this class and did not use the createTemporaryVm method. Instead, I used a temporary VM list for each host and evaluated the suitability of each host for the target VM with a customized isSuitableForVm method that works on that temporary VM list. The main problem with VmAllocationPolicyMigrationAbstract occurs when it uses createTemporaryVm: after the migration check-up process finishes, it does not return the temporarily allocated resources to the target host. In other words, the restoreAllocation method in this class does not work correctly.

I totally agree with that too. The entire process of evaluating a new VM placement, including methods such as createTemporaryVm() and restoreAllocation(), is a complete mess inherited from CloudSim. You can check how complex, wrong and buggy this approach is by taking a look at issue #94. But fixing it requires huge time and effort: the whole VM migration process needs to be rewritten. Issue #94 has been around since 2017. I've already fixed tons of issues; however, I'm maintaining the project almost alone and haven't been able to focus on that one. This should give you an idea of how messy this part of the code is and how complex it is to fix. In commit 5050318, I included some TODO notes that give an overview of the issues with this migration process (besides other TODOs around the code).

> For example, after some migration check-ups, the available resources of the checked hosts, such as PEs, are decreased without any VM being placed. You can check it with the debugger just before line 850 in the DatacenterSimple class, when VM 1 is about to be requested to migrate to Host 2: at that moment, Host 2 has only about one available PE, even though it is only serving VM 3.

I didn't get it. Are you able to provide an example, such as the MigrationExample1, that shows the issue?

@ehsankhodayar
Author

Thanks a lot for all of your responses. I understand, because I'm currently in the same situation with my thesis. We can sort this out in the future.

> I didn't get it. Are you able to provide an example, such as the MigrationExample1, that shows the issue?

Yes, it is simple: just before line 850 in the DatacenterSimple class, when VM 1 is about to be requested to migrate to Host 2, check the number of available PEs on Host 2 and you will see it.
