Fault Injection for Hosts' CPUs and Recovery mechanism which dynamically removes failed PEs from VMs and starts VM's snapshots when a Host or VM completely fails #81

Closed
RaysaOliveira opened this Issue May 3, 2017 · 0 comments

@RaysaOliveira
Contributor

RaysaOliveira commented May 3, 2017

Feature

Implements a mechanism to inject random failures into Hosts' PEs (CPUs).

The HostFaultInjection class enables injecting random failures into Host PEs. It uses a given Pseudo Random Number Generator (PRNG), following some statistical distribution, to generate the times of failure. PRNGs such as the new PoissonDistr can be used for this purpose. Internally, another PRNG is created to define how many Host PEs will fail when a fault is generated.
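As a rough illustration of the two generators described above, the sketch below draws fault arrival times from a Poisson process and picks how many PEs fail at each fault. It does not use the CloudSim Plus API: the class `FaultArrivalSketch`, its methods, the exponential inter-arrival sampling and the uniform choice of failed PEs are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch; this is not the CloudSim Plus HostFaultInjection class. */
final class FaultArrivalSketch {
    /**
     * Generates fault arrival times (in hours) for a Poisson process with the
     * given mean number of faults per hour, up to the given time horizon.
     */
    static List<Double> faultTimes(double faultsPerHour, double horizonHours, Random prng) {
        List<Double> times = new ArrayList<>();
        double now = 0;
        while (true) {
            // Inverse-transform sampling of an exponential inter-arrival time.
            now += -Math.log(1.0 - prng.nextDouble()) / faultsPerHour;
            if (now > horizonHours) {
                return times;
            }
            times.add(now);
        }
    }

    /** Assumption: the number of failed PEs is uniform in [1, workingPes]. */
    static int failedPes(int workingPes, Random prng) {
        return 1 + prng.nextInt(workingPes);
    }

    public static void main(String[] args) {
        Random arrivals = new Random(42); // PRNG for the times of failure
        Random pes = new Random(7);       // separate PRNG for the number of failed PEs
        for (double t : faultTimes(0.5, 24, arrivals)) {
            System.out.printf("fault at %.2f h, %d PE(s) fail%n", t, failedPes(8, pes));
        }
    }
}
```

Using a separate seeded PRNG for each decision keeps a simulation run reproducible, which is one plausible reason for having a second internal generator.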

The HostFaultInjection class works as a fault injector for the Hosts of a given Datacenter. The mechanism considers the following situations.

Removal of failed PEs from VMs

If the number of working PEs is lower than the total required by all VMs, then failed PEs will be removed from running VMs using a round-robin algorithm, one PE per VM at a time. If all PEs are removed from a VM, that VM is destroyed.
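The round-robin removal can be sketched as follows. This is a simplified model, not the actual implementation: VMs are reduced to an array of PE counts, the number of PEs to remove is passed in as a parameter, and the class `RoundRobinPeRemoval` is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical sketch of removing failed PEs from VMs in round-robin order. */
final class RoundRobinPeRemoval {
    /**
     * Removes pesToRemove PEs from the VMs, one PE per VM per round.
     * vmPes[i] is the current PE count of VM i; a resulting 0 means the VM is destroyed.
     * Returns the new PE counts and leaves the input array untouched.
     */
    static int[] removePes(int[] vmPes, int pesToRemove) {
        int[] result = vmPes.clone();
        int remaining = pesToRemove;
        int i = 0;
        // Stop when enough PEs were removed or no VM has any PE left.
        while (remaining > 0 && Arrays.stream(result).anyMatch(p -> p > 0)) {
            if (result[i] > 0) {
                result[i]--;
                remaining--;
            }
            i = (i + 1) % result.length;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] after = removePes(new int[]{2, 1, 3}, 3);
        System.out.println(Arrays.toString(after)); // prints [1, 0, 2]: the second VM is destroyed
    }
}
```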

Management of affected VMs

Generated failures may or may not affect running VMs. If the number of working PEs remaining in a Host is not lower than the total PEs required by all VMs, the failure will not cause any side effect.
If there are N free PEs in the Host and the number of failed PEs is less than or equal to N, no VM will be affected.

If no VM is affected by the failure, the failed Host PEs are just set to Pe.Status.FAILED and become unavailable. If new VMs are later placed into that Host, such PEs will not be available to them.
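The rule above boils down to comparing the number of failed PEs with the number of free PEs. A minimal sketch, assuming the Host state is reduced to three integers (the class `FailureImpact` and its methods are hypothetical names, not part of the library):

```java
/** Hypothetical sketch of deciding whether a failure affects running VMs. */
final class FailureImpact {
    /**
     * workingPes: PEs working before the failure;
     * pesRequiredByVms: total PEs required by all VMs on the Host;
     * failedPes: PEs that have just failed.
     */
    static boolean affectsVms(int workingPes, int pesRequiredByVms, int failedPes) {
        int freePes = workingPes - pesRequiredByVms;
        return failedPes > freePes;
    }

    /** Number of PEs that must be taken away from VMs (0 if no VM is affected). */
    static int pesToRemoveFromVms(int workingPes, int pesRequiredByVms, int failedPes) {
        int freePes = workingPes - pesRequiredByVms;
        return Math.max(0, failedPes - freePes);
    }

    public static void main(String[] args) {
        // Host with 8 working PEs, VMs requiring 5 of them: 3 PEs are free.
        System.out.println(affectsVms(8, 5, 3));         // prints false
        System.out.println(affectsVms(8, 5, 4));         // prints true
        System.out.println(pesToRemoveFromVms(8, 5, 4)); // prints 1
    }
}
```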

Start a VM snapshot (clone) when all VMs from the same broker fail

If all PEs of a Host fail, all its VMs are immediately destroyed. When all VMs from a given broker fail (no matter which Host they were in), a clone of the last failed VM is created. The cloning process copies the Cloudlets that were executing or waiting in the failed VM to the cloned VM. Cloning a VM simulates starting a snapshot of that VM, as in a real cloud infrastructure.

Increase completion time for cloudlets affected by removed VM PEs

Consider a VM with N PEs. If some of its PEs fail while Cloudlets are using them, the Cloudlets will continue to execute but should take longer to finish.
- Example: the VM has 2 PEs and a Cloudlet is using both of them. If one PE fails, the Cloudlet will take twice as long to finish using just the remaining PE.
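The example can be checked with a small calculation. The sketch assumes a simplified model in which a Cloudlet's work is split evenly across the PEs it uses; the class `CloudletTimeSketch`, the workload length and the MIPS values are made up for illustration.

```java
/** Hypothetical sketch of how losing PEs stretches a Cloudlet's completion time. */
final class CloudletTimeSketch {
    /**
     * Time (in seconds) to execute lengthMi million instructions
     * on `pes` PEs, each with a capacity of mipsPerPe.
     * Assumption: the work is perfectly parallel across the PEs.
     */
    static double finishTime(long lengthMi, double mipsPerPe, int pes) {
        if (pes <= 0) {
            throw new IllegalArgumentException("at least one working PE is required");
        }
        return lengthMi / (mipsPerPe * pes);
    }

    public static void main(String[] args) {
        System.out.println(finishTime(10_000, 1000, 2)); // prints 5.0
        System.out.println(finishTime(10_000, 1000, 1)); // prints 10.0 (twice as long)
    }
}
```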

VM Migration when Host is overloaded because of failures

If the failure of PEs in a Host increases the percentage of CPU usage to the point of overloading the Host, using a VmAllocationPolicyMigration should cause VMs to be migrated to another Host.

Implementation Details

Using the HostFaultInjection.addVmCloner() method, a VmCloner object may be set to define how to clone a given VM when all the PEs it was using fail. Setting a VmCloner enables simulating the creation of a snapshot of that VM. This way, the HostFaultInjection will use this object to create a new VM when all VMs from a specific broker fail, recovering from the failure.

Since each broker represents a customer, you can simulate the execution of multiple VMs representing the same service, such as a Web Server. These multiple VMs may be used to simulate load balancing and fault tolerance of a hosted service. If you have, for instance, 3 VMs simulating the replication of the same service, this scenario has a 2-fault-tolerance level. That means your service will keep running as long as no more than 2 failures happen.

In this scenario, using the VmCloner gives you a 3-fault-tolerance level. That is, if all these 3 VMs are destroyed, a snapshot of the last destroyed VM will be created. The snapshot will take some time to start, which is randomly chosen internally, simulating the time to get the new VM up and running. Meanwhile, the service will experience some downtime.
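The fault-tolerance levels described above can be traced with a toy model. The sketch below is not the VmCloner API: `BrokerRecoverySketch` is a hypothetical class that only counts running VMs and remaining clones, and it ignores the random start-up delay of the snapshot.

```java
/** Hypothetical sketch of a broker's VMs failing and being recovered by a clone. */
final class BrokerRecoverySketch {
    private int runningVms;
    private int clonesLeft;

    BrokerRecoverySketch(int vms, int maxClones) {
        this.runningVms = vms;
        this.clonesLeft = maxClones;
    }

    /**
     * Registers the failure of one VM. Returns true if this was the broker's
     * last running VM and a clone (snapshot) was started to replace it.
     */
    boolean vmFailed() {
        if (runningVms == 0) {
            throw new IllegalStateException("no running VM left to fail");
        }
        runningVms--;
        if (runningVms == 0 && clonesLeft > 0) {
            clonesLeft--;
            runningVms = 1; // the clone takes over (start-up delay not modelled)
            return true;
        }
        return false;
    }

    boolean isServiceUp() {
        return runningVms > 0;
    }

    public static void main(String[] args) {
        BrokerRecoverySketch broker = new BrokerRecoverySketch(3, 1);
        for (int failure = 1; failure <= 4; failure++) {
            boolean cloned = broker.vmFailed();
            System.out.println("failure " + failure + ": cloned=" + cloned
                    + ", serviceUp=" + broker.isServiceUp());
            if (!broker.isServiceUp()) {
                break;
            }
        }
    }
}
```

With 3 VMs and a single allowed clone, the service in this model survives three failures and goes down on the fourth.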

See #105 for more details.

Available Examples

@RaysaOliveira RaysaOliveira changed the title from Provide a Fault Injection mechanism for Hosts' PEs (CPUs) to Provide a Fault Injection and recovery mechanism for Hosts' PEs (CPUs) May 3, 2017

@RaysaOliveira RaysaOliveira changed the title from Provide a Fault Injection and recovery mechanism for Hosts' PEs (CPUs) to Provide a Fault Injection and Recovery mechanism for Hosts' PEs (CPUs) May 3, 2017

@manoelcampos manoelcampos added the feature label May 4, 2017

@manoelcampos manoelcampos added this to the CloudSim Plus 2.0 milestone May 4, 2017

@manoelcampos manoelcampos changed the title from Provide a Fault Injection and Recovery mechanism for Hosts' PEs (CPUs) to Fault Injection for Hosts' CPUs and Recovery mechanisms which readjust VM's PEs number and start VM's snapshots when a Host or VM completely fails May 4, 2017

@manoelcampos manoelcampos changed the title from Fault Injection for Hosts' CPUs and Recovery mechanisms which readjust VM's PEs number and start VM's snapshots when a Host or VM completely fails to Fault Injection for Hosts' CPUs and Recovery mechanism which readjust VM's PEs number and start VM's snapshots when a Host or VM completely fails May 4, 2017

@manoelcampos manoelcampos changed the title from Fault Injection for Hosts' CPUs and Recovery mechanism which readjust VM's PEs number and start VM's snapshots when a Host or VM completely fails to Fault Injection for Hosts' CPUs and Recovery mechanism which readjust VM's PEs number and starts VM's snapshots when a Host or VM completely fails May 4, 2017

RaysaOliveira added a commit to RaysaOliveira/cloudsim-plus that referenced this issue May 4, 2017

Updates #81
- Fixed some test failures.
- Added Vm Cloner and Cloudlets Cloner Functions to HostFaultInjection
  to allow creating a clone from a given VM and re-creating all its
  Cloudlets inside the clone, simulating
  the initialization of a VM snapshot into
  a different Host when a previous one fails.
- Removed the duplicated attribute schedulingInterval from PowerHost.
  This attribute is obtained from the Datacenter.
- Added new PowerVm constructor that doesn't require an ID
- Added a new submitCloudlets method into the DatacenterBroker
  that accepts a list of cloudlets and a VM to which
  such cloudlets will be bound.
- Added a getCloudletList method in the CloudletScheduler
  to get the list of all cloudlets which are executing
  or waiting inside a given VM.

RaysaOliveira added a commit to RaysaOliveira/cloudsim-plus that referenced this issue May 4, 2017

Updates #81
- Documentation updated.
- Several refactorings.
- Updated the HostFaultInjection to automatically set the broker
  of cloned VMs and Cloudlets if one is not set.

RaysaOliveira added a commit to RaysaOliveira/cloudsim-plus that referenced this issue May 4, 2017

Updates #81
- Moved the class PoissonProcess to the distribution package.
- Moved HostFaultInjection to org.cloudsimplus.faultinjection package.
- Created new basic example inside the examples module.

RaysaOliveira added a commit to RaysaOliveira/cloudsim-plus that referenced this issue May 4, 2017

Updates #81
Refactored the HostFaultInjection to use the Poisson random number generator
given by the developer.

@RaysaOliveira RaysaOliveira referenced this issue May 15, 2017

Merged

Closes #81 #84

@manoelcampos manoelcampos changed the title from Fault Injection for Hosts' CPUs and Recovery mechanism which readjust VM's PEs number and starts VM's snapshots when a Host or VM completely fails to Fault Injection for Hosts' CPUs and Recovery mechanism which dynamically removes failed PEs from VMs and starts VM's snapshots when a Host or VM completely fails May 28, 2017

RaysaOliveira added a commit to RaysaOliveira/cloudsim-plus that referenced this issue Jun 5, 2017

Updates #81
Created a VmCloner class to store the Vm and Cloudlets Cloner Functions.
It also defines the maximum number of VM clones to be created using
a VmCloner object.
Now, the HostFaultInjection class accepts a VmCloner object,
instead of setting Vm Cloner and Cloudlets Cloner individually.

RaysaOliveira added a commit to RaysaOliveira/cloudsim-plus that referenced this issue Jun 5, 2017

Updates #81
- Changed the documentation of faultArrivalTimesGenerator in the
  HostFaultInjection class to indicate that its values are
  considered to be in hours (not minutes anymore).
- Included a FaultToleranceLevel inside the JSON SLA Contracts (see SlaContract class).
  Now, the number of VMs to create for each broker is based on this k-fault-tolerance
  level. The AWS EC2 Template used to create these k VMs is
  chosen based on the max price the customer is willing to pay hourly for all
  VMs. This way, the price of each VM template cannot be
  higher than maxPrice/k. If no template meets this limit,
  the cheapest VM will be selected and k will be recomputed
  to avoid violating the contract price.
  If even the cheapest VM is more expensive than the
  contract price, only one instance of it will be created,
  violating the contract price but avoiding stopping the customer's
  services.
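
The template selection rule in the commit message above can be sketched as follows. This is only an interpretation, not the code from the commit: `SlaVmPlanner` and `Plan` are hypothetical names, templates are reduced to their hourly prices, and picking the most expensive template that fits the per-VM budget is an assumption (the message does not say which affordable template is preferred).

```java
import java.util.Arrays;

/** Hypothetical sketch of choosing a VM template and VM count from an SLA price limit. */
final class SlaVmPlanner {
    /** Chosen hourly price per VM and the number of VMs to create. */
    static final class Plan {
        final double vmPrice;
        final int vms;

        Plan(double vmPrice, int vms) {
            this.vmPrice = vmPrice;
            this.vms = vms;
        }
    }

    static Plan plan(double[] templatePrices, double maxPrice, int k) {
        if (templatePrices.length == 0 || k <= 0) {
            throw new IllegalArgumentException("need at least one template and k > 0");
        }
        double[] sorted = templatePrices.clone();
        Arrays.sort(sorted);
        double budgetPerVm = maxPrice / k;

        // Assumption: prefer the most expensive template within the per-VM budget.
        double chosen = -1;
        for (double price : sorted) {
            if (price <= budgetPerVm) {
                chosen = price;
            }
        }
        if (chosen >= 0) {
            return new Plan(chosen, k);
        }

        // No template fits: take the cheapest one and recompute k.
        // If even a single instance exceeds maxPrice, create one VM anyway.
        double cheapest = sorted[0];
        int affordable = (int) Math.floor(maxPrice / cheapest);
        return new Plan(cheapest, Math.max(1, affordable));
    }

    public static void main(String[] args) {
        double[] prices = {0.25, 0.5};
        Plan fits = plan(prices, 1.0, 2);      // budget 0.5 per VM
        Plan recomputed = plan(prices, 1.0, 8); // budget 0.125 per VM, nothing fits
        Plan violated = plan(prices, 0.2, 2);   // even the cheapest VM is too expensive
        System.out.println(fits.vmPrice + " x " + fits.vms);             // prints 0.5 x 2
        System.out.println(recomputed.vmPrice + " x " + recomputed.vms); // prints 0.25 x 4
        System.out.println(violated.vmPrice + " x " + violated.vms);     // prints 0.25 x 1
    }
}
```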

RaysaOliveira added a commit to RaysaOliveira/cloudsim-plus that referenced this issue Jun 5, 2017

Updates #81

manoelcampos added a commit that referenced this issue Jun 5, 2017

Updates #81 (PR #105)
* Updates #81
