# Recovering from Failures

John knows that, with any technology, a service interruption, disk failure, natural disaster, or other unforeseen event is simply a matter of time. He also knows that preparing for and recovering from these interruptions must be planned and constantly tested. Because he's familiar with business continuity planning, he knows he needs a plan for recovering his instances if a failure occurs.

John intends to spread his marketing instances across multiple Availability Zones and use a load balancer and an EC2 Auto Scaling group to provide availability and resilience. But what about his other workloads that don't require these types of resources? He needs to know his options for recovering the instances and the storage attached to them so he can decide which storage service or services can meet his service level agreements (SLAs) with his users.

There are multiple components to virtualized servers. As a customer, John is responsible for some of them, and AWS is responsible for others under the Shared Responsibility Model. AWS is responsible for protecting the infrastructure that runs all of the services offered in the AWS Cloud. This infrastructure is composed of the hardware, software, networking, and facilities that run AWS Cloud services.

## Recovery options

You can choose how and what you want to recover. First, John investigates recovery options when the EC2 instance status checks fail.

### Status checks for your instances
The status check information, together with the data provided by Amazon CloudWatch, gives you detailed operational visibility into each of your instances. 



Amazon EC2 performs automated checks on every running EC2 instance to identify hardware and software issues. These automated checks detect whether specific issues are affecting your instances. The event status data augments the information that Amazon EC2 already provides about the state of each instance (such as pending, running, or stopping) and the utilization metrics that CloudWatch monitors (CPU utilization, network traffic, and disk activity).



Status checks are performed every minute, returning a pass or fail status. If all checks pass, the overall status of the instance is OK. If one or more checks fail, the overall status is impaired. Status checks are built into Amazon EC2, so they cannot be turned off or deleted.
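As a minimal sketch of how the pass/fail results could be read programmatically, the following Python function scans a response shaped like the output of the EC2 `DescribeInstanceStatus` API and lists the instances with a failing check. The function names and the sample instance IDs are hypothetical, and the sample data is handwritten rather than real API output.

```python
from typing import Any


def is_impaired(entry: dict[str, Any]) -> bool:
    """Return True if the system or instance status check reports "impaired"."""
    return (
        entry["SystemStatus"]["Status"] == "impaired"
        or entry["InstanceStatus"]["Status"] == "impaired"
    )


def impaired_instances(response: dict[str, Any]) -> list[str]:
    """List the IDs of instances with at least one failed status check.

    `response` is assumed to have the shape returned by DescribeInstanceStatus,
    for example boto3's ec2.describe_instance_status().
    """
    return [
        entry["InstanceId"]
        for entry in response.get("InstanceStatuses", [])
        if is_impaired(entry)
    ]


# Handwritten sample data with hypothetical instance IDs.
sample = {
    "InstanceStatuses": [
        {
            "InstanceId": "i-0aaaaaaaaaaaaaaaa",
            "SystemStatus": {"Status": "ok"},
            "InstanceStatus": {"Status": "ok"},
        },
        {
            "InstanceId": "i-0bbbbbbbbbbbbbbbb",
            "SystemStatus": {"Status": "impaired"},
            "InstanceStatus": {"Status": "ok"},
        },
    ]
}
```

With the sample above, `impaired_instances(sample)` reports only the second instance, because one of its two checks failed.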

When a status check fails, the corresponding CloudWatch metric for status checks is incremented. For more information, see Status check metrics. You can use these metrics to create CloudWatch alarms that are initiated based on the result of the status checks. For example, you can create an alarm to warn you if status checks fail on a specific instance.
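One way such a warning alarm might be defined is sketched below. It builds the parameters for CloudWatch's `PutMetricAlarm` operation on the `StatusCheckFailed` metric. The alarm name, the two-period evaluation window, and the SNS topic used for notification are assumptions chosen for illustration, not values from this lesson.

```python
def status_check_alarm(instance_id: str, topic_arn: str) -> dict:
    """Build PutMetricAlarm parameters that warn when status checks fail."""
    return {
        "AlarmName": f"status-check-failed-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,             # seconds; status checks run every minute
        "EvaluationPeriods": 2,   # assumption: require two consecutive failures
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # assumption: notify through an SNS topic
    }


def create_alarm(params: dict) -> None:
    """Send the alarm definition to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameter construction separate from the API call lets you review or test the alarm definition without touching an AWS account.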

### Recover your instance
Automatic recovery improves instance availability by recovering the instance if it becomes impaired because of an underlying hardware issue. Automatic recovery migrates the instance to other hardware during an instance reboot while retaining its instance ID, private IP addresses, Elastic IP addresses, and all instance metadata.



To automatically recover an instance when a system status check failure occurs, you can rely on the instance's default configuration or create a CloudWatch alarm.

A recovered instance is identical to the original instance, including the instance ID, private IP addresses, Elastic IP addresses, and all instance metadata. If the impaired instance has a public IPv4 address, the instance retains the public IPv4 address after recovery. If the impaired instance is in a placement group, the recovered instance runs in the placement group. During instance recovery, the instance is migrated as part of an instance reboot, and any in-memory data is lost.
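The CloudWatch alarm route could look roughly like the following sketch. It builds `PutMetricAlarm` parameters that watch the `StatusCheckFailed_System` metric and use the EC2 recover action ARN as the alarm action. The alarm name and evaluation window are assumptions for illustration.

```python
def recovery_alarm(instance_id: str, region: str) -> dict:
    """Build PutMetricAlarm parameters that recover an instance on system check failure."""
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,             # seconds
        "EvaluationPeriods": 2,   # assumption: two failed minutes before recovering
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 action that recovers the instance onto other hardware.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recovery_alarm(instance_id: str, region: str) -> None:
    """Create the alarm in CloudWatch (needs boto3 and AWS credentials)."""
    import boto3

    client = boto3.client("cloudwatch", region_name=region)
    client.put_metric_alarm(**recovery_alarm(instance_id, region))
```

The alarm targets the system status check only, which matches the hardware-level failures that recovery is meant to address.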

### Simplified automatic recovery
Instances that support simplified automatic recovery are configured by default to recover a failed instance. When the setting is configured, it applies to new and existing instances.



Simplified automatic recovery is initiated in response to system status check failures. It does not take place during Service Health Dashboard events or any other events that impact the underlying hardware.
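If you want to set this behavior explicitly on one instance, a sketch along the following lines might help. It prepares the parameters for the EC2 `ModifyInstanceMaintenanceOptions` operation, whose `AutoRecovery` setting accepts `default` or `disabled`. The helper names are hypothetical.

```python
def auto_recovery_request(instance_id: str, enabled: bool = True) -> dict:
    """Build ModifyInstanceMaintenanceOptions parameters for one instance."""
    if not instance_id.startswith("i-"):
        raise ValueError(f"not an EC2 instance ID: {instance_id!r}")
    return {
        "InstanceId": instance_id,
        # "default" keeps simplified automatic recovery on for supported instances.
        "AutoRecovery": "default" if enabled else "disabled",
    }


def apply_auto_recovery(instance_id: str, enabled: bool = True) -> None:
    """Apply the setting through the EC2 API (needs boto3 and AWS credentials)."""
    import boto3

    boto3.client("ec2").modify_instance_maintenance_options(
        **auto_recovery_request(instance_id, enabled)
    )
```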

## What information do you need for recovery?

John wants to know what he would need to do to back up and recover a single instance that uses Amazon EBS volumes along with Amazon EFS and Amazon S3 storage.

Taking snapshots of an EBS volume gives you a consistent, point-in-time copy of the data in the volume. That alone is not enough to recover your instance, because a snapshot of a data volume contains no information about the instance type, OS, software, and so on installed on the original instance. To recover an EC2 instance fully, you need the EBS snapshots and a copy of the AMI used to build the instance, kept up to date with the applicable software versions and patches. If you have only the original AMI and did not create a gold image after you installed software and patches, you would have to launch the original AMI and customize it with all of the software, configurations, and patches to return the applications to their pre-interruption state. Finally, if the instance used storage in Amazon EFS and Amazon S3, you would need documentation of the mount points and bucket names.
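The items listed above can be captured as a simple checklist. The following sketch is one hypothetical way to record them and flag gaps. The class name, field names, and messages are illustrative and are not part of any AWS API.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryInventory:
    """What you would need on hand to rebuild one instance (hypothetical structure)."""

    instance_id: str
    ami_id: str = ""                      # AMI used to build the instance
    ami_is_gold_image: bool = False       # True if it includes software and patches
    snapshot_ids: list[str] = field(default_factory=list)       # EBS snapshots
    efs_mounts: dict[str, str] = field(default_factory=dict)    # file system ID -> mount point
    s3_buckets: list[str] = field(default_factory=list)         # documented bucket names


def recovery_gaps(inv: RecoveryInventory) -> list[str]:
    """Return human-readable gaps that would slow or block a full recovery."""
    gaps = []
    if not inv.snapshot_ids:
        gaps.append("no EBS snapshots recorded")
    if not inv.ami_id:
        gaps.append("no AMI recorded")
    elif not inv.ami_is_gold_image:
        gaps.append("AMI is not a gold image; software and patches must be reapplied")
    return gaps
```

For example, an inventory that records snapshots and the original AMI but no gold image would return a single gap describing the manual customization still required.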

You can ensure that all these items are backed up using custom scripts or third-party tools, or you can use AWS Backup for a one-stop backup solution.

## AWS Backup to recover files and instances

Backing up and restoring an EC2 instance requires more protection than just the instance’s individual EBS volumes. To restore an instance, you need to restore all the EBS volumes and also recreate an identical instance with the same instance type, VPC, security group, IAM role, and the like.

If you use AWS Backup, it protects all EBS volumes attached to the instance and attaches them to an AMI that stores all parameters from the original EC2 instance (except for Elastic Inference Accelerator and user data scripts).

When the backup is complete, you can restore the full instance using the console, API, or AWS Command Line Interface (AWS CLI). You can restore and edit all parameters using the API or AWS CLI, and in the console you can restore and edit 16 parameters from your original EC2 instance.
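A restore through the API, with edited parameters, might be sketched as follows. The code merges overrides into the restore metadata of a recovery point and builds the parameters for the AWS Backup `StartRestoreJob` operation. The helper names are hypothetical, and the metadata key shown in the usage note (`InstanceType`) is illustrative; in practice you would start from the metadata that `GetRecoveryPointRestoreMetadata` returns for your recovery point.

```python
from typing import Optional


def restore_request(
    recovery_point_arn: str,
    iam_role_arn: str,
    metadata: dict,
    overrides: Optional[dict] = None,
) -> dict:
    """Build StartRestoreJob parameters, applying edits to the restore metadata."""
    merged = {**metadata, **(overrides or {})}
    return {
        "RecoveryPointArn": recovery_point_arn,
        # AWS Backup expects restore metadata as string-to-string pairs.
        "Metadata": {key: str(value) for key, value in merged.items()},
        "IamRoleArn": iam_role_arn,
        "ResourceType": "EC2",
    }


def start_restore(params: dict) -> str:
    """Start the restore job and return its ID (needs boto3 and AWS credentials)."""
    import boto3

    response = boto3.client("backup").start_restore_job(**params)
    return response["RestoreJobId"]
```

For instance, passing `overrides={"InstanceType": "t3.large"}` would restore the instance with a different instance type while leaving the other recorded parameters unchanged.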

## Backup and restore research complete

John has finished his recovery research and is impressed with the built-in resilience of the AWS components. He is diagramming a potential solution for marketing when Sofía walks in to check on him.