#Elasticsearch snapshot and restore

The Elasticsearch snapshot and restore APIs let you create snapshots of individual indices or of an entire cluster in a remote repository. Snapshots can be saved to many repository types, such as a file system, shared UNC paths, Amazon S3 (and other cloud providers) and HDFS.

In this post I will briefly explain how to take a snapshot of a cluster running on one machine and restore it on another. I will focus on taking the snapshot on an Amazon EC2 instance, using Amazon S3 as the repository, and restoring it on another Amazon EC2 instance.

I have deliberately kept this as simple as possible. If you need more advanced options, you can always refer to the Elasticsearch documentation.

##Prerequisites

This post assumes that:

- You have at least two running Amazon EC2 instances.
- Elasticsearch is already installed on each instance.
- The cloud-aws plugin is installed on each instance.

Here is how to install the plugin:

  #From your Elasticsearch installation directory (usually /usr/share/elasticsearch), run
  sudo bin/plugin install cloud-aws
    
  #Restart Elasticsearch afterwards (make sure you know how to do it without harming your cluster)
  service elasticsearch restart

For detailed information on how to complete the prerequisites, you may refer to the following article

##REST APIs for cluster snapshot to S3 and restore

Define the snapshot repository configuration in Elasticsearch

  #Register the snapshot repository. Refer to the Elasticsearch documentation for advanced options
  PUT _snapshot/my_snapshot
  {
    "type": "s3",
    "settings": {
      "bucket": "your_predefined_s3_bucket",
      "region": "us-west-1",
      "base_path": "path_under_s3_bucket",
      "access_key": "your_amazon_s3_accesskey",
      "secret_key": "your_amazon_s3_secretkey"
    }
  }
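
If you prefer to script this step instead of typing it into a console, here is a minimal Python sketch of the same repository registration. It only builds the HTTP request and does not send it. The node address `http://localhost:9200` and the helper name `build_repository_request` are my own assumptions for illustration; the body is the one shown above.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: address of a node in your cluster


def build_repository_request(repo_name, settings, es_url=ES_URL):
    """Build (but do not send) the PUT request that registers an S3 repository."""
    body = {"type": "s3", "settings": settings}
    return urllib.request.Request(
        url=f"{es_url}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_repository_request(
    "my_snapshot",
    {
        "bucket": "your_predefined_s3_bucket",
        "region": "us-west-1",
        "base_path": "path_under_s3_bucket",
        "access_key": "your_amazon_s3_accesskey",
        "secret_key": "your_amazon_s3_secretkey",
    },
)
print(request.get_method(), request.full_url)

# Uncomment to send the request to a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping the request construction separate from the sending makes it easy to inspect the body before anything touches the cluster.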

Take a snapshot and give it a name (e.g. snapshot_1)

  #Run the snapshot process
  PUT /_snapshot/my_snapshot/snapshot_1?wait_for_completion=true
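
The call above snapshots the whole cluster. As a rough sketch of how the same call could be limited to specific indices, the Python helper below builds the method, path and optional JSON body. The helper name `snapshot_request` and the index name `my_index` are hypothetical; `indices`, `ignore_unavailable` and `include_global_state` are options of the snapshot API, and the values chosen for them here are just one possible choice.

```python
import json
from urllib.parse import urlencode


def snapshot_request(repo, snapshot, indices=None, wait=False):
    """Return (method, path, body) for creating a snapshot."""
    path = f"/_snapshot/{repo}/{snapshot}"
    if wait:
        # Block the HTTP call until the snapshot has finished.
        path += "?" + urlencode({"wait_for_completion": "true"})
    body = None
    if indices:
        body = json.dumps({
            "indices": ",".join(indices),   # comma-separated list of index names
            "ignore_unavailable": True,     # skip indices that do not exist
            "include_global_state": False,  # leave cluster-wide state out
        })
    return "PUT", path, body


method, path, body = snapshot_request(
    "my_snapshot", "snapshot_1", indices=["my_index"], wait=True
)
print(method, path)
print(body)
```

With no `indices` argument the helper returns no body, which corresponds to the full-cluster snapshot shown above.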

Get the status of the snapshot process (the operation may take a while to complete)

  #Get snapshot status
  GET /_snapshot/my_snapshot/_status
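
If you start the snapshot without `wait_for_completion=true`, you will probably want to poll the status call until the snapshot finishes. The sketch below shows one way to do that in Python. It assumes the status response is a JSON object with a `snapshots` list whose entries carry `snapshot` and `state` fields, and that `SUCCESS`, `FAILED` and `ABORTED` are the final states; check this against the response of your Elasticsearch version. The `fetch_status` callable is a hypothetical hook that you would implement with your HTTP client of choice.

```python
import time

# Assumption: states after which a snapshot is no longer running.
FINISHED_STATES = {"SUCCESS", "FAILED", "ABORTED"}


def snapshot_states(status_response):
    """Map snapshot name to state from a parsed _status response."""
    return {s["snapshot"]: s["state"] for s in status_response.get("snapshots", [])}


def wait_until_finished(fetch_status, poll_seconds=5.0, max_polls=120):
    """Poll fetch_status() until every reported snapshot is in a final state.

    An empty result means the response listed no snapshots at all.
    """
    for _ in range(max_polls):
        states = snapshot_states(fetch_status())
        if all(state in FINISHED_STATES for state in states.values()):
            return states
        time.sleep(poll_seconds)
    raise TimeoutError("snapshot still running after max_polls checks")


# Usage with canned responses instead of a real cluster:
responses = iter([
    {"snapshots": [{"snapshot": "snapshot_1", "state": "STARTED"}]},
    {"snapshots": [{"snapshot": "snapshot_1", "state": "SUCCESS"}]},
])
result = wait_until_finished(lambda: next(responses), poll_seconds=0)
print(result)
```

Passing the fetch function in as an argument keeps the polling logic testable without a running cluster.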

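Verify the repository (this checks that the nodes can access the repository; it does not validate the contents of an individual snapshot)

  #Verify the repository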
  POST /_snapshot/my_snapshot/_verify

##Reload the snapshot on another machine

Define the same snapshot configuration as above:

  #Register the same repository on the other machine
  PUT _snapshot/my_snapshot
  {
    "type": "s3",
    "settings": {
      "bucket": "your_predefined_s3_bucket",
      "region": "us-west-1",
      "base_path": "path_under_s3_bucket",
      "access_key": "your_amazon_s3_accesskey",
      "secret_key": "your_amazon_s3_secretkey"
    }
  }

Run the restore process on the second cluster

  #Run restore
  POST /_snapshot/my_snapshot/snapshot_1/_restore
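
The restore call above restores everything in the snapshot. As a hedged sketch, the Python helper below builds an optional request body that restores only selected indices and renames them on the way in, which is handy when an index with the same name already exists on the second cluster (an existing index can only be restored over while it is closed). The helper name `restore_body` and the index name `my_index` are hypothetical; `indices`, `rename_pattern`, `rename_replacement` and `include_global_state` are options of the restore API.

```python
import json


def restore_body(indices=None, rename_pattern=None, rename_replacement=None,
                 include_global_state=False):
    """Build the optional JSON body for a restore request."""
    body = {"include_global_state": include_global_state}
    if indices:
        body["indices"] = ",".join(indices)  # comma-separated list of index names
    if rename_pattern and rename_replacement:
        # Regular expression and replacement applied to restored index names.
        body["rename_pattern"] = rename_pattern
        body["rename_replacement"] = rename_replacement
    return body


body = restore_body(
    indices=["my_index"],
    rename_pattern="(.+)",
    rename_replacement="restored_$1",
)
print("POST /_snapshot/my_snapshot/snapshot_1/_restore")
print(json.dumps(body, indent=2))
```

With these values, `my_index` would be restored as `restored_my_index`, leaving any existing `my_index` untouched.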

Run the same status and verification checks as above on the second cluster

Please note that you can control many parameters, such as which indices are snapshotted or restored, index metadata, the snapshot rate, bulk sizes and many more. All are explained well in the Elasticsearch documentation
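
As an example of such parameters, the snapshot and restore rates and the chunk size can be set in the repository settings. Here is a minimal Python sketch that adds them to a settings dictionary before the repository is registered. The helper name `with_throttling` and the values `20mb` and `1gb` are purely illustrative assumptions; `max_snapshot_bytes_per_sec`, `max_restore_bytes_per_sec` and `chunk_size` are repository settings, but check their defaults and allowed values for your version.

```python
def with_throttling(settings, snapshot_rate="20mb", restore_rate="20mb",
                    chunk_size="1gb"):
    """Return a copy of the repository settings with rate and chunk limits added."""
    tuned = dict(settings)  # copy, so the caller's dictionary is left unchanged
    tuned["max_snapshot_bytes_per_sec"] = snapshot_rate
    tuned["max_restore_bytes_per_sec"] = restore_rate
    tuned["chunk_size"] = chunk_size
    return tuned


base_settings = {
    "bucket": "your_predefined_s3_bucket",
    "region": "us-west-1",
    "base_path": "path_under_s3_bucket",
}
tuned_settings = with_throttling(base_settings, snapshot_rate="40mb")
print(tuned_settings)
```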

Hope that this post helps :)

Follow me on:

Medium | Twitter | Linkedin | Stackoverflow | GitHub
