
Export Neptune to ElasticSearch

The Neptune Full-Text Search CloudFormation templates provide a mechanism for indexing all new data that is added to an Amazon Neptune database in ElasticSearch. However, there are situations in which you may want to index existing data in a Neptune database prior to enabling the full-text search integration.

This solution allows you to index existing data in an Amazon Neptune database in ElasticSearch before enabling Neptune's full-text search integration.

Once you have populated ElasticSearch with your existing Neptune data, you can remove this solution from your account.

Prerequisites

Before provisioning the solution, ensure the following conditions are met:

  • You have an existing Neptune cluster and an existing ElasticSearch cluster in the same VPC
  • ElasticSearch is version 7.1 or above
  • You have at least one subnet with a route to the internet:
    • Either a subnet with Auto-assign public IPv4 address set to Yes, and a route table with a 0.0.0.0/0 route whose target is an internet gateway (for example, igw-1a2b3c4d).
    • Or a subnet with Auto-assign public IPv4 address set to No, and a route table with a 0.0.0.0/0 route whose target is a NAT gateway (for example, nat-12345678901234567). For more details, see Routing.
  • You have VPC security groups that can be used to access your Neptune and ElasticSearch clusters.

This solution uses neptune-export to export data from your Neptune database. We recommend using neptune-export against a static version of your data. Either suspend writes to your database while the export is taking place, or run the export against a snapshot or clone of your database.

neptune-export uses long-running queries to get data from Neptune. You may need to increase the neptune_query_timeout DB parameter in order to run the export solution against large datasets.
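As a sketch of what that change involves, the snippet below builds the `Parameters` argument for a Neptune `modify-db-parameter-group` call; `neptune_query_timeout` is expressed in milliseconds, and the parameter group name shown in the comment is a placeholder, not one created by this solution.

```python
# Sketch: raising the neptune_query_timeout DB parameter (value in milliseconds).

def query_timeout_params(timeout_ms):
    """Build the Parameters argument for a Neptune modify-db-parameter-group call."""
    return [{
        "ParameterName": "neptune_query_timeout",
        "ParameterValue": str(timeout_ms),
        "ApplyMethod": "immediate",  # assuming the parameter is dynamic, as documented
    }]

# Two hours, expressed in milliseconds:
params = query_timeout_params(2 * 60 * 60 * 1000)
print(params[0]["ParameterValue"])  # 7200000

# With boto3 (not executed here), the call would look like:
#   boto3.client("neptune").modify_db_parameter_group(
#       DBParameterGroupName="my-neptune-db-params",  # placeholder name
#       Parameters=params,
#   )
```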

The export process uses SSL to connect to Neptune. It currently supports IAM Database Authentication for Gremlin, but not SPARQL.

Installation

  1. Launch the Neptune-to-ElasticSearch CloudFormation stack for your Region from the table below.

  2. Once the stack has been provisioned, open a terminal and run the StartExportCommand AWS Command Line Interface (CLI) command from the CloudFormation output. For example:

    aws lambda invoke \
      --function-name arn:aws:lambda:eu-west-1:000000000000:function:export-neptune-to-kinesis-xxxx \
      --region eu-west-1 \
      /dev/stdout
    

    The function returns the name and ID of an AWS Batch job that begins the export from Neptune.

  3. Once you have successfully populated ElasticSearch with existing data in your Neptune database, you can remove this solution from your account by deleting the CloudFormation stack.
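The invoke command in step 2 writes the function's JSON payload to /dev/stdout. A minimal sketch of pulling the AWS Batch job details out of that payload is shown below; the README only says the function returns the job's name and ID, so the exact key names used here are an assumption.

```python
import json

# Hypothetical payload shape: the function returns the name and ID of an
# AWS Batch job, but the key names below are assumptions for illustration.
payload = '{"jobName": "export-neptune-to-kinesis-job", "jobId": "a1b2c3d4-5678-90ab-cdef-EXAMPLE11111"}'

job = json.loads(payload)
print(job["jobName"], job["jobId"])
```

You can then monitor the job in the AWS Batch console using that name and ID.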

| Region | Stack |
| --- | --- |
| US East (N. Virginia) | |
| US East (Ohio) | |
| US West (Oregon) | |
| Europe (Ireland) | |
| Europe (London) | |
| Europe (Frankfurt) | |
| Europe (Stockholm) | |
| Asia Pacific (Mumbai) | |
| Asia Pacific (Seoul) | |
| Asia Pacific (Singapore) | |
| Asia Pacific (Sydney) | |
| Asia Pacific (Tokyo) | |

Solution overview

(Architecture diagram: Export Neptune to ElasticSearch)

  1. You trigger the export process via an AWS Lambda function.
  2. The export process uses AWS Batch to host and execute neptune-export, which exports data from Neptune and publishes it to an Amazon Kinesis Data Stream in the Neptune Streams format.
  3. A second AWS Lambda function polls the Kinesis Stream and publishes records to your Amazon ElasticSearch cluster. This function uses the same parsing and publishing code as the Neptune Streams ElasticSearch integration solution.
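To make step 3 concrete, here is a simplified sketch of the kind of transformation the polling Lambda performs: turning one Neptune Streams (Gremlin) change record into an ElasticSearch bulk action. The record's field names follow the Streams change format, but the index name and document mapping below are assumptions for illustration, not the solution's actual schema.

```python
import json

# A simplified Neptune Streams (Gremlin) change record, as it might arrive
# on the Kinesis stream: an ADD of one vertex property.
record = {
    "op": "ADD",
    "data": {
        "id": "person-1",    # vertex id
        "type": "vp",        # vertex property change
        "key": "name",       # property key
        "value": {"value": "alice", "dataType": "String"},
    },
}

def to_bulk_action(rec, index="amazon_neptune"):
    """Turn one ADD vertex-property change into an ES bulk update action pair.

    The index name and doc shape are illustrative assumptions.
    """
    data = rec["data"]
    header = {"update": {"_index": index, "_id": data["id"]}}
    body = {"doc": {data["key"]: data["value"]["value"]}, "doc_as_upsert": True}
    return header, body

header, body = to_bulk_action(record)
print(json.dumps(header))
print(json.dumps(body))
```

Using an upsert means property changes for the same vertex accumulate into a single ElasticSearch document, regardless of the order in which the records are processed.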

Monitoring and troubleshooting

To diagnose issues with the export from Neptune to Kinesis, consult the Amazon CloudWatch logs for your AWS Batch export-neptune-to-kinesis-job. When reviewing the logs, ensure that:

  • neptune-export has been successfully downloaded to the Batch compute instance
  • neptune-export has successfully exported nodes and relationships from Neptune and published them to Kinesis

If your job is stuck in a RUNNABLE state, you may need to review the network and security settings for your AWS Batch compute environment. See Verify the network and security settings of the compute environment in this knowledge article.

To diagnose issues with the indexing of data in Amazon ElasticSearch, consult the Amazon CloudWatch logs for your kinesis-to-elasticsearch AWS Lambda function. These logs will show the Lambda connecting to ElasticSearch, and will indicate how many records from the Kinesis Stream have been processed.

Example performance

| Neptune | ElasticSearch | Vertices | Edges | Concurrency | Kinesis Shards | Batch Size | Duration |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r4.2xlarge | 5.large | 21,932 | 66,622 | 2 | 8 | 100 | 47 seconds |
| r5.12xlarge | r5.4xlarge | 281,707,103 | 1,770,726,703 | 4 | 32 | 200 | 4 hours |
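To put the larger run in perspective, the arithmetic below derives the implied sustained throughput, treating each vertex and edge as one exported record (a simplifying assumption, since property counts per element vary).

```python
# Rough throughput implied by the second row of the table above.
vertices = 281_707_103
edges = 1_770_726_703
duration_s = 4 * 60 * 60  # 4 hours

records_per_second = (vertices + edges) / duration_s
print(round(records_per_second))  # ~142,530 records/second
```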