Add PLACEMENT_AWARE partition group support. (#221)

hazelcast · Jan 22, 2021 · commit 94420e2 · 1 parent 8199c00
Showing 14 changed files with 494 additions and 33 deletions.
diff --git a/README.md b/README.md
@@ -318,9 +318,27 @@ The plugin works correctly on the AWS Elastic Beanstalk environment. While deplo
 * IAM instance profile contains IAM role which has `ec2:DescribeInstances` permission (or your Hazelcast configuration contains `access-key` and `secret-key`)
 * Deployment policy is `Rolling` (instead of the default `All at once` which may cause the whole Hazelcast members to restart at the same time and therefore lose data)
 
-## Zone Aware
+## High Availability
 
-Hazelcast AWS Discovery plugin supports Hazelcast Zone Aware feature for both EC2 and ECS. When using `ZONE_AWARE` configuration, backups are created in the other Availability Zone.
+By default, Hazelcast distributes partition replicas (backups) randomly and equally among cluster members. However, this is not safe in terms of high availability when a partition and its replicas are stored on the same rack or share the same network or power source. To deal with that, Hazelcast offers logical partition grouping, so that a partition
+itself and its backup(s) are not stored within the same group. This way Hazelcast guarantees that a possible failure
+affecting more than one member at a time will not cause data loss. The details of partition groups can be found in the
+documentation: 
+[Partition Group Configuration](https://docs.hazelcast.org/docs/latest/manual/html-single/#partition-group-configuration)
+
+In addition to two built-in grouping options `ZONE_AWARE` and `PLACEMENT_AWARE`, you can customize the formation of
+these groups based on the network interfaces of members. See more details on custom groups in the documentation:
+[Custom Partition Groups](https://docs.hazelcast.org/docs/latest/manual/html-single/#custom).
+
+
+### Multi-Zone Deployments
+
+If the `ZONE_AWARE` partition group is enabled, the backups of a partition are always stored in a different availability
+zone. The Hazelcast AWS Discovery plugin supports the `ZONE_AWARE` feature for both EC2 and ECS.
+
+***NOTE:*** *When using the `ZONE_AWARE` partition grouping, a cluster spanning multiple Availability Zones (AZ)
+should have an equal number of members in each AZ. Otherwise, it will result in uneven partition distribution among
+the members.*
 
 #### XML Configuration
 
@@ -334,7 +352,7 @@ Hazelcast AWS Discovery plugin supports Hazelcast Zone Aware feature for both EC
 hazelcast:
   partition-group:
     enabled: true
-    group-type: ZONE-AWARE
+    group-type: ZONE_AWARE
 ```
 
 #### Java-based Configuration
@@ -345,8 +363,61 @@ config.getPartitionGroupConfig()
     .setGroupType(MemberGroupType.ZONE_AWARE);
 ```
 
-***NOTE:*** *When using the `ZONE_AWARE` partition grouping, a cluster spanning multiple Availability Zones (AZ) should have an equal number of members in each AZ. Otherwise, it will result in uneven partition distribution among the members.*
+### Partition Placement Group Deployments
+
+[AWS Partition Placement Group](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-partition)
+(PPG) ensures low latency between the instances in the same partition of a placement group
+and also provides availability since no two partitions share the same underlying hardware. As long as the partitions of a
+PPG contain an equal number of instances, it is a good fit for Hazelcast clusters formed within a single zone.
+
+If EC2 instances belong to a PPG and `PLACEMENT_AWARE` partition group is enabled, then Hazelcast members will be grouped
+by the partitions of the PPG. For instance, the Hazelcast members in the first partition of a PPG named `ppg` will belong
+to the partition group of `ppg-1`, those in the second partition will belong to `ppg-2`, and so on. Furthermore, these
+groups are specific to each availability zone. That is, they are formed with zone names as well: `us-east-1a-ppg-1`,
+`us-east-1b-ppg-1`, and the like. However, if a Hazelcast cluster spans multiple availability zones, then you should
+consider using `ZONE_AWARE`.
+
+### Cluster Placement Group Deployments
+
+[AWS Cluster Placement Group](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-cluster)
+(CPG) ensures low latency by packing instances close together inside an availability zone.
+If you favor latency over availability, then CPG will serve your purpose.
+
+***NOTE:*** *In the case of CPG, using `PLACEMENT_AWARE` has no effect, so you can use the default Hazelcast partition group
+strategy.*
+
+### Spread Placement Group Deployments
+
+[AWS Spread Placement Group](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-spread)
+(SPG) ensures high availability in a single zone by placing each instance in a group on a
+distinct rack. It provides better latency than a multi-zone deployment, but worse than a Cluster Placement Group. An SPG is
+limited to 7 running instances per availability zone, so if you need a larger Hazelcast cluster within a single zone, you should use a PPG instead.
+
+***NOTE:*** *In the case of SPG, using `PLACEMENT_AWARE` has no effect, so you can use the default Hazelcast partition group
+strategy.*
+
+#### XML Configuration
+
+```xml
+<partition-group enabled="true" group-type="PLACEMENT_AWARE" />
+```
+
+#### YAML Configuration
+
+```yaml
+hazelcast:
+  partition-group:
+    enabled: true
+    group-type: PLACEMENT_AWARE
+```
+
+#### Java-based Configuration
 
+```java
+config.getPartitionGroupConfig()
+    .setEnabled(true)
+    .setGroupType(MemberGroupType.PLACEMENT_AWARE);
+```
 
 ## Autoscaling
 

diff --git a/src/main/java/com/hazelcast/aws/AwsClient.java b/src/main/java/com/hazelcast/aws/AwsClient.java
@@ -16,6 +16,7 @@
 package com.hazelcast.aws;
 
 import java.util.Map;
+import java.util.Optional;
 
 /**
  * Responsible for fetching discovery information from AWS APIs.
@@ -24,4 +25,22 @@ interface AwsClient {
     Map<String, String> getAddresses();
 
     String getAvailabilityZone();
+
+    /**
+     * Returns the placement group name of the service if specified.
+     *
+     * @see <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html">Placement Groups</a>
+     */
+    default Optional<String> getPlacementGroup() {
+        return Optional.empty();
+    }
+
+    /**
+     * Returns the partition number of the service if it belongs to a partition placement group.
+     *
+     * @see <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-partition">Partition Placement Groups</a>
+     */
+    default Optional<String> getPlacementPartitionNumber() {
+        return Optional.empty();
+    }
 }
diff --git a/src/main/java/com/hazelcast/aws/AwsDiscoveryStrategy.java b/src/main/java/com/hazelcast/aws/AwsDiscoveryStrategy.java
@@ -30,6 +30,7 @@
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
+import java.util.Optional;
 
 import static com.hazelcast.aws.AwsProperties.ACCESS_KEY;
 import static com.hazelcast.aws.AwsProperties.CLUSTER;
@@ -63,6 +64,9 @@ public class AwsDiscoveryStrategy
     private static final int DEFAULT_CONNECTION_TIMEOUT_SECONDS = 10;
     private static final int DEFAULT_READ_TIMEOUT_SECONDS = 10;
 
+    // Corresponds to PartitionGroupMetaData.PARTITION_GROUP_PLACEMENT
+    static final String PARTITION_GROUP_PLACEMENT = "hazelcast.partition.group.placement";
+
     private final AwsClient awsClient;
     private final PortRange portRange;
 
@@ -132,10 +136,40 @@ public Map<String, String> discoverLocalMetadata() {
             String availabilityZone = awsClient.getAvailabilityZone();
             LOGGER.info(String.format("Availability zone found: '%s'", availabilityZone));
             memberMetadata.put(PartitionGroupMetaData.PARTITION_GROUP_ZONE, availabilityZone);
+
+            getPlacementGroup().ifPresent(pg ->
+                    memberMetadata.put(PARTITION_GROUP_PLACEMENT, availabilityZone + '-' + pg));
         }
         return memberMetadata;
     }
 
+    /**
+     * Resolves the placement group of the resource if it belongs to any.
+     * <p>
+     * If the placement group is a Cluster Placement Group or a Spread Placement Group, then returns
+     * the group name. If it is a Partition Placement Group, then returns the group name followed
+     * by '-' and the partition number.
+     * <p>
+     * When forming partition groups, this name should be combined with the zone name. Otherwise
+     * two resources in different zones but in the same placement group will be treated as
+     * a single group.
+     *
+     * @see AwsClient#getPlacementGroup()
+     * @see AwsClient#getPlacementPartitionNumber()
+     * @return  Placement group name if exists, empty otherwise.
+     */
+    private Optional<String> getPlacementGroup() {
+        Optional<String> placementGroup = awsClient.getPlacementGroup();
+        if (!placementGroup.isPresent()) {
+            LOGGER.fine("No placement group is found.");
+            return Optional.empty();
+        }
+        StringBuilder result = new StringBuilder(placementGroup.get());
+        awsClient.getPlacementPartitionNumber().ifPresent(ppn -> result.append('-').append(ppn));
+        LOGGER.info(String.format("Placement group found: '%s'", result.toString()));
+        return Optional.of(result.toString());
+    }
+
     @Override
     public Iterable<DiscoveryNode> discoverNodes() {
         try {
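
As a reading aid for the change above: the value stored under `hazelcast.partition.group.placement` is the availability zone, the placement group name, and (for partition placement groups only) the partition number, joined by `-`. The following is a minimal standalone sketch of that composition, not the plugin's code; the class `PlacementKeySketch` and the method `placementKey` are hypothetical names introduced here for illustration.

```java
import java.util.Optional;

// Hypothetical standalone sketch; not part of the plugin.
class PlacementKeySketch {

    // Builds "<zone>-<group>" or "<zone>-<group>-<partition>".
    // Returns empty when the instance is not in any placement group.
    static Optional<String> placementKey(String zone,
                                         Optional<String> group,
                                         Optional<String> partition) {
        return group.map(g -> {
            StringBuilder key = new StringBuilder(zone).append('-').append(g);
            // Only partition placement groups expose a partition number.
            partition.ifPresent(p -> key.append('-').append(p));
            return key.toString();
        });
    }

    public static void main(String[] args) {
        // Partition placement group: zone, group name, and partition number.
        System.out.println(
            placementKey("us-east-1a", Optional.of("ppg"), Optional.of("1")).orElse("none"));
        // Cluster or spread placement group: no partition number.
        System.out.println(
            placementKey("us-east-1a", Optional.of("cpg"), Optional.empty()).orElse("none"));
        // No placement group at all: nothing is added to the member metadata.
        System.out.println(
            placementKey("us-east-1a", Optional.empty(), Optional.empty()).orElse("none"));
    }
}
```

Including the zone in the key keeps two members that share a placement group name but sit in different zones from being treated as one partition group.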

diff --git a/src/main/java/com/hazelcast/aws/AwsDiscoveryStrategyFactory.java b/src/main/java/com/hazelcast/aws/AwsDiscoveryStrategyFactory.java
@@ -127,6 +127,7 @@ static boolean isEndpointAvailable(String url) {
                 .withReadTimeoutSeconds(1)
                 .withRetries(1)
                 .get()
+                .getBody()
                 .isEmpty();
     }
 

diff --git a/src/main/java/com/hazelcast/aws/AwsEc2Api.java b/src/main/java/com/hazelcast/aws/AwsEc2Api.java
@@ -203,7 +203,8 @@ private String callAwsService(Map<String, String> attributes, Map<String, String
         String query = canonicalQueryString(attributes);
         return createRestClient(urlFor(endpoint, query), awsConfig)
             .withHeaders(headers)
-            .get();
+            .get()
+            .getBody();
     }
 
     private static String urlFor(String endpoint, String query) {

diff --git a/src/main/java/com/hazelcast/aws/AwsEc2Client.java b/src/main/java/com/hazelcast/aws/AwsEc2Client.java
@@ -16,6 +16,7 @@
 package com.hazelcast.aws;
 
 import java.util.Map;
+import java.util.Optional;
 
 class AwsEc2Client implements AwsClient {
     private final AwsEc2Api awsEc2Api;
@@ -37,4 +38,14 @@ public Map<String, String> getAddresses() {
     public String getAvailabilityZone() {
         return awsMetadataApi.availabilityZoneEc2();
     }
+
+    @Override
+    public Optional<String> getPlacementGroup() {
+        return awsMetadataApi.placementGroupEc2();
+    }
+
+    @Override
+    public Optional<String> getPlacementPartitionNumber() {
+        return awsMetadataApi.placementPartitionNumberEc2();
+    }
 }
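
The two overrides above delegate to instance metadata lookups that can legitimately be absent, since an instance outside a placement group has no group name or partition number. Below is a minimal sketch of the response handling this implies, assuming a 200 response maps to a present `Optional` and a 404 maps to an empty one. `OptionalMetadataSketch`, `Response`, and `toOptional` are hypothetical names, and no network call is made.

```java
import java.util.Optional;

// Hypothetical standalone sketch; not part of the plugin.
class OptionalMetadataSketch {
    static final int HTTP_OK = 200;
    static final int HTTP_NOT_FOUND = 404;

    // Hypothetical stand-in for an HTTP response (status code plus body).
    static final class Response {
        final int code;
        final String body;

        Response(int code, String body) {
            this.code = code;
            this.body = body;
        }
    }

    // Maps a metadata response to an Optional value:
    // 200 -> the body, 404 -> empty, anything else -> error.
    static Optional<String> toOptional(Response response) {
        if (response.code == HTTP_OK) {
            return Optional.of(response.body);
        }
        if (response.code == HTTP_NOT_FOUND) {
            return Optional.empty();
        }
        throw new IllegalStateException("Unexpected response code: " + response.code);
    }

    public static void main(String[] args) {
        System.out.println(toOptional(new Response(HTTP_OK, "ppg")).orElse("none"));
        System.out.println(toOptional(new Response(HTTP_NOT_FOUND, "")).orElse("none"));
    }
}
```

Treating 404 as a valid "not present" answer, rather than as a failure, lets discovery proceed normally on instances that are not part of any placement group.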
diff --git a/src/main/java/com/hazelcast/aws/AwsEcsApi.java b/src/main/java/com/hazelcast/aws/AwsEcsApi.java
@@ -138,7 +138,8 @@ private String callAwsService(String body, Map<String, String> headers) {
         return createRestClient(urlFor(endpoint), awsConfig)
             .withHeaders(headers)
             .withBody(body)
-            .post();
+            .post()
+            .getBody();
     }
 
     private static JsonObject toJson(String jsonString) {

diff --git a/src/main/java/com/hazelcast/aws/AwsMetadataApi.java b/src/main/java/com/hazelcast/aws/AwsMetadataApi.java
@@ -17,8 +17,14 @@
 
 import com.hazelcast.internal.json.Json;
 import com.hazelcast.internal.json.JsonObject;
+import com.hazelcast.logging.ILogger;
+import com.hazelcast.logging.Logger;
+
+import java.util.Optional;
 
 import static com.hazelcast.aws.AwsRequestUtils.createRestClient;
+import static com.hazelcast.aws.RestClient.HTTP_NOT_FOUND;
+import static com.hazelcast.aws.RestClient.HTTP_OK;
 
 /**
  * Responsible for connecting to AWS EC2 and ECS Metadata API.
@@ -28,6 +34,7 @@
  * @see <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint.html">ECS Task Metadata</a>
  */
 class AwsMetadataApi {
+    private static final ILogger LOGGER = Logger.getLogger(AwsMetadataApi.class);
     private static final String EC2_METADATA_ENDPOINT = "http://169.254.169.254/latest/meta-data";
     private static final String ECS_IAM_ROLE_METADATA_ENDPOINT = "http://169.254.170.2" + System.getenv(
         "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI");
@@ -60,22 +67,62 @@ class AwsMetadataApi {
 
     String availabilityZoneEc2() {
         String uri = ec2MetadataEndpoint.concat("/placement/availability-zone/");
-        return createRestClient(uri, awsConfig).get();
+        return createRestClient(uri, awsConfig).get().getBody();
+    }
+
+    Optional<String> placementGroupEc2() {
+        return getOptionalMetadata(ec2MetadataEndpoint.concat("/placement/group-name/"), "placement group");
+    }
+
+    Optional<String> placementPartitionNumberEc2() {
+        return getOptionalMetadata(ec2MetadataEndpoint.concat("/placement/partition-number/"), "partition number");
+    }
+
+    /**
+     * Resolves optional metadata that exists for some instances only.
+     * HTTP_OK and HTTP_NOT_FOUND responses are assumed valid. Any other
+     * response code or an exception thrown during retries will yield
+     * a warning log and an empty result will be returned.
+     *
+     * @param uri  Metadata URI
+     * @param loggedName  Metadata name to be used when logging.
+     * @return  The metadata if the endpoint exists, empty otherwise.
+     */
+    private Optional<String> getOptionalMetadata(String uri, String loggedName) {
+        RestClient.Response response;
+        try {
+            response = createRestClient(uri, awsConfig)
+                    .expectResponseCodes(HTTP_OK, HTTP_NOT_FOUND)
+                    .get();
+        } catch (Exception e) {
+            // Failed to get a response with code OK or NOT_FOUND after retries
+            LOGGER.warning(String.format("Could not resolve the %s metadata", loggedName));
+            return Optional.empty();
+        }
+        int responseCode = response.getCode();
+        if (responseCode == HTTP_OK) {
+            return Optional.of(response.getBody());
+        } else if (responseCode == HTTP_NOT_FOUND) {
+            LOGGER.fine(String.format("No %s information is found.", loggedName));
+            return Optional.empty();
+        } else {
+            throw new RuntimeException(String.format("Unexpected response code: %d", responseCode));
+        }
     }
 
     String defaultIamRoleEc2() {
         String uri = ec2MetadataEndpoint.concat(SECURITY_CREDENTIALS_URI);
-        return createRestClient(uri, awsConfig).get();
+        return createRestClient(uri, awsConfig).get().getBody();
     }
 
     AwsCredentials credentialsEc2(String iamRole) {
         String uri = ec2MetadataEndpoint.concat(SECURITY_CREDENTIALS_URI).concat(iamRole);
-        String response = createRestClient(uri, awsConfig).get();
+        String response = createRestClient(uri, awsConfig).get().getBody();
         return parseCredentials(response);
     }
 
     AwsCredentials credentialsEcs() {
-        String response = createRestClient(ecsIamRoleEndpoint, awsConfig).get();
+        String response = createRestClient(ecsIamRoleEndpoint, awsConfig).get().getBody();
         return parseCredentials(response);
     }
 
@@ -89,7 +136,7 @@ private static AwsCredentials parseCredentials(String response) {
     }
 
     EcsMetadata metadataEcs() {
-        String response = createRestClient(ecsTaskMetadataEndpoint, awsConfig).get();
+        String response = createRestClient(ecsTaskMetadataEndpoint, awsConfig).get().getBody();
         return parseEcsMetadata(response);
     }