LIKAFKA-8933: Maintain partitions across non-identity mirror maker #55
Conversation
```java
_nextIndexPerKey.put(key, new AtomicLong(0));
  _sensors.addPartitionSensors(partition);
}
_produceExecutor.scheduleWithFixedDelay(new ProduceRunnable(partition, key), _produceDelayMs, _produceDelayMs, TimeUnit.MILLISECONDS);
```
Seems like there could be more code reuse if `ProduceService` had a protected method like `createProduceRunnable(int partition)`. That way this entire method would not need to be duplicated.
```java
HashMap<Integer, String> keyMapping = new HashMap<>();

int keyIndex = 0;
while (keyMapping.size() < partitionNum) {
```
Why not just have the key be the partition number? I don't see the advantage of this.
We chatted about this a while ago. The issue is that mirror maker does not propagate partition numbers across clusters so we have to use keys. The partitions will then be assigned by a modulus of a hash of the key. If we don't purposefully select our keys, we could have a non-uniform assignment across partitions.
Sorry, I wasn't clear. I guess I'm not sure what the difference is between setting the key to be a partition number vs. a pseudo-random number. Does this help with covering all the partitions in the destination cluster?
We aren't just setting it to be a random number. We are setting it to be a random number which correctly maps to a unique partition when hashed. By searching for keys which each hash to unique partitions we are maintaining uniform partition assignments across the pipeline
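The key-search idea described above can be sketched roughly as follows. This is a self-contained illustration, not the actual Kafka Monitor code: the class and method names (`KeySearchSketch`, `buildKeyMapping`) are made up, and plain `String.hashCode()` stands in for the real partitioner, which hashes the key bytes with murmur2. The search loop is the same either way.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: probe successive integer keys until every partition has a key
// that hashes to it, so keyed production stays uniform across partitions.
public class KeySearchSketch {

  static int toPositive(int number) {
    return number & 0x7fffffff;
  }

  // Stand-in for toPositive(murmur2(key.getBytes())) % partitionNum
  static int partitionForKey(String key, int partitionNum) {
    return toPositive(key.hashCode()) % partitionNum;
  }

  static Map<Integer, String> buildKeyMapping(int partitionNum) {
    Map<Integer, String> keyMapping = new HashMap<>();
    int keyIndex = 0;
    while (keyMapping.size() < partitionNum) {
      String key = Integer.toString(keyIndex);
      int partition = partitionForKey(key, partitionNum);
      keyMapping.putIfAbsent(partition, key);  // keep the first key found per partition
      keyIndex++;
    }
    return keyMapping;
  }

  public static void main(String[] args) {
    // Every partition 0..7 ends up with a dedicated key.
    System.out.println(buildKeyMapping(8).size());  // 8
  }
}
```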
```java
import static org.apache.kafka.common.utils.Utils.murmur2;

public class KeyedProduceService extends ProduceService implements Service {
```
This class does not have any documentation. I know there are many Kafka monitor classes that do not have documentation, but really that's not a good idea.
```java
try {
  long nextIndex = _nextIndexPerKey.get(_key).get();
  String message = Utils.jsonFromFields(_topic, nextIndex, System.currentTimeMillis(), _producerId, _recordSize);
  BaseProducerRecord record = new BaseProducerRecord(_topic, null, _key, message);
```
Reuse would probably be higher if there were a superclass with a protected method `byte[] getKey()`. This can return null in the superclass, and the keyed version can return the key bytes.
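The suggested refactor could look roughly like this. The class and method names are illustrative, not the actual Kafka Monitor classes: the base service returns a null key, and the keyed subclass overrides `getKey()` so the shared produce path needs no duplication.

```java
// Rough sketch of the proposed override point (names are hypothetical).
class ProduceServiceSketch {
  // Base behavior: no key, so the producer's explicit partition is used.
  protected byte[] getKey(int partition) {
    return null;
  }

  String describeRecord(int partition) {
    byte[] key = getKey(partition);
    return key == null ? "partition=" + partition : "key=" + new String(key);
  }
}

class KeyedProduceServiceSketch extends ProduceServiceSketch {
  // Keyed behavior: return a key chosen for the given partition.
  @Override
  protected byte[] getKey(int partition) {
    return Integer.toString(partition).getBytes();
  }
}
```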
We would still have to change how we handle line 105. We need the partition number to be non-null for the metrics but if we pass the partition number into the BaseProducerRecord, it will override the key.
We could get better code reuse if we changed the logic of ProduceRunnable in the base class to insert null as the partition number when the key is non-null. I considered this but worried that the change wasn't congruent with the base class's purpose.
@smccauliff @lindong28 Do you think the improved code reuse is worth changing the base class in this way?
```diff
@@ -191,12 +191,12 @@ public boolean isRunning() {
     return _running.get();
   }

-  private class ProduceMetrics {
+  protected class ProduceMetrics {
```
Probably the visibility of this class does not need to change if the other refactorings are implemented.
```java
}

private void consume() throws Exception {
  Map<String, Long> nextIndexes = new HashMap<>();
```
This method in the original consumer could be reused by making this a `Map<Object, Long>`; not elegant, but probably OK. The other possibility would be to type-parameterize the consumer, but that seems like more work for little gain.
```java
_sensors._consumeError.record();
  continue;
}
String key = record.key();
```
Have a protected method like `Object sequenceNumberKeyFromRecord(GenericRecord consumed)` that returns the key used in the hash map. This can be a string or something for the keyed version and just an `Integer` for the non-keyed version.
@WarrenGreen Thanks for the patch. I am not sure why we need both
@lindong28 Yeah, I think that will be the best way to avoid code duplication. I'll update today.
This is what I think is going to happen with how the keys are handled. Can you let me know if I'm wrong about this?

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.kafka.common.utils.Utils;
import static org.apache.kafka.common.utils.Utils.murmur2;

public class PartitionMapping {
  private static final int MAX_PARTITIONS = 128;

  public static void main(String[] argv) throws Exception {
    for (int srcPartitionCount = 1; srcPartitionCount < MAX_PARTITIONS; srcPartitionCount++) {
      for (int destPartitionCount = 1; destPartitionCount < MAX_PARTITIONS; destPartitionCount++) {
        Map<Integer, String> srcPartitionToKey = buildPartitionToKeyMapping(srcPartitionCount);
        Set<Integer> coveredDestinationPartitions = new HashSet<>(destPartitionCount);
        for (String key : srcPartitionToKey.values()) {
          int destinationPartition = keyToDestinationPartition(key.getBytes(), destPartitionCount);
          coveredDestinationPartitions.add(destinationPartition);
        }
        System.out.println(srcPartitionCount + "|" + destPartitionCount + "|" + coveredDestinationPartitions.size());
      }
    }
  }

  /**
   * From Kafka Monitor, produce service.
   */
  private static Map<Integer, String> buildPartitionToKeyMapping(int srcPartitionCount) {
    Map<Integer, String> partitionToKey = new HashMap<>();
    int keyIndex = 0;
    while (partitionToKey.size() < srcPartitionCount) {
      String key = keyIndex + "";
      int partition = getDefaultPartitionForKey(key, srcPartitionCount);
      if (!partitionToKey.containsKey(partition)) {
        partitionToKey.put(partition, key);
      }
      keyIndex++;
    }
    return partitionToKey;
  }

  /**
   * From DefaultPartitioner, Kafka. This is what the mirror maker producer is doing. No?
   */
  private static int keyToDestinationPartition(byte[] key, int destPartitionCount) {
    return toPositive(Utils.murmur2(key)) % destPartitionCount;
  }

  private static int toPositive(int number) {
    return number & 0x7fffffff;
  }

  /**
   * From Kafka Monitor, produce service.
   */
  private static int getDefaultPartitionForKey(String key, int srcPartitionCount) {
    byte[] keyBytes = key.getBytes();
    return toPositive(murmur2(keyBytes)) % srcPartitionCount;
  }
}
```
If the source and destination cluster have the same number of partitions then this works fine, and I remove my objections to the way keys are used. Then my only concern is code reuse.
Yeah, I intend to put in the docs the contract that requires a 1:1 partition count across clusters. This contract is already a part of mirror maker, but it will be good to reiterate it here. As for code reuse, the latest commit removes the keyed classes and simply changes the default behavior of ProduceService.
```diff
@@ -70,6 +70,7 @@ public MirrorPipelineMonitor(Map<String, Object> props, String name) throws Exce
       String topic,
       String name) throws Exception {
     Map<String, Object> topicManagementProps = createTopicManagementServiceProps(props, topic);
+    topicManagementProps.put(CommonServiceConfig.ZOOKEEPER_CONNECT_CONFIG, "");
```
What is the purpose of this empty zk url?
A `zookeeper_connect_config` is a required config for `TopicManagementServiceConfig`.
Yeah, I get that it is a required config. But why do we provide an empty string here? Since it is a required config, shouldn't it already be specified in the `props` that is passed to `MirrorPipelineMonitor`?
In general, assigning an empty string to a required config seems weird. After taking a further look at the code, I realize that you have this line because you want to change the `PARTITIONS_TO_BROKER_RATIO_THRESHOLD` based on its current value in the `props` and the current number of brokers in each cluster. However, this reveals another problem with `MirrorPipelineMonitor`: what if the number of brokers changes after the `MirrorPipelineMonitor` starts running? If SRE doubles the number of brokers in one cluster, `TopicManagementService` will double the partition number in that cluster only, which breaks `MirrorPipelineMonitor`, right? It seems that the only choice is to build this logic into `TopicManagementService` so that it is able to keep the partition number equivalent across multiple clusters.
So we prematurely create the `TopicManagementServiceConfig`, as you requested in the previous commit, so that we can retrieve the `partitionsToBrokerRatio` to determine the number of partitions we want to create. Because we create it prematurely, we don't have access to the zk url yet.
I agree that the broker count needs to be managed dynamically, and there is a TODO to do this. That is not what this commit is regarding, though.
@WarrenGreen The reviewer usually makes suggestion based on the code snippet and it is possible that the comment itself may be wrong. You probably need to address the comment that you think is right, and defend the approach if you think the comment is wrong, since you own the code and have spent more time on the patch than anyone else.
I am not sure this can be a TODO. I think the goal is for us to monitor a pipeline of multiple clusters even when the number of live brokers can change. I don't think we have to do everything in this patch. But we have to do that before we are able to monitor the pipeline of multiple clusters in PROD, right?
I'll address this in another patch before reaching prod.
```diff
@@ -131,7 +131,7 @@ public void run() {
   }

   private void consume() throws Exception {
-    Map<Integer, Long> nextIndexes = new HashMap<>();
+    Map<String, Long> nextIndexes = new HashMap<>();
```
Why do we change the key from partition number to record's key? It seems to me that the original code would work without change, no?
Addressed below
```diff
 }

 public void run() {
   try {
-    long nextIndex = _nextIndexPerPartition.get(_partition).get();
+    long nextIndex = _nextIndexPerKey.get(_key).get();
```
It is a bit weird that `_nextIndexPerKey` is indexed by key while `_recordsProducedPerPartition` is still indexed by partition. In order to keep the change smaller and make the code style consistent, how about we still use `_nextIndexPerPartition`? We only need `key` to construct `BaseProducerRecord`.
Yeah, that's a good idea
```java
int keyIndex = 0;
while (keyMapping.size() < partitionNum) {
  String key = keyIndex + "";
```
Can you use `Integer.toString(keyIndex)`? Also, it is not very clear to me what the `index` in `keyIndex` means. Can we name it e.g. `nextInt`?
Sure
```java
}

// TODO: Remove method and convert uses to Util.toPositive when kafka version dependency reaches 0.10.1.10
public static int toPositive(int number) {
```
Can we bump up the kafka version to 0.10.1.10 now to avoid copy&paste code?
I'll look into it. Previously we were using 10.0.18 and there was a compatibility issue between clients when we attempted to update the kafka version.
```java
  return number & 0x7fffffff;
}

private int getDefaultPartitionForKey(String key) {
```
We probably need a way for the user to configure the partitioning algorithm. The patch currently assumes that MM will only use the new producer's partitioning algorithm. This is OK for users of vanilla Kafka, but we actually have a hotfix in LinkedIn which changes the MM to use the old producer's partitioning algorithm. This is because the algorithms used by the old producer and the new producer are different, and we may cause message re-ordering if we simply change the algorithm.
We can let the user configure this similar to how `produce.producer.class` in `ProduceServiceConfig` is configured.
Yeah, I was thinking the same thing but found that the current design of partitioners would require us to base pipeline monitoring design decisions on partitioner implementation details.
Options would be to:
- Base our design on partitioner implementation details
- Create key hashing classes in the monitor that correspond to hashing strategies in Kafka
- Create key hashing classes in Kafka and have both the partitioners and the pipeline monitor import them

Option 3 is the cleanest solution but has the broadest implications. Option 2 may be a good compromise between the options.
Let's not rely on changes in open source Kafka. I think we can do 2. Is option 2 similar to how `produce.producer.class` is configured in `ProduceServiceConfig`?
I agree. Let's stick with option 2. `produce.producer.class` is most similar to option 3 but didn't require changes in open source Kafka. Option 2 is somewhat similar.
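Option 2 could be sketched roughly like this: a monitor-side partitioner interface selected by class name, mirroring how `produce.producer.class` is configured. `KMPartitioner` and the config name come from this thread; everything else is illustrative — the hash functions below are stand-ins, not the real algorithms (the new producer hashes key bytes with murmur2, and the old producer used a `hashCode`-based scheme).

```java
// Sketch of option 2: pluggable key-hashing strategies in the monitor.
interface KMPartitioner {
  int getPartitionForKey(String key, int numPartitions);
}

class NewKMPartitionerSketch implements KMPartitioner {
  // New-producer style: positive hash of the key bytes modulo partition
  // count (the real implementation uses murmur2 on the bytes).
  public int getPartitionForKey(String key, int numPartitions) {
    return (java.util.Arrays.hashCode(key.getBytes()) & 0x7fffffff) % numPartitions;
  }
}

class OldKMPartitionerSketch implements KMPartitioner {
  // Old-producer style: based on the key object's hashCode.
  public int getPartitionForKey(String key, int numPartitions) {
    return (key.hashCode() & 0x7fffffff) % numPartitions;
  }
}

class PartitionerLoader {
  // Instantiate whichever class a produce.partitioner.class-style config names.
  static KMPartitioner load(String className) throws Exception {
    return (KMPartitioner) Class.forName(className).getDeclaredConstructor().newInstance();
  }
}
```

Whichever class is configured, `ProduceService` only ever sees the `KMPartitioner` interface, so swapping algorithms is a config change rather than a code change.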
```diff
     String message = Utils.jsonFromFields(_topic, nextIndex, System.currentTimeMillis(), _producerId, _recordSize);
-    BaseProducerRecord record = new BaseProducerRecord(_topic, _partition, null, message);
+    BaseProducerRecord record = new BaseProducerRecord(_topic, null, _key, message);
```
We probably still need to provide the partition to the record because some producer implementations, such as `KafkaRestProducer`, only accept a partition.
That should be fine. The partition number will override our key for as long as the partition is propagated through the pipeline, but since they should correspond to each other, there shouldn't be issues.
But that is based on the assumption that the underlying `KMBaseProducer` will use the same hashing algorithm to map the key to the partition, right? I am not sure every implementation of `KMBaseProducer` will do that. For example, what if `KafkaRestProducer` is extended in the future to allow the user to specify the destination partition but not the message key? It seems easier to just provide both pieces of information and leave it to the actual `KMBaseProducer` implementation to decide which one to use.
I think we're saying the same thing. We want both the partition and key to be passed.
Here is another reason to include both partition and key. If we include only the key, we have to make sure that the producer always uses the same partitioning algorithm that is specified in the config and used by `ProduceService`. It is necessary but inconvenient to do so. For example, the user has to change this algorithm to switch the producer class between the old producer and the new producer. Things go wrong if they forget to do so. We can avoid this inconvenience if the user only wants to monitor one cluster. We cannot avoid it for a user who wants to monitor a pipeline, and that inconvenience may be justified since the user has to keep the producer's partitioning algorithm in sync with MM's partitioning algorithm anyway.
Ah I see. Sure. Let's just keep both.
```java
import static org.apache.kafka.common.utils.Utils.murmur2;

public class DefaultPartitioner  implements Partitioner {
```
Can you remove the extra space between `DefaultPartitioner` and `implements`? Also, we probably need both an old producer partitioner and a new producer partitioner, and it may be a good idea to distinguish between them in the class name.
Can you reply to the second point in the comment?
I've named it DefaultPartitioner based on the name used in Kafka Server. LinkedIn-Kafka-Clients producer also references this partitioner as the "DefaultPartitioner"
The point is that we need a partitioner for both the old producer and the new producer, right? After that, we need a better name to distinguish between the two. Currently you only have a partitioner for the old producer. What if an open source user uses the new producer in MM?
Updated naming
```diff
@@ -80,6 +81,8 @@ public ProduceService(Map<String, Object> props, String name) throws Exception {
     _brokerList = config.getString(ProduceServiceConfig.BOOTSTRAP_SERVERS_CONFIG);
     String producerClass = config.getString(ProduceServiceConfig.PRODUCER_CLASS_CONFIG);

+    String partitionerClassName = config.getString(ProduceServiceConfig.PARTITIONER_CLASS_CONFIG);
+    _partitioner = (Partitioner) Class.forName(partitionerClassName).getConstructor().newInstance();
```
I am not sure, but is it possible to just do `config.getConfiguredInstance(ProduceServiceConfig.PARTITIONER_CLASS_CONFIG, Partitioner.class)`?
Yeah, that seems to work
```java
 */
package com.linkedin.kmf.partitioner;

public interface Partitioner {
```
Can you name the interface `KMPartitioner` to avoid class name conflicts with open source classes, similar to `KMBaseProducer`?
```diff
@@ -70,6 +74,11 @@
             NewProducer.class.getCanonicalName(),
             ConfigDef.Importance.LOW,
             PRODUCER_CLASS_DOC)
+        .define(PARTITIONER_CLASS_CONFIG,
+            ConfigDef.Type.STRING,
+            DefaultPartitioner.class.getCanonicalName(),
```
We probably need to decide the default value based on the producer class (e.g. new or old). If the user has explicitly configured both a producer class and a partitioner class that conflict with each other, we should probably throw a warning and quit. But it's OK if you don't do it here. I can do it in another patch.
```java
import java.util.Arrays;

public class OldByteArrayKMPartitioner implements KMPartitioner {
  public int getPartitionForKey(String key, int partitionNum) {
```
Can you explain why we need `OldByteArrayKMPartitioner` in addition to `OldKMPartitioner`?
I don't know if there is demand right now, but it was one of the partitioners designed for the `OldProducer`.
I think we don't need it. We have added `OldKMPartitioner` and `NewKMPartitioner` because we have use-cases for both. It should be safe to remove this `OldByteArrayKMPartitioner` and only add it if it is useful.
sgtm
```java
public interface KMPartitioner {

  int getPartitionForKey(String key, int numPartitions);
```
Can we name this method `partition(...)` since the open source `Partitioner` uses this method name?
For sure. Will update after coming to consensus to above comments
…sters (#55) Currently ProduceService produces message without key to the specific partition. Messages to the same partition will have incremental sequence number in the payload to allow ConsumeService to detect message loss and out-of-order message delivery. However, when we monitor a pipeline of clusters connected by MirrorMaker, the message to the same partition in the destination cluster may not have incremental sequence number since MM produces message without key. This patch addresses the problem by producing message with key that will map to the given partition using the specified partitioner class. The partitioner class needs to be the same partitioner used by MM. It is configured as org.apache.kafka.clients.producer.internals.DefaultPartitioner by default.
```diff
@@ -33,6 +34,9 @@
   public static final String PRODUCER_CLASS_DOC = "Producer class that will be instantiated as producer in the produce service. "
       + "It can be NewProducer, or full class name of any class that implements the KMBaseProducer interface. ";

+  public static final String PARTITIONER_CLASS_CONFIG = "produce.partitioner.class";
+  public static final String PARTITIONER_CLASS_DOC = "KMPartitioner class that corresponds to the partitioner used the target cluster.";
```
I changed this to ... used in the ...
@WarrenGreen Thanks for the patch. LGTM.
@WarrenGreen Have you tested this patch after updating it? If not, can you test it using both
No description provided.