
msk-eda

1. eda context review

1.1 traditional synchronous request-response model: pros & cons analysis


1.2 async message models:

  • point-to-point model (queue/router)
  • pub-sub model
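
The difference between the two models can be sketched with in-memory stand-ins (plain Python, no broker; the class and service names are illustrative):

```python
from collections import deque

# Point-to-point: each message is delivered to exactly one consumer.
class PointToPointQueue:
    def __init__(self):
        self._q = deque()

    def send(self, msg):
        self._q.append(msg)

    def receive(self):
        # a message, once received, is gone from the queue
        return self._q.popleft() if self._q else None

# Pub-sub: every subscriber of a topic gets its own copy of each message.
class PubSubTopic:
    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, msg):
        for handler in self._subscribers:
            handler(msg)

q = PointToPointQueue()
q.send("order-1")
assert q.receive() == "order-1"   # first receiver consumes it
assert q.receive() is None        # nothing left for anyone else

received = []
t = PubSubTopic()
t.subscribe(lambda m: received.append(("svc-a", m)))
t.subscribe(lambda m: received.append(("svc-b", m)))
t.publish("order-1")              # both subscribers see the same event
```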

1.3 Event Bus:


2. msk overview

2.1 kafka application scenarios

    1. Distributed Message Cache

    2. Distributed Event Bus

2.2 msk managed scope


2.3 msk sizing

2.4 msk configuration

1. default.replication.factor

 set this equal to the number of Availability Zones the cluster spans, so each AZ holds one replica of every partition

2. num.partitions

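
A common rule of thumb for choosing num.partitions (standard Kafka sizing guidance, not specific to this doc) is to size for target throughput: at least max(t/p, t/c) partitions, where t is the target throughput and p and c are the measured per-partition producer and consumer throughput. As a worked example:

```python
import math

def estimate_partitions(target_mbps, producer_mbps_per_partition,
                        consumer_mbps_per_partition):
    """partitions >= max(t/p, t/c): enough partitions so that neither
    the producer side nor the consumer side becomes the bottleneck."""
    return math.ceil(max(target_mbps / producer_mbps_per_partition,
                         target_mbps / consumer_mbps_per_partition))

# e.g. 100 MB/s target, 10 MB/s per partition on the producer side,
# 20 MB/s per partition on the consumer side (illustrative numbers)
print(estimate_partitions(100, 10, 20))  # -> 10
```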

3. delete.topic.enable

4. retention.ms

This configuration controls the maximum time we will retain a log before we will discard old log segments to free up space if we are using the "delete" retention policy. This represents an SLA on how soon consumers must read their data. If set to -1, no time limit is applied.

  • Type: long
  • Default: 604800000 (7 days)
  • Valid Values: [-1,...]
  • Server Default Property: log.retention.ms
  • Importance: medium
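
As a sanity check on the default, 604800000 ms is exactly 7 days, and alternative per-topic values can be derived the same way (plain arithmetic, no Kafka API involved):

```python
from datetime import timedelta

# the documented default for retention.ms
default_retention_ms = 604_800_000
assert timedelta(milliseconds=default_retention_ms) == timedelta(days=7)

# e.g. a shorter 3-day retention for a high-volume topic:
three_days_ms = int(timedelta(days=3).total_seconds() * 1000)
print(three_days_ms)  # -> 259200000
```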

5. other config items:

  • auto.create.topics.enable=true
  • min.insync.replicas=2
  • num.io.threads=8
  • num.network.threads=5
  • num.replica.fetchers=2
  • replica.lag.time.max.ms=30000
  • socket.receive.buffer.bytes=102400
  • socket.request.max.bytes=104857600
  • socket.send.buffer.bytes=102400
  • zookeeper.session.timeout.ms=18000
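
Note how min.insync.replicas=2 interacts with the replication factor: with acks=all, writes keep succeeding as long as at least min.insync.replicas replicas are in sync. A quick way to reason about how many broker failures a topic tolerates (standard Kafka semantics, not MSK-specific):

```python
def tolerated_failures(replication_factor, min_insync_replicas):
    """Brokers that can be lost while acks=all writes still succeed."""
    return replication_factor - min_insync_replicas

# With default.replication.factor=3 (one replica per AZ)
# and min.insync.replicas=2, one broker/AZ can be down:
print(tolerated_failures(3, 2))  # -> 1
```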

2.5 msk producer sample code (python)

pip install kafka-python

import json
import random

from kafka import KafkaProducer

def get_random_record():
    # placeholder record source; replace with your own data
    return {"rowkey": random.randint(0, 9999), "value": random.random()}

def send_data(_kafka_topic, _producer):
    while True:
        data = get_random_record()
        partition_key = str(data["rowkey"])
        print(data)
        # use the rowkey as the message key so related records land on the same partition
        _producer.send(_kafka_topic,
                       key=partition_key.encode('utf-8'),
                       value=json.dumps(data).encode('utf-8'))

if __name__ == '__main__':
    producer = KafkaProducer(bootstrap_servers="<msk cluster broker list>")
    KAFKA_TOPIC = "<topic name>"
    send_data(KAFKA_TOPIC, producer)

producer acks

The default value is 1 (for producer clients before Kafka 3.0; newer clients default to acks=all), which means that as long as the producer receives an ack from the leader broker of the topic, it treats the send as a successful commit and continues with the next message. Setting acks=0 is not recommended, because then you get no guarantee on the commit at all. acks=all makes sure the producer gets acks from all in-sync replicas of the topic; it gives the strongest message durability, but it also takes longer, which results in higher latency. So you need to decide which matters more for your workload.
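
The trade-off can be captured as kafka-python producer settings; the profile names below are illustrative, and `retries` is only meaningful when acks is not 0:

```python
# Hedged sketch: pass one profile into
# KafkaProducer(bootstrap_servers=..., **profile)
ACKS_PROFILES = {
    "lowest_latency": {"acks": 0},      # fire-and-forget, no delivery guarantee
    "leader_ack":     {"acks": 1},      # ack from the partition leader only
    "strongest":      {"acks": "all",   # ack from all in-sync replicas
                       "retries": 5},   # retry transient broker errors
}

profile = ACKS_PROFILES["strongest"]
print(profile["acks"])  # -> all
```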

2.6 msk consumer sample code (pyspark)

from pyspark.sql import SparkSession

spark = SparkSession \
        .builder \
        .appName("<app name>") \
        .config("spark.sql.debug.maxToStringFields", "100") \
        .getOrCreate()

kafka_df = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "<msk cluster broker list>") \
    .option("kafka.security.protocol", "SSL") \
    .option("failOnDataLoss", "false") \
    .option("subscribe", "topic1") \
    .option("includeHeaders", "true") \
    .option("startingOffsets", "latest") \
    .option("maxOffsetsPerTrigger", "1000") \
    .load()
# note: maxOffsetsPerTrigger (records per micro-batch, value illustrative) is the
# Structured Streaming rate limit; spark.streaming.kafka.maxRatePerPartition
# only applies to the old DStream API

# start the stream; a readStream definition alone does nothing until a sink is attached
query = kafka_df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)") \
    .writeStream \
    .format("console") \
    .start()
query.awaitTermination()

2.7 msk scaling

2.8 Kafka Connect

2.9 msk monitoring


2.10 msk cli

msk cluster broker list sample:

<broker1 endpoint>:9092,<broker2 endpoint>:9092,<broker3 endpoint>:9092

cli sample:

./kafka-topics.sh --bootstrap-server <msk cluster broker list> --list

./kafka-console-consumer.sh --bootstrap-server <msk cluster broker list> --topic <topic name> --from-beginning

./kafka-console-consumer.sh --bootstrap-server <msk cluster broker list> --topic <topic name>

./kafka-topics.sh --bootstrap-server <msk cluster broker list> --create --topic <topic name> --partitions 3 --replication-factor 2

2.11 msk tiered storage

Tiered storage makes it possible to retain all Kafka cluster data durably on cost-optimized storage. It is available for Kafka version 2.8.2 and above.

3. useful resources:
