169 questions and answers about Apache Kafka, and growing. Download the flashcards for spaced repetition!

kafka-faq

When a _____ detects a quota violation, it computes and returns the amount of delay needed to bring the violating client under its quota. It then _____ to the client, refusing to process requests from it until the delay is over. The client will also refrain from sending further requests to the broker during the delay.  broker  mutes the channel bi-directionally
For any Linux filesystem used for data directories, enabling the _____ option is recommended, as it disables updating of a file's atime (last access time) attribute when the file is read. This can eliminate a significant number of filesystem writes, as Kafka does not rely on the atime attributes at all. noatime
Could you use an asynchronous workflow to do expensive work (such as periodic data aggregation) in advance? Yes
Can message queues receive messages? Yes
Can message queues hold messages? Yes
Can message queues deliver messages? Yes
Can Redis be used as a message broker? Yes
Can messages be lost in a Redis message broker? Yes
An application publishes a job to a message queue, then notifies the user of the job status. A _____ picks up the job from the queue, processes it, then signals its completion. worker
In asynchronous workflows, jobs are processed in the _____ without blocking the user. For example, a tweet can instantly appear on your timeline, but could take some time before it is actually delivered to followers. background
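As a minimal in-process sketch of this pattern (not Kafka-specific; the Job type, queue, and payloads are hypothetical), a background worker drains a queue while the caller returns to the user immediately:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;

    public class AsyncWorkflowSketch {
        record Job(long id, String payload) {}   // hypothetical job type

        public static void main(String[] args) throws Exception {
            BlockingQueue<Job> queue = new LinkedBlockingQueue<>();
            ExecutorService worker = Executors.newSingleThreadExecutor();

            // The worker picks jobs off the queue and processes them in the background.
            worker.submit(() -> {
                try {
                    while (true) {
                        Job job = queue.take();   // blocks until a job is available
                        System.out.println("Processed job " + job.id());  // expensive work goes here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // The application publishes a job and notifies the user without waiting.
            queue.put(new Job(1, "aggregate-daily-stats"));
            System.out.println("Job submitted; user notified immediately");

            Thread.sleep(200);     // let the worker finish, for demo purposes only
            worker.shutdownNow();
        }
    }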
Can queues add delays to operations? Yes
When dealing with many inexpensive or realtime operations, are queues a good use case? They can be, but they can introduce delay and complexity compared to synchronous execution.
A queue has grown significantly, becoming larger than available memory. What are some problems that may appear? Cache misses, disk reads, slower performance
_____ pressure limits queue sizes, allowing for good throughput / latency for jobs already in the queue. Once filled, the queue's clients are asked to try again later. Back pressure
What protocol is used in RabbitMQ message queues? AMQP
Scheduled _____ receive tasks, run them and deliver the results. Task queues
Are real-time payments and financial transactions a use case for event streaming? Yes
Is real-time shipment/logistics monitoring a use case for event streaming? Yes
Is real-time IoT device monitoring a use case for event streaming? Yes
Is user interaction telemetry a use case for event streaming? Yes
Is microservice implementation a use case for event streaming? Yes
Kafka's three key capabilities are:
  1. To _____/_____ to streams of events
  2. To _____ streams of events durably, reliably and indefinitely.
  3. To _____ streams of events, as they occur or retrospectively.
publish/subscribe, store, process
Can Kafka implement continuous import/export of your data from/to other systems? Yes
Kafka is run as a cluster of one or more servers that can span multiple datacenters or cloud regions. Some of these servers form the storage layer, called the _____.  brokers
If a Kafka server fails, the other servers will _____ to ensure continuous operations without any data loss. take over its work
Reading / writing data to Kafka is done in the form of _____.  events
An event consists of:
  • _____: "Alice"
  • _____: "Made a payment of $200 to Bob"
  • _____: "Jun. 25, 2020 at 2:06 p.m."
  • _____ (optional)
a key, a value, a timestamp, metadata (optional)
_____ are client applications that publish (write) events to Kafka. Producers
_____ are clients that subscribe to (read and process) Kafka events. consumers
Do producers sometimes need to wait for consumers by design? No - they are fully decoupled
Does Kafka provide the ability to process an event exactly once? Yes - guarantees it.
Events are organized and durably stored in _____, similar to files stored in a folder. topics
An example _____ name could be "payments".  topic
_____ in Kafka are always multi-producer and multi-subscriber: each can always have zero, one, or many producers that write events to it, as well as zero, one, or many consumers that subscribe to these events. Topics
A Kafka event has been consumed. What happens to it? It is retained for as long as it is defined to be retained, configured per topic.
Topics are partitioned, meaning a topic is spread over a number of "_____" located on different Kafka brokers. buckets
Topics are distributed via partitioning (buckets). This improves scalability because it allows client applications to both read and write the data from/to many _____ at the same time.  brokers
Kafka guarantees that any consumer of a given topic-partition will always read that partition's events in exactly the same order as _____. they were written
When a new event is published to a topic, it is actually appended to one of the topic's _____. Events with the same event key (such as ID) are all written to the same one. partitions
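A minimal producer sketch using the stock Java client, assuming a broker at localhost:9092 and a "payments" topic; because both records carry the key "alice", they hash to the same partition and keep their relative order:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PaymentsProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Same key => same partition, so this customer's events stay ordered
                // for any consumer of that partition.
                producer.send(new ProducerRecord<>("payments", "alice", "Made a payment of $200 to Bob"));
                producer.send(new ProducerRecord<>("payments", "alice", "Made a payment of $50 to Carol"));
            }
        }
    }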
Can Kafka replace a traditional message broker? Yes
Can Kafka be used for log aggregation? Yes
Can Kafka process data in multiple-stage pipelines, where raw input data is consumed from Topics, then aggregated/enriched/transformed into new topics for further consumptions and processing? Yes
Can an event represent a payment transaction? Yes
Can an event represent a geolocation update? Yes
Can an event represent a shipping order? Yes
Can Kafka support log aggregation? Yes
Does Kafka support large data backlogs? Yes
In Kafka, can you process feeds to create new, derived feeds? Yes - implemented by partitioning and the consumer model.
Kafka relies heavily on the _____ for storing and caching messages.  filesystem
In Kafka, using the filesystem and relying on _____ is superior to maintaining an in-memory cache or other structure: we at least double the available cache by having automatic access to all free memory, and likely double again by storing a compact byte structure rather than individual objects.  pagecache
Kafka protocol is built around a "_____" abstraction where network requests group messages together and amortize the overhead of the network roundtrip rather than sending a single message at a time. The server in turn appends chunks of messages to its log in one go, and the consumer fetches large linear chunks at a time. message set
Byte copying can be an inefficiency while under large load. To avoid this we employ a standardized binary message format that is shared by the _____, the _____ and the _____ (so data chunks can be transferred without modification between them). producer, broker and consumer
The _____ maintained by the broker is itself just a directory of files, each populated by a sequence of message sets that have been written to disk in the same format used by the producer and consumer. message log 
The producer sends data directly to the broker that is the _____ for the partition. To help the producer do this all Kafka nodes can answer a request for metadata about which servers are alive and where the leaders for the partitions of a topic are at any given time to allow the producer to appropriately direct its requests. leader
The Kafka _____ works by issuing "fetch" requests to the brokers leading the partitions it wants to consume.  It specifies its offset in the log with each request and receives back a chunk of log beginning from that position, with possibility to rewind it to re-consume data as needed. consumer
In Kafka, data is pushed from the _____ to the _____.   producer broker
In Kafka, data is pulled from the _____ by the _____.  broker consumer
A _____-based system like Kafka has the nicer property that the consumer simply falls behind and catches up when it can. This can be mitigated with some kind of backoff protocol by which the consumer can indicate it is overwhelmed, but getting the rate of transfer to fully utilize (but never over-utilize) the consumer is trickier than it seems. pull
A consumer can deliberately _____ back to an old offset and re-consume data. This violates the common contract of a queue, but turns out to be an essential feature for many consumers. For example, if the consumer code has a bug and is discovered after some messages are consumed, the consumer can re-consume those messages once the bug is fixed. rewind
The position of a consumer in each partition is a single integer: the _____ of the next message to consume.  This makes the state about what has been consumed very small, just one number for each partition. This state can be periodically checkpointed. This makes the equivalent of message acknowledgements very cheap. offset
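Since the position is just one offset per partition, rewinding is a single seek() call. A hedged sketch with the Java client (broker address, topic, partition and offset are assumptions):

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class RewindingConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "payments-reader");   // assumed group name
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("payments", 0);
                consumer.assign(List.of(tp));
                consumer.seek(tp, 42L);   // rewind: the next fetch starts at offset 42
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records)
                    System.out.printf("offset=%d key=%s value=%s%n", r.offset(), r.key(), r.value());
            }
        }
    }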
"_____" delivery means messages may be lost but are never redelivered. At most once
"_____" delivery means messages are never lost but may be redelivered. At least once
"_____" delivery means messages are delivered once and only once. This is what Kafka implements. Exactly once
When publishing a message Kafka has a notion of the message being "_____" to the log. It will not be lost as long as one broker that replicates the partition to which this message was written remains "alive".  If a producer attempts to publish a message and experiences a network error it cannot be sure if this error happened before or after the message was committed. This is similar to the semantics of inserting into a database table with an autogenerated key. commited
Kafka replicates the log for each topic's partitions across a configurable number of servers. Can you set this replication factor per topic? Yes
Kafka replicates the _____ for each topic's partitions across a configurable number of servers. log
Kafka is meant to be used with replication by default; in fact we implement un-replicated topics as replicated topics where the replication factor is one. The unit of replication is the topic _____.  partition
Under non-failure conditions, each partition in Kafka has a single _____ and zero or more _____.  leader followers
The total number of partition replicas including the leader constitutes the replication factor. All reads and writes go to the _____ of the partition.  leader
Typically, there are many more partitions than _____ and the leaders are evenly distributed among _____.  brokers
The logs on the _____ are identical to the leader's log: all have the same offsets and messages in the same order (though, of course, at any given time the leader may have a few as-yet unreplicated messages at the end of its log). followers
Followers consume messages from the _____ just as a normal Kafka consumer would and apply them to their own log. leader
A Kafka node is "in sync" if it meets 2 conditions:
  1. A node must be able to maintain its session with _____
  2. If it is a follower, it must _____ writes happening on the leader without falling too far behind.
ZooKeeper, replicate
The _____ keeps track of the set of "in sync" nodes. leader
The determination of _____ replicas is controlled by the replica.lag.time.max.ms configuration. stuck and lagging
If a follower dies, gets stuck, or falls behind, the _____ will remove it from the list of in sync replicas.  leader
A message is considered committed when all _____ for that partition have applied it to their log. in sync replicas
A _____ message will not be lost, as long as there is at least one in sync replica alive, at all times. committed
_____ have the option of waiting for the message to be committed, depending on their preference for tradeoff between latency and durability. This preference is controlled by the acks setting that the producer uses.  Producers
Topics have a setting for the "minimum number" of in-sync replicas that is checked when the _____ requests acknowledgment that a message has been written to the full set of in-sync replicas.  If a less stringent acknowledgement is requested by the _____, then the message can be committed, and consumed, even if the number of in-sync replicas is lower than the minimum (e.g. it can be as low as just the leader). producer
A Kafka partition is a replicated _____ which models the process of coming into consensus on the order of a series of values (generally numbering the log entries 0, 1, 2, ...).  A leader chooses the ordering of values provided to it. As long as the leader remains alive, all followers need to only copy the values and ordering the leader chooses. log
To choose its quorum set, Kafka dynamically maintains a set of _____ that are caught-up to the leader. Only members of this set are eligible for election as leader. in-sync replicas (ISR)
A write to a Kafka partition is not considered committed until _____ have received the write. all in-sync replicas
Kafka does not require that crashed nodes recover with all their data intact. Before being allowed to join the _____, a replica must fully re-sync again even if it lost unflushed data in its crash. ISR
Kafka's guarantee with respect to data loss is predicated on _____ remaining in sync. at least one replica
Systems must do something when all the replicas die - usually choosing between availability and consistency:
  1. DEFAULT: Wait for a replica in the _____ to come back to life and choose this replica as the leader (hopefully it still has all its data). Kafka will remain unavailable as long as those replicas are down. If they or their data are gone, it is lost.
  2. Choose the first replica (not necessarily in the _____) that comes back to life as the leader. If a non-in-sync replica comes back to life and we allow it to become leader, then its log becomes the source of truth even though it is not guaranteed to have every committed message. 
ISR
When writing to Kafka, _____ can choose whether they wait for the message to be acknowledged by replicas. Note that "acknowledgement by all replicas" does not guarantee that the full set of assigned replicas have received the message. producers
If a topic is configured with only two replicas and one fails (i.e., only one in sync replica remains), then writes that specify _____ will succeed. However, these writes could be lost if the remaining replica also fails. Although this ensures maximum availability of the partition, this behavior may be undesirable to some users who prefer durability over availability.  acks=all
A topic can disable _____ - if all replicas become unavailable, then the partition will remain unavailable until the most recent leader becomes available again. This prefers unavailability over the risk of message loss.  unclean leader election
A topic can specify a minimum _____ - the partition will only accept writes if the size of the ISR is above a certain minimum, in order to prevent the loss of messages that were written to just a single replica, which subsequently becomes unavailable. This setting only takes effect if the producer uses acks=all and guarantees that the message will be acknowledged by at least this many in-sync replicas. This setting offers a trade-off between consistency and availability. A higher setting for minimum ISR size guarantees better consistency since the message is guaranteed to be written to more replicas which reduces the probability that it will be lost. However, it reduces availability since the partition will be unavailable for writes if the number of in-sync replicas drops below the minimum threshold. ISR size
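These durability settings are all per-topic. A sketch using the AdminClient (broker address and topic name are assumptions) that pairs a replication factor of 3 with a minimum ISR size of 2 and disabled unclean leader election:

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateDurableTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                NewTopic topic = new NewTopic("payments", 3, (short) 3)  // 3 partitions, replication factor 3
                    .configs(Map.of(
                        "min.insync.replicas", "2",                      // with acks=all, writes need 2 ISRs
                        "unclean.leader.election.enable", "false"));     // prefer unavailability to data loss
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }

Note that producers must also set acks=all for the minimum ISR check to apply.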
A Kafka cluster will manage thousands of topic partitions, balanced within a cluster in a _____ fashion to avoid clustering all partitions for high-volume topics on a small number of nodes.  round-robin
Kafka balances leadership so that each _____ is the leader for a proportional share of its partitions. node
Log _____ ensures that Kafka will always retain at least the last known value for each message key within the log of data for a single topic partition. Use cases:
  • restoring state after application crashes or system failure
  • reloading caches after application restarts during operational maintenance
compaction
Log _____ gives us a more granular retention mechanism so that we are guaranteed to retain at least the last update for each primary key (e.g. bill@gmail.com). By doing this we guarantee that the log contains a full snapshot of the final value for every key not just keys that changed recently. This means downstream consumers can restore their own state off this topic without us having to retain a complete log of all changes. compaction
Log compaction can be useful for _____. This is a style of application design which co-locates query processing with application design and uses a log of changes as the primary store for the application. Event sourcing
Log _____ is useful when you have a data set in multiple data systems, and one of these systems is a database.  For example you might have a database, a cache, a search cluster, and a Hadoop cluster. Each change to the database will need to be reflected in the cache, the search cluster, and eventually in Hadoop. If you are only handling the real-time updates, you only need the recent log. But if you want to be able to reload the cache or restore a failed search node, you may need a complete data set. compaction
Log _____ is useful when a process that does local computation is made fault-tolerant by logging out the changes that it makes to its local state, so another process can reload these changes and carry on if it should fail. A concrete example of this is handling counts, aggregations, and other "group by"-like processing in a stream query system. Samza, a real-time stream-processing framework, uses this feature for exactly this purpose. compaction
Any _____ that stays caught-up to within the head of the log will see every message that is written; these messages will have sequential offsets.
  • The topic's min.compaction.lag.ms can be used to guarantee the minimum length of time that must pass after a message is written before it could be compacted, i.e. it provides a lower bound on how long each message will remain in the (uncompacted) head.
  • The topic's max.compaction.lag.ms can be used to guarantee the maximum delay between the time a message is written and the time the message becomes eligible for compaction.
consumer
Ordering of messages is always maintained. Log compaction will never re-order messages, just _____ some. remove
With log compaction, the offset for a message never changes. It is the permanent _____ for a position in the log. identifier
Log _____ guarantees that any consumer progressing from the start of the log will see at least the final state of all records in the order they were written. Additionally, all delete markers for deleted records will be seen, provided the consumer reaches the head of the log in a time period less than the topic's delete.retention.ms setting (the default is 24 hours). In other words: since the removal of delete markers happens concurrently with reads, it is possible for a consumer to miss delete markers if it lags by more than delete.retention.ms. compaction
Log compaction is handled by the _____, a pool of background threads that recopy log segment files, removing records whose key appears in the head of the log.  log cleaner
Can log cleaning be enabled per-topic? Yes
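Per-topic enablement is just topic configuration. A fragment reusing the AdminClient setup from the earlier sketch (topic name and lag values are illustrative):

    // Assuming an AdminClient `admin` as in the earlier sketch:
    NewTopic changelog = new NewTopic("user-emails", 1, (short) 3)
        .configs(Map.of(
            "cleanup.policy", "compact",          // enable the log cleaner for this topic
            "min.compaction.lag.ms", "60000",     // messages stay in the uncompacted head >= 1 minute
            "delete.retention.ms", "86400000"));  // keep delete markers for 24 hours (the default)
    admin.createTopics(Collections.singleton(changelog)).all().get();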
Kafka cluster has the ability to enforce _____ on requests to control the broker resources used by clients.  quotas
Two types of client quotas can be enforced by Kafka brokers for each group of clients sharing a quota:
  1. _____ quotas define byte-rate thresholds (since 0.9)
  2. _____ quotas define CPU utilization thresholds as a percentage of network and I/O threads (since 0.11)
Network bandwidth Request rate
Modern unix operating systems offer a highly optimized code path for transferring data out of pagecache to a socket; in Linux this is done with the _____ system call. sendfile system call
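In Java, this zero-copy path is reachable via FileChannel.transferTo, which the JVM implements with sendfile on Linux. A sketch with a hypothetical file and destination:

    import java.io.FileInputStream;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;

    public class ZeroCopySend {
        public static void main(String[] args) throws Exception {
            try (FileChannel file = new FileInputStream("/tmp/segment.log").getChannel();  // hypothetical log segment
                 SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
                long position = 0, remaining = file.size();
                while (remaining > 0) {
                    // Bytes flow pagecache -> socket inside the kernel, never entering user space.
                    long sent = file.transferTo(position, remaining, socket);
                    position += sent;
                    remaining -= sent;
                }
            }
        }
    }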
When a machine crashes or data needs to be re-loaded or re-processed, one needs to do a full load. _____ allows feeding both of these use cases off the same backing topic.  Log compaction
All connections of a quota group share the quota configured for the group. For example, if (user="test-user", client-id="test-client") has a produce quota of 10MB/sec, this is shared across all _____ instances of user "test-user" with the client-id "test-client". producer
Quotas can be applied to (user, client-id), user or client-id groups. For a given connection, the _____ quota matching the connection is applied.  most specific
The identity of Kafka clients is the _____ which represents an authenticated user in a secure cluster.  user principal
The tuple (_____, _____) defines a secure logical group of clients that share both user principal and client-id. user, client-id
In a cluster that supports unauthenticated clients, _____ is a grouping of unauthenticated users chosen by the broker using a configurable PrincipalBuilder user principal
_____ is a logical grouping of clients with a meaningful name chosen by the client application.  Client-id
By default, each unique client group receives a fixed quota as configured by the cluster. This quota is defined on a per-_____ basis; each group can use up its quota on each one before getting throttled. broker
Messages consist of the variable-length items:
  • _____
  • opaque _____ byte array
  • opaque _____ byte array
header, key, value
Messages are also known as... records
Messages are always written in _____.  batches
Kafka consumer tracks the maximum offset it has consumed in each partition and has the capability to _____ offsets so that it can resume from those offsets in the event of a restart.  commit
Kafka provides the option to store all the offsets for a given consumer group in a designated broker (for that group) called the _____.  Any consumer instance in that consumer group should send its offset commits and fetches to that group coordinator (broker). Consumer groups are assigned to coordinators based on their group names.  group coordinator
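A sketch of manual offset commits with the Java client (group name and topic are assumptions); commitSync sends the consumed offsets to the group coordinator:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class CommittingConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "payments-app");      // assumed group name
            props.put("enable.auto.commit", "false");   // we commit explicitly below
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("payments"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> r : records)
                        System.out.println("processing " + r.value());
                    consumer.commitSync();   // checkpoint: one offset per partition, sent to the coordinator
                }
            }
        }
    }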
You have the option of either adding topics manually or having them be created automatically when data is first published to _____. a non-existent topic
The Kafka cluster will automatically detect any broker shutdown or failure and elect new _____ for the partitions on that machine. leaders
We refer to the process of replicating data between Kafka clusters as "_____" to avoid confusion with the replication that happens amongst the nodes in a single cluster.  mirroring
Does Kafka come with a tool for mirroring data between clusters? Yes
To add servers to a Kafka cluster, assign them a _____ and start up Kafka on them. unique broker ID
New Kafka servers will not automatically be assigned any data partitions, so unless partitions are moved to them they won't be doing any work until new _____ are created.  topics
The partition reassignment tool can be used to move partitions across _____.  brokers
Kafka lets you apply a _____ to replication traffic, setting an upper bound on the bandwidth used to move replicas from machine to machine. This is useful when rebalancing a cluster, bootstrapping a new broker or adding or removing brokers, as it limits the impact these data-intensive operations will have on users. throttle
The most important consumer configuration is the _____. fetch size
Kafka always immediately writes all data to the filesystem and supports the ability to configure the _____ policy that controls when data is forced out of the OS cache and onto disk using the _____. It can force data to disk after a period of time or after a certain number of messages has been written. flush
Kafka must eventually call _____ to know that data was flushed. When recovering from a crash for any log segment not known to be _____'d Kafka will check the integrity of each message by checking its CRC and also rebuild the accompanying offset index file as part of the recovery process executed on startup. fsync
EXT4 has had more usage, but recent improvements to the _____ filesystem have shown it to have better performance characteristics for Kafka's workload with no compromise in stability. XFS
_____ is the practice of capturing, storing, processing, routing and reacting to streams of events built from event sources (databases, devices, software). Event streaming
Is implementing data platforms and event-driven architecture a use case for event streaming? Yes
Kafka is a distributed system consisting of _____ and _____ that communicate via a binary protocol over TCP. clients and servers
Servers that run _____ continuously import/export data as event streams, integrating Kafka with your existing systems, databases or other Kafka clusters. Kafka Connect
_____ allow you to write distributed applications and microservices that read/write/process streams of events in parallel, at scale, and in a fault-tolerant manner even in the case of network problems or machine failures. Clients
An _____ records the fact that something happened in your system.  event (or "record"/"message")
Does Kafka's performance lower with data size? Kafka's performance is effectively constant with respect to data size, so storing data for a long time is perfectly fine.
Draw a diagram:
  • A topic has four partitions P1, P2, P3, P4.
  • Two different producers are independently publishing new events to the topic by writing events over the network to the topic's partitions. Both can write to the same partition if appropriate.
  • Events with the same key (denoted by their color in the diagram) are written to the same partition.
A topic can be fault-tolerant and highly-available via being replicated across datacenters, so that there are always multiple _____ that have a copy of the data just in case things go wrong, you want to do maintenance on the brokers, and so on. A common production setting is a replication factor of 3, i.e., there will always be three copies of your data. This replication is performed at the level of topic-partitions. brokers
Can Kafka be used to aggregate monitoring statistics from distributed applications to create centralized feeds of operational data? Yes
Log aggregation typically collects physical log files off servers and puts them in HDFS or a central server for processing. Kafka abstracts away the details of files and gives a cleaner, lower-latency abstraction of log/event data as _____. This allows for easier support for multiple data sources and distributed data consumption. a stream of messages
A processing pipeline for recommending news articles might crawl article content from RSS feeds and publish it to an "articles" topic; further processing might normalize or deduplicate this content and publish the cleansed article content to a new topic; a final processing stage might attempt to recommend this content to users.  Such processing pipelines create graphs of real-time data flows based on the individual topics. Starting in 0.10.0.0, a light-weight but powerful stream processing library called _____ is available in Apache Kafka to perform such data processing as described above. Apart from Kafka Streams, alternative open source stream processing tools include Apache Storm and Apache Samza. Kafka Streams 
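A minimal Kafka Streams sketch along the lines of the pipeline above (topic names, the application id, and the normalization step are illustrative placeholders):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class ArticleNormalizer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "article-normalizer");  // assumed app id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> articles = builder.stream("articles");
            articles.mapValues(content -> content.trim().toLowerCase())   // stand-in for real normalization
                    .to("articles-normalized");                           // cleansed content, new topic

            new KafkaStreams(builder.build(), props).start();
        }
    }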
_____ is a style of application design where state changes are logged as a time-ordered sequence of records. Event sourcing
Kafka can serve as an external commit-log for a distributed system, helping replicate data between nodes and re-syncing failed nodes to restore their data. The _____ feature in Kafka helps support this usage. log compaction
Kafka lets you read, write, store, and process _____ across many machines. events
Do you have to create a topic before writing your events? Yes
Events in Kafka are durably stored. Can they be read any number of times by any number of consumers? Yes
_____ allows you to integrate (via 'connectors') and continuously ingest data from existing, external systems into Kafka, and vice versa. Kafka Connect
You can process events with the _____ Java/Scala client library. The library supports exactly-once processing, stateful operations and aggregations, windowing, joins, processing based on event-time, etc. Kafka Streams
If your disk usage favors linear reads then read-ahead is effectively pre-populating this cache with useful data on each disk read. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's _____. pagecache
Efficient message compression requires compressing multiple messages together rather than compressing each message individually. A "_____" of messages can be clumped together compressed and sent to the server in this form. It will be written in compressed form and will remain compressed in the log and will only be decompressed by the consumer. batch
To enable batching, the Kafka producer will attempt to accumulate data in memory and to send out larger batches of N messages in a single _____. request
Kafka's topics are divided into a set of totally ordered _____, each consumed by exactly one consumer within each subscribing consumer group at any given time.  partitions
_____ aims to improve the availability of stream applications, consumer groups and other applications built on top of the group rebalance protocol. The rebalance protocol relies on the group coordinator to allocate entity ids to group members. These generated ids are ephemeral and will change when members restart and rejoin. Kafka's group management protocol allows group members to provide persistent entity ids. Group membership remains unchanged based on those ids, thus no rebalance will be triggered. Static membership
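Static membership is enabled purely through consumer configuration. A fragment to add to the consumer Properties from the earlier sketches (values are examples):

    props.put("group.id", "payments-app");
    props.put("group.instance.id", "payments-app-consumer-1");  // persistent entity id; survives restarts
    props.put("session.timeout.ms", "30000");  // generous timeout so a restart finishes before eviction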

If a majority of servers suffer a permanent failure, then you must either choose to lose _____ of your data or violate _____ by taking what remains on an existing server as your new source of truth. 100%, consistency

Each log _____ works as follows:
  1. It chooses the log that has the highest ratio of log head to log tail
  2. It creates a succinct summary of the last offset for each key in the head of the log
  3. It recopies the log from beginning to end removing keys which have a later occurrence in the log. New, clean segments are swapped into the log immediately so the additional disk space required is just one additional log segment (not a full copy of the log).
  4. The summary of the log head is essentially just a space-compact hash table. It uses exactly 24 bytes per entry. As a result with 8GB of cleaner buffer one cleaner iteration can clean around 366GB of log head (assuming 1k messages).
compactor thread
The _____ controls which partition it publishes messages to. This can be done at random, or by some semantic partitioning function.  You can specify a key to partition by and using this to hash to a partition. For example if the key chosen was a user id then all data for a given user would be sent to the same partition.  This in turn will allow consumers to make locality assumptions about their consumption. This style of partitioning is explicitly designed to allow locality-sensitive processing in consumers. client
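A hedged sketch of such a semantic partitioning function, implemented against the Java client's Partitioner interface (the class name and hash scheme are illustrative, not Kafka's default partitioner):

    import java.util.Map;
    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;

    // Hypothetical partitioner: route by user id so that all of a user's events
    // land on one partition and consumers can make per-user locality assumptions.
    public class UserIdPartitioner implements Partitioner {
        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            int numPartitions = cluster.partitionsForTopic(topic).size();
            if (key == null) return 0;                        // keyless records: pick a fixed partition
            return Math.abs(key.hashCode() % numPartitions);  // key is assumed to be a user id
        }
        @Override public void close() {}
        @Override public void configure(Map<String, ?> configs) {}
    }

It would be registered on the producer with props.put("partitioner.class", UserIdPartitioner.class.getName()).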
_____ is a mechanism to give finer-grained per-record retention, rather than the coarser-grained time-based retention. The idea is to selectively remove records where we have a more recent update with the same primary key.  This way the log is guaranteed to have at least the last state for each key.This retention policy can be set per-topic, so a single cluster can have some topics where retention is enforced by size or time and other topics where retention is enforced by compaction. Log compaction
It is possible for producers and consumers to produce/consume very high volumes of data or generate requests at a very high rate and thus monopolize broker resources, cause network saturation and generally DOS other clients and the brokers themselves. Having _____ protects against these issues and is all the more important in large multi-tenant clusters where a small set of badly behaved clients can degrade user experience for the well behaved ones. quotas
Quota configuration may be defined for _____. It is possible to override the default quota at any of the quota levels. The mechanism is similar to the per-topic log config overrides.  (user, client-id), user and client-id groups
_____ quotas are defined as the byte rate threshold for each group of clients sharing a quota. By default, each unique client group receives a fixed quota in bytes/sec as configured by the cluster.  This quota is defined on a per-broker basis. Each group of clients can publish/fetch a maximum of X bytes/sec per broker before clients are throttled. Network bandwidth 
_____ quotas are defined as the percentage of time a client can utilize on request handler I/O threads and network threads of each broker within a quota window. A quota of n% represents n% of one thread, so the quota is out of a total capacity of ((num.io.threads + num.network.threads) * 100)%. Each group of clients may use a total percentage of up to n% across all I/O and network threads in a quota window before being throttled. Since the number of threads allocated for I/O and network threads is typically based on the number of cores available on the broker host, request rate quotas represent the total percentage of CPU that may be used by each group of clients sharing the quota. Request rate quotas
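Assuming a broker new enough to support the client-quotas Admin API (added around Kafka 2.6), quota overrides for a (user, client-id) group can be set programmatically; a sketch with example values:

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.common.quota.ClientQuotaAlteration;
    import org.apache.kafka.common.quota.ClientQuotaEntity;

    public class SetClientQuotas {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // Quota group: (user="test-user", client-id="test-client")
                ClientQuotaEntity entity = new ClientQuotaEntity(Map.of(
                    ClientQuotaEntity.USER, "test-user",
                    ClientQuotaEntity.CLIENT_ID, "test-client"));
                ClientQuotaAlteration alteration = new ClientQuotaAlteration(entity, List.of(
                    new ClientQuotaAlteration.Op("producer_byte_rate", 10_000_000.0),  // ~10 MB/sec per broker
                    new ClientQuotaAlteration.Op("request_percentage", 200.0)));       // 200% = two full threads
                admin.alterClientQuotas(List.of(alteration)).all().get();
            }
        }
    }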
When a server is stopped gracefully it has two optimizations it will take advantage of:
  1. It will _____ to avoid needing to do any log recovery when it restarts (i.e. validating the checksum for all messages in the tail of the log). Log recovery takes time so this speeds up intentional restarts.
  2. It will _____ to other replicas prior to shutting down. This will make the leadership transfer faster and minimize the time each partition is unavailable to a few milliseconds.
sync all its logs to disk; migrate any partitions the server is the leader for
Whenever a broker stops or crashes, leadership for that broker's partitions transfers to other replicas. When the broker is restarted it will only be a follower for all its partitions, meaning it will not be used for client reads and writes. To avoid this imbalance, Kafka has a notion of _____. preferred replicas
Does the partition reassignment tool have the ability to automatically generate a reassignment plan for decommissioning brokers?  No - the admin has to come up with a reassignment plan to move the replica for all partitions hosted on the broker to be decommissioned, to the rest of the brokers.
The most important producer configurations are _____, _____ and _____. acks, compression, batch size
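As a fragment of the producer Properties from the earlier sketches (the values here are illustrative, not recommendations):

    props.put("acks", "all");               // wait for the full ISR before a send is considered complete
    props.put("compression.type", "lz4");   // whole batches are compressed, not individual messages
    props.put("batch.size", "65536");       // bytes to accumulate per partition before sending
    props.put("linger.ms", "10");           // wait up to 10 ms to help fill a batch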
Kafka uses _____ Metrics for metrics reporting in the server.  The Java clients use Kafka Metrics, a built-in metrics registry that minimizes transitive dependencies pulled into client applications.  Both expose metrics via JMX and can be configured to report stats using pluggable stats reporters to hook up to your monitoring system. Yammer
Can asynchronous workflows reduce request time for expensive operations (that would otherwise be performed in-line)? Yes