169 questions and answers about Apache Kafka, and growing. Download the flashcards for spaced repetition!

kafka-faq

When a _____ detects a quota violation, it computes and returns the amount of delay needed to bring the violating client under its quota. It then _____ to the client, refusing to process requests from it until the delay is over. The client will also refrain from sending further requests to the broker during the delay.  broker  mutes the channel bi-directionally
For any Linux filesystem used for data directories, enabling the _____ option is recommended, as it disables updating of a file's atime (last access time) attribute when the file is read. This can eliminate a significant number of filesystem writes, as Kafka does not rely on the atime attributes at all. noatime
Could you use an asynchronous workflow to do expensive work (such as periodic data aggregation) in advance? Yes
Can message queues receive messages? Yes
Can message queues hold messages? Yes
Can message queues deliver messages? Yes
Can Redis be used as a message broker? Yes
Can messages be lost in a Redis message broker? Yes
An application publishes a job to a message queue, then notifies the user of the job status. A _____ picks up the job from the queue, processes it, then signals its completion. worker
In asynchronous workflows, jobs are processed in the _____ without blocking the user. For example, a tweet can instantly appear on your timeline, but could take some time before it is actually delivered to followers. background
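As a minimal in-process sketch of this pattern (not Kafka-specific; the Job type, queue, and payloads are hypothetical), a background worker drains a queue while the caller returns to the user immediately:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;

    public class AsyncWorkflowSketch {
        record Job(long id, String payload) {}   // hypothetical job type

        public static void main(String[] args) throws Exception {
            BlockingQueue<Job> queue = new LinkedBlockingQueue<>();
            ExecutorService worker = Executors.newSingleThreadExecutor();

            // The worker picks jobs off the queue and processes them in the background.
            worker.submit(() -> {
                try {
                    while (true) {
                        Job job = queue.take();   // blocks until a job is available
                        System.out.println("Processed job " + job.id());  // expensive work goes here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // The application publishes a job and notifies the user without waiting.
            queue.put(new Job(1, "aggregate-daily-stats"));
            System.out.println("Job submitted; user notified immediately");

            Thread.sleep(200);     // let the worker finish, for demo purposes only
            worker.shutdownNow();
        }
    }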
Can queues add delays to operations? Yes
When dealing with many inexpensive or realtime operations, are queues a good use case? They can be, but they can introduce delay and complexity compared to synchronous execution.
A queue has grown significantly, becoming larger than available memory. What are some problems that may appear? Cache misses, disk reads, slower performance
_____ pressure limits queue sizes, allowing for good throughput / latency for jobs already in the queue. Once filled, the queue's clients are asked to try again later. Back pressure
What protocol is used in RabbitMQ message queues? AMQP
Scheduled _____ receive tasks, run them and deliver the results. Task queues
Are real-time payments and financial transactions a use case for event streaming? Yes
Is real-time shipment/logistics monitoring a use case for event streaming? Yes
Is real-time IoT device monitoring a use case for event streaming? Yes
Is user interaction telemetry a use case for event streaming? Yes
Is microservice implementation a use case for event streaming? Yes
Kafka's three key capabilities are:
  1. To _____/_____ to streams of events
  2. To _____ streams of events durably, reliably and indefinitely.
  3. To _____ streams of events, as they occur or retrospectively.
publish/subscribe, store, process
Can Kafka implement continuous import/export of your data from/to other systems? Yes
Kafka is run as a cluster of one or more servers that can span multiple datacenters or cloud regions. Some of these servers form the storage layer, called the _____.  brokers
If a Kafka server fails, the other servers will _____ to ensure continuous operations without any data loss. take over its work
Reading / writing data to Kafka is done in the form of _____.  events
An event consists of:
  • _____: "Alice"
  • _____: "Made a payment of $200 to Bob"
  • _____: "Jun. 25, 2020 at 2:06 p.m."
  • _____ (optional)
a key, a value, a timestamp, metadata (optional)
_____ are client applications that publish (write) events to Kafka. Producers
_____ are clients that subscribe to (read and process) Kafka events. consumers
Do producers sometimes need to wait for consumers by design? No - they are fully decoupled
Does Kafka provide the ability to process an event exactly once? Yes - guarantees it.
Events are organized and durably stored in _____, similar to files stored in a folder. topics
An example _____ name could be "payments".  topic
_____ in Kafka are always multi-producer and multi-subscriber: each can always have zero, one, or many producers that write events to it, as well as zero, one, or many consumers that subscribe to these events. Topics
A Kafka event has been consumed. What happens to it? It is retained for as long as it is defined to be retained, configured per topic.
Topics are partitioned, meaning a topic is spread over a number of "_____" located on different Kafka brokers. buckets
Topics are distributed via partitioning (buckets). This improves scalability because it allows client applications to both read and write the data from/to many _____ at the same time.  brokers
Kafka guarantees that any consumer of a given topic-partition will always read that partition's events in exactly the same order as _____. they were written
When a new event is published to a topic, it is actually appended to one of the topic's _____. Events with the same event key (such as ID) are all written to the same one. partitions
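A minimal producer sketch using the stock Java client, assuming a broker at localhost:9092 and a "payments" topic; because both records carry the key "alice", they hash to the same partition and keep their relative order:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PaymentsProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Same key => same partition, so this customer's events stay ordered
                // for any consumer of that partition.
                producer.send(new ProducerRecord<>("payments", "alice", "Made a payment of $200 to Bob"));
                producer.send(new ProducerRecord<>("payments", "alice", "Made a payment of $50 to Carol"));
            }
        }
    }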
Can Kafka replace a traditional message broker? Yes
Can Kafka be used for log aggregation? Yes
Can Kafka process data in multiple-stage pipelines, where raw input data is consumed from Topics, then aggregated/enriched/transformed into new topics for further consumptions and processing? Yes
Can an event represent a payment transaction? Yes
Can an event represent a geolocation update? Yes
Can an event represent a shipping order? Yes
Can Kafka support log aggregation? Yes
Does Kafka support large data backlogs? Yes
In Kafka, can you process feeds to create new, derived feeds? Yes - implemented by partitioning and the consumer model.
Kafka relies heavily on the _____ for storing and caching messages.  filesystem
In Kafka, using the filesystem and relying on _____ is superior to maintaining an in-memory cache or other structure: we at least double the available cache by having automatic access to all free memory, and likely double again by storing a compact byte structure rather than individual objects.  pagecache
Kafka protocol is built around a "_____" abstraction where network requests group messages together and amortize the overhead of the network roundtrip rather than sending a single message at a time. The server in turn appends chunks of messages to its log in one go, and the consumer fetches large linear chunks at a time. message set
Byte copying can be an inefficiency while under large load. To avoid this we employ a standardized binary message format that is shared by the _____, the _____ and the _____ (so data chunks can be transferred without modification between them). producer, broker and consumer
The _____ maintained by the broker is itself just a directory of files, each populated by a sequence of message sets that have been written to disk in the same format used by the producer and consumer. message log 
The producer sends data directly to the broker that is the _____ for the partition. To help the producer do this all Kafka nodes can answer a request for metadata about which servers are alive and where the leaders for the partitions of a topic are at any given time to allow the producer to appropriately direct its requests. leader
The Kafka _____ works by issuing "fetch" requests to the brokers leading the partitions it wants to consume.  It specifies its offset in the log with each request and receives back a chunk of log beginning from that position, with possibility to rewind it to re-consume data as needed. consumer
In Kafka, data is pushed from the _____ to the _____.   producer broker
In Kafka, data is pulled from the _____ by the _____.  broker consumer
A _____-based system like Kafka has the nicer property that the consumer simply falls behind and catches up when it can. This can be mitigated with some kind of backoff protocol by which the consumer can indicate it is overwhelmed, but getting the rate of transfer to fully utilize (but never over-utilize) the consumer is trickier than it seems. pull
A consumer can deliberately _____ back to an old offset and re-consume data. This violates the common contract of a queue, but turns out to be an essential feature for many consumers. For example, if the consumer code has a bug and is discovered after some messages are consumed, the consumer can re-consume those messages once the bug is fixed. rewind
The position of a consumer in each partition is a single integer: the _____ of the next message to consume.  This makes the state about what has been consumed very small, just one number for each partition. This state can be periodically checkpointed. This makes the equivalent of message acknowledgements very cheap. offset
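Since the position is just one offset per partition, rewinding is a single seek() call. A hedged sketch with the Java client (broker address, topic, partition and offset are assumptions):

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class RewindingConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "payments-reader");   // assumed group name
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("payments", 0);
                consumer.assign(List.of(tp));
                consumer.seek(tp, 42L);   // rewind: the next fetch starts at offset 42
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records)
                    System.out.printf("offset=%d key=%s value=%s%n", r.offset(), r.key(), r.value());
            }
        }
    }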
"_____" delivery means messages may be lost but are never redelivered. At most once
"_____" delivery means messages are never lost but may be redelivered. At least once
"_____" delivery means messages are delivered once and only once. This is what Kafka implements. Exactly once
When publishing a message Kafka has a notion of the message being "_____" to the log. It will not be lost as long as one broker that replicates the partition to which this message was written remains "alive".  If a producer attempts to publish a message and experiences a network error it cannot be sure if this error happened before or after the message was committed. This is similar to the semantics of inserting into a database table with an autogenerated key. commited
Kafka replicates the log for each topic's partitions across a configurable number of servers. Can you set this replication factor per topic? Yes
Kafka replicates the _____ for each topic's partitions across a configurable number of servers. log
Kafka is meant to be used with replication by default; in fact we implement un-replicated topics as replicated topics where the replication factor is one. The unit of replication is the topic _____.  partition
Under non-failure conditions, each partition in Kafka has a single _____ and zero or more _____.  leader followers
The total number of partition replicas including the leader constitutes the replication factor. All reads and writes go to the _____ of the partition.  leader
Typically, there are many more partitions than _____ and the leaders are evenly distributed among _____.  brokers
The logs on the _____ are identical to the leader's log: all have the same offsets and messages in the same order (though, of course, at any given time the leader may have a few as-yet unreplicated messages at the end of its log). followers
Followers consume messages from the _____ just as a normal Kafka consumer would and apply them to their own log. leader
A Kafka node is "in sync" if it meets 2 conditions:
  1. A node must be able to maintain its session with _____
  2. If it is a follower, it must _____ writes happening on the leader without falling too far behind.
ZooKeeper, replicate
The _____ keeps track of the set of "in sync" nodes. leader
The determination of _____ replicas is controlled by the replica.lag.time.max.ms configuration. stuck and lagging
If a follower dies, gets stuck, or falls behind, the _____ will remove it from the list of in sync replicas.  leader
A message is considered committed when all _____ for that partition have applied it to their log. in sync replicas
A _____ message will not be lost, as long as there is at least one in sync replica alive, at all times. committed
_____ have the option of waiting for the message to be committed, depending on their preference for tradeoff between latency and durability. This preference is controlled by the acks setting that the producer uses.  Producers
Topics have a setting for the "minimum number" of in-sync replicas that is checked when the _____ requests acknowledgment that a message has been written to the full set of in-sync replicas.  If a less stringent acknowledgement is requested by the _____, then the message can be committed, and consumed, even if the number of in-sync replicas is lower than the minimum (e.g. it can be as low as just the leader). producer
A Kafka partition is a replicated _____ which models the process of coming into consensus on the order of a series of values (generally numbering the log entries 0, 1, 2, ...).  A leader chooses the ordering of values provided to it. As long as the leader remains alive, all followers need to only copy the values and ordering the leader chooses. log
To choose its quorum set, Kafka dynamically maintains a set of _____ that are caught-up to the leader. Only members of this set are eligible for election as leader. in-sync replicas (ISR)
A write to a Kafka partition is not considered committed until _____ have received the write. all in-sync replicas
Kafka does not require that crashed nodes recover with all their data intact. Before being allowed to join the _____, a replica must fully re-sync again even if it lost unflushed data in its crash. ISR
Kafka's guarantee with respect to data loss is predicated on _____ remaining in sync. at least one replica
Systems must do something when all the replicas die - usually choosing between availability and consistency:
  1. DEFAULT: Wait for a replica in the _____ to come back to life and choose this replica as the leader (hopefully it still has all its data). Kafka will remain unavailable as long as those replicas are down. If they or their data are gone, it is lost.
  2. Choose the first replica (not necessarily in the _____) that comes back to life as the leader. If a non-in-sync replica comes back to life and we allow it to become leader, then its log becomes the source of truth even though it is not guaranteed to have every committed message. 
ISR
When writing to Kafka, _____ can choose whether they wait for the message to be acknowledged by replicas. Note that "acknowledgement by all replicas" does not guarantee that the full set of assigned replicas have received the message. producers
If a topic is configured with only two replicas and one fails (i.e., only one in sync replica remains), then writes that specify _____ will succeed. However, these writes could be lost if the remaining replica also fails. Although this ensures maximum availability of the partition, this behavior may be undesirable to some users who prefer durability over availability.  acks=all
A topic can disable _____ - if all replicas become unavailable, then the partition will remain unavailable until the most recent leader becomes available again. This prefers unavailability over the risk of message loss.  unclean leader election
A topic can specify a minimum _____ - the partition will only accept writes if the size of the ISR is above a certain minimum, in order to prevent the loss of messages that were written to just a single replica, which subsequently becomes unavailable. This setting only takes effect if the producer uses acks=all and guarantees that the message will be acknowledged by at least this many in-sync replicas. This setting offers a trade-off between consistency and availability. A higher setting for minimum ISR size guarantees better consistency since the message is guaranteed to be written to more replicas which reduces the probability that it will be lost. However, it reduces availability since the partition will be unavailable for writes if the number of in-sync replicas drops below the minimum threshold. ISR size
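These durability settings are all per-topic. A sketch using the AdminClient (broker address and topic name are assumptions) that pairs a replication factor of 3 with a minimum ISR size of 2 and disabled unclean leader election:

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateDurableTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                NewTopic topic = new NewTopic("payments", 3, (short) 3)  // 3 partitions, replication factor 3
                    .configs(Map.of(
                        "min.insync.replicas", "2",                      // with acks=all, writes need 2 ISRs
                        "unclean.leader.election.enable", "false"));     // prefer unavailability to data loss
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }

Note that producers must also set acks=all for the minimum ISR check to apply.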
A Kafka cluster will manage thousands of topic partitions, balanced within a cluster in a _____ fashion to avoid clustering all partitions for high-volume topics on a small number of nodes.  round-robin
Kafka balances leadership so that each _____ is the leader for a proportional share of its partitions. node
Log _____ ensures that Kafka will always retain at least the last known value for each message key within the log of data for a single topic partition. Use cases:
  • restoring state after application crashes or system failure
  • reloading caches after application restarts during operational maintenance
compaction
Log _____ gives us a more granular retention mechanism so that we are guaranteed to retain at least the last update for each primary key (e.g. bill@gmail.com). By doing this we guarantee that the log contains a full snapshot of the final value for every key not just keys that changed recently. This means downstream consumers can restore their own state off this topic without us having to retain a complete log of all changes. compaction
Log compaction can be useful for _____. This is a style of application design which co-locates query processing with application design and uses a log of changes as the primary store for the application. Event sourcing
Log _____ is useful when you have a data set in multiple data systems, and one of these systems is a database.  For example you might have a database, a cache, a search cluster, and a Hadoop cluster. Each change to the database will need to be reflected in the cache, the search cluster, and eventually in Hadoop. If you are only handling the real-time updates, you only need the recent log. But if you want to be able to reload the cache or restore a failed search node, you may need a complete data set. compaction
Log _____ is useful when a process that does local computation is made fault-tolerant by logging out the changes that it makes to its local state, so another process can reload these changes and carry on if it should fail. A concrete example of this is handling counts, aggregations, and other "group by"-like processing in a stream query system. Samza, a real-time stream-processing framework, uses this feature for exactly this purpose. compaction
Any _____ that stays caught-up to within the head of the log will see every message that is written; these messages will have sequential offsets.
  • The topic's min.compaction.lag.ms can be used to guarantee the minimum length of time that must pass after a message is written before it could be compacted, i.e. it provides a lower bound on how long each message will remain in the (uncompacted) head.
  • The topic's max.compaction.lag.ms can be used to guarantee the maximum delay between the time a message is written and the time the message becomes eligible for compaction.
consumer
Ordering of messages is always maintained. Log compaction will never re-order messages, just _____ some. remove
With log compaction, the offset for a message never changes. It is the permanent _____ for a position in the log. identifier
Log _____ guarantees that any consumer progressing from the start of the log will see at least the final state of all records in the order they were written. Additionally, all delete markers for deleted records will be seen, provided the consumer reaches the head of the log in a time period less than the topic's delete.retention.ms setting (the default is 24 hours). In other words: since the removal of delete markers happens concurrently with reads, it is possible for a consumer to miss delete markers if it lags by more than delete.retention.ms. compaction
Log compaction is handled by the _____, a pool of background threads that recopy log segment files, removing records whose key appears in the head of the log.  log cleaner
Can log cleaning be enabled per-topic? Yes
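Per-topic enablement is just topic configuration. A fragment reusing the AdminClient setup from the earlier sketch (topic name and lag values are illustrative):

    // Assuming an AdminClient `admin` as in the earlier sketch:
    NewTopic changelog = new NewTopic("user-emails", 1, (short) 3)
        .configs(Map.of(
            "cleanup.policy", "compact",          // enable the log cleaner for this topic
            "min.compaction.lag.ms", "60000",     // messages stay in the uncompacted head >= 1 minute
            "delete.retention.ms", "86400000"));  // keep delete markers for 24 hours (the default)
    admin.createTopics(Collections.singleton(changelog)).all().get();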
Kafka cluster has the ability to enforce _____ on requests to control the broker resources used by clients.  quotas
Two types of client quotas can be enforced by Kafka brokers for each group of clients sharing a quota:
  1. _____ quotas define byte-rate thresholds (since 0.9)
  2. _____ quotas define CPU utilization thresholds as a percentage of network and I/O threads (since 0.11)
Network bandwidth Request rate
Modern unix operating systems offer a highly optimized code path for transferring data out of pagecache to a socket; in Linux this is done with the _____ system call. sendfile system call
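In Java, this zero-copy path is reachable via FileChannel.transferTo, which the JVM implements with sendfile on Linux. A sketch with a hypothetical file and destination:

    import java.io.FileInputStream;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;

    public class ZeroCopySend {
        public static void main(String[] args) throws Exception {
            try (FileChannel file = new FileInputStream("/tmp/segment.log").getChannel();  // hypothetical log segment
                 SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
                long position = 0, remaining = file.size();
                while (remaining > 0) {
                    // Bytes flow pagecache -> socket inside the kernel, never entering user space.
                    long sent = file.transferTo(position, remaining, socket);
                    position += sent;
                    remaining -= sent;
                }
            }
        }
    }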
When a machine crashes or data needs to be re-loaded or re-processed, one needs to do a full load. _____ allows feeding both of these use cases off the same backing topic.  Log compaction
All connections of a quota group share the quota configured for the group. For example, if (user="test-user", client-id="test-client") has a produce quota of 10MB/sec, this is shared across all _____ instances of user "test-user" with the client-id "test-client". producer
Quotas can be applied to (user, client-id), user or client-id groups. For a given connection, the _____ quota matching the connection is applied.  most specific
The identity of Kafka clients is the _____ which represents an authenticated user in a secure cluster.  user principal
The tuple (_____, _____) defines a secure logical group of clients that share both user principal and client-id. user, client-id
In a cluster that supports unauthenticated clients, _____ is a grouping of unauthenticated users chosen by the broker using a configurable PrincipalBuilder user principal
_____ is a logical grouping of clients with a meaningful name chosen by the client application.  Client-id
By default, each unique client group receives a fixed quota as configured by the cluster. This quota is defined on a per-_____ basis; each group can use up its quota on each one before getting throttled. broker
Messages consist of the variable-length items:
  • _____
  • opaque _____ byte array
  • opaque _____ byte array
header, key, value
Messages are also known as... records
Messages are always written in _____.  batches
Kafka consumer tracks the maximum offset it has consumed in each partition and has the capability to _____ offsets so that it can resume from those offsets in the event of a restart.  commit
Kafka provides the option to store all the offsets for a given consumer group in a designated broker (for that group) called the _____.  Any consumer instance in that consumer group should send its offset commits and fetches to that group coordinator (broker). Consumer groups are assigned to coordinators based on their group names.  group coordinator
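A sketch of manual offset commits with the Java client (group name and topic are assumptions); commitSync sends the consumed offsets to the group coordinator:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class CommittingConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "payments-app");      // assumed group name
            props.put("enable.auto.commit", "false");   // we commit explicitly below
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("payments"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> r : records)
                        System.out.println("processing " + r.value());
                    consumer.commitSync();   // checkpoint: one offset per partition, sent to the coordinator
                }
            }
        }
    }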
You have the option of either adding topics manually or having them be created automatically when data is first published to _____. a non-existent topic
The Kafka cluster will automatically detect any broker shutdown or failure and elect new _____ for the partitions on that machine. leaders
We refer to the process of replicating data between Kafka clusters as "_____" to avoid confusion with the replication that happens amongst the nodes in a single cluster.  mirroring
Does Kafka come with a tool for mirroring data between clusters? Yes
To add servers to a Kafka cluster, assign them a _____ and start up Kafka on them. unique broker ID
New Kafka servers will not automatically be assigned any data partitions, so unless partitions are moved to them they won't be doing any work until new _____ are created.  topics
The partition reassignment tool can be used to move partitions across _____.  brokers
Kafka lets you apply a _____ to replication traffic, setting an upper bound on the bandwidth used to move replicas from machine to machine. This is useful when rebalancing a cluster, bootstrapping a new broker or adding or removing brokers, as it limits the impact these data-intensive operations will have on users. throttle
The most important consumer configuration is the _____. fetch size
Kafka always immediately writes all data to the filesystem and supports the ability to configure the _____ policy that controls when data is forced out of the OS cache and onto disk using the _____. It can force data to disk after a period of time or after a certain number of messages has been written. flush
Kafka must eventually call _____ to know that data was flushed. When recovering from a crash for any log segment not known to be _____'d Kafka will check the integrity of each message by checking its CRC and also rebuild the accompanying offset index file as part of the recovery process executed on startup. fsync
EXT4 has had more usage, but recent improvements to the _____ filesystem have shown it to have better performance characteristics for Kafka's workload with no compromise in stability. XFS
_____ is the practice of capturing, storing, processing, routing and reacting to streams of events built from event sources (databases, devices, software). Event streaming
Is implementing data platforms and event-driven architecture a use case for event streaming? Yes
Kafka is a distributed system consisting of _____ and _____ that communicate via a binary protocol over TCP. clients and servers
Servers that run _____ continuously import/export data as event streams, integrating Kafka with your existing systems, databases or other Kafka clusters. Kafka Connect
_____ allow you to write distributed applications and microservices that read/write/process streams of events in parallel, at scale, and in a fault-tolerant manner even in the case of network problems or machine failures. Clients
An _____ records the fact that something happened in your system.  event (or "record"/"message")
Does Kafka's performance lower with data size? Kafka's performance is effectively constant with respect to data size, so storing data for a long time is perfectly fine.
Draw a diagram:
  • A topic has four partitions P1, P2, P3, P4.
  • Two different producers are independently publishing new events to the topic by writing events over the network to the topic's partitions. Both can write to the same partition if appropriate.
  • Events with the same key (denoted by their color in the diagram) are written to the same partition.
A topic can be fault-tolerant and highly-available via being replicated across datacenters, so that there are always multiple _____ that have a copy of the data just in case things go wrong, you want to do maintenance on the brokers, and so on. A common production setting is a replication factor of 3, i.e., there will always be three copies of your data. This replication is performed at the level of topic-partitions. brokers
Can Kafka be used to aggregate monitoring statistics from distributed applications to create centralized feeds of operational data? Yes
Log aggregation typically collects physical log files off servers and puts them in HDFS or a central server for processing. Kafka abstracts away the details of files and gives a cleaner, lower-latency abstraction of log/event data as _____. This allows for easier support for multiple data sources and distributed data consumption. a stream of messages
A processing pipeline for recommending news articles might crawl article content from RSS feeds and publish it to an "articles" topic; further processing might normalize or deduplicate this content and publish the cleansed article content to a new topic; a final processing stage might attempt to recommend this content to users.  Such processing pipelines create graphs of real-time data flows based on the individual topics. Starting in 0.10.0.0, a light-weight but powerful stream processing library called _____ is available in Apache Kafka to perform such data processing as described above. Apart from Kafka Streams, alternative open source stream processing tools include Apache Storm and Apache Samza. Kafka Streams 
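A minimal Kafka Streams sketch along the lines of the pipeline above (topic names, the application id, and the normalization step are illustrative placeholders):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class ArticleNormalizer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "article-normalizer");  // assumed app id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> articles = builder.stream("articles");
            articles.mapValues(content -> content.trim().toLowerCase())   // stand-in for real normalization
                    .to("articles-normalized");                           // cleansed content, new topic

            new KafkaStreams(builder.build(), props).start();
        }
    }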
_____ is a style of application design where state changes are logged as a time-ordered sequence of records. Event sourcing
Kafka can serve as an external commit-log for a distributed system, helping replicate data between nodes and re-syncing failed nodes to restore their data. The _____ feature in Kafka helps support this usage. log compaction
Kafka lets you read, write, store, and process _____ across many machines. events
Do you have to create a topic before writing your events? Yes
Events in Kafka are durably stored. Can they be read any number of times by any number of consumers? Yes
_____ allows you to integrate (via 'connectors') and continuously ingest data from existing, external systems into Kafka, and vice versa. Kafka Connect
You can process events with the _____ Java/Scala client library. The library supports exactly-once processing, stateful operations and aggregations, windowing, joins, processing based on event-time, etc. Kafka Streams
If your disk usage favors linear reads then read-ahead is effectively pre-populating this cache with useful data on each disk read. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's _____. pagecache
Efficient message compression requires compressing multiple messages together rather than compressing each message individually. A "_____" of messages can be clumped together compressed and sent to the server in this form. It will be written in compressed form and will remain compressed in the log and will only be decompressed by the consumer. batch
To enable batching, the Kafka producer will attempt to accumulate data in memory and to send out larger batches of N messages in a single _____. request
Kafka's topics are divided into a set of totally ordered _____, each consumed by exactly one consumer within each subscribing consumer group at any given time.  partitions
_____ aims to improve the availability of stream applications, consumer groups and other applications built on top of the group rebalance protocol. The rebalance protocol relies on the group coordinator to allocate entity ids to group members. These generated ids are ephemeral and will change when members restart and rejoin. Kafka's group management protocol allows group members to provide persistent entity ids. Group membership remains unchanged based on those ids, thus no rebalance will be triggered. Static membership
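Static membership is enabled purely through consumer configuration. A fragment to add to the consumer Properties from the earlier sketches (values are examples):

    props.put("group.id", "payments-app");
    props.put("group.instance.id", "payments-app-consumer-1");  // persistent entity id; survives restarts
    props.put("session.timeout.ms", "30000");  // generous timeout so a restart finishes before eviction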

If a majority of servers suffer a permanent failure, then you must either choose to lose _____ of your data or violate _____ by taking what remains on an existing server as your new source of truth. 100%, consistency

Each log _____ works as follows:
  1. It chooses the log that has the highest ratio of log head to log tail
  2. It creates a succinct summary of the last offset for each key in the head of the log
  3. It recopies the log from beginning to end removing keys which have a later occurrence in the log. New, clean segments are swapped into the log immediately so the additional disk space required is just one additional log segment (not a full copy of the log).
  4. The summary of the log head is essentially just a space-compact hash table. It uses exactly 24 bytes per entry. As a result with 8GB of cleaner buffer one cleaner iteration can clean around 366GB of log head (assuming 1k messages).
compactor thread
The _____ controls which partition it publishes messages to. This can be done at random, or by some semantic partitioning function.  You can specify a key to partition by and using this to hash to a partition. For example if the key chosen was a user id then all data for a given user would be sent to the same partition.  This in turn will allow consumers to make locality assumptions about their consumption. This style of partitioning is explicitly designed to allow locality-sensitive processing in consumers. client
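A hedged sketch of such a semantic partitioning function, implemented against the Java client's Partitioner interface (the class name and hash scheme are illustrative, not Kafka's default partitioner):

    import java.util.Map;
    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;

    // Hypothetical partitioner: route by user id so that all of a user's events
    // land on one partition and consumers can make per-user locality assumptions.
    public class UserIdPartitioner implements Partitioner {
        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            int numPartitions = cluster.partitionsForTopic(topic).size();
            if (key == null) return 0;                        // keyless records: pick a fixed partition
            return Math.abs(key.hashCode() % numPartitions);  // key is assumed to be a user id
        }
        @Override public void close() {}
        @Override public void configure(Map<String, ?> configs) {}
    }

It would be registered on the producer with props.put("partitioner.class", UserIdPartitioner.class.getName()).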
_____ is a mechanism to give finer-grained per-record retention, rather than the coarser-grained time-based retention. The idea is to selectively remove records where we have a more recent update with the same primary key.  This way the log is guaranteed to have at least the last state for each key.This retention policy can be set per-topic, so a single cluster can have some topics where retention is enforced by size or time and other topics where retention is enforced by compaction. Log compaction
It is possible for producers and consumers to produce/consume very high volumes of data or generate requests at a very high rate and thus monopolize broker resources, cause network saturation and generally DOS other clients and the brokers themselves. Having _____ protects against these issues and is all the more important in large multi-tenant clusters where a small set of badly behaved clients can degrade user experience for the well behaved ones. quotas
Quota configuration may be defined for _____. It is possible to override the default quota at any of the quota levels. The mechanism is similar to the per-topic log config overrides.  (user, client-id), user and client-id groups
_____ quotas are defined as the byte rate threshold for each group of clients sharing a quota. By default, each unique client group receives a fixed quota in bytes/sec as configured by the cluster.  This quota is defined on a per-broker basis. Each group of clients can publish/fetch a maximum of X bytes/sec per broker before clients are throttled. Network bandwidth 
_____ quotas are defined as the percentage of time a client can utilize on request handler I/O threads and network threads of each broker within a quota window. A quota of n% represents n% of one thread, so the quota is out of a total capacity of ((num.io.threads + num.network.threads) * 100)%. Each group of clients may use a total percentage of up to n% across all I/O and network threads in a quota window before being throttled. Since the number of threads allocated for I/O and network threads is typically based on the number of cores available on the broker host, request rate quotas represent the total percentage of CPU that may be used by each group of clients sharing the quota. Request rate quotas
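Assuming a broker new enough to support the client-quotas Admin API (added around Kafka 2.6), quota overrides for a (user, client-id) group can be set programmatically; a sketch with example values:

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.common.quota.ClientQuotaAlteration;
    import org.apache.kafka.common.quota.ClientQuotaEntity;

    public class SetClientQuotas {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // Quota group: (user="test-user", client-id="test-client")
                ClientQuotaEntity entity = new ClientQuotaEntity(Map.of(
                    ClientQuotaEntity.USER, "test-user",
                    ClientQuotaEntity.CLIENT_ID, "test-client"));
                ClientQuotaAlteration alteration = new ClientQuotaAlteration(entity, List.of(
                    new ClientQuotaAlteration.Op("producer_byte_rate", 10_000_000.0),  // ~10 MB/sec per broker
                    new ClientQuotaAlteration.Op("request_percentage", 200.0)));       // 200% = two full threads
                admin.alterClientQuotas(List.of(alteration)).all().get();
            }
        }
    }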
When a server is stopped gracefully it has two optimizations it will take advantage of:
  1. It will _____ to avoid needing to do any log recovery when it restarts (i.e. validating the checksum for all messages in the tail of the log). Log recovery takes time so this speeds up intentional restarts.
  2. It will _____ to other replicas prior to shutting down. This will make the leadership transfer faster and minimize the time each partition is unavailable to a few milliseconds.
sync all its logs to disk; migrate any partitions the server is the leader for
Whenever a broker stops or crashes, leadership for that broker's partitions transfers to other replicas. When the broker is restarted it will only be a follower for all its partitions, meaning it will not be used for client reads and writes. To avoid this imbalance, Kafka has a notion of _____. preferred replicas
Does the partition reassignment tool have the ability to automatically generate a reassignment plan for decommissioning brokers?  No - the admin has to come up with a reassignment plan to move the replica for all partitions hosted on the broker to be decommissioned, to the rest of the brokers.
The most important producer configurations are _____, _____ and _____. acks, compression, batch size
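As a fragment of the producer Properties from the earlier sketches (the values here are illustrative, not recommendations):

    props.put("acks", "all");               // wait for the full ISR before a send is considered complete
    props.put("compression.type", "lz4");   // whole batches are compressed, not individual messages
    props.put("batch.size", "65536");       // bytes to accumulate per partition before sending
    props.put("linger.ms", "10");           // wait up to 10 ms to help fill a batch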
Kafka uses _____ Metrics for metrics reporting in the server.  The Java clients use Kafka Metrics, a built-in metrics registry that minimizes transitive dependencies pulled into client applications.  Both expose metrics via JMX and can be configured to report stats using pluggable stats reporters to hook up to your monitoring system. Yammer
Can asynchronous workflows reduce request time for expensive operations (that would otherwise be performed in-line)? Yes