Created this project to learn Kafka
Below is a documentation of my learning process, starting at 12:30pm, 27th July 2020
Please also visit my Hadoop repo here: https://github.com/johnobla/hadoop
-
Decided to learn Kafka 🥳
-
Searched YouTube for videos on what Kafka actually is
-
Learned Kafka's core concept: decoupling data streams and systems by providing real-time messaging
-
Found out how companies like Netflix, Uber, and LinkedIn use Kafka to provide real-time interactions/analytics
-
Watched lecture by Confluent that explained the above concepts in more detail
☝ The lecture also detailed Kafka's logs (Topics) and how each event is stored
-
Installed Docker 🐳
-
Started Confluent's tutorial series ✅
-
Stopped Confluent's tutorial series ❌
☝ They had a very "code first, ask questions later" approach, but I learn better when I understand the concepts that I'm learning before I start coding
-
Found a better Kafka learning series ✅
☝ This series goes in-depth on the concepts before coding
-
Learned the basics of Topics, Partitions, and Offsets
-
Learned the basics of Brokers and Clusters, and how partitions are shared across Brokers
-
Learned about Replication Factors and partition Leaders
-
Learned about Producers, their automatic load-balancing, and their different acknowledgement levels (0, 1, all)
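The acknowledgement levels map onto the producer's real `acks` config. A minimal sketch using plain `java.util.Properties` (no broker needed); `localhost:9092` is just the usual local broker address.

```java
import java.util.Properties;

// The producer's "acks" config controls how many acknowledgements a send
// waits for before being considered successful. The config key and values
// are Kafka's; localhost:9092 is just the usual local broker address.
public class AcksLevels {
    public static Properties producerProps(String acks) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        // "0"   = fire-and-forget (possible data loss)
        // "1"   = wait for the partition leader only
        // "all" = wait for the leader plus all in-sync replicas
        p.put("acks", acks);
        return p;
    }

    public static void main(String[] args) {
        System.out.println(producerProps("all").getProperty("acks")); // prints "all"
    }
}
```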
-
Learned about Producer message keys
-
Learned about Consumers and Consumer Groups
-
Learned about Consumer Offsets, the Offsets Topic, and Delivery Semantics
-
At most once
-
At least once
-
Exactly once 🤩👌
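As I understand them, these semantics come down to when offsets are committed (at-most-once commits before processing, at-least-once after) and whether retries can duplicate writes. A sketch of the producer side using Kafka's real config keys; plain `Properties`, no broker needed.

```java
import java.util.Properties;

// Rough mapping from delivery semantics to producer settings -- a sketch of
// my understanding, not a complete recipe. Config keys are Kafka's real
// producer properties.
public class DeliverySemanticsSketch {
    // At-least-once: wait for full acknowledgement and retry on failure.
    // A retried send can be written twice, hence "at least" once.
    public static Properties atLeastOnce() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("acks", "all");
        p.put("retries", "3");
        return p;
    }

    // Exactly-once (producer side): same as above, plus idempotence so the
    // broker de-duplicates retried sends.
    public static Properties exactlyOnce() {
        Properties p = atLeastOnce();
        p.put("enable.idempotence", "true");
        return p;
    }
}
```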
-
-
Learned about Broker Discovery
-
Learned about how Zookeeper manages Kafka's brokers
-
Revised the Kafka Guarantees 📜
-
Messages are appended to a topic-partition in the order they are sent
-
Consumers read messages in the order stored in a topic-partition
-
With a replication factor of N, producers and consumers can tolerate up to N-1 brokers being down
☝ (N ≥ 3 is best, as a broker can be taken down for maintenance while another can go down unexpectedly)
-
As long as the number of partitions remains constant for a topic, the same key will always go to the same partition
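This guarantee follows from how partitions are chosen: partition = hash(key) mod numPartitions. Kafka's default partitioner actually hashes the key bytes with murmur2; this sketch uses `String.hashCode` as an illustrative stand-in, just to show why a fixed partition count pins a key to one partition.

```java
// Why the same key always lands on the same partition while the partition
// count is fixed: partition = hash(key) mod numPartitions. Kafka's default
// partitioner uses murmur2 on the serialised key; String.hashCode here is
// an illustrative stand-in, not Kafka's actual hash.
public class KeyPartitioning {
    public static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so the modulo result is never negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Same key, same partition count -> same partition, every time.
        System.out.println(partitionFor("user-42", 3) == partitionFor("user-42", 3)); // prints true
        // But changing the partition count can move the key elsewhere,
        // which is why the guarantee only holds while the count is constant.
        System.out.println(partitionFor("user-42", 3));
        System.out.println(partitionFor("user-42", 4));
    }
}
```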
-
-
Installed Java JDK 8
-
Installed Kafka and added
kafka_2.12-2.5.0\bin\windows
to PATH 🎉
-
Created data directory for Zookeeper, and log directory for Kafka
-
Started servers for Zookeeper and Kafka
-
Discovered a bug where trying to delete topics on Windows causes Kafka to crash 🤷‍♂️. I'll complete the rest of the course on my Linux VM
-
Getting
Could not get lock
error in Linux
-
Followed online documentation, now getting a different error
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
😢
-
Absolutely could not get JDK 8 installed on Linux; using my MacBook instead
-
Can't install JDK 8 through brew due to licensing 🙄
-
JDK 8 successfully installed manually 🎉
-
Added kafka folder to repo
-
Added kafka_2.12-2.5.0/bin to zsh path
-
Started servers for Zookeeper and Kafka 🎉
-
Created aliases to start Zookeeper and Kafka with relevant config files
-
Learned how to list, describe, create, and delete topics
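The same list/create/delete operations are also exposed programmatically through Kafka's AdminClient. A sketch of that (it needs the kafka-clients dependency and a broker running on localhost:9092 — it won't run standalone); the topic name "demo_topic" is just an example.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

// Programmatic equivalent of the kafka-topics CLI commands -- a sketch that
// assumes the kafka-clients dependency and a broker on localhost:9092.
public class TopicAdminSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // create: 3 partitions, replication factor 1 (single local broker)
            admin.createTopics(Collections.singleton(
                    new NewTopic("demo_topic", 3, (short) 1))).all().get();

            // list all topic names
            System.out.println(admin.listTopics().names().get());

            // delete
            admin.deleteTopics(Collections.singleton("demo_topic")).all().get();
        }
    }
}
```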
-
Created alias for kafka-topics
-
Learned how producers can write to a topic that doesn't exist yet
👆 Not recommended due to the poor default settings of the new topic
-
Changed default partition number to 3 in
server.properties
-
Learned how console-producer can stream messages to console-consumer
-
Used
--from-beginning
to retrieve messages sent before the consumer was initialised
-
Played around with multiple consumers and consumer groups reading from the same topic
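What I was doing on the CLI looks roughly like this in Java — a sketch that needs the kafka-clients dependency and a broker on localhost:9092, so it won't run standalone. Run two copies with the same `group.id` and Kafka splits the topic's partitions between them; "my_group" and "first_topic" are example names.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Sketch of a consumer joining a consumer group. Assumes the kafka-clients
// dependency and a broker on localhost:9092; group/topic names are examples.
public class GroupConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my_group"); // consumers sharing this id split the partitions
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "earliest"); // like --from-beginning

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("first_topic"));
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.println(r.partition() + "/" + r.offset() + ": " + r.value());
                }
            }
        }
    }
}
```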
-
Changed kafka-topics to use
--bootstrap-server localhost:9092
-
Learned about consumer groups, and the relationship between offsets and lag
-
Changed offset positions using consumer groups
-
Started programming Kafka with Java 🥳
-
Installed Maven
-
Programmed a Kafka producer using Java
-
Added kafka aliases to repo
-
Programmed the Kafka producer to use callbacks
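A minimal version of a producer with a send callback looks something like this — a sketch assuming the kafka-clients dependency and a broker on localhost:9092 (it won't run standalone); "first_topic" is just an example topic name, not necessarily what my repo uses.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

// Producer-with-callback sketch. Assumes the kafka-clients dependency and a
// broker on localhost:9092; the topic name is an example.
public class ProducerWithCallback {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        ProducerRecord<String, String> record =
                new ProducerRecord<>("first_topic", "hello from the callback demo");

        // The callback fires once the broker acknowledges (or rejects) the send.
        producer.send(record, (metadata, exception) -> {
            if (exception == null) {
                System.out.println("Sent to partition " + metadata.partition()
                        + " at offset " + metadata.offset());
            } else {
                exception.printStackTrace();
            }
        });

        producer.flush();
        producer.close();
    }
}
```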