# Apache Kafka
Powering Real-Time Data at Scale

![Apache Kafka](images/logo.jpg)

## Introduction
This workshop aims to provide a practical understanding of Apache Kafka's core concepts and hands-on experience with its functionality.

### Objectives
- Understand the key concepts of Event Streaming and Apache Kafka.
- Learn how to produce and consume messages.
- Explore advanced features like partitions and fault tolerance.
- Conclude with a quiz to summarize and reinforce the key points.

[Apache Kafka Official Documentation](https://kafka.apache.org/documentation/)


# Introduction to Event Streaming

Event streaming is a technology paradigm for continuously capturing, storing, processing, and reacting to events happening across your business or applications in real time. It fundamentally changes how data is handled, moving away from batch processing to a continual flow of data.

Event streaming platforms like Apache Kafka enable the collection, integration, and analysis of massive streams of event data from multiple sources.

![Event Streaming](images/event.svg)

## The Relevance of Event Streaming

Event streaming has become vital in today’s digital world, where real-time data and insights are crucial for decision making. Applications range from real-time analytics and monitoring to data integration and microservices communication.

### <span style="color:red">Question: Why do you think real-time data processing is important in modern applications?</span>


## What is Apache Kafka?

Apache Kafka is a distributed event streaming platform that provides high-throughput, highly scalable, and fault-tolerant event streaming capabilities. It is designed to handle real-time data feeds and provides a unified platform for both producing and consuming data streams.

### Key Features:
- **High Throughput**: Capable of handling millions of events per second.
- **Scalability**: Easily scales horizontally to accommodate growing data.
- **Fault Tolerance**: Replicates data across brokers so that it survives the failure of individual servers.
- **Real-Time Processing**: Enables immediate data processing and decision making.

![Applications](images/apps.png)

# Companies Using Apache Kafka

Apache Kafka is widely adopted by numerous companies across various industries. Its ability to handle large-scale, real-time data makes it a preferred choice for modern data architectures.

## Some Notable Companies:
- **LinkedIn**: Originally developed Kafka to handle their activity stream and operational metrics.
- **Netflix**: Utilizes Kafka for real-time monitoring and event processing in their streaming service.
- **Uber**: Employs Kafka for gathering user, trip, and geospatial data for real-time analytics and decision-making.
- **Twitter**: Uses Kafka as a backbone for their event streaming architecture, handling billions of events each day.

### <span style="color:red">Question: Can you think of a scenario in your industry or field where Kafka's capabilities would be beneficial?</span>


# Transition to Kafka Key Concepts

Now that we understand the importance of event streaming and the role of Apache Kafka in this domain, let's delve into the key concepts that make Kafka a powerful tool for event-driven data processing.


## Overview of Apache Kafka

### Key Concepts

Understanding the fundamental concepts of Apache Kafka is crucial for working with this powerful streaming platform. In this section, we will explore the essential components and their roles in Kafka's architecture.

- **Topics**: Categories where records are stored.
- **Producers**: Entities that publish messages to topics.
- **Consumers**: Entities that subscribe to topics and process messages.
- **Brokers**: Servers in a Kafka cluster that store data and serve clients.
- **Partitions**: Kafka topics are divided into partitions, which allow for data to be distributed and parallelized across multiple brokers.
- **Offsets**: Unique identifiers of records within a partition.

![Kafka Architecture](images/simple.png)


## 1. Topics

- **Definition**: A topic is a category or feed name to which records are published. It is like a channel where data is stored and distributed.
- **Characteristics**:
  - Topics are partitioned for scalability.
  - Data within a topic is append-only: once written, a record is never modified.
- **Use Case**: Different topics for logs, metrics, customer activities, etc.
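
To make the characteristics above concrete, here is a minimal in-memory sketch of a topic as a set of append-only partition logs. It is a toy model, not Kafka's API: the `Topic` class and its `append` and `read` methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topic:
    """Toy model of a topic: a named set of append-only partition logs."""
    name: str
    num_partitions: int
    partitions: List[List[str]] = field(init=False)

    def __post_init__(self) -> None:
        # One independent log per partition.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, partition: int, record: str) -> int:
        """Append a record and return its offset (its position in that log)."""
        log = self.partitions[partition]
        log.append(record)
        return len(log) - 1

    def read(self, partition: int, offset: int) -> str:
        """Read a record by offset; existing records are never modified."""
        return self.partitions[partition][offset]


logs = Topic(name="logs", num_partitions=2)
first = logs.append(0, "service started")
second = logs.append(0, "request handled")
other = logs.append(1, "disk usage 71%")
```

Each partition numbers its records independently, so `first` and `other` both receive the first offset of their own partition.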

### <span style="color:red">Question: What purpose do partitions within a topic serve in Apache Kafka?</span>


## 2. Producers

- **Definition**: Applications or processes that publish data to Kafka topics.
- **How it Works**:
  - Producers send data to topics, optionally choosing the partition.
  - Data can be sent synchronously or asynchronously.
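- **Key Points**: Producers choose each record's key. With key-based partitioning, records that share a key go to the same partition, which keeps related records together and spreads the load across partitions.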
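
The partition choice described above can be sketched in a few lines. This is an illustration under stated assumptions: `choose_partition` is a hypothetical helper, and CRC32 stands in for the hash function (Kafka's Java client uses a murmur2 hash for keyed records).

```python
import zlib
from typing import Optional


def choose_partition(key: bytes, num_partitions: int,
                     partition: Optional[int] = None) -> int:
    """Pick a partition for a record (hypothetical helper, not a Kafka API)."""
    if partition is not None:
        # The producer named a partition explicitly.
        if not 0 <= partition < num_partitions:
            raise ValueError("partition out of range")
        return partition
    # Otherwise derive it from the key: equal keys always map to the
    # same partition. CRC32 is a stand-in for the real hash function.
    return zlib.crc32(key) % num_partitions


same_key_a = choose_partition(b"user-42", num_partitions=3)
same_key_b = choose_partition(b"user-42", num_partitions=3)
explicit = choose_partition(b"user-42", num_partitions=3, partition=2)
```

Because the mapping depends only on the key and the partition count, `same_key_a` and `same_key_b` are always equal, while `explicit` bypasses the hash entirely.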

### <span style="color:red">Question: How do producers influence which partition a message is sent to in Kafka?</span>


## 3. Consumers

- **Definition**: Processes that read data from Kafka topics.
- **Consumption Patterns**:
  - Subscribe to one or more topics and read records in order within each partition (ordering is not guaranteed across partitions).
  - Track which records have been consumed using offsets.
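- **Grouping**: Consumers that belong to the same consumer group divide a topic's partitions among themselves, so each record is processed by only one member of the group.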
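
A simplified sketch of how a consumer group might split partitions among its members. Real Kafka uses configurable assignment strategies; the round-robin `assign_partitions` function below is a hypothetical stand-in that only shows the idea.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int],
                      consumers: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one consumer in the group (round-robin)."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Three partitions shared by two consumers.
two_consumers = assign_partitions([0, 1, 2], ["c1", "c2"])

# Four consumers for three partitions: one consumer gets nothing.
four_consumers = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
```

In the second case `c4` ends up with an empty list: adding more consumers than partitions to a group leaves the extra consumers idle.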

### <span style="color:red">Question: Why is it important for consumers to track offsets in Kafka?</span>


![Topic](images/topic.png)

![Producers](images/producers.png)

## 4. Brokers

- **Definition**: Servers in a Kafka cluster storing data and serving clients.
- **Cluster Role**:
  - Handle load balancing and fault tolerance.
  - Manage requests from producers and serve data to consumers.
- **Replication**: Ensures data availability and durability.
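
To illustrate why replication keeps data available, the sketch below places copies of each partition on several brokers and then removes one broker. The placement scheme and the names `place_replicas` and `surviving_replicas` are assumptions made for this example; Kafka's actual replica placement is more involved.

```python
from typing import Dict, List


def place_replicas(num_partitions: int, brokers: List[str],
                   replication_factor: int) -> Dict[int, List[str]]:
    """Place each partition's replicas on distinct brokers (simplified scheme)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


def surviving_replicas(placement: Dict[int, List[str]],
                       failed_broker: str) -> Dict[int, List[str]]:
    """Return the replicas still available after one broker fails."""
    return {
        p: [b for b in replicas if b != failed_broker]
        for p, replicas in placement.items()
    }


placement = place_replicas(3, ["broker-1", "broker-2", "broker-3"],
                           replication_factor=2)
after_failure = surviving_replicas(placement, "broker-2")
```

With two copies of every partition on different brokers, each partition still has at least one replica after `broker-2` is lost.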

### <span style="color:red">Question: What is the role of a broker in the Kafka architecture?</span>


## 5. Partitions

- **Definition**: Divisions within a Kafka topic.
- **Scaling and Performance**:
  - Enable parallel processing across nodes.
  - Partitions of one topic can be hosted on different brokers, which spreads storage and load. Within a consumer group, each partition is read by only one consumer.

### <span style="color:red">Question: How do partitions contribute to Kafka’s scalability and fault tolerance?</span>

![Partitions](images/partitions.png)

![More Consumers](images/more_consumers.png)

![Impossible](images/impossible.png)

![Idle Consumer](images/idle_consumer.png)

![More Groups](images/more_groups.png)

## 6. Offsets

- **Definition**: Unique identifiers for each record within a partition.
- **Consumer Tracking**:
  - Enable consumers to keep track of consumed messages.
  - Allow resuming reading from the last consumed offset.
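
The resume behaviour above can be sketched with a toy consumer that remembers a committed offset. `OffsetTrackingConsumer` and its `poll` and `commit` methods are hypothetical and only loosely mirror how a real client behaves.

```python
from typing import List


class OffsetTrackingConsumer:
    """Toy consumer that reads one partition log and tracks its offsets."""

    def __init__(self, log: List[str], committed: int = 0) -> None:
        self.log = log
        self.committed = committed   # last durable checkpoint
        self.position = committed    # next offset to read

    def poll(self, max_records: int) -> List[str]:
        """Return up to max_records records starting at the current position."""
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self) -> None:
        """Record that everything before the current position is processed."""
        self.committed = self.position


partition_log = ["e0", "e1", "e2", "e3", "e4"]

consumer = OffsetTrackingConsumer(partition_log)
first_batch = consumer.poll(2)   # reads e0 and e1
consumer.commit()                # checkpoint at offset 2
consumer.poll(1)                 # reads e2, then "crashes" before committing

# A restarted consumer resumes from the last committed offset.
restarted = OffsetTrackingConsumer(partition_log, committed=consumer.committed)
resumed_batch = restarted.poll(10)
```

The restarted consumer picks up at offset 2, so `e2` is delivered a second time. Without a stored offset it would have to start over from the beginning or skip ahead and risk missing records.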

### <span style="color:red">Question: What would happen if Kafka consumers didn't use offsets?</span>
