<a href="https://colab.research.google.com/github/martin-fabbri/colab-notebooks/blob/master/kafka/explore_how_kafka_partitions_work_pynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Explore How Kafka Partitions Work

In [0]:
#@title ## Setup Kafka
#@markdown This cell installs Kafka 2.3.0 (the build for Scala 2.12).
%%bash
sudo apt-get update -qq
# Older releases such as 2.3.0 are served from the Apache archive.
sudo wget https://archive.apache.org/dist/kafka/2.3.0/kafka_2.12-2.3.0.tgz -q
sudo tar -xzf kafka_2.12-2.3.0.tgz
sudo mv kafka_2.12-2.3.0 /opt/kafka

In [0]:
#@title ## Start services
#@markdown Start ``zookeeper on port 2181`` and ``kafka on port 9092`` (the default ports).

%%bash
sudo nohup /opt/kafka/bin/zookeeper-server-start.sh -daemon /opt/kafka/config/zookeeper.properties > /dev/null 2>&1 &
sleep 5
sudo nohup /opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties > /dev/null 2>&1 &
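
The fixed `sleep 5` above assumes ZooKeeper is ready within five seconds. As a minimal sketch of an alternative, the helper below polls until a port accepts TCP connections. The name `wait_for_port` and its timeout values are illustrative assumptions, not part of Kafka or of this notebook.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows that something is listening.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(0.5)
    return False
```

Calling `wait_for_port("localhost", 2181)` before starting the broker, and `wait_for_port("localhost", 9092)` before creating a topic, could stand in for the fixed delay. An open port shows only that the process is listening, not that it has finished initializing.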

Our goal is to review Kafka's architecture and understand how it stores data. 

## Topic Storage

First, let's create a topic:

```shell
kafka-topics --create --topic <topic name> --partitions 1 --replication-factor 1 --zookeeper localhost:2181
```



In [6]:
%%bash
/opt/kafka/bin/kafka-topics.sh --create --topic explore01 --partitions 1 \
  --replication-factor 1 --zookeeper localhost:2181

Created topic explore01.


## Inspecting the Directory Structure
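Now that the topic has been created, let's see how Kafka stores it on disk. With the default `server.properties` used here, the broker writes its data under `/tmp/kafka-logs`; other installations may use a different path, such as `/var/lib/kafka/data`.
```shell
ls -alh <path to kafka data on broker> | grep <topic name>
```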


In [15]:
ls -alh /tmp/kafka-logs/ | grep explore01

drwxr-xr-x 2 root root 4.0K Apr 18 21:46 explore01-0/
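
The listing shows one directory for the topic's single partition, named `<topic>-<partition>`. A small sketch of how such a name can be split back into its parts follows; the function name `split_partition_dir` is hypothetical and chosen only for illustration.

```python
from typing import Tuple


def split_partition_dir(name: str) -> Tuple[str, int]:
    """Split a partition directory name such as 'explore01-0' into (topic, partition)."""
    # Topic names may themselves contain '-', so split on the LAST hyphen.
    topic, sep, partition = name.rstrip("/").rpartition("-")
    if not sep or not topic or not partition.isdigit():
        raise ValueError(f"not a partition directory name: {name!r}")
    return topic, int(partition)


print(split_partition_dir("explore01-0"))
```

Splitting on the last hyphen matters because a topic such as `my-topic` produces directories like `my-topic-0`.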


## What does the output look like?

What kind of data is kept inside of the directory?

```shell
ls -alh /tmp/kafka-logs/explore*
```

If you try to open the file ending in `.log`, is there anything in it?



In [16]:
ls -alh /tmp/kafka-logs/explore*

total 12K
drwxr-xr-x 2 root root 4.0K Apr 18 21:46 ./
drwxr-xr-x 3 root root 4.0K Apr 18 22:02 ../
-rw-r--r-- 1 root root  10M Apr 18 21:46 00000000000000000000.index
-rw-r--r-- 1 root root    0 Apr 18 21:46 00000000000000000000.log
-rw-r--r-- 1 root root  10M Apr 18 21:46 00000000000000000000.timeindex
-rw-r--r-- 1 root root    8 Apr 18 21:46 leader-epoch-checkpoint
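
The three `00000000000000000000.*` files belong to the partition's first log segment. Kafka names segment files after the segment's base offset (the offset of its first message), zero-padded to 20 digits, which is why a new partition starts with all zeros. The sketch below reproduces that naming; the helper `segment_file_names` is an illustrative name, not a Kafka API.

```python
from typing import Dict


def segment_file_names(base_offset: int) -> Dict[str, str]:
    """Build the file names of a log segment from its base offset."""
    if base_offset < 0:
        raise ValueError("base offset must be non-negative")
    stem = f"{base_offset:020d}"  # zero-padded to 20 digits
    return {
        "log": stem + ".log",              # the record batches themselves
        "index": stem + ".index",          # offset -> file position
        "timeindex": stem + ".timeindex",  # timestamp -> offset
    }


print(segment_file_names(0)["log"])
```

The `.log` file is empty because nothing has been produced yet, while the two index files appear as 10M because the broker preallocates them for the active segment.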


In [0]:
cat /tmp/kafka-logs/explore01-0/00000000000000000000.log

In [18]:
ls -l /tmp/kafka-logs/explore01-0/00000000000000000000.log

-rw-r--r-- 1 root root 0 Apr 18 21:46 /tmp/kafka-logs/explore01-0/00000000000000000000.log


## Produce Data
Now that we have this topic, let's produce some data into it.

```shell
kafka-console-producer --topic explore01 --broker-list localhost:9092
```

Produce a couple of messages.


In [22]:
!/opt/kafka/bin/kafka-console-producer.sh --topic explore01 --broker-list localhost:9092

>test 1
>test 2
>test 3
>

Repeat the steps from *Inspecting the Directory Structure* and see how the results have changed.

In [0]:
# Run the command below in a terminal instead, since Jupyter does not
# display binary content well.

# !cat /tmp/kafka-logs/explore01-0/00000000000000000000.log

In [26]:
ls -l /tmp/kafka-logs/explore01-0/00000000000000000000.log

-rw-r--r-- 1 root root 222 Apr 18 22:14 /tmp/kafka-logs/explore01-0/00000000000000000000.log
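
The log file has grown from 0 to 222 bytes. As a rough check, the sketch below estimates the on-disk size under several assumptions that the notebook does not confirm: the messages use the v2 record batch format (Kafka 0.11 and later), each message was written as its own uncompressed batch, and the records carry no key and no headers. All function names are illustrative.

```python
from typing import Optional

RECORD_BATCH_HEADER_BYTES = 61  # fixed-size fields of a v2 record batch


def varint_size(value: int) -> int:
    """Number of bytes in the zigzag varint encoding of a signed integer."""
    zigzag = (value << 1) ^ (value >> 63)
    size = 1
    while zigzag >= 0x80:
        zigzag >>= 7
        size += 1
    return size


def record_size(value: bytes, key: Optional[bytes] = None) -> int:
    """Size of one v2 record with zero timestamp/offset deltas and no headers."""
    body = 1                                  # attributes (int8)
    body += varint_size(0)                    # timestamp delta
    body += varint_size(0)                    # offset delta
    if key is None:
        body += varint_size(-1)               # a null key is encoded as length -1
    else:
        body += varint_size(len(key)) + len(key)
    body += varint_size(len(value)) + len(value)
    body += varint_size(0)                    # header count
    return varint_size(body) + body           # the length prefix comes first


def single_record_batch_size(value: bytes) -> int:
    """Size of a batch that holds exactly one record (an assumption, see above)."""
    return RECORD_BATCH_HEADER_BYTES + record_size(value)


messages = [b"test 1", b"test 2", b"test 3"]
print(sum(single_record_batch_size(m) for m in messages))  # 222
```

Under these assumptions each message occupies 74 bytes, so the three messages add up to the 222 bytes that `ls` reports. Most of that is batch header, which is one reason producers batch records together.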


## Update the Number of Partitions

Let's see what happens if we increase the number of partitions:

```shell
kafka-topics --alter --topic explore01 --partitions 3 --zookeeper localhost:2181
```

Try repeating the steps from the previous section. How many folders do you see now?

In [27]:
%%bash
/opt/kafka/bin/kafka-topics.sh --alter --topic explore01 --partitions 3 \
  --zookeeper localhost:2181

Adding partitions succeeded!
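
Each partition gets its own directory, so the partition count can also be read from the file system. A minimal sketch, assuming the `<topic>-<partition>` directory naming seen earlier; `partition_ids` is a hypothetical helper, not a Kafka tool.

```python
from pathlib import Path
from typing import List


def partition_ids(log_dir: str, topic: str) -> List[int]:
    """Return the sorted partition numbers that have a directory under log_dir."""
    ids = []
    for entry in Path(log_dir).iterdir():
        # Split on the last hyphen so that topic names containing '-' still work.
        name, sep, suffix = entry.name.rpartition("-")
        if entry.is_dir() and sep and name == topic and suffix.isdigit():
            ids.append(int(suffix))
    return sorted(ids)
```

On this broker, `partition_ids("/tmp/kafka-logs", "explore01")` would be expected to list three partitions after the alter command. Existing messages are not redistributed: they stay in partition 0, and the new partitions start empty.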


In [28]:
ls -alh /tmp/kafka-logs/explore*

/tmp/kafka-logs/explore01-0:
total 16K
drwxr-xr-x 2 root root 4.0K Apr 18 21:46 ./
drwxr-xr-x 5 root root 4.0K Apr 18 22:22 ../
-rw-r--r-- 1 root root  10M Apr 18 21:46 00000000000000000000.index
-rw-r--r-- 1 root root  222 Apr 18 22:14 00000000000000000000.log
-rw-r--r-- 1 root root  10M Apr 18 21:46 00000000000000000000.timeindex
-rw-r--r-- 1 root root    8 Apr 18 21:46 leader-epoch-checkpoint

/tmp/kafka-logs/explore01-1:
total 12K
drwxr-xr-x 2 root root 4.0K Apr 18 22:21 ./
drwxr-xr-x 5 root root 4.0K Apr 18 22:22 ../
-rw-r--r-- 1 root root  10M Apr 18 22:21 00000000000000000000.index
-rw-r--r-- 1 root root    0 Apr 18 22:21 00000000000000000000.log
-rw-r--r-- 1 root root  10M Apr 18 22:21 00000000000000000000.timeindex
-rw-r--r-- 1 root root    8 Apr 18 22:21 leader-epoch-checkpoint

/tmp/kafka-logs/explore01-2:
total 12K
drwxr-xr-x 2 root root 4.0K Apr 18 22:21 ./
drwxr-xr-x 5 root root 4.0K Apr 18 22:22 [01;34m..[