# Kafka Topic Creation & Supply Chain Producer Simulation Tutorial

## Introduction

Kafka is a distributed event streaming platform capable of handling real-time data feeds. In this tutorial, we use Python and the kafka-python library to create Kafka topics, verify that they exist, and produce simulated supply chain messages to them.

# 1. Install Required Packages
---

To interact with Kafka in Python, install the kafka-python package:
```cmd
pip install kafka-python
```
Before proceeding, make sure you have a running Amazon MSK cluster (or any other Kafka cluster) and its plaintext **broker server address**. The examples below connect on port 9092 without TLS or authentication.
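
As a quick pre-flight check, the sketch below splits a comma-separated broker string into host and port pairs and tries to open a TCP connection to each one. The helper names `parse_bootstrap_servers` and `broker_reachable`, and the `example.com` addresses, are hypothetical and not part of kafka-python. A successful TCP connection only shows that the port is reachable; it does not prove that the cluster is healthy.

```python
import socket


def parse_bootstrap_servers(servers):
    """Split a 'host1:port1,host2:port2' string into (host, port) tuples."""
    pairs = []
    for entry in servers.split(","):
        # rpartition splits on the last colon, so the port is always the tail
        host, _, port = entry.strip().rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def broker_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False


# Hypothetical addresses; replace them with your own broker endpoints.
EXAMPLE_BROKERS = "broker-1.example.com:9092,broker-2.example.com:9092"
print(parse_bootstrap_servers(EXAMPLE_BROKERS))
```

To test real brokers, loop over the parsed pairs and call `broker_reachable(host, port)` for each one from a machine that has network access to the cluster.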

# 2. Create a Kafka Topic Using Python 
---

Below is a Python script that creates topics in Kafka using KafkaAdminClient. It connects to the Kafka brokers, defines five topics with the desired partition count and replication factor, and creates them through Kafka's administrative client. The script does not verify the result itself: section 3 lists the topics to confirm that they exist, and the admin client stays open because later cells reuse it.

In [None]:
from kafka.admin import KafkaAdminClient, NewTopic

# Define Kafka broker address
KAFKA_BROKER = "b-1.msksilveraiwolfuseast.nqmaxv.c11.kafka.us-east-1.amazonaws.com:9092,b-2.msksilveraiwolfuseast.nqmaxv.c11.kafka.us-east-1.amazonaws.com:9092"  # <- Update with your broker address

# Initialize Kafka Admin Client
admin_client = KafkaAdminClient(
    bootstrap_servers=KAFKA_BROKER,
    client_id="admin-client"
)

# Settings shared by every topic
num_partitions = 1  # Number of partitions
replication_factor = 1  # Adjust based on your cluster setup

# Define multiple topics
topics = [
    NewTopic(name="shipment-contents", num_partitions=num_partitions, replication_factor=replication_factor),
    NewTopic(name="orders-raw", num_partitions=num_partitions, replication_factor=replication_factor),
    NewTopic(name="inventory-updates", num_partitions=num_partitions, replication_factor=replication_factor),
    NewTopic(name="shipment-status", num_partitions=num_partitions, replication_factor=replication_factor),
    NewTopic(name="customer-feedback", num_partitions=num_partitions, replication_factor=replication_factor)
]

# Create the topics on the cluster
admin_client.create_topics(new_topics=topics, validate_only=False)

print(f"Topics '{', '.join([x.name for x in topics])}' created successfully!")
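
Re-running the cell above against a cluster that already has some of these topics may fail. One possible way to make the step repeatable is sketched below: it lists the existing topics first and creates only the missing ones. The helper name `create_missing_topics` is hypothetical. It assumes `admin` behaves like KafkaAdminClient (`list_topics` and `create_topics`, as used in this notebook) and that each topic object has a `name` attribute, as `NewTopic` does.

```python
def create_missing_topics(admin, new_topics):
    """Create only the topics whose names the cluster does not already list.

    `admin` is assumed to expose list_topics() and create_topics() like
    kafka-python's KafkaAdminClient; `new_topics` are NewTopic-like objects
    with a `name` attribute. Returns the names that were created.
    """
    existing = set(admin.list_topics())
    missing = [topic for topic in new_topics if topic.name not in existing]
    if missing:
        admin.create_topics(new_topics=missing, validate_only=False)
    return [topic.name for topic in missing]
```

With the objects defined above, the call would be `create_missing_topics(admin_client, topics)`. The check and the creation are two separate requests, so this is not safe against another client creating the same topic in between.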

# 3. List and Verify Topics
---
To list all available Kafka topics using Python, we use KafkaAdminClient, which connects to the Kafka broker and retrieves the list of existing topics. This is useful for verifying that a topic was successfully created or checking the available topics in a Kafka cluster before performing further operations.

In [None]:
# List topics
topics = admin_client.list_topics()
print("Available topics:", [x for x in topics if not x.startswith("__")])
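
Printing the list leaves the comparison to the reader. A minimal sketch of an explicit check follows: it reports which of the five tutorial topics are absent from a listing. The helper name `find_missing_topics` and the sample listing are hypothetical; in the notebook, the listing would come from `admin_client.list_topics()`.

```python
# The five topic names created earlier in this tutorial
EXPECTED_TOPICS = {
    "shipment-contents",
    "orders-raw",
    "inventory-updates",
    "shipment-status",
    "customer-feedback",
}


def find_missing_topics(listed_topics, expected=EXPECTED_TOPICS):
    """Return the expected topic names that are absent from listed_topics, sorted."""
    return sorted(set(expected) - set(listed_topics))


# Sample input standing in for the result of admin_client.list_topics()
sample_listing = ["__consumer_offsets", "orders-raw", "shipment-status"]
print(find_missing_topics(sample_listing))
```

An empty result means every expected topic is present.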

# 4. Producing Messages to Kafka
---

To send messages to Kafka, we can use KafkaProducer, which allows us to publish messages to a topic in a structured format.
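
Kafka itself stores only bytes, so "structured format" here means that the producer serializes each key and value before sending. The sketch below mirrors the serializers the producer in this section is configured with (UTF-8 for keys, JSON encoded as UTF-8 for values) as plain functions. `deserialize_value` is a hypothetical helper showing what a consumer would need to do to read the value back, and the sample record is made up.

```python
import json


def serialize_key(key):
    """Encode a string key as UTF-8 bytes; empty or missing keys become None."""
    return key.encode("utf-8") if key else None


def serialize_value(value):
    """Encode a JSON-compatible object as UTF-8 bytes."""
    return json.dumps(value).encode("utf-8")


def deserialize_value(raw):
    """Inverse of serialize_value, as a consumer would apply it."""
    return json.loads(raw.decode("utf-8"))


# Made-up record used only to show the round trip
sample = {"order_id": "example-order", "quantity": 3}
wire_bytes = serialize_value(sample)
print(wire_bytes)
print(deserialize_value(wire_bytes) == sample)
```

Values that `json.dumps` cannot handle, such as raw `datetime` objects, must be converted first, which is why the message generators call `isoformat()` on timestamps.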

The following Python script demonstrates how to set up a Kafka producer and generate sample supply chain messages:


In [None]:
from kafka import KafkaProducer
import json
import time
import random
from datetime import datetime, timezone
from uuid import uuid4

# Setting up the Kafka Producer
producer = KafkaProducer(
    bootstrap_servers=KAFKA_BROKER,
    key_serializer=lambda k: k.encode('utf-8') if k else None,
    value_serializer=lambda value: json.dumps(value).encode("utf-8"),
)

# Function to generate raw order messages (the order_id argument is unused; each order gets a fresh UUID)
def generate_order_message(order_id):
    return {
        "order_id": str(uuid4()),
        "product_id": f"P-{random.randint(1000, 1005)}",
        "quantity": random.randint(1, 100),
        "event_timestamp": datetime.now(timezone.utc).isoformat()
    }

# Function to generate inventory updates
def generate_inventory_update():
    return {
        "warehouse_location": f"WH-{random.randint(1, 3)}",
        "product_id": f"P-{random.randint(1000, 1005)}",
        "stock_level": random.randint(0, 500),
        "event_timestamp": datetime.now(timezone.utc).isoformat()
    }

# Function to generate shipment contents for an existing order
def generate_shipment_contents(chosen_order):
    return {
        "shipment_id": f"S-{random.randint(10000, 10005)}",
        "order_id": chosen_order.get("order_id"),
        "product_id": chosen_order.get("product_id"),
        "quantity": random.randint(1, 10),
        "event_timestamp": datetime.now(timezone.utc).isoformat()
    }

# Function to generate shipment status updates
def generate_shipment_status(chosen_order):
    return {
        "shipment_id": f"S-{random.randint(10000, 10005)}",
        "order_id": chosen_order.get("order_id"),
        "current_status": random.choice(["in_transit", "delivered", "out_for_delivery", "delayed"]),
        "event_timestamp": datetime.now(timezone.utc).isoformat()
    }

# Customer feedback mapping
rating_mapping = {
    "1": "Poor",
    "2": "Below Average",
    "3": "Average",
    "4": "Above Average",
    "5": "Excellent"
}

# Function to generate customer feedback
def generate_customer_feedback(chosen_order):
    rating_number = random.randint(1, 5)
    return {
        "order_id": chosen_order.get("order_id"),
        "customer_id": f"C-{random.randint(1000, 2000)}",
        "rating": rating_number,
        "comment": rating_mapping.get(str(rating_number)),
        "event_timestamp": datetime.now(timezone.utc).isoformat()
    }

# Sending messages to different Kafka topics
existing_orders = []
for i in range(20):

    # Generating the order 
    orders_raw = generate_order_message(i)
    existing_orders.append(orders_raw)
    producer.send("orders-raw", key=str(i), value=orders_raw)

    # Updating the inventory
    producer.send("inventory-updates", key=str(i), value=generate_inventory_update())

    # Picking an existing order and generating its shipment contents
    chosen_order = random.choice(existing_orders)
    producer.send("shipment-contents", key=str(i), value=generate_shipment_contents(chosen_order))

    # Generating shipment status from existing orders
    producer.send("shipment-status", key=str(i), value=generate_shipment_status(chosen_order))

    # Generating customer feedback from existing orders
    producer.send("customer-feedback", key=str(i), value=generate_customer_feedback(chosen_order))
    print(f"Sent messages for iteration {i}")
    # time.sleep(random.randint(1, 5))  # Uncomment to pace the messages

producer.flush()
producer.close()
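
The loop above sends asynchronously and relies on `flush()` to push everything out, so a failed delivery is not reported per message. A minimal sketch of a blocking alternative follows. The helper name `send_and_confirm` is hypothetical; it assumes `producer.send()` returns a future whose `get(timeout=...)` yields record metadata with `partition` and `offset` fields, as kafka-python's KafkaProducer does.

```python
def send_and_confirm(producer, topic, key, value, timeout=10):
    """Send one record and block until the broker acknowledges it.

    Returns (partition, offset) from the record metadata. If delivery
    fails or the timeout expires, future.get() raises the error.
    """
    future = producer.send(topic, key=key, value=value)
    metadata = future.get(timeout=timeout)
    return metadata.partition, metadata.offset
```

It would replace a `producer.send(...)` call inside the loop, for example `send_and_confirm(producer, "orders-raw", str(i), orders_raw)`, and must run before `producer.close()`. Waiting on every record lowers throughput, so this suits debugging more than bulk sends.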

> **CONTINUE TO THE NEXT NOTEBOOKS**
> - **t2a**: Describes how to stream using Spark
> - **t2b**: Describes how to stream using Delta Live Tables

# 5. Clean Up
---

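> Run this cell to delete the topics if needed. It deletes every non-internal topic on the cluster, not only the five created in this tutorial.

* Delete the MSK cluster when you are done!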

In [None]:
import threading

def input_with_timeout(prompt, timeout):
    def inner_input():
        nonlocal user_input
        user_input = input(prompt)
    
    user_input = None
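    # Daemon thread: a still-pending input() will not keep the interpreter alive after a timeout
    thread = threading.Thread(target=inner_input, daemon=True)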
    thread.start()
    thread.join(timeout)
    
    if thread.is_alive():
        print("\nInput timed out. Proceeding without cleanup.")
        return "n"
    return user_input

clean = input_with_timeout("Do you want to clean up the topics? (y/n): ", 10)
if (clean or "n").strip().lower() == "y":

    # List all topics
    topics = admin_client.list_topics()

    # Filter system topics
    topics = [topic for topic in topics if not topic.startswith("__")]
    print("Topics to delete:", topics)

    # Delete the topics
    admin_client.delete_topics(topics=topics)
    print(f"Topics '{', '.join(topics)}' deleted successfully!")

    # Close client
    admin_client.close()
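
On a shared cluster, deleting every non-internal topic may remove topics that belong to other workloads. A more conservative selection is sketched below: it keeps only the names this tutorial created. The helper name `select_topics_to_delete` and the sample listing are hypothetical; in the notebook, the listing would come from `admin_client.list_topics()`.

```python
# The five topic names created earlier in this tutorial
TUTORIAL_TOPICS = {
    "shipment-contents",
    "orders-raw",
    "inventory-updates",
    "shipment-status",
    "customer-feedback",
}


def select_topics_to_delete(listed_topics, owned=TUTORIAL_TOPICS):
    """Return, sorted, the listed topics that belong to this tutorial."""
    return sorted(
        topic for topic in listed_topics
        if topic in owned and not topic.startswith("__")
    )


# Sample input standing in for the result of admin_client.list_topics()
sample_listing = ["__consumer_offsets", "orders-raw", "another-team-topic", "customer-feedback"]
print(select_topics_to_delete(sample_listing))
```

The result can be passed to `admin_client.delete_topics(topics=...)` in place of the unfiltered list, skipping the call when the result is empty.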

# END OF NOTEBOOK