<div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 20px;">
    <div style="flex: 0 0 auto; margin-left: 0; margin-bottom: 0;">
        <img src="https://scidx.sci.utah.edu/wp-content/uploads/2024/12/logo-sm.png" alt="scidx Logo"/>
    </div>
<!--    <div style="flex: 0 0 auto; margin-left: auto; margin-bottom: 0;">
        <img src="https://nairrpilot.org/app/site/media/ndp.jpg" alt="NDP Logo" width="200"/>
    </div> -->
</div>

# SciDX Streaming Capabilities Demonstration: Kafka Stream 

This demonstration showcases the **SciDX Streaming capabilities**, leveraging both the **SciDX POP Library** for managing data objects and the **Streaming Library** for real-time data streaming and processing of a Kafka Stream.


In [None]:
from scidx_streaming import StreamingClient
from pointofpresence import APIClient

Here, we:
1. Initialize the `APIClient` to handle data registration and discovery.
2. Initialize the `StreamingClient` to handle real-time data streams.

In [None]:
# Get the token from https://token.ndp.utah.edu
# Do not commit a real token to a shared notebook; paste your own token here.
API_URL = "155.101.6.191:8003"
TOKEN = "<paste-your-token-here>"

# Initialize the POP client for data registration and discovery
client = APIClient(base_url=API_URL, token=TOKEN)

# Initialize the Streaming client for real-time data streaming
streaming = StreamingClient(client)
print(f"Streaming Client initialized. User ID: {streaming.user_id}")
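Tokens issued by the token service expire, and an expired token is a common reason for the clients above to fail. The sketch below assumes the token is a JWT (three dot-separated base64url segments) and reads its `exp` claim without verifying the signature. The helper names `jwt_expiry` and `is_expired` are illustrative and are not part of the SciDX libraries.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[int]:
    """Return the `exp` claim (Unix seconds) of a JWT, or None if it has none.

    The payload is only decoded, not verified, so this is a convenience
    check and not a security measure.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated segments")
    payload_b64 = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """Return True when the token's `exp` claim is at or before `now`."""
    exp = jwt_expiry(token)
    if exp is None:
        return False  # no expiry claim, so there is nothing to compare
    current = time.time() if now is None else now
    return current >= exp
```

Calling `is_expired(TOKEN)` before creating the `APIClient` gives an early hint that a fresh token is needed.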

## Basic Use

### 1. Register a Kafka Stream

In this step, we use the **POP client** and the stream's metadata to register an online **Kafka Stream** in our POP.

Kafka streams allow real-time data ingestion from a Kafka topic. To register a Kafka-based data stream, we call `client.register_kafka_topic(metadata)`, where `metadata` is a dictionary describing the dataset and the Kafka connection (topic, host, and port).



In [None]:
# Define the payload data for the Kafka topic registration
kafka_stream_metadata = {
  "dataset_name": "timestamp-example",
  "dataset_title": "timestamp-example",
  "owner_org": "saleem_test",
  "kafka_topic": "timestamp-topic-1",
  "kafka_host": "155.101.6.191",
  "kafka_port": "9092"
}

# Call the register_kafka_topic method to add the Kafka topic
try:
    response = client.register_kafka_topic(kafka_stream_metadata)
    print("Kafka topic registered successfully with ID:", response["id"])
except ValueError as e:
    print("Failed to register Kafka topic.")
    print(f"{e}.")
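Because registration depends on the metadata being complete, a small pre-check can catch a missing field before the request is sent. The sketch below assumes that the required fields are exactly the six keys used in the example payload; the actual API may require more or fewer. `REQUIRED_KAFKA_FIELDS` and `missing_kafka_fields` are hypothetical names.

```python
# Assumption: these are the keys from the example payload in this notebook,
# not an authoritative list of what the POP API requires.
REQUIRED_KAFKA_FIELDS = (
    "dataset_name",
    "dataset_title",
    "owner_org",
    "kafka_topic",
    "kafka_host",
    "kafka_port",
)


def missing_kafka_fields(metadata: dict) -> list:
    """Return the required keys that are absent, None, or blank."""
    missing = []
    for key in REQUIRED_KAFKA_FIELDS:
        value = metadata.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing
```

For example, `missing_kafka_fields(kafka_stream_metadata)` returns an empty list for the payload defined above, so the registration call can proceed.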

### 2. Search for the registered entry

This ensures the dataset is discoverable for use by the Data Consumers.

In [None]:
# Search for the registered Kafka data stream
import json

search_results = client.search_datasets("timestamp-example", server="local")
print(f"Number of datasets found: {len(search_results)}")
print("Search result:\n" + json.dumps(search_results, indent=4))


### 3. Create a Data Stream from the registered entry

In [None]:
# Create a Kafka data stream from the registered dataset
stream = await streaming.create_kafka_stream(
    keywords=["timestamp-example"],
    match_all=True
)

# Retrieve the stream's topic name
topic = stream.data_stream_id
print(f"Stream created: {topic}")

### 4. Consume the Streamed Data 

In [None]:
# Start consuming the filtered Kafka stream
consumer = streaming.consume_kafka_messages(topic)

In [None]:
# After a few seconds, inspect the messages consumed so far
consumer.dataframe
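Rather than waiting an arbitrary number of seconds, the notebook could poll until data has arrived. The following is a minimal sketch of a generic polling helper; `wait_until` is a hypothetical name, and the usage in the comment assumes that `consumer.dataframe` supports `len()` (as a pandas DataFrame does).

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout: float = 30.0,
               interval: float = 0.5) -> bool:
    """Call `predicate` repeatedly until it returns True or `timeout` elapses.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage in this notebook (assumes a DataFrame-like attribute):
# if wait_until(lambda: len(consumer.dataframe) > 0, timeout=60):
#     display(consumer.dataframe)
```

A monotonic clock is used so that system clock adjustments do not shorten or extend the wait.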

### 5. Stop Data Consumption and Clean Up

To wrap up, we will: 
1. Stop the data consumer to halt data processing.
2. Delete the created stream from the Kafka topic using the Streaming client.
3. Remove the registered dataset using the POP client.

This ensures all resources and background tasks are properly released.

In [None]:
# Stop the Kafka consumer
consumer.stop()

# Delete the Kafka stream - this will cause error
await streaming.delete_stream(stream)

# Delete the registered dataset from the POP system
client.delete_resource_by_id(search_results[0]["id"])
print("Cleanup completed: Stream and registered dataset deleted.")
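If one cleanup step raises, the steps after it never run and resources may be left behind. The sketch below shows one way to run every step and collect the failures. `run_cleanup` is a hypothetical helper, not part of either library; it accepts both plain and awaitable results so that `streaming.delete_stream(stream)` can be included.

```python
import asyncio
import inspect


async def run_cleanup(steps):
    """Run every (label, callable) step, collecting failures instead of stopping.

    Returns a list of (label, exception) pairs for the steps that failed.
    """
    failures = []
    for label, action in steps:
        try:
            result = action()
            # Await coroutines and other awaitables; plain results are ignored.
            if inspect.isawaitable(result):
                await result
        except Exception as exc:
            failures.append((label, exc))
    return failures


# Hypothetical usage in this notebook (Jupyter allows top-level await):
# failures = await run_cleanup([
#     ("stop consumer", consumer.stop),
#     ("delete stream", lambda: streaming.delete_stream(stream)),
#     ("delete dataset",
#      lambda: client.delete_resource_by_id(search_results[0]["id"])),
# ])
# for label, exc in failures:
#     print(f"Cleanup step '{label}' failed: {exc}")
```

Wrapping each call in a `lambda` defers it until the helper runs, so an exception in one step is caught and recorded rather than aborting the cell.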

<br>

## **Other Resources**

This section shows how to create a Kafka topic on our running Kafka server.

### 1. Get Kafka Server Details

The POP API stores the connection information for the Kafka server. In the following step, we use `client.get_kafka_details()` to retrieve the details of the running Kafka server, such as host and port, which are needed to register and stream data.

In [None]:
# Run the following code cell to get Kafka connection details
try:
    response = client.get_kafka_details()
    print("Kafka details:", response)
    # Extract the connection details only when the request succeeded
    kafka_host = response["kafka_host"]
    kafka_port = response["kafka_port"]
except ValueError as e:
    print("Failed to get the Kafka details.")
    print(e)

### 2. Initialize Kafka Client

In [None]:
from kafka.admin import KafkaAdminClient, NewTopic

# Define Kafka broker
bootstrap_servers = f'{kafka_host}:{kafka_port}' 
print(f"Kafka bootstrap server: {bootstrap_servers}")

# Create an admin client
kafka_client = KafkaAdminClient(
    bootstrap_servers=bootstrap_servers,
)
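The bootstrap string is built from values returned by the API, so a quick sanity check can surface a malformed host or port before the admin client tries to connect. This is a minimal sketch; `bootstrap_server` is a hypothetical helper, and only the `kafka_host` and `kafka_port` keys used earlier in this notebook are assumed.

```python
def bootstrap_server(details: dict) -> str:
    """Build a 'host:port' string from a Kafka details dictionary.

    Raises ValueError when the host is blank or the port is out of range.
    """
    host = str(details["kafka_host"]).strip()
    port = int(details["kafka_port"])  # accepts "9092" as well as 9092
    if not host:
        raise ValueError("kafka_host is empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"kafka_port out of range: {port}")
    return f"{host}:{port}"
```

With the response from Step 1, `bootstrap_server(response)` yields the same string as the f-string above, with validation added.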

### 3. Create a new Kafka topic

**Skip the following code cell if the topic `timestamp-topic-1` already exists. You can verify this by running Step 4 (List all Topics).**

In [None]:
# Define the topic name to be used by the streaming client
streaming_topic = 'timestamp-topic-1'

# Define the new topic (named new_topic so it does not overwrite the stream ID stored in `topic`)
new_topic = NewTopic(name=streaming_topic, num_partitions=1, replication_factor=1)

# Attempt to create the topic; this raises an error if the topic already exists
try:
    kafka_client.create_topics([new_topic])
    print(f"Topic '{streaming_topic}' created successfully.")
except Exception as e:
    print(f"Failed to create topic: {e}")
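Instead of skipping the cell by hand, the existence check and the creation can be combined. The sketch below is duck-typed: it only assumes an admin object with the `list_topics()` and `create_topics()` methods used in this notebook. `ensure_topic` and its `make_topic` parameter are hypothetical names.

```python
def ensure_topic(admin, name: str, make_topic) -> bool:
    """Create topic `name` unless the admin client already lists it.

    `make_topic` is a callable that builds the topic definition from the
    name. Returns True if the topic was created, False if it existed.
    Note: the check and the creation are not atomic, so another client
    could still create the topic in between; keep a try/except around
    the call when that matters.
    """
    if name in admin.list_topics():
        return False
    admin.create_topics([make_topic(name)])
    return True


# Hypothetical usage with the objects defined in this notebook:
# created = ensure_topic(
#     kafka_client,
#     streaming_topic,
#     lambda n: NewTopic(name=n, num_partitions=1, replication_factor=1),
# )
```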

### 4. List all Topics

In [None]:
# List existing topics in the Kafka cluster
try:
    topics = kafka_client.list_topics()
    print("Topics in the Kafka cluster:")
    for topic_name in topics:
        print(f" - {topic_name}")
except Exception as e:
    print(f"Failed to list topics: {e}")

### 5. Delete Kafka Topic

In [None]:
# Delete the topic if it exists
try:
    kafka_client.delete_topics([streaming_topic])
    print(f"Topic '{streaming_topic}' deletion requested.")
except Exception as e:
    print(f"Failed to delete topic: {e}")