## **PubSub**

Description: This notebook will teach you the basics of working with PubSub in Google Cloud Platform (GCP):

- What is PubSub?
- Creating topics and subscriptions programmatically.
- Publishing and consuming messages.
- Advanced topics:
   - Dead Letter Queues (DLQs)
   - Schema setup for PubSub
   - Message ordering
   - Push vs. Pull subscriptions


EDEM. Master Big Data & Cloud 2024/2025<br>
Professor: Javi Briones

In [None]:
!pip install google-cloud-pubsub --quiet

In [None]:
# Import libraries
from google.cloud import pubsub_v1
from google.pubsub_v1.types import Encoding
import logging
import json

In [None]:
# Set logs
logging.basicConfig(level=logging.INFO)

### **What is PubSub?**

Google Cloud Pub/Sub is a fully managed messaging service for **real-time event streaming**. It decouples the applications that produce events from the applications that consume them:

- **Publishers** send messages to **topics**.
- **Subscribers** receive messages through **subscriptions**.

#### **Key Concepts**

- **Topic:** A named resource that serves as the entry point for messages.
- **Subscription:** A named resource representing the subscriber’s interest in a topic.
- **Publisher:** The application that sends messages to the topic.
- **Subscriber:** The application that receives messages from a subscription.
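
The relationship between these four concepts can be sketched with a toy in-memory model. This is only an illustration, not how Pub/Sub is implemented: the `ToyTopic` class and its methods are hypothetical names invented for this example. It shows that every subscription attached to a topic gets its own copy of each message, and that a subscription only receives messages published after it was created.

```python
from collections import deque


class ToyTopic:
    """Hypothetical in-memory stand-in for a topic (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = {}  # subscription name -> queue of pending messages

    def subscribe(self, subscription_name):
        # Each subscription keeps its own independent backlog.
        self.subscriptions[subscription_name] = deque()

    def publish(self, data):
        # Fan out: every existing subscription receives a copy of the message.
        for backlog in self.subscriptions.values():
            backlog.append(data)

    def pull(self, subscription_name):
        # Return and remove the oldest pending message, or None if empty.
        backlog = self.subscriptions[subscription_name]
        return backlog.popleft() if backlog else None


topic = ToyTopic("orders")
topic.publish(b"lost")          # no subscription yet, so nobody receives it
topic.subscribe("billing")
topic.subscribe("analytics")
topic.publish(b"order-1")

billing_msg = topic.pull("billing")
analytics_msg = topic.pull("analytics")
print(billing_msg, analytics_msg)
```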



### **Creating topics and subscriptions**

In [None]:
# Variables
project_id = "your-gcp-project-id"
topic_id = "your-gcp-pubsub-topic-id"
subscription_id = "your-gcp-pubsub-subscription-id"
ordered_topic_id = "your-gcp-pubsub-ordered-topic-id"
ordered_subscription_id = "your-gcp-pubsub-ordered-subscription-id"
dlq_topic_id = "your-gcp-pubsub-dlq-topic-id"
dlq_subscription_id = "your-gcp-pubsub-dlq-subscription-id"

# PubSub Client
publisher = pubsub_v1.PublisherClient()
def get_subscriber():
    return pubsub_v1.SubscriberClient()
schema_client = pubsub_v1.SchemaServiceClient()

# Full paths
topic_path = publisher.topic_path(project_id, topic_id)
subscription_path = get_subscriber().subscription_path(project_id, subscription_id)
ordered_topic_path = publisher.topic_path(project_id, ordered_topic_id)
ordered_subscription_path = get_subscriber().subscription_path(project_id, ordered_subscription_id)
dlq_topic_path = publisher.topic_path(project_id, dlq_topic_id)
dlq_subscription_path = get_subscriber().subscription_path(project_id, dlq_subscription_id)

In [None]:
"""
Exercise 01: Create a PubSub Topic using the Python Client Library.
"""

logging.info(f"Creating topic: {topic_id}")

try:
    publisher.create_topic(name=topic_path)
    logging.info(f"Topic created: {topic_path}")

except Exception as err:
    logging.error(f"Topic may already exist: {err}")

In [None]:
"""
Exercise 02: Create a PubSub Subscription using the Python Client Library.
"""

logging.info(f"Creating subscription: {subscription_id}")

try:
    get_subscriber().create_subscription(name=subscription_path, topic=topic_path)
    logging.info(f"Subscription created: {subscription_path}")

except Exception as err:
    logging.error(f"Subscription may already exist: {err}")

In [None]:
"""
Exercise 03: Publish messages to a PubSub Topic using the Python Client Library.
"""

def publish_messages(publisher, topic_path):

    for i in range(5):
        
        payload = json.dumps({'id':i})
        msg = payload.encode("utf-8")
        future = publisher.publish(topic_path, msg)

        logging.info(f"Published {msg} with message ID {future.result()}")

publish_messages(publisher, topic_path)
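
Calling `future.result()` inside the loop blocks on every message before the next one is sent. A possible variation, sketched below, publishes all messages first and waits for the message IDs afterwards, which lets the client batch requests. The helper name `publish_all` and the `timeout` default are assumptions made for this example; it expects any object with a `publish(topic, data)` method that returns a future, as `pubsub_v1.PublisherClient` does. A small stub publisher is used here so that the sketch runs without GCP credentials.

```python
import json
from concurrent.futures import Future


def publish_all(publisher, topic_path, payloads, timeout=30):
    """Publish every payload, then wait for all message IDs (hypothetical helper)."""
    futures = [
        publisher.publish(topic_path, json.dumps(payload).encode("utf-8"))
        for payload in payloads
    ]
    # Block only after everything has been handed to the client.
    return [future.result(timeout=timeout) for future in futures]


class StubPublisher:
    """Stand-in for pubsub_v1.PublisherClient, used only to run the sketch offline."""

    def __init__(self):
        self.sent = []

    def publish(self, topic_path, data):
        self.sent.append((topic_path, data))
        future = Future()
        future.set_result(str(len(self.sent)))  # fake message ID
        return future


stub = StubPublisher()
message_ids = publish_all(stub, "projects/demo/topics/demo", [{"id": i} for i in range(3)])
print(message_ids)
```

With the notebook's real client, the equivalent call would be `publish_all(publisher, topic_path, [{'id': i} for i in range(5)])`.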

In [None]:
"""
Exercise 04.A: Pull messages from a PubSub Subscription using the Python Client Library.
"""

def pull_messages(subscriber, subscription_path):

    with subscriber:

        response = subscriber.pull(
            request={"subscription": subscription_path, "max_messages": 5}
        )

        for msg in response.received_messages:

            logging.info(f"Received: {msg.message.data.decode('utf-8')}")

            subscriber.acknowledge(
                request={"subscription": subscription_path, "ack_ids": [msg.ack_id]}
            )

pull_messages(get_subscriber(), subscription_path)

In [None]:
"""
Exercise 04.B: Acknowledge messages in PubSub using the Python Client Library.
"""

def pull_messages(subscriber, subscription_path):

    with subscriber:

        response = subscriber.pull(
            request={"subscription": subscription_path, "max_messages": 5}
        )

        for msg in response.received_messages:

            logging.info(f"Received: {msg.message.data.decode('utf-8')}")

            # subscriber.acknowledge(
            #     request={"subscription": subscription_path, "ack_ids": [msg.ack_id]}
            # )

pull_messages(get_subscriber(), subscription_path)
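
With the acknowledge call commented out, Pub/Sub considers the messages unprocessed and redelivers them once the acknowledgement deadline expires. The toy model below sketches that behavior in memory. `ToySubscription`, its methods, and the manual `expire_deadlines()` call that replaces a real timer are all simplifications invented for this example.

```python
import itertools


class ToySubscription:
    """Hypothetical model of at-least-once delivery (illustration only)."""

    def __init__(self, messages):
        self.backlog = list(messages)   # messages waiting to be delivered
        self.outstanding = {}           # ack_id -> delivered but unacknowledged message
        self._ack_ids = itertools.count(1)

    def pull(self, max_messages):
        batch, self.backlog = self.backlog[:max_messages], self.backlog[max_messages:]
        received = []
        for message in batch:
            ack_id = next(self._ack_ids)
            self.outstanding[ack_id] = message
            received.append((ack_id, message))
        return received

    def acknowledge(self, ack_id):
        # An acknowledged message is removed for good.
        self.outstanding.pop(ack_id, None)

    def expire_deadlines(self):
        # Unacknowledged messages return to the backlog for redelivery.
        self.backlog.extend(self.outstanding.values())
        self.outstanding.clear()


subscription = ToySubscription(["a", "b", "c"])
first_pull = subscription.pull(max_messages=3)
subscription.acknowledge(first_pull[0][0])   # only "a" is acknowledged
subscription.expire_deadlines()
second_pull = [message for _, message in subscription.pull(max_messages=3)]
print(second_pull)
```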

### **Advanced Pub/Sub Concepts**

- **Dead Letter Queues (DLQs):**
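   - A separate topic (the dead-letter topic) to which Pub/Sub forwards messages that a subscription has failed to get acknowledged after a maximum number of delivery attempts.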
   - Useful for handling processing failures.

- **Schema Setup:**
   - Enforces a specific format (Avro or Protocol Buffers) for the messages of a topic; messages that do not match are rejected at publish time.
   - Ensures data consistency across publishers and subscribers.

- **Message Ordering:**
   - Messages published with the same ordering key are delivered in the order Pub/Sub received them, provided message ordering is enabled on the subscription.
   - Useful for stateful processing or time-series data.

- **Push vs. Pull Subscriptions:**
   - **Push:** Pub/Sub sends messages directly to an HTTP endpoint.
   - **Pull:** Subscribers pull messages explicitly, allowing more control over processing.
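
For push subscriptions, Pub/Sub sends an HTTP POST whose JSON body wraps the message, with the payload base64-encoded in `message.data`. A minimal sketch of decoding such a body follows. The function name `decode_push_body` and the sample values are made up for illustration, and a real endpoint would also need to return a success status code to acknowledge the message.

```python
import base64
import json


def decode_push_body(body):
    """Extract payload and attributes from a push request body (hypothetical helper)."""
    envelope = json.loads(body)
    message = envelope["message"]
    payload = base64.b64decode(message.get("data", "")).decode("utf-8")
    return {
        "message_id": message.get("messageId"),
        "attributes": message.get("attributes", {}),
        "payload": json.loads(payload) if payload else None,
        "subscription": envelope.get("subscription"),
    }


# Sample body built by hand; the IDs and names are placeholders.
sample_body = json.dumps({
    "message": {
        "data": base64.b64encode(b'{"id": 1}').decode("ascii"),
        "attributes": {"event": "test"},
        "messageId": "123",
    },
    "subscription": "projects/demo/subscriptions/demo-push",
})

decoded = decode_push_body(sample_body)
print(decoded)
```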


In [None]:
"""
Exercise 05: Demonstrating Message Ordering in PubSub.
"""

# Create the topic used in the ordering demo (topics have no ordering setting of their own)
logging.info(f"Creating topic: {ordered_topic_id}")

try:
    publisher.create_topic(
        request={
            "name": ordered_topic_path,
            "labels": {"purpose": "ordering-demo"}
        }
    )
    logging.info(f"Ordered topic created: {ordered_topic_path}")

except Exception as err:
    logging.warning(f"Topic may already exist: {err}")

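# Create a subscription for the ordered topic.
# Ordered delivery requires enable_message_ordering on the subscription.
try:
    get_subscriber().create_subscription(
        request={
            "name": ordered_subscription_path,
            "topic": ordered_topic_path,
            "enable_message_ordering": True,
        }
    )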
    logging.info(f"Subscription created: {ordered_subscription_path}")

except Exception as err:
    logging.warning(f"Subscription may already exist: {err}")

In [None]:
# Publish / Consume messages

publisher_options = pubsub_v1.types.PublisherOptions(enable_message_ordering=True)
ordering_publisher = pubsub_v1.PublisherClient(
    publisher_options=publisher_options
)

messages = [
    ("message1", "key1"),
    ("message2", "key2"),
    ("message3", "key1"),
    ("message4", "key2"),
]

for msg in messages:

    data = msg[0].encode("utf-8")
    ordering_key = msg[1]

    future = ordering_publisher.publish(ordered_topic_path, data, ordering_key=ordering_key)
    logging.info(f"Published {data} with message ID {future.result()}")

In [None]:
pull_messages(get_subscriber(), ordered_subscription_path)
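
Ordering is guaranteed only among messages that share an ordering key; messages with different keys may interleave in any way. The sketch below checks that property on a list of received `(data, key)` pairs. The function `is_ordered_per_key` is a hypothetical helper, and the `received_ok` and `received_bad` lists are made-up sample data rather than real output.

```python
from collections import defaultdict


def is_ordered_per_key(published, received):
    """Return True if, for every ordering key, messages arrive in publish order."""
    def group(pairs):
        by_key = defaultdict(list)
        for data, key in pairs:
            by_key[key].append(data)
        return by_key

    return group(published) == group(received)


published = [("message1", "key1"), ("message2", "key2"),
             ("message3", "key1"), ("message4", "key2")]

# key2 messages arrive before key1 messages: still valid, order holds within each key.
received_ok = [("message2", "key2"), ("message4", "key2"),
               ("message1", "key1"), ("message3", "key1")]

# message3 overtakes message1 although both use key1: ordering is violated.
received_bad = [("message3", "key1"), ("message1", "key1"),
                ("message2", "key2"), ("message4", "key2")]

print(is_ordered_per_key(published, received_ok))
print(is_ordered_per_key(published, received_bad))
```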

In [None]:
"""
Exercise 06: Publishing Messages with Attributes
"""

for i in range(5):
        
    payload = json.dumps({'id':i})
    msg = payload.encode("utf-8")
    event = 'test' if i == 4 else 'train'

    # Add two attributes, user and event, to the message
    future = publisher.publish(topic_path, msg, user='Javi', event=event)

    logging.info(f"Published {msg} with message ID {future.result()}")


In [None]:
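# Use the subscriber as a context manager so the client is closed afterwards
with get_subscriber() as subscriber:

    response = subscriber.pull(
        request={"subscription": subscription_path, "max_messages": 5}
    )

    for msg in response.received_messages:

        # .get() avoids a KeyError for messages published without attributes
        if msg.message.attributes.get('event') == 'test':

            logging.info(f"Received: {msg.message.data.decode('utf-8')}")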
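
Filtering on the client, as above, still delivers every message to the subscriber. Pub/Sub can also filter by attribute on the server through the subscription's `filter` field, so that non-matching messages are never delivered. A sketch of the request for such a subscription is shown below. The helper name `filtered_subscription_request` and the resource paths are placeholders, and the resulting dictionary is meant to be passed as `request=` to `create_subscription`.

```python
def filtered_subscription_request(subscription_path, topic_path, event):
    """Build a create_subscription request that only matches one event type."""
    return {
        "name": subscription_path,
        "topic": topic_path,
        # Pub/Sub filter syntax: match messages whose `event` attribute equals the value.
        "filter": f'attributes.event = "{event}"',
    }


request = filtered_subscription_request(
    "projects/demo/subscriptions/test-events",  # placeholder path
    "projects/demo/topics/demo",                # placeholder path
    "test",
)
print(request["filter"])
```

With the notebook's client this could be used as `get_subscriber().create_subscription(request=request)`. The filter has to be chosen at creation time, since it cannot be changed afterwards.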

In [None]:
"""
Exercise 07: Enforcing Schema Validation with Dead Letter Queue.
"""

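# Create the schema.
# Note: a schema is only enforced on topics whose schema settings reference it;
# the topics created earlier in this notebook do not.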
schema_id = "edem-schema"
schema_definition = """
{
  "type": "record",
  "name": "edem_message",
  "fields": [
    {"name": "id", "type": "int"}
  ]
}
"""

try:
    schema = schema_client.create_schema(
        request={
            "parent": f"projects/{project_id}",
            "schema_id": schema_id,
            "schema": {"type_": "AVRO", "definition": schema_definition},
        }
    )
    logging.info(f"Created schema: {schema.name}")

except Exception as err:
    logging.warning(f"Schema may already exist: {err}")

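# Create the dead-letter topic first: a dead-letter policy must point to an existing topic
try:
    publisher.create_topic(name=dlq_topic_path)
    logging.info(f"Dead-letter topic created: {dlq_topic_path}")

except Exception as err:
    logging.warning(f"Dead-letter topic may already exist: {err}")

# Create a subscription with DLQ.
# The Pub/Sub service account also needs permission to publish to the
# dead-letter topic and to subscribe to this subscription.
dead_letter_policy = pubsub_v1.types.DeadLetterPolicy(
    dead_letter_topic=dlq_topic_path,
    max_delivery_attempts=5,
)

try:

    get_subscriber().create_subscription(
        request={
            "name": dlq_subscription_path,
            "topic": topic_path,
            "dead_letter_policy": dead_letter_policy,
        }
    )

    logging.info(f"Created subscription with DLQ: {dlq_subscription_path}")

except Exception as err:
    logging.warning(f"Subscription may already exist: {err}")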
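
Creating a schema does not by itself validate anything: validation happens on topics whose `schema_settings` reference the schema, and such topics reject non-conforming messages at publish time. The sketch below builds the request for such a topic. The helper name `schema_topic_request` and the IDs passed to it are placeholders, and passing the encoding as the string `"JSON"` is an assumption that mirrors how the cell above passes `"AVRO"` for the schema type.

```python
def schema_topic_request(project_id, topic_id, schema_id, encoding="JSON"):
    """Build a create_topic request that attaches a schema (hypothetical helper)."""
    return {
        "name": f"projects/{project_id}/topics/{topic_id}",
        "schema_settings": {
            # Full resource name of the schema created beforehand.
            "schema": f"projects/{project_id}/schemas/{schema_id}",
            # Encoding of the message payload: "JSON" or "BINARY".
            "encoding": encoding,
        },
    }


request = schema_topic_request("demo-project", "demo-schema-topic", "edem-schema")
print(request)
```

In the notebook this could be used as `publisher.create_topic(request=request)`, and the `Encoding` enum imported in the first cells (`Encoding.JSON`) could replace the string.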


In [None]:
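# A. Publish a message that does not match the schema
invalid_message = json.dumps({}).encode('utf-8')

# A topic with a schema attached would reject this message at publish time.
# A message reaches the DLQ only after its subscription fails to
# acknowledge it max_delivery_attempts times.
future = publisher.publish(topic_path, invalid_message)
logging.info(f"Published invalid message with message ID {future.result()}")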
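
A payload can also be checked locally before publishing. The sketch below validates a dictionary against the Avro record defined in Exercise 07 using only the standard library. It handles just the primitive types listed in `AVRO_TO_PYTHON` and is an illustrative assumption, not a full Avro validator or the check that Pub/Sub itself performs. The function name `matches_schema` is hypothetical.

```python
import json

schema_definition = """
{
  "type": "record",
  "name": "edem_message",
  "fields": [
    {"name": "id", "type": "int"}
  ]
}
"""

# Only a few Avro primitive types are covered in this sketch.
AVRO_TO_PYTHON = {"int": int, "long": int, "string": str, "boolean": bool}


def matches_schema(payload, definition):
    """Check that payload has exactly the record's fields with matching types."""
    fields = json.loads(definition)["fields"]
    if set(payload) != {field["name"] for field in fields}:
        return False
    for field in fields:
        value = payload[field["name"]]
        expected = AVRO_TO_PYTHON[field["type"]]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if isinstance(value, bool) and expected is not bool:
            return False
        if not isinstance(value, expected):
            return False
    return True


print(matches_schema({"id": 1}, schema_definition))
print(matches_schema({}, schema_definition))
print(matches_schema({"id": "1"}, schema_definition))
```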


In [None]:
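# B. Pull messages from the subscription that carries the dead-letter policy.
# Note: this subscription is attached to the main topic, not to the dead-letter topic.
pull_messages(get_subscriber(), dlq_subscription_path)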
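
How a dead-letter policy behaves can be sketched without GCP: a message that keeps failing is redelivered until it reaches `max_delivery_attempts`, and is then forwarded to the dead-letter topic instead. The function `deliver_with_dead_letter` and the sample `handler` are hypothetical, and real Pub/Sub counts delivery attempts approximately rather than exactly.

```python
def deliver_with_dead_letter(messages, handler, max_delivery_attempts=5):
    """Simulate redelivery with a dead-letter policy (illustration only)."""
    processed, dead_letters = [], []
    for message in messages:
        for attempt in range(1, max_delivery_attempts + 1):
            try:
                handler(message)
            except Exception:
                continue  # treated like a nack: the message is redelivered
            processed.append((message, attempt))
            break
        else:
            # Every attempt failed, so the message goes to the dead-letter topic.
            dead_letters.append(message)
    return processed, dead_letters


def handler(message):
    # Hypothetical processing step that rejects messages without an "id" field.
    if "id" not in message:
        raise ValueError("missing id")


processed, dead_letters = deliver_with_dead_letter([{"id": 1}, {}], handler)
print(processed)
print(dead_letters)
```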

### **Clean Up**

In [None]:
topics = []
subscriptions = []

topics.append(topic_path)
topics.append(ordered_topic_path)
topics.append(dlq_topic_path)

subscriptions.append(subscription_path)
subscriptions.append(ordered_subscription_path)
subscriptions.append(dlq_subscription_path)

In [None]:
# Delete Topics
for topic in topics:

    try:
        publisher.delete_topic(request={"topic": topic})
        logging.info(f"Topic deleted: {topic}")

    except Exception as err:
        logging.error(f"Error while deleting the topic: {err}")

In [None]:
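# Delete Subscriptions
for subscription in subscriptions:

    try:
        # Subscriptions are deleted through the subscriber client, not the publisher
        with get_subscriber() as subscriber:
            subscriber.delete_subscription(request={"subscription": subscription})
        logging.info(f"Subscription deleted: {subscription}")

    except Exception as err:
        logging.error(f"Error while deleting the subscription: {err}")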