## **PubSub**

Description: This notebook will teach you the basics of working with PubSub in Google Cloud Platform (GCP):

- What is PubSub?
- Creating topics and subscriptions programmatically.
- Publishing and consuming messages.
- Advanced topics:
   - Dead Letter Queues (DLQs)
   - Schema setup for PubSub
   - Message ordering
   - Push vs. Pull subscriptions


EDEM. Master Big Data & Cloud 2025/2026<br>
Professor: Javi Briones

In [3]:
!pip install google-cloud-pubsub --quiet

In [4]:
# Import libraries
from google.cloud import pubsub_v1
from google.pubsub_v1.types import Encoding
import logging
import json

In [5]:
# Set logs
logging.basicConfig(level=logging.INFO)

### **What is PubSub?**

Google Cloud Pub/Sub is a fully managed messaging service for **real-time event streaming**. It decouples the applications that produce events from the applications that consume them:

- **Publishers** send messages to **topics**.
- **Subscribers** receive messages through **subscriptions**.

#### **Key Concepts**

- **Topic:** A named resource that serves as the entry point for messages.
- **Subscription:** A named resource representing the subscriber's interest in a topic.
- **Publisher:** The application that sends messages to the topic.
- **Subscriber:** The application that receives messages from a subscription.
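
To make the relationship between these four concepts concrete, here is a minimal in-memory sketch. It is not the Pub/Sub API: `InMemoryBroker` and the topic and subscription names are hypothetical, and the model ignores acknowledgements, retries and persistence. It only illustrates that every subscription attached to a topic receives its own copy of each published message.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy stand-in for Pub/Sub: a topic fans out to its subscriptions."""

    def __init__(self):
        self.subscriptions_by_topic = defaultdict(list)  # topic -> subscription names
        self.queues = {}  # subscription name -> pending messages

    def create_subscription(self, topic, subscription):
        self.subscriptions_by_topic[topic].append(subscription)
        self.queues[subscription] = deque()

    def publish(self, topic, data):
        # Each subscription attached to the topic gets its own copy.
        for subscription in self.subscriptions_by_topic[topic]:
            self.queues[subscription].append(data)

    def pull(self, subscription):
        queue = self.queues[subscription]
        return queue.popleft() if queue else None


broker = InMemoryBroker()
broker.create_subscription("orders", "billing-sub")
broker.create_subscription("orders", "shipping-sub")
broker.publish("orders", b'{"id": 1}')

print(broker.pull("billing-sub"))   # billing sees the message
print(broker.pull("shipping-sub"))  # shipping sees its own copy
print(broker.pull("billing-sub"))   # nothing left for billing
```

Two subscribers that share one subscription would instead split the messages between them, which this sketch does not model.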



In [6]:
# Set project_id variable
PROJECT_ID = input("Enter your GCP Project ID: ")

In [7]:
# Set up PubSub clients
publisher = pubsub_v1.PublisherClient()
def get_subscriber():
    return pubsub_v1.SubscriberClient()
schema_client = pubsub_v1.SchemaServiceClient()



In [8]:
"""
Exercise 01: Create a PubSub Topic using the Python Client Library.
"""

topic_id = 'edem-demo-topic'
topic_path = publisher.topic_path(PROJECT_ID, topic_id)

logging.info(f"Creating topic: {topic_id}")

try:
    publisher.create_topic(name=topic_path)
    logging.info(f"Topic created: {topic_path}")

except Exception as err:
    logging.error(f"Topic may already exist: {err}")

INFO:root:Creating topic: edem-demo-topic
ERROR:root:Topic may already exist: 409 Resource already exists in the project (resource=edem-demo-topic).


In [9]:
"""
Exercise 02: Create a PubSub Subscription using the Python Client Library.
"""

subscription_id = 'edem-demo-topic-sub'
subscription_path = get_subscriber().subscription_path(PROJECT_ID, subscription_id)

logging.info(f"Creating subscription: {subscription_id}")

try:
    get_subscriber().create_subscription(name=subscription_path, topic=topic_path)
    logging.info(f"Subscription created: {subscription_path}")

except Exception as err:
    logging.error(f"Subscription may already exist: {err}")

INFO:root:Creating subscription: edem-demo-topic-sub
ERROR:root:Subscription may already exist: 409 Resource already exists in the project (resource=edem-demo-topic-sub).


In [10]:
"""
Exercise 03: Publish messages to a PubSub topic using the Python Client Library.
"""

def publish_messages(publisher, topic_path):

    for i in range(5):
        
        payload = json.dumps({'id':i})
        msg = payload.encode("utf-8")
        future = publisher.publish(topic_path, msg)

        logging.info(f"Published {msg} with message ID {future.result()}")

publish_messages(publisher, topic_path)

INFO:root:Published b'{"id": 0}' with message ID 17844224231443971
INFO:root:Published b'{"id": 1}' with message ID 17845125862160107
INFO:root:Published b'{"id": 2}' with message ID 17843513324756259
INFO:root:Published b'{"id": 3}' with message ID 17635339318576174
INFO:root:Published b'{"id": 4}' with message ID 17842506964536865
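
The loop above calls `future.result()` right after each `publish()`, so it waits for every message to be confirmed before sending the next one. A common alternative is to queue all messages first and collect the message IDs afterwards, which leaves the client free to batch them. The sketch below assumes a publisher whose `publish()` returns a future, as the real client does; `publish_all` and `FakePublisher` are hypothetical names, and the fake exists only so the example runs without GCP credentials.

```python
import json
from concurrent.futures import Future


def publish_all(publisher, topic_path, records, timeout=30):
    """Queue every message first, then wait for the message IDs."""
    pending = [
        publisher.publish(topic_path, json.dumps(record).encode("utf-8"))
        for record in records
    ]
    return [future.result(timeout=timeout) for future in pending]


class FakePublisher:
    """Hypothetical stand-in for PublisherClient, used only for this demo."""

    def __init__(self):
        self.sent = []

    def publish(self, topic_path, data, **attributes):
        self.sent.append((topic_path, data, attributes))
        future = Future()
        future.set_result(str(len(self.sent)))  # fake message ID
        return future


fake = FakePublisher()
message_ids = publish_all(
    fake, "projects/demo/topics/demo", [{"id": i} for i in range(3)]
)
print(message_ids)
```

With the real client, the call would be `publish_all(publisher, topic_path, records)`.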


In [11]:
"""
Exercise 04.A: Pull messages from a PubSub subscription using the Python Client Library.
"""

def pull_messages(subscriber, subscription_path):

    with subscriber:

        response = subscriber.pull(
            request={"subscription": subscription_path, "max_messages": 10}
        )

        for msg in response.received_messages:

            logging.info(f"Received: {msg.message.data.decode('utf-8')}")

            subscriber.acknowledge(
                request={"subscription": subscription_path, "ack_ids": [msg.ack_id]}
            )

pull_messages(get_subscriber(), subscription_path)

INFO:root:Received: {"id": 2}
INFO:root:Received: {"id": 3}
INFO:root:Received: {"id": 0}
INFO:root:Received: {"id": 1}
INFO:root:Received: {"id": 4}
INFO:root:Received: {"id": 4}
INFO:root:Received: {"id": 0}
INFO:root:Received: {"id": 1}
INFO:root:Received: {"id": 3}
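
The output above lists several ids more than once. This can happen when the subscription still holds messages from an earlier run, and Pub/Sub itself only promises at-least-once delivery by default, so a consumer should tolerate repeats. The sketch below shows one simple way to do that, assuming each message arrives as a `(message_id, data)` pair; `process_once` is a hypothetical helper, and a real consumer would keep the seen IDs in durable storage rather than in a Python set.

```python
def process_once(received, seen_ids, handler):
    """Call handler only for message IDs not seen before.

    received: iterable of (message_id, data) pairs.
    seen_ids: set of IDs already processed; updated in place.
    Returns the list of IDs handled in this call.
    """
    handled = []
    for message_id, data in received:
        if message_id in seen_ids:
            continue  # redelivery of a message we already processed
        handler(data)
        seen_ids.add(message_id)
        handled.append(message_id)
    return handled


seen = set()
batch = [("m1", b'{"id": 0}'), ("m2", b'{"id": 1}'), ("m1", b'{"id": 0}')]
print(process_once(batch, seen, handler=lambda data: None))
```

Deduplicating on the message ID only covers redeliveries of the same message. Two separate publishes of the same payload get different IDs and would need a key taken from the payload itself.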


In [12]:
"""
Exercise 04.B: Acknowledge messages in PubSub using the Python Client Library.
"""


def _pull_messages(subscriber, subscription_path):

    with subscriber:

        response = subscriber.pull(
            request={"subscription": subscription_path, "max_messages": 10}
        )

        for msg in response.received_messages:

            message = json.loads(msg.message.data.decode('utf-8'))

            if message['id'] % 2 == 0:

                subscriber.acknowledge(
                    request={"subscription": subscription_path, "ack_ids": [msg.ack_id]}
                )

                logging.info(f"Received: {message}")

            else:
                
                logging.info(f"Skipped: {message}")

_pull_messages(get_subscriber(), subscription_path)

INFO:root:Received: {'id': 2}
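
Messages that are skipped in the cell above stay unacknowledged and become available again once their acknowledgement deadline expires. A possible variation, sketched below, acknowledges all accepted messages in a single call and releases the rejected ones immediately by setting their ack deadline to 0 seconds. `ack_or_release` and `should_ack` are hypothetical names; the function assumes a subscriber object exposing `acknowledge` and `modify_ack_deadline`, as `SubscriberClient` does.

```python
import json


def ack_or_release(subscriber, subscription_path, received_messages, should_ack):
    """Acknowledge accepted messages in one call and release the rest.

    should_ack: function taking the decoded JSON payload and returning a bool.
    Returns (acked_ids, released_ids).
    """
    acked_ids, released_ids = [], []

    for msg in received_messages:
        payload = json.loads(msg.message.data.decode("utf-8"))
        target = acked_ids if should_ack(payload) else released_ids
        target.append(msg.ack_id)

    if acked_ids:
        subscriber.acknowledge(
            request={"subscription": subscription_path, "ack_ids": acked_ids}
        )

    if released_ids:
        # A deadline of 0 seconds makes the messages available for redelivery now.
        subscriber.modify_ack_deadline(
            request={
                "subscription": subscription_path,
                "ack_ids": released_ids,
                "ack_deadline_seconds": 0,
            }
        )

    return acked_ids, released_ids
```

Inside `_pull_messages`, this could be called as `ack_or_release(subscriber, subscription_path, response.received_messages, lambda m: m['id'] % 2 == 0)`.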


### **Advanced Pub/Sub Concepts**

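- **Dead Letter Queues (DLQs):**
   - A dead-letter topic receives messages that a subscriber fails to acknowledge within a configured maximum number of delivery attempts.
   - Useful for isolating messages that repeatedly fail processing so they can be inspected later.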

- **Schema Setup:**
   - Enforcing a specific format for Pub/Sub messages.
   - Ensures data consistency across publishers and subscribers.

- **Message Ordering:**
   - Enabling message ordering on a subscription makes Pub/Sub deliver messages that share an ordering key in the order it received them; messages with different keys carry no ordering guarantee.
   - Useful for stateful processing or time-series data.

- **Push vs. Pull Subscriptions:**
   - **Push:** Pub/Sub sends messages directly to an HTTP endpoint.
   - **Pull:** Subscribers pull messages explicitly, allowing more control over processing.
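
The exercises in this notebook only use pull subscriptions. For comparison, a push subscription delivers each message as an HTTP POST whose JSON body wraps the message and carries the payload base64-encoded. The sketch below shows how an endpoint might unpack that body; `parse_push_request` is a hypothetical helper, the sample values are made up, and no web framework is involved.

```python
import base64
import json


def parse_push_request(body: bytes) -> dict:
    """Unpack the JSON envelope that a push subscription POSTs to an endpoint."""
    envelope = json.loads(body)
    message = envelope["message"]
    return {
        # The payload travels base64-encoded inside the JSON envelope.
        "data": base64.b64decode(message.get("data", "")),
        "attributes": message.get("attributes", {}),
        "message_id": message.get("messageId"),
        "subscription": envelope.get("subscription"),
    }


sample_body = json.dumps(
    {
        "message": {
            "data": base64.b64encode(b'{"id": 1}').decode("ascii"),
            "attributes": {"event": "train"},
            "messageId": "123",
        },
        "subscription": "projects/demo/subscriptions/demo-push-sub",
    }
).encode("utf-8")

print(parse_push_request(sample_body))
```

With push delivery, the endpoint acknowledges a message by returning a success HTTP status code instead of calling `acknowledge`.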


In [13]:
"""
Exercise 05: Demonstrating Message Ordering in PubSub.
"""

# Create the topic (ordering is enabled on the subscription and on the publisher client, not on the topic)
ordered_topic_id = 'edem-ord-demo-topic'
ordered_topic_path = publisher.topic_path(PROJECT_ID, ordered_topic_id)

try:
    publisher.create_topic(
        request={
            "name": ordered_topic_path,
            "labels": {"purpose": "ordering-demo"}
        }
    )
    logging.info(f"Ordered topic created: {ordered_topic_path}")

except Exception as err:
    logging.warning(f"Topic may already exist: {err}")

# Create a subscription for the ordered topic
ordered_subscription_id = 'edem-ord-demo-sub'
ordered_subscription_path = get_subscriber().subscription_path(PROJECT_ID, ordered_subscription_id)

try:
    get_subscriber().create_subscription(
        request={
            "name": ordered_subscription_path,
            "topic": ordered_topic_path,
            "enable_message_ordering": True}
    )
    logging.info(f"Subscription created: {ordered_subscription_path}")

except Exception as err:
    logging.warning(f"Subscription may already exist: {err}")



In [14]:
publisher_options = pubsub_v1.types.PublisherOptions(enable_message_ordering=True)
ordering_publisher = pubsub_v1.PublisherClient(
    publisher_options=publisher_options
)

messages = [
    ("message1", "key1"),
    ("message2", "key1"),
    ("message3", "key1"),
    ("message4", "key1"),
]

for msg in messages:

    data = msg[0].encode("utf-8")
    ordering_key = msg[1]

    future = ordering_publisher.publish(ordered_topic_path, data, ordering_key=ordering_key)
    logging.info(f"Published {data} with message ID {future.result()}")

INFO:root:Published b'message1' with message ID 17849210624053519
INFO:root:Published b'message2' with message ID 17848885476085767
INFO:root:Published b'message3' with message ID 17849872764551536
INFO:root:Published b'message4' with message ID 17643310253382030


In [15]:
pull_messages(get_subscriber(), ordered_subscription_path)

INFO:root:Received: message1
INFO:root:Received: message2
INFO:root:Received: message3
INFO:root:Received: message4
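
All four messages above share `key1`, so they arrive in publish order. With several keys, the guarantee applies within each key only, and messages with different keys may interleave differently on the way out. The sketch below is a small local check of that property on `(ordering_key, data)` pairs; `group_by_key` and `preserves_key_order` are hypothetical helpers, and nothing here talks to Pub/Sub.

```python
from collections import defaultdict


def group_by_key(messages):
    """Group (ordering_key, data) pairs by key, keeping arrival order."""
    grouped = defaultdict(list)
    for key, data in messages:
        grouped[key].append(data)
    return dict(grouped)


def preserves_key_order(published, received):
    """True if every key sees its messages in the order they were published."""
    return group_by_key(published) == group_by_key(received)


published = [("key1", "a"), ("key2", "x"), ("key1", "b")]

# key2 overtakes key1, but each key keeps its own order.
print(preserves_key_order(published, [("key2", "x"), ("key1", "a"), ("key1", "b")]))

# key1 arrives as b then a, so per-key order is broken.
print(preserves_key_order(published, [("key1", "b"), ("key2", "x"), ("key1", "a")]))
```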


In [16]:
pull_messages(get_subscriber(), subscription_path)



In [17]:
"""
Exercise 06: Publishing Messages with Attributes
"""

for i in range(5):
        
    payload = json.dumps({'id':i})
    msg = payload.encode("utf-8")
    event = 'test' if i == 4 else 'train'

    # Add two attributes, user and event, to the message
    future = publisher.publish(topic_path, msg, user='admin', event=event)

    logging.info(f"Published {msg} with message ID {future.result()}")


INFO:root:Published b'{"id": 0}' with message ID 17844116331172821
INFO:root:Published b'{"id": 1}' with message ID 17848175605019052
INFO:root:Published b'{"id": 2}' with message ID 17642738162246726
INFO:root:Published b'{"id": 3}' with message ID 17844574634181933
INFO:root:Published b'{"id": 4}' with message ID 17843555048522739


In [18]:
# Pull messages and inspect their attributes (messages are not acknowledged here)
with get_subscriber() as subscriber:

    response = subscriber.pull(
        request={"subscription": subscription_path, "max_messages": 10}
    )

for msg in response.received_messages:

    # .get() tolerates messages that were published without an 'event' attribute
    event = msg.message.attributes.get('event')

    if event == 'test':

        logging.info(f"Received: {msg.message.data.decode('utf-8')}")

    else:

        logging.info(f"Received: {event}")

INFO:root:Received: train
INFO:root:Received: train
INFO:root:Received: train
INFO:root:Received: train
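
The cell above filters on the `event` attribute in client code. Pub/Sub can also filter on attributes server-side, through a `filter` expression set when the subscription is created. The sketch below only builds the request dictionary for such a subscription; `filtered_subscription_request` and the `edem-demo-topic-test-sub` name are hypothetical, and the resulting dictionary would still have to be passed to `create_subscription(request=...)`.

```python
def filtered_subscription_request(subscription_path, topic_path, attribute, value):
    """Build a create_subscription request that keeps only matching messages."""
    return {
        "name": subscription_path,
        "topic": topic_path,
        # Filter syntax: attributes.<key> = "<value>"
        "filter": f'attributes.{attribute} = "{value}"',
    }


request = filtered_subscription_request(
    "projects/demo/subscriptions/edem-demo-topic-test-sub",
    "projects/demo/topics/edem-demo-topic",
    attribute="event",
    value="test",
)
print(request["filter"])
```

A filter has to be chosen at creation time, so filtering the existing `edem-demo-topic-sub` this way would mean creating a new subscription.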


In [19]:
"""
Exercise 07: Enforcing Schema Validation.
"""

# Create the Avro schema
schema_id = "edem-schema"
schema_definition = """
{
  "type": "record",
  "name": "edem_message",
  "fields": [
    {"name": "id", "type": "int"}
  ]
}
"""

try:
    schema = schema_client.create_schema(
        request={
            "parent": f"projects/{PROJECT_ID}",
            "schema_id": schema_id,
            "schema": {"type_": "AVRO", "definition": schema_definition},
        }
    )
    logging.info(f"Created schema: {schema.name}")

except Exception as err:
    logging.warning(f"Schema may already exist: {err}")

# Attach the schema to a newly created topic

from google.pubsub_v1.types import Encoding

schema_topic_id = 'edem-schema-demo-topic'
schema_topic_path = publisher.topic_path(PROJECT_ID, schema_topic_id)

logging.info(f"Creating topic: {schema_topic_id}")

try:
    publisher.create_topic(request={
        "name": schema_topic_path,
        "schema_settings": {
            "schema": schema_client.schema_path(PROJECT_ID, schema_id),
            "encoding": Encoding.JSON
        }
    })
    
    logging.info(f"Topic created: {schema_topic_path}")

except Exception as err:
    logging.error(f"Topic may already exist: {err}")

INFO:root:Creating topic: edem-schema-demo-topic
ERROR:root:Topic may already exist: 409 Resource already exists in the project (resource=edem-schema-demo-topic).


In [20]:
# Publish valid and invalid messages
def publish_messages_with_schema(publisher, topic_path):

    valid_payload = json.dumps({'id':1})
    invalid_payload = json.dumps({'name':'invalid'})

    valid_msg = valid_payload.encode("utf-8")
    invalid_msg = invalid_payload.encode("utf-8")

    # Publish a valid message
    future_valid = publisher.publish(topic_path, valid_msg)
    logging.info(f"Published valid message with ID {future_valid.result()}")

    # Publish an invalid message
    try:
        future_invalid = publisher.publish(topic_path, invalid_msg)
        logging.info(f"Published invalid message with ID {future_invalid.result()}")
        
    except Exception as err:
        logging.error(f"Failed to publish invalid message: {err}")

publish_messages_with_schema(publisher, schema_topic_path)

INFO:root:Published valid message with ID 17644398536518772
ERROR:root:Failed to publish invalid message: 400 Invalid data in message: Message failed schema validation. [reason: "INVALID_JSON_AVRO_MESSAGE"
domain: "pubsub.googleapis.com"
metadata {
  key: "revisionInfo"
  value: "Could not validate message with any schema revision for schema: projects/51738838848/schemas/edem-schema, last checked revision: revision_id=64a88c76 failed with status: Invalid data in message: Field was not found in JSON object: id."
}
metadata {
  key: "message"
  value: "Message failed schema validation"
}
]
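
The error above comes from the service, after the message has already been sent. A publisher may want to catch obvious problems earlier. The sketch below is a deliberately simplified local check, not real Avro validation: `find_schema_problems` is a hypothetical helper that only handles flat records with the primitive types listed in `AVRO_TO_PYTHON`, and the schema text is copied from the cell above.

```python
import json

# Only these primitive Avro types are handled in this sketch.
AVRO_TO_PYTHON = {"int": int, "long": int, "string": str, "boolean": bool}

SCHEMA_DEFINITION = """
{
  "type": "record",
  "name": "edem_message",
  "fields": [
    {"name": "id", "type": "int"}
  ]
}
"""


def find_schema_problems(payload: bytes, schema_definition: str = SCHEMA_DEFINITION) -> list:
    """Return a list of problems; an empty list means the payload passed."""
    schema = json.loads(schema_definition)
    record = json.loads(payload.decode("utf-8"))
    problems = []

    for field in schema["fields"]:
        name, expected = field["name"], AVRO_TO_PYTHON[field["type"]]

        if name not in record:
            problems.append(f"missing field: {name}")
            continue

        value = record[name]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        is_bool_mismatch = isinstance(value, bool) and expected is not bool
        if is_bool_mismatch or not isinstance(value, expected):
            problems.append(f"wrong type for field: {name}")

    return problems


print(find_schema_problems(b'{"id": 1}'))
print(find_schema_problems(b'{"name": "invalid"}'))
```

The topic's own validation remains the source of truth. A local check like this only shortens the feedback loop.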


In [21]:
"""
Exercise 08: Dead Letter Queue.
"""

# Create DLQ topic and subscription
dlq_topic_id = 'edem-dlq-topic'
dlq_topic_path = publisher.topic_path(PROJECT_ID, dlq_topic_id)

dlq_subscription_id = 'edem-dlq-sub'
dlq_subscription_path = get_subscriber().subscription_path(PROJECT_ID, dlq_subscription_id)

# Topic
try:
    publisher.create_topic(
        request={
            "name": dlq_topic_path,
            "labels": {"purpose": "dlq-demo"}
        }
    )
    logging.info(f"DLQ topic created: {dlq_topic_path}")

except Exception as err:
    logging.warning(f"Topic may already exist: {err}")

# Subscription
try:
    get_subscriber().create_subscription(name=dlq_subscription_path, topic=dlq_topic_path)
    logging.info(f"Subscription created: {dlq_subscription_path}")

except Exception as err:
    logging.error(f"Subscription may already exist: {err}")

ERROR:root:Subscription may already exist: 409 Resource already exists in the project (resource=edem-dlq-sub).


In [22]:
# Update DLQ on Subscription
dead_letter_policy = pubsub_v1.types.DeadLetterPolicy(
    dead_letter_topic=dlq_topic_path,
    max_delivery_attempts=5,
)


subscription = pubsub_v1.types.Subscription(
    name=subscription_path,
    topic=topic_path,
    dead_letter_policy=dead_letter_policy,
)


# Only the dead_letter_policy field is updated
update_mask = pubsub_v1.types.FieldMask(paths=["dead_letter_policy"])

try:
    subscription = get_subscriber().update_subscription(
        request={
            "subscription": subscription,
            "update_mask": update_mask
        }
    )

    logging.info(f"DLQ updated on main subscription: {subscription_path}")

except Exception as err:
    logging.warning(f"Failed to update the subscription: {err}")

INFO:root:DLQ updated on main subscription: projects/inspiring-bonus-481514-j4/subscriptions/edem-demo-topic-sub


In [23]:
# Pull once: odd ids are left unacknowledged. Re-run this cell until they pass
# max_delivery_attempts and Pub/Sub forwards them to the dead-letter topic.
_pull_messages(get_subscriber(), subscription_path)

INFO:root:Received: {'id': 0}
INFO:root:Skipped: {'id': 1}
INFO:root:Received: {'id': 2}
INFO:root:Received: {'id': 4}
INFO:root:Skipped: {'id': 3}


### **Clean Up**

In [24]:
topics = []
subscriptions = []

topics.append(topic_path)
topics.append(ordered_topic_path)
topics.append(schema_topic_path)
topics.append(dlq_topic_path)

subscriptions.append(subscription_path)
subscriptions.append(ordered_subscription_path)
subscriptions.append(dlq_subscription_path)

In [25]:
# Delete Topics
for topic in topics:

    try:
        publisher.delete_topic(request={"topic": topic})
        logging.info(f"Topic deleted: {topic}")

    except Exception as err:
        logging.error(f"Error while deleting the topic: {err}")

INFO:root:Topic deleted: projects/inspiring-bonus-481514-j4/topics/edem-demo-topic
INFO:root:Topic deleted: projects/inspiring-bonus-481514-j4/topics/edem-ord-demo-topic
INFO:root:Topic deleted: projects/inspiring-bonus-481514-j4/topics/edem-schema-demo-topic
INFO:root:Topic deleted: projects/inspiring-bonus-481514-j4/topics/edem-dlq-topic


In [26]:
# Delete Subscriptions
for subscription in subscriptions:

    try:
        get_subscriber().delete_subscription(request={"subscription": subscription})
        logging.info(f"Subscription deleted: {subscription}")

    except Exception as err:
        logging.error(f"Error while deleting the subscription: {err}")

INFO:root:Subscription deleted: projects/inspiring-bonus-481514-j4/subscriptions/edem-demo-topic-sub
INFO:root:Subscription deleted: projects/inspiring-bonus-481514-j4/subscriptions/edem-ord-demo-sub
INFO:root:Subscription deleted: projects/inspiring-bonus-481514-j4/subscriptions/edem-dlq-sub
