## Introduction

Cloud Pub/Sub can be seen as a managed alternative to [Apache Kafka](https://kafka.apache.org/): if you are familiar with Kafka, the concepts in Cloud Pub/Sub will feel similar.

With Pub/Sub we can process both batch and real-time data without losing messages, because Cloud Pub/Sub provides at-least-once delivery: a message is redelivered until a subscriber acknowledges it. One more thing to note is that Pub/Sub also stores messages in the service, but only for a limited retention period.
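At-least-once delivery means the same message can occasionally arrive more than once, so subscribers are usually written to be idempotent. The block below is a minimal sketch of that idea in plain Python; `IdempotentHandler` and its in-memory set of seen IDs are hypothetical names used only for illustration and are not part of the Pub/Sub client library.

```python
class IdempotentHandler:
    """Wraps a processing function so that each message ID is handled once."""

    def __init__(self, process):
        self._process = process
        # Assumption: an in-memory set is enough for a demo; a real
        # subscriber would need a durable store shared across workers.
        self._seen_ids = set()

    def handle(self, message_id, data):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen_ids:
            return False
        self._process(data)
        self._seen_ids.add(message_id)
        return True


results = []
handler = IdempotentHandler(results.append)
handler.handle("1", b"first")
handler.handle("2", b"second")
handler.handle("1", b"first")  # simulated redelivery of message "1"
print(results)
```

Only two payloads end up in `results`, even though three deliveries were handled.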

Some typical use cases for Pub/Sub are shown below:
![cloud pubsub use cases](https://lh3.googleusercontent.com/0BAiaS6Tp331qFe4ekq2trQS6SiRUenG6mMEpHmTLASsvRyGZaub1rEWnVDa4lkRRokWkLBPc8TN=e14-rj-sc0xffffff-w3004)

In general, we can use Pub/Sub as middleware that stores and forwards data for distributed processing. We can create as many publishers as we want; they create messages and send them to a topic, where the service stores them. We can then create subscribers that process the data, choosing either pull or push delivery to decide how the messages are received. After a message is processed, the subscriber sends an acknowledgement that the work is done. That is the main use case of Pub/Sub.

Publishers and subscribers can be related in several ways: one-to-many, many-to-many and many-to-one. The diagram below gives a high-level view of these relationships.

![Different relationship](https://cloud.google.com/pubsub/images/many-to-many.svg)

As a picture, the flow looks like this (one publisher and two subscribers):

![publisher and subscriber](https://cloud.google.com/pubsub/images/qs-diag-final.svg)
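To make the fan-out in the diagram concrete, here is a toy in-memory model, assuming nothing about how the real service is implemented. `ToyTopic` and its methods are hypothetical names for illustration only: every subscription gets its own copy of each published message, and a message stays in a subscription until that subscription acknowledges it.

```python
from collections import deque


class ToyTopic:
    """Toy model of a topic with independent subscriptions (not the real API)."""

    def __init__(self):
        self._subscriptions = {}

    def subscribe(self, name):
        # Each subscription keeps its own queue of unacknowledged messages.
        self._subscriptions[name] = deque()

    def publish(self, data):
        # Fan-out: every subscription receives its own copy of the message.
        for queue in self._subscriptions.values():
            queue.append(data)

    def pull(self, name):
        # Return the oldest unacknowledged message without removing it.
        queue = self._subscriptions[name]
        return queue[0] if queue else None

    def ack(self, name):
        # Acknowledging removes the message from this subscription only.
        self._subscriptions[name].popleft()


topic = ToyTopic()
topic.subscribe("sub_a")
topic.subscribe("sub_b")
topic.publish(b"hello")

print(topic.pull("sub_a"), topic.pull("sub_b"))
topic.ack("sub_a")
print(topic.pull("sub_a"), topic.pull("sub_b"))
```

After `sub_a` acknowledges the message, `sub_b` still sees it, which is the behaviour the one-publisher, two-subscriber picture describes.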

Let's use GCP to demonstrate how to use Pub/Sub in a project.


In [5]:
! pip install --upgrade pip

Collecting pip
  Downloading https://files.pythonhosted.org/packages/43/84/23ed6a1796480a6f1a2d38f2802901d078266bda38388954d01d3f2e821d/pip-20.1.1-py2.py3-none-any.whl (1.5MB)
Installing collected packages: pip
  Found existing installation: pip 19.3.1
    Uninstalling pip-19.3.1:
      Successfully uninstalled pip-19.3.1
Successfully installed pip-20.1.1


In [7]:
# I hit this error: AttributeError: module 'google.protobuf.descriptor' has no attribute '_internal_create_key'
# So I uninstalled the protobuf-related packages and reinstalled protobuf, which fixed it.
! pip3 uninstall python3-protobuf
! pip3 uninstall protobuf

! pip  install protobuf

Found existing installation: protobuf 3.10.0
Uninstalling protobuf-3.10.0:
  Would remove:
    /usr/local/lib/python3.6/dist-packages/google/protobuf/*
    /usr/local/lib/python3.6/dist-packages/protobuf-3.10.0-py3.6-nspkg.pth
    /usr/local/lib/python3.6/dist-packages/protobuf-3.10.0.dist-info/*
Proceed (y/n)? y
  Successfully uninstalled protobuf-3.10.0
Collecting protobuf
  Downloading protobuf-3.12.2-cp36-cp36m-manylinux1_x86_64.whl (1.3 MB)
Installing collected packages: protobuf
Successfully installed protobuf-3.12.2


In [8]:
# first let's install the Pub/Sub Python client
! pip install google-cloud-pubsub --quiet

In [3]:
# then let's configure the project that we would like to use
! gcloud config set project cloudtutorial-279003

Updated property [core/project].


In [0]:
# then let's auth this notebook
from google.colab import auth
auth.authenticate_user()

In [13]:
# first let's delete the topic in case it already exists
! gcloud pubsub topics delete first_topic

Deleted topic [projects/cloudtutorial-279003/topics/first_topic].


In [15]:
# delete the subscription in case it already exists
! gcloud pubsub subscriptions delete first_sub

Deleted subscription [projects/cloudtutorial-279003/subscriptions/first_sub].


In [16]:
# first we have to create the topic that will hold the messages
! gcloud pubsub topics create first_topic

Created topic [projects/cloudtutorial-279003/topics/first_topic].


In [17]:
# then we have to create a subscription so that subscribers can receive data from the topic
# --topic sets which topic the subscription is attached to
! gcloud pubsub subscriptions create first_sub --topic first_topic

Created subscription [projects/cloudtutorial-279003/subscriptions/first_sub].


In [0]:
# before we do anything else, we have to provide the credentials used for Pub/Sub
# I just uploaded the credential file into Colab
import os

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = [x for x in os.listdir('.')  if x.endswith('json') and x.lower().startswith('cloud')][0]

In [7]:
# then let's try to publish some messages to the topic: first_topic
from google.cloud import pubsub_v1

project_id = "cloudtutorial-279003"
topic_name = 'first_topic'

# then we need to create a publisher client
publisher = pubsub_v1.PublisherClient()

# then let's define the path of the topic:`projects/{project_id}/topics/{topic_name}`
topic_path = publisher.topic_path(project_id, topic_name)

# let's push the message into topic
for i in range(5):
  data = "This is {} message".format(i)
  # one thing to notice is that data must be a bytestring, so that Pub/Sub
  # works the same way regardless of the systems involved.
  data = data.encode('utf-8')

  response = publisher.publish(topic_path, data=data)
  print(response.result())

print("Finish publishing step")

1271982206845472
1271988843378844
1271997951615529
1271982492354765
1271982671037313
Finish publishing step
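Because the payload must be bytes, structured records are often serialized before publishing. The sketch below assumes JSON as the wire format, which the notebook itself does not prescribe; `encode_payload` and `decode_payload` are hypothetical helper names, and the resulting bytes could be passed as the `data` argument of `publisher.publish`.

```python
import json


def encode_payload(record):
    """Serialize a dict to UTF-8 encoded JSON bytes."""
    # sort_keys keeps the output deterministic, which helps when comparing payloads.
    return json.dumps(record, sort_keys=True).encode("utf-8")


def decode_payload(data):
    """Turn the bytes received by a subscriber back into a dict."""
    return json.loads(data.decode("utf-8"))


payload = encode_payload({"status": "ok", "id": 7})
print(payload)                  # b'{"id": 7, "status": "ok"}'
print(decode_payload(payload))  # {'id': 7, 'status': 'ok'}
```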


In [0]:
# so far so good that we have already published messages into topics,
# next step is we should create the subscription client to consume messages
subscriber = pubsub_v1.SubscriberClient()

sub_name = "first_sub"

subscription_path = subscriber.subscription_path(project_id, sub_name)

# we could create a callback function
def callback(message):
  print("Get message:{}".format(message))
  # we should notify we have consumed message
  message.ack()

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)

print("START TO consume message path {}".format(subscription_path))

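# listen for 10 seconds, then stop the streaming pull cleanly
# (the original `while subscriber:` loop never exited after the cancel)
try:
  streaming_pull.result(timeout=10)
except Exception:
  # a timeout is expected once no more messages arrive
  streaming_pull.cancel()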


START TO consume message path projects/cloudtutorial-279003/subscriptions/first_sub
Get message:Message {
  data: b'This is 0 message'
  ordering_key: ''
  attributes: {}
}
Get message:Message {
  data: b'This is 1 message'
  ordering_key: ''
  attributes: {}
}
Get message:Message {
  data: b'This is 3 message'
  ordering_key: ''
  attributes: {}
}
Get message:Message {
  data: b'This is 2 message'
  ordering_key: ''
  attributes: {}
}
Get message:Message {
  data: b'This is 4 message'
  ordering_key: ''
  attributes: {}
}


Good news: we do receive the messages from the topic through the subscriber client. Note that they did not arrive in publish order (message 3 came before message 2), because no ordering key was set. Pub/Sub can be used in either a streaming or a batch style; the logic written here is the streaming style, where the subscriber keeps listening, processes each message as it arrives and then waits for the next one.
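For the batch style, a synchronous pull can be used instead of a streaming callback. The following is a sketch under assumptions: `pull_batch` is a hypothetical helper, and the `request={...}` calling convention matches version 2.x of `google-cloud-pubsub`; older 1.x releases take `pull(subscription_path, max_messages=...)` instead, so check which version is installed.

```python
def pull_batch(subscriber, subscription_path, max_messages=10):
    """Pull up to max_messages once, acknowledge them, and return the payloads."""
    response = subscriber.pull(
        request={"subscription": subscription_path, "max_messages": max_messages}
    )
    received = list(response.received_messages)
    payloads = [item.message.data for item in received]
    ack_ids = [item.ack_id for item in received]
    if ack_ids:
        # Acknowledge only after the payloads have been collected; real
        # processing should also finish before this call.
        subscriber.acknowledge(
            request={"subscription": subscription_path, "ack_ids": ack_ids}
        )
    return payloads
```

With the objects from the cells above, it would be called as `pull_batch(subscriber, subscription_path)`, returning a list of bytestrings (possibly empty if nothing is waiting).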

### Pub/Sub with Cloud Functions

We can also create a Cloud Function that is triggered by a Pub/Sub message. Let's test it.

In [1]:
%%writefile main.py
# a Cloud Function that is triggered by a Pub/Sub message
# (the %%writefile cell magic has to be the first line of the cell)
def hello_pubsub(event, context):
  """
  event is a dictionary, `data` field contains the message,
  `attributes` contains some custom attributes.
  context is Cloud Function event metadata, `event_id` contain
  pubsub message ID, `timestamp` contains the publish time.
  """
  import base64

  print("The function is triggered by message ID: {} published at {}"
  .format(context.event_id, context.timestamp))

  if 'data' in event:
    name = base64.b64decode(event['data']).decode('utf-8')
  else:
    name = 'world'

  print("Hello {}".format(name))

Overwriting main.py
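Before deploying, the decoding step can be checked locally. This sketch assumes the same event shape the function's docstring describes, with `data` holding a base64-encoded string; `greeting_from_event` is a hypothetical helper that mirrors the body of `hello_pubsub` but returns the text instead of printing it.

```python
import base64


def greeting_from_event(event):
    """Decode the base64 `data` field of a Pub/Sub event into a greeting."""
    if "data" in event:
        name = base64.b64decode(event["data"]).decode("utf-8")
    else:
        name = "world"
    return "Hello {}".format(name)


# Build a fake event the way Pub/Sub would deliver it: base64 text in `data`.
fake_event = {"data": base64.b64encode(b"guangqiang.lu").decode("ascii")}

print(greeting_from_event(fake_event))  # Hello guangqiang.lu
print(greeting_from_event({}))          # Hello world
```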


In [8]:
# after we have created the main function, let's deploy the function into cloud
! gcloud functions deploy hello_pubsub --runtime python37 --trigger-topic first_topic

Allow unauthenticated invocations of new function [hello_pubsub]? 
(y/N)?  y

availableMemoryMb: 256
entryPoint: hello_pubsub
eventTrigger:
  eventType: google.pubsub.topic.publish
  failurePolicy: {}
  resource: projects/cloudtutorial-279003/topics/first_topic
  service: pubsub.googleapis.com
ingressSettings: ALLOW_ALL
labels:
  deployment-tool: cli-gcloud
name: projects/cloudtutorial-279003/locations/us-central1/functions/hello_pubsub
runtime: python37
serviceAccountEmail: cloudtutorial-279003@appspot.gserviceaccount.com
sourceUploadUrl: https://storage.googleapis.com/gcf-upload-us-central1-aa635b36-c250-4fd0-b45c-1db908086599/5c654279-545d-40ce-ae15-48e88e7b589d.zip?GoogleAccessId=service-227224402169@gcf-admin-robot.iam.gserviceaccount.com&Expires=1592124644&Signature=dlmkKSgqfOKx%2F6opGoVUM8N15Q90XvlMXqIgcbMr4oWcx9igDie8Den2P6wb5XsEdF2dH7tnMmyQSyHy6N8CDhNTta1s%2FK2agkweCwxw%2FwDmOX6opcp9M3sDI7uqGENmQD%2BxLMePAciMW60hx0PIlBYogDnDaaBlHEnKP5RmuAKgYJ5AMWth5JJEy43h9fOtcEQPrt%2F18DvsjGe

In [12]:
# after deploying the function, let's publish one message to the topic that triggers it
! gcloud pubsub topics publish first_topic --message guangqiang.lu

messageIds:
- '1272012491924418'


### Check result of cloud function

Now that we have pushed the message into the topic, let's check the result in the Cloud Function logs. As you can see, the function was triggered by Pub/Sub and printed `Hello guangqiang.lu`.

Good news: we have used Pub/Sub with a Cloud Function to process a message. As a last step, we should delete the Cloud Function to avoid unnecessary billing.

![pubsub logs](https://docs.google.com/uc?export=download&id=1EO2Sa5W6PjXDAAryf1INcf8xyU9B0YH4)

In [15]:
# delete the function and topics, subscriptions
! gcloud functions delete hello_pubsub

! gcloud pubsub topics delete first_topic

Resource [projects/cloudtutorial-279003/locations/us-central1/functions/hello_pubsub] will be deleted.

Do you want to continue (Y/n)?  y

Deleted [projects/cloudtutorial-279003/locations/us-central1/functions/hello_pubsub].
Deleted topic [projects/cloudtutorial-279003/topics/first_topic].
/bin/bash: glcloud: command not found


In [17]:
! gcloud pubsub subscriptions delete first_sub

Deleted subscription [projects/cloudtutorial-279003/subscriptions/first_sub].
