Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get final review of Pub/Sub API by Cloud Pub/Sub team from Google #58

Closed
jgeewax opened this issue Feb 2, 2015 · 47 comments
Closed

Get final review of Pub/Sub API by Cloud Pub/Sub team from Google #58

jgeewax opened this issue Feb 2, 2015 · 47 comments
Assignees
Labels
api: pubsub Issues related to the Pub/Sub API. type: question Request for information or clarification. Not an issue.

Comments

@jgeewax
Copy link

jgeewax commented Feb 2, 2015

Placeholder issue until we are ready for a review

@jgeewax jgeewax added type: question Request for information or clarification. Not an issue. api: pubsub Issues related to the Pub/Sub API. labels Feb 2, 2015
@jgeewax jgeewax added this to the Pub/Sub Stable milestone Feb 2, 2015
@blowmage blowmage mentioned this issue May 4, 2015
@blowmage
Copy link
Contributor

blowmage commented May 4, 2015

@jgeewax Pubsub is merged to master. Who should this be assigned to?

@jgeewax
Copy link
Author

jgeewax commented May 12, 2015

@tmatsuo -- any interest in taking a look at the Ruby Pubsub code?

@tmatsuo
Copy link
Contributor

tmatsuo commented May 12, 2015

I'm not proficient with Ruby.
@remi Is it possible to take a look?

@blowmage
Copy link
Contributor

blowmage commented Jul 8, 2015

@tmatsuo @remi Could either of you take a look at this? Thanks!

@jgeewax
Copy link
Author

jgeewax commented Jul 16, 2015

@remi - Any chance you have time to take a look ? If not, we're going to go ahead and cut a release with this ...

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 16, 2015

@blowmage @jgeewax I will try to look at it.

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 16, 2015

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 16, 2015

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 16, 2015

https://github.com/GoogleCloudPlatform/gcloud-ruby/blob/master/lib/gcloud/pubsub.rb#L133

If the batch call returns multiple msg objects, I feel it's more natural with "msgs" like:

msgs = topic.publish do |batch|
       batch.publish "new-message-1", foo: :bar
       batch.publish "new-message-2", foo: :baz
       batch.publish "new-message-3", foo: :bif
end

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 16, 2015

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 16, 2015

https://github.com/GoogleCloudPlatform/gcloud-ruby/blob/master/lib/gcloud/pubsub.rb#L186

How can you create a subscription in project-b, subscribing to a topic in project-a?
Can you add an example?

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 16, 2015

https://github.com/GoogleCloudPlatform/gcloud-ruby/blob/master/lib/gcloud/pubsub.rb#L243

Wrong title.

Modifying a Message
->
Modifying a Deadline (or Acknowledge Deadline)

@blowmage
Copy link
Contributor

@tmatsuo said:

https://github.com/GoogleCloudPlatform/gcloud-ruby/blob/master/lib/gcloud/pubsub.rb#L113
What is the type of the "msg"? Does it contain the auto assigned message id?

Topic#publish returns a single Message object when called without a block, and an array of Message objects when called with a block. See the documentation here.

@blowmage
Copy link
Contributor

@tmatsuo said:

https://github.com/GoogleCloudPlatform/gcloud-ruby/blob/master/lib/gcloud/pubsub.rb#L165

Do you take care of collisions between the auto created subscription names?

The Pub/Sub service takes care of assigning the name, and I assume it makes sure there are no name collisions.

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 16, 2015

Ok, I just looked at the usage doc in pubsub.rb and put some comments. If the implementation complies to the usage doc, LGTM.

Basically I feel it is very well written and provides very natural feeling. Great job.

@blowmage
Copy link
Contributor

@tmatsuo said:

https://github.com/GoogleCloudPlatform/gcloud-ruby/blob/master/lib/gcloud/pubsub.rb#L186

How can you create a subscription in project-b, subscribing to a topic in project-a?
Can you add an example?

We don't allow subscriptions to subscribe to topics in a different project. Is this possible? Is it wanted?

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 16, 2015

@blowmage

We don't allow subscriptions to subscribe to topics in a different project. Is this possible? Is it wanted?

Yes and yes. It is a very legitimate use case.

@blowmage
Copy link
Contributor

@tmatsuo said:

https://github.com/GoogleCloudPlatform/gcloud-ruby/blob/master/lib/gcloud/pubsub.rb#L198

What's the default batch size?

I didn't even think of this, we are setting a default batch size of 100, but I think that was for convenience during development more than anything. I will document the default size in the Subscription#pull documentation.

Is 100 a reasonable default? If not, what should it be?

@blowmage
Copy link
Contributor

@tmatsuo

@blowmage

We don't allow subscriptions to subscribe to topics in a different project. Is this possible? Is it wanted?

Yes and yes. It is a very legitimate use case.

Okay, right now subscriptions are created on the same project as the topic. We will have to change the API to accommodate this use case. Do you have any suggestions for how the API should allow for this?

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 16, 2015

@blowmage

Is 100 a reasonable default? If not, what should it be?

The API doesn't have any default since it's required parameter.
100 seems fine as long as you have a good documentation with possible max value of 1000.

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 16, 2015

@blowmage

Okay, right now subscriptions are created on the same project as the topic. We will have to change the API to accommodate this use case. Do you have any suggestions for how the API should allow for this?

What about:

require "gcloud/pubsub"

pubsub = Gcloud.pubsub # implicitly project-b for example

# it is "projects/project-a/topics/my-topic"
topic = pubsub.topic "my-topic", project: "project-a"

# it is "projects/project-b/subscriptions/my-sub" tied to the above topic
subscription = topic.subscription "my-sub"  
puts subscription.name

@blowmage
Copy link
Contributor

@tmatsuo

https://github.com/GoogleCloudPlatform/gcloud-ruby/blob/master/lib/gcloud/pubsub.rb#L246

This indicates that more time is needed to process the message, or to make the message available for redelivery if the processing was interrupted.

I feel this sentence is little hard to parse. What does it mean?

Agreed, that is cumbersome. How about this:

A message must be acknowledged after it is pulled, or Pub/Sub will mark the message for redelivery. The message acknowledgement deadline can delayed if more time is needed. This will allow more time to process the message before the message is marked for redelivery.

@blowmage
Copy link
Contributor

@tmatsuo

https://github.com/GoogleCloudPlatform/gcloud-ruby/blob/master/lib/gcloud/pubsub.rb#L277

Does it use the batch interface with v1 API?

Yes, it uses the v1 batch API. We moved all of gcloud-ruby to v1 as soon as it was released.

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 16, 2015

@blowmage

Agreed, that is cumbersome. How about this:

A message must be acknowledged after it is pulled, or Pub/Sub will mark the message for redelivery. The message acknowledgement deadline can delayed if more time is needed. This will allow more time to process the message before the message is marked for redelivery.

LGTM

@blowmage
Copy link
Contributor

@tmatsuo We will create a new PR for creating a subscription to a topic in a different project. This will require an API change and a separate PR will allow easier discussion of it.

All of your other comments, including the ones I have not addressed yet are corrected in #190. Thank you for your attention to detail, it is super appreciated. Let me know if you want a different default value for the number of messages to pull. Right now it is 100 but it can easily be changed.

You said:

Ok, I just looked at the usage doc in pubsub.rb and put some comments. If the implementation complies to the usage doc, LGTM.

Basically I feel it is very well written and provides very natural feeling. Great job.

Thanks! Once we get these two PRs we'll ask for one more sanity check from you before we release.

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 16, 2015

@blowmage
Great please cc me on the PRs

@beccasaurus
Copy link

I've been reading through the new docs and updating a sample of mine to utilize any Pub/Sub features that I haven't already used and I ran into 1 thing:

Message#attributes returns an object directly from the Google API client.

When I tried to ...

messages = subscription.pull immediate: true
messages.each do |msg|
  puts msg.message.data.inspect
  msg.message.attributes.each do |key, value|
    puts "#{key} => #{value}"
  end
end

I was surprised by the undefined method 'each' for #<#<Class:0x007f3e1ced2ef0>:0x007f3e1cd07f08>

The attributes object returned from the Google API client isn't Enumerable so users have to know to call #to_hash on it. I vote to update def attributes to return @gapi["attributes"].to_hash

@beccasaurus
Copy link

Other than attributes, Gcloud Pub/Sub has been pleasant and straight-forward.

Two other things surprised me though ...

Blocking on Pull

Subscription#pull defaults to blocking. IMHO I expect an API client call to return immediately. If I want it to wait for a response, I would expect to pass an option. Also it would be nice to have a method on Subscription such as Subscription#wait_for_messages which invokes #pull with the appropriate option.

Message objects

The first time that I pulled messages from a subscription, I tried:

messages = subscription.pull # immediate: true
messages.each do |message|
  puts "Message ID: #{message.id}"
  puts "Message Data: #{message.data}"
end

I had to go into the code to realize that an Event object is what is really returned. This results in my code being confusing:

messages = subscription.pull
real_messages = messages.map {|not_really_a_message| not_really_a_message.message }

... annnnnnnddddd ... I just saw that Event became ReceivedMessage and it delegates data and attributes to the Message object 😄 Thank you!!! 😃

@beccasaurus
Copy link

re: Blocking on pull by default, my assumption that the default #pull would pull immediately is influenced by an assumption that Gcloud will frequently be used in web applications.

For waiting for messages, not only would an alias be nice but also a method that blocks indefinitely, yielding retrieved messages to a block when received. eg:

subscription.listen do |message|
    # this does a Connection#pull.
    # when messages are returned, they are yielded and then #pull is called again.
    # if the connection fails, #pull is called again.
    #
    # this provides a way to easily give a block that will continue to listen for
    # and process messages until manually breaking out of the loop
end

I would need to verify, but node's gcloud lets you say subscription.on("message", function(message) { /* ... */ }); which may do this. As an example of this being a desirable feature.

@blowmage
Copy link
Contributor

@tmatsuo We've been working on creating a subscription in one project for a topic in a different project. We actually think the following code works right now, with no code changes:

require "gcloud/pubsub"

pubsub = Gcloud.pubsub "project-a", "/path/to/keyfile.json"
# This object holds the connection to the project
puts pubsub.project #=> "project-a"

# We can get a topic in another project (as long as we have permissions)
topic = pubsub.topic "projects/project-b/topics/my-topic"
# When creating a subscription, it uses the original connection and project
subscription = topic.subscription "my-sub"  
puts subscription.name #=> "projects/project-a/subscriptions/my-sub"
puts subscription.topic #=> Topic("projects/project-b/topics/my-topic")

However, we don't know how to test it. Specifically, we can't figure out how to grant a service account in the first project permissions to the second project. We would really like to verify this works before adding documentation for this, even if it is a manual test and not automated. Can you point us in the right direction for managing these permissions?

@blowmage
Copy link
Contributor

@remi

Message#attributes returns an object directly from the Google API client.

You are correct, we will fix this. It should return a Hash, not a Google API Client object.

Subscription#pull defaults to blocking. IMHO I expect an API client call to return immediately.

This is strange, because the default behavior is indeed to return immediately. In looking into this further I noticed that the API call to Pub/Sub sometimes took a second to return, although the flag to return immediately is set. When setting immediate to false the call blocks for 60 seconds until Faraday times the request out. (We don't have a mechanism for handling the timeout now, but we are working on this now.)

Also it would be nice to have a method on Subscription such as Subscription#wait_for_messages which invokes #pull with the appropriate option.

This makes sense. We will add this.

For waiting for messages, not only would an alias be nice but also a method that blocks indefinitely, yielding retrieved messages to a block when received.

We also agree with this. Although after some design work on our end we suggest a slight change. Your code yielded one message at a time:

subscription.listen do |message|
  # do the thing
end

We think that it should yield all the messages that are returned from Subscription#pull at a time:

subscription.listen do |messages|
  messages.each do |message|
    # do the thing
  end
end

The reason for this is that we don't have node's nonblocking events, and we don't want to create threads for the users. Our concerns are that we don't want to pull one message at a time because that would be too inefficient. Likewise, we don't want to pull so many messages that the acknowledgement deadline has passed before each message object is yielded. Yielding all pulled messages gives control to the user, and they can use options like threading to process the messages if they want.

sub.listen(max: 20) do |msgs|
  threads = msgs.map do |msg|
    Thread.new do
      # do the thing
    end
  end
  threads.each { |t| t.join }
  sub.ack msgs
end

Thoughts?

@beccasaurus
Copy link

the default behavior is indeed to return immediately

My fault!

I got this impression when I read "Results can be returned immediately with the :immediate option".

Then we show an example that explicitly sets immediate: true which made me assume that :immediate was not true by default.

# Results can be returned immediately with the +:immediate+ option:

require "gcloud/pubsub"
pubsub = Gcloud.pubsub

sub = pubsub.subscription "my-topic-sub", immediate: true
msgs = sub.pull

In the Pulling Messages section, can we show an example of using immediate: false? And note that immediate: true is the default?

I also didn't realize that pubsub.subscription "name" takes an immediate option. What this option does is not intuitive to me ... I'm actually going to look at the code right now to see what it does.


We think that it should yield all the messages that are returned from Subscription#pull at a time

This makes sense so that the developer can ack all of the messages that were received from a particular #pull. Although #listen would be a bit of a helper method for developers who want to easily listen for individual messages. Developers still have full control over pulling behavior by manually pulling.

Node also has an autoAck option, which I think would be useful.

I like ...

subscription.listen(max: 20, auto_ack: true) do |message|
  # do something
end

... because it's simple and I foresee many uses of #listen looking like:

subscription.listen do |messages|
  messages.each do |message|
    # do something
    subscription.ack message
  end
end

But, as you pointed out, this API essentially forces users to send individual ack messages which isn't at all preferable! Although when auto_ack: true we could ack all messages before yielding them. And developers can always #pull manually to have full control.

hmm...

@beccasaurus
Copy link

Is pubsub.subscription "name", immediate: true actually a typo here? Project#subscription doesn't take options 😄

@beccasaurus
Copy link

In the Subscription#pull documentation, update the :immediate option documentation:

When +true+, the system will respond immediately, either with a
message if available or +nil+ if no message is available. When not
specified, or when +false+, the call will block until a message is
available, or may return UNAVAILABLE if no messages become available
within a reasonable amount of time. (+Boolean+)

Should become:

When +true+ or not specified, the system will respond immediately, either with a
message if available or +nil+ if no message is available.
When +false+, the call will block until a message is
available, or may return UNAVAILABLE if no messages become available
within a reasonable amount of time. (+Boolean+)

And it might be confusing to say that nil may be returned when #pull returns an Array. So maybe:

When +true+ or not specified, the system will respond immediately, either with
messages if available or an empty +Array+ if no messages are available.

Also, what does it mean that #pull "may return UNAVAILABLE"? The wording "return UNAVAILABLE" confuses me a bit. Maybe:

When +true+ or not specified, the system will respond immediately, either with a
message if available or +nil+ if no message is available.
When +false+, the call will block until a message is available.
An ApiError with status UNAVAILABLE is raised if no messages become available
within a reasonable amount of time. (+Boolean+)

@beccasaurus
Copy link

How about something like this for the Pulling Messages section?

Pulling Messages

Messages are pulled from a Subscription.

require "gcloud/pubsub"

pubsub = Gcloud.pubsub
subscription = pubsub.subscription "my-topic-sub"

messages = subscription.pull

A maximum number of messages returned can also be specified:

pubsub = Gcloud.pubsub
subscription = pubsub.subscription "my-topic-sub"

messages = subscription.pull max: 10

Messages are returned immediately. If no messages are available, an empty array is returned.

Pulling messages can block until a message is available using the +:immediate+ option:

pubsub = Gcloud.pubsub
subscription = pubsub.subscription "my-topic-sub"

messages = subscription.pull immediate: false

An ApiError with status UNAVAILABLE is raised if no messages become available
within a reasonable amount of time.

@blowmage
Copy link
Contributor

@remi

the default behavior is indeed to return immediately

My fault!

I got this impression when I read "Results can be returned immediately with the :immediate option".

Not your fault at all. That is confusing. We have corrected that in PR #192.

Node also has an autoAck option, which I think would be useful.

I opened issue #81 for auto ack, but no discussion ever came up around it. It is problematic though, I'm personally not sure if we should encourage folks to ack a message before it is handled. Could we move this discussion to that issue?

Is pubsub.subscription "name", immediate: true actually a typo here? Project#subscription doesn't take options 😄

Ugh, yes, we found that earlier today. So embarrassing...

How about something like this for the Pulling Messages section?

That is actually very close to the updated docs in #192.

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 17, 2015

@blowmage

We've been working on creating a subscription in one project for a topic in a different project. We actually think the following code works right now, with no code changes:

require "gcloud/pubsub"

pubsub = Gcloud.pubsub "project-a", "/path/to/keyfile.json"
# This object holds the connection to the project
puts pubsub.project #=> "project-a"

# We can get a topic in another project (as long as we have permissions)
topic = pubsub.topic "projects/project-b/topics/my-topic"
# When creating a subscription, it uses the original connection and project
subscription = topic.subscription "my-sub"  
puts subscription.name #=> "projects/project-a/subscriptions/my-sub"
puts subscription.topic #=> Topic("projects/project-b/topics/my-topic")

However, we don't know how to test it. Specifically, we can't figure out how to grant a service account in the first project permissions to the second project. We would really like to verify this works before adding documentation for this, even if it is a manual test and not automated. Can you point us in the right direction for managing these permissions?

For now, the naive way to do this is to add the service account (or accout) to the other project with an Editor or greater permission.

Soon (it is actually in production, but no documentation), you can set finer grained ACL on topic/subscriptions.

First, look at the gmail push notification example at (Grant publish rights on your topic):
https://developers.google.com/gmail/api/guides/push

In this case, you're authorizing the service account owned by gmail team, to publish messages to that topic. The role "roles/pubsub.publisher" allows it.

As you see in this example, now you can attach various fine grained roles on your topics/subscriptions by calling setIamPolicy method.

If you want to create a subscription in project-a, subscribing to a topic in project-b, you can use the role "roles/pubsub.subscriber". In this case, the example will become:

POST "https://pubsub.googleapis.com/v1/{resource=projects/project-b/topics/mytopic}:setIamPolicy"
Content-type: application/json

{
  "policy": {
    "bindings": [{
      "role": "roles/pubsub.subscriber",
      "members": ["serviceAccount:your-service-account@developer.gserviceaccount.com"],
    }],
  }
}

For now, there is not handy tool for managing the ACLs, but you can use the API Explorer at:
https://developers.google.com/apis-explorer/#p/pubsub/v1/

topic = pubsub.topic "projects/project-b/topics/my-topic"

Does the above code issue an API call to Cloud Pub/Sub?
If so, that's little bit problematic, because the role "roles/pubsub.subscriber" only allows subscribing (the user can not get/list topics).

@blowmage
Copy link
Contributor

@tmatsuo

For now, the naive way to do this is to add the service account (or accout) to the other project with an Editor or greater permission.

Thanks! We will work on adding cross-project permissions first thing Monday morning. Hopefully we can demonstrate this working without any code changes.

topic = pubsub.topic "projects/project-b/topics/my-topic"

Does the above code issue an API call to Cloud Pub/Sub?
If so, that's little bit problematic, because the role "roles/pubsub.subscriber" only allows subscribing (the user can not get/list topics).

This does not make an API call to Cloud Pub/Sub. If you were calling pubsub.get_topic it would, but pubsub.topic does not.

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 18, 2015

@blowmage

This does not make an API call to Cloud Pub/Sub. If you were calling pubsub.get_topic it would, but pubsub.topic does not.

Great :)

topic = pubsub.topic "projects/project-b/topics/my-topic"

I think it is cleaner to have the project parameter as follows:

topic = pubsub.topic "my-topic", project: "project-b"

Sorry if I'm writing wrong code, but what do you think?

@blowmage
Copy link
Contributor

@tmatsuo

I think it is cleaner to have the project parameter as follows:

topic = pubsub.topic "my-topic", project: "project-b"

We will enable this syntax in #192 today.

@blowmage
Copy link
Contributor

@tmatsuo

If you want to create a subscription in project-a, subscribing to a topic in project-b, you can use the role "roles/pubsub.subscriber". In this case, the example will become:

POST "https://pubsub.googleapis.com/v1/{resource=projects/project-b/topics/mytopic}:setIamPolicy"
Content-type: application/json

{
  "policy": {
    "bindings": [{
      "role": "roles/pubsub.subscriber",
      "members": ["serviceAccount:your-service-account@developer.gserviceaccount.com"],
    }],
  }
}

I see that getIamPolicy/setIamPolicy are listed in the API Explorer, but they are not listed in the API Documentation. Should gcloud-ruby expose this functionality?

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 20, 2015

@blowmage
Great point. We will add documentations for these method soon, and they will become official. So it's good to have it in gcloud series anyways.

@blowmage
Copy link
Contributor

@tmatsuo Okay, we will add support for getting and setting the policy. We also see testIamPermissions, is that something we should add as well?

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 20, 2015

@blowmage
Yes, but on the second thought, it is hard for you to support correctly without the official documentation, because you can not tell what values are valid for the role.

I will update here once we have done with the documentation.

@blowmage
Copy link
Contributor

@tmatsuo We are adding methods to get and set the policy on topic and subscription, using the service definition as our guide. Right now they are returning and accepting a hash of values which match what the Google API client is using. We can either document the hash structure in the gcloud-ruby documentation, or we can mark these methods hidden so they don't show up in the docs until you have the main documentation settled. Which would you like us to do?

(The methods have been helpful in setting policies for testing, so we would like to keep them if possible.)

@tmatsuo
Copy link
Contributor

tmatsuo commented Jul 20, 2015

Either is fine :)

@blowmage
Copy link
Contributor

Thanks for your valuable feedback! Please open new github issues for future requests. ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: pubsub Issues related to the Pub/Sub API. type: question Request for information or clarification. Not an issue.
Projects
None yet
Development

No branches or pull requests

5 participants