Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsqd: channel sampling #286

Merged
merged 1 commit into from
Dec 18, 2013
Merged

nsqd: channel sampling #286

merged 1 commit into from
Dec 18, 2013

Conversation

elubow
Copy link
Contributor

@elubow elubow commented Dec 15, 2013

This is still a work in progress, but I think a few things merit discussion before finishing.

The basics are done here for adding sampling through the IDENTIFY. I added a channelProperties struct like in the original commit to store meta information about the channel. The reason I added this again was for new clients. This way when a new client connects, if they have a different sampleRate from the another client on the same channel, either we can choose to throw an error (so all clients maintain the same sample rate) or we can allow it.

I believe @mreiferson was suggesting having the channel have no knowledge of a client's sampling rate. I'm not sure if this is a good idea. Take for example, clientA sampling only 10% of the messages on channelA and clientB sampling 50% of the messages on channelA, what would we expect each client to get? 60%, 40%, 10% (because that client came in first), 50% because the client came last. I think it's not obvious what is to be expected by the client in this scenario. I like the idea of setting channelProperties (using the struct) because can then subscribe to the channel and be "upgraded" (or downgraded depending on how you view it) to sampling or to the correct samplingRate if the channel is already being sampled.

@mreiferson
Copy link
Member

It is my opinion that sampling is a property of the client. This is similar to the state we're in now where sampling can only be implemented in the client by dropping messages on the floor.

My proposal for this implementation is to move this to an equivalent-in-semantics, server-side feature that simply makes everything more efficient (which is often one of the goals of sampling in the first place).

For the record, as things stand today, there exists no guarantee that 2 clients subscribed to the same channel will receive 50/50 distribution. It's likely that they will, but there's nothing that guarantees it.

I think the claim that this approach makes it possible that two clients can be subscribed to the same channel with different sampling rates is equivalent to the likelihood of subscribing to the wrong channel, i.e. not likely. I acknowledge that it's obviously possible, but I believe naming convention and an improved nsqadmin can mitigate and uncover this.

@mreiferson
Copy link
Member

❤️ it

I know you're not going to like this old friend, but we need a test :)

@elubow
Copy link
Contributor Author

elubow commented Dec 17, 2013

The test has been added and tests for sampling within a 3% variance of the sample rate. The final verdict was that this is being done through the IDENTIFY and the message is dropped at the client's messagePump.

@elubow
Copy link
Contributor Author

elubow commented Dec 17, 2013

The last commit was an addition assertion to ensure we've seen a minimum number of messages. The original test only checked for variance above the sample rate not below.

@mreiferson
Copy link
Member

I'm happy with this...

any objections @jehiah ?

@jehiah
Copy link
Member

jehiah commented Dec 18, 2013

I'm good.

outputBufferTimeoutUpdateChan = nil
case interval := <-heartbeatUpdateChan:
client.Heartbeat.Stop()
if interval > 0 {
client.Heartbeat = time.NewTicker(interval)
}
// you can't update heartbeat anymore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you undo these unrelated changes that I had added from this line up to 226?

I want it in a separate commit with an explanation as to why it's happening... I will follow up with a separate PR.

mreiferson added a commit that referenced this pull request Dec 18, 2013
@mreiferson mreiferson merged commit deabf26 into nsqio:master Dec 18, 2013
@elubow elubow deleted the channel_sampling branch December 18, 2013 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants