Dynamic semaphore initialization from ticket quota #98

Closed
dalehamel opened this issue Feb 8, 2017 · 4 comments · Fixed by #120

Comments

dalehamel (Member) commented Feb 8, 2017

The gist of what is proposed here is to allow us to eliminate the assumption that there are a fixed number of workers (resource consumers) on a particular host. In a more dynamic scheduling environment (think: Kubernetes), we cannot be certain of the number of resource consumers on a given host.

This is problematic, because under the current model we would have a ticket quota of a fixed size for a fixed number of workers. For illustration:

  • Assume we have a resource that permits 5 tickets (T) -> T = 5
  • Assume we have 10 workers (W) -> W = 10
  • In this case, only half of the workers may access the resource at a time -> Q = T / W = 0.5

So, since W is no longer static, we need T to react to it in order to preserve Q at 0.5.
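To make the relationship concrete, here is a tiny worked example in plain Ruby (nothing Semian-specific; the QUOTA constant and the loop are purely illustrative):

```ruby
# Holding the quota Q fixed at 0.5, the ticket count T must track W.
QUOTA = 0.5

[10, 4, 20].each do |workers|
  tickets = (workers * QUOTA).floor
  puts "W=#{workers} -> T=#{tickets} (Q=#{tickets.to_f / workers})"
end
# Output:
# W=10 -> T=5 (Q=0.5)
# W=4 -> T=2 (Q=0.5)
# W=20 -> T=10 (Q=0.5)
```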

The proposed implementation is as follows:

  • Have a new semaphore, based on the maximum number of tickets, that tracks the tickets per worker (it needs to be unique per process or thread - probably the parent pid or pid_threadid). This would be the "quota semaphore", or "worker quota semaphore", tracking the number of worker tickets that have been issued. As new, unique workers are added, we decrement this value; as they are removed, we increment it.
  • The difference between the quota semaphore and the configured global maximum is the number of workers participating in the quota. This allows us to keep track of W.
  • As we update the quota semaphore, we can dynamically update the number of available RPC tickets (T) in order to maintain the desired quota (Q). (A sketch of this flow follows the list.)
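A rough, in-memory sketch of this bookkeeping: in the real implementation these counters would live in the resource's SysV semaphore set (with SEM_UNDO reversing a registration when a worker dies), but here they are plain Ruby integers so the sketch runs on its own, and none of the names are Semian's actual API.

```ruby
# Illustrative only: the quota semaphore starts at a configured maximum and is
# decremented once per registered worker, so W is the difference between the two.
class QuotaBookkeeping
  MAX_WORKERS = 1024 # stand-in for the configured global maximum

  attr_reader :tickets

  def initialize(quota:)
    @quota = quota
    @quota_semaphore = MAX_WORKERS # decremented once per registered worker
    @tickets = 0
  end

  def register_worker
    @quota_semaphore -= 1 # a real worker would do a SysV semop here
    update_ticket_count
  end

  def unregister_worker
    @quota_semaphore += 1 # SEM_UNDO would effectively do this on worker death
    update_ticket_count
  end

  # W = configured maximum - current value of the quota semaphore
  def worker_count
    MAX_WORKERS - @quota_semaphore
  end

  private

  # T = Q * W (the rounding strategy is discussed later in the thread)
  def update_ticket_count
    @tickets = (@quota * worker_count).floor
  end
end

bookkeeping = QuotaBookkeeping.new(quota: 0.5)
10.times { bookkeeping.register_worker }
bookkeeping.worker_count # => 10
bookkeeping.tickets      # => 5
```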

A nuance to this is:

When workers unregister themselves (they're killed, or they stop and SEM_UNDO does its thing), the worker count needs to be adjusted by something. We can cache the worker count in a semaphore in the semaphore set for the resource. On #acquire, if it differs from the live count, we call update_ticket_count. For this reason it seems better to do this at #acquire time rather than #register time.
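One way to picture that check, with plain instance variables standing in for the cached worker-count semaphore and the live count (none of these names are Semian's API; it is a sketch of the idea only):

```ruby
# Sketch of the #acquire-time refresh: compare the cached worker count with
# the live one and recompute the ticket count only when they differ.
class AcquireSketch
  attr_reader :tickets

  def initialize(quota:)
    @quota = quota
    @cached_worker_count = 0 # stand-in for the cache semaphore in the set
    @tickets = 0
  end

  def acquire(live_worker_count)
    if live_worker_count != @cached_worker_count
      update_ticket_count(live_worker_count)
      @cached_worker_count = live_worker_count # write back to the cache
    end
    # ...then proceed with the normal ticket wait/decrement...
    @tickets
  end

  private

  def update_ticket_count(workers)
    @tickets = (@quota * workers).floor # thread-safe in the real implementation
  end
end

sketch = AcquireSketch.new(quota: 0.5)
sketch.acquire(10) # => 5, counts differ, ticket count refreshed
sketch.acquire(10) # => 5, cache matches, nothing recomputed
```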

dalehamel (Member, Author) commented:

Had a chat with @sirupsen and I think that the approach we've settled on boils down to:

Have a new semaphore based on the maximum number of tickets that tracks the tickets per worker (it needs to be unique per process or thread - probably the parent pid or pid_threadid).

The difference between this and the configured global maximum is the number of tickets currently available for that resource.

Based on updates to this value, we can dynamically update the number of available RPC tickets as some fraction (the quota) of the worker count. If the floor of that result differs from the previous floor, do a thread-safe update of the ticket count.
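A minimal sketch of that rule, using a Mutex as a stand-in for whatever synchronization the real update would need (names are illustrative, not Semian's API):

```ruby
# Only update the ticket count when the floor of quota * workers changes.
class TicketUpdater
  attr_reader :tickets

  def initialize(quota:)
    @quota = quota
    @lock = Mutex.new
    @tickets = nil
  end

  # Called whenever the tracked worker count (W) changes.
  def workers_changed(worker_count)
    desired = (@quota * worker_count).floor
    return @tickets if desired == @tickets # floor unchanged, nothing to do

    @lock.synchronize { @tickets = desired } # thread-safe ticket update
  end
end

updater = TicketUpdater.new(quota: 0.5)
updater.workers_changed(10) # => 5
updater.workers_changed(11) # => 5 (floor(5.5) is still 5, no update)
updater.workers_changed(12) # => 6 (ticket count adjusted)
```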

sirupsen (Contributor) commented Feb 8, 2017

@csfrancis @casperisfine any comments on this approach? Namely the newest comment from Dale

dalehamel (Member, Author) commented:

@csfrancis asked me:

in k8s are sysv semaphores shared across the entire physical host? that seems strange

and this brings up a point of clarification: the reason any of this is necessary is that we have to use hostIPC for logging. Because we are logging to SysV MQ, we are forced into using the host IPC namespace.

sirupsen (Contributor) commented Feb 8, 2017

K, @csfrancis and I had a short conversation and he's onboard with the solution of dynamically adjusting ticket counts. Some excellent points Scott made:

  • We should call the configuration option quota, not tickets, to avoid confusing the two. ArgumentError should be raised if both are set (sketched after this list).
  • When workers unregister themselves (they're killed, or they stop and SEM_UNDO does its thing), the worker count needs to be adjusted by something. We can cache the worker count in a semaphore in the semaphore set for the resource. On #acquire, if it differs from the live count, we call update_ticket_count. For this reason it seems better to do this at #acquire time rather than #register time.
  • A problem with the alternative approach of Semian["#{Process.ppid}_#{Thread.id}_#{resource_name}"] that Scott pointed out is that you'll basically have to GC it. By default, the limit on the number of semaphore sets is 32,000 on Linux. We have about 100 resources at Shopify. If we run, say, 10 pods per host, it'll take 32,000 / (100 * 10) = 32 deploys before we exhaust this space.
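The first point could look roughly like this at the configuration boundary (a sketch only; the real option handling would live in Semian's resource registration code, and the range check on quota is an assumption, not something decided in this thread):

```ruby
# Sketch of the proposed option validation: accept either a static ticket
# count or a quota fraction, and raise ArgumentError if both are given.
def validate_ticket_options(tickets: nil, quota: nil)
  if tickets && quota
    raise ArgumentError, "cannot set both `tickets` and `quota`"
  end
  if quota && !(quota > 0 && quota <= 1.0)
    # Assumed sanity check, not specified in the thread.
    raise ArgumentError, "`quota` must be in (0, 1]"
  end
  { tickets: tickets, quota: quota }
end

validate_ticket_options(quota: 0.5)             # => { tickets: nil, quota: 0.5 }
validate_ticket_options(tickets: 5, quota: 0.5) # raises ArgumentError
```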
