[Feature Request] Asynchronous Concurrency Limiting #105
Thank you for your interest in OpenFaaS. This project is maintained and made available for hobbyists and commercial users alike, so we need to balance our time across everyone's needs. Whilst we are excited about your interest in using OpenFaaS, we would also ask you to take a look at our contribution guide on Setting expectations, support and SLAs. Commercial users can purchase support in order to get dedicated help from OpenFaaS Ltd, or they can book ad-hoc consulting hours to get an engineer to dedicate time to helping them. If that is not a good fit for you at this time, please check out the OpenFaaS GitHub Sponsors options, which are priced for practitioners like yourself. Organisations can also sponsor through their GitHub billing relationship. When you become a sponsor as an individual, this will show on your issues and PRs, so that the community can see that you are supporting our work and can prioritise your needs. Thank you for supporting OpenFaaS. |
Thanks for the suggestion @kevin-lindsay-1 - we will prioritise your request on the roadmap and have a look into it, in due course. |
Hi, just wanted to chime in as I'm also running into this exact scenario. One thing I noticed is that whatever is load balancing the requests to the function pods does not always send the request to an idle pod. For example, if I have 2 pods of a function with It may be worthwhile to implement some kind of monitor for when a function pod is "at capacity", i.e. handling |
Sounds like you may not have direct_functions set to false; when it's true, the gateway uses a Kubernetes Service instead of the endpoint load-balancing logic in faas-netes. See the guide on production use in the docs and upgrade to the latest version of the chart. @derekyyyan
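(If it helps, a minimal sketch of switching that setting via the Helm chart; this assumes it is exposed as gateway.directFunctions, so check the values for your chart version.)
# false = the gateway uses faas-netes' endpoint load-balancing across pod IPs
# true  = the gateway calls the function's Kubernetes Service (useful with a service mesh)
helm upgrade openfaas --install openfaas/openfaas \
  --namespace openfaas \
  --set gateway.directFunctions=false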
I can set direct_functions to false. I'm also not using Linkerd or Istio. I'll try replicating the problem when I get the chance, but here are my Helm chart settings in case you can spot something I'm doing wrong. I'm on version 7.1.0, so it should be mostly the same as the latest (7.2.0).
values.yaml:
functionNamespace: openfaas-fn
httpProbe: false
psp: true
ingress:
enabled: true
annotations:
kubernetes.io/ingress.class: nginx-private
cert-manager.io/cluster-issuer: letsencrypt-production
tls:
- hosts:
- <redacted>
secretName: openfaas-tls-cert
hosts:
- host: <redacted>
serviceName: gateway
servicePort: 8080
path: /
faasnetes:
httpProbe: false
imagePullPolicy: Always
setNonRootUser: true
gateway:
replicas: 2
queueWorker:
replicas: 10
ackWait: 1m
faasIdler:
create: false |
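(For reference, a values file like the one above would typically be applied with something like the following, assuming the chart is installed from the openfaas Helm repo.)
helm repo add openfaas https://openfaas.github.io/faas-netes/
helm repo update
helm upgrade openfaas --install openfaas/openfaas \
  --namespace openfaas \
  -f values.yaml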
@derekyyyan this is a different topic to the one that Kevin opened, so we are going off piste here and you should raise your own ticket. However, could you show us how to reproduce your hypothesis that endpoint load-balancing isn't working for you?
What makes you think this is the case? We've done quite extensive testing and see load-balancing taking place aggressively. I'm including output to show it happening as expected. You are welcome to try the same commands and monitor the results.
faas-cli store deploy nodeinfo --label com.openfaas.scale.min=5 -g 192.168.0.26:31112
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-fjpf2
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-fjpf2
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-8nvll
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-5wt4c
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-47ghl
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-dlzt6
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-8nvll
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-dlzt6
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-47ghl
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-47ghl
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-dlzt6
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-5wt4c
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-5wt4c
alex@alex-nuc8:~$ curl -s http://192.168.0.26:31112/function/nodeinfo|head -n1
Hostname: nodeinfo-855949ff9f-dlzt6
alex@alex-nuc8:~$
Async:
|
@kevin-lindsay-1 to focus back on your original suggestion.
A retry has been implemented by at least one user that I know of, and makes for a more natural fit for async functions than synchronous because:
My concerns are:
|
@alexellis The way I see it, the fact that you get a Getting a Re-queueing an async invocation after a
That's a lot of boilerplate and workflow accommodation that in my opinion would be better served being part of the OF framework itself, adding another feature to make sure that things "just work". |
@alexellis I didn't mean to derail the original request - I'll create a separate issue if I get the chance to reproduce the behavior. Although if we get the original feature implemented, I won't have to worry about queue workers sending requests to a "busy" pod as they can simply retry later. |
Thanks for sharing your expectations Kevin. When you have some time, I'd appreciate a reply to the questions that I asked and the concerns on retrying. |
@alexellis Ah, sorry, I wasn't sure if those concerns were around the OF implementation or the user implementation.
Does
I use separate queues for my async functions already because the timing of specific jobs can get fairly difficult to configure with one global queue. A little too much mental overhead for something that's pretty easy to fix with multiple queues, considering how lightweight queue workers are.
Agreed, however if I recall correctly, the If this wasn't implemented in OF I would have to do this myself, and it would likely require the same if not more expensive state, because I don't have the kind of contextual information that something internal to OF would have, so I would have to more-or-less keep track of "everything" so that I can invoke the same function with the exact same headers and payload, as well as cleaning out such a stateful system.
I would say that this is a contract between the watchdog and the queue worker, and should therefore be only their collective concern; that is, the scope of HTTP codes shouldn't increase unless the flow changes. Application-level retry would maybe be a different feature request, but if that one day gets requested and implemented, a distinction between the watchdog and the function/handler itself may need to be made explicit in the HTTP request between the watchdog and queue-worker, if it's not already.
As far as I'm aware with async, watchdog should return a |
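(For context, a quick way to see the async acceptance path, assuming a gateway reachable on 127.0.0.1:8080 as in the demo below; the callback URL is just a placeholder.)
curl -i http://127.0.0.1:8080/async-function/nodeinfo \
  -H "X-Callback-Url: http://example.com/hook" \
  -d ""
# Expect an immediate "202 Accepted" with an X-Call-Id header; the function's
# own response is delivered later to the callback URL by the queue-worker.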
Feel free to try out this prototype in your cluster - it's valid until 7th March to play with.
# Deploy demo
kind create cluster
arkade install openfaas --max-inflight=5 --set queueWorker.image=alexellis2/queue-worker:retry2
# Deploy test function
faas-cli deploy --name sleep --image ghcr.io/openfaas/sleep:latest \
--env sleep_duration=5s \
--env write_timeout=10s \
--env max_inflight=1 \
--env combine_output=false \
--env read_timeout=10s \
--label com.openfaas.scale.min=1
# Monitor (Terminal 1)
kubectl logs -n openfaas deploy/queue-worker -f|grep -v Received
# Now test with (Terminal 2)
time curl -i http://127.0.0.1:8080/async-function/sleep -d "" && sleep 4 && time curl -i http://127.0.0.1:8080/async-function/sleep -d ""
# View the logs in Terminal 1
# You'll see the first request go in-flight; the second gets retried a few times before being processed.
# Now try again and cause congestion with hey, 10 requests, only 1 can run at once.
hey -c 1 -n 10 -m POST -d "" http://127.0.0.1:8080/async-function/sleep
# To tune the minimum number of replicas ready, add to the faas-cli deploy:
--label com.openfaas.scale.min=5
# To tune the concurrent connections to each function, update faas-cli deploy:
--env max_inflight=2
|
Will test |
@alexellis I have tested this, playing with the minimum number of pods, and it seems to work quite well. The only thing that I'm wondering about is that when the queue's Specifically, I set the flag |
Thanks for trying it with a basic scenario. OpenFaaS doesn't know anything about individual pods, endpoint IPs, or whether a pod is receiving more or less traffic than has been configured for its watchdog. Endpoints are picked at random. You'll need a service mesh if you want "least connections" as the semantics. Even then, you will see some retries. Perhaps you can try it with a staging workload now to see how well it processes your specific needs? The max retry limit is 10 attempts with 100 or 500ms as the delay for the backoff. |
I ran a quick test; this is what I got:
|
@alexellis, thank you for making a prototype so quickly. I was able to replicate what @kevin-lindsay-1 reported: I set This behavior is what I was originally referring to, sorry if I failed to explain clearly.
I see. I think having a basic retry will inevitably lead to inefficient use of pods, but for my use case this is acceptable, and I can try using a service mesh to improve efficiency. A couple more questions/requests:
|
The request was for asynchronous concurrency limiting, and that is what the preview / demo does. The ack time has to be set to at least the maximum function timeout plus the maximum backoff:
So if you have a maximum backoff of 32 seconds, and a maximum function timeout of 2 minutes, then you need to set your ACK time to 2m32s. Each message is either executed to completion, or finds itself facing a retry, then waits once, before being re-queued. This means that the ACK time is predictable and bounded. Retries are to be expected, that was the feature requested here. OpenFaaS does not track connections to pods, that's the job of a service mesh. Even with a service-mesh, you cannot guarantee that a retried message will hit a pod with open capacity, it can however help that request hit a pod with fewer connections. There are two mitigations for any concern about "a queue worker being clogged up" as such:
Remember that we are using a very contrived example here, of one function which can only process one request at once. The 429 messages could be used to trigger an auto-scaling alert; I'm aware of one or two users who do this. My assumption is that, if the work is completed and the concurrency is handled in a sane, bounded way, then it is irrelevant whether there is capacity in one of five pods and a retry still happens. Show us with real data why that matters. Show us what happens when you add a service mesh. Whatever the workload, some tuning and experimentation will be required. This is a specialist case and I would like to see some real usage from the demo before considering further changes or features. |
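(As a concrete sketch of the ACK time calculation above, using the queueWorker.ackWait chart setting and the watchdog timeouts shown earlier; the function name, image and values here are illustrative.)
# ack_wait >= maximum function timeout + maximum backoff
# e.g. 2m function timeout + 32s maximum backoff => 2m32s
arkade install openfaas \
  --max-inflight=5 \
  --set queueWorker.ackWait=2m32s
faas-cli deploy --name long-task --image <your-image> \
  --env max_inflight=1 \
  --env exec_timeout=2m \
  --env read_timeout=2m10s \
  --env write_timeout=2m10s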
While I go do some load testing, a couple more clarifying questions:
What happens if a message has been retried for the max # of times? Are you saying that it gets put back into the queue? Or does the queue worker give up and send a 429 as the official response to the callback URL?
If
I agree that my use case is niche (and perhaps Kevin's too), but I definitely will be using this demo, as the original behavior where the queue worker discards messages when function pods are busy simply doesn't work for me. The extra requests from my previous comment are nice-to-have's, but I could live without those. |
I imagine the latter, but it would be nice to know this explicitly as part of what I assume will be in a follow-up PR to the documentation.
I use a different queue for each async job because of fine tuning like this (mainly to remove ambiguity), but just looking at your description of the setup I'd say that unless you have a pool of other async jobs that are going to be queued at the same time, you probably shouldn't specifically tell the system to effectively overload the pods. You can likely remove this issue by autoscaling and adding additional pods to handle the workload. |
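(A sketch of that separate-queue setup, assuming the com.openfaas.queue annotation for named queues; the function name, image and queue name are made up.)
faas-cli deploy --name video-encode --image <your-image> \
  --annotation com.openfaas.queue=encode \
  --env max_inflight=1
# A dedicated queue-worker subscribed to the "encode" queue then processes these
# jobs, so slow invocations don't hold up functions on the default queue.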
That actually sounds like a really elegant way to autoscale; I'll need to test that out.
From the testing I've done this is a major step and I'm happy enough with the current behavior; I'll need to eval it over time and see if I've got any pain points, but as it stands I don't currently have any tangible items that would make me think that this solution isn't currently satisfactory. Can't think of any additional things that this feature needs right now. As long as this feature exposes tuning parameters, I'm sure I can take it a long way before encountering any new needs. |
One note on that: if Example:
This can likely be mostly, if not entirely, mitigated by handling |
I've done some load testing, all with 100 requests and a sleep duration of 10s, and using Request Bin to record callbacks:
|
I think the ideal solution would be to dynamically scale either |
@derekyyyan you are missing Do you have a use-case for this request yet @derekyyyan, or is this mostly academic? I'm asking because you will need to do some testing, and tuning depending on your specific use-cases, probably at a queue-worker level. Where you quote a "36% decrease in efficiency" - this is a specific example of where tuning would be beneficial. One replica processing 100 requests with a concurrency level of 5 doesn't make a lot of sense. The demo that I published doesn't have configuration for the backoff delay, or the maximum number of retries. It also doesn't support dead-letter behaviour, like you mentioned. |
For scaling up, try something like:
- alert: APIHighRetryRate
  expr: sum(rate(gateway_function_invocation_total{code="429"}[10s])) BY (function_name) > 1
  for: 5s
  labels:
    service: gateway
    severity: major
  annotations:
    description: High retry rate on "{{$labels.function_name}}"
    summary: High retry rate on "{{$labels.function_name}}"
Run
This is what it looks like, but beware that the number of 429s will change over time, starting quickly and then backing off as the exponential retry time goes up. |
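(To watch the retry volume directly, the same metric used in the alert above can be queried in Prometheus; the function name filter is illustrative.)
# Per-function rate of 429 (retry) responses over the last minute
sum(rate(gateway_function_invocation_total{code="429"}[1m])) by (function_name)
# Completed vs retried invocations for a single function
sum(gateway_function_invocation_total{function_name=~"sleep.*"}) by (code)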
Hi, My exposure to At the time, my function logged the Now that we're here (I see this mentioned above) the use case I am thinking of is:
For function retries, It seems like my options are:
I am currently leaning towards option 3 to do function retries, using LinkerD to do the retry for me. For example, if my function passes the Also, I'll try to test cold starts in a scale from zero setting. I'm not sure what to expect there...if the pod isn't alive, the watchdog won't be able to respond, right? So in that case, the queue worker I am guessing would return |
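(For option 3, a rough sketch of what the Linkerd side might look like: a ServiceProfile marking the invocation route as retryable. The service name, route and budget values are assumptions rather than a tested config, and Linkerd applies its own rules about which requests it will actually retry.)
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: my-function.openfaas-fn.svc.cluster.local   # service FQDN is illustrative
  namespace: openfaas-fn
spec:
  routes:
  - name: invoke
    condition:
      method: POST
      pathRegex: /
    isRetryable: true        # ask Linkerd to retry failed calls on this route
  retryBudget:
    retryRatio: 0.2          # allow up to 20% extra load from retries
    minRetriesPerSecond: 10
    ttl: 10s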
Just went through the demo more, and I better understand how these settings are used for tuning. I didn't get a chance to do cold start testing, or to test with Linkerd, but will do so later today.
I'm guessing the variance is due to picking pods at random, instead of free ones? I'll see if the Linkerd TCP connection balancing makes that go away, like you mentioned, Alex. |
Thank you for sharing those results. There are quite a few variables here including the average duration of a function execution. For Linkerd, just make sure that you have disabled endpoint load-balancing so that it can do its thing (direct_functions=true) |
Yes, totally. Now that I've dug in only a little, I see more knobs to turn and levers to crank... there's a lot!
Appreciate the reminder regarding direct functions.
I'm logging and visualizing using the Loki stack. I took some screenshots of the exponential backoff tapering... it was nice to visualize when there was more load.
|
Wow, it's very balanced. OpenFaaS is chewing through the queue and the demand is spread out evenly amongst the sleep function pods. Granted, I am using the sleep function, so there's no variance with each request. But still... cool. Thanks for sharing a demo! |
That sounds even better than expected. Do you have a similar number to share, like your previous test, to compare with? What about some of those screenshots? |
@alexellis as it pertains to the prometheus rule for Here are some arguments I might expect against such a change:
To me, it seems that concurrency limits are effectively enforced even if not explicitly provided, by pod CPU/mem limits. I would argue that anyone creating a function should be aware of their function's limits, and since limits have reasonable defaults for CPU and mem, I would argue that concurrency should probably also have a reasonable default, because eventually, a pod is likely to crash if it takes on too many requests, which is kind of just another form of concurrency limiting. Like a lot in OF, it's up to the person writing the function to know how CPU and memory intensive the function is, and as such should probably be expected to know how many invocations it can handle at once before hitting said limits.
A potential issue with this approach, to me, is that if you don't limit concurrency, you could crash the pod, which I would argue is worse than having to retry, especially if your function is stateful or non-idempotent. Even though those issues would be caused by not following best practices, I would argue that if OF can protect users from such a case, it probably should, because if a developer isn't willing to retry a function, who's to say they're willing to write stateless or idempotent functions? Additionally, there are likely tricks that could be used to lower the likelihood of receiving a 429 in sync invocations, but I'm not sure that's necessary, as you can auto-retry with most HTTP tools with minimal effort. As a side effect of returning 429s by default, it would improve visibility: a 429 is easy, if not automatic, to identify, whereas an error caused by a pod crashing is much more difficult to identify and troubleshoot. Another likely beneficial side effect might be that having the user be explicit about their concurrency limits will probably also cause them to think about tuning their pod's limits, which might lead users to go for many small pods rather than few large pods when they might not otherwise think about it. |
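(To illustrate pairing resource limits with a concurrency limit, a hypothetical stack.yml entry; the function name, image and numbers are made up.)
functions:
  resize-image:
    image: <registry>/resize-image:latest
    environment:
      max_inflight: 4      # of-watchdog answers the 5th concurrent request with a 429
    requests:
      memory: 128Mi
      cpu: 100m
    limits:
      memory: 256Mi
      cpu: 500m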
Any updates/plans? I encountered the same scenario. Until then, I’ll fork this repo & re-enqueue |
@talhof8 as far as I am aware, resolving this issue would probably require a slightly different set of features, and it is only waiting to be prioritized (given enough demand) and incubated. |
@alexellis I notice that this is in the helm chart, but it doesn't appear to be configurable. Do you think it would be reasonable to create an issue for this? I'm thinking of making a PR for this, but I'd like to at least offer a discussion if you think it would be a good idea to make other things configurable in here. Perhaps as a separate issue, it might also be nice to be able to choose the alert used for autoscaling, so that you can have one function use a certain rule and another use a different rule, which would have a number of advantages, such as safely testing scaling behavior. |
I would be happy to see an issue. Thanks for commenting. This thread can probably be closed due to the release of the pro queue-worker |
Agreed |
@derekyyyan @talhof8 you may want to take a look at what we did here: https://docs.openfaas.com/openfaas-pro/retries/ |
My actions before raising this issue
Expected Behaviour
If you set max_inflight on an async function's of-watchdog, you might naturally assume that a queue worker will handle the case when it tries to invoke a function on a pod that has reached its max_inflight, and would therefore send the request to a different pod, if possible.
Current Behaviour
If you use a sync function and set of-watchdog.max_inflight on an of-watchdog pod, and invoke that function above that amount (assuming 1 pod), you will as expected get 429s letting you know that you've sent more requests than you've configured to be allowed to be worked on per pod.
However, the same behavior exists when using async functions; if you set queue.max_inflight=5 and of-watchdog.max_inflight=1, and the queue worker attempts to send a request to a pod that already has 1 invocation in progress, it receives the same 429 and forwards it to the gateway as your official response. I propose that a queue worker would instead retry on 429s, or use a similar retry mechanism (as a 429 is a completely valid response for functions at the moment).
Possible Solution
Having a queue worker retry to a different pod (if possible), potentially with incremental backoff.
I’m also open to a completely different mechanism, and willing to implement.
Context
Having a queue worker retry if it attempts to invoke a pod that’s already hit its maximum concurrent invocations would probably be the last technical hurdle for me to get async autoscaling working well for “long-running” functions, and therefore I'd say this issue is related to openfaas/faas#657.
Your Environment
FaaS-CLI version (faas-cli version): 0.12.21
Docker version (docker version, e.g. Docker 17.0.05): 20.10.2