
Keep functions open #114

Closed
gijswobben opened this issue Aug 4, 2017 · 12 comments
@gijswobben

Is it possible to keep a function running? The initialization of my function takes a long time so I have a delay in every call I make. It would be nice if my function could stay open and read every new line that is coming in via STDIN.

@alexellis
Member

alexellis commented Aug 4, 2017

Hi @gijswobben thanks for testing out FaaS. It would be great if you could fill out the issue template which was provided when you opened the issue.

Please let us know some more details

  • Runtime (Node/Python/Go etc)
  • Backend - Swarm or Kubernetes
  • How long it takes to run your requests

Also if you can share code or push it to a private/public repo that would help.

@gijswobben
Author

The template didn't seem applicable as this is a question (not an issue). Everything is working fine; I was just wondering if I could speed things up. The initialization of my script cannot run any faster because it has to load some libraries and models from disk. After that I can use a very fast function to do something. I'd love to cache the libraries and models so my function can execute faster.

Runtime -> R (or Python) for data science (XGBoost model)
Backend -> Swarm
Function run time when all libraries are loaded 33ms, initialization time 1.5s

@alexellis
Member

alexellis commented Aug 4, 2017

The function watchdog works by forking a process, reading the entire request body, and piping it to stdin; anything written to stdout is returned to the caller. So reusing the process as-is would not be possible, however there are some work-arounds that other people are using:

  • Batching - Instead of calling the function 1k times with 1 piece of data each time, call the function with a batch of 1k data items. It may involve a helper bash script
  • Use the asynchronous work - This model is perfect for TensorFlow and machine learning - since you can run the requests for much longer - https://gist.github.com/alexellis/62dad83b11890962ba49042afe258bb1
  • Implement the watchdog interface - As long as you expose a port on 8080 and act like the watchdog FaaS will still manage your container, track metrics and scale it. You could use flask for this.
  • Create a microservice - Create a microservice that is long running and let your function handle the ingestion of data - passing it on to the (XGBoost / R service) and returning the results.
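The third option above (implementing the watchdog interface) could be sketched roughly like this with Flask. This is a hypothetical illustration, not the actual watchdog code: `load_model` is a placeholder for the slow start-up work, and the point is that it runs once per container rather than once per request.

```python
# Hypothetical sketch of the "implement the watchdog interface" option:
# the container listens on port 8080 like the watchdog would, but the
# expensive initialisation runs only once, at startup.
from flask import Flask, request

app = Flask(__name__)

def load_model():
    # Placeholder for the slow one-time start-up (loading libraries/models).
    return {"ready": True}

model = load_model()  # runs once when the container starts

@app.route("/", methods=["POST"])
def handler():
    # Fast per-request work using the already-loaded model.
    data = request.get_data(as_text=True)
    return f"processed {len(data)} bytes"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Because the process stays alive between requests, every call after the first pays only the per-request cost, not the initialisation cost.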

@alexellis
Member

Hi, @gijswobben it's been 16 days since my response so I'll close this issue. Please do re-open if you need to, or join us on Slack to chat about any of the solutions outlined above.

@alexellis
Member

@gijswobben I've had a chance to revisit this item and I think I have a technical approach for helping with this. Did you try any of the outlined solutions? Do you have an example online somewhere?

@gijswobben
Author

I've tried the first and the second approach. The last two are not an option for me; in my opinion they introduce too much overhead code for something that should be a "function as a service"...

The approaches I did try work, but there is still a lot of overhead. I'm dealing with large quantities of data so even with the batching approach there is a lot of overhead. I don't have an example online since I'm developing this for a customer...

I'm curious to hear your solution. I think the only way to go here is to keep the function open and direct all new requests to this function, reading and writing over STDIN and STDOUT line by line. However, this solution would require quite some changes to watchdog.
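The line-by-line mode suggested here could look something like the following sketch. This is a hypothetical illustration of the idea, not the actual watchdog protocol: the process initialises once, then serves one request per stdin line.

```python
import sys

def slow_init():
    # Placeholder for the expensive one-time start-up
    # (e.g. loading an XGBoost model from disk).
    return {"loaded": True}

def handle(line, model):
    # Fast per-request work; a stand-in transformation for illustration.
    return line.strip().upper()

def main():
    model = slow_init()          # paid once, when the process starts
    for line in sys.stdin:       # one request per input line
        sys.stdout.write(handle(line, model) + "\n")
        sys.stdout.flush()       # flush so the caller sees each reply immediately

if __name__ == "__main__":
    main()
```

The explicit `flush()` matters in this model: without it, replies would sit in the stdout buffer and the caller would not see them line by line.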

@alexellis
Member

alexellis commented Aug 30, 2017

Regarding 1 & 2 - batching data and running async would make a huge dent in the issue. The asynchronous work was merged and is available in 0.6.2. https://github.com/alexellis/faas/releases/tag/0.6.2

I have made the changes to the watchdog (not yet available on GitHub) and did some testing - it's blisteringly fast. This is the JVM & Java - https://www.youtube.com/watch?v=gG6z-4a1gpQ

Join us on Slack and let's do some testing with the new approach? email alex@openfaas.com

@alexellis alexellis self-assigned this Aug 30, 2017
@alexellis alexellis reopened this Aug 30, 2017
@gijswobben
Author

Thanks, I'll give it a try

@alexellis alexellis mentioned this issue Aug 30, 2017
@gijswobben
Author

Performance is okay. Certainly not what I hoped for, but it gets the job done. What you show in your video is indeed a fast example; however, try adding a 15-second delay to the initialization of your function and running it again. The startup cost (in this case the delay) is so high that it will ruin your performance.

I think we can close this issue. Thank you very much for the effort and a workable solution!

@alexellis
Member

I think you might be going about it in a different way than intended.

If you need a 15-second boot-up then you need it; nobody can change that. But you can put it into the initialiser and not the handler 👍

Maybe you can explain what you mean?
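In a persistent-process runtime, the initialiser/handler split usually means doing the slow work at module load time so it is paid once per container rather than once per call. A minimal sketch (names are illustrative, not part of any template):

```python
# --- initialiser: runs once, when the module is first imported ---
def _load_model():
    # Stand-in for the slow 15-second boot-up (loading libraries/models).
    return {"weights": [1, 2, 3]}

MODEL = _load_model()

# --- handler: runs on every call, reusing the already-loaded MODEL ---
def handle(req):
    return sum(MODEL["weights"]) + req
```

Every call to `handle` reuses `MODEL`; only the first import of the module pays the loading cost.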

@gijswobben
Author

Totally agreed, I cannot avoid the 15-second boot-up. However, it would be nice if I didn't have to wait those 15 seconds on every new call. It would be great to have such an "initialiser": run the initialiser once, and then run the handler as many times as needed (once per call). However, looking through the documentation and code, there is no way to do such an initialisation right now, is there?

@alexellis
Member

No, but you could add an explicit initialisation step separately. We can adjust the template for that, or you could do something simple like writing a lock file to track whether the initialization has already run.
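The lock-file idea could be sketched like this: a marker file records that initialisation already ran, so a later invocation in the same container can skip the expensive step. The path and helper name below are hypothetical, not part of any FaaS template.

```python
import os
import tempfile

# Illustrative marker path; a real function might use a path on a writable volume.
LOCK_FILE = os.path.join(tempfile.gettempdir(), "init.lock")

def ensure_initialised():
    """Run the expensive set-up only if the marker file is absent.

    Returns True if initialisation ran, False if it was skipped.
    """
    if os.path.exists(LOCK_FILE):
        return False  # already initialised in this container
    # ... expensive one-time work (loading libraries/models) would go here ...
    with open(LOCK_FILE, "w") as f:
        f.write("done")
    return True
```

Note this only helps if the filesystem (or the process) survives between calls; in a fork-per-request model with a fresh filesystem each time, the check would always miss.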
