Proof of concept: reliable asynchronous processing #103

Closed
alexellis opened this Issue Jul 7, 2017 · 11 comments

3 participants
@alexellis
Member

alexellis commented Jul 7, 2017

Asynchronous processing should be possible for long-running functions.

Must have:

  • Work can be accepted through a new route or a header.

Via route:

/async/function/<function_name>

Or via header:

X-etc: async

  • Work is accepted immediately and a 202 Accepted is returned. This should be handed off to a queue.

  • One or more (scalable) asynchronous workers read from a queue and call functions

  • Workers should dequeue each item atomically

  • Upon failure, another worker should pick up the item

  • For the initial version, the worker should call the function over HTTP, just as the gateway does. The timeout will depend on the function's configuration.

  • Prometheus metrics to be logged for work queued/processed/outstanding

Could have:

  • Watchdog configuration to state whether async/sync is supported
  • Validation in gateway for invocation method
  • Retry logic on failure

Nice to have:

  • Additional logging beyond docker service logs
  • Callback URL could be specified via header or query-string - this could be called by the framework upon completion

Notes:

I have looked into Kafka; its design looks overly complex for the task at hand.
NATS queuing is not resilient, but NATS Streaming may be suitable.

@alexellis alexellis self-assigned this Jul 7, 2017

@Tofull

Tofull commented Jul 11, 2017

If you need a beta tester for asynchronous processing, I'm in!

@alexellis

Member

alexellis commented Jul 11, 2017

Thanks @Tofull - I've started a quick proof of concept with NATS Streaming.

Have you started using FaaS or creating functions already? Do you have async workloads ready for testing?

@Tofull

Tofull commented Jul 11, 2017

Amazing! You rock! :)

I used FaaS to deploy some functions that my machine learning experts made with magic.

FaaS works great for our prediction service, as that is a synchronous task.

Some processing functions need more time (training our models), so we would use async processing for those, and we already have a workflow ready for testing.

@alexellis

Member

alexellis commented Jul 11, 2017

That sounds like a great use-case. Is there anything you can share on a blog or on Twitter?

@Tofull

Tofull commented Jul 12, 2017

Once our work is more stable, we will be able to talk about it and mention FaaS as the solution we chose, in a tweet or in some of the presentations we will give (we are working with French industry in the aerospace 🛰 and 🌎 earth observation fields).
FaaS with async should completely meet our needs here at the Institute of Technology Saint Exupéry. 😃

@sandrom

sandrom commented Jul 22, 2017

It would probably help to map out the use cases for async processing first, as I can think of a couple of different ones that usually require different guarantees and metrics. Does anyone know which use cases the users of this library would favor? That would make it easier to choose the right queueing option too. Kafka, for example, might seem overly complex (and it is complex), but it has its uses; I just wouldn't usually choose it for simple response queues like these. NATS is a nice general-purpose option, but I fear it doesn't cover all cases. There is also the possibility of a mixed solution: simple pub/sub with a separate log database. Usually you need the recent stuff, which is in memory, but this way you don't lose the old stuff and can support cases like "oh, my car is offline for 15 min because of shit internet, but it can still get its response". That gives quite a lot of flexibility and isn't that hard to implement.
The problem with using queuing systems like NATS is that you end up with a ton of queues piling up. While that is completely acceptable in your infrastructure for workers and services, since their number is quite limited, it is not really feasible for response queues. MQTT suffers from similar problems when load gets high. I have seen a couple of implementations that offload MQTT queues to databases, though.

@alexellis

Member

alexellis commented Jul 22, 2017

As you have mentioned - the various queue implementations available have their own pros/cons. Ideally it should be easy to swap between different "queue" providers or implementations.

This initial branch / work is based around a NATS Streaming queue, which does have persistence and resilience.

You can see the progress here:

https://github.com/alexellis/faas/tree/async_nats

Guide to testing the branch:

https://gist.github.com/alexellis/62dad83b11890962ba49042afe258bb1

@sandrom

sandrom commented Jul 22, 2017

Ah, I must have missed that idea. That sounds like the best possible outcome, yes :)

@alexellis

Member

alexellis commented Aug 8, 2017

Hey @Tofull, do you have a draft or published blog post yet?

@alexellis

Member

alexellis commented Aug 18, 2017

Please see changes in #131

@alexellis

Member

alexellis commented Sep 2, 2017

Work merged into master and released in 0.6.3 https://github.com/alexellis/faas/releases/tag/0.6.2

If anyone wants to start on a Kafka implementation that would be great - otherwise let's spend time using the async code and doing edge-case testing.

@alexellis alexellis closed this Sep 2, 2017
