
Improve NATS event store implementation #4

Open
erwinvaneyk opened this issue Aug 7, 2017 · 2 comments


@erwinvaneyk
Member

erwinvaneyk commented Aug 7, 2017

The currently implemented event store was written with limited knowledge of NATS Streaming and abuses some of its features.

Currently the naive setup consists of the following subject structure:

  • workflow.<id>: each subchannel contains one workflow.
  • invocation.<id>: each subchannel contains the activities of one workflow invocation.
  • _activity: NATS streaming doesn't support wildcard subscriptions at the time of writing (see Proposal: Support Wildcard Semantics nats-io/nats-streaming-server#340 for progress on the feature). In order to replicate the behavior of a wildcard subscription, this channel is used to publish notifications of when something happens in one of the various workflow/invocation subchannels. The event store implementation has one 'metasubscription' on this channel that creates new subscribers when new subjects appear (a rough sketch of this pattern follows below).
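
For illustration, here is a minimal sketch of that metasubscription pattern using the Go NATS Streaming client (github.com/nats-io/go-nats-streaming). The cluster/client IDs and the _activity payload format (the raw subchannel name) are assumptions for the sake of the example, not the exact implementation in this repository:

```go
package main

import (
	"log"

	stan "github.com/nats-io/go-nats-streaming"
)

func main() {
	// Connect to the NATS Streaming cluster (IDs are example values).
	sc, err := stan.Connect("test-cluster", "eventstore-demo")
	if err != nil {
		log.Fatal(err)
	}
	defer sc.Close()

	subscribed := map[string]bool{}

	// Metasubscription: each message on _activity is assumed to carry the name
	// of the subchannel (e.g. "invocation.<id>") that just saw activity.
	_, err = sc.Subscribe("_activity", func(m *stan.Msg) {
		subject := string(m.Data)
		if subscribed[subject] {
			return
		}
		subscribed[subject] = true

		// Create a per-subject subscription, replaying its full history so the
		// aggregate's state can be rebuilt from its events.
		if _, serr := sc.Subscribe(subject, func(ev *stan.Msg) {
			log.Printf("event on %s: seq=%d", ev.Subject, ev.Sequence)
		}, stan.DeliverAllAvailable()); serr != nil {
			log.Printf("failed to subscribe to %s: %v", subject, serr)
		}
	}, stan.DeliverAllAvailable())
	if err != nil {
		log.Fatal(err)
	}

	select {} // keep the process alive to receive events
}
```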

Issues with this setup:

  • subjects/channels seem expensive. NATS streaming puts limits on the number of subscriptions and the number of subjects. Though these limits are artificial and can be increased, it remains unclear what happens when they are raised to much higher values.
  • subjects/channels cannot be deleted in a straightforward way. This means that even after invocations complete, they remain in NATS as channels/subjects.
  • Messages cannot be deleted either. Nor can certain channels/messages be marked as deletable. Currently NATS just starts deleting the oldest messages once it is full.

Possible solutions:

  1. Move workflows out of the event store. As these are considered immutable when generated/parsed, they can be kept out of the event store and left to be handled by Fission.
    • Though they might need to be stored somewhere persistent, as workflow invocations are tied to these parsed workflows and that information would be lost if the parsed workflow is not stored.
  2. Use a single subject for everything. I am not sure what the performance hit of this would be, as subscribers would need to go over all messages when recreating the state of anything (see the sketch after this list).
  3. Use a subject per workflow. In this case the workflow is stored together with associated invocations. The problem here might be that you will not be able to delete (when that option becomes available) any subject, as that would also delete the workflow itself, which may still be needed for future invocations.
  4. Keep the current implementation and work with the NATS team to implement some of the missing functionality:
    • More advanced garbage collection; ability to mark subjects/messages as GC'able
    • Wildcard subscription support for NATS streaming
    • Ability to delete or archive subjects (or even messages) manually
  5. Switch to a different database or message bus. There is no perfect solution on the market yet that contains all the required properties of the event store (fast, lightweight, persistent, reliable, scalable). A partial implementation exists for BoltDB (dropped after realizing it would require implementing the entire pub/sub functionality), which might be an alternative.
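
To make the trade-off in option 2 concrete: a subscriber that only cares about one invocation would still have to replay the whole shared channel and filter client-side. The single "events" subject and the "<invocationID>|<payload>" framing below are purely illustrative assumptions, not the actual wire format:

```go
package eventstore

import (
	"bytes"

	stan "github.com/nats-io/go-nats-streaming"
)

// replayInvocation rebuilds the state of a single invocation from a shared
// "events" channel. Every subscriber receives every message and must discard
// the ones that belong to other aggregates.
func replayInvocation(sc stan.Conn, invocationID string, apply func(event []byte)) (stan.Subscription, error) {
	prefix := []byte(invocationID + "|") // hypothetical framing: "<invocationID>|<payload>"
	return sc.Subscribe("events", func(m *stan.Msg) {
		if bytes.HasPrefix(m.Data, prefix) {
			apply(m.Data[len(prefix):])
		}
		// All other messages are still delivered by the server and dropped
		// here, which is the performance concern mentioned above.
	}, stan.DeliverAllAvailable())
}
```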

Currently, for the prototype, this is a low priority issue, as for small usage (<1000 invocations) it works just fine. Nothing is persisted yet, as the fission-nats.yml deployment still uses the in-memory store, and it can be cleared by simply restarting that deployment. So, until the prototype is advanced enough that it becomes clear what is needed from the event store, the current implementation is okay.

@saidimu

saidimu commented Aug 22, 2017

Curious if Kafka was ever considered. Seems like it would fit the bill.

5. Switch to a different database or message bus. There is no perfect solution on the market yet that contains all the required properties of the event store (fast, lightweight, persistent, reliable, scalable).

@erwinvaneyk
Member Author

@saidimu it is indeed one of the options we looked at. However, Kafka is on the other side of the spectrum, bringing a lot of overhead and features that are not needed for a simple, internal data store. For the main requirements of being fast and lightweight, NATS seems preferable to Kafka. We are working with the NATS team to resolve some of the issues (in their codebase or this one).

That said, this is one of the parts that still needs to be improved a lot. The interface used to communicate with the data store is deliberately as simple as possible to allow for easy implementation of another data store. So, if preferred, it could be an option to add a Kafka backend as well.
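
As a purely hypothetical illustration of how small such a contract could be kept (this is not the actual interface in this repository), a backend-agnostic event store in Go might look roughly like:

```go
package eventstore

// EventStore is a hypothetical minimal contract that a NATS, Kafka, or BoltDB
// backend could implement; the real interface in this repository may differ.
type EventStore interface {
	// Append stores an event under the given subject (aggregate).
	Append(subject string, event []byte) error

	// Get returns all events for a subject, in order, so state can be rebuilt.
	Get(subject string) ([][]byte, error)

	// Watch streams new events on a subject to the handler.
	Watch(subject string, handler func(event []byte)) error
}
```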

@erwinvaneyk erwinvaneyk added this to the beta milestone Nov 24, 2017
@erwinvaneyk erwinvaneyk changed the title Fix NATS event store implementation Improve NATS event store implementation Feb 27, 2018