Durability Revisited #198

mrrodriguez · 2016-05-27T18:27:23Z

Background

Clara currently has a fairly basic implementation of durability. This is documented [1] and mostly involves the namespace clara.rules.durability.

From my understanding, the goals of this are:

Provide a way to serialize all of the facts in working memory into an EDN-style format that is wrapped in some extra information to connect it to Clara's local memory implementation later.
Provide only the minimal amount of data needed to restore a new session's working memory state back to what it previously was.
Leave it up to the caller to figure out how to make this into a serializable format.
- e.g. Store the EDN as strings and read with an EDN string reader, use something like Fressian, etc.

Pitfalls I currently see in what we have:

I do not think this work has been maintained well enough to handle all of the node types that exist in sessions today.
- e.g. AccumulateNode is handled, but not AccumulateWithJoinFilterNode
The rulebase itself is not persisted. The caller has to be able to provide this themselves.
The session state is restored by inserting all the facts back into a new session with an empty working memory.
- fire-rules cannot be called on this new session now. If it were, many facts would be produced again, because activations were re-added when we inserted all of the facts that were previously in the old working memory state.
  - This makes this new session effectively "query only".
The goal of minimizing stored state trades off against needing more time to restore a new session to that working memory state.
- All rule conditions must be re-evaluated and all tokens re-propagated throughout the network etc.
- Essentially, the working memory state is not preserved for the rulebase, only the facts that were in it are.
There is an assumption that all of the data returned to the caller is in some sort of EDN serializable format.
- This doesn't handle arbitrary objects that the consumer may have in memory.
- They could choose to extend something like clojure.core/print-dup for these types, which is a viable strategy.
  - However it is strange that they need to know how to serialize all of the internal structures of Clara, like tokens, etc.
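
As a rough illustration of the print-dup strategy mentioned above, here is a minimal sketch of a caller teaching the Clojure printer about one of its own types. java.net.URI is only a stand-in for "an arbitrary object the consumer has in memory", and the helper names dup-str and dup-read are hypothetical, not part of Clara.

```clojure
(import '(java.io Writer) '(java.net URI))

;; Emit a URI as a reader-evaluated constructor call, e.g. #=(java.net.URI. "...").
;; URI is a stand-in for any caller-owned type.
(defmethod print-dup URI [^URI u ^Writer w]
  (.write w "#=(java.net.URI. ")
  (print-dup (str u) w) ; the string form, quoted and escaped
  (.write w ")"))

(defn dup-str
  "Hypothetical helper: print x with *print-dup* enabled."
  [x]
  (binding [*print-dup* true]
    (pr-str x)))

(defn dup-read
  "Hypothetical helper: read a string produced by dup-str.
  #= forms execute code, so only use this on trusted input."
  [s]
  (binding [*read-eval* true]
    (read-string s)))

;; Usage: a value containing the custom type survives a round trip.
(def original {:source (URI. "http://example.com/a")})
(def restored (dup-read (dup-str original)))
```

Note that this only covers the caller's own type. The concern raised above still applies: Clara's internal structures would need their own handling somewhere.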

Desired functionality

We have a need for a durability layer of Clara that has different goals in mind:

Provide a way to store (serialize) and restore (deserialize) session state as quickly as possible.
- Time is more valuable than space here. Space is a factor to consider, but we must prioritize speed.
- Restoring the session memory state needs to be faster than re-running all the rules against all of the same old data, plus any new changes again.
A restored session state should be ready-to-go. It should be able to have new facts inserted and the fire-rules called again as if we are in the same process as the original session was in.
- The session state needs to be ready immediately too. There should not be time spent on re-evaluating rule conditions and propagating tokens, activations, etc.
The rulebase should be able to be stored as well, but this should be optional and able to be done separately.
- This part isn't as important in the long run, but it matters right now, until we can ensure that rulebases always compile consistently with consistent node ids etc.
The caller should have some sort of hook into serialization choices for arbitrary objects.
- clojure.core/print-dup is probably a good option on first-pass at least.
Returning the session state to the caller isn't too important.
- It also probably will be slower than just writing it directly to some sort of given output stream and reading from an input stream later.
  - We want to optimize for speed.

This functionality is similar to what Drools has available with its "marshalling" strategies etc.

Initial proposed API

I am thinking something like this would be a good first-pass at this.

;; (require '[clara.rules.durability :as d])

;; `out1`, `out2`, and `out3` are some java.io.OutputStream

;; `in1`, `in2`, and `in3` are some java.io.InputStream that
;; are opened where `out1`, `out2`, and `out3` wrote out to, respectively.

;; For storing the session along with the rulebase.
(d/store-session-state-to session
                          out1
                          {:with-rulebase? true})

;; For storing the session alone.
(d/store-session-state-to session
                          out2
                          {:with-rulebase? false})

;; For storing only the session's rulebase.
(d/store-rulebase-state-to session out3)

;; Restore full session, including rulebase.
(d/restore-session-state-from in1 {}) ; This could be optional.

;; Restoring a session reusing the rulebase from the :base-session.
;; This is for cases where you didn't persist the rulebase per session
;; stored.  e.g. maybe you stored a lot of sessions and don't want to
;; waste space if they are all the same rulebase.
(d/restore-session-state-from in2
                              {:base-session session})

;; We could fail with an informative error if trying to restore a session
;; with no :base-session and no rulebase stored.


;; Restoring only the rulebase
(d/restore-rulebase-state-from in3)
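
To make the option handling above concrete, here is a toy model of it. This is not the Clara implementation: a "session" is just a plain map with :rulebase and :memory keys, and store-state-to and restore-state-from are hypothetical stand-ins for the proposed functions. The sketch only shows how :with-rulebase? and :base-session could interact, including the informative failure case.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream ByteArrayOutputStream PushbackReader))

(defn store-state-to
  "Toy stand-in: writes the memory, and optionally the rulebase, to `out`."
  [session out {:keys [with-rulebase?]}]
  (let [payload (cond-> {:memory (:memory session)}
                  with-rulebase? (assoc :rulebase (:rulebase session)))
        w       (io/writer out)]
    (binding [*out* w
              *print-dup* true]
      (pr payload))
    (.flush w)))

(defn restore-state-from
  "Toy stand-in: reads a payload from `in`, taking the rulebase from the
  payload if it was stored, otherwise from :base-session."
  [in {:keys [base-session]}]
  (let [payload  (read (PushbackReader. (io/reader in)))
        rulebase (or (:rulebase payload) (:rulebase base-session))]
    (when-not rulebase
      (throw (ex-info "No rulebase was stored and no :base-session was given." {})))
    {:rulebase rulebase
     :memory   (:memory payload)}))

;; Usage: store without the rulebase, then restore against a base session.
(def session {:rulebase {:nodes [1 2 3]} :memory {1 ["fact-a"]}})
(def stored
  (let [out (ByteArrayOutputStream.)]
    (store-state-to session out {:with-rulebase? false})
    (.toByteArray out)))
(def restored
  (restore-state-from (ByteArrayInputStream. stored) {:base-session session}))
```

Restoring the same bytes with an empty options map throws the ex-info above, which is the "informative error" case.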

The demo

This should be fairly flexible still. I have implemented a rough first-pass at this, which still needs some edges dealt with. It requires a few minor changes to clara.rules.compiler just to have the necessary information for successful serialization. I'll show this work more when it is more polished.

However, I thought perhaps a more interesting thing to show now would be a demo project that persists a session with Avro SpecificRecord data in working memory. The interesting part here is that the caller has a lot of freedom on how to persist the facts in memory.

I have a demo project up
https://github.com/mrrodriguez/clara-durability-demo

The main namespace to look at is
https://github.com/mrrodriguez/clara-durability-demo/blob/master/src/clara_durability/core.clj

Note: I'm using clojure.core/print-dup as the extension point for the caller. I think this will tend to be sufficient, but this could be generalized further if class-type-based dispatch was too weak for some cases.

This is just one possible strategy, but in this demo case I just wrote "placeholder"s into the Clara stored session state when encountering Avro SpecificRecord objects. I stored them up into a dynamically bound vector. This gives the caller the freedom to do what they want to store this vector. I stored it as an Avro array with union typed items and then all in a single Avro file. Clara durability doesn't have to know this. When reading back in, the persisted session state calls a caller-defined function that transparently slips these Avro objects back in where they belonged in Clara's working memory state.
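
The placeholder idea described above can be sketched without Avro or Clara. Everything here is an assumption for illustration: ExternalFact stands in for an Avro SpecificRecord, *stash* is a hypothetical dynamic var, and the #demo/placeholder tag is made up. The demo project linked above may do this differently.

```clojure
(import '(java.io Writer))

;; Stand-in for an Avro SpecificRecord (no print support of its own).
(deftype ExternalFact [id])

;; Bound to an atom holding a vector while state is being written.
(def ^:dynamic *stash* nil)

;; Instead of printing the object, stash it and print its index.
(defmethod print-dup ExternalFact [fact ^Writer w]
  (let [idx (dec (count (swap! *stash* conj fact)))]
    (.write w (str "#demo/placeholder " idx))))

(defn write-with-placeholders
  "Returns the printed state plus the objects that were set aside.
  The caller is free to store :externals in any format it likes."
  [state]
  (binding [*stash* (atom [])
            *print-dup* true]
    (let [text (pr-str state)]
      {:text text :externals @*stash*})))

(defn read-with-placeholders
  "Reads the printed state, slipping each external object back in by index."
  [{:keys [text externals]}]
  (binding [*data-readers* (assoc *data-readers*
                                  'demo/placeholder
                                  (fn [idx] (nth externals idx)))]
    (read-string text)))

;; Usage
(def state {:facts [(ExternalFact. "a") {:temp 10} (ExternalFact. "b")]})
(def stored (write-with-placeholders state))
(def restored (read-with-placeholders stored))
```

In a real setup the :externals vector would be written out and read back in the caller's own format (Avro in the demo) before the restore step; here it is simply kept in memory.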

This "placeholder" approach really isn't much different from Drools' IdentityPlaceholderResolverStrategy marshalling strategy, which is also discussed at [2]. I'll mention that I think it is cool that there is much less ceremony needed to get similar functionality in Clojure (no strategy pattern proliferation of classes needed).

Java Serializable objects could also be handled in a similar way.

There were also Clojure records in memory. The caller didn't need to handle these, since Clojure has a default print-dup implementation defined for clojure.lang.IRecord.
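
For comparison, a record needs no caller-side code at all. A minimal sketch, with a hypothetical Temperature record and round-trip helper:

```clojure
;; Hypothetical record type standing in for a fact in working memory.
(defrecord Temperature [location degrees])

(defn round-trip
  "Prints x with *print-dup* enabled and reads it straight back.
  Reading record syntax requires *read-eval* to be true (the default)."
  [x]
  (read-string (binding [*print-dup* true]
                 (pr-str x))))

;; Usage
(def original (->Temperature "MCI" 15))
(def restored (round-trip original))
```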

Etc

I think the approach here would be sufficient for the needs I currently have. I also think it is extensible and still achieves the goal of allowing the caller to use the appropriate serialization formats for their situation. I don't see anything preventing the use of something like Fressian as well.

My first pass of this on the Clara durability side would just be to store the session state as EDN-like character data. This may not be the most compact, but since the rulebase can be saved separately and the items in memory can be stored in a custom serialization format, I don't think it will matter much on first pass. This could be changed to something more compact from the Clara side if desired, such as Fressian, etc.

So I'm just interested to hear your thoughts on all this. Thanks!

[1] https://github.com/rbrush/clara-rules/wiki/Durability
[2] "4.2.4.8. Marshalling" around http://docs.jboss.org/drools/release/6.4.0.Final/drools-docs/html_single/#KIERunningSection


rbrush · 2016-05-28T16:58:56Z

Sounds like a fun one. I agree we can do much better than the current pass of the durability namespace (which I labeled as experimental for a reason). A couple of thoughts:

I think we can get an efficient, repeatable rule base serialization by basing it on the beta graph structure (the value returned by clara.rules.compiler/to-beta-graph and defined by clara.rules.schema/BetaGraph), and memoizing or caching the try-eval calls. This structure is after the analysis and elimination of redundant expressions, is easy to inspect and debug, and includes node ids which aids in a faster working memory serialization. We could hold onto the beta graph and serialize/deserialize it to EDN or a more efficient structure.

Deserializing the working memory could be pretty efficient if we cache try-eval, so if the same expression is eval'd multiple times we can just return the previous result. This has some other advantages as well, speeding up reloading of rules during interactive development and allowing different rule bases with overlapping expressions to share compiled functions. (This could significantly reduce the overhead if there is significant overlap of multiple rule bases.)
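
A minimal sketch of the caching idea, using plain eval rather than Clara's try-eval. The name cached-eval is hypothetical and the expression is made up; the point is only that equal forms map to one compiled function.

```clojure
;; memoize keys on the argument list, so equal forms share one result.
(def cached-eval (memoize eval))

(def f1 (cached-eval '(fn [fact] (> (:temperature fact) 80))))
(def f2 (cached-eval '(fn [fact] (> (:temperature fact) 80))))

;; f1 and f2 are the very same function object; the form was compiled once.
(def shared? (identical? f1 f2))
```

An unbounded memoize keeps every compiled form alive, so a real cache would probably want some eviction policy.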

As for serializing the working memory itself, we should be able to almost directly serialize contents of WorkingMemory to some external store, since it will be consistent with the node ids in the BetaGraph structure. This lets us keep all of the calculated state rather than re-running facts through the session.

Agreed we need to support user-provided serializers, and it would be nice to offer first-class support for popular serialization toolkits (such as Fressian and Kryo, which also supports Avro), possibly in separate projects. Just a thought, but I'd be tempted to have something like an ISessionSerializer protocol that users can extend for different formats.
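
One possible shape for such a protocol, purely as a sketch: the method names and signatures below are assumptions, not an agreed API, and the EDN implementation only handles plain data.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream ByteArrayOutputStream PushbackReader))

;; Hypothetical protocol; names and arities are assumptions.
(defprotocol ISessionSerializer
  (serialize [this state out] "Write `state` to the OutputStream `out`.")
  (deserialize [this in] "Read a state back from the InputStream `in`."))

;; One format among many: plain EDN text.
(defrecord EdnSerializer []
  ISessionSerializer
  (serialize [_ state out]
    (let [w (io/writer out)]
      (binding [*out* w]
        (pr state))
      (.flush w)))
  (deserialize [_ in]
    (edn/read (PushbackReader. (io/reader in)))))

;; Usage
(def state {:memory {1 ["fact-a" "fact-b"]}})
(def restored
  (let [serializer (->EdnSerializer)
        out        (ByteArrayOutputStream.)]
    (serialize serializer state out)
    (deserialize serializer (ByteArrayInputStream. (.toByteArray out)))))
```

A Fressian or Kryo implementation would then be another record extending the same protocol, which is what would let those dependencies live in separate projects.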

mrrodriguez · 2016-05-28T18:57:24Z

Thanks for the feedback. Yeah, I think there are two sort of separate concerns at play here: serializing the rulebase and serializing the working memory.

My main focus for my initial pass will be on the working memory, with primitive support for storing the rulebase along with it. I was leaning towards some sort of extensible protocol point. Print-dup was a simple first pass that just made it possible to do custom things with caller-defined types. This is what I had in the example. I like the idea of having some common implementations already provided. A separate project/module makes sense to keep the dependencies out and all that. I'll look more at this and think about how a protocol alone could allow the flexibility I had in the demo, but with less manual intervention.

For the rulebase I've done a really basic pass, but I think there are some really good ideas in what you mentioned around try-eval caching.

We must remember, when checking for equivalent forms to eval, that all metadata must be the same too. Metadata can carry things like type hint tags that would throw a ClassCastException if we used the wrong eval'd constraint form in some conditions. This should be fairly rare, but it is something to keep in mind that = alone won't check. I believe the Potemkin library has done similar checking for defrecord+ etc., so we can pull ideas from there.
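
A small sketch of the metadata concern. The form-key function is hypothetical and only accounts for :tag metadata on symbols; other metadata may matter too.

```clojure
(require '[clojure.walk :as walk])

(def plain  '(fn [x] (.length x)))
(def hinted '(fn [^String x] (.length x)))

;; = ignores metadata, so these two forms compare as equal.
(def equal-by-value? (= plain hinted))

(defn form-key
  "Hypothetical cache key: replaces each type-hinted symbol with a vector
  that includes its :tag, so differently hinted forms get different keys."
  [form]
  (walk/postwalk
   (fn [x]
     (if-let [tag (and (symbol? x) (:tag (meta x)))]
       [::tagged x tag]
       x))
   form))

(def distinct-keys? (not= (form-key plain) (form-key hinted)))
```

A cache keyed on (form-key form) rather than on the form itself would keep the hinted and unhinted versions apart.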

On my initial, more brute-force pass at storing the rulebase, I made sure to reuse nodes when they already exist elsewhere in the structure. There are a lot of references to the same nodes across the rulebase structure right now. I also think only the :alpha-roots are actually needed for a functional rulebase, but I left the rest for now.

Doing something more clever with the beta graph is undoubtedly better in the long run though.

I do think we have one or two issues to tackle on node ID consistency across processes. We were looking to explore this some more in an upcoming issue. I think we've seen some non-determinism in how the network compiles right now and there are at least 1 or 2 places that are the cause.

I'll be without a laptop to work on this until Tuesday or Wednesday next week though so I won't have any code to show or anything until at least after that. I do plan on working on it more next week though.

mrrodriguez · 2016-08-16T15:14:13Z

As a quick update, I have made a lot of progress on this and will plan on having a working implementation to review via PR before long.

One question I currently have: is there any real reason to preserve the functions in clara.rules.durability?

These have not been properly maintained over the changes and enhancements to Clara. Their tests have passed for the most part, but that is just because they are not testing many of the flows possible in the network. As one example, there is currently no support at all for many of the types of nodes in the rulebase, such as AccumulateWithJoinFilterNode etc. [1]

I have wanted to use this namespace as the primary entry point for this new durability layer and just remove the old. The namespace name is well-suited for this, so I didn't want to just make a new name up. Also, I don't think it makes a lot of sense to maintain this old implementation that already has fallen out of maintenance.

The hope is the new durability API, although with somewhat different goals, is the replacement and the implementation that can be relied on by consumers in the future (we may still want to make some "experimental" or caveat notes on it in the shorter term though).

What are your thoughts on this?

[1] https://github.com/rbrush/clara-rules/blob/0.11.1/src/main/clojure/clara/rules/durability.clj#L19

rbrush · 2016-08-16T15:38:00Z

Feel free to replace the clara.rules.durability namespace with your efforts. That namespace is clearly labeled as experimental and really doesn't work for a number of cases now, anyway.

mrrodriguez · 2016-08-30T22:03:59Z

Pull request up for this #219

WilliamParker · 2016-10-21T10:47:39Z

@mrrodriguez I think this issue can be closed; do you agree?

mrrodriguez · 2016-10-21T23:27:47Z

@WilliamParker yes, this can be closed.

There is still work to do to document the durability in the GitHub wiki here; however, that can be separate. I've held off on that a bit initially to ensure the API is really in a somewhat stable state (still experimental here).

I will close it.

mrrodriguez mentioned this issue Jun 8, 2016

Rules dynamic/hot deployment #40

Open

mrrodriguez mentioned this issue Aug 30, 2016

Revised durability #219

Merged

mrrodriguez closed this as completed Oct 21, 2016

This was referenced Jan 25, 2017

Durability example is out of date oracle-samples/clara-examples#8

Open

Default fact serializer for durability #262

Open

WilliamParker added this to the 0.13.0-RC1 milestone Feb 10, 2017
