Revised durability #219
Conversation
I re-pushed these changes since I didn't realize I broke my tests in a last-minute change yesterday. The tests should be passing now.
Looking at this, the current implementation needs to be aware of each type of node and serialize it, and attach additional information to the nodes, like the expressions used to create them. This effectively makes all of our node types part of the serialization contract. This might be necessary, but it has the downside that we must keep them compatible with what previous versions may have serialized.

Just for discussion -- either as part of this change or a future one -- should we consider keeping and serializing the beta graph structure defined by this schema, and using a Fressian (or other) serialization of that as our rule base? It keeps the node IDs and all expressions that need to be compiled, which we could cache at deserialization time as well. It seems like it might be simpler to keep and (de-)serialize that rather than plugging into the runtime network. Perhaps there are some downsides to this, such as performance costs, that make it not worthwhile?
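To make the suggestion concrete, a round trip of a beta-graph-shaped data structure through `clojure.data.fressian` could look roughly like the sketch below. The graph layout here is invented for illustration and does not match Clara's actual schema; only `fressian/write` and `fressian/read` are the real library API.

```clojure
(require '[clojure.data.fressian :as fressian])

;; Hypothetical beta-graph data: node ids plus the unevaluated expressions
;; that would be needed to (re)compile the network at deserialization time.
(def beta-graph
  {:id-to-condition-node {1 {:node-type :join
                             :condition '(> temperature 100)}}
   :forward-edges        {1 2}})

;; write returns a java.nio.ByteBuffer, which read accepts directly
(def buf      (fressian/write beta-graph))
(def restored (fressian/read buf))
```

Because only plain data is serialized, the expressions could be compiled (and cached) after deserialization rather than shipping the runtime node network itself.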
```clojure
children
join-bindings
(:new-bindings beta-node))
{:accum-expr accum-expr})))
```
Should we make these items fields on the nodes themselves rather than using metadata? It seems like metadata just obscures things from someone who wants to inspect or debug the nodes, as opposed to just having it on the record.
I think this relates to #219 (comment)
I do think these should be made into first-class attributes of something. I think the metadata is sort of a "hack".
I'll echo the statement I made there:

> I actually went with this upon first pass because it was least invasive to the clara.rules.compiler and the clara.rules.engine.
The metadata was a minimal addition to get this off the ground. To think about this though, if we did want a non-metadata approach, would putting it on the node really be the correct place? Perhaps that would be useful for debugging (I think I would have had some value from that in the past). However, if we targeted the beta graph for serialization instead, then this wouldn't necessarily be node attributes.
So again, I just went with this as a "getting the implementation off the ground" approach. If you have strong reservations against that though, we can explore our alternatives.
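The two options under discussion can be sketched on a made-up accumulator record (this is a simplified stand-in, not Clara's actual node type):

```clojure
;; Simplified stand-in for an engine node record.
(defrecord AccumulateNode [id accum-condition children])

(def node (->AccumulateNode 1 'acc-cond []))

;; 1. Current approach: hang the originating expression off node metadata.
;;    Invisible when the record is printed or inspected naively.
(def with-md (with-meta node {:accum-expr '(acc/all)}))

;; 2. Alternative: a first-class attribute on the node itself, visible to
;;    anyone printing or debugging it.
(def with-field (assoc node :accum-expr '(acc/all)))
```

With the metadata approach the expression is only reachable via `(meta with-md)`, which is the "obscuring" concern raised above; with the field approach it shows up directly in the node's map representation.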
Ah, gotcha. Agreed that if we took the path of serializing the beta graph, we could (and should) remove this from the nodes outright. We can revisit this after considering the beta graph approach, if necessary.
I agree with you that this is a better approach in the longer term if we are going to get to a place where serialization from one version of Clara can be deserialized in a different version. The more we can "delay the compilation" process, the better off we'd be in that regard, as long as the compilation "recipe" (the beta graph here) doesn't need to change in non-passive ways. This would add cost to the deserialization of the rulebase, however.

The main drivers behind these changes (from the perspective of how it is implemented currently and our real-world use case) are highly optimized deserialization times, with serialization time a lesser concern. Obviously people's use cases will vary, and we should strive to make all paths as quick as we can.

I think we are also going to run into issues with maintaining passivity on serialization of sessions that were based on the (local-only, right now) memory representations, which we may also change from one version of Clara to another. Again, any added logic we have to do at deserialization time can hurt performance.

Overall, I would like to take this approach for rulebase serialization now, with the strong intention of changing it to something like the beta graph as you suggest. I actually went with this upon first pass because it was least invasive to the clara.rules.compiler and the clara.rules.engine. The implementation of ISessionSerializer is not something we'd expect to have many customized implementations of, since it will necessarily have a decent level of coupling to Clara's internal structures. So I thought we could evolve it a bit more before starting to claim that rulebase serialization can be passive across versions of Clara.

Let me know of any further concerns you have over this right now though. The feedback is certainly valuable.
I forgot to add that before I went with the Fressian-based ISessionSerializer implementation, I had a mostly working branch that required no foreign dependencies and instead relied on Clojure's built-in printer and reader.

I did keep this stuff in a public gist in case you or anyone else is interested in seeing it (just for fun or learning). I will not claim that it 100% works with the latest version of Clara. I do not plan to maintain it right now though, since, like I said, it just wasn't performant enough for our practical use.
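A dependency-free round trip along those lines (illustrative only; not necessarily what the gist branch actually did) can be done with the Clojure printer and the edn reader. It is typically much slower than a binary format like Fressian, and records, functions, and other non-edn types need extra handling:

```clojure
(require '[clojure.edn :as edn])

(def data {:a 1 :b [2 3] :c #{:x}})

;; Print to a readable string, then read it back with the safe edn reader.
(def s        (pr-str data))
(def restored (edn/read-string s))
```

The main appeal is zero foreign dependencies; the main costs are string-based size and speed, which matches the performance concern above.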
```clojure
session-serializer
mem-serializer)
```

```clojure
(let [rulebase-data (.toByteArray rulebase-baos)
```
@rbrush This is the transition to byte-array-backed streams instead of temp files, as you commented on previously.
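The byte-array-backed pattern is plain `java.io` interop; the sketch below uses placeholder bytes rather than a real serialized rulebase:

```clojure
(import '[java.io ByteArrayOutputStream ByteArrayInputStream])

;; Serialize into memory instead of a temp file: write to a
;; ByteArrayOutputStream, then hand the resulting byte[] to whatever
;; needs an InputStream for deserialization.
(def rulebase-baos (ByteArrayOutputStream.))
(.write rulebase-baos (.getBytes "serialized-rulebase-here"))

(def rulebase-data (.toByteArray rulebase-baos))   ;; byte[]
(def in (ByteArrayInputStream. rulebase-data))      ;; ready to read back
```

This avoids temp-file I/O and cleanup entirely, at the cost of holding the whole serialized form in memory.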
Fair enough. I don't mind moving forward with an experimental approach and refactoring it as we go. As for performance, I'd imagine most use cases would involve (de)serialization of sessions rather than putting rule bases in the critical path. I'm sure we can optimize our current compiler logic, but in the rulebase scenario you'd have to go through the compilation/JIT/JVM HotSpot optimization every time you deserialized, no matter what.
I agree that session deserialization is likely the most reasonable case to expect on the critical performance path. You have good points on the rulebase compilation issues. A lot of the Clara compiler time is really spent in the Clojure compiler, so there is only so far you can go with some of that (perhaps trying to share compiled forms a bit more, etc.).
This is a first pass at the work described in #198.

There are two protocols in `clara.rules.durability`: `ISessionSerializer` and `IWorkingMemorySerializer`. An `ISessionSerializer` implementation is provided in the `clara.rules.durability.fressian` namespace, based on `org.clojure/data.fressian` (see https://github.com/clojure/data.fressian; a `:dev` dependency for testing). Parts of `clara.rules.durability` can be useful concepts or helpful when implementing the `IWorkingMemorySerializer`. The Fressian handlers defined in `clara.rules.durability.fressian` have been written to serialize/deserialize most Clojure persistent data structures and metadata, and to preserve record identity relationships among records and where they are used in different aggregates. This is useful to implementations of `IWorkingMemorySerializer` as well, if Fressian is under consideration.

The approach to serialization is documented quite a bit within the `clara.rules.durability` and `clara.rules.durability.fressian` namespaces.

The old functions from the `clara.rules.durability` namespace have been removed, since they were not actively maintained and were out of sync with the evolution of Clara's memory and engine over time. The motivations behind this new approach to durability can be found in #198.

As the `clara.rules.durability` namespace doc string states, this namespace should still be considered experimental. The API itself is subject to change and, even more importantly, rule sessions or rulebases serialized in one version of Clara are not guaranteed to be deserializable in another version.