EZP-20461: Finishing the implementation using Signal/Slot by enabling all Service integration tests #436
Conversation
This Pull Request does not respect our Coding Standards; please see the report below:
Object states do not seem to be implemented completely; commenting for now.
Looks really good,
```php
 */
public function deleteLocation( $locationId )
{
    throw new \Exception( "Not implemented yet." );
```
As this is a proxy function, it should probably be implemented, since it's such low-hanging fruit.
@andrerom All your remarks have been taken into account.
Hmm, I can't see any cache clearing in Cache/ContentHandler->create. As the struct can contain locations, it looks like it creates locations, which should affect the subtree cache IDs?
oops, indeed, forgot cache clearing...
@andrerom:
have been added to:
in commit: 427b32add2b3c05f53d5db2b2e87ef2861bf9f15. I'm wondering if it needs to be added to the following ones as well:
But I guess it's not needed, as it doesn't perform changes in terms of Location, right?
Doesn't publish, so I would say "not needed as well"
AFAIK, it creates a copy of the content, but does not publish it as well, so as for
cache clearing added.
Apparently not needed: it only takes care of creating/removing data specific to the user ("ezuser") and nothing at the Content level.
Ah yes, I was thinking of the userService. +1. Review ping @pspanja / @bdunogier
I do not see the reason for TextLine to use Aside from that question, this looks really good and has my +1.
Because a field may exist in multiple languages for the same Content / Version.
Ok this clears it up, +1 it is. |
**Value Objects in Signals**

The change introduces passing value objects with signals. We explicitly focused on passing only scalar types with signals because of potential asynchronous signal processing. If we want to implement asynchronous processing of signals, those signals must be stored in some queue together with their data. This is trivial as long as we stick with scalar types and IDs referencing the objects. When passing around full value objects, two problems occur:
I suggest reverting to passing around IDs identifying the value objects and reloading them from storage. This might seem like overhead, but with caching in the persistence layer it should not hurt too much. From my point of view, the potential problems with value objects embedded in signals outweigh the possible performance drawback.

If we decide to pass value objects around, we should do so consistently in all places and clearly document the potential issues. We could also convert the value objects to IDs in the asynchronous processing handler, but then the APIs would differ between asynchronously and synchronously processed signals.

**Content\Search\Handler::deleteLocation()**

What is the reason to implement this in the search backend? I would think that the logic for resolving a location / subtree to the affected content objects could happen in the Public API implementation. Implementing logic like this in every persistence layer will introduce bugs and makes the implementation of persistence layers far more complex. (Something similar already makes implementing an UrlAlias-Handler far too complex.) Even if this reduces the potential runtime optimizations, I would consider it better to move the involved logic to the business layer and just use the

**Cache Purging Logic**

I did not review the cache purging logic in For example: I do not see that cached

**Naming**

Please rename "ez_mid" and "ez_mstring" to "ez_multi_id" and "ez_multi_string" in

Those identifiers are supposed to be readable by implementors of the Persistence API, and they should provide a basic insight into their meaning. Abbreviations should always be avoided. The internal identifier in Solr is something different, since it is not part of the external API. "ms" is fine there.

**Method Complexity**
This method should be extracted into a mapper class of its own. The logic is getting too complex and the method body too long to sanely maintain in the future. Also, making the Mapper changeable / overridable could be a sensible extension point.
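To make the scalar-only argument concrete, here is a minimal sketch (hypothetical class and method names, not the actual eZ Publish API): the signal carries only IDs, so it serializes trivially for a queue, and the slot reloads the value object from persistence when it needs it.

```php
<?php
// Hypothetical, simplified illustration of scalar-only signals:
// the signal payload is trivially serializable for an async queue.

class PublishVersionSignal
{
    public $contentId;   // scalar ID only, no value object
    public $versionNo;

    public function __construct( $contentId, $versionNo )
    {
        $this->contentId = $contentId;
        $this->versionNo = $versionNo;
    }
}

class IndexingSlot
{
    private $contentService;

    public function __construct( $contentService )
    {
        $this->contentService = $contentService;
    }

    public function receive( PublishVersionSignal $signal )
    {
        // Reload the value object from storage; with caching in the
        // persistence layer this extra load should be cheap.
        $content = $this->contentService->loadContent(
            $signal->contentId, null, $signal->versionNo
        );
        // ... update the search index with $content ...
    }
}

// The signal flattens to a scalar-only structure, ready for a queue:
$signal = new PublishVersionSignal( 42, 3 );
$payload = json_encode( get_object_vars( $signal ) );
```

With value objects embedded instead, the queue format would have to serialize arbitrary object graphs, which is exactly the problem described above.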
It has not been made for performance reasons, even if this is one of the benefits.
I agree with your concerns, but to me this is highly hypothetical, and nothing prevents deriving asynchronous signals from a synchronous one in the future, taking at that moment the design decision of what needs to be stored in a queue and how.
I very much agree with that. In fact, I planned to create a separate refactoring issue to consistently pass the Value Objects in Signals. But to make this PR lighter to review and to keep it progressing, I decided it should rather be treated separately.
At first glance: it doesn't look difficult to implement in current and future backends, while it is a bit more difficult in the Public API implementation. This is also a very common operation that should run as fast as possible. Putting the O(n) loop higher in the hierarchy would make it close to impossible to have a final O(1) operation in the backend.
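The complexity argument can be illustrated with a toy index (hypothetical class, assuming the index stores each document's location path string): inside the backend, deleting a subtree is a single prefix match over the stored paths, which a real backend such as Solr could express as one delete-by-query, whereas the business layer would first have to load every affected content object.

```php
<?php
// Hypothetical illustration: if the search index stores the full
// location path with each document, deleting a subtree stays a
// single operation in the backend instead of an O(n) loop above it.

class InMemorySearchIndex
{
    private $documents = array();

    public function index( $contentId, $pathString )
    {
        $this->documents[$contentId] = $pathString;
    }

    public function deleteSubtree( $pathString )
    {
        // One pass over the local index; a real backend would express
        // this as a single delete-by-query on the path field.
        foreach ( $this->documents as $contentId => $path )
        {
            if ( strpos( $path, $pathString ) === 0 )
                unset( $this->documents[$contentId] );
        }
    }

    public function count()
    {
        return count( $this->documents );
    }
}

$index = new InMemorySearchIndex();
$index->index( 1, '/1/2/' );
$index->index( 2, '/1/2/58/' );
$index->index( 3, '/1/43/' );
$index->deleteSubtree( '/1/2/' );
// Only the document outside the deleted subtree remains.
```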
+1, I fully agree with you. But this PR shouldn't change the current cache purging logic, as it is out of the scope of the issue it addresses. This should be refactored separately, I would say.
Agreed, that will indeed be much more readable!
I somewhat agree: externalizing it would decrease the class complexity at the price of increasing the overall complexity, but it would indeed allow an extension point.
I agree with passing around full objects, as it caters to the more general use case. E.g. the typical signal emitted by the legacy kernel on node/view is node_id, which forces the user to fetch the node again to make any use of the signal. Also, having coded ezcontentstaging, I learned that serializing as much as possible of the current-state info is a good thing. Of course users can get the ID of the value from the serialized data and refetch it if what they want is the opposite.
Side note: can we please make content-staging the prime use case for SignalSlots and the REST API?
This will not work as soon as you have multiple frontend nodes which fire events. We then need a global, sortable revision attached to the changes and must ensure the events are idempotent. This is not yet the case. Currently the events are very likely to be processed in the correct order, but this is not ensured by anything, and you will get inconsistencies when using it that way. This is exactly one reason why I am against the full value objects embedded in the events – it makes you assume you could implement something like this, while this won't work in environments consisting of more than one node. Eventual consistency in a multi-node environment requires a little more effort – otherwise you will "replicate" inconsistent data, which will be almost impossible to fix later. We surely can work out a way for content distribution to work, even on top of the existing events, but this is more than just fetching the data and serializing it, if we want to make it resistant against common problems in network architectures.
We were aware that there might be information missing, since we just added the default values as we needed them. I can see that the delete signal makes it hard to fetch additional information later ;-) … I would still suggest just adding this missing information to the event structs – but keeping it scalar values.
We need this "asynchronous" processing as soon as we have multiple frontend nodes, I guess. Since the events are a public API, we won't be able to change them again later, but would require two different event APIs (one strictly for localhost only, and one for multi-node environments). And since network architecture is not a trivial issue, and a topic often misunderstood, I guess the APIs might commonly get misused. From an architectural point of view I would still prefer the "safe" way and just use simple scalars. Additional note: as far as I know, multi-node environments are a strategic goal, so we should optimize the architecture for that, and not for the much simpler single-node use cases.
It is just that I would prefer to remove basically all logic from the persistence backend and keep the API as slim as possible, with only the essential storage methods. That is the main point of this layer, after all. Abstraction, of course, always removes potential for optimizations. In the end it is a matter of taste.
@kore I see your point, but I'm not 100% sure about it. (Sorry for going a bit off-topic and into detail.)
Or are you thinking about architectures where there is no centralized db/NoSQL-db/whatever content storage? As for the "idempotent" part: replaying a "delete node" or "move node" can of course never be idempotent.
@gggeek: Thanks for the input. This is definitely something we should keep in mind. But I will take the discussion on this out of this thread. I will write down a rough draft of how we can make this work, mentioning the involved problems. I will CC you once I have written that down.
How would the information be stored in a queue? Would the whole signal object be serialized? In that case I see the benefit of having scalars only. Who would serialize the info? A generic slot catching all signals, or only signals that need to be treated asynchronously? I have the feeling that we can take the gentle approach of keeping Value Objects in signals to avoid costly fetch operations (cache not helping here), which has the benefit of simplifying the slots as well, and only store what needs to be stored in a queue from the signals, with some possible conversion. That way, we move some complexity from all slot implementations to a possible conversion mechanism (which can most certainly be generalized). e.g.:
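As a rough sketch of what such a conversion mechanism might look like (purely hypothetical function and signal names, not the actual API): a generic step that, on enqueueing, replaces any value object carried by a signal with its scalar identifier, leaving scalars untouched.

```php
<?php
// Hypothetical sketch: convert value objects embedded in a signal's
// arguments into scalar IDs before pushing the signal to a queue.

function toQueuePayload( $signalName, array $arguments )
{
    $payload = array( 'signal' => $signalName, 'arguments' => array() );
    foreach ( $arguments as $name => $value )
    {
        // Replace any value object exposing an "id" property by its ID;
        // scalar values pass through untouched.
        if ( is_object( $value ) && isset( $value->id ) )
        {
            $payload['arguments'][$name . 'Id'] = $value->id;
        }
        else
        {
            $payload['arguments'][$name] = $value;
        }
    }
    return $payload;
}

// Hypothetical usage: a signal carrying a location value object.
$location = new stdClass();
$location->id = 58;

$payload = toQueuePayload(
    'DeleteLocationSignal',
    array( 'location' => $location, 'recursive' => true )
);
// $payload is now scalar-only and trivially serializable.
```

Synchronous slots would still receive the full value objects; only the queue boundary pays the conversion cost.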
With the old state, signals are just a name and a simple hash map. That's trivial to store in every format which comes to mind. If we introduce a conversion layer:
Additionally, there are potential issues even with locally deferred event processing. Events / signals are asynchronous by nature – I would really dislike using them in another way. At the very least we should then give them another name. (For the same reason, the Symfony2 kernel events should have been called Pipes & Filters, not events, because that is what they are.) Also, event data must be considered immutable, but our value objects suggest otherwise. People will modify those value objects in one event handler, which will then affect all following handlers of the same event. This does not happen with scalar values.
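The mutability concern can be shown in a few lines (simplified, hypothetical dispatcher, not the real SignalSlot API): because PHP passes objects by handle, a modification made by one slot leaks into every subsequent slot of the same signal.

```php
<?php
// Simplified illustration of shared mutable state when value objects
// travel with signals (hypothetical dispatch loop, not the real API).

$signal = new stdClass();
$signal->content = new stdClass();
$signal->content->name = "Original name";

$seenBySlot2 = null;

$slots = array(
    function ( $signal ) {
        // A "helpful" slot tweaks the value object for its own needs...
        $signal->content->name = "Mangled by slot 1";
    },
    function ( $signal ) use ( &$seenBySlot2 ) {
        // ...and silently changes what the next slot observes.
        $seenBySlot2 = $signal->content->name;
    },
);

foreach ( $slots as $slot )
{
    $slot( $signal );
}
// $seenBySlot2 is now "Mangled by slot 1", not "Original name".
// Scalar values copied into each signal would not exhibit this.
```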
@kore maybe outlining the different use cases for async events as opposed to a pipe|filter system is also something that could be beneficial in the document you mentioned writing. Apart from content-staging, I can see generation of image variations or video transcoding (both need an async/deferred unit of work), and custom cache purging. And the holy grail: the no-shared-cache-with-minimally-chatty-intra-node-cache-expiry-messaging architecture.
The PR has been reworked and Value Objects are no longer part of signals. However, none of the other remarks have been addressed here, as they are all out of the scope of the issue targeted by this PR (including s/_mid/_multi_id/, which I thought was introduced here, but is already part of master as of 969e992), and I guess those remarks should be taken into account regardless of this PR being merged or not, as they remain valid anyway.
+1
Perfect. Thanks @patrickallaert :-) +1
```diff
@@ -632,6 +659,24 @@ public function rollback()
     }
     --$this->transactionDepth;
     unset( $this->commitEventsQueue[$this->transactionCount] );
```
@patrickallaert A bit late maybe, but should this int be decreased here? Or should transactionDepth be used as the key instead?
Context: it looks like this will fail if there are several rollbacks in nested transactions.
Ouch :-S Sorry @andrerom, I don't have any clue about this; it is way too old for my memory.
It's not really relevant anymore :)
Signals are now by default sent only when the transaction they are in is committed, making them all transaction safe and ready for being async safe.
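That commit-deferred behaviour can be sketched roughly as follows (hypothetical, heavily simplified dispatcher, not the actual SignalDispatcher code): signals emitted inside a transaction are queued per nesting level and only flushed when the outermost commit succeeds, while a rollback discards that level's queue, so nested rollbacks cannot corrupt the bookkeeping.

```php
<?php
// Hypothetical sketch of transaction-deferred signal dispatch:
// queue per nesting depth, flush on outermost commit, drop on rollback.

class DeferredSignalDispatcher
{
    private $transactionDepth = 0;
    private $queue = array();
    public $emitted = array();

    public function beginTransaction()
    {
        ++$this->transactionDepth;
        $this->queue[$this->transactionDepth] = array();
    }

    public function emit( $signal )
    {
        if ( $this->transactionDepth === 0 )
        {
            $this->emitted[] = $signal;  // no transaction: send directly
            return;
        }
        $this->queue[$this->transactionDepth][] = $signal;
    }

    public function commit()
    {
        $signals = $this->queue[$this->transactionDepth];
        unset( $this->queue[$this->transactionDepth] );
        --$this->transactionDepth;
        if ( $this->transactionDepth === 0 )
        {
            foreach ( $signals as $signal )
                $this->emitted[] = $signal;  // outermost commit: flush
        }
        else
        {
            // Nested commit: merge into the parent transaction's queue.
            $this->queue[$this->transactionDepth] = array_merge(
                $this->queue[$this->transactionDepth], $signals
            );
        }
    }

    public function rollback()
    {
        // Keying the queue on the current depth keeps nested rollbacks safe.
        unset( $this->queue[$this->transactionDepth] );
        --$this->transactionDepth;
    }
}

$dispatcher = new DeferredSignalDispatcher();
$dispatcher->beginTransaction();
$dispatcher->emit( 'A' );
$dispatcher->beginTransaction();
$dispatcher->emit( 'B' );
$dispatcher->rollback();   // 'B' is discarded with the inner transaction
$dispatcher->commit();     // 'A' is emitted on the outermost commit
```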
https://jira.ez.no/browse/EZP-20461