[RFC][WIP][2.3] Cache Component #5902

dlsniper · 2012-11-03T20:22:09Z

Bug fix: no
Feature addition: yes
Backwards compatibility break: no
Symfony2 tests pass: yes
Fixes the following tickets: #1513, #3211
License of the code: MIT
Documentation PR: ~
Todo:

feedback from GH;
implement more features for namespaces;
implement a tagging mechanism;
add more tests;
implement more caching drivers;
create a documentation PR for the component;
fix any CS break;
optimizations?

Hello,

This is an attempt to implement a much needed component in Symfony, hence the Request For Comments status.

After reading and thinking about this for the past few weeks, and knowing the previous discussions about the possible implementation of such a feature, I've decided to make something as simple as possible while still allowing for flexibility to extend it as one might see fit.

One could argue that I'm missing cache namespaces, cache locking and so on, but those are subject to discussion / implementation.

I'll try and add the remaining things from the TODO list in the specified order.

Also keep in mind that this is only the component. I've split the bundle into a separate PR to keep the feedback separated, see #5903. Also, if it's decided, I'll create a new PR for the implementation of the component in other components/places.

At least for now I don't care about coding style breaks and so on, I think it's the least important thing right now (don't worry, I'll fix everything, if anything, before merging), so please don't comment about it!

As a last thing, I really want to finish this in time for the 2.2 release/feature freeze, so if you want to help me out, any and all PRs are welcome.

Thank you!

stof · 2012-11-03T21:29:27Z

This does not fix #3211 at all, as it does not take that discussion into account.

dlsniper · 2012-11-03T21:45:14Z

Hi @stof, it's a WIP for the moment. I have taken that discussion into consideration, but since its original author doesn't have time and nobody else has stepped in with PRs in the meantime, it will take a while to work all of that in and write the code around it. I wanted to take the user's perspective into account first; now that I have the bundle working, the rest of the component should get done faster.

stof · 2012-11-03T21:56:09Z

src/Symfony/Component/Cache/Cache.php

+        } else {
+            $this->instances[$driver][$instanceName] = new $this->drivers[$driver]($configuration);
+
+            $result = $this->instances[$driver][$instanceName];


This would make it return a driver instance, not an integer or a boolean.

I'm not sure how to handle this: if you want the cache to be managed by the code, then you need to get the cache instance, but if you already know your instance name then there's no point in returning it, right? Or should I make it return the instance just for the sake of consistency?

The return value of a method should be consistent. Otherwise, it makes it really difficult to use it. Creating a crappy interface is a bad idea.
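A minimal sketch of what a consistent return value could look like. `DriverRegistry`, `ArrayDriver` and `getInstance()` are hypothetical names chosen for illustration, not the PR's actual classes; the only point is that the method returns a driver instance on every path.

```php
<?php
// Hypothetical stand-in for a cache driver; not part of the PR.
final class ArrayDriver
{
    public array $configuration;

    public function __construct(array $configuration = [])
    {
        $this->configuration = $configuration;
    }
}

// Hypothetical registry, loosely modelled on the diff above.
final class DriverRegistry
{
    /** @var array<string, string> driver name => class name */
    private array $drivers = ['array' => ArrayDriver::class];

    /** @var array<string, array<string, object>> */
    private array $instances = [];

    // Always returns a driver instance, whether it was just created or already existed.
    public function getInstance(string $driver, string $instanceName, array $configuration = []): object
    {
        if (!isset($this->drivers[$driver])) {
            throw new InvalidArgumentException(sprintf('Unknown driver "%s".', $driver));
        }

        if (!isset($this->instances[$driver][$instanceName])) {
            $class = $this->drivers[$driver];
            $this->instances[$driver][$instanceName] = new $class($configuration);
        }

        return $this->instances[$driver][$instanceName];
    }
}

$registry = new DriverRegistry();
$first = $registry->getInstance('array', 'default', ['ttl' => 60]);
$second = $registry->getInstance('array', 'default');
```

Callers can then treat the first and every later call identically, without inspecting the type of the result.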

stof · 2012-11-03T22:07:42Z

@dlsniper The issue is that what you haven't taken into account is a discussion about the architecture of the component. So it means that taking it into account requires rewriting your component almost entirely.

stof · 2012-11-03T23:45:20Z

src/Symfony/Component/Cache/Container/CacheContainer.php

+    /**
+     * {@inheritdoc}
+     */
+    public function setRawMetadata($metadata)


you should typehint the argument as an array
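For illustration, a small sketch of the suggested type hint, assuming a hypothetical `MetadataHolder` class rather than the PR's `CacheContainer`. With the `array` declaration, PHP rejects any non-array argument before the method body runs.

```php
<?php
// Hypothetical class; only the array type declaration matters here.
final class MetadataHolder
{
    private array $metadata = [];

    // Passing anything other than an array raises a TypeError.
    public function setRawMetadata(array $metadata): self
    {
        $this->metadata = $metadata;

        return $this;
    }

    public function getRawMetadata(): array
    {
        return $this->metadata;
    }
}

$holder = new MetadataHolder();
$holder->setRawMetadata(['ttl' => 60]);
```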

dlsniper · 2012-11-06T23:10:58Z

Things left to be done that I'm currently not sure about:

implement namespaces if needed -> I'm a bit unsure how/why we should do this. It could be added as a metadata field and then used when the user saves the object into the cache. If it's OK like that, then the implementation should be a matter of minutes;
implement a locking mechanism for writing -> I'm a bit against this one as I can't find a use case for it. If the cache needs to be written by the user and it takes a lot of resources, then the application is not implemented correctly. If the cache needs to be populated by the users and it doesn't take many resources, then who cares? If the application has a large traffic volume and the cache is generated by the users, then the problem lies with the application designer, as for such an application the cache should be user-pulled, not user-pushed. So if you find any use cases that I haven't covered, or if I'm wrong, do tell;
tagging: I'll implement this as described for namespaces. Add a special key to the metadata, called tags, then have a special key that can hold tags => keynames.
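The tagging idea above could be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: `TaggedArrayCache`, the `__tags` index key and `invalidateTag()` are invented names, and an in-memory array stands in for a real driver.

```php
<?php
// In-memory stand-in for a cache driver; class and "__tags" key are hypothetical.
final class TaggedArrayCache
{
    private const TAG_INDEX = '__tags';

    /** @var array<string, mixed> */
    private array $store = [self::TAG_INDEX => []];

    public function set(string $key, $value, array $tags = []): void
    {
        $this->store[$key] = $value;

        foreach ($tags as $tag) {
            // The special index entry maps each tag to the keys carrying it.
            $this->store[self::TAG_INDEX][$tag][$key] = true;
        }
    }

    public function get(string $key, $default = null)
    {
        return array_key_exists($key, $this->store) ? $this->store[$key] : $default;
    }

    // Removes every entry recorded under the tag, then the tag itself.
    public function invalidateTag(string $tag): void
    {
        foreach (array_keys($this->store[self::TAG_INDEX][$tag] ?? []) as $key) {
            unset($this->store[$key]);
        }

        unset($this->store[self::TAG_INDEX][$tag]);
    }
}

$cache = new TaggedArrayCache();
$cache->set('user.1', 'Alice', ['users']);
$cache->set('user.2', 'Bob', ['users']);
$cache->set('page.home', '<html>', ['pages']);
$cache->invalidateTag('users');
```

With a shared backend such as memcached, updating the index entry would not be atomic with the write itself, so a real implementation would have to tolerate a stale or incomplete index.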

stof · 2012-11-06T23:35:32Z

src/Symfony/Component/Cache/Cache.php

+            return $this;
+        }
+
+        if ($this->debugMode && $this->logger) {


Why check the debug mode before logging at the debug level? There is absolutely no such code in Symfony currently.
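A sketch of the alternative this comment points towards: the caller only checks that a logger exists, and the logger decides whether debug records are kept. `MinimalLogger`, `ThresholdLogger` and `LoggingCache` are hypothetical names for illustration and are not Symfony's logger interface.

```php
<?php
// Hypothetical logger contract, reduced to the single method this sketch needs.
interface MinimalLogger
{
    public function debug(string $message, array $context = []): void;
}

// The logger itself decides whether debug records are kept.
final class ThresholdLogger implements MinimalLogger
{
    /** @var string[] */
    public array $records = [];
    private bool $keepDebug;

    public function __construct(bool $keepDebug)
    {
        $this->keepDebug = $keepDebug;
    }

    public function debug(string $message, array $context = []): void
    {
        if ($this->keepDebug) {
            $this->records[] = $message;
        }
    }
}

final class LoggingCache
{
    private ?MinimalLogger $logger;
    private array $store = [];

    public function __construct(?MinimalLogger $logger = null)
    {
        $this->logger = $logger;
    }

    public function set(string $key, $value): self
    {
        $this->store[$key] = $value;

        // Only the presence of a logger is checked; there is no separate debug-mode flag.
        if (null !== $this->logger) {
            $this->logger->debug(sprintf('Cache write for key "%s".', $key));
        }

        return $this;
    }
}

$quiet = new ThresholdLogger(false);
$verbose = new ThresholdLogger(true);
(new LoggingCache($quiet))->set('a', 1);
(new LoggingCache($verbose))->set('a', 1);
```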

dlsniper · 2012-11-09T09:29:32Z

@mvrhov maybe I wasn't clear when I actually said that I don't mind using anything that @fabpot decides, but until then I'll continue with one possible implementation for it.

Anyone who wants this to actually move forward should either help with useful comments, like @stof does, or make their own version which uses X library for it and submit a PR if they aren't happy with this.

Talking about doing stuff without actually doing it is worse than reinventing the wheel, IMHO.

dlsniper · 2012-11-09T09:30:18Z

@fabpot should I divert the efforts into making a bridge in Symfony to use Stash then?

fabpot · 2012-11-09T09:32:23Z

@dlsniper We are not in a hurry (not yet at least). So, let's first review Stash before starting yet another PR. We already have 4 PRs for the cache, which is already more than enough.

If we decide that Stash is indeed the way to go, then, we will figure out how to integrate it into Symfony.

tedivm · 2012-11-10T08:51:20Z

@fabpot There are two PRs right now on the Stash and Stash Documentation projects which may resolve some of the weirdness you mention. I'm pushing towards the next release, which does include a bit of API cleanup. The new documentation also contains more examples that include setting up the Pool class, and is being kept up to date with the API cleanup changes.

API Cleanup
Documentation Rebuild

dlsniper · 2012-11-12T16:04:31Z

@mvrhov I've looked into Stash but I find it too bloated for a simple thing. Caching should just work and that's it. There's nothing bad in having something that doesn't implement 131241 patterns just to make everyone happy.

Also, after some serious talk with some friends, @c-datculescu and others, we still haven't been able to find a proper use case for having a lock() method for the cache. Just adding it brings a lot of trouble to userland and, like I've already stated above, if you find yourself in need of a lock on the cache system, maybe you shouldn't be doing that in the first place.

I'll finish up with implementing namespaces then follow-up with tags and I think that's about it.

More drivers will then follow.

Please keep an open mind about having something that looks simple. There was an article on Symfony.com stating that simplicity is a lost art; I believe this was the one: http://blog.peterfisher.me.uk/2012/08/02/the-lost-art-of-simplicity-a-talk-by-josh-holmes/ I'll try to find the right one later on.

stloyd · 2012-11-12T16:07:25Z

src/Symfony/Component/Cache/.gitattributes

@@ -0,0 +1,2 @@
+Tests/ export-ignore


Should be /Tests to work on Windows too.

This is how currently all the other components have it. Should I make this change and also submit a PR for the rest of them?

It was fixed recently in 2.1 but seems like not yet merged into master.

fabpot · 2012-11-12T16:49:25Z

@dlsniper Keep in mind that at this stage, we need to keep an open mind. We need to evaluate different options. If you think that Stash is not a good option, please share your concerns (I have some concerns as well that I will share once the Stash refactoring PR lands as it definitely goes in the right direction). But just saying that Stash is bloated does not help.

dlsniper · 2012-11-12T18:07:02Z

@fabpot yes, I'll come back with some concrete points about it, but I'll wait for @tedivm to merge his PR first, as some things may become clearer or improve; I didn't have the patience to check the diff as well.

Again, please take my opinion with a grain of salt. I might be biased, but if we decide on anything, I'll be the first one either to help out or to make the PR.

Best regards.

c-datculescu · 2012-11-12T18:24:54Z

Hello.

Since @dlsniper mentioned me, I think I should make some of my thoughts on locking clearer.
First of all, adding locking to a mechanism that is supposed to be very fast is obviously a bad idea. I cannot come up with a real reason for doing that.
That being said, here comes the next problem: implementing a high-level lock (as opposed to a low-level lock), i.e. a pseudo-locking mechanism in PHP, is a terrifyingly bad idea. Let me rephrase: it cannot be done. If the caching server does not support locking natively, that has to mean something. Let's remember again that APC, memcache and the other servers used for caching are made for very fast concurrent access, which locking undermines in various ways.
Also, the locking mechanisms I can imagine for caches are pessimistic by default: first get the lock, then do your job and release the lock. That is actually pretty bad. Locks should be acquired as late as possible and released as fast as possible, which is probably not the case here.

Let's analyze a situation. You have a query whose result set you want cached. According to the locking scheme described, you obtain the lock, making all the other clients wait. You start doing your operations, and luckily you manage to finish. But what happens if the PHP process exits, for example? You end up with a lock set in the database for some time. Let's say you set the lock for a small amount of time instead. Now the problem is that the lock can expire before the operation has finished, so other clients will think that the record is there. And they couldn't be more wrong.
In both cases you get problems: with a lock set for a short time, and also with locks set for a long time. Taking into account that there is no such thing as transactional operations on either memcached or APC, you can also end up with partially saved data if you happen to write multiple keys in a series.

If we also factor into this equation multiple server setup and so on, i think the conclusion about locking should be pretty obvious. In my opinion it is impossible to do without hitting problems later down the road, and these problems can be very hard to debug/manage.

PS: i have also seen mentions about fallback for a chosen caching mechanism. That can end up in circular reference which would be really really bad to get out of.
PS2: If you really want to implement some sort of locking, i think the only solution is to execute the code that needs to be cached on all clients that cannot acquire a lock, and let them continue with execution. As soon as the key becomes available, then it will be used by all successive calls.
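The approach in PS2 could be sketched like this. `BestEffortLockCache` and `fetchOrCompute()` are hypothetical, and the lock is simulated in memory, so this only illustrates the control flow: a client that cannot get the lock computes the value itself and carries on without waiting or writing.

```php
<?php
// Hypothetical in-memory cache with a non-blocking, best-effort lock.
final class BestEffortLockCache
{
    private array $values = [];
    private array $locks = [];

    public function tryLock(string $key): bool
    {
        if (isset($this->locks[$key])) {
            return false;
        }
        $this->locks[$key] = true;

        return true;
    }

    public function unlock(string $key): void
    {
        unset($this->locks[$key]);
    }

    public function has(string $key): bool
    {
        return array_key_exists($key, $this->values);
    }

    public function get(string $key)
    {
        return $this->values[$key] ?? null;
    }

    public function set(string $key, $value): void
    {
        $this->values[$key] = $value;
    }
}

function fetchOrCompute(BestEffortLockCache $cache, string $key, callable $compute)
{
    if ($cache->has($key)) {
        return $cache->get($key);
    }

    if (!$cache->tryLock($key)) {
        // Someone else is regenerating: compute locally, do not wait, do not write.
        return $compute();
    }

    try {
        $value = $compute();
        $cache->set($key, $value);

        return $value;
    } finally {
        $cache->unlock($key);
    }
}

$cache = new BestEffortLockCache();
$calls = 0;
$compute = function () use (&$calls) {
    $calls++;

    return 'result';
};

$cache->tryLock('report');                        // simulate another client holding the lock
$a = fetchOrCompute($cache, 'report', $compute);  // computed locally, not stored
$storedWhileLocked = $cache->has('report');
$cache->unlock('report');
$b = fetchOrCompute($cache, 'report', $compute);  // computed and stored
$c = fetchOrCompute($cache, 'report', $compute);  // served from the cache
```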

Best regards,
Cristian.

tedivm · 2012-11-13T06:36:44Z

@fabpot If there's anything you think I should add in as part of the refactoring job let me know. Development on that is going to slow down a little bit now that I'm out of the weekend (doing the real job during the week, with coding at night). There are a few more things planned, but I'd love any suggestions you have on top of that.

@dlsniper As I've been doing the refactoring I've also been updating the documentation. You should look at the dev site and see if that helps.

I am going to disagree with you about bloat. While it does have a lot of features, they are geared towards performance. Just because something has a lot of code doesn't mean it's bloated; in fact, many of the items make it faster or remove various types of issues. For example, by building some intelligent serialization into the Filesystem driver, Stash actually manages to avoid having to unserialize most datatypes, resulting in much faster reads even though there's more code. As another example, when MetArt switched to Stash from their own native memcached calls they were actually able to remove some of their search servers due to the performance gains (even though they weren't caching anything new).

@c-datculescu Your comments about locking are dead on, but I still feel there's a place for a certain form of locking. The big thing is to understand that it won't be perfect, and then find a way to make it so perfection isn't needed.

The big question is why lock something. In most cases it's to prevent a cache stampede (the dog pile effect) by making it so only a single process is regenerating the data. If this is the use case being solved, then locking is a valid option: the worst-case scenario is that your imperfect lock gets missed, causing a few additional instances to rebuild the same data. The alternative is having all of the running instances reprocess the data, so providing an imperfect lock still has its benefits.

Additionally some of the other problems you brought up are easy to mitigate. The problems mentioned above really only apply to exclusive locks, which aren't the only kind. It's possible to put in a write lock while still allowing reads, letting other processes use the old data without allowing them to write new. You can also tell the caching system to use a default when it's locked, or (and this is my personal favorite) actually have the item pregenerate several minutes before it expires.
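As a rough sketch of the "write lock while still allowing reads" idea, assuming an in-memory store and caller-supplied timestamps. `StaleWhileLockedCache` and its methods are hypothetical and do not reflect Stash's actual API.

```php
<?php
// Hypothetical cache that serves stale data (or a default) while a key is write-locked.
final class StaleWhileLockedCache
{
    /** @var array<string, array{value: mixed, expires: int}> */
    private array $entries = [];
    private array $writeLocks = [];

    public function set(string $key, $value, int $ttl, int $now): void
    {
        $this->entries[$key] = ['value' => $value, 'expires' => $now + $ttl];
        unset($this->writeLocks[$key]);
    }

    // Stands in for another process that has started regenerating the key.
    public function simulateForeignLock(string $key): void
    {
        $this->writeLocks[$key] = true;
    }

    public function fetch(string $key, callable $regenerate, int $ttl, int $now, $default = null)
    {
        $entry = $this->entries[$key] ?? null;

        if (null !== $entry && $entry['expires'] > $now) {
            return $entry['value']; // fresh hit
        }

        if (isset($this->writeLocks[$key])) {
            // Reads stay possible under a write lock: old data if present, else the default.
            return null !== $entry ? $entry['value'] : $default;
        }

        $this->writeLocks[$key] = true;
        try {
            $value = $regenerate();
            $this->entries[$key] = ['value' => $value, 'expires' => $now + $ttl];

            return $value;
        } finally {
            unset($this->writeLocks[$key]);
        }
    }
}

$cache = new StaleWhileLockedCache();
$regenerate = function () {
    return 'new';
};

$cache->set('feed', 'old', 10, 0);                               // expires at t=10
$cache->simulateForeignLock('feed');
$stale = $cache->fetch('feed', $regenerate, 10, 20);             // expired but locked
$cache->simulateForeignLock('other');
$fallback = $cache->fetch('other', $regenerate, 10, 20, 'n/a');  // no old data, locked
$rebuilt = $cache->fetch('page', $regenerate, 10, 20);           // unlocked miss
```

The worst case matches the one described above: if the lock is missed, a few extra processes rebuild the same value.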

As long as locks are optional, and the developer has control over how to respond to a lock, then they have some serious benefits.

Baachi · 2012-11-13T07:57:42Z

src/Symfony/Component/Cache/Driver/Apc.php

+ *
+ * @author  Florin Patan <florinpatan@gmail.com>
+ */
+class Apc implements BatchDriverInterface


I would suffix the classname with Driver.

dlsniper · 2012-11-13T21:55:31Z

@stof would you mind if I'd squash this a bit? I don't really like a long list of commits.

Baachi · 2012-11-15T07:39:37Z

I vote for integrating an existing cache library, like Stash. I hate reinventing the wheel. Symfony2 already integrates some great libraries like monolog, doctrine/common, twig and so on.

But I must say, I don't like the design of Stash. The implementation with the Pool instance is a little bit "weird".
And the coding style needs some love. But these things can be fixed.

Is there any reason, why we don't integrate the zend/cache library?

tedivm · 2012-11-15T07:46:46Z

@Baachi what about the Pool don't you like? It wasn't in the original versions of the library, but over time became more and more obvious that it had a real need. That being said, most people using Symfony won't see it directly, as they'll call cache objects out of a service (you can see this in the still very alpha bundle for Stash).

dlsniper · 2012-11-15T07:49:37Z

@Baachi if you look at one of the previous threads, zend/cache was proposed then dismissed as having too many dependencies on other Zend libraries.

The coding style in Stash isn't a problem, as a quick run with an IDE can fix it. The other thing that you mention, the cache pool, is something that @tedivm should help you with.

And like I've said earlier, when Fabien decides what we should have for Symfony2, I'll either close this or migrate it to what's decided.

dlsniper · 2012-11-15T07:51:23Z

I see that @tedivm already replied. Can you tell us why that cache pool was a real need? I'm having trouble understanding it myself, as it's not an approach I've usually seen. Thanks!

tedivm · 2012-11-15T08:02:50Z

@dlsniper I'll write something up tomorrow; I'm just about to head to bed here. As a summary, though: giving the Cache Item a sense of "state" (as in, it's associated with a specific object and will remember it between calls, as opposed to the driver method where each operation is completely independent) reduces multiple race conditions and makes it possible to return false and null as cached objects with no difficulty at all.

In other words, the real value that the Pool brings is allowing each Item to be associated with a specific Key. I'll try to write a more in depth set of reasoning over the next couple days (or just cobble together the points from the PSR discussion).
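To illustrate why a stateful item makes `false` and `null` cacheable, here is a minimal sketch. `SketchPool` and `SketchItem` are hypothetical classes written for this example and should not be read as Stash's real implementation; the point is that hit/miss is tracked separately from the value.

```php
<?php
// Hypothetical item bound to one key; it remembers whether the lookup was a hit.
final class SketchItem
{
    private string $key;
    private bool $hit;
    private $value;

    public function __construct(string $key, bool $hit, $value = null)
    {
        $this->key = $key;
        $this->hit = $hit;
        $this->value = $value;
    }

    public function getKey(): string
    {
        return $this->key;
    }

    public function isMiss(): bool
    {
        return !$this->hit;
    }

    public function get()
    {
        return $this->value;
    }
}

// Hypothetical pool handing out items for keys.
final class SketchPool
{
    private array $store = [];

    public function getItem(string $key): SketchItem
    {
        // array_key_exists distinguishes "stored null" from "not stored".
        $hit = array_key_exists($key, $this->store);

        return new SketchItem($key, $hit, $hit ? $this->store[$key] : null);
    }

    public function save(string $key, $value): void
    {
        $this->store[$key] = $value;
    }
}

$pool = new SketchPool();
$pool->save('flag', false);
$pool->save('nothing', null);

$flag = $pool->getItem('flag');
$nothing = $pool->getItem('nothing');
$absent = $pool->getItem('absent');
```

A driver-style `get()` that returns `false` or `null` on a miss cannot express this distinction without a second call.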

dlsniper · 2012-11-28T19:45:06Z

@fabpot any thoughts on how Symfony2 should have its caching organized? I'm asking because December is coming and most likely everyone will be either away or very busy, so there will be little time to take the decision, implement the solution, and find all the points that need to be analyzed, covered and tested in order to have something stable-ish by the end of this year.

I'll state again, I'm not attached to any particular solution as long as we have one and is implemented everywhere.

tedivm · 2012-11-28T20:03:43Z

Along those lines, I should point out that substantial progress has been made with bringing Stash up to line. I've released the v0.10.* line, which contains the API recommendations I've received, as well as substantial amounts of my own ideas.

From my perspective the next step in there is to get feedback from Symfony project members about anything they feel needs to be changed. I'll incorporate those changes in, and either release more in the v0.10.* line or bump up to v0.11 if any are backwards incompatible.

At the same time, the Stash Bundle has made some progress (although that was on hold for the other API changes). Its test suite runs, but still needs to be fleshed out. We have some decent debugging built in, including a nice collector for data for the Symfony admin panel. We've made some adapters, a Session adapter and a Doctrine adapter, and more are on their way to cover other services. Once we have the adapters made and working with the individual projects, I'd like to work with them on incorporating some changes to make caching more efficient and possibly have a single unified interface (allowing others to more easily build out caching libraries without needing multiple adapters).

I know a decision has not been made about whether or not to use Stash, but I'll continue making progress on the library and am happy to incorporate relevant changes that make Stash better regardless of whether it becomes the default caching library, so please keep the feedback coming!

lsmith77 · 2012-11-29T12:00:53Z

@dlsniper I really urge you to involve yourself in the discussions on the FIG mailing list. I think it would be a huge pity if we push through something for just Symfony2.

dlsniper · 2012-11-30T15:40:21Z

@lsmith77 I think that what I've implemented here is a bit different from the existing proposals. Now, I'm not sure if I'm allowed to open up a new proposal, or even if I should open up a new one, but I do believe that caching should be a simple, quick thing, and that's the philosophy I've tried to follow here.

Maybe in this regard, evert's proposal would be closer to this and to what I'm trying to achieve.

What do you think the best approach would be?

lsmith77 · 2012-11-30T15:42:05Z

Anyone can bring up a proposal, but step one is to involve yourself in the discussion. There is always the chance that you will change other people's minds, or you will change your own mind, or both. But if you stay on the sidelines nothing will happen.

dlsniper · 2012-12-11T23:08:36Z

@fabpot should we mark this topic for 2.4? It seems that FIG currently has long discussions about naming things/placeholders rather than focusing on bigger problems like logger, events, cache and so on (don't take this as a rant).

I say 2.4 because 2.3, IIRC, shouldn't introduce new things/major refactoring, and things like the HttpCache refactoring proposed in #6213 could definitely benefit from this component. So we either rush the implementation starting now or wait for 2.4, with the latter making more sense if we want to be compatible with the FIG decision or just provide a bridge (not that good) for that.

fabpot · 2013-03-23T17:08:40Z

Closing as there are 4 open PRs on this topic. I would prefer that we first discuss the interface and what we want before starting to code. I suggest that this discussion happens on the Symfony dev mailing list. Also, we need to discuss what to do with the current discussion on the FIG group mailing list.

fabpot · 2013-03-23T17:13:10Z

https://groups.google.com/forum/#!topic/symfony-devs/EJr6XhawJTE

dlsniper mentioned this pull request Nov 5, 2012

[RFC] Move logger and stopwatch to separate component #5911

Closed

Cache Component implementation

795471d

dlsniper mentioned this pull request Dec 1, 2012

Yet another cache proposal php-fig/fig-standards#63

Closed

fabpot closed this Mar 23, 2013

vitorbrandao mentioned this pull request Apr 22, 2013

Cache component ppi/framework#78

Closed
