[2.2][WIP] Cache Component #3211

Closed
wants to merge 3 commits into
from

Conversation

Projects
None yet
Contributor

beberlei commented Jan 29, 2012

Bug fix: no
Feature addition: yes
Backwards compatibility break: no
Symfony2 tests pass: no (!)
Fixes the following tickets: GH-1513

This addresses GH-1513, implementing a Caching interface as new Symfony Component.

It also ships a very simple ApcCache implementation, however the focus should be on just having the CacheInterface and let others provide more complex implementations and management. With the interface however everyone can rely on a common API in their Symfony bundles and applications.

As an example a cache delegate implementation has been provided for both Validator and ClassLoader components.

This is just the first step. A second step would be integration into FrameworkBundle with a section:

framework:
  cache:
    name: 
       provider: symfony.cache.apc

Would create a service "cache.name".

Integration points would be the provider (expecting a service id) and an options key, so we could have:

framework:
  cache:
    name: 
       provider: doctrine.common.cache.apc
       options:
          prefix: my_cache_prefix

The first cache provider in the list should also be aliased to "cache", so that all bundles can always trust on "cache" being available as service.

@stof stof commented on the diff Jan 29, 2012

...y/Component/ClassLoader/CacheUniversalClassLoader.php
+ *
+ * * The PEAR naming convention for classes (http://pear.php.net/).
+ *
+ * Classes from a sub-namespace or a sub-hierarchy of PEAR classes can be
+ * looked for in a list of locations to ease the vendoring of a sub-set of
+ * classes for large projects.
+ *
+ * Example usage:
+ *
+ * require 'vendor/symfony/src/Symfony/Component/ClassLoader/UniversalClassLoader.php';
+ * require 'vendor/symfony/src/Symfony/Component/ClassLoader/CacheUniversalClassLoader.php';
+ *
+ * use Symfony\Component\ClassLoader\ApcUniversalClassLoader;
+ * use Symfony\Component\Cache\ApcCache;
+ *
+ * $apc = new ApcCache();
@stof

stof Jan 29, 2012

Member

this implies requiring the cache implementation and the interface first

@stof stof commented on the diff Jan 29, 2012

...ny/Component/Validator/Mapping/Cache/SymfonyCache.php
+ public function read($class)
+ {
+ return $this->cache->fetch($this->prefix.$class);
+ }
+
+ /**
+ * Stores a class metadata in the cache
+ *
+ * @param ClassMetadata $metadata A Class Metadata
+ */
+ public function write(ClassMetadata $metadata)
+ {
+ $this->cache->save($this->prefix.$metadata->getClassName(), $metadata);
+ }
+}
+
@stof

stof Jan 29, 2012

Member

there is an extra line at the end

@stof stof commented on the diff Jan 29, 2012

...ny/Component/Validator/Mapping/Cache/SymfonyCache.php
+ * Returns whether metadata for the given class exists in the cache
+ *
+ * @param string $class
+ */
+ public function has($class)
+ {
+ return $this->cache->contains($this->prefix.$class);
+ }
+
+ /**
+ * Returns the metadata for the given class from the cache
+ *
+ * @param string $class Class Name
+ *
+ * @return ClassMetadata
+ */
@stof

stof Jan 29, 2012

Member

should use @inheritdoc

@stof stof and 1 other commented on an outdated diff Jan 29, 2012

...ny/Component/Validator/Mapping/Cache/SymfonyCache.php
@@ -0,0 +1,65 @@
+<?php
+
+/*
+ * This file is part of the Symfony package.
+ *
+ * (c) Fabien Potencier <fabien@symfony.com>
+ *
+ * For the full copyright and license information, please view the LICENSE
+ * file that was distributed with this source code.
+ */
+
+namespace Symfony\Component\Validator\Mapping\Cache;
+
+use Symfony\Component\Validator\Mapping\ClassMetadata;
+use Symfony\Component\Cache\CacheInterface as SymfonyCache;
@stof

stof Jan 29, 2012

Member

this will break things. The current class already uses this name so you cannot alias the interface this way

@beberlei

beberlei Jan 29, 2012

Contributor

ah yes, hooray for not running a test on this ;)

Member

stof commented Jan 29, 2012

you should also add an implementation in the bridge using Doctrine Common to implement the Symfony interface

Contributor

beberlei commented Jan 29, 2012

Tests dont run because of APC redefined, only when CacheUniversalLoaderTEst is run in isolation. I copied the test 100%.

Question would be if we have this Cache component, we could drop the ApcUniversalClassLoader.

Contributor

beberlei commented Jan 29, 2012

@stof yes, i wanted to wait for the general perception of this PR before going further.

@alexandresalome alexandresalome commented on the diff Jan 29, 2012

src/Symfony/Component/Cache/CacheInterface.php
+ * Generic Caching interface that any caching provider can implement/provide.
+ *
+ * Used inside Symfony is suggested to be used by any third-party
+ * bundle to allow central cache management.
+ *
+ * @author Benjamin Eberlei <kontakt@beberlei.de>
+ */
+interface CacheInterface
+{
+ /**
+ * Fetches an entry from the cache.
+ *
+ * @param string $id cache id The id of the cache entry to fetch.
+ * @return string The cached data or FALSE, if no cache entry exists for the given id.
+ */
+ function fetch($id);
@alexandresalome

alexandresalome Jan 29, 2012

Contributor

I disagree with the idea of returning false if not found: what if data is boolean?

@beberlei

beberlei Jan 29, 2012

Contributor

What if data is NULL? same problem, fetch($id, $found) is not supported by all cache drivers though.

Thats what ->contains is for. The problem here is that we have to find the most common denominator.

@drm

drm Jan 29, 2012

Contributor

Exceptions make sense here imo

@vicb

vicb Jan 30, 2012

Contributor

I agree with @alexandresalome disagreeing here, the first requirement should be the ability to store any type of value (this is one of the PR I have submitted against Doctrine common.)

So the interface could look like:

<?php
interface CacheInterface
{
    function fetch($id, &$success);
    function contains($id);
    function save($id, $data, $lifeTime = 0);
    function delete($id);
}

I think there is one more problem with providing an interface with both fetch and contains as it encourages flawed code as

<?php
if  (!$cache->contains('id')) {
   $cache->save('id', compute());
}
doSomething($cache->fetch('id'));

So what about dropping contains ?

<?php
interface CacheInterface
{
    function fetch($id, &$value, $fetchValue = true);
    function save($id, $data, $lifeTime = 0);
    function delete($id);
}
  • fetch would returns true when the value is in cache,
  • the third parameter could be set to false when you really need the contains behavior

The above example would become:

<?php
if  (!$cache->fetch('id', $value)) {
   $cache->save('id', $value = compute());
}
doSomething($value);
@beberlei

beberlei Jan 30, 2012

Contributor

Neither Memcache, Xcache nor Zends Cache support an API for checking if a key really exists or not. (Memcached does support it). An API that works by reference for the values is really unintuitive, an API interface should take expectations of users into account.

Throwing an exception is misusing exceptions as a control-flow and will lead to ugly code in practice. I guess fetch($id, &$found) does make sense and could get rid of contains(), but personally saving false/null/emptystring into a cache entry screams for trouble anyways.

@vicb

vicb Jan 30, 2012

Contributor

Neither Memcache, Xcache nor Zends Cache support an API for checking if a key really exists or not

I don't think this is a valid reason to propagate the flaw to the cache interface, this would defeat the purpose of caching.

An API that works by reference for the values is really unintuitive, an API interface should take expectations of users into account.

I can not think of anything better. I prefer a weird interface over weird bugs that would be impossible to debug. BTW the symfony code already suffers from this flaw here

@drm

drm Jan 30, 2012

Contributor

Throwing an exception is misusing exceptions as a control-flow and will lead to ugly code in practice.

Well, since you are defining the API in this case, how could it be ugly? :) If the interface says fetch() must throw an exception if the key does not exist, the consumer knows it always has to check for existence before fetching the value. There could be an extra convenience method wrapping that logic. The uglyness is up to the implementation of the interface. But it's a matter of taste I guess.

@vicb

vicb Jan 31, 2012

Contributor

@beberlei
I guess I read your message too fast yesterday. We are pretty much aligned on this.

Dropping contains would be a good thing. So fetch would return both found and value, one of them has to be returned by reference anyway. Wich way to choose:

<?php
// I prefer
if  (!$cache->fetch('id', $value)) {}
//over
$value = $cache->fetch('id', $found);
if  (!$found) {}

Plus we should take into account that we should provide a way to call contains - which is the third argument of my proposal - then I do prefer my proposal over returning $found by ref but both would suit me.

@drm

  • Strictly speaking a cache miss could not be considered as an exception ("an abnormal event occurring during the execution of a routine"),
  • Your proposal to check for existence is the perfect example of why we should drop the contains() method. It would lead to a race condition which will randomly produces error 500 (i.e. your entry get evicted in between the contains and |fetch` calls)
@rande

rande Feb 15, 2012

Contributor

You can to store a CacheElement object, like https://github.com/sonata-project/SonataCacheBundle/blob/master/Cache/CacheElement.php. So you don't have any issue with the returned value.

@vicb

vicb Feb 15, 2012

Contributor

@rande

  • this option is also being discussed in the Cache Component PR
  • so your get returns either null or CacheElement in your implementation (your phpDoc is not very useful here) ?
  • @tedivm do you see an added value in returning a CacheElement on a miss rather than null ?

BTW great bundle, could be a source of inspiration for the CacheBundle I am working on currently (based on the one from @lsmith77).

@stof

stof Feb 15, 2012

Member

@rande see the discussion page as things like that have already been suggested. And IMO, users should not have to bother with an extra object wrapping the data to be able to store them.

@tedivm

tedivm Feb 15, 2012

@vicb

I commented in the pull request and it's somewhat relevant to this. The basic idea in the request was that I agreed with @stof that we should consider the CacheInterface and the DriverInterface as separate pieces. The reason I bring this up is that my answer depends on the approach.

  • For the CacheInterface I think returning a null is critical. The users should be able to put any object they want in (null, false, objects, etc) and pull them out again directly- they shouldn't have to do type checking to confirm that their object is what they expect it to be. Additionally there is a huge difference between a "miss" and the object not existing- if the object is not there it's a miss, but there are cases where there will be an object and it will still be a miss.
  • For the DriverInterface I feel a CacheElement (or something) should be returned for every request, because the higher level caching library is going to need to know various pieces of metadata such as the TTL. The users (doctrine, etc) will never have to interact with the DriverInterface.
@vicb

vicb Feb 15, 2012

Contributor

@tedivm

  • I agree, my question was about the "DriverInterface". One question though, could you eplain: "there are cases where there will be an object and it will still be a miss" ?
  • You assume that the TTL is global for all the items here, right ? If this is the case we might be able to retrieve it from a different place. I am not saying that always returning an object is a bad, just trying to understand.
@tedivm

tedivm Feb 15, 2012

I agree, my question was about the "DriverInterface". One question though, could you eplain: "there are cases where there will be an object and it will still be a miss" ?

On object can be present and stale, at which point it would be considered a miss even though it still exists. If you let the higher level caching layer that the drives feed into make the decisions about what to do with stale data (reuse it, discard it, etc) then you have a very powerful tool. Stash has some examples of this.

You assume that the TTL is global for all the items here, right ? If this is the case we might be able to retrieve it from a different place. I am not saying that always returning an object is a bad, just trying to understand.

To be clear, the TTL I'm referring to is the remaining TTL for that specific object. If the start TTL was 180 seconds, but only 20 seconds remain to expire, then it would be very useful to know that. Additionally it may be useful to know that it expired 120 minutes ago- using a datetime (or timestamp) internally for that expire date is important.

@vicb

vicb Feb 15, 2012

Contributor

Your first reply also answered my second question. That's a very interesting discussion, many thanks for you inputs ! I really need to find some time to take a deeper look at Stash

Member

Seldaek commented Jan 30, 2012

@lsmith77: +1 on using the proposed PSR cache interface.. and to adjust to any changes it has until 2.1 is released. We should really push for finalization of it before 2.1, that way we can join the party.

Contributor

lstrojny commented Jan 30, 2012

btw: for caching in "the real world" [tm] cache tagging is a feature one cannot ignore. It can be implemented for every backend source in a more or less complicated fashion. Can our interfaces at least allow for tags to be specified during on save(). Additionally a deleteTags(array $tags) method should be available.

Contributor

alexandresalome commented Jan 30, 2012

Concerning details of "what a cache abstraction" should provide, the discussion on the newsgroup, provided by @lsmith77 seems great.

The point is that since each component implements the cache on its own, performance is great.

The more abstraction you add between the needing and the solution, the less performance you have.

Keeping cache interfaces is important in component.

The best would probably be to create a PSR Bridge, isnt it?

Contributor

beberlei commented Jan 30, 2012

@lstrojny tags are an important use-case, however implementations that hack this into backends do that at huge costs. Uses don't recognize that implementing tags with memcache/apc requires you to cache all keys in a key. This causes tons of race-conditions on high load.

Contributor

vicb commented Jan 31, 2012

@beberlei, @lstrojny Would it be possible to provide a basic CacheInterface and some other interfaces as TaggableInterface, XYZInterface with decorators for cache system where those features are not natively supported ?

Contributor

lstrojny commented Jan 31, 2012

@vicb yep, that would be fine.

Owner

fabpot commented Feb 11, 2012

I also think that this is quite important to have "something" in core. How can we move forward?

Contributor

dlsniper commented Feb 12, 2012

If this component will get more development on it Symfony2 should also support passing the components here as either services or something similar (think singletons) as one might bump into 'too many connection' problems if you need to use same type of service, say memcache, everywhere in the application.

The problem that I'm trying to solve is this:

  • one memcache connection for session
  • one memcache connection for DB cache
  • one memcache connection for Symfony Cache component
    all tied to the same server. Now, using the above approach, I've just limited the number of users that I can support on a single server because instead of reusing a existing connection I'm using 3 connections.

I've had this problem already into production where I had simply too many open connections for our memcache server pool and the way to solve it was to merge everything into a single central service.

I'm thinking something like this for example:

framework:
    session:
        storage_connection:   symfony.cache.memcached

doctrine:
    orm:
        entity_managers:
            default:
                result_cache_driver:
                    connection:  symfony.cache.memcached

I don't know if Doctrine would be able to support something like this, but maybe it could considered or I'm just talking non-sense. (I know connection is not currently supported in Doctrine configuration nor storage_connection exists in Symfony2).

Hope all of the above makes sense to you guys.

/Cheers

Member

stof commented Feb 12, 2012

@dlsniper We first need to figure how the cache component should look like before configuring it. Providing a few services is not an issue at all once we have the code.

Contributor

vicb commented Feb 12, 2012

@dlsniper I'd like to come with something like this in the CacheBundle (#3225). I hope I can find time on work on this during the coming week.

Contributor

dlsniper commented Feb 12, 2012

@stof I've added some of the changes in the above commit and I think I could do more to it but seems people are not sure about the right direction for now. IMHO the cache components should all provide the same functions and capabilities by default, using the cache interface, but for example, if one wants to use multi-fetch option from memcache(d) library, one should have access to it by calling the function, getMultiByKey, directly. I mean, if I configure the application to use Memcache(d) I'll know for sure which things I'm allowed to call on a memcache(d) object, don't I?

@vicb I could try and help you with this as well if you want me to. I can do some work on this as I'm fairly interested in getting this into Symfony2 master sooner rather that later.

Member

Seldaek commented Feb 12, 2012

I would like to reiterate that PLEASE take a look at the proposed PSR cache interface and PLEASE make this PR compatible as much as possible. It may well lead nowhere, but at least it has a lot of feedback built in already from many different projects.

Contributor

vicb commented Feb 12, 2012

@dlsniper Any help is welcomed, I'll try to update the PR in the coming days with code a detailed todo list (It will be based on doctrine common until this cmp gets ready.)

Member

stof commented Feb 12, 2012

@dlsniper you did not added anything. You only referenced the PR from your fork (not even yours in fact by a fork from @dlsniper2). To be included in the PR, the commit needs to be part of @beberlei's branch as the PR is sent from his branch

Contributor

dlsniper commented Feb 12, 2012

@Seldaek I'll update the code in the next hours to support that, it's part two of the work. Thanks for the link
@stof I know, I'm working on a final PR with @Seldaek suggestion as well

Contributor

vicb commented Feb 12, 2012

@Seldaek thanks for the link. The one thing I don't like there is get() with a single argument and the exists() method which encourage writing flawed code (race conditions). I'll try to take some time to go through the mail thread and submit feedback.

Member

Seldaek commented Feb 12, 2012

@vicb thanks. I know it's not perfect yet, but I would rather have us contribute to the effort than going our separate way.

@ghost

ghost commented Feb 12, 2012

@stof now I've made the PR for @beberlei to merge it if he really wants. Also, I'm dlsnper as well since GH doesn't allow forking a fork :)
@Seldaek I'm with @vicb on the need of having ::get() returning the status of the get operation while the data from the cache is 'returned' via a parameter passed by reference.

Member

stof commented Feb 12, 2012

I think the oppposite would be more logical. get() should return the value you get IMO

Member

Seldaek commented Feb 12, 2012

I agree with @stof. get() should return the value, and optionally you
can check if it worked if this is relevant to your use case.

Contributor

dlsniper commented Feb 12, 2012

So the question is: should I make the PR full PSR compatible and adjust it when/if it changes or wait to see if @vicb proposal will pass and FIG will accept it?

Also, what's next? Implement other cache components? I can do the work on them if so.

Contributor

vicb commented Feb 12, 2012

I agree too, get() should return a value, that's why we should have fetch() instead (see my reasoning).

I would be ok with returning a value but:

  • @stof I really would like to see motivation for each idea, "be more logical" does not sound like a good reason to me,
  • @Seldaek do you have a lot of use case when you don't want to check the availability ?
Member

Seldaek commented Feb 12, 2012

@vicb in most cases, cache entries contain data, not null. So I don't care so much if the key existed, if I get a null I can safely assume it did not exist and populate the cache. Also it's reasonable to assume people are familiar with apc_fetch which works like that too.

Contributor

vicb commented Feb 12, 2012

if I get a null

That's checking for availability, isn't it ?

<?php
$obj = $cache->get($key);
if (null !== $obj) {...}
//vs
if ($cache->fetch($key, $obj)) {...}

We are defining a new interface, I don't think we should assume anything. It's a great opportunity to make something that is less error prone (thinking remove exists here) and easier to use in the most common uses cases (thinking returning availability here).

edit: I have changed get for fetch - this is to understand the comment from stof below.

Member

stof commented Feb 12, 2012

except that if ($cache->get($key, $obj)) {...} is confusing as ->get() does not give you the value

Contributor

dlsniper commented Feb 12, 2012

I think the latest commit should handle this pretty well even if it breaks PSR.

We could also have either an additional interface between the PSR interface and the implementation or something similar which should cover the fetch function.

There's also the question about using traits, if we follow PSR, as that's dependent of PHP 5.4 which hasn't been released yet and the fact that if we want this component to be used in the framework it will bump the framework requirements to PHP 5.4. That will make Symfony unusable for most corporate users as everyone knows how hard is to make such upgrades to production servers. Also not only corporate users will be affected, people who host their projects on servers that have not yet been upgraded to PHP 5.3 (I think I might know one case) will have no chance to find hosts offering PHP 5.4 as soon as it's released and basically will kill usage of SF2. I know we can avoid trait usage but I thought mentioning this.

Member

stof commented Feb 12, 2012

@dlsniper why are you talking about traits here ? Nothing requires using traits here.

Contributor

dlsniper commented Feb 12, 2012

@stof maybe I just misunderstood the role the trait specified here: https://github.com/evert/fig-standards/blob/master/proposed/objectcache.md in which case please disregard that part of the comment :)

Member

stof commented Feb 12, 2012

@dlsniper traits are a way to do horizontal reusability. But you can implement the interface without using the traits for this

Contributor

lstrojny commented Feb 13, 2012

What about a result object

<?php
interface CacheInterface
{
    /** @return Entry */
    public function get($key);
}

interface EntryInterface
{
    public function exists();
    public function getKey();
}

class Entry implements EntryInterface
{
    public function exists()
    {}

    public function getValue()
    {}

    public function getKey()
    {}
}

class NullEntry
{
    public function exists()
    {
        return false;
    }
}

So one would do:

<?php
$entry = $cache->get("foo");
if ($entry->exists()) {
    ...
}
Contributor

vicb commented Feb 13, 2012

@lstrojny this has been proposed on the PSR ml by both @beberlei and Robert Hafner, I like the idea very much.

Contributor

lstrojny commented Feb 13, 2012

We should not base our design decisions on PSR, but provide a proxy/bridge/whatever once PSR went, if ever, through. Let’s implement what we think is best and provide interoperability later.

Contributor

vicb commented Feb 13, 2012

It would obviously be better to conform with what the PSR decides for interoperability. We also have a chance to make proposals to the PSR. If the PSR comes with a solution that doesn't suit our needs of takes too long then we can re-consider this position but let's try first.

In the mean time we can come up with a bundle and a component, there are a lot of other things to discuss and tune. Then we'll update to whatever we decide.

Thoughts ?

Contributor

beberlei commented Feb 13, 2012

Well there are three ways and we have to decide on one.

  1. $value Cache::get($key, &$exists)
  2. bool Cache::get($key, &$value)
  3. Result Cache::get($key) with Cache::exists() and Cache::value()

I prefer 1. since that is also how apc API looks like.

Member

Seldaek commented Feb 13, 2012

I'm for 1. too as I said above, for apc and just because it is more obvious, all the get() methods return something's value.

Contributor

lstrojny commented Feb 13, 2012

I prefer 3. Much cleaner, no references, no false friends and I can imagine a lot of cool things that can be done by extending the Result object.

Contributor

vicb commented Feb 13, 2012

I think the idea is to build 3 on top on 1 or 2, right ?
Then we should select either 1 or 2 to start with. I favor 2 vs. 1 because it allows more concise code but I am ok with 1 also.

As said on the mailing list, I would prefer 1' (or 2') with a third argument (Boolean or ::FETCH_VALUE / ::FETCH_KEY) to allow retrieving availability only (it can improve perf for some use cases).

Member

Tobion commented Feb 13, 2012

I'm for 2 if it's called fetch because you often only need to check availability once and then you don't need to initialize a variable.

Contributor

Baachi commented Feb 13, 2012

I'm for 1, too. I think it's easier to understand.

Member

stof commented Feb 13, 2012

I'm for 1 too.

@fabpot what do you think ?

Contributor

vicb commented Feb 13, 2012

Well let's try to be objective and list the pros and cons of mixed get($key, &$found) vs Boolean fetch($key, $value)

Pros:

  • A get, fetch, retrieve, etc method has to return a value, not a boolean. Isser methods return booleans, getters return values.
  • It doesn't make use of reference: what about &$found ?
  • I like it / it is easier to understand / it looks like my favorite backend / it is more logical / it is more obvious / less confusing: this is a personal opinion, each of us has its own (do not judge by the number, some have more than one opinion)

Cons:

  • easier to get get wrong: if (null / false === $cache->get($key) { ... }
  • less concise:
<?php
$value = $cache->get($key, $found);
if (!$found) {...}
//vs
if (!$cache->fetch($key, $value)) {...}
  • I don't like it : personal opinion again

Please let me know which pros / cons should be added to the list and let's pick the winner. Voting does not seems like a good idea here, we'll fall in the personal opinion thing again.


Some personal opinions:

  • I finally made up my find for solution 2
  • I would really to hear the voice of Robert Hafner the author of Stash. It seems to me he got a lot of things right.
Contributor

vicb commented Feb 13, 2012

looping in Robert (@tedivm)

Contributor

beberlei commented Feb 13, 2012

Sorry but doesn't bool Cache::fetch($key, &$value) need pass by reference? so its not in argument in any cases favor. is not actually an argument at all. 1.) is just a matter of expectations, if i get/fetch something i expect the return value to be the "real thing".

Contributor

vicb commented Feb 14, 2012

Not everybody as the same expectations for fetch, see odbc_fetch_info. If the name is the only problem with might come with something better.

Contributor

Baachi commented Feb 14, 2012

I agree with @beberlei.

If i call a method named get orfetch, i expect the return value to be the real thing.

Symfony's ParameterBag works with mixed get($key). For more consistency, it solution 1 the best solution.

tedivm commented Feb 14, 2012

Thanks for bringing me into this discussion. I'm a big fan of Symfony, and am slightly obsessed with caching, so I'm looking forward to being part of this discussion.

I've got a lot of points I'd like to make, so I'll just dive right in. I should apologize in advance for the wall of text I'm spewing forth here.

Separation of Functions

I can't tell you how many problems are solved by separating out the raw underlying drivers from the actual API. Generally speaking the drivers (the things that access services like memcached, apc, and other cache systems) should be dead simple with just the basic "common denominator" code between systems. Those drivers get injected into the classes people should actually use to interact with the cache (people also being underlying libraries with caching).

An example you can probably all relate to is Twig. Twig has a number of backends to it (filesystem, string, array, or custom ones) which get created once, injected into an Environment object as part of it's creation. The Twig Environment class is essentially just a factory for the Template classes. Symfony encapsulates the entire process, allowing developers to simply call the resources they're looking at without thinking about how it's passed.

This is essentially what is suggested by the "result object" idea, with the main difference being that the object is injected with the driver rather than the driver returning the cache object. This allows to us to keep the underlying drivers as "dumb" as possible (lcd) while still allowing for some significantly advanced functionality to exist.

Misses versus Contains versus &$fetch

You guys already rejected 'contains' or 'isset' (seriously, good freaking call). You also want to return the exact value put in, even if it is null or false (again something I agree with completely).

The big question is how to figure out how to accomplish this. Adding a reference variable to the fetch call solves this in some ways, but adds a bit of clutter. On the other hand adding an additional call to the currently defined proposal either makes things more inefficient by requiring multiple calls to the caching service or forcing the caching library to store requests internally after retrieving it.

Splitting this into multiple classes solves the problem. The front end should be as basic as possible, and the complicated stuff can get handled in the driver.

<?php
$value = $cache->get();
if($cache->isMiss())
{
    $value = expensiveCode($key);    
    $cache->store($value);
}

Consistency

For the user facing API (the cache object discussed above) to work properly it must be as consistent as possible. Developers should never have to build in any special cases to handle differing backends.

Simply put, the developer should never have to think about the underlying caching services at all- thats a problem that needs to be solved by the specific installations. As a system needs to scale up the system admins should be able to move from smaller scale systems (filesystem, apc) to larger (memcached) or even combine the two, all without the developer doing a thing.

If tagging is implemented in the front end, for instance, it should either be handled completely in the front end or required in all of the drivers- otherwise developers won't really be able to use the features (or, if they do, they won't be able to use any of the engines that don't).

Grouping

There should be a way to remove selective groups of items from the cache- either clearing the entire cache or clearing each item individually are both full of problems.

That being said, I've never found a good way to get tagging to consistently work across multiple back ends without some serious overhead. I've found that hierarchal systems are much easier to build out.

Maintenance

Different systems have different capabilities, and some need regular maintenance. While APC and memcached will remove expired items automatically, filesystem or sqlite based caches need to be purged of items explicitly occasionally to clear out items that haven't been called in awhile. If you're using sqlite it also helps to rebuild it occasionally.

Unfortunately many of these actions take a long time, and should not be done as part of random requests. There should be some action exposed by the underlying drivers which can be called regular (cron job?) that can run without affecting someone's request.

Error Isolation

Errors with the cache system should be logged but never disrupt the actual program flow. Code is designed to deal with misses, and for the cache an error can be considered just a miss. This way any problems should only result in degraded performance.

Stampede Protection

I can write about this forever, but the long and short of it is that the user level api should have a "lock" function.

Alright, at this point I need some sleep. I'm looking forward to continuing these discussion soon. Before going on though I wanted to point out there is a Stash Bundle that provides Stash as a service in Symfony which handles some of what has been discussed. I plan on moving Stash over to Github in the next week as well, but you can learn about it at it's current site.

Owner

fabpot commented Feb 14, 2012

For what is worth at this point, I'm in favor of option 1.

Contributor

willdurand commented Feb 14, 2012

+1 for option 1 too

tedivm commented Feb 14, 2012

Option 1 is the cleaner of the existing options, true. However, I still think it has a big flaw that is being overlooked- namely, sometimes you need to know exactly what the expiration time is, not just a boolean state of whether it is expired or not. This is why I think a request object makes sense.

Let me give you an example- when I first implemented Stash I did so without any stampede protection. Essentially, when any object returned a miss then every request thats made between that miss and it being repopulated will result in the expensive code running again. For some things (user profiles, as an example) that may not be a big deal, but for others (popular search queries, or javascript minification) this process can be quite fatal to system performance.

There are a number of ways to handle this- limit reprocessing to one request and serve a default value, or the stale data itself- but one of the users Stash (met-art.com, nsfw) recommended a feature that has become one of our most popular, regenerating the cached item before it actually expires. The benefit here is that no stale data ever has to be served and only one request will actually "miss" and have to reprocess things.

That piece of functionality was easy (relatively speaking), but it requires that the system be aware of what the TTL actually is.

Basically, this is another reason why the "drivers" behind the caching (the things that interact with the storage engines) should be a separate interface than the one used to interact with the cache on the whole (the interface that doctrine and the other libraries will support for placing and retrieving their items from the cache). Otherwise the advanced features that come down the line will have to be built into each driver separately (since they'd be called directly) or implemented in the individual systems like Doctrine. If the drivers are as dumb as possible and the advanced logic used them, but wasn't implemented in them, then the caching system can grow over time without the users or the drivers having to make any changes.

Robert

Contributor

vicb commented Feb 14, 2012

We are designing a component that is very important for the future of Symfony.
Objective reasons would be much more useful than "+1"s. It should not be a matter of personal preference but more usability.

Member

stof commented Feb 14, 2012

@tedivm how do you decide whether the cache should be regenerated or no ? Simply doing a condition on the remaining TTL is not enough as it could still be regenerated twice in parallel.

Owner

fabpot commented Feb 14, 2012

In HttpCache, we have some code that tries to avoid the dogpile effect (https://github.com/symfony/symfony/blob/master/src/Symfony/Component/HttpKernel/HttpCache/HttpCache.php#L497)

tedivm commented Feb 15, 2012

@stof, glad you asked :-)

The first part of implementing any sort of stampede control is issuing a lock to prevent this from happening. When stampede protection is enabled then every miss is followed by a subsequent check of this lock, and if the lock is engaged (and hasn't timed out) then the miss is considered a hit and the data is passed back as if it were fine (depending on the specific type of invalidation technique provided).

In the example of pregenerating the expired code

  1. The Stash object gets the TTL back from it's underlying drivers,
  2. Stash checks to see if it's in the regeneration cycle (default being 40 seconds before real expiration, but that can be set to anything).
    • If it is not in danger of expiring, it is returned.
  3. If it is, Stash checks to see if a lock if issued.
    • If there is a lock, the cache is returned as a hit instead of a miss.
  4. If no lock is issued, Stash places one.
  5. Stash waits for the item to be passed to the "store" function.
  6. Stash releases the lock.

This process is slightly different depending on the invalidation technique used, but is fairly sound.

tedivm commented Feb 15, 2012

I just wanted to loop @jhallbachner into this conversation. He's helped out quite a bit with Stash, and wrote the Symfony bundle for it. He's also awesome, which is only slightly related. I think he'll be able to provide a lot of great feedback.

Contributor

willdurand commented Feb 15, 2012

Let me explain (once again) why option 1 seems the best choice.

Result Cache::get($key) with Cache::exists() and Cache::value()

It seems overkill to add a Result object, it's not really "fluent".
Moreover, two different ways to get a value requires to decide which way to use, this is not the aim of an interface..

bool Cache::get($key, &$value)

A get, fetch, retrieve, etc method has to return a value, not a boolean. Isser methods return booleans, getters return values.

$value Cache::get($key, &$exists)

Then, here is the best choice as you combine the need to know whether the value has been retrieved or not through the $exists parameter, and the value itself. It's consistent with other Symfony2 APIs.

If the purpose of this discussion is the interface, and not internal stuffs, here are my 2 cents (not a personal opinion) on these three options.

Another point I would like to introduce is the region-based approach. A region is a kind of logical cache separated from the other ones, that way you can manage each region separately (useful for clear).

<?php

$value Cache::get($key, &$exists, $region = 'default');

Cache::put($key, $value, $region = 'default'); // put, save, ?

By the way, are we all agree on the put/save name?

Contributor

vicb commented Feb 15, 2012

@willdurand thanks for the feedback, I have updated the pros/cons list.

I do not really get your first point:

  • "seems" = personal opinion ?
  • what two ways are you talking about ?

btw Cache::exists() looks odd (i.e. the object has to "exist" because you are using it) Cache::isMiss() is way better.

About your last point:

I think the "region" (or namespace) should be set at another time (should it be the constructor or a method call). Then you can use different cache "providers"

<?php
$cacheReg1 = $container->get('...');
$cacheReg1->[put/save/whatever]($key, $value);
$cacheReg2->[put/save/whatever]($key, $value);
Contributor

willdurand commented Feb 15, 2012

"seems" = personal opinion ?

No, actually I read various caching layers in other programming languages, and none of them have this concept of yet another result object.

what two ways are you talking about ?

What is the aim of Cache::value() in the third option? Oh, maybe I am wrong, and I should read Result::exists(), Result::value().. Then, there is only one way to get the value:

<?php

$resultObj = Cache::get($key);
$resultObj->value();

As I said, it's not fluent, and overkill just to handle null/false values.

Personal opinion: I'm sure to not put null values in cache, and if there are null values, then I'll fill in again the cache. I don't want to use a proxy class, and write more code. I expect to put a value with its key, and to retrieve this value using this key. I want to get exactly what I put before, without anything else. (user story)

Contributor

willdurand commented Feb 15, 2012

Oh, and you're right for regions/namespaces.

Member

stof commented Feb 15, 2012

I don't think we should have a Result object returned by the CacheInterface to the end user. This makes the cache interface more complex to use.
However, if we separate the CacheInterface exposed to the end-user and a DriverInterface used by the core implementation of the CacheInterface, we could use such an object as return value of the DriverInterface to use it internally in the Cache to handle the cache regeneration and so on

@fabpot fabpot added a commit that referenced this pull request Feb 15, 2012

@fabpot fabpot merged branch vicb/memcache/fix (PR #3363)
Commits
-------

b95284e [Profiler] Fix memcache(d)

Discussion
----------

[Profiler] Fix memcache(d) storages

This fixes an ambiguity...

The memcache(d) storages have a `$lifetime` option. The name indicates that we are talking about a ttl (in seconds). This is wrong is `$lifetime` > 2592000 (=30 days), see http://fr.php.net/manual/en/memcache.set.php.

Doctrine is also [affected](vicb/common@e9ab2d2).

The ambiguity also exists in the session storage but to a lesser extend as those storage directly use memcache(d) options rather than a `$lifetime`. @drak could you confirm ?

Hopefully the Cache Component will get it right (#3211).
3f76d0f
Contributor

rande commented Feb 15, 2012

Hello

I just discover this thread ... so sorry for the duplicated effort. At the last cmf hackday, I have extracted the cache classes from the Sonata Page Bundle to create the Sonata Cache Bundle : https://github.com/sonata-project/SonataCacheBundle

The bundle provides differents features :

  • backend adapter : ESI, memcached, mongodb and APC
  • command lines to flush cache
  • invalidation solutions for each backend

CacheElement vs Raw Data

The cache backend always stores a CacheElement object, the native functions know how to serialize and unserialize this data. As there is no stable solution to the return null or false if the cache does not exist. Maybe we can throw a CacheNotFound exception. The current Sonata Cache Bundle returns a CacheElement object. But this can be changed ...

Contextual Keys

The bundle can also store cache metadata as contextual keys. Depends on the backend this information can be used or not. The bundle provides a solution to recored which objects are used inside a twig template. This information can be store as a metadata, so it is possible to invalidate the related caches if the object is changed.

Don't know how this can move forward between the LiipCacheBundle, the SonataCacheBundle and this current implementation. But I will be pretty happy with only one solution.

Contributor

vicb commented Feb 15, 2012

@rande

  • The exception has already been discussed in this thread and it seems we have an agreement here (stop me anyone if I am wrong)
  • You might want to check the CacheBundle #3225 also. The current focus there is configuration (i.e. no caching) as nothing I have seen so far seem to cover our needs (i.e. ability to add backends from any bundle, ability to share backend & drivers, ...). I have a bunch of updates to push there once I fix "an issue" with the Config.
Contributor

rande commented Feb 15, 2012

@vicb sorry for the exception ... the page is quite long ;)

I will have a look to #3225. Seems the backend group feature is a missing feature of Sonata Cache Backend ;)

tedivm commented Feb 15, 2012

@stof

However, if we separate the CacheInterface exposed to the end-user and a DriverInterface used by the core implementation of the CacheInterface, we could use such an object as return value of the DriverInterface to use it internally in the Cache to handle the cache regeneration and so on.

Sorry if I haven't been clear, but this is exactly what I'm trying to propose. Drivers are too low level to be useful in the real world, which would mean that the systems using CacheInterface would all have to roll in advanced caching features themselves. Separating the Drivers from the Interface has been my biggest goal here and on the standards list.

I also find that once you make this decision it'll open up a lot of possibilities that may change how this interface is designed, but before getting into that I think we should discuss that separation and whether it's going to be the case or not.

@stof

However, if we separate the CacheInterface exposed to the end-user and a DriverInterface used by the core implementation of the CacheInterface, we could use such an object as return value of the DriverInterface to use it internally in the Cache to handle the cache regeneration and so on.

I agree. From an internals perspective, having each Driver be required to return a results object is extremely valuable for the reasons discussed in the thread (it enables storage of null/false, it avoids the race condition with exists/get, and it allows information like exact expiration time to be available to the cache layer). If we require each Driver to return a results object and then have CacheInterface mediate the results to the user then a decision on user-facing API can be made independently of the technical considerations.

@willdurand

Another point I would like to introduce is the region-based approach.

A per-request region doesn't make a lot of sense to me; it's just a subset of hierarchical keys (which is a whole discussion in itself). A per-service namespace/region would make sense (as a way to, for example, segregate Doctrine result cache keys from Doctrine DQL cache keys within a single XCache instance) but that should happen at the factory and shouldn't change once the user is using the service object.

Contributor

merk commented Feb 16, 2012

Would it be worth considering just using Stash as symfony2's caching system?

Contributor

mvrhov commented Feb 16, 2012

What @merk proposed also come in my mind, but I don't know why I didn't have the guts to say it.

tedivm commented Feb 17, 2012

Well, I can honestly say that would thrill me. I'd be willing to take any steps needed to make that easier.

A couple of points along those lines- we've gone through and moved both Stash and it's Symfony bundle over to github today. The documentation is still over at the google code site, but we'll get that migrated over soon.

The bundle is fairly new but is being used by us on a few of our projects. It has some nice features like integration into the web profile tool, and I personally love how well it works with the Symfony configuration system. @jhallbachner wrote pretty much all of it, so he'll be able to speak about that part itself. We're obviously open to any suggestions on that (and will implement whatever interface or standard Symfony ends up adopting, so Stash will always be an option).

The upcoming plans for Stash itself (now that it's moved from google code) my next step is adopting PSR-0. I'm also embarrassingly excited to build the Redis handler, since I've been waiting for an excuse to do that. We also pass all unit tests on windows as of this morning (beforehand only the filesystem and sqlite handlers had issues). I'm not sure what else you guys would have on a wishlist for a perfect caching library, but let me know and it'll end up on there.

Contributor

snc commented Feb 17, 2012

@tedivm I planned a redis handler for my https://github.com/snc/SncRedisBundle, I don't know what your plans are but maybe we could work together...

Contributor

lstrojny commented Feb 17, 2012

@ALL: I think it is a good idea to consider existing projects as blessed cache component for Symfony. My concerns with Stash would be:

  • Static Handlers and Utilities but, more problematic, the static Manager class
  • PSR-0
  • Coding style (whitespace, e.g. Exceptions not being named *Exception)
  • A lot of little quirks here and there that I'm sure can be solved

If we can overcome those things, I think it would be a great foundation.

tedivm commented Feb 17, 2012

Static Handlers and Utilities but, more problematic, the static Manager class

The only static function in any of the handlers is the "canEnable" function, which can be used to make sure extensions are installed and whatnot. There is a static in-script cache that is very, very simple to disable but has some nice performance benefits (and I don't think should cause any issues, but would be happy to explore further).

The Manager and Box classes are not meant to be used in Frameworks like Symfony- they're basically example containers that people can throw into existing or quick applications. A better example for how to handle it in Symfony (static free) is the Cache Service in our bundle.

As to the utilities, I've actually been eliminating them over time. At this point most of them can be moved right into the filesystem handler itself, since it's the one that uses most. Others we can ditch if we moved to 5.3 (something I'm strongly considering).

PSR-0

This isn't a huge change for us- we're already structured in the same way, we're just lacking underscores. The big reason I haven't is because I was debating whether to go with full namespace support but was weary of dropping 5.2 support. There are other reasons why that's appealing

Coding style (whitespace, e.g. Exceptions not being named *Exception)

Making the Exception name changes shouldn't be a problem, and I'll make sure the bundle follows all the standards (I need to read through them again before I commit to any style changes on Stash, but I don't think that'll be an issue either). If there is anything else specific that you think should be addressed just let me know.

A lot of little quirks here and there that I'm sure can be solved

Yeah, just keep that list coming!

Contributor

lstrojny commented Feb 17, 2012

I would personally be in favor of dropping 5.2 support, as 5.4 knocks already on the door and we should not support outdated PHP versions. Dropping 5.2 support allows you to use closures as well. But before going more into details I think we should make a decision if we want to go with Stash. It wouldn’t be that fair to ask you for a number of changes and than deciding not to do it.

tedivm commented Feb 17, 2012

Eh, so far all of the changes you've mentioned are either things I've already intended to do or just make sense. Also, regardless of whether Stash is "the" caching library or not I do plan on fully supporting Symfony ( @jhallbachner and I use it quite a bit), so it won't be wasted effort. Really, my worst case scenario is that I put a bunch of work in to make my library better- not exactly something I'm worried about.

Contributor

lstrojny commented Feb 17, 2012

OK, fair enough :)

tedivm commented Feb 19, 2012

Alright, took the plunge and Stash now supports PSR-0 (at least, it passes all unit tests). The Symfony bundle hasn't been updated to reflect that yet, as there are some other changes I want to make as well first. The documentation still shows the old API (which I tagged right before making these changes, so you can still play around with that if it's easier).

Member

stof commented Feb 28, 2012

@tedivm I don't see the need of defining an interface extending Iterator without changing anything. Simply use Iterator.

Member

stof commented Feb 28, 2012

@tedivm I don't see the need of defining an interface extending Iterator without changing anything. Simply use Iterator.

tedivm commented Feb 28, 2012

@stof- The document is a work in progress (not a final draft), which is why I haven't passed it around too far yet (still soliciting feedback). With the Cache\Iterator I had originally thought I would do a little more with it, and then realized that was probably a mistake. Part of me still feels it should exist, since unlike an Iterator it is expected to only return Cache\Item objects, but I admit that's probably weak reasoning.

As I mentioned, this is a work in progress- I appreciate any feedback on this. If people want to open issues or send PR's along I'd be thrilled. At the moment I haven't sent it along to the PSR group themselves, as I wanted to make sure any obvious issues got cleared up.

On a separate note, Stash has been updated to address many of the issues brought up (coding conventions, static functionality, exception naming, PSR-0)- mainly due to a lot of hard work by @kbond, @vicb, and @lstrojny. It's also been integrated with travis-ci (which is so cool, and yet another reasons I should have moved to github awhile ago) and supports Composer. If there are any other things that would make integration with Symfony easier please let me know- I'm very receptive to suggestions.

tedivm commented Feb 29, 2012

Since what you all saw was more of a draft than a finished or polished document, I thought I'd mention that I've updated it quite a bit today (including taking your advice, @stof, about using the Iterator directly).

I'm also curious about whether this proposal makes sense for this discussion. The basic interfaces (as well as some of the extensions) accomplish pretty much everything discussed here, and in fact incorporate a lot of this discussion.

Contributor

dlsniper commented Mar 9, 2012

Hello,

Any words on this? Or any position from the dev team about this matter?
Where should the community move it's efforts for this to come true?

Kind regards.

Owner

fabpot commented Mar 10, 2012

I have asked @vicb to lead this effort. This will probably land into Symfony 2.2.

Contributor

pulzarraider commented May 13, 2012

With my current project I've started using cache much more than before. My initial implementation was done on SonataCahceBundle, but I have run into some problems that I had to solve. Here are my experiences of what Cache component/Cache bundle should cover, maybe it can help you design the cache implementation for real usage (some of these issues was mentioned in previous comments):

  • Give availability to use many instances of memcache(d)/APC/... cache clasess with different configuration options (different server, different prefix etc.). Let user choose what he needs.
  • Allow increment/decrement methods - APC/memcache(d)/Redis/... - all of them have inc/decr methods implemented. They should be availbale also in Cache Component. If these methods are used, NO OBJECT (CacheElement or something similar) may be used. So using some caching object may not be ideal.
  • The cache component should provide some lock method to allow rendering of cache to one process only. The main reason for using cache is to accelerate project. If one process have started rendering cache, another process should wait. It's not good idea to let unlimited processes start rendering cache. It will cause many problems, if the rendering process is too difficult.
  • Cache component may be configured to use fallback cache services. The main reason for this is: if no data were found in normal cache service, one process will lock it and starts rendering the cache, other process accesing the same key is not waiting, but using cached data from fallback cache service. Fallback cache service contains old data, but in many cases it's much better than waiting for rendering. The ttl of data in fallback service should be always higher than in normal cache service. The fallback cache is updated after the rendering of normal cache is finished.
  • Cache component should give availability to speed up perfromance. Redis/memcache(d) have some methods, that accept array of keys to be deleted. It's much better that calling delete in a loop. Redis have also multi method - multiple command are send at once. I would like to have some abstract access to these special methods in the new Symfony cache component/bundle to speed up performance of my project. If some cache mechanism doesn't support these features, it may be implemented with basic methods on lower layer of cache component.
  • Cache SHOULD support tags. If some caching mechanism don't support it (memcache), the user should be warned to use some sophisticated cache mechanism (Redis). Invalidating and using tags under certain conditions can be simulated in mamcache, too.
Contributor

dlsniper commented Aug 4, 2012

I know it's a bit of a set aside talk for now but I'll write this as a reminder: the cache component should be able to be used from anywhere in the other components, think of Forms, Templating and so on.

Member

stof commented Oct 13, 2012

@vicb any news on this ? According to the last comment of @fabpot on this ticket, you were in charge of this feature.

Contributor

vicb commented Oct 15, 2012

I won't have time to work on this for at least the coming month. Anybody feel free to start something (a RFC ?).

Contributor

lolautruche commented Nov 26, 2012

Hi

I think we at least need an interface to rely on, with a basic implementation provided (the one from Config component maybe ?). This would be simple enough not to break too much tests/current implementation and would have the advantage to allow better integration of cache libs (Stash, Zeta Components).

I might be able to start an RFC (should I do it as an issue or as a PR ?).

Contributor

dlsniper commented Nov 26, 2012

I think we need to wait for @fabpot to figure what we need/how he wants to deal with the problem in the framework then take any action.

I've taken my own implementations to a suspend mode just because I do agree with him that we are in no rush and we should make sure that the problem is being correctly handled.

If you have any ideas about it, you could/should post it either here or in the other existing cache issues/PRs so we could have a central point of reference about this problem.

dlsniper referenced this pull request in php-fig/fig-standards Dec 1, 2012

Closed

Yet another cache proposal #63

Owner

fabpot commented Mar 23, 2013

Closing as there are 4 opened PR on this topic. I would prefer that we first discuss the interface and what we want before starting coding. I suggest that this discussion happens on the Symfony dev mailing-list. Also, we need to discuss what to do with the current discussion on the FIG group mailing-list.

fabpot closed this Mar 23, 2013

noisebleed referenced this pull request in ppi/framework Apr 22, 2013

Closed

Cache component #78

Contributor

sukei commented Sep 2, 2013

Hello, any news on this?

Contributor

dlsniper commented Sep 2, 2013

@sukei FIG is the place to ask, When they'll figure out about PSR Cache then this could go on...

JHGitty commented Dec 29, 2015

I think it would be nice to have a generic Cache component in Symfony.
Is PSR-6 ready? http://www.php-fig.org/psr/psr-6/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment