Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Interop of upcoming Symfony Cache 3.2 features #19728

Closed
nicolas-grekas opened this issue Aug 24, 2016 · 40 comments
Closed

[RFC] Interop of upcoming Symfony Cache 3.2 features #19728

nicolas-grekas opened this issue Aug 24, 2016 · 40 comments
Labels
Cache RFC RFC = Request For Comments (proposals about features that you want to be discussed)

Comments

@nicolas-grekas
Copy link
Member

In Symfony Cache 3.2, we're going to ship tag based invalidation with TagAwareInterface and the tag() method on CacheItem, and namespace based invalidation with ForkableAdapterInterface in #19521.

See pending doc PR to know how they are intented to be used:
https://github.com/symfony/symfony-docs/pull/6858/files?short_path=2b6cef4#diff-2b6cef44ab8160dd54c3796627fbff0b

Symfony 3.2 will be released in 3 months. During that period, I'd be really happy to get feedback from implementers of PSR-6 to see if and how these interfaces could be promoted to a new PSR that they might be interested in implementing.

ping @Nyholm @aequasi @tedivm @andrerom

@nicolas-grekas nicolas-grekas changed the title Interop of upcoming Symfony Cache 3.2 features [RFC] Interop of upcoming Symfony Cache 3.2 features Aug 24, 2016
@Crell
Copy link
Contributor

Crell commented Aug 24, 2016

I would recommend talking to some of the Drupal folks. Drupal 8 has a (non-PSR-6) cache system that supports both tags and cache contexts. Having a generalized support for that functionality would be great; that team likely has good insights into any odd edge cases to consider.

Paging @wimleers.

@nicolas-grekas
Copy link
Member Author

@wimleers @Crell are you at DrupalCon Dublin? Maybe meet there?

@Nyholm
Copy link
Member

Nyholm commented Sep 26, 2016

I would be happy to see tags in a PSR. I think there are some issues with your suggestion. Or... maybe not issues but I think the solution is not complete.

Consider how I, as a third party library, would use a CachePool that supports tagging. I would need something like TaggablePoolInterface to type hint for. I would also have to make sure that getItem returns an object that implements an interface that has the tag function. See example

With the current suggestion, a third party library have to to like this:

class Foo {
  public function __construct(WhatInterface?? $cachePool) {
    // ..
  }

  public function doCalculations($data) {
    $item = $this->cachePool->getItem($this->getKey($data));
    if (!$item->isHit()) {
      $item->set($this->doSomeWork());

      // Do I know that this item supports tagging?
      if (method_exists($item, 'tag')) {
         $item->tag('foo');
      }

     // ...
  }
  // ..
}

But I do believe the CacheItem should have a getTags function. But since items are not transferable between pools it may not be required. Just as getExpirationDate...

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Sep 27, 2016

About getTags on CacheItems, I also think it's not required in the same way than getExpirationDate isn't.
On the pool interface, you have clearTags(array $tags), we have invalidateTags($tags). Some minor differences there: obviously the name, then the type hint. I guess that by looking at PSR-6 (getItem/getItems), we could have two methods, e.g. invalidateTag/invalidateTags or clearTag/clearTags?
I don't know what TAG_SEPARATOR is nor do I see the need for it, do you (in terms of interopt of course)?
About your TaggableItemInterface, I explicitly didn't add any similar interface in Symfony because since cache items aren't interoperable, there is no need for one in Symfony. At the PSR level, it's different of course. Should a PSR with something like your TaggableItemInterface have setTags, addTag, addTags and/or tag methods? IMHO, if the FIG were to work on this, I'd bet on addTag+addTags :)
WDYT @aequasi @tedivm @andrerom @wimleers @Crell?

@Crell
Copy link
Contributor

Crell commented Sep 30, 2016

I'm at DrupalCon in a meeting right now, but let's talk in a half hour or so.

Also, if we're talking about tags pinging @davidstrauss.

@nicolas-grekas
Copy link
Member Author

Adding @LionsAd to the loop.

@LionsAd
Copy link

LionsAd commented Oct 3, 2016

As spoken at DrupalCon, we definitely need a $item->getTags() method for retrieved items as else we would need to store the tags twice - once in metadata and once normally.

Also we definitely should look at a generic method of "bubbling".

@tedivm
Copy link

tedivm commented Oct 3, 2016

My personal advice is to avoid tags. There is no way to do tags in systems like memcache without massive overhead the removes a lot of the benefits of tags to begin with. If you look at every major caching engine in php (stash, zendcache, doctine)- not one of them supports tagging across all of their backends.

Group invalidation is best done with hierarchical/stacked caches or namespaces.

@andrerom
Copy link
Contributor

andrerom commented Oct 4, 2016

There is no way to do tags in systems like memcache ...

In the case of Symfony, memcached support is currently skipped (missing php7 support), and in the case of Drupal the tags are afaik stored in db (actually all seems to be stored there atm, besides APCu option) which as a bonus avoids memcached cache eviction issues for tags.

So there are ways, and even if tags gets added as a standard it is as far as I know going to be a add-on to the current one, not replace it.

Group invalidation is best done with hierarchical/stacked caches or namespaces.

Hierarchies is why Stash is slow on a networked setup to begin with, to many calls to the backend per cache item on what is supposed to be the fast path: get. So if you ask me tags are superior, also on being able to clear relevant data more easily, while still at risk for clearing to much in some cases, far less then hierarchies.

anyway, back to the subject.

As spoken at DrupalCon, we definitely need a $item->getTags() method for retrieved items as else we would need to store the tags twice - once in metadata and once normally.

Not that I have any strong opinions on if it should on extended item or extended pool, but why is it needed on the item? Need access to them as they are passed true the system?

@LionsAd
Copy link

LionsAd commented Oct 4, 2016

@andrerom The reason is that Drupal does not know the tags when retrieving the cache item. It only knows the tags, when storing them.

So we need all the tags the cache item was stored with.

@nicolas-grekas
Copy link
Member Author

If I understood correctly, Drupal is using the list of tags on cache hits to forward them to higher level caches (e.g. Varnish) so that these are able to invalidate by tag at their level also.

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Oct 4, 2016

About tags and backends (e.g. memcache), please note also that it is possible so have different backends for tags and for items (Symfony Cache 3.2 already allows this), so you could store items in memcache and their tags in db or redis or something else that is reliable enough for that.

@wimleers
Copy link

wimleers commented Oct 4, 2016

Sorry, GitHub notifications don't really work for me. If you need my feedback, ping me on Twitter, IRC or e-mail me.

I'm happy to see that you talked to Fabian (@LionsAd) at DrupalCon Dublin last week :)


If I understood correctly, Drupal is using the list of tags on cache hits to forward them to higher level caches (e.g. Varnish) so that these are able to invalidate by tag at their level also.

Correct. See https://www.drupal.org/developing/api/8/cache/tags/varnish, and http://foshttpcache.readthedocs.io/en/1.4.0/varnish-configuration.html#tagging.


Also note that Drupal is not alone in supporting this. Varnish can trivially be configured to support it. Fastly has supported this for years (they call it Surrogate-Keys), CloudFlare and KeyCDN added support after they saw Drupal 8 shipping with it.

It just is an elegant, simple solution to a big problem.

@wimleers
Copy link

wimleers commented Oct 4, 2016

I see that @Crell also mentioned "cache contexts", besides just "cache tags".

You may want to read Drupal 8's documentation on those concepts. We have a trifecta of "cachebility metadata":

  1. tags (for data dependencies)
  2. contexts (for request context dependencies, like HTTP's Vary)
  3. max-age (for time dependencies, like HTTP's Cache-Control: max-age)

Read more about them at:


Finally, for some extra context as to how we use this: Drupal's rendering system is extremely flexible and overridable/extensible at every level. This is how its module ecosystem can add so much functionality without breaking things.

But it's of course costly to have such a flexible system. So, we "render cache" rendered fragments of the page. This is what we call "render caching".

The question then becomes: how can we ensure that those cached fragments are invalidated when necessary (cache tags + max-age) and varied as necessary (cache contexts)? Because if they contain the title of some content that is modified, or contain content whose permissions were just revoked from some user role, those cached fragments could surely cause massive problems (also of the security kind).

Well, these 3 things (tags/contexts/max-age) are associated with every rendered thing. Every rendered thing also can have "attachments" (for assets libraries: CSS + JS, HTTP headers, HTML <head> values). All of that metadata (cacheability metadata + attachments) is then "bubbled" to the Response level (like JS events bubble up the DOM). Then during KernelEvents::RESPONSE a response's attachments are brought into the response itself: HtmlResponseAttachmentsProcessor.

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Oct 5, 2016

[edit: proposal superseded below]

So, what about proposing two interfaces in a PSR for tags, basically:

interface TaggableCacheItemInterface extends CacheItemInterface
{
    /**
     * @return array
     */
    public function getTags();
    /**
     * @return $this
     */
    public function setTags(array $tags);
    /**
     * @return $this
     */
    public function addTag($tag);
    /**
     * @return $this
     */
    public function addTags(array $tag);
}

and

interface TaggableCacheItemPoolInterface extends CacheItemPoolInterface
{
    /**
     * @return bool
     */
    public function invalidateTag($tag);
    /**
     * @return bool
     */
    public function invalidateTags(array $tags);
}

making them throw whenever a tag contains reserved chars (same list as for keys).
we'd need of course some description of the expected behavior of TaggableCacheItemPoolInterface instances, that is: return instances of TaggableCacheItemInterface, and storing tags when saving of course.

@LionsAd
Copy link

LionsAd commented Oct 5, 2016

Generally +1 however, I think it should be invalidateTag() / invalidateTags() instead of clearTag*().

Clearing sounds like removing, but it is invalidation what happens here.

@nicolas-grekas
Copy link
Member Author

updated for invalidateTag(s)

@LionsAd
Copy link

LionsAd commented Oct 6, 2016

Does PSR in general use singular + plural functions for the interface?

In general I am not a fan of hybrid interfaces (e.g. one function taking both), but just providing:

getTags()
setTags(array)
addTags(array)

invalidateTags(array)

I would say is enough. Or is there some standard I am not aware of?

Especially since with PHP 5.4+ it has become so easy to write: invalidateTags(['rendered']);

@nicolas-grekas
Copy link
Member Author

@Crell any stand on the above comment? PSR6 has e.g. getItem+getItems, deleteItem+deleteItems.

fabpot added a commit that referenced this issue Oct 9, 2016
…Tags() (nicolas-grekas)

This PR was merged into the 3.2-dev branch.

Discussion
----------

[Cache] Accept only array in TagAwareAdapter::invalidateTags()

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

Just to be sure that we won't suffer from some stupid incompatibility if a cache-tags PSR is published one day. See discussion: #19728 (comment)

Commits
-------

a08acc7 [Cache] Accept only array in TagAwareAdapter::invalidateTags()
@d3f3kt
Copy link

d3f3kt commented Nov 6, 2016

There is no way to do tags in systems like memcache without massive overhead the removes a lot of the benefits of tags to begin with

Well it is possible. I've opened yesterday an issue - Leaseweb/LswMemcacheBundle#84
In my opinion its a very nice feature to have such an implementation. It's a very lightweight way to manage items in combination with tags.

@andrerom
Copy link
Contributor

andrerom commented Nov 7, 2016

@d3f3kt might be it's possible, but in your implementation example you end up doing several gets. Symfony Cache supports multi get, and is cable of loading all items using just one load operation, even with tags, lots of tags, which is pretty important once you have latency on contacting Memcached/Redis (e.g. on AWS).

So it would need to somehow (efficiently on large dataset) do item expiry when tag is expired, instead of checking it on load.

@javiereguiluz javiereguiluz added Cache RFC RFC = Request For Comments (proposals about features that you want to be discussed) labels Nov 10, 2016
@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Dec 9, 2016

See #20851 for an implementation of the CacheItem::getTags() method.

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Dec 19, 2016

Please note this particular behavior of CacheItem::getTags() in #20851, that could make it into the PSR if you agree on it:

  • on a miss, it always returns an empty array
  • on a hit, it's more complex:
    • it returns the array of tags of course,
    • but once set() or tag() is call, the tag list is reseted, so that the new saved value has only fresh tags,
    • the tag list is not reseted when calling the expiration related setters, so that one can fetch + expand the validity period without loosing already set tags

WDYT?

@LionsAd
Copy link

LionsAd commented Dec 23, 2016

Looks great to me, I had been under the impression that cache items are immutable and you could just prolong the expiration period by creating a new one from the existing one?

But yes it makes sense that the tags are reset in those cases.

@Nyholm
Copy link
Member

Nyholm commented Dec 28, 2016

but once set() or tag() is call, the tag list is reseted, so that the new saved value has only fresh tags,

Im not too sure I agree with this. When we had the function tag sure, it makes sense. But now we have addTags and setTags. I, as a user, would not expect that my tags disappear unless Im doing setTags([]).

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Dec 28, 2016

Tags are an artifact of the computation of the cached data. Adding a tag without actually computing the value makes no sense in this respect.
When it does have some sense for some particular need, then one could use first getTags, and use the result to set tags again. But this shouldn't be the default IMHO.

People are going to do things like eg:

$item = $pool->getItem('foo');
if (!$item->isHit()) {
    // compute $value step #1
    $item->addTag('foo');
    // compute $value step #2
    $item->addTag('bar');

    $pool->save($item->set($value));
}

But this code is non trivially broken as you can spot, because the previous tags if any will be kept for the new value.

The reasoning is not related to addTag() (which for tag() is a semantic alias really to me), but to getTags():
if we were to remove getTags() altogether, then not having any tags loaded from the pool would be the natural behavior. Adding getTags() shouldn't change this.

@Nyholm
Copy link
Member

Nyholm commented Dec 28, 2016

I see your point but I believe this will cause some WTF.

$item = $pool->getItem('foo');
if (!$item->isHit()) {
    $item->getTags(); // ['biz']

    $item->addTag('foo');
    $item->getTags(); // ['foo'] <--- addTags behaves like setTags here

    $item->addTag('bar');
    $item->getTags(); // ['foo', 'bar'] <--- addTags does not behave like setTags here
    // ...
}

if we were to remove getTags() altogether, then not having any tags loaded from the pool would be the natural behavior. Adding getTags() shouldn't change this.

Removing getTags and setTags, yes. Then it would make sense. And technically, having getTags and setTags should not make a difference. But it does from the DX perspective.

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Dec 28, 2016

I think we need different storages+vocabularies for tags coming from the pool and for tags set by in the current process.
e.g:

  • getTagsFromPool <= no corresponding setter, same as eg key()
  • addTag/addTags/setTags <= no corresponding getter, same as eg expiresAt(), always starts empty

If the idea is OK, we could think of better names :)

@Nyholm
Copy link
Member

Nyholm commented Dec 28, 2016

That would be better if we still want to have this behavior. But Im not convinced that we do. What would the output of this be:

$item = $pool->getItem('foo');
$item->getTags(); // ['']
$item->addTag('foo');
$pool->save($item->set($value));

$item->addTag('bar');
$pool->save($item->set($value));

echo $item->getTags(); // ???

Tags are an artifact of the computation of the cached data.

If we change this and consider tags as just being labels to an cache item. Then the behavior would be more intuitive.

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Dec 28, 2016

But a cache is not a data store, that's the big difference which invalidates the "label" pov. If you clear the cache then rebuild, you must get the same state as before.
But if you consider the previous state of the cached value (its tags) to create the next state, then you're not going to be able to rebuild anything.
So, there are really two different storages for tags: the local one, and the fetched one, and both don't mix together.

@Nyholm
Copy link
Member

Nyholm commented Dec 28, 2016

But a cache is not a data store

Your argument makes sense. We need to explain that very well in the PSR and with the method names.


👍 for two storages


This might be too expressive:

  • getTagsFromPool
  • addNewTag
  • addNewTags
  • setNewTags

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Dec 28, 2016

Maybe remove the adders and keep only the setter, so that confusion is even less possible?
-> getOldTags + setNewTags?
-> getPreviousTags + setTags?
-> getHitTags + setTags?

@andrerom
Copy link
Contributor

andrerom commented Dec 28, 2016

But a cache is not a data store

Side: Just make it very clear cache for tags must not be evicted from cache before the item is, cause otherwise you have a cache integrity issue as clearing by tag won't take effect leaving stale cache around when it should not, causing app to get severe integrity issues.

@Nyholm
Copy link
Member

Nyholm commented Dec 28, 2016

Thank you @andrerom.

Removing the adders makes it easier. I like getPreviousTags and setTags.

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Dec 29, 2016

Agree. Here are our updated interop interfaces:

interface TaggableCacheItemInterface extends CacheItemInterface
{
    /**
     * @return array
     */
    public function getPreviousTags();

    /**
     * @return $this
     */
    public function setTags(array $tags);
}

and

interface TaggableCacheItemPoolInterface extends CacheItemPoolInterface
{
    /**
     * @return bool
     */
    public function invalidateTag($tag);

    /**
     * @return bool
     */
    public function invalidateTags(array $tags);

    /**
     * {@inheritdoc}
     *
     * @return TaggableCacheItemInterface
     */
    public function getItem($key);
 
    /**
     * {@inheritdoc}
     *
     * @return \Traversable|TaggableCacheItemInterface[]
     */
}

@Nyholm
Copy link
Member

Nyholm commented Dec 29, 2016

I like it.

Could you overwrite the getItem and getItems?

interface TaggableCacheItemPoolInterface extends CacheItemPoolInterface
{
  // ..
  /**
  * {@inheritdoc}
  *
  * @return TaggableCacheItemInterface
  */
  public function getItem($key);
 
 /**
  * {@inheritdoc}
  *
  * @return array|\Traversable|TaggableCacheItemInterface[]
  */
  public function getItems(array $keys = []);

@Crell
Copy link
Contributor

Crell commented Dec 29, 2016

The root confusion here, I believe, is because of the somewhat dual nature of CacheItem as both a value object (when get()ing) and a command object (when set()ing). I apologize for that confusion.

IMO, CacheItem should be treated mainly as a command object as soon as you decide you're going to write it back. That means assume you're on your own to set all the tags you want to be recorded, exclusively. Do not "inherit" earlier tags at all. As @nicolas-grekas said, it's not a data store.

Which begs the question... what's the use case for getPreviousTags()? Why would you even care?

Also, I do think an incremental addTag() method is useful. You may not have all the tags to add in a single shot.

@nicolas-grekas
Copy link
Member Author

@Crell we exchanged a few words on the topic with @LionsAd (see somewhere above):

From @LionsAd:

we definitely need a $item->getTags() method for retrieved items as else we would need to store the tags twice - once in metadata and once normally.

Which I reformulated this way:

If I understood correctly, Drupal is using the list of tags on cache hits to forward them to higher level caches (e.g. Varnish) so that these are able to invalidate by tag at their level also.

It looks like a valid argument to me.

nicolas-grekas pushed a commit that referenced this issue Jan 14, 2017
…ters

| Q             | A
| ------------- | ---
| Branch?       | 3.1
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets |
| License       | MIT
| Doc PR        |

Fixes missing type hint for cache item on pool getters so methods on Symfony CacheItem
is correclty suggested when using IDE's or api documentation.

As proposed here: #19728 (comment)

Note: Specifically sets array of items as return type of getItems as phpdoc and IDEs supports
this by now, and since this is specifically what is being returned.
nicolas-grekas added a commit that referenced this issue Jan 14, 2017
…rface getters (andrerom)

This PR was merged into the 3.1 branch.

Discussion
----------

[DX][Cache] Set right cacheItem type hint on AdapterInterface getters

| Q             | A
| ------------- | ---
| Branch?       | 3.1
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets |
| License       | MIT
| Doc PR        |

Fixes missing cache item type hint on pool getters so methods on Symfony CacheItem
is correctly suggested when using IDE's or api documentation.

As proposed here: #19728 (comment)

_Note: Specifically sets array of CacheItems as return type of getItems as phpdoc and IDEs supports
this by now, and since this is specifically what is being returned. If Sami does not still support this we can adjust to what was originally suggested._

Commits
-------

5f7baa5 [DX][Cache] Set right type hint for cacheItem on AdapterInterface getters
symfony-splitter pushed a commit to symfony/cache that referenced this issue Jan 14, 2017
…ters

| Q             | A
| ------------- | ---
| Branch?       | 3.1
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets |
| License       | MIT
| Doc PR        |

Fixes missing type hint for cache item on pool getters so methods on Symfony CacheItem
is correclty suggested when using IDE's or api documentation.

As proposed here: symfony/symfony#19728 (comment)

Note: Specifically sets array of items as return type of getItems as phpdoc and IDEs supports
this by now, and since this is specifically what is being returned.
symfony-splitter pushed a commit to symfony/cache that referenced this issue Jan 14, 2017
…rface getters (andrerom)

This PR was merged into the 3.1 branch.

Discussion
----------

[DX][Cache] Set right cacheItem type hint on AdapterInterface getters

| Q             | A
| ------------- | ---
| Branch?       | 3.1
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets |
| License       | MIT
| Doc PR        |

Fixes missing cache item type hint on pool getters so methods on Symfony CacheItem
is correctly suggested when using IDE's or api documentation.

As proposed here: symfony/symfony#19728 (comment)

_Note: Specifically sets array of CacheItems as return type of getItems as phpdoc and IDEs supports
this by now, and since this is specifically what is being returned. If Sami does not still support this we can adjust to what was originally suggested._

Commits
-------

5f7baa5 [DX][Cache] Set right type hint for cacheItem on AdapterInterface getters
@Nyholm
Copy link
Member

Nyholm commented Feb 11, 2017

I want to bring some attention to this issue again.

As I can see it @Crell asked for an addTag because "You may not have all the tags to add in a single shot." That is the only thing left hanging (as far as I can see). That is a minor change.

Are the members/subscribers here in agreement that the interfaces Nicolas proposed fits the requirements? Are they working with Drupal, Symfony, others?

I've added an repo here creating those interfaces: https://github.com/php-cache/tag-interop
I would like to tag a stable version. If a future PSR will add stuff to it it would be okey, but I would like us to be able to do a non-BC breaking update from tag-interop to the PSR one.

fabpot added a commit that referenced this issue Mar 22, 2017
This PR was merged into the 3.3-dev branch.

Discussion
----------

[Cache] Add CacheItem::getPreviousTags()

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

As discussed in #19728

Commits
-------

da35466 [Cache] Add CacheItem::getPreviousTags()
@javiereguiluz
Copy link
Member

Closing this issue that talks about "the upcoming Symfony 3.2" because we already published 3.2 and 3.3. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cache RFC RFC = Request For Comments (proposals about features that you want to be discussed)
Projects
None yet
Development

No branches or pull requests

9 participants