Post: Matryoshka #29

Merged
merged 10 commits into from Jan 20, 2015
145 changes: 145 additions & 0 deletions _posts/2015-01-13-matryoshka-configurable-caching-library-for-php.md
@@ -0,0 +1,145 @@
---
layout: with-comments
title: "Matryoshka: A Configurable Caching Library for PHP"
author: Marc Zych
author_url: https://github.com/marczych
summary: We recently open-sourced Matryoshka, a configurable caching library
for PHP which makes common operations easier and allows for on-the-fly
configuration.
---

Like most websites, we make heavy use of caching to reduce load on our servers and decrease page response times.
Our caching daemon of choice is [memcached].
The PHP extensions are certainly usable and provide all of the core functionality that you could need.
However, we use a lot of patterns to make our day-to-day caching much easier that aren't provided by the extensions.
Member

s/that/which/ ?


In comes [Matryoshka], an open source caching library for PHP which makes common operations easier and allows for on-the-fly configuration.
Member

Maybe "So we developed..." instead of "In comes..."

I think we should take a bit more credit here, you wrote it from scratch.
+1 - David


# Configurable Behavior

Matryoshka is designed to be very configurable.
You can add functionality on the fly simply by wrapping an existing `Backend` with a new one.
We use this extensively to prefix keys, modify expiration times, disable gets, gather metrics, etc.

Start off with a `Memcache` instance:

{% highlight php startinline=true %}
// From the native extension.
$memcache = new Memcache();
$memcache->pconnect('localhost', 11211);
$cache = Matryoshka\Memcache::create($memcache);
{% endhighlight %}

Then prefix all the keys:

{% highlight php startinline=true %}
$prefixedCache = new Matryoshka\Prefix($cache, 'prefix-');
// The key ends up being "prefix-key".
$prefixedCache->set('key', 'value');
$value = $prefixedCache->get('key');
{% endhighlight %}

Finally, double all expiration times for the prefixed backend:

{% highlight php startinline=true %}
$doubleExpiration = function($expiration) {
   return $expiration * 2;
};
$cache = new Matryoshka\ExpirationChange($prefixedCache,
   $doubleExpiration);
// Results in an expiration time of 20 for "prefix-key".
$cache->set('key', 'value', 10);
{% endhighlight %}

By using composition, caching configurations can be assembled on the fly quite easily.
The core API is identical for all backends, so the caller doesn't need to be aware of the exact configuration.
Common configurations and base backends (like memcached connections) can be made into singletons or provided using dependency injection in your application.
Additionally, this architecture results in very maintainable and testable code because each class has exactly one job.
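
To illustrate the wrapping idea, here is a minimal, self-contained sketch of how such composition could work. Every name in it (`SketchBackend`, `ArrayBackend`, `PrefixSketch`, `ExpirationChangeSketch`) is hypothetical and invented for this example; it is not Matryoshka's source, and an in-memory array stands in for memcached.

```php
<?php
// Hypothetical names throughout; this is NOT Matryoshka's implementation.
interface SketchBackend {
   public function set($key, $value, $expiration = 0);
   public function get($key);
}

// Base backend: a plain array standing in for a memcached connection.
class ArrayBackend implements SketchBackend {
   public $stored = [];

   public function set($key, $value, $expiration = 0) {
      $this->stored[$key] = ['value' => $value, 'expiration' => $expiration];
      return true;
   }

   public function get($key) {
      return isset($this->stored[$key]) ? $this->stored[$key]['value'] : false;
   }
}

// Wrapper that rewrites keys, then delegates to the wrapped backend.
class PrefixSketch implements SketchBackend {
   private $backend;
   private $prefix;

   public function __construct(SketchBackend $backend, $prefix) {
      $this->backend = $backend;
      $this->prefix = $prefix;
   }

   public function set($key, $value, $expiration = 0) {
      return $this->backend->set($this->prefix . $key, $value, $expiration);
   }

   public function get($key) {
      return $this->backend->get($this->prefix . $key);
   }
}

// Wrapper that rewrites expiration times, then delegates.
class ExpirationChangeSketch implements SketchBackend {
   private $backend;
   private $callback;

   public function __construct(SketchBackend $backend, callable $callback) {
      $this->backend = $backend;
      $this->callback = $callback;
   }

   public function set($key, $value, $expiration = 0) {
      $changed = call_user_func($this->callback, $expiration);
      return $this->backend->set($key, $value, $changed);
   }

   public function get($key) {
      return $this->backend->get($key);
   }
}

$base = new ArrayBackend();
$cache = new ExpirationChangeSketch(
   new PrefixSketch($base, 'prefix-'),
   function($expiration) { return $expiration * 2; }
);
$cache->set('key', 'value', 10);
```

Because each wrapper implements the same interface as the thing it wraps, the caller only ever sees `set` and `get`, and the order of wrapping decides the order in which the transformations apply.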

# Scopes

Cache invalidation is hard.
To make it easier, Matryoshka provides "cache scopes" to invalidate a group of keys at once.
[This works][Scope.php] by prefixing all keys with a unique value that is stored in the backend using the scope name.

{% highlight php startinline=true %}
$cache = new Matryoshka\Scope($memcachedBackend, 'name');

// This results in a get request to memcached for 'scope-name'
// which results in something like '0fb4ae36'. This `set` call
Member

This comment seems to me to either need more detail or less. What do you think about:

This creates a Backend where each new key that is set is prepended with a random prefix like '0fb4ae36'.

Contributor Author

What I have now makes sense to me but it can probably be improved. I want to make it clear that the random prefix is stored in memcached which makes it tricky.

Member

I think the current text is fine.

// then results in a key of '0fb4ae36-key'.
$cache->set('key', 'value');
$cache->set('key2', 'value2');
$value = $cache->get('key'); // '0fb4ae36-key' => 'value'

// Deleting the scope results in a new scope value e.g. 'e093f71e'.
$cache->deleteScope();

// Both of these result in a miss because the scope has a new
// value so the keys are now prefixed with 'e093f71e-'.
$value = $cache->get('key'); // 'e093f71e-key' => false
$value2 = $cache->get('key2'); // 'e093f71e-key2' => false
{% endhighlight %}

We have found this to be particularly useful for scoping keys to code deploys.
We simply put any caches that should be invalidated under the `'deploy'` scope, which is deleted every time we deploy code.
You can also have dynamic scopes such as `"post-{$postid}"`, which can be cleared whenever a specific post is modified.
This is an implementation of [generational caching] which we make [heavy use of][caching thesis] throughout our application.
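
The mechanism can be sketched in a few lines. This is only an illustration under stated assumptions: `ScopeSketch` is a hypothetical class, a PHP array shared by reference stands in for memcached, and the `scope-` key naming simply mirrors the example above. It assumes PHP 7 or later for `random_bytes`.

```php
<?php
// Hypothetical sketch of generational caching; NOT Matryoshka's Scope class.
class ScopeSketch {
   private $store; // Array used in place of memcached, held by reference.
   private $name;

   public function __construct(array &$store, $name) {
      $this->store = &$store;
      $this->name = $name;
   }

   // Reads the current scope value from the store, creating one on a miss.
   private function scopePrefix() {
      $scopeKey = "scope-{$this->name}";
      if (!isset($this->store[$scopeKey])) {
         $this->store[$scopeKey] = bin2hex(random_bytes(4));
      }
      return $this->store[$scopeKey] . '-';
   }

   public function set($key, $value) {
      $this->store[$this->scopePrefix() . $key] = $value;
   }

   public function get($key) {
      $fullKey = $this->scopePrefix() . $key;
      return isset($this->store[$fullKey]) ? $this->store[$fullKey] : false;
   }

   // Dropping the stored scope value orphans every key written under it.
   public function deleteScope() {
      unset($this->store["scope-{$this->name}"]);
   }
}

$store = [];
$cache = new ScopeSketch($store, 'name');
$cache->set('key', 'value');
$hit = $cache->get('key');   // Found under the current scope value.
$cache->deleteScope();
$miss = $cache->get('key');  // A new scope value is generated, so this misses.
```

Note that nothing is deleted key by key: the old entries are simply never read again, which is why the extra lookup for the scope value on each request is the price of the technique.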

Cache scopes help with cache invalidation but unfortunately don't make [naming things] any easier.

# Helper Functions

Matryoshka adds a few helper functions to make common operations easier.
`getAndSet` makes populating the cache dead simple:

{% highlight php startinline=true %}
// Calls the provided callback if the key is not found and sets
// it in the cache before returning the value to the caller.
$value = $cache->getAndSet('key', function() {
   return 'value';
});
{% endhighlight %}
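
For readers unfamiliar with the pattern, the behavior described in the comment above is roughly the following read-through logic. The function `getAndSetSketch` is a hypothetical stand-in written for this example, operating on a plain array rather than a real backend; it is not the library's code.

```php
<?php
// Hypothetical stand-in: an array plays the role of the cache backend.
function getAndSetSketch(array &$store, $key, callable $callback) {
   if (array_key_exists($key, $store)) {
      return $store[$key];   // Hit: the callback never runs.
   }
   $value = $callback();     // Miss: compute the value,
   $store[$key] = $value;    // store it for next time,
   return $value;            // and hand it back to the caller.
}

$store = [];
$calls = 0;
$compute = function() use (&$calls) {
   $calls++;
   return 'value';
};

$first = getAndSetSketch($store, 'key', $compute);
$second = getAndSetSketch($store, 'key', $compute);
```

The second call is served from the array, so the counter shows the callback ran only once.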

Similarly, `getAndSetMultiple` makes doing multi-gets significantly easier:

{% highlight php startinline=true %}
// Array of key => id. The ids can be anything used to identify
// the resource that the key represents.
$keys = [
   'key-1-a' => [1, 'a'],
   'key-2-b' => [2, 'b']
];
Member

What do you think of using a simpler example here? id2a almost looks like a random hash at first glance.
Perhaps:

$keys = [
   'key1' => 1,
   'key2' => 2
];

Contributor Author

I had id1 and id2 before but I like having an array so it's obvious that it's purely for the callback and Matryoshka doesn't care what it is. I guess the key needs to be updated to make the example more compelling. How about:

$keys = [
   'key-1-a' => [1, 'a'],
   'key-2-b' => [2, 'b']
];

Member

Yeah, I think that's better!

Contributor Author

Fixed with 37965d3.

// Calls the provided callback for any missed keys so the missing
// values can be generated and set before returning them to the
// caller. The values are returned in the same order as the
// provided keys.
$values = $cache->getAndSetMultiple($keys, function($missing) {
   // Use the ids to fill in the missing values.
   foreach ($missing as $key => $primaryKey) {
      $missing[$key] = getValueFromDb($primaryKey);
   }

   // Return the new values to be cached and merged with the hits.
   return $missing;
});
{% endhighlight %}
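
The hit/miss merging can be sketched as follows. Again, this is an assumption-laden illustration: `getAndSetMultipleSketch` is a hypothetical function over a plain array, and the `db-` strings are made-up placeholder values standing in for a database lookup.

```php
<?php
// Hypothetical sketch; an array stands in for the cache backend.
function getAndSetMultipleSketch(array &$store, array $keys, callable $callback) {
   // Collect the key => id pairs that are not cached yet.
   $missing = [];
   foreach ($keys as $key => $id) {
      if (!array_key_exists($key, $store)) {
         $missing[$key] = $id;
      }
   }

   // Ask the caller to generate the missing values, then cache them.
   if ($missing) {
      foreach ($callback($missing) as $key => $value) {
         $store[$key] = $value;
      }
   }

   // Rebuild the result in the order of the provided keys.
   $values = [];
   foreach ($keys as $key => $id) {
      $values[$key] = array_key_exists($key, $store) ? $store[$key] : false;
   }
   return $values;
}

$store = ['key-1-a' => 'cached-1-a'];
$keys = [
   'key-1-a' => [1, 'a'],
   'key-2-b' => [2, 'b']
];
$seen = [];
$values = getAndSetMultipleSketch($store, $keys, function($missing) use (&$seen) {
   $seen = array_keys($missing);
   foreach ($missing as $key => $id) {
      $missing[$key] = "db-{$id[0]}-{$id[1]}";
   }
   return $missing;
});
```

Only `key-2-b` reaches the callback here, since `key-1-a` is already cached; the ids are passed through untouched, which is what lets the callback batch its own lookups.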

# Try it out!

You can install Matryoshka with composer from [Packagist] or by cloning the [repo][Matryoshka] into your project.
A complete list of backends and more examples are available in the [readme].
[memcached], specifically the [Memcache extension], is the only supported caching daemon right now, but adding others is very easy.
We encourage you to try it out and contribute any caching techniques that you find useful in your own applications.

Happy caching!

[memcached]: http://memcached.org/
[Matryoshka]: https://github.com/iFixit/Matryoshka
[readme]: https://github.com/iFixit/Matryoshka#readme
[Scope.php]: https://github.com/iFixit/Matryoshka/blob/master/library/iFixit/Matryoshka/Scope.php
[naming things]: http://martinfowler.com/bliki/TwoHardThings.html
[Packagist]: https://packagist.org/packages/ifixit/matryoshka
[Memcache extension]: http://php.net/manual/en/book.memcache.php
[generational caching]: https://signalvnoise.com/posts/3113-how-key-based-cache-expiration-works
[caching thesis]: http://digitalcommons.calpoly.edu/theses/1002/
4 changes: 4 additions & 0 deletions stylesheets/styles.css
@@ -77,6 +77,10 @@ code, pre, .gist{
  line-height:1.4em;
}

p code {
  font-size: 14px;
}

pre {
  padding:8px 15px;
  background: #f8f8f8;