---
title: "Save Redis memory with large list packing"
date: "2013-08-31 18:54 CEST"
tags: redis, performance, ruby
published: true
---
Redis is great for storing certain types of data that do not do well in a relational database.
It is great for storing caches and statistics too.
However, the storage capacity of redis is capped by RAM size, so care is required when storing large amounts of data in it.
I will show you a trick to reduce memory usage and improve the speed of the redis *list* type for storing series-type data (e.g. time series).
Redis [packs](https://github.com/antirez/redis/blob/unstable/src/ziplist.c) small lists of ints, reducing their memory usage by 10-40%.
The list size threshold at which redis stops packing is defined by the `list-max-ziplist-entries` setting and defaults to 512.
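You can see this threshold in action by asking redis for a key's encoding.
Here is a quick sketch using the [redis-rb](https://github.com/redis/redis-rb) gem (the key name is made up):

~~~ruby
require "redis"

redis = Redis.new

# Read the configured packing threshold (CONFIG GET returns a Hash in redis-rb).
threshold = redis.config(:get, "list-max-ziplist-entries")["list-max-ziplist-entries"].to_i

redis.del "enc-test"
threshold.times { |i| redis.rpush "enc-test", i }
redis.object "encoding", "enc-test" # => "ziplist"    -- packed

redis.rpush "enc-test", threshold
redis.object "encoding", "enc-test" # => "linkedlist" -- one element over, no longer packed
~~~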
Here is the trick: we will transparently partition a list into many small lists of `list-max-ziplist-entries` size.
By storing lists under the threshold, we take advantage of redis' ability to pack them.
READMORE
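For illustration, here is a minimal sketch of the write path in Ruby.
The key naming scheme, `PARTITION_SIZE`, and `partitioned_push` are my own names for this post; the implementation I link to below is more complete.

~~~ruby
require "redis"

# Store a logical list as many small physical lists ("partitions"),
# each at most PARTITION_SIZE elements long so it keeps the packed encoding.
# PARTITION_SIZE should match list-max-ziplist-entries.
PARTITION_SIZE = 512

# Append the i-th value of the logical list stored under `key`.
def partitioned_push(redis, key, i, value)
  redis.rpush "#{key}:#{i / PARTITION_SIZE}", value
end

redis = Redis.new
10_000.times { |i| partitioned_push(redis, "series", i, rand(10)) }
~~~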
Here is a memory usage benchmark with random int values between 0 and 10:
<%= article_image_tag 'bm-512.png', class: 'img-fluid' %>
To benchmark access speed, let's try some random range reads on a large list:
~~~
bm_speed: 3,000,000 keys, 100 random range reads
                 user     system      total        real
regular      0.380000   0.040000   0.420000 (  0.686875)
partitioned  0.390000   0.060000   0.450000 (  0.482544)
~~~
Despite the Ruby overhead, we still see savings in both wall-clock time and memory.
A regular redis list is a linked list, so reaching a range deep inside it requires traversing the list from one end.
However, with this technique the list is partitioned, and it takes only `O(1)` to compute the key of the correct partition.
As you increase `list-max-ziplist-entries`, memory savings increase and range queries get slower.
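Continuing the sketch above, a range read only touches the partitions that overlap the requested range (again, the names here are mine, not the published implementation's):

~~~ruby
require "redis"

PARTITION_SIZE = 512 # same constant as in the write sketch above

# Read the logical range [from, to] (inclusive): compute which partitions it
# spans -- simple integer arithmetic -- and issue one LRANGE per partition.
def partitioned_range(redis, key, from, to)
  (from / PARTITION_SIZE..to / PARTITION_SIZE).flat_map do |part|
    lo = [from - part * PARTITION_SIZE, 0].max
    hi = [to - part * PARTITION_SIZE, PARTITION_SIZE - 1].min
    redis.lrange "#{key}:#{part}", lo, hi
  end
end

redis = Redis.new
partitioned_range(redis, "series", 1_000_000, 1_000_099) # 100 values
~~~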
I published a [Ruby implementation][redis-stats-int_series.rb] of this approach for int and time series.
Are you doing something similar? I am curious to hear other approaches.
*Thanks [@ddtrejo](https://twitter.com/ddtrejo) for your edits and suggestions!*
#### Notes
* Packing efficiency depends on the data. Benchmark with your real data.
* You can implement a similar trick for hashes and sets; redis packs them based on the `hash-max-ziplist-entries` and `set-max-intset-entries` settings, respectively (see the sketch after these notes).
* Tested on redis 2.6.14, installed via homebrew on OS X 10.8.3.
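Here is what the hash variant could look like, as a sketch with made-up key names (the Instagram post under "See also" describes this approach in detail):

~~~ruby
require "redis"

# Spread integer keys over many small hashes so each hash stays packed.
# Each bucket holds at most BUCKET_SIZE fields, so keep BUCKET_SIZE at or
# below hash-max-ziplist-entries.
BUCKET_SIZE = 512

def bucketed_hset(redis, id, value)
  redis.hset "stats:#{id / BUCKET_SIZE}", id, value
end

def bucketed_hget(redis, id)
  redis.hget "stats:#{id / BUCKET_SIZE}", id
end

redis = Redis.new
bucketed_hset(redis, 1_234_567, "42")
bucketed_hget(redis, 1_234_567) # => "42"
~~~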
#### See also
* [Discussion on Hacker News](https://news.ycombinator.com/item?id=6307769).
* Instagram dev blog post on [storing key-value pairs using partitioned hashes][instagram-partitioned-hashes].
* [Redis in Action](http://www.manning.com/carlson/) book by [@dr_josiah](https://twitter.com/dr_josiah).
[instagram-partitioned-hashes]: http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value-pairs
[redis-stats-int_series.rb]: https://github.com/glebm/redis_stats/blob/master/lib/redis_stats/int_series.rb