WORK IN PROGRESS
Redis keeps all its data in memory, and so it is important to optimize memory usage. This document explains several strategies to reduce memory overheads. This document builds up on the official memory optimization notes on redis.io, so you should read that document first.
Redis uses a special, memory-efficient encoding if a list, sorted set or hash contains meets the following criteria -
The default settings in redis.conf are a good start, but it is possible to achieve greater memory savings by adjusting them according to your data. The memory analyzer helps you find the best value for these parameters based on your data. Run the analyzer on your dump.rdb file, and then open the generated csv file.
Start by filtering on type=list and encoding=linkedlist. If there are no matching rows, or the number of filtered rows is very small, you already are using memory efficiently.
Otherwise, look at the columns
len_largest_element, and compare them to the settings
list-max-ziplist-value. Try to find a value for these two settings such that most linkedlists are converted to ziplists.
But remember, this is a balancing act - you don't want to set an extremely large value, because then you would start consuming a lot of CPU cycles.
Repeat the steps for hashmaps and sorted sets.
Always use integer ids for objects. Avoid GUIDs or other unique strings to identify an object.
IDs for objects are likely to be used in other data structures such as lists, sets and sorted sets. When you use integers, Redis can use a compact representation to save memory. Some examples of how Redis optimizes :
In short, use Integers whenever you can. Also, see : http://stackoverflow.com/questions/10109921/username-or-id-for-keys-in-redis/10110407#10110407
IntSet is essentially a sorted array of numbers. Inserting or retrieving a number takes O(log N) operations - its basically a binary search. But because this is only used for small sets, the amortized cost of the operation is still O(1)
Int Sets come in 3 flavours - 16bits, 32 bits or 64 bits. If you only use 16 bit numbers, the 16 bit version will be used. If one of your elements in greater than 2^15, Redis will automatically use 32 bits for each number. Similarly, when an element exceeds 3^31, Redis will convert the array to a 64 bit array.
In contrast, the regular implementation of Set uses a Hashtable. This has a lot of memory overheads.
To save memory, store integers in your sets. That way, Redis will automatically use the most memory efficient data structure.
String data type has an overhead of about about 90 bytes on a 64 bit machine. In other words, calling
set foo bar uses about 96 bytes, of which 90 bytes is overhead.
You should use the String data type only if :
If you are not doing any of the above, use hashes instead. See this Instagram article on how they stored hundreds of millions of key value pairs in Redis.
Lets say you want to store user details in Redis. The logical data structure is a Hash. For example, you would do this -
hmset user:123 id 123 firstname Sripathi lastname Krishnan location Mumbai twitter srithedabbler.
Now, Redis 2.6 will store this internally as a Zip List; you can confirm by running
debug object user:123 and look at the encoding field. In this encoding, key value pairs are stored sequentially, so the user object we created above would roughly look like this
["firstname", "Sripathi", "lastname", "Krishnan", "location", "Mumbai", "twitter", "srithedabbler"]
Now, if you create a second user, the keys will be duplicated. If you have a million users, well, its a big waste repeating the keys again.
To get around this, we can borrow a concept from Python - NamedTuples. A NamedTuple is simply a read-only list, but with some magic to make that list look like a dictionary.
Your application needs to maintain a mapping from field names to indexes. So, "firstname" => 0, "lastname" => 1 and so on. Then, you simply create a list instead of a hash, like this -
lpush user:123 Sripathi Krishnan Mumbai srithedabbler. With the right abstractions in your application, you can save significant memory.
Don't use this technique if -
The memory analyzer output has a column "bytes_saved_if_converted_to_list" against every hash. Sum up this column and see how much memory you'd save if you make that change. If it is significant and worth the added complexity, go ahead and make that change.
Last edited by sripathikrishnan,