Hash Field Expiration - Draft #13172

moticless · 2024-03-26T13:04:51Z

Abstract

This proposal extends Redis expiration by allowing an expiration time to be set on individual hash fields (HFE, hash field expiration). In addition, it challenges the existing keyspace expiration implementation by presenting an alternative data structure, ebuckets, initially designed for HFE. ebuckets combines several recurring ideas raised by the community in the context of the keyspace, which may be relevant to hash fields as well. The new DS is sorted by TTL and optimized for active expiration, with a memory cost of about 20 bytes of metadata per item compared to about 40 bytes per item in the existing dict used for expiry. It embeds an expiry struct in the registered item, which saves another pointer and, along the way, gives an O(1) TTL lookup.

In that context, we also introduce a new structure, named MSTR, which stands for immutable STRing with Metadata. It comes as a replacement for SDS-fields in a hash. Whereas SDS only allows manipulation of strings, with MSTR we can add and remove metadata attached to immutable strings generically.

Overview

Redis is often used as a cache, and as such, one of its main features is being able to expire keys based on timeout. Over the years there was a recurring demand for more granular expiration control, such as to expire members in a set [1,#135], or members in sorted set [1,2], and particularly there are requests for expiration of fields of a hash [1,2,3,#167,#1042]. Several initiatives have been undertaken to address these needs, taking the form of third-party modules [1,2] and Redis forks [1].

The key aspects of this feature include:

Active Expiration - Propose an adequate DS that supports deleting expired fields across hashes, as well as deleting the expired fields of a specific hash. The DS must scale starting from the first item, be concise in memory consumption, be optimized for active expiration, and fit the existing hash implementation of Redis.
Lazy expiration - Conceptually, every time a hash is accessed, all of its expired fields need to be removed. Although this brings the object to the desired state, it might, in extreme cases, hurt latency. This differs from the EXPIRE command for the keyspace, which only needs to decide whether a given object is expired. Here we need to fine-tune a few hash commands so that each takes the appropriate action.
API - The API is aligned with the spirit of the existing EXPIRE feature for keys, however, some commands adopt a slightly different approach.

Active Expiration

Before designing a data structure for hash field expiration, it is worth examining the current Redis implementation of key expiration and its pros and cons, to determine whether a similar strategy would be beneficial. This examination may not only aid in designing an effective data structure for hash field expiration but may also improve the Redis mechanism for key expiration. For that reason, the terms “key” and “field” (or “hash field”) are used interchangeably below. Even if the new data structure is eventually used only for hash field expiration, it lets us challenge the current implementation and provides a good reference.

Redis uses a hash table DS to implement key expiration, wherein the insertion of a new key with TTL into the table is determined by the key's hash value. The pros and cons are:

Pros:

Search, insert, and delete take O(1) on average

Cons:

Inefficient active expiration that requires scanning the entire hash table
Rehashing operation is time and memory-consuming.
Memory overhead (~40 bytes per item).

As can be seen, using a hash table for expiration brings one major advantage, O(1) access time to a given item, but it raises several severe issues regarding active expiration (as highlighted in previous issues [#8700,#10802,#9480]). An alternative DS should be sorted by expiration time for efficient active expiration and be able to scale from a few items to millions. RAX is a candidate that keeps recurring in that context [1,#9480], as it is already available and well tested. There are more advanced algorithms for caching with support for expiration [1,2], but they might require major changes and they don't scale from a few items upward, which the hash field case requires.

One might claim that the downside of using RAX is its O(log n) access time. This can be mitigated by keeping no more than 6 bytes of the expiration time (which is sufficient until 02 August, 10889), so that the depth of the radix tree is limited to 6 levels. Another concern might be that RAX wastes too much memory on metadata. This can be mitigated as well by holding a segment of multiple items, instead of a single item, at each leaf of the tree. Those segments are optimized in memory such that the items themselves embed the structure of the segment as a linked list. Furthermore, this reduces the frequency of updates to the rax tree and resolves conflicts between items with the same expiration time.

Having said that, the idea of embedding metadata in hash fields (or keys) is problematic as long as the hash fields, or keys, are of type SDS. Hence, as an alternative, we introduce a new structure, named MSTR, which stands for immutable STRing with Metadata. Whereas SDS can hold only a string, with MSTR we can add and remove metadata attached to the hash field (or key) string in a generic way. It might seem overkill to have a general-purpose MSTR just for TTL, but it is likely that other features down the road will require attaching their own metadata to hash fields (or keys) as well.

New DS: ebuckets

ebuckets, which stands for expiry-buckets, is used to store items that are set with expiration-time. It supports the basic API of add, remove, and active expiration. Its implementation is based on a RAX tree, or a plain linked list when small.
The expiration time of an item is used as the key to traverse the RAX tree.
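
To illustrate how a 48-bit expiration time can serve as a radix-tree key, here is a minimal sketch. It assumes the key is stored most-significant byte first, so that byte-wise (lexicographic) key order matches numeric time order; the names EB_KEY_SIZE, expireTimeToKey and keyToExpireTime are hypothetical and are not taken from this PR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EB_KEY_SIZE 6 /* 48 bits of msec unix-time; hypothetical name */

/* Store the low 48 bits of a msec timestamp most-significant byte first,
 * so that memcmp() order of two keys equals numeric order of the times. */
void expireTimeToKey(uint64_t expireMs, unsigned char key[EB_KEY_SIZE]) {
    assert(expireMs < (1ULL << 48)); /* must fit in 6 bytes */
    for (int i = 0; i < EB_KEY_SIZE; i++)
        key[i] = (unsigned char)(expireMs >> (8 * (EB_KEY_SIZE - 1 - i)));
}

/* Inverse of expireTimeToKey(). */
uint64_t keyToExpireTime(const unsigned char key[EB_KEY_SIZE]) {
    uint64_t t = 0;
    for (int i = 0; i < EB_KEY_SIZE; i++)
        t = (t << 8) | key[i];
    return t;
}
```

With a fixed 6-byte key, a lookup descends through at most 6 key bytes, which is what bounds the depth of the tree regardless of the number of items.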

The ebuckets data structure is organized hierarchically as follows:

ebuckets: This is the top-level data structure, based on RAX. Each leaf in the RAX tree is a bucket.
bucket: Each bucket represents an interval in time and contains one or more segments. The key in the RAX tree for each bucket is the lower-bound expiration time of the items within that bucket, and the key of the next bucket is the upper bound.
segment: Each segment within a bucket can hold up to EB_SEG_MAX_ITEMS (currently 16) items as a linked list. When an item needs to be added to a full segment, ebuckets will try to split the bucket. To avoid wasting memory, it is a singly linked list (single pointer to the next item). The list is cyclic to allow efficient removal of items from the middle of the segment without necessarily traversing the RAX tree.
item: Each item stored in ebuckets should embed the ExpireMeta structure and supply a getter function. ExpireMeta holds the expiration time of the item and a few more fields that are used to maintain the segment data structure (described in Appendix A).

The following diagram summarizes those ideas of using RAX, segments, items and embedded ExpireMeta. It also gives a hint to MSTR layout and how hash of type dict is going to integrate with it:

Splitting bucket

Each segment can hold up to EB_SEG_MAX_ITEMS items. Inserting a new item into a full segment triggers an attempt to split it. In the example below, adding another item to segment number 1, which has already reached its maximum capacity, splits the segment and, in turn, splits the bucket into finer-grained ranges:

       BUCKETS                             BUCKETS
      [ 00-10 ] -> size(Seg0) = 11   ==>  [ 00-10 ] -> size(Seg0) = 11
      [ 11-76 ] -> size(Seg1) = 16        [ 11-36 ] -> size(Seg1) = 9
                                          [ 37-76 ] -> size(Seg2) = 7

Extending bucket

In the example above, the split is not even because the segment presumably held items with the same TTL, which must reside together in the same bucket after the split. This brings us to another important point. If a segment reaches its maximum capacity and all of its items have the same expiration-time key, then the bucket cannot be split. Instead, the items with that expiration-time key are aggregated by allocating an extended segment and chaining it to the first segment in the visited bucket. In that sense, extended segments only hold items with the same expiration-time key.
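
The split rule can be sketched as follows. This is not the PR's algorithm, only an illustration under the assumption that the items of a full segment are available as an ascending array of expiration-time keys; chooseSplitIndex is a hypothetical helper. It picks the cut closest to the middle that does not separate equal keys, and reports when no such cut exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Given n expiration keys sorted in ascending order, return the index at
 * which to cut them into [0, idx) and [idx, n) so that equal keys stay
 * together and the two parts are as balanced as possible.
 * Returns 0 when all keys are equal: no split is possible, and the caller
 * would have to chain an extended segment instead. */
size_t chooseSplitIndex(const uint64_t *keys, size_t n) {
    assert(keys != NULL && n >= 2);
    size_t mid = n / 2;
    size_t best = 0;
    size_t bestDist = n; /* larger than any real distance from mid */
    for (size_t idx = 1; idx < n; idx++) {
        if (keys[idx - 1] == keys[idx]) continue; /* would separate equal keys */
        size_t dist = idx > mid ? idx - mid : mid - idx;
        if (dist < bestDist) {
            best = idx;
            bestDist = dist;
        }
    }
    return best;
}
```

An uneven result, as in the example above, falls out naturally whenever a run of equal keys straddles the middle of the segment.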

The following diagram describes in more details the way to extend a full segment and chain a new segment:

Memory evaluation

Optimal memory utilization occurs when multiple items share the same expiration time, as they can be stored efficiently within a single bucket utilizing extended segments, akin to a cost-effective linked list structure. Initially, let's disregard this scenario and assume that each item possesses a unique expiration time, resulting in all buckets operating without extended segments.

Note that:

In any use-case, each item holds additional 16 bytes of embedded ExpireMeta.
The header of the segment is 16 bytes (pointer to first item in bucket and a counter)
Rax meta-data per leaf/segment/bucket can be roughly estimated around 40 bytes (bucket key is only 6 bytes)

Use case: ebuckets contains few items. No more than `EB_LIST_MAX_ITEMS`

When ebuckets contains no more than EB_LIST_MAX_ITEMS (=16) items, it is optimized not to allocate any memory. It just uses embedded ExpireMeta to maintain its own data-structure as a plain list. Therefore, memory overhead is 16 bytes per item.

Use case: Most items are removed via Active-Expire

If most of the items are being removed by active expiration, then sizes of the segments (buckets) will be an outcome of split and will hold at least 8 items and no more than 16 (=EB_SEG_MAX_ITEMS). So, it is expected to have an average of 12 items per bucket.

Average of 12 items in bucket: 16 + 16/12 + 40/12 ≈ 20.7 bytes

Use case: Items removed NOT only by Active-Expire

If many "non-sequential" removals of items from buckets are expected, then many buckets can shrink down to 4 items per bucket. Below that, ebuckets will try to merge the bucket with an adjacent one. So, it is expected to have an average of 10 items per bucket.

Average of 10 items in bucket: 16 + 16/10 + 40/10 = 21.6 bytes

Worst case of 4 items in bucket: 16 + 16/4 + 40/4 = 30 bytes

Use case: All items with same expiration-time

In that case ebuckets will have a single bucket of type extended-segments with an unbounded number of items in it. ExpireMeta becomes the dominant value and NextSegHdr, of size 24 bytes, will be allocated per segment. Assuming 16 items per segment, we get:

16 + 24/16 = 17.5 bytes
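
The per-item estimates in this section all follow the same arithmetic, which the sketch below reproduces. The constants are the sizes quoted above (16-byte ExpireMeta, 16-byte segment header, roughly 40 bytes of rax metadata per leaf, 24-byte NextSegHdr); the enum and function names are hypothetical.

```c
#include <assert.h>

enum {
    EXPIRE_META_BYTES = 16,  /* embedded in every item */
    SEG_HDR_BYTES = 16,      /* one per (non-extended) bucket */
    RAX_LEAF_BYTES = 40,     /* rough rax metadata estimate per bucket */
    NEXT_SEG_HDR_BYTES = 24  /* one per extended segment */
};

/* Overhead per item when each bucket holds a single plain segment. */
double bytesPerItemPlainBucket(int itemsPerBucket) {
    assert(itemsPerBucket > 0);
    return EXPIRE_META_BYTES +
           (double)(SEG_HDR_BYTES + RAX_LEAF_BYTES) / itemsPerBucket;
}

/* Overhead per item when all items share one expiration time and live in
 * a chain of extended segments (the single rax leaf is amortized away). */
double bytesPerItemExtendedSegments(int itemsPerSegment) {
    assert(itemsPerSegment > 0);
    return EXPIRE_META_BYTES + (double)NEXT_SEG_HDR_BYTES / itemsPerSegment;
}
```

For instance, bytesPerItemPlainBucket(4) gives the 30-byte worst case and bytesPerItemExtendedSegments(16) gives 17.5 bytes.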

Performance Evaluation

This section involves evaluating the insertion performance of 10 million items into ebuckets and then actively expiring all of them.
The expiration time distribution is uniformly spread, ranging from 1 second to 30 days. Note that it doesn't make much sense to make active expiration on more than a day, but it is provided for completeness.
Tested on my busy laptop: 12th Gen Intel(R) Core(TM) i7-1260P
malloc and free of the items themselves are not measured (Active-expire just removes the items from ebuckets DS).
"Ebuckets Mem Usage" counts rax memory along with segment headers.
The notion of dry-run to active-expire only evaluates the number of items that are expired and not the actual removal of the items (will be further discussed later on).

| Metric                             | 1 sec distribution | 1 minute distribution | 1 hour distribution | 1 day distribution | 1 month distribution |
|------------------------------------|--------------------|-----------------------|---------------------|--------------------|----------------------|
| Total items                        | 10,000,000         | 10,000,000            | 10,000,000          | 10,000,000         | 10,000,000           |
| Total buckets                      | 1,000              | 60,000                | 892,855             | 887,587            | 886,997              |
| Total segments                     | 625,472            | 653,186               | 892,855             | 887,587            | 886,997              |
| Average items per bucket           | 10,000.00          | 166.67                | 11.20               | 11.27              | 11.27                |
| Average items per segment          | 15.99              | 15.31                 | 11.20               | 11.27              | 11.27                |
| Average segments per bucket        | 625.47             | 10.89                 | 1.00                | 1.00               | 1.00                 |
| Ebuckets Mem (rax+seg headers)     | 14677.04 KBytes    | 16365.99 KBytes       | 36831.51 KBytes     | 40628.52 KBytes    | 50469.71 KBytes      |
| Ebuckets Mem per Bucket            | 15029 Bytes        | 279 Bytes             | 42 Bytes            | 46 Bytes           | 58 Bytes             |
| Ebuckets Mem per Item              | 1 Bytes            | 1 Bytes               | 3 Bytes             | 4 Bytes            | 5 Bytes              |
| Total Mem per Item (+ExpireMeta)   | 1 + 16 = 17 Bytes  | 1 + 16 = 17 Bytes     | 3 + 16 = 19 Bytes   | 4 + 16 = 20 Bytes  | 5 + 16 = 21 Bytes    |
| Time elapsed ebuckets creation     | 0.900223 seconds   | 3.443955 seconds      | 9.126559 seconds    | 9.753085 seconds   | 9.376918 seconds     |
| Time elapsed active-expire dry-run | 0.000042 seconds   | 0.002826 seconds      | 0.050068 seconds    | 0.068424 seconds   | 0.056705 seconds     |
| Time elapsed active-expire         | 0.919034 seconds   | 1.005693 seconds      | 1.375322 seconds    | 1.433403 seconds   | 1.461880 seconds     |

To run the benchmark (currently, it is configured to 1 month distribution):

make REDIS_CFLAGS='-DREDIS_TEST -DEB_TEST_BENCHMARK' && ./src/redis-server test ebuckets

MSTR (immutable string with metadata)

SDS strings are widely used across the system and serve as a general-purpose container for data. The need to optimize memory by aggregating a string with its metadata and storing them in Redis data structures as a single bulk allocation keeps recurring. One thought might be to attach metadata to SDS. The trouble is that SDS is mutable by nature, with a wide API (split, join, etc.). Pushing metadata logic into SDS would make it very fragile and complex to maintain.

As an alternative, we introduce a new concept of immutable strings, simplified, with limited API, and with the option to attach metadata. This idea isn’t new and was suggested in different contexts. The new thing here is defining it as infrastructure to different kinds of use cases. One use case can be attaching TTL to hash fields. Another use case can be attaching TTL and encoding to keys. Etc.

The representation of the string, without any metadata, in its basic form, resembles SDS but without the API to manipulate the string. The following diagram shows the memory layout of mstring (mstrhdr8) when no metadata is attached:

     +----------------------------------------------+
     | mstrhdr8                       | c-string |  |
     +--------------------------------+-------------+
     |8b   |2b     |1b      |5b       |?bytes    |8b|
     | Len | Type  |m-bit=0 | Unused  | String   |\0|
     +----------------------------------------------+
                                      ^
                                      |
  mstrNew() returns pointer to here --+

If the metadata flag is set (depicted as m-bit in the diagram above), then the header is preceded by an additional 16 bits of metadata flags, such that if the i'th bit is set, then the corresponding i'th metadata structure is attached to the mstring. The metadata layout and sizes are defined by the mstrKind structure (more below). The following diagram shows the memory layout of MSTR (mstrhdr8) when 3 bits in mFlags are set, indicating that 3 metadata fields are attached at the beginning of the mstring.

      +-------------------------------------------------------------------------------+
      | METADATA FIELDS       | mflags | mstrhdr8                       | c-string |  |
      +-----------------------+--------+--------------------------------+-------------+
      |?bytes |?bytes |?bytes |16b     |8b   |2b     |1b      |5b       |?bytes    |8b|
      | Meta3 | Meta2 | Meta0 | 0b1101 | Len | Type  |m-bit=1 | Unused  | String   |\0|
      +-------------------------------------------------------------------------------+
                                                                        ^
                                                                        |
                            mstrNewWithMeta() returns pointer to here --+
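
To make the layout concrete, the following sketch computes where a given metadata slot sits relative to the mflags word. It assumes only what the diagram shows: attached metadata are packed back to back immediately before mflags, in reverse enumeration order. MSTR_MAX_META and metaDistanceFromFlags are hypothetical names and are not the PR's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSTR_MAX_META 16 /* one bit per metadata slot in the 16-bit mflags */

/* Number of bytes between the start of metadata `idx` and the start of the
 * mflags word. Metadata 0 (if attached) sits right before mflags; higher
 * indexes sit further away. Slots whose bit is clear occupy no space. */
size_t metaDistanceFromFlags(const size_t metaSize[MSTR_MAX_META],
                             uint16_t mflags, int idx) {
    assert(idx >= 0 && idx < MSTR_MAX_META);
    assert(mflags & (1u << idx)); /* the metadata must be attached */
    size_t dist = 0;
    for (int j = 0; j <= idx; j++)
        if (mflags & (1u << j)) dist += metaSize[j];
    return dist;
}
```

With bits 0, 2 and 3 set, as in the diagram, slot 1 contributes nothing, so Meta2 starts metaSize[0] + metaSize[2] bytes before mflags.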

Note: Initially MSTR was designed to support implicit conversion from SDS to MSTR. That is, the layout of MSTR without any metadata attached is aligned with SDS layout. But later on it was decided to discard this approach and to strive for explicit conversion. This decision can be challenged in the future as the code evolves.

Kinds of MSTR

MSTR allows defining different kinds (classes) of mstrings, each with its own unique metadata layout. For example, all hash field instances can optionally have TTL metadata attached. This is achieved by defining, once, an mstrKind structure that describes the metadata layout and metadata sizes of this specific kind.

Here is mstrKind prototype for hash fields:

typedef enum HfieldMetaFlags {
    HFIELD_META_EXPIRE = 0,
} HfieldMetaFlags;

mstrKind mstrFieldKind = {
        .name = "hField",
        .metaSize[HFIELD_META_EXPIRE] = sizeof(ExpireMeta),
};

Note that each hash field instance still has the freedom to optionally attach the expiration metadata to it. Most of them will start their life without any metadata attached.

In the future, the keys of the Redis keyspace can be another kind of MSTR that aggregates TTL, LRU, or even dictEntry metadata in a single allocation. Here is a general idea of how it might look:

typedef enum HkeyMetaFlags {
    HKEY_META_VAL_REF_COUNT  = 0, // refcount
    HKEY_META_EXPIRE         = 1, // TTL and more
    HKEY_META_VAL_REF        = 2, // Val referenced
    HKEY_META_TYPE_ENC_LRU   = 3, // TYPE + LRU + ENC
    HKEY_META_DICT_ENT_NEXT  = 4, // Next dict entry
    HKEY_META_VAL_EMBED8     = 5, // Val embedded 8 bytes
    HKEY_META_VAL_EMBED16    = 6, // Val embedded 16 bytes
} HkeyMetaFlags;

mstrKind hkeyKind = {
    .name = "hkey",
    .metaSize[HKEY_META_VAL_REF_COUNT] = 4,
    .metaSize[HKEY_META_EXPIRE]        = sizeof(ExpireMeta),
    .metaSize[HKEY_META_VAL_REF]       = 8,
    .metaSize[HKEY_META_TYPE_ENC_LRU]  = 8,
    .metaSize[HKEY_META_DICT_ENT_NEXT] = 8,
    .metaSize[HKEY_META_VAL_EMBED8]    = 8,
    .metaSize[HKEY_META_VAL_EMBED16]   = 16,
};

This idea is further elaborated in Appendix B.

Alignment issues

There are two types of alignments to take into consideration:

Alignment of the metadata - As the metadata are laid out in memory in reverse order of their enumeration, it is recommended to put metadata with "better" alignment first in the memory layout (enumerated last), and those with the worst alignment, or with no alignment requirement at all, last in the memory layout (enumerated first). This is similar to the consideration applied when defining a new struct in C. Note also that each metadata might or might not be attached to a given MSTR, which somewhat complicates the design of a new mstrKind. In the example above, HKEY_META_VAL_REF_COUNT, with the worst alignment of 4 bytes, is enumerated first and therefore will be last in the memory layout.
Alignment of returned MSTR pointer - A few optimizations in Redis rely on the fact that an SDS address is always odd. We can achieve the same with a little effort. All headers of type mstrhdrX already have an odd size. With that in mind, if a new kind of MSTR is required to be limited to odd addresses, then we must make sure that the sizes of all metadata defined in its mstrKind are even.
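
The odd-address rule can be checked mechanically. The sketch below assumes an even-aligned allocation, a 2-byte mflags word and an odd-sized header, so the returned pointer stays odd exactly when every metadata size that might be attached is even. kindKeepsOddPointers is a hypothetical helper, not part of the PR.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every metadata size in the kind is even, meaning any subset
 * of attached metadata adds an even number of bytes before the header and
 * therefore preserves the odd address of the returned string pointer. */
int kindKeepsOddPointers(const size_t *metaSize, size_t numMeta) {
    assert(metaSize != NULL || numMeta == 0);
    for (size_t i = 0; i < numMeta; i++)
        if (metaSize[i] % 2 != 0) return 0;
    return 1;
}
```

Checking each size individually, rather than their sum, matters because each metadata may or may not be attached to a given instance.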

Put the pieces together

Each hash instance will maintain its own set of HFEs in its private ebuckets. It can be attached to dict structure as metadata. In order to support active expiration across hash instances, hashes with associated HFE will be also registered in global ebuckets, per db, with expiration time value that reflects their next minimum time to expire. The global active expiration of HFE will be triggered from the activeExpireCycle() function and will invoke "local" HFE Active expiration for each hash instance that has expired fields.

In addition, the current dict-based implementation of hashes will be modified to use hfield, which is a kind of MSTR, instead of SDS for fields, so that the ExpireMeta structure can be attached to a hash field. A hash field can initially be allocated without ExpireMeta; once a TTL is set, it will be reallocated with ExpireMeta and added to the ebuckets of the hash. If the new TTL of the hash field is also the new minimum TTL of the hash, then the hash will be updated in the global ebuckets as well.

Lazy Expiration

Conceptually, every time before a hash object is accessed, all of its expired fields need to be deleted. Although this brings the object to its desired state, it might freeze the main thread for a long time if a considerable number of items expire within a short period. As an alternative, we can refine the action taken before each command to reduce the risk of getting stuck in a long expiry operation.

Extend expireIfNeeded() to support HFE

Today, the function expireIfNeeded() handles the lazy-expiration logic of keys. It is mainly called via the lookupKey*() family of functions.

It needs to be extended to identify the expiration of a hash when all of its fields have a TTL and all of them have expired, which means that the hash itself should be expired as well.

A new function hashTypeIsEmpty() will be introduced to t_hash.c. It will apply the following steps (return 0 if false):

dbSize = dictSize(d);
If the dict doesn't have HFE metadata attached, or the metadata is trash, then return dbSize == 0.
If the expiration time of the next hash field to expire is greater than the current time, then return 0.
If the size of the hash's HFE DS (ebuckets) is less than dbSize, then return 0 (some fields have no TTL).
If the maximum expiration time in ebuckets is less than the current time, then return 1.
Otherwise, return 0.
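
The steps above can be sketched in C as follows. The HashExpiryView struct is a hypothetical, flattened stand-in for the state the steps consult (dict size, HFE metadata, ebuckets size, minimum and maximum expiration); it is not the PR's data model, and hashIsLogicallyEmpty is not the actual hashTypeIsEmpty().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the values the steps look at. */
typedef struct HashExpiryView {
    size_t numFields;        /* dictSize(d) */
    int hasHfeMeta;          /* dict carries HFE metadata */
    int trash;               /* HFE metadata is leftover / invalid */
    size_t numFieldsWithTtl; /* number of items in the hash's ebuckets */
    uint64_t minExpire;      /* earliest field expiration (msec) */
    uint64_t maxExpire;      /* latest field expiration (msec) */
} HashExpiryView;

/* Returns 1 if the hash has no live fields at time `now`, else 0. */
int hashIsLogicallyEmpty(const HashExpiryView *h, uint64_t now) {
    assert(h != NULL);
    /* No usable HFE state: empty only if the dict itself is empty. */
    if (!h->hasHfeMeta || h->trash) return h->numFields == 0;
    /* Nothing has expired yet. */
    if (h->minExpire > now) return 0;
    /* At least one field has no TTL, so it can never expire. */
    if (h->numFieldsWithTtl < h->numFields) return 0;
    /* Every field has a TTL: empty only if the latest one has passed. */
    return h->maxExpire < now;
}
```

Every check is a constant-time comparison, so the sketch never iterates over the fields themselves, provided the minimum and maximum expiration times can be read from ebuckets cheaply.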

Commands that require careful execution

Here is a non-exhaustive list of commands that require careful execution:

HGETALL

Will not actually expire fields, but rather just filter them from the reply.

HLEN (OPEN)

As we would like to return the correct value that doesn't count expired fields, we can remove expired fields before running the command.

Alternatively, ebuckets can support a dry-run of active-expire that scans the buckets, starting from the first, until it reaches a bucket with an expiration time that is not below “now”, and aggregates along the way the number of expired items. Note that it doesn’t need to iterate over the items in each bucket, except maybe in the last one, in case it is not of type extended segment (i.e. not more than 16 items). This operation is highly efficient: in the benchmark above, the dry-run took about 50 to 70 msec for 10 million items when expiration times were spread over an hour or more.

HLEN command will return hash size minus ebExpireDryRun().
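
A minimal sketch of the dry-run idea, under simplifying assumptions: buckets are presented as a sorted array rather than a rax tree, extended segments are ignored, and an item counts as expired when its expiration time is strictly below now. BucketView and countExpiredDryRun are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct BucketView {
    uint64_t lowKey;        /* lower-bound expiration time of the bucket */
    size_t numItems;
    const uint64_t *expire; /* per-item expiration times, numItems entries */
} BucketView;

/* Count items with expiration time < now, without removing anything.
 * Buckets are sorted by lowKey; bucket i covers [lowKey[i], lowKey[i+1]). */
size_t countExpiredDryRun(const BucketView *b, size_t numBuckets, uint64_t now) {
    assert(b != NULL || numBuckets == 0);
    size_t expired = 0;
    for (size_t i = 0; i < numBuckets; i++) {
        if (b[i].lowKey >= now) break; /* this and later buckets are live */
        if (i + 1 < numBuckets && b[i + 1].lowKey <= now) {
            /* Upper bound is not above now: the whole bucket is expired. */
            expired += b[i].numItems;
            continue;
        }
        /* Boundary bucket: only here are individual items inspected. */
        for (size_t j = 0; j < b[i].numItems; j++)
            if (b[i].expire[j] < now) expired++;
    }
    return expired;
}
```

Only the boundary bucket is walked item by item, which is why the cost scales with the number of buckets rather than the number of items.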

HRANDFIELD (OPEN)

For now, we shall remove all expired fields and then run the command.

This function can be further optimized.

HSCAN

On a master, will lazy-expire expired fields; otherwise, will just filter them out of the reply. Either way this may result in an empty array, like SCAN.

This function can be further optimized.

API

HEXPIRE group

For each specified field: set the expiration time in sec/msec/unix-sec/unix-msec:

HEXPIRE key seconds [NX | XX | GT | LT] <num-fields> <field [field ...]>
HPEXPIRE key milliseconds [NX | XX | GT | LT] <num-fields> <field [field ...]>
HEXPIREAT key unix-time-seconds [NX | XX | GT | LT] <num-fields> <field [field ...]>
HPEXPIREAT key unix-time-milliseconds [NX | XX | GT | LT] <num-fields> <field [field ...]>

(Similar to EXPIRE, PEXPIRE, EXPIREAT, PEXPIREAT)

HPERSIST command

For each specified field: remove the expiration time.

HPERSIST key <num-fields> <field [field ...]>

HEXPIRETIME group

For each specified field: get the expiration time in sec/msec:

HEXPIRETIME key <num-fields> <field [field ...]>
HPEXPIRETIME key <num-fields> <field [field ...]>

(Similar to EXPIRETIME, PEXPIRETIME)

HTTL group

For each specified field: get the remaining time to live in sec/msec:

HTTL key <num-fields> <field [field ...]>
HPTTL key <num-fields> <field [field ...]>

(Similar to TTL, PTTL)

HGETF command (TODO)

For each specified field, get its value and optionally set the field's expiration time in sec/msec/unix-sec/unix-msec:

HGETF key [NX | XX | GT | LT] [EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT unix-time-milliseconds | PERSIST] <FIELDS count field [field ...]>

(Similar to GETEX)

HSETF command (TODO)

For each specified field-value pair: set field to value and optionally set the field's expiration time in sec/msec/unix-sec/unix-msec:

HSETF key [DC] [DCF | DOF] [NX | XX | GT | LT] [GETNEW | GETOLD] [EX seconds | PX milliseconds | EXAT unix-time-seconds | PXAT unix-time-milliseconds | KEEPTTL] <FVS count field value [field value …]>

Inspired by both HSET and SET.

Appendix A: Structure ExpireMeta

ExpireMeta struct should be embedded inside the item that needs to be stored in ebuckets:

typedef struct ExpireMeta { 
   /* 48bits of unix-time in msec.  This value is sufficient to represent, in 
    * unix-time, until the date of 02 August, 10889
    */
   uint32_t expireTimeLo;              /* Low bits of expireTime. */
   uint16_t expireTimeHi;              /* High bits of expireTime. */

   unsigned int lastInSegment    : 1;  /* Last item in segment. If set, then 'next' will
                                          point to NextSegHdr, unless lastItemBucket=1
                                          then it will point to segment header of the
                                          current segment. */
   unsigned int firstItemBucket  : 1;  /* First item in bucket. This flag assist
                                          to manipulate extended segments directly
                                          without the need to traverse from start
                                          the rax tree  */
   unsigned int lastItemBucket   : 1;  /* Last item in bucket. This flag assist
                                          to manipulate extended segments directly
                                          without the need to traverse from start
                                          the rax tree  */
   unsigned int numItems         : 5;  /* Only first item in segment will maintain
                                          this value. */

   unsigned int trash            : 1;  /* This flag indicates whether the ExpireMeta
                                          associated with the item is leftover.
                                          There is always a potential to reuse the
                                          item after removal/deletion. Note that,
                                          the user can still safely O(1) TTL lookup
                                          a given item and verify whether attached
                                          TTL is valid or leftover. See function
                                          ebGetExpireTime(). */

   unsigned int userData         : 3;  /* ebuckets can be used to store in same
                                          instance few different types of items,
                                          such as, listpack and hash. This field
                                          is reserved to store such identification
                                          associated with the item and can help
                                          to distinguish on delete or expire callback.
                                          It is not used by ebuckets internally and
                                          should be maintained by the user */

   unsigned int reserved         : 4;

   void *next;                       /* - If not last item in segment then next
                                          points to next eItem (lastInSegment=0).
                                        - If last in segment but not last in
                                          bucket (lastItemBucket=0) then it
                                          points to next segment header.
                                        - If last in bucket then it points to
                                          current segment header (Can be either
                                          of type FirstSegHdr or NextSegHdr). */
} ExpireMeta; /* 16 bytes */

Appendix B: Enhance EXPIRE keyspace (Idea for future work)

Now that we have a functioning MSTR and ebuckets for hash fields, we can consider challenging the existing keyspace expiration implementation with the same approach and even take it further.

As a first step, we can try ebuckets instead of the hash table DS for EXPIRE. This also requires changing SDS keys to a kind of MSTR with ExpireMeta attached, just as this proposal does for hash fields.

The next step, after that, might be to consider merging dictEntry and robj into keys that are a kind of MSTR.

TODOs

As listpack is not supported yet, make sure to run config set hash-max-listpack-entries 0
before trying the HFE API.


sundb · 2024-04-13T15:19:52Z

we need to update db->hexpire when executing writable hash command or other commands that may affect hash.

flushdb, del, move, .etc
defrag
hset, hdel, and other may change the fields.
module API like HashSet.

sundb · 2024-04-13T15:24:34Z

I'm wondering why we introduced list ebucket which is only one per db.
It doesn't save a lot of memory overhead, but introduced move code complexity.

moticless · 2024-04-13T22:07:53Z

we need to update db->hexpire when executing writable hash command or other commands that may affect hash.
flushdb, del, move, .etc
defrag
hset, hdel, and other may change the fields.
module API like HashSet.

Hello @sundb,

I think that I took care to handle flush a/sync, del, move and cover with tests. Pls, lmk if something is missing.
defrag is something that i leftover in TODOs. Pls lmk if there is something that i need to handle early on.
HSET, HDEL are handled and got covered as well with tests.
module API uses same common API. It supposed to be ok. I didn't added tests for it. not yet.

You can review test tests at file: hash-field-expire.tcl. Thank you.

moticless · 2024-04-13T22:32:19Z

I'm wondering why we introduced list ebucket which is only one per db.
It doesn't save a lot of memory overhead, but introduced move code complexity.

@sundb , I am not sure that i understand the comment. There is no relation between the fact that there is only one per db and the fact that it adds "complexity". In fact the code is rather simple, doing unregister-and-register, from source to destination, to db->hexpires.

BTW, currently there is no real reason why db->expires is factored to be per slot, only as a preparation for future flush async per slot.

sundb · 2024-04-14T10:30:09Z

> I believe I handled flush (sync and async), del, and move, and covered them with tests. Please let me know if something is missing.
>
> defrag is something I left in the TODOs. Please let me know if there is anything I need to handle early on.
>
> HSET and HDEL are handled and covered by tests as well.
>
> The module API uses the same common API, so it is supposed to be OK. I haven't added tests for it yet.

Ohh, sorry, the listpack-related code forgets to handle them.

Hash Field Expiration - Draft

14af031

sundb reviewed Mar 27, 2024

src/t_hash.c (resolved)

moticless marked this pull request as ready for review March 27, 2024 09:48

sundb reviewed Mar 28, 2024

kvstoreIteratorNext() wrongly reset iterator twice

f24444b

sundb reviewed Mar 29, 2024

src/server.c (resolved)

sundb reviewed Mar 29, 2024

src/rax.h (outdated, resolved)

moticless added 4 commits March 29, 2024 12:49

PR fixes

3270c4e

Fix printf of int64 to be portable

74f0ca6

Remove unused var

68cf357

fix unused function

dcf23aa

This was referenced Mar 31, 2024

Implement Expire on hash #167

Closed

Allow to set an expiration on hash field #1042

Closed

Insisting on allow hash field expiration #6620

Open

Can redis support expire time of the field in a hash table? #3192

Open

tezc reviewed Apr 1, 2024

src/t_hash.c (outdated, resolved)

tezc reviewed Apr 1, 2024

src/t_hash.c (outdated, resolved)

moticless added 2 commits April 2, 2024 17:33

PR fixes; hscan skip expired; OPEN: hrand can return expired

702c6eb

spell check

b07a195

sundb reviewed Apr 3, 2024

src/server.h (outdated, resolved)

tezc reviewed Apr 5, 2024

src/t_hash.c (resolved)

moticless added 2 commits April 7, 2024 09:40

Add active-expire policy

f0c0481

Merge branch 'unstable' into hash-field-expiry

b2a4c38

moticless changed the base branch from unstable to hash-field-expiry-integ April 7, 2024 11:42

Fix HFE's reference to key (the same sds that is stored in keyspace)

ffac5a7

fadidahanna reviewed Apr 10, 2024

src/ebuckets.c (outdated, resolved)

sundb mentioned this pull request Apr 14, 2024

POC listpack hfe tezc/redis#21

Closed

tezc mentioned this pull request Apr 15, 2024

Hash Field Expiration - listpack support #13209

Merged

sundb reviewed Apr 16, 2024

src/ebuckets.c (outdated, resolved)

sundb reviewed Apr 16, 2024

tests/unit/type/hash-field-expire.tcl (outdated, resolved)

tests/unit/type/hash-field-expire.tcl (outdated, resolved)

moticless added 3 commits April 16, 2024 13:52

HLEN count expired. expireIfNeeded() reverted to ignore HFEs

fb921eb

PR fixes

b06208c

Support low precision buckets

b2a9222

tezc reviewed Apr 17, 2024

tests/unit/type/hash-field-expire.tcl (outdated, resolved)

tezc previously approved these changes Apr 17, 2024

Remove configuration: hash-field-expiry-bits

f1f715a

moticless dismissed tezc’s stale review via f1f715a April 18, 2024 06:05

sanitize minor fixes

676188a

tezc reviewed Apr 18, 2024

src/t_hash.c (outdated, resolved)

sundb reviewed Apr 18, 2024

tests/unit/scan.tcl (outdated, resolved)

sundb reviewed Apr 18, 2024

src/db.c (outdated, resolved)

src/dict.c (outdated, resolved)

src/dict.c (outdated, resolved)

src/expire.c (outdated, resolved)

src/expire.c (outdated, resolved)

moticless added 4 commits April 18, 2024 11:07

PR fixes

f0c2bf9

Fix mstrNewCopy() - if mstr to copy not include any metadata

5735743

Add try malloc to mstr

120e01d

Add hash-field-expire.tcl to test_helper.tcl

7088951

tezc previously approved these changes Apr 18, 2024

hash-field-expire tagging with external:skip needs:debug

20bb8c6

moticless dismissed tezc’s stale review via 20bb8c6 April 18, 2024 12:11

sundb reviewed Apr 18, 2024

src/ebuckets.c (outdated, resolved)

src/ebuckets.c (outdated, resolved)

refine ebuckets perf test

35ed9be

sundb approved these changes Apr 18, 2024

moticless merged commit c18ff05 into redis:hash-field-expiry-integ Apr 18, 2024
14 checks passed

tishun mentioned this pull request Apr 19, 2024

Hash Field Expiration Support redis/lettuce#2834

Open

sundb reviewed Apr 22, 2024

src/commands/hpersist.json (resolved)

PragmaTwice mentioned this pull request Apr 22, 2024

Add support of hash field expiration apache/kvrocks#2269

Open

2 tasks
