The fixed-size allocation in v7 restricts many previously possible use cases #208

Closed
MartinKolarik opened this issue Mar 9, 2022 · 9 comments · Fixed by #210

Comments

@MartinKolarik

I've already seen #193, which explains why the max option is mandatory in v7, and the explanation made me wonder whether the allocation is done to the max capacity right away, or in some increments. Looking at the code, it seems that it really allocates all memory based on max right away, which seems rather unfortunate (and something that should also be documented).

I understand that the module is called "LRU" cache, so it makes some sense for it to enforce max capacity (without which it isn't really LRU), but since it has historically provided a lot of extra functionality on top, and still provides the maxAge option, another common usage pattern I've seen is using only maxAge, without setting max, for time-based caching. Particularly in apps where you need a mix of LRU and time-based caching, it's very useful to have both provided by the same module with the same API.

A simple workaround many people will probably think of is setting a very high value for max, but that's not good because of the upfront allocation. To keep the advantages of the new approach, would it make sense to implement optional resizing, so that people could opt into dynamic sizing with an occasional resize cost (similar to how e.g. C++ containers work)?

@isaacs
Owner

isaacs commented Mar 10, 2022

I could see how that would be useful, and it would be interesting and fun to implement. However, it's also quite a footgun.

The performance improvements in version 7 come largely from the fact that allocation is done all up front, and thus gc is kept to a minimum in hot paths, there are always a deterministic number of reads and writes, never any run-time pauses, and so on. (This is actually somewhat of a problem with C++ containers for performance-critical applications, as well.) I'm not willing to give up the performance benefits of fixed-size allocation by default for users calling it with a set max, since that is the primary purpose here, so any resizing has to be opt-in.

There are two general approaches I can think of that could be explored: resizing and chaining. In both cases, we'll definitely want to set an "absolute max" size, at which point we refuse to keep growing, and should require that there be a time limit, so that we can have some sense of when to downsize (otherwise the cache size ends up sitting forever at the maximum, so why not just use a big max and get the speed benefit, you know?)

The largest possible absolute max would be 2^32, or else unsigned integer indexes don't work anymore. And whatever the absolute max is set to, we'll have to use a big enough uint type to handle it. (Eg, new LRU({max: 257}) doesn't just require a few bytes more than new LRU({ max: 256 }), but rather roughly double, because it has to use Uint16Array instead of Uint8Array.) (We do currently use Array when the size is above 2^32, but tbh, that use case is bonkers.)
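
To illustrate that index-type math (a minimal sketch, not the actual lru-cache internals):

// pick the smallest index type able to hold indexes 0..max-1
const indexArrayFor = (max) =>
  max <= Math.pow(2, 8) ? Uint8Array
  : max <= Math.pow(2, 16) ? Uint16Array
  : max <= Math.pow(2, 32) ? Uint32Array
  : Array // beyond 2^32, unsigned integer indexes no longer fit

const A = indexArrayFor(256) // Uint8Array: indexes 0..255 fit in one byte each
const B = indexArrayFor(257) // Uint16Array: roughly double the bytes per slot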

  1. resizing: whenever we would normally call evict(), we instead double this.max, re-allocate the next and prev typed arrays, free stack, and key and value arrays. Copy all the existing contents of the arrays, and voila, we've got more space.
  2. chaining: when we resize up, instead of allocating a whole new set of arrays, we just alloc a single new set of arrays of this.max size, and add it to the set. Then we have to do math on the indexes. So if the initial size is 5, then we add another set of arrays of size 5, then a set of arrays of size 10, and so on. I'd want to keep the indexes consistent, so we'd have to know that indexes 0-4 are in the first list, 5-9 in the second, and 10-19 in the third (see the sketch below). Ok, not so bad.
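
A rough sketch of the extra index math that chaining would require (illustrative names only, not actual lru-cache internals):

class ChainedIndex {
  constructor (initialSize) {
    this.segments = [{ start: 0, next: new Uint32Array(initialSize) }]
    this.total = initialSize
  }
  grow (extra) {
    // indexes stay stable because existing segment starts never change
    this.segments.push({ start: this.total, next: new Uint32Array(extra) })
    this.total += extra
  }
  locate (index) {
    // the extra math on every access: find which segment owns this index
    if (index >= this.total) throw new RangeError('index out of range')
    for (let i = this.segments.length - 1; i >= 0; i--) {
      if (index >= this.segments[i].start) {
        return { segment: this.segments[i], offset: index - this.segments[i].start }
      }
    }
  }
}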

Resizing down is trickier in either case. Let's say whenever this.size drops to the current (expanded) max/4, we drop the size down to max/2 (so we don't just have to do another expand right away when the next set happens -- this can be tunable). In that case, we have to have a way to walk all the indexes greater than this number, and reposition them so that the order of the linked list is retained, but with the data at a new index. (Ie, move the key/value position, and set const n = this.next[oldIndex], p = this.prev[oldIndex]; this.next[p] = newIndex; this.prev[n] = newIndex; this.next[newIndex] = n; this.prev[newIndex] = p; and this.keyMap.set(key, newIndex).)
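
In method form (matching the other sketches in this thread), the relocation step might look roughly like this; the field names are hypothetical, mirroring the description above, and the head/tail edge cases and free stack are omitted:

moveIndex (oldIndex, newIndex) {
  const key = this.keyList[oldIndex]
  // copy the payload into its new slot
  this.keyList[newIndex] = this.keyList[oldIndex]
  this.valList[newIndex] = this.valList[oldIndex]
  // splice newIndex into the linked list where oldIndex used to sit
  const n = this.next[oldIndex]
  const p = this.prev[oldIndex]
  this.next[p] = newIndex
  this.prev[n] = newIndex
  this.next[newIndex] = n
  this.prev[newIndex] = p
  // point the key lookup at the new slot
  this.keyMap.set(key, newIndex)
}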

Then, once it's compressed (which will be O(n) at best, no way around that), in the resize case we alloc new smaller arrays and cut over, and in the chain case, we just pop the largest item off the stack.

The chaining approach sounds more promising, but it does add a lot of complexity (which can't really be avoided, even in the non-resizing case, unless it's just a whole different module). That complexity will slow it down, because now we are adding an extra bit of math every time we access an index, instead of just getting an integer and knowing that's the final answer. So that's really not a good idea either, I guess.

Another approach to the downsizing (which isn't nearly as space-efficient) in the chaining case, would be to keep a count of how many entries are indexed in each range, and drop the buffer at the end of the list only when it is empty. We'd still have to remove those indexes from the free stack, of course, but it would be zero-copy.

Yet another approach, which would not be as fast, I guess, would be to just not use typed arrays at all, not do any up-front allocation, and rely on the fact that normal JavaScript Arrays can grow unbounded. That saves most of the complexity, but at the cost of predictably fast behavior. (When you hit the limit of V8's Array optimization, it'll get surprisingly slow.) And you'll most likely want to have autoPrune: true set, so that old items are preemptively removed, unless you want to just eat all the memory forever. Could do that right now, I guess, because it doesn't depend on either the resizing or the chaining approach.

TBH, if you really want a boundless time-limited cache, that's maybe a different thing, and you should use a module purpose built for that. Probably the cleanest way to do this would be to implement a pure time-based cache, with no max size and no up-front allocation, autoPrune always on, and Arrays instead of uint typed arrays for index tracking. It won't be "blazing fast cache to speed up anything", but it'd still perhaps be "pretty good cache to speed up ttl-based things". Maybe lru-cache needs a sibling ;)

@isaacs
Owner

isaacs commented Mar 10, 2022

Supporting unbounded behavior isn't actually that hard, now that I really look at it. Just have to use Array instead of TypedArray everywhere, and it means evict() never gets called. Massive footgun, though.

Maybe this is a dumb question, but in those cases where you'd have an unbounded-storage cache with a ttl, is the recency-of-use order information actually important? Because this would give about as good performance as you could hope for (significantly better than lru-cache, in fact), and seems to do everything you'd need:

// a Map that expires entries after a fixed (or per-call) ttl in milliseconds,
// using one unref'd timer per key
class TTLMap extends Map {
  constructor ({ ttl }) {
    if (!ttl || typeof ttl !== 'number' || ttl !== Math.floor(ttl) || ttl < 1) {
      throw new TypeError('ttl must be a positive integer')
    }
    super()
    this.ttl = ttl
    this.timers = new Map()
  }
  set (k, v, ttl = this.ttl) {
    // reset any existing timer so the entry's lifetime starts over
    if (this.timers.has(k)) {
      clearTimeout(this.timers.get(k))
    }
    const timer = setTimeout(() => this.delete(k), ttl)
    if (timer && timer.unref) timer.unref()
    this.timers.set(k, timer)
    return super.set(k, v)
  }
  delete (k) {
    if (this.timers.has(k)) {
      clearTimeout(this.timers.get(k))
      this.timers.delete(k)
    }
    return super.delete(k)
  }
}
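
A hypothetical usage example of the sketch above:

const cache = new TTLMap({ ttl: 1000 })
cache.set('a', 1)     // expires after the default 1000ms
cache.set('b', 2, 50) // per-call override: expires after 50ms
setTimeout(() => console.log(cache.has('a'), cache.has('b')), 100) // true false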

@MartinKolarik
Author

Maybe this is a dumb question, but in those cases where you'd have an unbounded-storage cache with a ttl, is the recency-of-use order information actually important? Because this would give about as good performance as you could hope for (significantly better than lru-cache, in fact), and seems to do everything you'd need:

That's a very good question, actually. I'd say that in those cases, the order doesn't really matter and this simple TTLMap would work just fine. I suppose the main reason for using lru-cache in these cases was that it was already used in the projects for actual LRU stuff too, and could do this in the previous versions.

@MartinKolarik
Author

I suppose that makes it clear that the best solution is really pointing to another module for that use case; too bad ttl-cache is already taken on npm 😄

@isaacs
Owner

isaacs commented Mar 10, 2022

Thinking about this a little bit more...

I think if you are primarily concerned with tracking age of entries, then wouldn't you want for (const value of cache.values()) to iterate over them in order of age? (Or more precisely, in order of remaining time to live, I suppose?)

I was thinking maybe an optimal data structure would be a B-tree mapping the expiration time to the entries expiring at that time, and then remembered that V8 sorts objects with integer keys numerically.

> a = { 3: 'c', 2: 'b' }
{ '2': 'b', '3': 'c' }
> a[1] = 'a'
'a'
> a
{ '1': 'a', '2': 'b', '3': 'c' }

That's (I believe?) an unspecified impl detail, but it's been around long enough it'll probably break the internet if they ever change it.

So with that, you could keep a null-proto object mapping {[expiration time]: [keys, ...]}, and only create one purge timer per expiration time, or even debounce purges on some set interval, and use it for iteration ordering as well.

*keys () {
  // expiration buckets iterate in ascending numeric key order,
  // so this yields keys from soonest-expiring to latest
  for (const keys of Object.values(this.expirations)) {
    for (const k of keys) {
      yield k
    }
  }
}
purgeStale () {
  // now() being whatever clock produced the expiration-time keys
  for (const [exp, keys] of Object.entries(this.expirations)) {
    if (exp < now()) {
      for (const k of keys) {
        this.delete(k)
      }
    }
  }
}
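
And, hypothetically, the set() bookkeeping that would feed that structure (key overwrites and timer scheduling omitted; now() as in the sketch above):

set (k, v, ttl = this.ttl) {
  const exp = now() + ttl
  if (!this.expirations[exp]) this.expirations[exp] = []
  this.expirations[exp].push(k)
  // schedule (or debounce) a purge timer for `exp` here
  return super.set(k, v)
}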

@MartinKolarik
Author

I think if you are primarily concerned with tracking age of entries, then wouldn't you want for (const value of cache.values()) to iterate over them in order of age? (Or more precisely, in order of remaining time to live, I suppose?)

I don't think I ever needed that, but it does make sense. Particularly for purging - having a separate timer per key is something I wasn't really sure about performance-wise, but I see lru-cache also does this, at least with ttlAutopurge. Without benchmarking, my assumption is that interval-based purging may have less overhead if you can afford to keep stale items around for a while?

@isaacs
Owner

isaacs commented Mar 10, 2022

Without benchmarking, my assumption is that interval-based purging may have less overhead if you can afford to keep stale items around for a while?

I wouldn't do it as a fixed interval, but more like a debounce. So if you know a purge is coming soon-ish, you just wait for that, but only schedule when needed. It's a bit of extra bookkeeping, so I skipped it, but if someone asks, it's not too hard to implement, and might improve things somewhat.
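
A rough sketch of that debounced scheduling (illustrative only; nextPurge, now(), and purgeStale() are assumed helpers in the spirit of the sketches above):

scheduleNextPurge (exp) {
  // if a purge is already scheduled at or before this expiration, just wait for it
  if (this.nextPurge && this.nextPurge.at <= exp) return
  if (this.nextPurge) clearTimeout(this.nextPurge.timer)
  const timer = setTimeout(() => {
    this.nextPurge = null
    this.purgeStale()
  }, exp - now())
  if (timer.unref) timer.unref()
  this.nextPurge = { at: exp, timer }
}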

It's not so bad to assign a bunch of timers to fire at the same ms, because Node stacks them up internally anyway under a single Timer handle. I guess if you have enough of them to matter, you probably have other problems 😅

isaacs added a commit that referenced this issue Mar 10, 2022
This also prevents setting size=0 if maxSize is set, since that is a
recipe for disaster.

At least one of max, maxSize, or ttl MUST be set, to prevent unbounded
growth of the cache.  And really, without ttlAutopurge, it's effectively
unsafe and unbounded in that case anyway, *especially* if allowStale is
set.  This is potentially "unsafe at any speed" territory, so it emits a
process warning in that case.

If max is not set, then regular Array is used to track items, without
setting an initial Array capacity.  This will often perform much worse,
but in many cases, it's not so bad.  Bigger hazard is definitely
unbounded memory consumption.

Fix: #208
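
Roughly, what those constraints mean in practice (option names per lru-cache v7, behavior as described in this commit message):

const LRUCache = require('lru-cache')

new LRUCache({ max: 500 })                       // bounded by entry count
new LRUCache({ ttl: 60000, ttlAutopurge: true }) // unbounded count, but stale entries are removed
new LRUCache({ ttl: 60000 })                     // allowed, but emits a process warning
// new LRUCache({})                              // nothing bounds growth: not allowed
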
isaacs added a commit that referenced this issue Mar 10, 2022
isaacs added a commit that referenced this issue Mar 17, 2022
@isaacs isaacs closed this as completed in c29079d Mar 17, 2022
@isaacs
Owner

isaacs commented Apr 7, 2022

If what you really want is a TTL cache that is more focused on time than storage space and recency of use, you may be interested in using https://www.npmjs.com/package/@isaacs/ttlcache

I haven't benchmarked it extensively, but it was possible to make the overall algorithm much simpler, by throwing away all of the use-recency tracking, and leveraging the fact that V8 has some absurdly good optimizations for null-prototype objects with integer-numeric keys, which it keeps sorted.
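
An untested sketch of basic usage (only the most basic calls are assumed here; check the package README for the actual, current API):

const TTLCache = require('@isaacs/ttlcache')
const cache = new TTLCache({ ttl: 60000 })
cache.set('key', 'value') // expires roughly 60 seconds from now
cache.get('key')          // 'value' while it's still fresh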

@MartinKolarik
Author

Looks good, will give it a try in a few places!
