Feature request: Caching #16

Open
mgrider opened this issue Sep 5, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@mgrider
Contributor

mgrider commented Sep 5, 2022

Library Performance

I just want to get a discussion started here around the topic. For my use (a 2-player abstract strategy game with an AI opponent), I need to call some functions on this library hundreds (maybe thousands) of times per second. In general, most of these calls are re-creating their return values every time, even though they are always the same as long as the set of Cell objects hasn't changed. For those functions, the return value can be cached to gain massive speed-up, which I would like the library to do.

Some cache examples

allCellsCoordinates()

This function was taking up the second most time in my project, and currently looks like this:

    public func allCellsCoordinates() -> Set<CubeCoordinates> {
        return Set(self.cells.map { $0.coordinates })
    }

My initial instinct was to use a standard for-in loop to improve the speed, like this:

    public func allCellsCoordinates() -> Set<CubeCoordinates> {
        var set = Set<CubeCoordinates>()
        for cell in cells {
            set.insert(cell.coordinates)
        }
        return set
    }

I wrote the following tests:

    func testAllCellCoordinatesMap() throws {
        let grid = HexGrid(
            shape: GridShape.hexagon(20))
        measure() {
            for _ in 0...10000 {
                _ = grid.allCellsCoordinatesMap()
            }
        }
    }

    func testAllCellCoordinatesUncached() throws {
        let grid = HexGrid(
            shape: GridShape.hexagon(20))
        measure() {
            for _ in 0...10000 {
                _ = grid.allCellsCoordinatesUncached()
            }
        }
    }

The original (.map version) runs in 2.522 sec, and the for-in version is about 1.4× faster at 1.803 sec. Oddly enough, when I searched for whether .map is slow in Swift, the consensus in various articles seems to be that it's actually faster than a standard for-in loop, so I'm not sure exactly what's happening here. Shortly after the rewrite it occurred to me that I should cache this value (as I'd already done for the primary bottleneck, which I'll describe next).

That function now looks like this:

    /// A cache variable for all coordinates in all cells.
    private var cacheAllCellsCoordinates = Set<CubeCoordinates>()

    /// Coordinates of all available grid cells
    /// - Returns: `Set<CubeCoordinates>`
    public func allCellsCoordinates() -> Set<CubeCoordinates> {
        if !cacheAllCellsCoordinates.isEmpty {
            return cacheAllCellsCoordinates
        } else {
            var set = Set<CubeCoordinates>()
            for cell in cells {
                set.insert(cell.coordinates)
            }
            cacheAllCellsCoordinates = set
            return set
        }
    }

With a similar test to the ones above, (running it 10,000 times), it takes 0.003 sec, and is no longer a noticeable function when running in the profiler.
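One caveat with using isEmpty as the "not yet computed" sentinel is that on an empty grid the cache never populates, so the set is rebuilt on every call. An optional cache sidesteps that. Here is a minimal sketch with a stand-in coordinate type (not the library's actual code):

```swift
// Sketch: an optional distinguishes "never computed" from "computed
// and empty", so an empty grid cannot defeat the cache.
// `Coord` and `GridSketch` are stand-ins, not HexGrid's real types.
struct Coord: Hashable {
    let q: Int, r: Int
}

final class GridSketch {
    var cells: [Coord] = []
    private var cachedCoordinates: Set<Coord>?  // nil means "recompute"

    func allCellsCoordinates() -> Set<Coord> {
        if let cached = cachedCoordinates { return cached }
        let computed = Set(cells)
        cachedCoordinates = computed  // an empty result is still cached
        return computed
    }

    func invalidate() { cachedCoordinates = nil }
}
```

With nil as the sentinel, a cached empty set is returned immediately instead of being recomputed on every call.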

neighborsCoordinates(for coordinates:)

As I said, this function was the primary bottleneck in my AI code (after a bit of refactoring to use coordinates everywhere instead of cells, which I needed to do anyway). Here's how it looked originally:

    public func neighborsCoordinates(for coordinates: CubeCoordinates) throws -> Set<CubeCoordinates> {
        return try Math.neighbors(for: coordinates).filter { isValidCoordinates($0) }
    }

I decided to implement a cache solution right away, and that looks like this:

    /// A cache/LUT variable. The thinking here is, unless our `Cell` set changes, the neighbors are always going to be the same.
    private var cacheNeighborsCoordinatesForCoordinates = [CubeCoordinates: Set<CubeCoordinates>]()

    /// Get all available neighbor coordinates for specified coordinates
    /// - Parameter coordinates: `CubeCoordinates`
    /// - Returns: `Set<CubeCoordinates>`
    /// - Throws: `InvalidArgumentsError` in case underlying cube coordinates initializer propagate the error.
    public func neighborsCoordinates(for coordinates: CubeCoordinates) throws -> Set<CubeCoordinates> {
        if let neighbors = cacheNeighborsCoordinatesForCoordinates[coordinates] {
            return neighbors
        } else {
            let neighbors = try Math.neighbors(for: coordinates).filter { isValidCoordinates($0) }
            cacheNeighborsCoordinatesForCoordinates[coordinates] = neighbors
            return neighbors
        }
    }

I also added a new function to invalidate the cache variables that gets called after updatePixelDimensions in the cells didSet:

    /// This gets called whenever the `Cell` set is updated.
    private func invalidateCacheVariables() {
        cacheAllCellsCoordinates = Set<CubeCoordinates>()
        cacheNeighborsCoordinatesForCoordinates = [CubeCoordinates: Set<CubeCoordinates>]()
    }

I wrote some tests for this as well; you can see them on my fork. I've commented out the original test, because even running the loop (calling it on all the cell coordinates in a hex20 grid 10 times) takes longer than I want to wait. (More than a few minutes.)

alternatives / discussion

I can imagine some legitimate reasons you might object to these sorts of changes in the library, so I wanted to outline the changes I've already made before I tackle any others. (For my purposes, it may already be "fast enough", but some additional testing will be needed.) I do feel that if some function return values are cached, then as many as reasonably can be should be. That's the main reason I'm creating this issue. Should I go ahead and write cache variables for the others where it makes sense?

If you aren't interested in this type of improvement, then I will definitely not bother until I need it. (And in that case, I'd probably just maintain my own fork of the library for my own use.)

@fananek
Owner

fananek commented Sep 5, 2022

It is a good point. In AI opponent programming, some kind of caching is inevitable, especially for recursive algorithms. Even though caching wasn't originally planned to be part of the library, it could be an interesting feature.

While caching brings performance benefits, it also raises some issues. We should think a bit about:

  • what shall be cached
  • cache invalidation
  • memory usage
  • thread safety
  • cache storage (memory, disk)

If we want to implement a caching mechanism, I would use some more generic and reusable solution. For example, an LRU/LFU algorithm sounds like a potential fit. It could use the priority queue which is already in place (used by the A* algorithm).

Here is one more reading. Just for reference.
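A generic LRU cache along those lines could be sketched roughly as follows. This is a minimal illustration, not a proposal for the library's final implementation; it tracks recency with a plain key array for brevity, where a production version would use a doubly linked list or the existing priority queue for O(1) eviction:

```swift
// Minimal LRU cache sketch: dictionary storage plus a recency list
// (most recently used at the end). Generic over key and value.
final class LRUCache<Key: Hashable, Value> {
    private let capacity: Int
    private var storage: [Key: Value] = [:]
    private var recency: [Key] = []

    init(capacity: Int) { self.capacity = max(1, capacity) }

    func value(forKey key: Key) -> Value? {
        guard let value = storage[key] else { return nil }
        touch(key)
        return value
    }

    func setValue(_ value: Value, forKey key: Key) {
        if storage[key] == nil, storage.count >= capacity,
           let leastRecent = recency.first {
            storage[leastRecent] = nil  // evict the least recently used entry
            recency.removeFirst()
        }
        storage[key] = value
        touch(key)
    }

    func removeAll() {
        storage.removeAll()
        recency.removeAll()
    }

    private func touch(_ key: Key) {
        if let i = recency.firstIndex(of: key) { recency.remove(at: i) }
        recency.append(key)
    }
}
```

Because it is generic, the same type could back both a `[CubeCoordinates: Set<CubeCoordinates>]`-shaped neighbors cache and other keyed lookups, with `removeAll()` serving as the invalidation hook.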

@fananek fananek added the enhancement New feature or request label Sep 5, 2022
@mgrider
Contributor Author

mgrider commented Sep 6, 2022

Hi František, Yes, I see the benefits of using LRU/LFU immediately. I'll update my implementation to use one of those, and reply here when I have something to look at and/or review. Thanks!

@mgrider
Contributor Author

mgrider commented Sep 7, 2022

Looking a bit closer at this... it's not so simple as I'd at first imagined. The values we'll be caching are different types of things. I could imagine using a cache object for the neighbors example above, for instance, but then you'd need a different cache for allCellsCoordinates. I could imagine making it generic somehow, so both would be stored in the same cache, but then you've got wildly different "sized" objects stored together. I'm not really sure what makes sense, at this point.
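One lightweight way around the mixed-type problem is to keep a separately typed store per cached function behind a single facade that owns invalidation. The following is a sketch with placeholder types, not a proposal for the final API:

```swift
// Sketch: each cached function gets its own typed store; the facade
// exposes one invalidation entry point. Int stands in for the
// library's CubeCoordinates type.
final class GridCaches {
    var allCellsCoordinates: Set<Int>?      // single cached value
    var neighbors: [Int: Set<Int>] = [:]    // keyed, per-coordinate cache

    func invalidateAll() {
        allCellsCoordinates = nil
        neighbors.removeAll()
    }
}
```

This keeps each value strongly typed (no type erasure, no mixed "sizes" in one store) while the grid only has to talk to one object when the cell set changes.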

I did find an additional article with a more complete example (but pretty similar to the second reference you posted above).

@fananek fananek changed the title Feature request: Performance Feature request: Caching Apr 11, 2023