Feature request: Caching #16

Open
mgrider opened this issue Sep 5, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@mgrider
Contributor

mgrider commented Sep 5, 2022

Library Performance

I just want to get a discussion started here around the topic. For my use (a 2-player abstract strategy game with an AI opponent), I need to call some functions on this library hundreds (maybe thousands) of times per second. In general, most of these calls are re-creating their return values every time, even though they are always the same as long as the set of Cell objects hasn't changed. For those functions, the return value can be cached to gain massive speed-up, which I would like the library to do.

Some cache examples

allCellsCoordinates()

This function was taking up the second most time in my project, and currently looks like this:

    public func allCellsCoordinates() -> Set<CubeCoordinates> {
        return Set(self.cells.map { $0.coordinates })
    }

My initial instinct was to use a standard for-in loop to improve the speed, like this:

    public func allCellsCoordinates() -> Set<CubeCoordinates> {
        var set = Set<CubeCoordinates>()
        for cell in cells {
            set.insert(cell.coordinates)
        }
        return set
    }

I wrote the following tests:

    func testAllCellCoordinatesMap() throws {
        let grid = HexGrid(
            shape: GridShape.hexagon(20))
        measure() {
            for _ in 0...10000 {
                _ = grid.allCellsCoordinatesMap()
            }
        }
    }

    func testAllCellCoordinatesUncached() throws {
        let grid = HexGrid(
            shape: GridShape.hexagon(20))
        measure() {
            for _ in 0...10000 {
                _ = grid.allCellsCoordinatesUncached()
            }
        }
    }

The original (.map version) runs in 2.522 sec, and the for-in version is about 1.4× faster at 1.803 sec. Oddly enough, when I searched for whether .map is slow in Swift, the consensus in various articles seems to be that it's actually faster than a standard for-in loop, so I'm not sure exactly what's happening here. Shortly after the rewrite it occurred to me that I should cache this value (as I'd already done for the primary bottleneck, which I'll describe next).

That function now looks like this:

    /// A cache variable for all coordinates in all cells.
    private var cacheAllCellsCoordinates = Set<CubeCoordinates>()

    /// Coordinates of all available grid cells
    /// - Returns: `Set<CubeCoordinates>`
    public func allCellsCoordinates() -> Set<CubeCoordinates> {
        if !cacheAllCellsCoordinates.isEmpty {
            return cacheAllCellsCoordinates
        } else {
            var set = Set<CubeCoordinates>()
            for cell in cells {
                set.insert(cell.coordinates)
            }
            cacheAllCellsCoordinates = set
            return set
        }
    }

With a similar test to the ones above, (running it 10,000 times), it takes 0.003 sec, and is no longer a noticeable function when running in the profiler.
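One caveat with using isEmpty as the "not yet computed" sentinel is that on an empty grid the cache never populates, so the set is rebuilt on every call. An optional cache sidesteps that. Here is a minimal sketch with a stand-in coordinate type (not the library's actual code):

```swift
// Sketch: an optional distinguishes "never computed" from "computed
// and empty", so an empty grid cannot defeat the cache.
// `Coord` and `GridSketch` are stand-ins, not HexGrid's real types.
struct Coord: Hashable {
    let q: Int, r: Int
}

final class GridSketch {
    var cells: [Coord] = []
    private var cachedCoordinates: Set<Coord>?  // nil means "recompute"

    func allCellsCoordinates() -> Set<Coord> {
        if let cached = cachedCoordinates { return cached }
        let computed = Set(cells)
        cachedCoordinates = computed  // an empty result is still cached
        return computed
    }

    func invalidate() { cachedCoordinates = nil }
}
```

With nil as the sentinel, a cached empty set is returned immediately instead of being recomputed on every call.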

neighborsCoordinates(for coordinates:)

As I said, this function was the primary bottleneck in my AI code (after a bit of refactoring to use coordinates everywhere instead of cells, which I needed to do anyway). Here's how it looked originally:

    public func neighborsCoordinates(for coordinates: CubeCoordinates) throws -> Set<CubeCoordinates> {
        return try Math.neighbors(for: coordinates).filter { isValidCoordinates($0) }
    }

I decided to implement a cache solution right away, and that looks like this:

    /// A cache/LUT variable. The thinking here is, unless our `Cell` set changes, the neighbors are always going to be the same.
    private var cacheNeighborsCoordinatesForCoordinates = [CubeCoordinates: Set<CubeCoordinates>]()

    /// Get all available neighbor coordinates for specified coordinates
    /// - Parameter coordinates: `CubeCoordinates`
    /// - Returns: `Set<CubeCoordinates>`
    /// - Throws: `InvalidArgumentsError` in case underlying cube coordinates initializer propagate the error.
    public func neighborsCoordinates(for coordinates: CubeCoordinates) throws -> Set<CubeCoordinates> {
        if let neighbors = cacheNeighborsCoordinatesForCoordinates[coordinates] {
            return neighbors
        } else {
            let neighbors = try Math.neighbors(for: coordinates).filter { isValidCoordinates($0) }
            cacheNeighborsCoordinatesForCoordinates[coordinates] = neighbors
            return neighbors
        }
    }

I also added a new function to invalidate the cache variables that gets called after updatePixelDimensions in the cells didSet:

    /// This gets called whenever the `Cell` set is updated.
    private func invalidateCacheVariables() {
        cacheAllCellsCoordinates = Set<CubeCoordinates>()
        cacheNeighborsCoordinatesForCoordinates = [CubeCoordinates: Set<CubeCoordinates>]()
    }

I wrote some tests for this as well; you can see them on my fork. I've commented out the original test, because even running the loop (calling it on all the cell coordinates in a hex20 grid 10 times) takes longer than I want to wait. (More than a few minutes.)

alternatives / discussion

I can imagine some legitimate reasons you might object to these sorts of changes in the library, so I wanted to outline the changes I've already made before I tackle any others. (For my purposes, it may already be "fast enough", but some additional testing will be needed.) I do feel that if some function return values are cached, then as many as reasonably can be should be. That's the main reason I'm creating this issue. Should I go ahead and write cache variables for the others where it makes sense?

If you aren't interested in this type of improvement, then I will definitely not bother until I need it. (And in that case, I'd probably just maintain my own fork of the library for my own use.)

@fananek
Owner

fananek commented Sep 5, 2022

It is a good point. In AI opponent programming, some kind of caching is inevitable, especially for recursive algorithms. Even though caching wasn't originally planned to be part of the library, it could be an interesting feature.

While caching brings performance benefits, it also raises some issues. We should think a bit about:

  • what shall be cached
  • cache invalidation
  • memory usage
  • thread safety
  • cache storage (memory, disk)

If we want to implement a caching mechanism, I would use some more generic and reusable solution. For example, an LRU/LFU algorithm sounds like a potential fit. It could use the priority queue which is already in place (used by the A* algorithm).

Here is one more reading. Just for reference.
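A generic LRU cache along those lines could be sketched roughly as follows. This is a minimal illustration, not a proposal for the library's final implementation; it tracks recency with a plain key array for brevity, where a production version would use a doubly linked list or the existing priority queue for O(1) eviction:

```swift
// Minimal LRU cache sketch: dictionary storage plus a recency list
// (most recently used at the end). Generic over key and value.
final class LRUCache<Key: Hashable, Value> {
    private let capacity: Int
    private var storage: [Key: Value] = [:]
    private var recency: [Key] = []

    init(capacity: Int) { self.capacity = max(1, capacity) }

    func value(forKey key: Key) -> Value? {
        guard let value = storage[key] else { return nil }
        touch(key)
        return value
    }

    func setValue(_ value: Value, forKey key: Key) {
        if storage[key] == nil, storage.count >= capacity,
           let leastRecent = recency.first {
            storage[leastRecent] = nil  // evict the least recently used entry
            recency.removeFirst()
        }
        storage[key] = value
        touch(key)
    }

    func removeAll() {
        storage.removeAll()
        recency.removeAll()
    }

    private func touch(_ key: Key) {
        if let i = recency.firstIndex(of: key) { recency.remove(at: i) }
        recency.append(key)
    }
}
```

Because it is generic, the same type could back both a `[CubeCoordinates: Set<CubeCoordinates>]`-shaped neighbors cache and other keyed lookups, with `removeAll()` serving as the invalidation hook.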

@fananek fananek added the enhancement New feature or request label Sep 5, 2022
@mgrider
Contributor Author

mgrider commented Sep 6, 2022

Hi František, Yes, I see the benefits of using LRU/LFU immediately. I'll update my implementation to use one of those, and reply here when I have something to look at and/or review. Thanks!

@mgrider
Contributor Author

mgrider commented Sep 7, 2022

Looking a bit closer at this... it's not so simple as I'd at first imagined. The values we'll be caching are different types of things. I could imagine using a cache object for the neighbors example above, for instance, but then you'd need a different cache for allCellsCoordinates. I could imagine making it generic somehow, so both would be stored in the same cache, but then you've got wildly different "sized" objects stored together. I'm not really sure what makes sense, at this point.
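One lightweight way around the mixed-type problem is to keep a separately typed store per cached function behind a single facade that owns invalidation. The following is a sketch with placeholder types, not a proposal for the final API:

```swift
// Sketch: each cached function gets its own typed store; the facade
// exposes one invalidation entry point. Int stands in for the
// library's CubeCoordinates type.
final class GridCaches {
    var allCellsCoordinates: Set<Int>?      // single cached value
    var neighbors: [Int: Set<Int>] = [:]    // keyed, per-coordinate cache

    func invalidateAll() {
        allCellsCoordinates = nil
        neighbors.removeAll()
    }
}
```

This keeps each value strongly typed (no type erasure, no mixed "sizes" in one store) while the grid only has to talk to one object when the cell set changes.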

I did find an additional article with a more complete example (but pretty similar to the second reference you posted above).

@fananek fananek changed the title Feature request: Performance Feature request: Caching Apr 11, 2023