
Memory utilization grows significantly beyond max_capacity() under load #72

Closed
chriskyndrid opened this issue Jan 18, 2022 · 8 comments

@chriskyndrid

Thank you for the excellent work on this interesting project. I've implemented Moka in a project I'm working on to use as a memory-based cache. I'm passing a struct to bincode, encoding it, storing it in the cache, and running the reverse as needed. The system is a web service that uses Actix. In practice things are working fine.

However, when I ran the load-generating program wrk against a sample call, the memory utilization of the cache increased considerably and does not appear to respect the max_capacity() setting. It does not decrease after load generation completes, and subsequent wrk tests continue to increase the memory footprint, apparently indefinitely.

As an example:

lazy_static! {
    static ref MOKA_CACHE: MCache<Uuid, CacheItem> = MCache::builder()
        // Weigh each entry by its payload length in bytes.
        .weigher(|_key, value: &CacheItem| -> u32 {
            // let size = size_of_val(&*value.data) as u32;
            value.data.len().try_into().unwrap_or(u32::MAX)
        })
        .max_capacity(256 * 1024 * 1024) // 256 MiB
        .build();
}

1:
At the start of the load generation:

Top indicates around 15M of memory utilization:

6488 erp_service_acco 0.0 00:01.19 29 0 50 15M

2:
Executing a wrk request yields:

wrk -d 10 -t 8 -c 100 http://localhost:8080/v1/account/customer/test/1

Running 10s test @ http://localhost:8080/v1/account/customer/test/1
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.16ms    1.79ms  48.82ms   99.58%
    Req/Sec    11.23k     0.92k   13.45k    83.66%
  902635 requests in 10.10s, 314.20MB read
Requests/sec:  89370.39
Transfer/sec:     31.11MB

Top now indicates around 449M of memory utilization

6488 erp_service_acco 0.0 01:19.42 45 0 66 449M

3:
Executing a longer wrk request again:

wrk -d 100 -t 8 -c 100 http://localhost:8080/v1/account/customer/test/1
Running 2m test @ http://localhost:8080/v1/account/customer/test/1
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.97ms   12.33ms 306.14ms   92.16%
    Req/Sec     7.77k     4.35k   14.75k    71.28%
  6184606 requests in 1.67m, 2.10GB read
Requests/sec:  61818.19
Transfer/sec:     21.52MB

Top now indicates around 933M of memory utilization

6488 erp_service_acco 0.0 15:53.12 46 0 67 933M

Memory utilization continues to grow into the GB+ range. Based on the max_capacity setting above and the documentation in the README, I would expect the footprint to be around 256MiB + Actix.

To rule out Actix as a factor, I ran the same tests without any calls to the code that interacts with Moka; Actix does not grow beyond a 21MB footprint.

Data is being inserted into the cache like so:

let encoded = bincode::serde::encode_to_vec(&cache_data, config)?;
let cache_item = CacheItem::new(encoded, 120)?;
cache.insert(key.clone(), cache_item).await;

and retrieved like so:

let cache_item = match cache.get(key) {
    None => {
        /* removed for simplicity */
    }
    Some(item) => {
        /* removed for simplicity */
        item
    }
};

let (decoded, _len): (T, usize) =
    bincode::serde::decode_from_slice(&cache_item.data[..], config)?;

I also repeated the tests without the interaction with bincode.

Is there something I'm misunderstanding on how to restrict the memory utilization of Moka via the max_capacity() (or another setting) configuration?

Any thoughts on this would be greatly appreciated.

@chriskyndrid (Author)

Here is test code that should allow you to reproduce what I assume is a memory leak somewhere:

pub type AsyncResult<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;

use std::error::Error;
use std::fmt;
use actix_web::{App, HttpResponse, HttpServer, Scope, web};
use bincode::config::Configuration;
use moka::future::Cache;
use serde::de::DeserializeOwned;
use serde::Serialize;
use uuid::Uuid;

#[macro_use]
extern crate actix_web;
#[macro_use]
extern crate lazy_static;
#[macro_use]
extern crate serde;

lazy_static! {
    static ref MOKA_CACHE: Cache<Uuid, Vec<u8>> = Cache::builder()
        .weigher(|_key, value: &Vec<u8>| -> u32 {
            value.len().try_into().unwrap_or(u32::MAX)
        })
        .max_capacity(
                256
                * 1024
                * 1024
        )
        .build();
}

#[derive(Debug)]
struct MyError(String);

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for MyError {}

pub fn api_config_v1(cfg: &mut web::ServiceConfig) {
    cfg.service(web::scope("/v1").service(api_config_v1_members()));
}

pub fn api_config_v1_members() -> Scope {
    web::scope("/test").service(
        web::scope("/moka")
            .service(leak)
    )
}

#[actix_web::main]
async fn main() -> AsyncResult<()> {
    let app = HttpServer::new(move || {
        App::new()
            .configure(api_config_v1)
    }).bind("127.0.0.1:8080")?;

    app.run().await?;

    Ok(())
}

#[get("/leak")]
pub async fn leak() -> HttpResponse {
    let cache_item = TestStruct {
        name: "Some name".to_owned()
    };
    let id = Uuid::new_v4();
    set_data::<TestStruct>(&id, &cache_item).await;
    get_data::<TestStruct>(&id).await;

    let response = TestResponse {
        complete: true
    };

    HttpResponse::Ok()
        .content_type("application/json")
        .body(serde_json::to_string(&response).unwrap())
}

#[derive(Eq, PartialEq, Clone, Debug, Serialize, Deserialize)]
pub struct TestStruct {
    name: String,
}

#[derive(Serialize)]
pub struct TestResponse {
    complete: bool,
}

pub async fn set_data<T>(key: &Uuid, cache_data: &T) -> AsyncResult<bool>
    where
        T: Send + Sync + Serialize,
{
    let config = Configuration::standard();
    let cache = MOKA_CACHE.clone();
    let encoded = bincode::serde::encode_to_vec(&cache_data, config)?;
    cache.insert(key.clone(), encoded).await;
    Ok(true)
}

pub async fn get_data<T>(key: &Uuid) -> AsyncResult<T>
    where
        T: Send + Sync + DeserializeOwned,
{
    let config = Configuration::standard();
    let cache = MOKA_CACHE.clone();
    let cache_item = match cache.get(key) {
        None => {
            return Err(Box::new(MyError("Cache item not found".into())));
        }
        Some(item) => {
            item
        }
    };

    let (decoded, _len): (T, usize) =
        bincode::serde::decode_from_slice(&cache_item[..], config)?;

    Ok(decoded)
}

and Cargo.toml:

[package]
name = "moka_memory_leak_example"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
actix-web = {version = "4.0.0-beta.20", features=["rustls"]}

[dependencies.moka]
version = "0.7.1"
features = ["future"]

[dependencies.lazy_static]
version = "1.4.0"

[dependencies.bincode]
features = ["serde"]
git = "https://github.com/bincode-org/bincode"

[dependencies.serde]
version = "1"
features = ["derive", "serde_derive"]

[dependencies.serde_json]
version = "1"

[dependencies.uuid]
version = "0.8"
features = ["serde", "v4", "v5"]

[dependencies.tokio]
version = "1.15.0"
features = ["full"]

and for generating the load, you can use wrk:

wrk -d 100 -t 8 -c 100 http://localhost:8080/v1/test/moka/leak

On my machine this results in 2.53GB memory utilization.

tatsuya6502 self-assigned this Jan 18, 2022
@tatsuya6502 (Member)

Hi. Thank you for reporting the issue with the test program. I ran the program and generated loads by running wrk for 30 minutes. I confirmed the issue and will investigate further.

Here is what I have done so far:

Test 1:

Ran wrk for 30 minutes and recorded the memory consumption of the program by running the top command every 15 seconds. Found that the RES value stopped growing after it reached 12.5GiB.

Result 1:
$ head -7 top.log
top - 19:36:57 up 1 day, 12:06,  2 users,  load average: 0.81, 0.45, 0.38
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.6 us,  0.0 sy,  0.0 ni, 98.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  32003.3 total,  25387.5 free,   5327.9 used,   1288.0 buff/cache
MiB Swap:  16384.0 total,  16384.0 free,      0.0 used.  26103.0 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
$ rg -N gh top.log
1127142 tatsuya   20   0  210788   2992   2484 S   0.0   0.0   0:00.22 gh72
1127142 tatsuya   20   0  210788   2992   2484 S   0.0   0.0   0:00.22 gh72
1127142 tatsuya   20   0  489784 128980   2824 S  45.7   0.4   0:07.09 gh72
1127142 tatsuya   20   0 1012516 667972   2824 S 202.6   2.0   0:37.50 gh72
1127142 tatsuya   20   0 1400896   1.1g   2824 S 203.1   3.6   1:08.00 gh72
1127142 tatsuya   20   0 2189380   1.7g   2824 S 203.0   5.4   1:38.47 gh72
1127142 tatsuya   20   0 2582596   2.2g   2824 S 203.2   7.0   2:08.99 gh72
1127142 tatsuya   20   0 3041348   2.6g   2824 S 203.4   8.5   2:39.52 gh72
1127142 tatsuya   20   0 3631164   3.1g   2824 S 203.1  10.0   3:10.03 gh72
1127142 tatsuya   20   0 4024380   3.6g   2824 S 203.4  11.5   3:40.56 gh72
1127142 tatsuya   20   0 4483132   4.1g   2824 S 203.2  13.1   4:11.08 gh72
1127142 tatsuya   20   0 5335104   4.8g   2824 S 205.4  15.4   4:41.91 gh72
1127142 tatsuya   20   0 5793856   5.3g   2824 S 205.7  16.9   5:12.81 gh72
1127142 tatsuya   20   0 6522948   5.9g   2824 S 205.0  18.9   5:43.58 gh72
1127142 tatsuya   20   0 7571524   6.8g   2824 S 207.9  21.9   6:14.81 gh72
1127142 tatsuya   20   0 7964740   7.3g   2824 S 206.0  23.5   6:45.73 gh72
1127142 tatsuya   20   0 8620100   7.8g   2824 S 205.9  25.0   7:16.65 gh72
1127142 tatsuya   20   0 9078852   8.3g   2824 S 205.7  26.6   7:47.53 gh72
1127142 tatsuya   20   0 9472068   8.8g   2824 S 206.0  28.1   8:18.47 gh72
1127142 tatsuya   20   0 9826.1m   9.3g   2824 S 205.7  29.6   8:49.35 gh72
1127142 tatsuya   20   0    9.9g   9.6g   2828 S 210.3  30.6   9:20.94 gh72
1127142 tatsuya   20   0   10.0g   9.6g   2828 S 216.0  30.7   9:53.36 gh72
1127142 tatsuya   20   0   10.0g   9.7g   2828 S 216.2  31.0  10:25.84 gh72
1127142 tatsuya   20   0   10.2g   9.8g   2828 S 215.7  31.3  10:58.21 gh72
1127142 tatsuya   20   0   10.2g   9.9g   2828 S 215.6  31.6  11:30.59 gh72
1127142 tatsuya   20   0   10.3g  10.0g   2828 S 216.9  31.9  12:03.14 gh72
1127142 tatsuya   20   0   10.4g  10.1g   2828 S 216.6  32.2  12:35.68 gh72
1127142 tatsuya   20   0   10.5g  10.2g   2828 S 216.5  32.6  13:08.18 gh72
1127142 tatsuya   20   0   10.7g  10.3g   2828 S 216.6  33.0  13:40.72 gh72
1127142 tatsuya   20   0   10.8g  10.4g   2828 S 216.0  33.4  14:13.14 gh72
1127142 tatsuya   20   0   10.9g  10.6g   2828 S 216.1  33.8  14:45.60 gh72
1127142 tatsuya   20   0   11.0g  10.7g   2828 S 217.4  34.2  15:18.25 gh72
1127142 tatsuya   20   0   11.2g  10.8g   2828 S 216.7  34.6  15:50.78 gh72
1127142 tatsuya   20   0   11.3g  10.9g   2828 S 216.7  34.9  16:23.33 gh72
1127142 tatsuya   20   0   11.4g  11.0g   2828 S 217.1  35.3  16:55.92 gh72
1127142 tatsuya   20   0   11.5g  11.2g   2828 S 217.5  35.7  17:28.59 gh72
1127142 tatsuya   20   0   11.7g  11.3g   2828 S 217.6  36.1  18:01.25 gh72
1127142 tatsuya   20   0   11.7g  11.4g   2828 S 217.4  36.4  18:33.91 gh72
1127142 tatsuya   20   0   11.8g  11.5g   2828 S 217.7  36.8  19:06.59 gh72
1127142 tatsuya   20   0   12.0g  11.6g   2828 S 218.8  37.2  19:39.46 gh72
1127142 tatsuya   20   0   12.0g  11.7g   2828 S 219.8  37.5  20:12.45 gh72
1127142 tatsuya   20   0   12.2g  11.8g   2828 S 220.9  37.8  20:45.63 gh72
1127142 tatsuya   20   0   12.3g  11.9g   2828 S 224.1  38.1  21:19.26 gh72
1127142 tatsuya   20   0   12.3g  12.0g   2828 S 235.0  38.4  21:54.55 gh72
1127142 tatsuya   20   0   12.4g  12.1g   2828 S 241.2  38.6  22:30.75 gh72
1127142 tatsuya   20   0   12.5g  12.1g   2828 S 246.9  38.8  23:07.84 gh72
1127142 tatsuya   20   0   12.5g  12.2g   2828 S 253.7  39.0  23:45.92 gh72
1127142 tatsuya   20   0   12.6g  12.2g   2828 S 260.4  39.2  24:25.03 gh72
1127142 tatsuya   20   0   12.7g  12.3g   2828 S 267.0  39.3  25:05.10 gh72
1127142 tatsuya   20   0   12.7g  12.3g   2828 S 271.1  39.4  25:45.82 gh72
1127142 tatsuya   20   0   12.7g  12.3g   2828 S 275.0  39.4  26:27.10 gh72
1127142 tatsuya   20   0   12.7g  12.3g   2828 S 277.5  39.5  27:08.78 gh72
1127142 tatsuya   20   0   12.7g  12.4g   2828 S 279.3  39.5  27:50.71 gh72
1127142 tatsuya   20   0   12.7g  12.4g   2828 S 280.7  39.6  28:32.87 gh72
1127142 tatsuya   20   0   12.7g  12.4g   2828 S 282.1  39.6  29:15.22 gh72
1127142 tatsuya   20   0   12.7g  12.4g   2828 S 282.6  39.6  29:57.66 gh72
1127142 tatsuya   20   0   12.7g  12.4g   2828 S 283.5  39.6  30:40.22 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 283.8  39.7  31:22.84 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 284.3  39.7  32:05.52 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 284.4  39.7  32:48.23 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 285.0  39.7  33:31.01 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 285.1  39.7  34:13.83 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 285.5  39.7  34:56.69 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 285.7  39.7  35:39.60 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 286.5  39.8  36:22.60 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 285.7  39.8  37:05.51 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 286.5  39.8  37:48.52 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 285.5  39.8  38:31.40 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 286.7  39.8  39:14.44 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 284.5  39.8  39:57.17 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 286.2  39.8  40:40.13 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 287.3  39.8  41:23.28 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 287.1  39.8  42:06.38 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 286.8  39.8  42:49.46 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 286.7  39.8  43:32.50 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 286.4  39.8  44:15.51 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 287.3  39.8  44:58.63 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 287.0  39.8  45:41.74 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 286.9  39.8  46:24.80 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 286.8  39.8  47:07.88 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 286.9  39.8  47:50.95 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 287.7  39.8  48:34.17 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 287.3  39.8  49:17.30 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 286.1  39.8  50:00.27 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 287.2  39.8  50:43.38 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 288.1  39.8  51:26.65 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 286.0  39.8  52:09.58 gh72
1127142 tatsuya   20   0   12.8g  12.4g   2828 S 287.7  39.8  52:52.79 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.1  39.8  53:35.89 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 286.6  39.8  54:18.94 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.9  39.8  55:02.15 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 284.8  39.8  55:44.92 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.0  39.8  56:28.00 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.9  39.8  57:11.24 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.1  39.8  57:54.33 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 286.0  39.8  58:37.28 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.5  39.8  59:20.44 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 286.8  39.9  60:03.51 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.5  39.9  60:46.66 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 286.1  39.9  61:29.63 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 288.5  39.9  62:12.94 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.1  39.9  62:56.06 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 284.4  39.9  63:38.75 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 288.1  39.9  64:22.03 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 288.7  39.9  65:05.37 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 283.8  39.9  65:48.00 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.8  39.9  66:31.20 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.2  39.9  67:14.34 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 288.1  39.9  67:57.58 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 286.3  39.9  68:40.58 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.5  39.9  69:23.74 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 286.4  39.9  70:06.76 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 288.7  39.9  70:50.09 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 285.8  39.9  71:33.02 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 291.0  39.9  72:16.70 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 282.4  39.9  72:59.12 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.5  39.9  73:42.28 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 286.0  39.9  74:25.23 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.9  39.9  75:08.45 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.9  39.9  75:51.69 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 288.0  39.9  76:34.92 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 287.9  39.9  77:18.16 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S 208.6  39.9  77:49.47 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S   0.0  39.9  77:49.47 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S   0.1  39.9  77:49.49 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S   0.1  39.9  77:49.50 gh72
1127142 tatsuya   20   0   12.8g  12.5g   2828 S   0.0  39.9  77:49.50 gh72

Test 2:

Ran wrk again for 30 minutes, but this time changed the cache_item from this:

    let cache_item = TestStruct {
        // name.len() is 9 bytes.
        name: "Some name".to_owned()
    };

to this:

    let cache_item = TestStruct {
        // name.len() is now 90 bytes.
        name: String::from_utf8_lossy(&[b'a'; 90]).into_owned()
    };

The RES value (maybe) stopped growing after it reached 5.7GiB.

Result 2:
$ head -7 top2.log
top - 20:50:15 up 1 day, 13:20,  2 users,  load average: 0.35, 0.26, 0.38
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  1.7 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  32003.3 total,  25387.0 free,   5329.9 used,   1286.4 buff/cache
MiB Swap:  16384.0 total,  16384.0 free,      0.0 used.  26102.7 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
$ rg -N gh top2.log
1139173 tatsuya   20   0  210792   3036   2532 S   0.0   0.0   0:00.19 gh72
1139173 tatsuya   20   0  210792   3036   2532 S   0.0   0.0   0:00.19 gh72
1139173 tatsuya   20   0  614260 175548   2960 S  54.2   0.5   0:08.32 gh72
1139173 tatsuya   20   0 1173884 845576   2960 S 203.2   2.6   0:38.84 gh72
1139173 tatsuya   20   0 1596484   1.3g   2964 S 208.7   4.1   1:10.16 gh72
1139173 tatsuya   20   0 1727556   1.4g   2964 S 214.4   4.6   1:42.37 gh72
1139173 tatsuya   20   0 1858628   1.5g   2964 S 215.9   4.9   2:14.77 gh72
1139173 tatsuya   20   0 1989700   1.6g   2964 S 219.2   5.3   2:47.70 gh72
1139173 tatsuya   20   0 1989700   1.7g   2964 S 262.8   5.4   3:27.15 gh72
1139173 tatsuya   20   0 1989700   1.7g   2964 S 279.1   5.5   4:09.07 gh72
1139173 tatsuya   20   0 1989700   1.7g   2964 S 281.4   5.5   4:51.31 gh72
1139173 tatsuya   20   0 1989700   1.7g   2964 S 280.4   5.5   5:33.43 gh72
1139173 tatsuya   20   0 1995856   1.7g   2964 S 282.1   5.5   6:15.78 gh72
1139173 tatsuya   20   0 2008168   1.8g   2964 S 280.2   5.6   6:57.87 gh72
1139173 tatsuya   20   0 2161812   1.8g   2964 S 281.1   5.8   7:40.06 gh72
1139173 tatsuya   20   0 2183356   1.8g   2964 S 278.6   5.9   8:21.90 gh72
1139173 tatsuya   20   0 2210032   1.9g   2964 S 279.3   6.1   9:03.82 gh72
1139173 tatsuya   20   0 2314556   2.0g   2964 S 274.0   6.3   9:44.97 gh72
1139173 tatsuya   20   0 2249020   1.9g   2964 S 217.5   6.0  10:17.61 gh72
1139173 tatsuya   20   0 2249020   1.9g   2964 S 215.9   6.1  10:50.04 gh72
1139173 tatsuya   20   0 2380092   2.1g   2964 S 216.0   6.7  11:22.46 gh72
1139173 tatsuya   20   0 2642236   2.3g   2964 S 215.0   7.2  11:54.75 gh72
1139173 tatsuya   20   0 2773308   2.4g   2964 S 216.1   7.8  12:27.18 gh72
1139173 tatsuya   20   0 3035452   2.6g   2964 S 216.8   8.4  12:59.74 gh72
1139173 tatsuya   20   0 3166524   2.8g   2964 S 219.5   8.9  13:32.69 gh72
1139173 tatsuya   20   0 3297596   2.9g   2964 S 225.4   9.4  14:06.54 gh72
1139173 tatsuya   20   0 3428668   3.1g   2964 S 241.8   9.8  14:42.83 gh72
1139173 tatsuya   20   0 3494204   3.1g   2964 S 264.0  10.0  15:22.48 gh72
1139173 tatsuya   20   0 3559740   3.1g   2964 S 279.1  10.1  16:04.38 gh72
1139173 tatsuya   20   0 3559740   3.2g   2964 S 283.8  10.1  16:47.00 gh72
1139173 tatsuya   20   0 3559740   3.2g   2964 S 285.3  10.1  17:29.83 gh72
1139173 tatsuya   20   0 3559740   3.2g   2964 S 285.4  10.1  18:12.69 gh72
1139173 tatsuya   20   0 3559740   3.2g   2964 S 286.0  10.1  18:55.62 gh72
1139173 tatsuya   20   0 3559740   3.2g   2964 S 286.9  10.2  19:38.71 gh72
1139173 tatsuya   20   0 3559740   3.2g   2964 S 286.2  10.2  20:21.67 gh72
1139173 tatsuya   20   0 3559740   3.2g   2964 S 287.1  10.2  21:04.79 gh72
1139173 tatsuya   20   0 3559740   3.2g   2964 S 286.7  10.2  21:47.82 gh72
1139173 tatsuya   20   0 3563840   3.2g   2964 S 286.3  10.2  22:30.82 gh72
1139173 tatsuya   20   0 3572040   3.2g   2964 S 285.5  10.3  23:13.68 gh72
1139173 tatsuya   20   0 3569988   3.2g   2964 S 287.3  10.3  23:56.83 gh72
1139173 tatsuya   20   0 3582288   3.2g   2964 S 285.3  10.4  24:39.65 gh72
1139173 tatsuya   20   0 3588436   3.3g   2964 S 286.1  10.4  25:22.62 gh72
1139173 tatsuya   20   0 3662172   3.3g   2964 S 285.9  10.5  26:05.54 gh72
1139173 tatsuya   20   0 3662172   3.3g   2964 S 283.4  10.5  26:48.11 gh72
1139173 tatsuya   20   0 3731808   3.3g   2964 S 284.3  10.5  27:30.78 gh72
1139173 tatsuya   20   0 3740008   3.3g   2964 S 285.4  10.6  28:13.65 gh72
1139173 tatsuya   20   0 3752308   3.3g   2964 S 285.4  10.7  28:56.49 gh72
1139173 tatsuya   20   0 3760508   3.3g   2964 S 286.0  10.7  29:39.44 gh72
1139173 tatsuya   20   0 3760508   3.4g   2964 S 285.2  10.7  30:22.25 gh72
1139173 tatsuya   20   0 3781004   3.4g   2964 S 286.0  10.9  31:05.21 gh72
1139173 tatsuya   20   0 3785104   3.4g   2964 S 286.3  10.9  31:48.18 gh72
1139173 tatsuya   20   0 3793304   3.4g   2964 S 285.0  11.0  32:30.99 gh72
1139173 tatsuya   20   0 3864988   3.5g   2964 S 285.0  11.1  33:13.77 gh72
1139173 tatsuya   20   0 3881388   3.5g   2964 S 284.7  11.2  33:56.53 gh72
1139173 tatsuya   20   0 3967424   3.6g   2964 S 284.4  11.4  34:39.22 gh72
1139173 tatsuya   20   0 3981772   3.6g   2964 S 286.8  11.5  35:22.30 gh72
1139173 tatsuya   20   0 3987920   3.6g   2964 S 284.7  11.5  36:05.04 gh72
1139173 tatsuya   20   0 4008420   3.7g   2964 S 284.3  11.7  36:47.74 gh72
1139173 tatsuya   20   0 4088304   3.7g   2964 S 285.3  11.8  37:30.57 gh72
1139173 tatsuya   20   0 4164088   3.7g   2964 S 287.2  11.9  38:13.70 gh72
1139173 tatsuya   20   0 4180484   3.8g   2964 S 283.7  12.1  38:56.29 gh72
1139173 tatsuya   20   0 4131224   3.8g   2964 S 258.5  12.0  39:35.12 gh72
1139173 tatsuya   20   0 3823936   3.4g   2964 S 218.2  10.8  40:07.87 gh72
1139173 tatsuya   20   0 3823936   3.4g   2964 S 216.5  10.8  40:40.39 gh72
1139173 tatsuya   20   0 3823936   3.4g   2964 S 215.0  10.8  41:12.66 gh72
1139173 tatsuya   20   0 3823936   3.4g   2964 S 215.0  10.8  41:44.96 gh72
1139173 tatsuya   20   0 3823936   3.4g   2964 S 214.6  10.8  42:17.20 gh72
1139173 tatsuya   20   0 3823936   3.4g   2964 S 215.0  10.8  42:49.47 gh72
1139173 tatsuya   20   0 3823936   3.4g   2964 S 214.8  10.8  43:21.74 gh72
1139173 tatsuya   20   0 3955008   3.5g   2964 S 216.5  11.4  43:54.24 gh72
1139173 tatsuya   20   0 4086080   3.7g   2964 S 215.1  11.9  44:26.55 gh72
1139173 tatsuya   20   0 4348224   3.9g   2964 S 215.8  12.5  44:58.94 gh72
1139173 tatsuya   20   0 4479296   4.1g   2964 S 216.0  13.1  45:31.38 gh72
1139173 tatsuya   20   0 4741440   4.3g   2964 S 215.9  13.6  46:03.78 gh72
1139173 tatsuya   20   0 4872512   4.4g   2964 S 216.6  14.2  46:36.32 gh72
1139173 tatsuya   20   0 5003584   4.6g   2964 S 218.1  14.7  47:09.06 gh72
1139173 tatsuya   20   0 5265728   4.8g   2964 S 219.5  15.3  47:42.03 gh72
1139173 tatsuya   20   0 5396800   4.9g   2964 S 221.7  15.8  48:15.30 gh72
1139173 tatsuya   20   0 5527872   5.1g   2964 S 223.4  16.3  48:48.86 gh72
1139173 tatsuya   20   0 5658944   5.2g   2964 S 229.4  16.7  49:23.29 gh72
1139173 tatsuya   20   0 5790016   5.3g   2964 S 236.4  17.0  49:58.79 gh72
1139173 tatsuya   20   0 5921088   5.4g   2964 S 249.4  17.3  50:36.23 gh72
1139173 tatsuya   20   0 5921088   5.5g   2964 S 261.9  17.5  51:15.56 gh72
1139173 tatsuya   20   0 6052160   5.5g   2964 S 271.1  17.6  51:56.28 gh72
1139173 tatsuya   20   0 6052160   5.5g   2964 S 277.3  17.7  52:37.90 gh72
1139173 tatsuya   20   0 6052160   5.6g   2964 S 280.6  17.8  53:20.04 gh72
1139173 tatsuya   20   0 6052160   5.6g   2964 S 282.0  17.8  54:02.37 gh72
1139173 tatsuya   20   0 6052160   5.6g   2964 S 282.0  17.9  54:44.72 gh72
1139173 tatsuya   20   0 6052160   5.6g   2964 S 283.7  17.9  55:27.30 gh72
1139173 tatsuya   20   0 6052160   5.6g   2964 S 283.4  17.9  56:09.86 gh72
1139173 tatsuya   20   0 6052160   5.6g   2964 S 285.2  17.9  56:52.67 gh72
1139173 tatsuya   20   0 6052160   5.6g   2964 S 287.0  17.9  57:35.77 gh72
1139173 tatsuya   20   0 6052160   5.6g   2964 S 286.1  17.9  58:18.72 gh72
1139173 tatsuya   20   0 6052160   5.6g   2964 S 285.3  17.9  59:01.57 gh72
1139173 tatsuya   20   0 6052160   5.6g   2964 S 285.4  17.9  59:44.41 gh72
1139173 tatsuya   20   0 6052160   5.6g   2964 S 285.1  18.0  60:27.23 gh72
1139173 tatsuya   20   0 6052160   5.6g   2964 S 288.0  18.0  61:10.46 gh72
1139173 tatsuya   20   0 6117696   5.6g   2964 S 288.3  18.0  61:53.77 gh72
1139173 tatsuya   20   0 6117696   5.6g   2964 S 284.8  18.0  62:36.52 gh72
1139173 tatsuya   20   0 6117696   5.6g   2964 S 287.4  18.0  63:19.68 gh72
1139173 tatsuya   20   0 6117696   5.6g   2964 S 287.2  18.0  64:02.79 gh72
1139173 tatsuya   20   0 6117696   5.6g   2964 S 286.8  18.0  64:45.87 gh72
1139173 tatsuya   20   0 6117696   5.6g   2964 S 288.6  18.0  65:29.19 gh72
1139173 tatsuya   20   0 6117696   5.6g   2964 S 284.6  18.0  66:11.93 gh72
1139173 tatsuya   20   0 6117696   5.6g   2964 S 288.8  18.0  66:55.28 gh72
1139173 tatsuya   20   0 6191428   5.6g   2964 S 288.0  18.1  67:38.54 gh72
1139173 tatsuya   20   0 6191428   5.6g   2964 S 284.0  18.1  68:21.17 gh72
1139173 tatsuya   20   0 6191428   5.6g   2964 S 288.7  18.1  69:04.54 gh72
1139173 tatsuya   20   0 6191428   5.6g   2964 S 288.7  18.1  69:47.87 gh72
1139173 tatsuya   20   0 6199624   5.7g   2964 S 286.6  18.1  70:30.92 gh72
1139173 tatsuya   20   0 6195524   5.7g   2964 S 287.3  18.1  71:14.04 gh72
1139173 tatsuya   20   0 6195524   5.7g   2964 S 284.6  18.1  71:56.78 gh72
1139173 tatsuya   20   0 6195524   5.7g   2964 S 282.2  18.1  72:39.14 gh72
1139173 tatsuya   20   0 6195524   5.7g   2964 S 290.3  18.1  73:22.75 gh72
1139173 tatsuya   20   0 6195524   5.7g   2964 S 286.3  18.1  74:05.72 gh72
1139173 tatsuya   20   0 6203720   5.7g   2964 S 283.8  18.2  74:48.35 gh72
1139173 tatsuya   20   0 6203720   5.7g   2964 S 289.5  18.2  75:31.81 gh72
1139173 tatsuya   20   0 6203720   5.7g   2964 S 285.9  18.2  76:14.75 gh72
1139173 tatsuya   20   0 6199620   5.7g   2964 S 282.0  18.2  76:57.08 gh72
1139173 tatsuya   20   0 6207816   5.7g   2964 S 283.9  18.3  77:39.72 gh72
1139173 tatsuya   20   0 6216012   5.7g   2964 S 286.1  18.3  78:22.66 gh72
1139173 tatsuya   20   0 6216012   5.7g   2964 S 287.8  18.3  79:05.89 gh72
1139173 tatsuya   20   0 6216012   5.7g   2964 S 194.2  18.3  79:35.04 gh72
1139173 tatsuya   20   0 6216012   5.7g   2964 S   0.1  18.3  79:35.05 gh72
1139173 tatsuya   20   0 6216012   5.7g   2964 S   0.1  18.3  79:35.06 gh72

Note

Please note that the cache stores both key and value in its internal concurrent hash map, so it is expected that the cache will consume more memory than 256MiB in your program (but not 12.5GiB).

The weigher returns value.len(), which is 10 bytes, so a max_capacity of 256MiB means the cache will store up to ~27 million entries.

$ julia -q

# value.len() is 10 bytes.
julia> num_entries = 256 * 1024^2 / 10
2.68435456e7

julia> using Printf

julia> @printf("%.2f", num_entries)
26843545.60

However, each entry will be 50 bytes (or larger).

# key (UUID):
#    16 bytes
# value (Vec<u8>):
#    data: 10 bytes (or more for pre-allocated extra capacity)
#    ptr:   8 bytes 
#    cap:   8 bytes (usize)
#    len:   8 bytes (usize)

julia> entry_size = 16 + 10 + 8 * 3
50

So ~27 million entries will consume 1.25GiB or more.

julia> entry_size * num_entries / 1024^3
1.25
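
If your goal is to bound actual memory use rather than payload bytes, one option is to make the weigher charge each entry for an estimate of its full footprint. This is only a sketch based on the arithmetic above; the 16-byte key and 24-byte Vec header constants are my estimates, not values Moka reports:

use moka::future::Cache;
use uuid::Uuid;

// Assumed per-entry constants from the arithmetic above (estimates,
// not Moka APIs): a Uuid key is a 16-byte value, and a Vec<u8> has a
// 24-byte header (ptr + cap + len) on a 64-bit target.
const KEY_SIZE: u32 = 16;
const VEC_HEADER: u32 = 3 * 8;

fn build_cache() -> Cache<Uuid, Vec<u8>> {
    Cache::builder()
        // Weigh each entry by payload plus key and Vec header, so
        // max_capacity tracks the raw data footprint more closely.
        .weigher(|_key: &Uuid, value: &Vec<u8>| -> u32 {
            let payload: u32 = value.len().try_into().unwrap_or(u32::MAX);
            payload.saturating_add(KEY_SIZE + VEC_HEADER)
        })
        .max_capacity(256 * 1024 * 1024)
        .build()
}

This still does not account for the cache's internal per-entry overhead, only for the user-visible data.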

@chriskyndrid (Author)

Hi Tatsuya,

Thank you for the thorough information, I appreciate it, and your work on Moka and in resolving issues like this.

The approximations you list in the latter half of your response are helpful for estimating the total size of an entry. I wasn't thinking about the key size, and didn't do any analysis beyond the Vec<u8> len(). I'll use the information you provided in my planning going forward.

Thanks again! I'll continue to follow the issue.

@tatsuya6502 (Member)

Hi @chriskyndrid,

I made some progress.

To investigate the issue, I modified the source code of Moka and moka-cht (the concurrent hash table) to collect and expose some statistics about the internal data structures. I also modified your code to print out these statistics every 15 seconds.

I published the entire project here:

(If you are curious, you can try running it by following the README of the gh72 repository.)

A sample result

Here is a result of running the gh72 program.

(Image: gh72-result-2022-01-23)

To make the table smaller, I edited the source code and reduced the max capacity of the cache to 8MiB.

I ran the following:

  1. wrk -d 5 -t 8 -c 10 http://localhost:8080/v1/test/moka/leak (5 seconds)
  2. curl http://localhost:8080/v1/test/moka/clear, which invalidates all cached entries (a possible handler is sketched below)
  3. wrk -d 30 -t 8 -c 10 http://localhost:8080/v1/test/moka/leak (30 seconds)
  4. curl http://localhost:8080/v1/test/moka/clear again

For descriptions of the columns, please see the README of the gh72 repository.
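
The clear endpoint itself is not shown in this thread. Here is a minimal sketch of what such a handler might look like (the handler body is my assumption; invalidate_all is the Moka call used here, but the actual code lives in the gh72 repository):

// A sketch of the /clear handler used in step 2 above; the exact code
// in the gh72 repository may differ.
#[get("/clear")]
pub async fn clear() -> HttpResponse {
    // invalidate_all() marks every cached entry as invalid; the entries
    // are then reclaimed by the cache's internal housekeeping.
    MOKA_CACHE.invalidate_all();
    HttpResponse::Ok()
        .content_type("application/json")
        .body("{\"complete\":true}")
}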

Observations

I found the following:

  1. The current internal data structures are not optimal in terms of memory usage. This is causing the excessive memory usage.
    • It will get worse when:
      • The cache is configured to store millions of entries.
      • AND the key has high cardinality (the key has many possible values).
  2. There is no sign of a memory leak. All internal data structures I checked were dropped when possible.

The memory overhead of the internal data structures

For 1., there are some areas to improve. I created a diagram of the internal structures to help us (including myself) understand:

(Image: Moka-mem-overhead-v0 7 1-2022-01-23)

  • The key K (1) and value V (2) are the data that the user wants to store.
  • All other data structures are used to realize the cache's main features with amortized O(1) time complexity.
    • The main structures:
      • A concurrent hash map, which allocates extra capacity to deal with entry insertions.
      • Two queues (double-ended queues), implemented as doubly linked lists.
      • (Not in the diagram) A frequency sketch (population estimator, or LFU filter)
  • The numbers under each box (e.g. +8) indicate the memory overhead, the additional space needed for each entry on a 64-bit platform (in bytes).
    • In GH Issue 72's case, the write order queue is disabled, and the overhead per entry is 192 bytes (a rough total is sketched below).
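
For scale, here is a back-of-envelope calculation of what that overhead adds up to, using the numbers above (192 bytes per entry and the ~27 million entries allowed by a 256MiB max_capacity with a 10-byte weigher result). These are estimates from this comment, not values reported by Moka:

fn main() {
    let overhead_per_entry: u64 = 192; // bytes; write order queue disabled
    let max_entries: u64 = 256 * 1024 * 1024 / 10; // ~26.8 million entries
    let total_bytes = overhead_per_entry * max_entries;
    // Prints "~4.80 GiB" of bookkeeping alone, before the raw data, the
    // hash map's spare capacity, or allocator overhead.
    println!("~{:.2} GiB", total_bytes as f64 / (1u64 << 30) as f64);
}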

Areas to improve

Here is a non-exhaustive list of areas where we could improve:

  • a. Eliminate the weak_counts:
    • The three boxes labeled ArcInner use std::sync::Arc. Each has a weak_count field, which takes 8 bytes, but we never use it.
    • Savings: 8 * 3 * ~27 million entries = 614MiB.
    • We could use triomphe::Arc from the triomphe crate as a drop-in replacement for the std Arc (see the sketch after this list).
  • b. Eliminate the enum tags:
    • Stop using enum for EntryInfo.
    • Savings: 8 * 3 * ~27 million entries = 614MiB.
  • c. Eliminate the region fields:
    • Savings: 8 * 2 * ~27 million entries = ~410MiB.
  • d. Fix a bug where the LFU filter is twice as big as necessary.
    • For tracking ~27 million * 10 entries (~270 million entries), I think the filter should only take 256MiB of memory, but it is actually taking 512MiB.
    • This bug only exists when size-aware eviction is used.
  • e. Enhance moka-cht Hash Map to drop Bucket and Key for a removed entry as soon as possible:
    • moka-cht doc — Implementation details
      • removing an entry from the hash table results in that bucket pointer having a tombstone bit set. Insertions cannot displace a tombstone bucket unless their key compares equal, so once an entry is inserted into the hash table, the specific index it is assigned to will only ever hold entries whose keys compare equal. Without this restriction, resizing operations could result in the old and new bucket arrays being temporarily inconsistent.

      • ... Tombstone bucket pointers are typically not copied into new bucket arrays.

    • You can confirm this behavior by running the moka-gh72 program. (See a sample result above)
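
As a small illustration of item a., triomphe::Arc can be used much like std::sync::Arc; its heap header stores only a strong count, saving the 8-byte weak count that the std Arc always allocates. (A sketch; Moka's actual integration touches more code than this.)

use triomphe::Arc;

fn main() {
    // Same construction, clone, and deref ergonomics as std::sync::Arc,
    // but without weak-reference support (and without its 8-byte cost).
    let entry: Arc<Vec<u8>> = Arc::new(vec![0u8; 10]);
    let shared = entry.clone();
    assert_eq!(shared.len(), 10);
}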

Conclusion

By modifying the test program to track statistics on the internal data structures, I found some areas to improve.

I will create a plan for improving these areas. a. to d. look straightforward; e. will need some thinking.

@chriskyndrid (Author)

chriskyndrid commented Jan 23, 2022

Tatsuya,

This is an excellent synopsis of your investigation. I'm glad the issue turned out not to be a memory leak. It's clear you have a thorough understanding of your design and the intention behind it. It's also clear you've identified some concrete areas to reduce excessive memory utilization, and I'll look forward to monitoring progress on those efforts. Again, thank you for your work on Moka and your focus on improvement.

The primary motivation I have for using Moka is to offload common requests that require somewhat expensive network I/O but don't change particularly frequently. Because of the architecture of the system I'm designing, memory is somewhat constrained on various service worker nodes, so controlling the maximal cache memory utilization is important so as not to negatively impact the efficacy of the overall worker and its other tasks. The workers do not have multiple GB of memory to allocate for caching, so landing on an appropriate functional max_capacity is important to me. I'm really less concerned about entry counts, and more concerned about memory footprint.

At any rate, thanks again, and let me know when you have made some of these changes. I'll run a bunch of tests in our codebase.

Chris

tatsuya6502 added the bug and enhancement labels Feb 1, 2022
tatsuya6502 added this to the v0.7.2 milestone Feb 1, 2022
tatsuya6502 added a commit that referenced this issue Feb 1, 2022
- Perform GC-only rehashing to remove tombstones.
- Shrink capacity when possible.

Relates to #72.
tatsuya6502 added a commit that referenced this issue Feb 1, 2022
…tilization

- Replace std `Arc` with `triomphe::Arc` for `Arc<ValueEntry>`, `Arc<EntryInfo>` and `Arc<value_initializer::Waiter>`.
- Add `triomphe` to the dependencies.

Relates to #72.
@tatsuya6502 (Member)

Hi @chriskyndrid,

OK. I pushed some changes for the following items to the master branch:

  • a. Eliminated the weak_counts.
  • b. Eliminated the enum tags.
  • d. Fixed a bug where the LFU filter was twice as big as necessary.
  • e. Enhanced moka-cht hash map to drop Bucket and Key for a removed entry as soon as possible.

I ran the gh72 program against two different versions of Moka and recorded the memory utilization.

(Image: moka-gh72-mem-2022-0202)

Moka versions:

  • Blue line: Moka v0.7.1
  • Green line: Moka master branch (v0.7.2-devel)

Axes:

  • X-axis: Time in seconds.
  • Y-axis: Allocated memory for the application (in GiB).

Areas/Points:

  • Area A (10 minutes): Continued inserting entries with non-duplicate keys, using the following command:
    • wrk -d 600 -t 8 -c 100 http://localhost:8080/v1/test/moka/leak
  • Point B: Triggered invalidate_all, using the following command:
    • curl http://localhost:8080/v1/test/moka/clear
  • Area C (10 minutes): Continued inserting entries again.
  • Point D: Triggered invalidate_all again.

Notes:

  • I changed the max capacity from 256MiB to 64MiB, so the cache could store at most 6,710,886 entries.
  • The raw data for each entry was 50 bytes.
    • 50 bytes * 6,710,886 entries ~= 320MiB.

Results:

  • Moka v0.7.1 (blue) kept most of the Buckets and Keys for removed entries, so memory utilization continued growing as they built up.
  • Moka v0.7.2-devel (green)
    • was able to drop Buckets and Keys for removed entries within half a minute, so memory utilization was kept to a minimum.
    • had ~24% smaller per-entry memory overhead than v0.7.1:
      • 1.51GiB vs 1.88GiB, both including the 320MiB of raw data => 1 - (1.51 - 0.31) / (1.88 - 0.31) => 23.57%

I hope you are happy with v0.7.2-devel's result.

You can try v0.7.2-devel by putting the following in your Cargo.toml:

[dependencies]
moka = { git = "https://github.com/moka-rs/moka", branch = "master", features = ["future"] }

I will start to prepare for the v0.7.2 release. It will take a few more days. (v0.7.2 milestone)

@tatsuya6502 (Member)

Released v0.7.2 with these improvements. Also updated the diagram of the internal data structures:

(Image: moka-mem-overhead-v0 7 2-2022-02-06)

@chriskyndrid (Author)

Thank you for your work on this!
