data-size when memory off? #79

Closed

igorescobar opened this issue May 21, 2020 · 11 comments

Comments

@igorescobar

Hi,

I'm trying to understand how data-size and dict-size work together.

In my setup I have memory off and disk on. Initially, I thought data-size was the maximum amount of data I want to store inside dir, but when I raised it to gigabytes (which is fine if we are talking about disk) it started throwing errors like nuster_1 | [ALERT] 141/085908 (1) : Out of memory when initializing cache.

So why is data-size relevant when memory is off?

@jiangwenyuan
Owner

@igorescobar what are your dict-size and data-size values? Out of memory when initializing cache is unlikely to occur.

See https://github.com/jiangwenyuan/nuster#global-nuster-cachenosql

A memory zone with a size of data-size + dict-size will be created.

Except for temporary data created and destroyed within a request, all cache related data including HTTP response data, keys and overheads are stored in this memory zone

So when memory is off, the response is not stored in memory, but other info like the key and the cache entry are still stored there.
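For illustration, a minimal global section in that spirit (the 100m figures are placeholders, not recommendations):

global
  # A shared memory zone of data-size + dict-size (~200m here) is created
  # at startup. Even when rules use `memory off`, keys and cache entries
  # still live in this zone; only the response bodies go to disk.
  nuster cache on data-size 100m dict-size 100m dir /nuster/cache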

@igorescobar
Author

Oh, so even with memory off we still need to allocate some memory for data-size. Do you know how to calculate this based on the number of cached files?

@jiangwenyuan
Owner

jiangwenyuan commented May 21, 2020

Unless you know the average size of the keys and the length of host/path, etc., it's impossible to calculate exactly.

A request needs at least ~300 bytes: sizeof(nst_dict_entry_t) + sizeof(key) (built from method.scheme.host.uri by default) + some extra overhead.
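As a rough back-of-the-envelope check (avg_key_len and cached_objects below are made-up assumptions, not measurements):

# Sketch of the per-object metadata cost described above.
# Replace avg_key_len and cached_objects with your own numbers.
avg_key_len = 100                     # rough length of method.scheme.host.uri
per_object = 300 + avg_key_len        # dict entry (~300 bytes) + key + overhead
cached_objects = 1_000_000
print(f"~{cached_objects * per_object / 1024 / 1024:.0f} MiB of metadata")  # ~381 MiB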

You can find out how much memory is used by:

https://github.com/jiangwenyuan/nuster#output

# The size of the cache memory store in bytes, approximately equal to dict-size + data-size
store.memory.cache.size:        2098200576
# The size of used memory of the cache memory store
store.memory.cache.used:        1048960

@igorescobar
Author

Is there a way to just let Nuster use whatever it needs, as long as the memory is available?

@igorescobar
Author

I'm asking about how to calculate it because Nginx, for example, has keys_zone=my_cache:10m, and its docs give you this guidance:

keys_zone sets up a shared memory zone for storing the cache keys and metadata such as usage timers. Having a copy of the keys in memory enables NGINX to quickly determine if a request is a HIT or a MISS without having to go to disk, greatly speeding up the check. A 1‑MB zone can store data for about 8,000 keys, so the 10‑MB zone configured in the example can store data for about 80,000 keys.

So it kind of helps you calculate how much memory you might need based on the number of keys in your hash table.

@jiangwenyuan
Owner

nginx keys_zone is something like dict-size.

Using the approximate number of keys multiplied by 8 (normally) as dict-size should be fine.

So a 1MB dict-size gives a hash table length of 1024*1024/8 = 131072 (but you can store more keys than that).

But data-size is different. Only pointers are stored in the dict, while the real data is stored in data.

But if you do want some estimation, how about bufsize * cache_count?
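Putting those two rules of thumb together, a hedged sizing sketch might look like this (expected_keys is an assumption you supply; bufsize is HAProxy's default tune.bufsize):

# Rough sizing using the rules of thumb above; the inputs are assumptions.
expected_keys = 80_000                # number of objects you expect to cache
bufsize = 16 * 1024                   # HAProxy tune.bufsize (16384 by default)

dict_size = expected_keys * 8         # ~8 bytes (one pointer slot) per key
data_size = expected_keys * bufsize   # "bufsize * cache_count" estimate

print(f"dict-size >= {dict_size / 1024 / 1024:.1f} MiB")   # ~0.6 MiB
print(f"data-size ~= {data_size / 1024 / 1024:.0f} MiB")   # ~1250 MiB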

@igorescobar
Author

Do you have any idea why Nuster would stop writing to disk without giving any sort of error?

[Screenshot 2020-05-21 at 11:47:16]

All of a sudden it stopped writing, but everything seems to be fine in the logs:

2020-05-21T11:43:04.112+01:00 [NOTICE] 141/104303 (1) : [nuster][cache] on, dict_size=524288000, data_size=524288000
2020-05-21T11:43:04.112+01:00 [NOTICE] 141/104303 (1) : New worker #1 (7) forked
2020-05-21T11:43:04.111+01:00 <133>May 21 10:43:03 nuster[1]: Proxy fe started.
2020-05-21T11:43:04.111+01:00 <133>May 21 10:43:03 nuster[1]: Proxy be started.
2020-05-21T11:43:03.057+01:00 <133>May 21 10:43:01 nuster[1]: Proxy fe started.
2020-05-21T11:43:03.057+01:00 <133>May 21 10:43:01 nuster[1]: Proxy be started.
2020-05-21T11:43:03.057+01:00 [NOTICE] 141/104301 (1) : New worker #1 (7) forked

@igorescobar
Author

This is my current config:

global
  # maxconn 10000
  nuster cache on data-size 500m dict-size 500m dir /nuster/cache
  log stdout local0 info
  master-worker
  # nuster manager on uri /internal/nuster purge-method PURGEX
  # debug

defaults
  log global
  mode http
  log-format '{"time":{"tr":%Tr,"tq":%Tq,"tw":%Tw,"tc":%Tc,"tt":%Tt},"haproxy":{"retries":%rc,"request":{"method":"%HM","uri":"%[capture.req.uri]","protocol":"%HV","header":{"host":"%[capture.req.hdr(0),json(utf8s)]","xforwardfor":"%[capture.req.hdr(1),json(utf8s)]","referer":"%[capture.req.hdr(2),json(utf8s)]"}},"name":{"server":"%s"},"response":{"status_code":%ST,"header":{"xrequestid":"%[capture.res.hdr(0),json(utf8s)]"}},"bytes":{"uploaded":%U,"read":%B}}}'
  option http-keep-alive

  retries 1
  option redispatch
  timeout http-keep-alive  72s
  timeout client           72s
  timeout connect          72s
  timeout server           72s
frontend fe
  bind *:3001
  default_backend be
backend be
  nuster rule r0 disk on memory off use-stale on wait on key method.scheme.host.path ttl 12d if { res.hdr(Content-Type) -m beg image/ }
  http-response set-header x-cache HIT if { nuster.cache.hit }
  server s1 localhost:3000

/nuster/cache has approximately 9.1 GiB in it.

@igorescobar
Author

It was my mistake. I thought that if I had nuster cache on in the global section I didn't have to mention it again inside the backend section.

Thanks for your support and your input in this @jiangwenyuan

@jiangwenyuan
Owner

Ah, yeah, you need to enable nuster on each backend.
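For future readers, a minimal sketch of what each backend needs (mirroring the config above; the rule name and key are just examples):

backend be
  # the global `nuster cache on ...` only creates the memory zone;
  # each backend still needs its own `nuster rule` to actually cache
  nuster rule r0 disk on memory off key method.scheme.host.path ttl 12d
  server s1 localhost:3000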

@jiangwenyuan
Owner

@igorescobar Turns out that Out of memory when initializing cache. is a misleading warning; it also happens when creating the dir fails. I will fix that message.
