
[Q] Disabling scan-frequency for Carbonserver #400
Open · loitho opened this issue Feb 3, 2021 · 15 comments

@loitho (Contributor) commented Feb 3, 2021

Hi there,

Excuse the possibly naive question, but we're using go-carbon and receiving 650,000 metrics per node per minute on a 4-node cluster (so 2.5 million metrics per minute).
One issue we're facing is that load and IOPS peak tremendously when a scan is triggered to build the index for carbonserver.
There are around 3,750,000 metrics per node.
I applied the following carbonserver configuration:

[carbonserver]
enabled = true
buckets = 10
metrics-as-counters = false
read-timeout = "60s"
write-timeout = "60s"

query-cache-enabled = true
query-cache-size-mb = 0
find-cache-enabled = true

trigram-index = false
scan-frequency = "0m0s"
trie-index = true

As you can see, I set scan-frequency to 0.
And it's working nicely: my servers no longer get choked for 5 minutes (with very high load during that period) trying to read all of the files.
(This behavior was happening even when only the trie index was enabled.)

So I was wondering: is it a problem to run with this configuration? Considering that I still get good performance from my cluster compared to before, is there a reason I should turn this setting back on?

Kind regards,

Thomas

loitho added the question label Feb 3, 2021
@bom-d-van (Member) commented:

Hi @loitho, if scan-frequency is set to 0, no index is built and trie-index = true is a no-op.

It's a trade-off in the current system. Without an index, your queries might become slower because carbonserver falls back to filesystem globbing, but things should continue to work.

How much memory does your server have? If the server has enough memory, the kernel should be able to cache all the filesystem metadata, and you shouldn't have too many IO issues caused by scanning directories.
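A rough way to check whether that metadata is actually staying cached (untested on my side; slabtop comes from procps and needs root to read /proc/slabinfo):

# how much slab memory the dentry/inode caches are currently holding
sudo slabtop -o | grep -E 'dentry|inode'
# the kernel's current reclaim tendency for those caches (default is 100)
cat /proc/sys/vm/vfs_cache_pressure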


Not sure how many people are having issues with filesystem scanning, but now with concurrent and realtime indexing support in trie-index, we should be able to support indexing without scanning.

@loitho (Author) commented Feb 3, 2021

Hi @bom-d-van, thank you for your quick reply! I understand: so basically, carbonserver behaves like a graphite-web instance and looks at the whisper files directly. Which is honestly still a pretty good thing, as it saves me from having to install graphite-web (nginx + gunicorn) on each of my nodes.

if scan-frequency is set to 0, no index is built and trie-index = true is a no-op.

Makes sense. I didn't see any "build index time" on the graphs, so I assumed as much :)

How much memory does your server have? If the server has enough memory, the kernel should be able to cache all the filesystem metadata, and you shouldn't have too many IO issues

Each node has 32 GB of RAM. Is there a way to make sure the kernel keeps the filesystem metadata in cache?
Some info on the system: we're running CentOS 7.9 with the stock kernel (3.10), an XFS filesystem with noatime on the mount point, go-carbon 0.15.5, and the memory optimizations suggested in the go-carbon documentation.
Just to illustrate what happens when the scan is enabled every 15 minutes, this is the load:
[screenshot: load graph]
and this is the IOPS (on a 12K IOPS disk):
[screenshot: IOPS graph]
Of course, the more reads it tries to cram in, the fewer writes get through; and the fewer writes, the higher the load, the longer everything takes, and so on.

But now with concurrent and realtime indexing support in trie-index, we should be able to support indexing without scanning.

That would be awesome!

@bom-d-van (Member) commented:

@loitho can you also share the graphs for memory and disk write metrics? Also, with collectd I think there are merged read/write IOPS as well; can you share those too? Just trying to better understand your system's resource usage.

@bom-d-van (Member) commented Feb 3, 2021

Each node has 32 GB of RAM. Is there a way to make sure the kernel keeps the filesystem metadata in cache?

I haven't tweaked it myself, but you can google it a bit and find the proper kernel tuning parameters. This one might do the job:

https://unix.stackexchange.com/a/76750/22938

vfs_cache_pressure

Controls the tendency of the kernel to reclaim the memory which is used for
caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to
reclaim dentries and inodes at a "fair" rate with respect to pagecache and
swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
never reclaim dentries and inodes due to memory pressure and this can easily
lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.

(@deniszh or @azhiltsov might have better suggestions/knowledge in this area.)
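Untested on my side, but something like this should apply it at runtime (the value is just a starting point, not a recommendation; if it helps, persist it via a file in /etc/sysctl.d):

# prefer keeping dentry/inode caches in memory (default is 100; 0 risks OOM per the docs above)
sudo sysctl -w vm.vfs_cache_pressure=50
# verify the current value
sysctl vm.vfs_cache_pressure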

@loitho (Author) commented Feb 3, 2021

Hi again,
Sure, this is the read/write IOPS graph (read is green, top; write is yellow, bottom):
[screenshot: read/write IOPS graph]
I used host 1, but they all look the same.

For the merged IO:
[screenshots: merged IO graphs]
Combined:
[screenshot: combined IO graph]

I asked the question and then started googling as well (should have done it the other way around) and found the same thread. I'm going to poke around with this option and look for more information.

@loitho (Author) commented Feb 3, 2021

Sorry, I forgot the memory as well:
[screenshot: memory usage graph]
Interestingly, the cached memory shows huge variations every time there is a scan.

@bom-d-van (Member) commented Feb 3, 2021

Hmm, I don't think I understand this memory usage pattern. A lot of memory gets freed and then used as cache.

Can you also share the cache.queueWriteoutTime, persister.updateOperations, and persister.committedPoints metrics from graphite? I want to see if there is any connection.

At the same time, you can also try enabling concurrent-index and realtime-index using the config below. With this config, go-carbon only keeps one copy of the index in memory.

scan-frequency = "5m0s"
trie-index = true
concurrent-index = true
realtime-index = 500000

If the above config helps, you can also increase scan-frequency to 30m or more.

Also, it just occurred to me that 32 GB of RAM is big enough to keep all the dentries and inodes in memory. For 650,000 metrics/files it should only take a few hundred MB (1 GB at most). But I'm just speculating.

@loitho (Author) commented Feb 3, 2021

Thank you for your help.
I think it's due to the variation in the number of updates; they drop because the high number of reads squashes the number of writes:
[screenshot: updateOperations graph]
When the updates get lower, the number of points per update increases, though the number of committed points gets lower:
[screenshot: points-per-update and committedPoints graphs]
And the queue writeout time:
[screenshot: queueWriteoutTime graph]

We actually have around 3,750,000 metrics per node due to lots of machines being autoscaled, etc.
I have found more info here too : https://unix.stackexchange.com/questions/30286/can-i-configure-my-linux-system-for-more-aggressive-file-system-caching
Thank you for your configuration, I'll try it.

You also have "trigram-index" disabled, correct?

@azhiltsov (Member) commented:

Looking at the IOPS/LA graphs, I conclude you are using a spinning disk or array, not SSDs. Am I right?
Do you run only go-carbon on the box, or is there anything else that could interfere (I don't like the memory free/cached/used graph pattern)?
How big is the [cache] max-size?
And how much of the memory is allocated by go-carbon itself?

Normally you shouldn't see page caches being evicted, as go-carbon's performance relies heavily on them.

@loitho (Author) commented Feb 3, 2021

Hi @azhiltsov, thank you for your answer.
No, we're actually running AWS gp3 SSD disks with 16K peak / 12K sustained IOPS. I'm curious, how does the pattern tell you what type of disk we're running?
Only go-carbon and buckyd are running on the box. I stopped buckyd to check, and it's go-carbon that uses most of the memory :)
Max cache size is 10 million.
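In go-carbon.conf terms, that's roughly:

[cache]
max-size = 10000000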
When I stop the carbonserver index scan, all the graphs get much nicer and flatter, which is why I created this thread.
Here is the queue writeout time:
[screenshot: queue writeout time graph]
And committed points:
[screenshot: committed points graph]

PS: I haven't tried the suggestions and configuration above yet, as my day already ended :)

@azhiltsov (Member) commented:

No, we're actually running AWS gp3 SSD disks with 16K peak / 12K sustained IOPS. I'm curious, how does the pattern tell you what type of disk we're running?

This is a very common observation of mine from the past (not related to go-carbon): if the disk is saturated, the LA goes up because your cores are waiting for IO. I might be wrong.

I think you are facing two problems:

  1. Throttling from AWS.
    According to this, you are allowed only up to 16,000 IOPS at 16 KiB per IO, and you are probably doing 512 B or 1 KiB IO operations, which get rounded up to the FS block size.
  2. Lack of memory to keep your caches around.

Since the index is only needed to speed up queries, it's up to you whether to use it or not.
If your queries are fast enough, disable the indexes; this is your first solution.

Need more speed? Get more memory. Start with 64 GB; if that's not enough, bump it further. The whole performance paradigm of go-carbon is built on keeping as much disk activity as possible in the page cache, so you need to make sure your caches stay in memory and are never evicted. This is your second solution.

Extra performance points:
We are running on enterprise-grade SSDs that can do up to 200K IO, and we use 4096-byte blocks on the XFS filesystem (which is the default, but worth checking; see the command sketch below).
These are our mount options (they might cost you extra memory): rw,noatime,nodiratime,attr2,inode64,logbufs=8,logbsize=256k,noquota
And we run a 4.19 kernel, as it was ~20% (I don't remember exactly) faster. But this shouldn't affect memory consumption, so it's probably irrelevant.
Also, we run quite an old go-carbon 0.14 compiled with Go 1.12-something, so it might be that a newer version of either Go or go-carbon does something differently with memory, but I can't tell.
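For what it's worth, checking the XFS block size could look like this (replace the path with wherever your whisper data dir is mounted):

# the "data ... bsize=" value is the filesystem block size
xfs_info /var/lib/graphite/whisper | grep bsize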

@loitho (Author) commented Feb 8, 2021

Hi, it's me again, with some news.
@azhiltsov

This is a very common observation of mine from the past (not related to go-carbon): if the disk is saturated, the LA goes up because your cores are waiting for IO. I might be wrong.

Sorry for the dumb question, but what is "LA"?
Yeah, basically we see that the limiting factor is IOPS. Fun fact: AWS throttles the instance to 12K IOPS sustained, with 16K peaks guaranteed for only half an hour a day.

Now, here is what I tried,

First, I updated to go-carbon 0.15.6 (thank you for the fix and the ARM64 build, it'll serve us in the future!).

Then I set vm.vfs_cache_pressure = 1 on every even-numbered node of our cluster (as opposed to the default of 100), with the following config across the cluster:

scan-frequency = "30m0s"
trie-index = true
concurrent-index = true
realtime-index = 500000

Our machines are m5.2xlarge on AWS with gp3 disks at 16K IOPS / 200 MBps.
Here is the result on file scan time; note that node 1 and node 2 are doing exactly the same thing (receiving the same metrics, etc.):
[screenshot: file scan time graph]
We see that the initial crawl for data is slower, but once it's done the queries are faster (nearly 2x).

Queue writeout time doesn't have any spikes, pretty good!
[screenshot: queue writeout time graph]

Let's check the load:
[screenshot: load graph]
OK, we see that with vfs_cache_pressure at 1, the cluster seems to have smaller load spikes on each scan (after a huge load for the initial scan), but the average load is a tad higher.
What about IOPS?
[screenshot: IOPS graph]
Seems like the higher load is due to more reads. We see the read spike on the machines with the default configuration, but once the spike is gone, there are nearly zero reads.

The memory graph interestingly shows that we are indeed keeping more of the folder and file tree information in memory:
[screenshot: memory graph]

Updates per second also improve greatly: because there is no longer an IO spike, the updates per second don't drop:
[screenshot: updates-per-second graph]

So, is everything perfect?

Well... not really. First of all, after 24 hours of runtime, some of the nodes started showing huge load and reading a lot, for seemingly no reason (maybe the kernel dropped the index from memory, I don't know).

I think that if you have a lot of memory, disks with more sustained IOPS, and probably a newer kernel than the 3.10 on our machines, it might make sense to try the setting.
As one would say, "100% of winners have tried their luck !"

I then reconfigured everything back to default and tried building the index only every 6 hours:
[screenshot: file scan time with a 6-hour scan interval]

Looks pretty good! And it suits my disks better, as they're only meant to have a 30-minute burst period every 24 hours.
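In config terms that's roughly just a longer scan interval, e.g.:

scan-frequency = "6h0m0s"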

Conclusion:

First of all, thank you all again for your help.
vfs_cache_pressure is an interesting setting to play with, definitely check it out.

I had a final question: since realtime-index updates the index, well... in real time, is there any point in running the scan regularly?
Is the scan only there to pick up files that have been deleted from disk? (If so, I could push the scan interval even higher, as my cluster is only cleaned up once a day.)

Kind regards,

@bom-d-van (Member) commented:

That's a nice and detailed report. So most of our reasoning appears to be correct.

after 24 hours of runtime, some of the nodes started showing huge load and reading a lot, for seemingly no reason

Does that coincide with the clean-up on the clusters?

I had a final question: since realtime-index updates the index, well... in real time, is there any point in running the scan regularly?
Is the scan only there to pick up files that have been deleted from disk? (If so, I could push the scan interval even higher, as my cluster is only cleaned up once a day.)

Yes, it's for deletions. Eventually we could add a delete API to go-carbon; together with realtime-index and concurrent-index, we could then stop disk scanning completely.

All the new logic we introduced is incremental and slowly evolving, so the implementation might look odd now: go-carbon started without in-memory indexing, then gained trigram-index, then trie-index, and now concurrent-index and realtime-index.

One last tip, since you prefer to reduce the disk IO caused by indexing: you might also want to try this feature out. file-list-cache caches the disk scan result at the specified filepath. This means that after a restart, go-carbon doesn't re-scan the whole disk immediately but instead reads the whole file list from the cached file.

# Cache file list scan data in the specified path. This option speeds
# up index building after reboot by reading the last scan result from the
# file system instead of scanning the whole data dir, which could take up
# most of the indexing time if it contains a high number of metrics (10
# - 40 million). go-carbon only reads the cached file list once after
# reboot and the cached result is updated after every scan. (EXPERIMENTAL)
file-list-cache = ""
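For example, pointing it at a persistent path (the path below is just an illustration; pick any location go-carbon can write to):

file-list-cache = "/var/lib/carbon/carbonserver-file-list-cache"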

@loitho (Author) commented Feb 8, 2021

Yeah, you were pretty much spot on! (Not that I doubted it, but if anyone stumbles on this thread, they'll have some good information backed by graphs :))

Does that coincide with the clean-up on the clusters?

Sadly no, it didn't; that's why I found it so odd. Nothing very interesting in the logs either.

I think the implementation makes sense, I'm just trying to understand it fully as well as its limitations :)

You might also want to try this feature out. file-list-cache

Ah yes, I read about it and started using it as soon as I switched the cluster to the configuration you proposed. It works flawlessly, it's really awesome!

So the implementation might look odd now

It makes sense from an evolutionary standpoint, as you gradually add functionality. But I think a bit more precision in the documentation would be interesting, e.g. explaining the interaction between "old and new" features, like the fact that when you enable realtime-index you can bump up the scan interval, because the scan's only remaining purpose is then to purge deleted files from the index.

Would you mind if I made a PR to add this information to the documentation?

@bom-d-van (Member) commented:

Would you mind if I made a PR to add this information to the documentation?

Yep, it's a good idea. Thanks in advance! :D
