Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A concurrent version of trie index #334

Merged
merged 10 commits into from Oct 20, 2020
Merged

Conversation

bom-d-van
Copy link
Member

@bom-d-van bom-d-van commented Feb 11, 2020

This is a follow up to #332

With this change, go-carbon no longer requires two versions of trie tree in memory. So memory overhead could be cut down even further in theory. More measurement needed to have some numbers.

This could also make new metrics available in "realtime" without waiting for re-scanning-and-indexing to be finished.

carbon/config.go Outdated Show resolved Hide resolved
@deniszh
Copy link
Member

deniszh commented Jul 8, 2020

Is this PR still relevant, @bom-d-van @azhiltsov ?

@azhiltsov
Copy link
Member

We are currently running two types or indexes in production

  1. Trie + trigram (majority of cases and gives quite significant performance boost on hosts storing 5KK+ metrics)
  2. Pure trigram for clusters badly affected by improper metric namespace design + a lot of queries like foo.{sigma,delta}bar.*.bla.*bla.car

Since this feature is promising to cut the memory footprint and also remove the need in metric scan every 5 minutes or so, I see a lot value in it. However I do not know how much of the time and effort from the side of @bom-d-van will be required to finish it.

@bom-d-van
Copy link
Member Author

@deniszh @azhiltsov Thanks for the input. If there is a chance of adoption. I am happy to pull it off.

Last time when I was testing it, there seemed to be some memory leaks going on. Again, gonna take some time to have updates. I think I was attacking too many issues/features in the go graphite stack at the same time. Sorry if it bothers you. (I was constantly looking for interesting stuff to work on but new ideas keep showing up).

And big thanks for @deniszh for doing all the following up and tough works.

@deniszh
Copy link
Member

deniszh commented Jul 9, 2020

Again, much appreciated to @bom-d-van for pushing this forward, no rush, let's return to this later. Thanks!

* file nodes are no longer a global variable as pruning required a mark value (extra memory costs: o(n))
* pruning childrens on live trie nodes rather than the walking copy (this was causing memory leak as many old nodes are properly pruned)
* rename trieNode.mark to gen
* prune: fix merging logics
* insert: fix incorrect gen syncing for splits
* imporve prune and real test (check both dump and node count after prune)
* fix ci errors
* handle potential index out of bound in all the trie walking funcs
* refactor test logs
@bom-d-van bom-d-van marked this pull request as ready for review October 20, 2020 06:59
@bom-d-van
Copy link
Member Author

bom-d-van commented Oct 20, 2020

There are three new features introduced in this PR, two for trie index: realtime-index and concurrent-index. one for both index: file-list-cache.

edit: comment repost: #374 (comment)

@bom-d-van bom-d-van marked this pull request as draft October 20, 2020 08:05
@bom-d-van bom-d-van self-assigned this Oct 20, 2020
@bom-d-van bom-d-van marked this pull request as ready for review October 20, 2020 08:28
README.md Outdated Show resolved Hide resolved
deploy/go-carbon.conf Outdated Show resolved Hide resolved
@deniszh deniszh removed the WIP label Oct 20, 2020
@deniszh
Copy link
Member

deniszh commented Oct 20, 2020

Thanks, merging this, @bom-d-van !

@deniszh deniszh merged commit 65e955e into go-graphite:master Oct 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants