Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trie Index + DFA for supporting globbing/wildcard queries #303

Merged
merged 51 commits into from Nov 4, 2019

Conversation

@bom-d-van
Copy link
Contributor

bom-d-van commented Jul 8, 2019

DFA implementation inspired by rsc: https://swtch.com/~rsc/regexp/regexp1.html

An alternative metric indexing implementation to trigram index. This implementation is built to support 10+ millions metrics per go-carbon instance (which becomes feasible after cwhisper support).

Currently only benchmark against a subset of our production queries, but already shows good potential. 92% of the queries is 1.34 - 3300.71 times faster. And the (1.02 - 18.29 times) slower queries is because of using leading star in a query nodes, which is possible to be re-written to become faster than the original query in trigram.

But still, more tests and changes are on the way.

Linking relevant issues that might be able to be resolved by this new implementation:
#202
#301

@codelingo

This comment has been minimized.

Copy link

codelingo bot commented Jul 8, 2019

cannot apply patch to target branch

carbon/config.go Outdated Show resolved Hide resolved
@dgryski

This comment has been minimized.

Copy link

dgryski commented Jul 8, 2019

Is there a point where the trigram index is better or is the trie index always superior?

@codelingo

This comment has been minimized.

Copy link

codelingo bot commented Jul 9, 2019

cannot apply patch to target branch

@azhiltsov

This comment has been minimized.

Copy link
Collaborator

azhiltsov commented Jul 9, 2019

@dgryski,
@gksinghjsr wrote a performance test for the trigram lib.
He probably could run it against trie to compare them in the same scenarios.

@codelingo

This comment has been minimized.

Copy link

codelingo bot commented Jul 11, 2019

cannot apply patch to target branch

5 similar comments
@codelingo

This comment has been minimized.

Copy link

codelingo bot commented Jul 11, 2019

cannot apply patch to target branch

@codelingo

This comment has been minimized.

Copy link

codelingo bot commented Jul 15, 2019

cannot apply patch to target branch

@codelingo

This comment has been minimized.

Copy link

codelingo bot commented Jul 21, 2019

cannot apply patch to target branch

@codelingo

This comment has been minimized.

Copy link

codelingo bot commented Jul 21, 2019

cannot apply patch to target branch

@codelingo

This comment has been minimized.

Copy link

codelingo bot commented Jul 23, 2019

cannot apply patch to target branch

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
carbonserver/carbonserver.go Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
@gaoxp

This comment has been minimized.

Copy link

gaoxp commented Aug 14, 2019

hi,when i use trie index instead of trigram index i find a problem, when i find metrics like a.b.*.*.*.c, goroutine which serve this request will panic,do you have same problem?

also i find when go-carbon run one day or two day, carbonserver accour too many close-wait tcp connection, i don't understand why so many close-wait connections.

@azhiltsov

This comment has been minimized.

Copy link
Collaborator

azhiltsov commented Aug 14, 2019

also i find when go-carbon run one day or two day, carbonserver accour too many close-wait tcp connection, i don't understand why so many close-wait connections.

Trie enabled version of go-carbon is not in production in the Booking currently. We are running 9b6b133 (current master) and don;t observe this problem. Can you confirm that you observe the same problem with 9b6b133 and if so - open a separate issue?

@bom-d-van

This comment has been minimized.

Copy link
Contributor Author

bom-d-van commented Aug 14, 2019

hi,when i use trie index instead of trigram index i find a problem, when i find metrics like a.b...*.c, goroutine which serve this request will panic,do you have same problem?

@gaoxp Is it possible for you to share us the stack trace?

@gaoxp

This comment has been minimized.

Copy link

gaoxp commented Aug 15, 2019

hi,when i use trie index instead of trigram index i find a problem, when i find metrics like a.b...*.c, goroutine which serve this request will panic,do you have same problem?

@gaoxp Is it possible for you to share us the stack trace?

hi,when i use trie index instead of trigram index i find a problem, when i find metrics like a.b...*.c, goroutine which serve this request will panic,do you have same problem?

@gaoxp Is it possible for you to share us the stack trace?

I try to fix this problem, the stack trace may different from original code,but it is sure that use trie index will cause this panic problem.
STACK TRACE:
2019/08/14 14:00:43 http: panic serving 10.120.197.17:38442: runtime error: invalid memory a
ddress or nil pointer dereference
goroutine 569 [running]:
net/http.(*conn).serve.func1(0xc4202180a0)
/home/yd/golang/src/net/http/server.go:1726 +0xd0
panic(0xbcc120, 0x122ef90)
/home/yd/golang/src/runtime/panic.go:505 +0x229
github.com/lomik/go-carbon/carbonserver.(*CarbonserverListener).expandGlobsTrie(0xc4202daa00
, 0xc42a41c0c0, 0x32, 0x1, 0xc42046a1b0, 0x9, 0x9, 0xc42a41c0c0, 0x32, 0x0, ...)
/home/gaoxiaopeng/go-carbon/_vendor/src/github.com/lomik/go-carbon/carbonserver/trie
.go:862 +0x217
github.com/lomik/go-carbon/carbonserver.(*CarbonserverListener).expandGlobs(0xc4202daa00, 0x
c42a41c0c0, 0x32, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/gaoxiaopeng/go-carbon/_vendor/src/github.com/lomik/go-carbon/carbonserver/carb
onserver.go:840 +0x33d
github.com/lomik/go-carbon/carbonserver.(*CarbonserverListener).findMetrics(0xc4202daa00, 0x
c420405ce0, 0xbf4d06c2f720586b, 0x1b1d3386e, 0x1246780, 0x2, 0xc420212a40, 0x1, 0x1, 0x0, ..
.)
/home/gaoxiaopeng/go-carbon/_vendor/src/github.com/lomik/go-carbon/carbonserver/find
.go:213 +0x159
github.com/lomik/go-carbon/carbonserver.(*CarbonserverListener).findHandler(0xc4202daa00, 0x
d95460, 0xc42cd1c020, 0xc42020e300)
/home/gaoxiaopeng/go-carbon/_vendor/src/github.com/lomik/go-carbon/carbonserver/find
.go:143 +0x2059
github.com/lomik/go-carbon/carbonserver.(*CarbonserverListener).(github.com/lomik/go-carbon/
carbonserver.findHandler)-fm(0xd95460, 0xc42cd1c020, 0xc42020e300)
/home/gaoxiaopeng/go-carbon/_vendor/src/github.com/lomik/go-carbon/carbonserver/carb
onserver.go:1189 +0x48
net/http.HandlerFunc.ServeHTTP(0xc42040e040, 0xd95460, 0xc42cd1c020, 0xc42020e300)
/home/yd/golang/src/net/http/server.go:1947 +0x44
github.com/lomik/go-carbon/carbonserver.TraceHandler.func1(0xd989a0, 0xc420286000, 0xc42020e
200)
/home/gaoxiaopeng/go-carbon/_vendor/src/github.com/lomik/go-carbon/carbonserver/carb
onserver.go:170 +0x2c1
github.com/lomik/go-carbon/vendor/github.com/dgryski/httputil.TimeHandler.func1(0xd989a0, 0x
c420286000, 0xc42020e200)
/home/gaoxiaopeng/go-carbon/_vendor/src/github.com/lomik/go-carbon/vendor/github.com
/dgryski/httputil/times.go:26 +0x7f
github.com/lomik/go-carbon/vendor/github.com/dgryski/httputil.TrackConnections.func1(0xd989a
0, 0xc420286000, 0xc42020e200)
/home/gaoxiaopeng/go-carbon/_vendor/src/github.com/lomik/go-carbon/vendor/github.com
/dgryski/httputil/track.go:40 +0xea
net/http.HandlerFunc.ServeHTTP(0xc42040e060, 0xd989a0, 0xc420286000, 0xc42020e200)
/home/yd/golang/src/net/http/server.go:1947 +0x44
net/http.(*ServeMux).ServeHTTP(0xc420406030, 0xd989a0, 0xc420286000, 0xc42020e200)
/home/yd/golang/src/net/http/server.go:2337 +0x130
github.com/lomik/go-carbon/vendor/github.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1
(0xd989a0, 0xc420286000, 0xc42020e200)
/home/gaoxiaopeng/go-carbon/_vendor/src/github.com/lomik/go-carbon/vendor/github.com
/NYTimes/gziphandler/gzip.go:293 +0x263
net/http.HandlerFunc.ServeHTTP(0xc420406120, 0xd989a0, 0xc420286000, 0xc42020e200)
/home/yd/golang/src/net/http/server.go:1947 +0x44
net/http.serverHandler.ServeHTTP(0xc420434000, 0xd989a0, 0xc420286000, 0xc42020e200)
/home/yd/golang/src/net/http/server.go:2694 +0xbc
net/http.(*conn).serve(0xc4202180a0, 0xd997a0, 0xc4200d72c0)
/home/yd/golang/src/net/http/server.go:1830 +0x651
created by net/http.(*Server).Serve
/home/yd/golang/src/net/http/server.go:2795 +0x27b

@gaoxp

This comment has been minimized.

Copy link

gaoxp commented Aug 15, 2019

also i find when go-carbon run one day or two day, carbonserver accour too many close-wait tcp connection, i don't understand why so many close-wait connections.

Trie enabled version of go-carbon is not in production in the Booking currently. We are running 9b6b133 (current master) and don;t observe this problem. Can you confirm that you observe the same problem with 9b6b133 and if so - open a separate issue?

also i find when go-carbon run one day or two day, carbonserver accour too many close-wait tcp connection, i don't understand why so many close-wait connections.

Trie enabled version of go-carbon is not in production in the Booking currently. We are running 9b6b133 (current master) and don;t observe this problem. Can you confirm that you observe the same problem with 9b6b133 and if so - open a separate issue?

I don't observe the same problem with [9b6b133],just for trie-index version.

@bom-d-van

This comment has been minimized.

Copy link
Contributor Author

bom-d-van commented Aug 15, 2019

@gaoxp I am unable to reproduce your problem. But from the stack trace, it looks like trieIdx isn't initialized.

@gaoxp

This comment has been minimized.

Copy link

gaoxp commented Aug 15, 2019

ur pro

@gaoxp I am unable to reproduce your problem. But from the stack trace, it looks like trieIdx isn't initialized.

ok, I use trigram index instead of trie index when i find a.b.*.*.*.f,and now it runs normal,thank you

@bom-d-van bom-d-van force-pushed the bom-d-van:trie-index2 branch 3 times, most recently from 291ffc1 to a28376a Aug 22, 2019
@gaoxp

This comment has been minimized.

Copy link

gaoxp commented Aug 30, 2019

hello @bom-d-van, can you add fallback to filesystem logic when trie index not finish initial,becase i find when trie index not initialled, carbonserver not work well, and it will take some time to complete trie index initial,during this time, i can't see any metric use grafana.

@bom-d-van

This comment has been minimized.

Copy link
Contributor Author

bom-d-van commented Aug 30, 2019

@gaoxp it's a valid point, I will add the relevant logics.

@bom-d-van

This comment has been minimized.

Copy link
Contributor Author

bom-d-van commented Sep 2, 2019

hello @bom-d-van, can you add fallback to filesystem logic when trie index not finish initial,becase i find when trie index not initialled, carbonserver not work well, and it will take some time to complete trie index initial,during this time, i can't see any metric use grafana.

@gaoxp I have pushed a simple fix for it. Though this is a problem shared by both trigram and trie indexing.

@bom-d-van

This comment has been minimized.

Copy link
Contributor Author

bom-d-van commented Sep 2, 2019

Is there a point where the trigram index is better or is the trie index always superior?

@dgryski after some production testing and more understandings that I gained while working on this, my answer to this question boils down to it depends.

There are at least one types of queries that I know of trigram performs better, that is if there is keywords extracted from the query that is able to narrow down the result. (For example, abc.keyword1.xxx.yyy.zzz, and there are a high numbers of metrics matching this pattern abc..xxx.yyy.zzz), because there are leading stars in the query, trie+dfa would have too go through every one of them. (thus we incorporated some levels of trigrams in trie nodes and performs some early checks to make it works a little bit better for trie+dfa, (however, I think it might not be a very good and generic solution))

However, there is some clear advantage on trie+dfa, which is less memory consumption (around 1/5 of trigram) and faster indexing time (also around 1/5 of trigram). And in the majorities of the queries that I found in our production environment, trie+dfa is a clear winner. (in our latest test on one of the cluster, trie+dfa's 99th percentile of request time is around 1/5 of trigram)

@azhiltsov is helping on testing it on more our prod clusters to have more understandings and feedback.

@bom-d-van

This comment has been minimized.

Copy link
Contributor Author

bom-d-van commented Sep 2, 2019

@azhiltsov @lomik @dgryski I think this PR is mature enough for a formal review now. There might be unknown bugs and more optimizations could be done, but we could address them when we run into them.

@bom-d-van bom-d-van marked this pull request as ready for review Sep 2, 2019
@bom-d-van bom-d-van force-pushed the bom-d-van:trie-index2 branch from 5daee55 to 18d7057 Sep 2, 2019
@azhiltsov azhiltsov changed the title WIP: Trie Index + DFA for supporting globbing/wildcard queries Trie Index + DFA for supporting globbing/wildcard queries Sep 3, 2019
@bom-d-van bom-d-van force-pushed the bom-d-van:trie-index2 branch from 0eec86a to b887bb0 Oct 4, 2019
@dgryski

This comment has been minimized.

Copy link

dgryski commented Oct 10, 2019

Does this need fuzz tests for the query engine?

@bom-d-van

This comment has been minimized.

Copy link
Contributor Author

bom-d-van commented Oct 10, 2019

@dgryski It's a nice idea. Given both metric paths and queries are user inputs, fuzz test might help!

@bom-d-van bom-d-van force-pushed the bom-d-van:trie-index2 branch from 3a68064 to a89cca6 Oct 17, 2019
… in expandGlobs

* should consider making a fuzzer test soon!
* panic crash in expandGlobs seems to be introduced by commit c8c4b39
@azhiltsov azhiltsov merged commit 24b937f into lomik:master Nov 4, 2019
1 check was pending
1 check was pending
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.