This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

built-in execution of graphite requests #575

Merged
merged 38 commits into master from expr on Apr 19, 2017

Conversation

@Dieterbe (Contributor) commented Mar 22, 2017:

this is a prototype. the basics seem to work, but it needs more testing and more polish. I also have yet to seriously start on adding a couple of functions.

in particular, the parsing and expr tree seem good: it responds with correct-looking output for basic (non-function) requests, and the proxying to graphite-api seems to work too. but:

  1. the proxying doesn't handle POST yet
  2. the output range extends one second too far

@Dieterbe force-pushed the expr branch 2 times, most recently from f407997 to c5cc8bc on March 24, 2017
@Dieterbe (Author) commented Mar 24, 2017:

now the proxy works. if you run launch.sh docker-dev-direct-single-tenant you'll have everything set up to test easily. you can load the metrictank dashboard directly from graphite-api or MT (spoiler alert: they look identical).
To my pleasant surprise, I also saw http: proxy error: context canceled messages when loading the MT dashboard and reloading it while requests were still pending. I'm not sure whether cancellation propagation works properly through the entire path (that's something independent of the proxy that we'll have to address at some point), but it looks like much of it already works out of the box.

I also created a dashboard to see the stats; it looks like so:
[screenshot: proxy-stats dashboard]
two important notes here:

  1. only the first unsupported function in a request is reported, not every unsupported function it contains. this simplifies the request parsing
  2. i added a provision to mark certain functions as stable or not. this lets us keep code active that supports certain functions, and test them out with real data, while still marking them as unstable and issuing proxy requests to graphite. a sketch of such a registry follows below.
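
For illustration, here is a minimal sketch of how such a stable/unstable registry could look. This is a hypothetical shape, not the actual metrictank code; the names below (funcDef, lookup, the constructors) are made up:

    package expr

    import "errors"

    // Func stands in for the per-function interface discussed later in this PR.
    type Func interface{}

    type funcDef struct {
        constr func() Func // constructor for the native implementation
        stable bool        // false: code is active for testing, but requests using it still get proxied
    }

    func newAlias() Func     { return nil } // stand-ins for the real constructors
    func newSumSeries() Func { return nil }
    func newAvgSeries() Func { return nil }

    var funcs = map[string]funcDef{
        "alias":         {newAlias, true},
        "sum":           {newSumSeries, true},
        "sumSeries":     {newSumSeries, true},
        "avg":           {newAvgSeries, true},
        "averageSeries": {newAvgSeries, true},
    }

    var errUnsupported = errors.New("unsupported function: proxy to graphite")

    // lookup is called during parsing. the first unknown or unstable function
    // aborts native execution, which is why only that one gets reported.
    func lookup(name string) (funcDef, error) {
        def, ok := funcs[name]
        if !ok || !def.stable {
            return funcDef{}, errUnsupported
        }
        return def, nil
    }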

next up: testing functions, writing some more, and coming up with a nice solution for the slice allocations (copy-on-write vs copy first and modify in place).

@Dieterbe force-pushed the expr branch 5 times, most recently from 92a8ffb to 7264244 on March 28, 2017
@Dieterbe (Author) commented:

OK, I think I got everything working properly now. supported functions are alias, sumSeries/sum and averageSeries/avg. the docker-dev-direct-single-tenant docker env makes it easy to test MT and compare to graphite. I also added a bunch more unit tests.

Notes:

  1. I'm not happy with the addition of the QueryPatt, QueryFrom and QueryTo fields to models.Series.
    it could be cleaned up by embedding the type Req from expr/plan.go (and possibly changing models.Req to embed that as well); this would make the code more elegant, as in most places we deal with these 3 attributes as one (e.g. in mergeSeries).
    But we also have to think about how this will work when we turn it into a standalone library that grafana can also consume. we'll have to make models.Series reusable, or i'm also thinking about making the expr library take an interface (with a few methods: Len, Next point, etc.) instead of a concrete type like models.Series. there might be some performance overhead though.
  2. graphite has from exclusive and until inclusive. MT's native render api (exposed to graphite-api) has from inclusive and to/until exclusive. so now we're extending that api, and when treating it as graphite, our from/to are off by one second. see the conversion sketch below.
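
To make note 2 concrete, a minimal sketch of the off-by-one conversion (the helper name is made up, not actual metrictank code); both conventions cover the same instants once both bounds are shifted by one second:

    // graphite treats the range as (from, until], while MT's native render
    // api treats it as [from, to). the half-open intervals line up when we
    // shift both bounds: (from, until] == [from+1, until+1)
    func graphiteToMT(from, until uint32) (uint32, uint32) {
        return from + 1, until + 1
    }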

TODO:

  1. docs
  2. performance validation

@Dieterbe (Author) commented Apr 12, 2017:

now did some performance testing as well. see the script in the last commit.
looks good to me. going through graphite was surprisingly slow and I have no idea why; graphite shouldn't be this slow. but anyway, the benchmarks show MT is always faster and the proxy overhead is not significant.

~/g/s/g/r/m/c/mt-index-cat ❯❯❯ ../../docker/docker-dev-direct-single-tenant/bench.sh
run this from inside a directory that has the mt-index-cat binary
before running this, make sure you run the following command and you saw the 'now doing realtime' message
fakemetrics -carbon-tcp-address localhost:2003 -statsd-addr localhost:8125 -shard-org -keys-per-org 100 -speedup 200 -stop-at-now -offset 1h && echo "now doing realtime" && fakemetrics -carbon-tcp-address localhost:2003 -statsd-addr localhost:8125 -shard-org -keys-per-org 100
and verify http://localhost:3000/dashboard/db/fake-metrics-data
press any key to continue

>>>> 1A: MT simple series requests
Requests      [total, rate]            500, 10.02
Duration      [total, attack, wait]    49.90810306s, 49.899999825s, 8.103235ms
Latencies     [mean, 50, 95, 99, max]  8.495676ms, 8.506675ms, 10.064634ms, 22.706379ms, 52.404426ms
Bytes In      [total, mean]            34625372, 69250.74
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:500  
Error Set:

>>>> 1B: graphite simple series requests
Requests      [total, rate]            500, 10.02
Duration      [total, attack, wait]    49.946537572s, 49.899999912s, 46.53766ms
Latencies     [mean, 50, 95, 99, max]  30.297993ms, 28.507309ms, 48.112856ms, 65.825307ms, 68.895464ms
Bytes In      [total, mean]            38120154, 76240.31
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:500  
Error Set:

>>>> 2A MT sumSeries(patterns.*) (no proxying)
Requests      [total, rate]            6250, 250.04
Duration      [total, attack, wait]    25.00198183s, 24.995999831s, 5.981999ms
Latencies     [mean, 50, 95, 99, max]  5.483174ms, 4.315464ms, 18.445995ms, 30.602304ms, 43.466792ms
Bytes In      [total, mean]            347411119, 55585.78
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:6250  
Error Set:

>>>> 2B same but lower load since graphite cant do much
Requests      [total, rate]            125, 5.04
Duration      [total, attack, wait]    24.807791656s, 24.799999789s, 7.791867ms
Latencies     [mean, 50, 95, 99, max]  9.601537ms, 8.472735ms, 22.204463ms, 23.863076ms, 42.246818ms
Bytes In      [total, mean]            7663595, 61308.76
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:125  
Error Set:

>>>> 2C graphite sumSeries(patterns.*)
Requests      [total, rate]            250, 5.02
Duration      [total, attack, wait]    50.099044783s, 49.799999926s, 299.044857ms
Latencies     [mean, 50, 95, 99, max]  152.935257ms, 34.399036ms, 970.421343ms, 1.334126438s, 1.715109579s
Bytes In      [total, mean]            17044595, 68178.38
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:250  
Error Set:

>>>> 3A MT load needing proxying
Requests      [total, rate]            250, 5.02
Duration      [total, attack, wait]    49.83281577s, 49.799999888s, 32.815882ms
Latencies     [mean, 50, 95, 99, max]  34.041134ms, 31.943184ms, 53.609206ms, 68.950639ms, 71.411507ms
Bytes In      [total, mean]            19122578, 76490.31
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:250  
Error Set:

>>>> 3B graphite directly to see proxying overhead
Requests      [total, rate]            250, 5.02
Duration      [total, attack, wait]    49.834155561s, 49.799999746s, 34.155815ms
Latencies     [mean, 50, 95, 99, max]  33.519563ms, 31.67443ms, 51.330246ms, 70.272948ms, 71.695535ms
Bytes In      [total, mean]            19146048, 76584.19
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:250  
Error Set:

here's the corresponding snapshot https://snapshot.raintank.io/dashboard/snapshot/Jh5Zl91d5gCIYXoUtsCQ1Owu6mW0C0As

mostly just parsing of expressions
* track the query pattern used along with a req, so we can tie the results
  back to the query later.
* a series also needs to keep track of the query that caused it to be
  created, so that the processing functions can efficiently retrieve the
  series for the given input parameters
* adjust merging so that we only merge series if they correspond to the
  same request from the same functions.
similar to the change we made to graphite-raintank,
sometimes (especially in dev)
it's nice to run MT such that it always assumes orgId 1,
without requiring something like tsdbgw in front to do auth.
requests

we can now successfully use MT directly from browser, with dynamic
proxying!
* this way all data gets saved back to the buffer pool, and only once:
  whether the data was part of the input, made it through to the end
  (e.g. individual inputs to a sum generally don't, unless there's only
  1 input), or was generated by a processing function
* in the future we'll be able to use this as a temporary result cache,
  so that steps with some shared logic don't need to redo the processing.
out = mergeSeries(out)

// instead of waiting for all data to come in and then start processing everything, we could consider starting processing earlier, at the risk of doing needless work
// if we need to cancel the request due to a fetch error
@replay (Contributor) commented Apr 13, 2017:

assuming there shouldn't be too many fetch errors, i'd say that would probably make sense. but i don't think it should be a blocker for merging this.

Dieterbe (Author):

yeah, but that would complicate things: it means building a pipelined processing system. more of a long-term / low-prio item IMHO.

Contributor:

agree

graphiteProxy.ModifyResponse = func(resp *http.Response) error {
// if kept, they would be duplicated (and duplicated headers are illegal)
resp.Header.Del("access-control-allow-credentials")
resp.Header.Del("Access-Control-Allow-Origin")
Contributor:

it looks a little confusing that the header on line 43 is lower case while the one on line 44 is upper case. .Del() should be case-insensitive anyway.
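
For context, a self-contained sketch of the proxy setup around the snippet above (the upstream address and port are made up; ModifyResponse is the real net/http/httputil hook, available since Go 1.8):

    package main

    import (
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        target, _ := url.Parse("http://graphite-api:8888") // hypothetical upstream
        graphiteProxy := httputil.NewSingleHostReverseProxy(target)
        graphiteProxy.ModifyResponse = func(resp *http.Response) error {
            // MT sets its own CORS headers, so the upstream's copies must go;
            // if kept, they would be duplicated, and duplicated headers are
            // illegal. Header.Del canonicalizes the key, so casing is irrelevant.
            resp.Header.Del("Access-Control-Allow-Credentials")
            resp.Header.Del("Access-Control-Allow-Origin")
            return nil
        }
        http.ListenAndServe(":6060", graphiteProxy)
    }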

func (p Plan) Clean() {
for _, series := range p.input {
for _, serie := range series {
pointSlicePool.Put(serie.Datapoints[:0])
@replay (Contributor) commented Apr 13, 2017:

I don't understand that [:0]. Isn't that always just creating an empty list?

Dieterbe (Author):

it creates an empty slice value backed by (using a pointer to) the same underlying array. so when we pull the value out of the pool next time, we have a slice of length zero that we can immediately start appending to without allocations, because we reuse the same backing array.
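
A runnable sketch of the trick (the pool size and setup here are illustrative):

    package main

    import (
        "fmt"
        "sync"

        "gopkg.in/raintank/schema.v1"
    )

    var pointSlicePool = sync.Pool{
        New: func() interface{} { return make([]schema.Point, 0, 2000) },
    }

    func main() {
        points := pointSlicePool.Get().([]schema.Point)
        points = append(points, schema.Point{Val: 1.5, Ts: 60}) // len grows, backing array is reused

        fmt.Println(len(points), cap(points)) // 1 2000

        // return a zero-length slice that still references the same backing
        // array; the next Get can append without allocating until cap is hit.
        pointSlicePool.Put(points[:0])
    }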


"gopkg.in/raintank/schema.v1"
)

const defaultPointSliceSize = 2000
Contributor:

The idea of the pool is very nice, but i think the limit of 2000 seems a little arbitrary. Would it make sense to base it on some setting, e.g. maxPointsPerReqHard?

Dieterbe (Author):

maxPointsPerReqHard is per request, but a pointslice is a slice of points for a given series.
you're right that it's somewhat arbitrary. you want it low enough not to waste too much memory allocating slices with space we don't use, and high enough to avoid expensive re-allocation and copying when a slice needs to grow as it gets filled. that cost gets amortized anyway, since we reuse the slice (incl. backing array) over time, so starting out a little low is not so problematic.
it's hard to generalize how many points will typically be returned for any given series across different requests, but I think this is a good enough value that we won't have to worry about it for a while.

@woodsaj (Member) commented Apr 18, 2017:

we currently keep meters of the number of series per request and the number of points per request (reqRenderSeriesCount, reqRenderPointsFetched). I think we should also keep counters of total points fetched and total series fetched.

the current metrics give us good insight into query complexity, but the counters would provide a good measure of overall request workload.
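
To illustrate the meter vs counter distinction with plain types (the real metrictank stats library differs; everything below is a simplified stand-in):

    import "sync/atomic"

    // Meter records one value per request, so per-interval distributions
    // (mean, percentiles) can be derived: "how complex is a single query?"
    // note: not goroutine-safe, purely for illustration.
    type Meter struct{ values []int }

    func (m *Meter) Value(v int) { m.values = append(m.values, v) }

    // Counter only accumulates a running total: "how much work overall?"
    type Counter struct{ n uint64 }

    func (c *Counter) Add(v int) { atomic.AddUint64(&c.n, uint64(v)) }

    var (
        reqRenderPointsFetched Meter   // per-request distribution (exists today)
        pointsFetchedTotal     Counter // the proposed running total
    )

    func recordFetch(points int) {
        reqRenderPointsFetched.Value(points)
        pointsFetchedTotal.Add(points)
    }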

)

func CaptureBody(c *Context) {
body, _ := ioutil.ReadAll(c.Req.Request.Body)
Contributor:

this seems like a potential error that should be handled somehow, so we don't pass an "invalid" body into the request handlers.
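
A hedged sketch of what handling could look like (Context mirrors the type in the snippet above; the exact fields and helpers may differ from the real code):

    import (
        "bytes"
        "io/ioutil"
        "net/http"
    )

    func CaptureBody(c *Context) {
        body, err := ioutil.ReadAll(c.Req.Request.Body)
        if err != nil {
            c.PlainText(http.StatusInternalServerError, []byte("failed to read request body"))
            return
        }
        c.Body = body
        // restore the body so downstream handlers can still read it
        c.Req.Request.Body = ioutil.NopCloser(bytes.NewReader(body))
    }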

@@ -132,6 +132,10 @@ key-file = /etc/ssl/private/ssl-cert-snakeoil.key
max-points-per-req-soft = 1000000
# limit of number of datapoints a request can return. Requests that exceed this limit will be rejected. (0 disables limit)
max-points-per-req-hard = 20000000
# require x-org-id authentication to auth as a specific org. otherwise orgId 1 is assumed")
Contributor:

there's a ) without a (

expr/expr.go Outdated
case etString:
return fmt.Sprintf("%sexpr-string %q", space, e.valStr)
}
return "HUH-SHOULD-NEVER-HAPPEN"
Contributor:

lol, "this is not a bug, it's a feature"

Datapoints: out,
Interval: series[0].Interval,
}
cache[Req{}] = append(cache[Req{}], output)
@replay (Contributor) commented Apr 14, 2017:

the idea to cache that here is cool, but i can't see where this cache is read, am i missing something?
the only place i can see it accessed is in Plan.Clean(), and there it just extracts empty slices from it.

@replay (Contributor) commented Apr 14, 2017:

ah, maybe the variable should rather be named something like pool instead of cache. actually, the output elements are only appended to it to reuse the allocated memory later, not to reuse the generated values, right?

Dieterbe (Author):

see the comment for the generated property of type Plan, which is what is passed in here. you're right that we don't have caching implemented yet, but i do foresee that being the main use in the future (as explained in that comment).

@Dieterbe (Author):

thanks for all the good feedback @replay. now see the few commits i pushed.

please let me know any final remarks, otherwise this will get merged tomorrow! cc @woodsaj

expr/funcs.go Outdated
)

type Func interface {
Signature() ([]argType, []argType)
Member:

why are there two argType slices? one for input, one for output?

expr/funcs.go Outdated
Signature() ([]argType, []argType)
// what can be assumed to have been pre-validated: len of args, and basic types (e.g. seriesList)
Init([]*expr) error // initialize and validate arguments (expressions given by user), for functions that have specific requirements
Depends(from, to uint32) (uint32, uint32) // allows a func to express its dependencies
Member:

how are two uint32 numbers a dependency?

Dieterbe (Author):

couldn't find a better way to phrase it, but the idea is that an invocation like movingAverage(foo, "5min") with from=x and to=y can declare that it "depends" on data from x-5min to y as input.
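
A sketch of that idea for a hypothetical moving-average function (the field and type names are made up; the real code may differ):

    type FuncMovingAverage struct {
        window uint32 // seconds, parsed from the "5min" argument during Init()
    }

    // Depends widens the requested range: to produce output starting at
    // `from`, the function needs input starting `window` seconds earlier.
    func (f *FuncMovingAverage) Depends(from, to uint32) (uint32, uint32) {
        return from - f.window, to
    }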

Member:

so shouldn't the function just be called TsRange(originalFrom, originalTo uint32) (uint32, uint32)?

expr/funcs.go Outdated
// what can be assumed to have been pre-validated: len of args, and basic types (e.g. seriesList)
Init([]*expr) error // initialize and validate arguments (expressions given by user), for functions that have specific requirements
Depends(from, to uint32) (uint32, uint32) // allows a func to express its dependencies
Exec(map[Req][]models.Series, ...interface{}) ([]interface{}, error) // execute the function with its arguments. functions and names resolved to series, other expression types(constant, strings, etc) are still the bare expressions that were also passed into Init()
Member:

What is this function supposed to return?

Dieterbe (Author):

i'll push a commit that explains all these functions better

@Dieterbe (Author) commented Apr 18, 2017:

update re graphite perf trouble, for those who are interested:

  • the benchmark script calls mt-index-cat in a way that generates queries with * at random places. if you're unlucky, the '*' is at the end, which expands into 100 series that have to be summed. I consider this a fair, realistic and diverse workload, but it seems graphite handles it poorly, to the extent that a lot of the requests it needs to process are affected by one slow query.
  • however, even excluding that particular pattern and keeping it to requests that only sum over 1 series, some requests come back fast and some slow: see
    gentle-load-for-graphite.txt
    in particular the 2nd run, which has a mean and median of 8s.
  • running the currently stable graphite-MT combo gave no performance increase.

UPDATE: same requests (with ending wildcard removed), against MT gives:

cat gentle-load-for-graphite.txt | vegeta attack -rate 10 -duration 10s | vegeta report
Requests      [total, rate]            100, 10.10
Duration      [total, attack, wait]    9.909619504s, 9.899999659s, 9.619845ms
Latencies     [mean, 50, 95, 99, max]  8.093534ms, 8.580528ms, 13.198168ms, 14.393088ms, 14.799109ms
Bytes In      [total, mean]            6019911, 60199.11
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  100.00%
Status Codes  [code:count]             200:100  
Error Set:

@replay (Contributor) left a comment:

Looked at the commits you pushed after my batch of comments. That all looks good, but i still added one comment.

api/graphite.go Outdated
}
for _, def := range metric.Defs {
locatedDefs[r][def.Id] = locatedDef{def, s.Node}
for _, archive := range metric.Defs {
@replay (Contributor) commented Apr 19, 2017:

Do you not need to skip the non-leaf nodes?

Dieterbe (Author):

not really. the leaf property is just a shorthand for whether len(metric.Defs) == 0; non-leaf nodes have no defs, so the loop simply does nothing for them.
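
In other words (a trivial sketch, with the element type simplified): ranging over an empty slice runs zero iterations, so non-leaf nodes fall through on their own:

    defs := []string{} // stands in for metric.Defs on a non-leaf node
    for _, def := range defs {
        _ = def // never executed, since len(defs) == 0
    }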

@Dieterbe merged commit be64803 into master on Apr 19, 2017
@Dieterbe mentioned this pull request May 9, 2017
@Dieterbe deleted the expr branch on September 18, 2018
@Dieterbe mentioned this pull request Apr 10, 2019
Dieterbe added a commit that referenced this pull request Apr 10, 2019
introduced via 929513c / #174
became obsolete in b3d3831 / #575 / 0.7.1-61-gbe64803e

we no longer really log requests ourselves.
i imagine at some point we probably will, at which point we can
re-introduce this setting

(our qa tool `unused` flagged that this variable wasn't being read
anywhere)