fix(models): filter out reserved tag keys in points parser #16412

GeorgeMac · 2020-01-06T10:40:49Z

This is an improvement to the validation done when parsing points. Certain tag keys are reserved and should not be used. This adds a validation step to the points parsing in the models package.
This step throws an error whenever any of the following tag keys are used:
"\xff"
"\x00"
"_measurement"
"_field"
"time"

Before this change was introduced the parsing benchmarks looked like this:

goos: darwin
goarch: amd64
pkg: github.com/influxdata/influxdb/models
BenchmarkParsePointsTagsSorted2-16       	 2000000	       664 ns/op	  76.76 MB/s
BenchmarkParsePointsTagsSorted5-16       	 2000000	       855 ns/op	  97.07 MB/s
BenchmarkParsePointsTagsSorted10-16      	 1000000	      1206 ns/op	 118.54 MB/s
BenchmarkParsePointsTagsUnSorted2-16     	 2000000	       789 ns/op	  64.57 MB/s
BenchmarkParsePointsTagsUnSorted5-16     	 1000000	      1117 ns/op	  74.26 MB/s
BenchmarkParsePointsTagsUnSorted10-16    	 1000000	      1838 ns/op	  77.78 MB/s
PASS
ok  	github.com/influxdata/influxdb/models	11.309s

Afterwards they look like this:

goos: darwin
goarch: amd64
pkg: github.com/influxdata/influxdb/models
BenchmarkParsePointsTagsSorted2-16       	 2000000	       745 ns/op	  68.38 MB/s
BenchmarkParsePointsTagsSorted5-16       	 2000000	       993 ns/op	  83.52 MB/s
BenchmarkParsePointsTagsSorted10-16      	 1000000	      1475 ns/op	  96.94 MB/s
BenchmarkParsePointsTagsUnSorted2-16     	 2000000	       874 ns/op	  58.33 MB/s
BenchmarkParsePointsTagsUnSorted5-16     	 1000000	      1243 ns/op	  66.73 MB/s
BenchmarkParsePointsTagsUnSorted10-16    	 1000000	      2037 ns/op	  70.18 MB/s
PASS
ok  	github.com/influxdata/influxdb/models	12.796s

So there is some slow down (roughly 10 - 15% slower). Any suggestions for improvements welcomed.

CHANGELOG.md updated with a link to the PR (not the Issue)
Well-formatted commit messages
Rebased/mergeable
Tests pass
http/swagger.yml updated (if modified Go structs or API)
Documentation updated or issue created (provide link to issue/pr)
Signed CLA (if not already signed)

stuartcarnie

@GeorgeMac this looks good to me 👍

GeorgeMac · 2020-01-08T09:25:58Z

@stuartcarnie Is there anything we need to do or be prepared for in order to roll this out safely?

e-dard

@GeorgeMac this seems like a pretty reasonable change, and the slowdown is to be expected. I'm not overly concerned about it since parsing is never really the ingest bottleneck.

It might be possible to eek a few more ns out by moving the checks into scanTagsKey and perhaps do them inline, but I wouldn't worry about that now.

One other suggestion: benchstat is good for statistically comparing results: $ go get golang.org/x/perf/cmd/benchstat.

Something like:

$ go -benchmem -count 10 -run=^$ github.com/influxdata/influxdb/models -bench ^BenchmarkParsePointsTags > /tmp/master.txt
$ go -benchmem -count 10 -run=^$ github.com/influxdata/influxdb/models -bench ^BenchmarkParsePointsTags > /tmp/gm.txt
$
$ benchstat /tmp/master.txt /tmp/gm.txt
name                         old time/op    new time/op    delta
ParsePointsTagsSorted2-8        704ns ± 1%     720ns ± 1%   +2.24%  (p=0.000 n=9+8)
ParsePointsTagsSorted5-8        893ns ± 0%     979ns ± 1%   +9.64%  (p=0.000 n=10+10)
ParsePointsTagsSorted10-8      1.21µs ± 1%    1.49µs ± 0%  +22.98%  (p=0.000 n=10+7)
ParsePointsTagsUnSorted2-8      817ns ± 1%     833ns ± 1%   +1.96%  (p=0.000 n=10+9)
ParsePointsTagsUnSorted5-8     1.12µs ± 0%    1.20µs ± 2%   +7.27%  (p=0.000 n=9+10)
ParsePointsTagsUnSorted10-8    1.86µs ± 1%    2.02µs ± 1%   +8.72%  (p=0.000 n=10+9)

name                         old speed      new speed      delta
ParsePointsTagsSorted2-8     72.3MB/s ± 0%  70.8MB/s ± 1%   -2.12%  (p=0.000 n=8+8)
ParsePointsTagsSorted5-8     92.9MB/s ± 0%  84.7MB/s ± 1%   -8.80%  (p=0.000 n=10+10)
ParsePointsTagsSorted10-8     118MB/s ± 1%    96MB/s ± 0%  -18.68%  (p=0.000 n=10+7)
ParsePointsTagsUnSorted2-8   62.4MB/s ± 1%  61.1MB/s ± 1%   -1.95%  (p=0.000 n=10+9)
ParsePointsTagsUnSorted5-8   73.8MB/s ± 1%  68.9MB/s ± 2%   -6.66%  (p=0.000 n=10+10)
ParsePointsTagsUnSorted10-8  77.0MB/s ± 1%  70.8MB/s ± 1%   -8.02%  (p=0.000 n=10+9)

name                         old alloc/op   new alloc/op   delta
ParsePointsTagsSorted2-8         368B ± 0%      368B ± 0%     ~     (all equal)
ParsePointsTagsSorted5-8         432B ± 0%      432B ± 0%     ~     (all equal)
ParsePointsTagsSorted10-8        544B ± 0%      544B ± 0%     ~     (all equal)
ParsePointsTagsUnSorted2-8       400B ± 0%      400B ± 0%     ~     (all equal)
ParsePointsTagsUnSorted5-8       496B ± 0%      496B ± 0%     ~     (all equal)
ParsePointsTagsUnSorted10-8      672B ± 0%      672B ± 0%     ~     (all equal)

name                         old allocs/op  new allocs/op  delta
ParsePointsTagsSorted2-8         4.00 ± 0%      4.00 ± 0%     ~     (all equal)
ParsePointsTagsSorted5-8         4.00 ± 0%      4.00 ± 0%     ~     (all equal)
ParsePointsTagsSorted10-8        4.00 ± 0%      4.00 ± 0%     ~     (all equal)
ParsePointsTagsUnSorted2-8       5.00 ± 0%      5.00 ± 0%     ~     (all equal)
ParsePointsTagsUnSorted5-8       5.00 ± 0%      5.00 ± 0%     ~     (all equal)
ParsePointsTagsUnSorted10-8      5.00 ± 0%      5.00 ± 0%     ~     (all equal)

e-dard · 2020-01-08T12:32:43Z

@GeorgeMac in terms of preparing for a deployment - we probably need to think about the fact this may break existing users or clients. Please check with @bpalani and explain the impact of this change.

GeorgeMac · 2020-01-08T12:42:22Z

neat will crack out benchstat! Cheers!

GeorgeMac marked this pull request as ready for review January 7, 2020 15:00

GeorgeMac requested review from e-dard and stuartcarnie January 7, 2020 15:00

stuartcarnie approved these changes Jan 8, 2020

View reviewed changes

e-dard approved these changes Jan 8, 2020

View reviewed changes

GeorgeMac added 2 commits January 24, 2020 15:49

fix(models): filter out reserved tag keys in points parser

de9c883

chore(models): update changelog to reflect reserved tag key change

c5d9cd8

GeorgeMac force-pushed the gm/filter-tags branch from 85a12dc to c5d9cd8 Compare January 24, 2020 15:51

GeorgeMac merged commit 281c08e into master Jan 24, 2020

GeorgeMac deleted the gm/filter-tags branch January 24, 2020 17:09

adrian-thurston mentioned this pull request Feb 4, 2020

Develop a ragel-based line protocol parser for use in filtering applications #16717

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(models): filter out reserved tag keys in points parser #16412

fix(models): filter out reserved tag keys in points parser #16412

GeorgeMac commented Jan 6, 2020 •

edited

Loading

stuartcarnie left a comment

GeorgeMac commented Jan 8, 2020

e-dard left a comment

e-dard commented Jan 8, 2020

GeorgeMac commented Jan 8, 2020 •

edited

Loading

fix(models): filter out reserved tag keys in points parser #16412

fix(models): filter out reserved tag keys in points parser #16412

Conversation

GeorgeMac commented Jan 6, 2020 • edited Loading

stuartcarnie left a comment

Choose a reason for hiding this comment

GeorgeMac commented Jan 8, 2020

e-dard left a comment

Choose a reason for hiding this comment

e-dard commented Jan 8, 2020

GeorgeMac commented Jan 8, 2020 • edited Loading

GeorgeMac commented Jan 6, 2020 •

edited

Loading

GeorgeMac commented Jan 8, 2020 •

edited

Loading