Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dbnode] Validate indexable metric tags before insert into index queue #2046

Merged
merged 18 commits into from Jan 5, 2020

Conversation

robskillington
Copy link
Collaborator

What this PR does / why we need it:

This also adds a tool to fix existing file sets.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

@@ -0,0 +1,398 @@
// Copyright (c) 2018 Uber Technologies, Inc.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit 19

@codecov
Copy link

codecov bot commented Nov 16, 2019

Codecov Report

Merging #2046 into master will increase coverage by 9%.
The diff coverage is 51.1%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #2046     +/-   ##
========================================
+ Coverage    63.3%   72.4%     +9%     
========================================
  Files        1001    1007      +6     
  Lines       86014   86414    +400     
========================================
+ Hits        54523   62571   +8048     
+ Misses      27580   19668   -7912     
- Partials     3911    4175    +264
Flag Coverage Δ
#aggregator 82% <ø> (+1.6%) ⬆️
#cluster 85.6% <ø> (+19%) ⬆️
#collector 64.8% <ø> (ø) ⬆️
#dbnode 79.7% <100%> (+3.9%) ⬆️
#m3em 73.2% <ø> (+0.9%) ⬆️
#m3ninx 73.9% <ø> (+2.9%) ⬆️
#m3nsch 51.1% <ø> (+22.7%) ⬆️
#metrics 17.6% <ø> (ø) ⬆️
#msg 74.7% <ø> (ø) ⬆️
#query 68.4% <50.5%> (+25.6%) ⬆️
#x 83.4% <ø> (+12%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a112602...6a5832f. Read the comment docs.

@robskillington robskillington added this to Reviewing in Release v0.15 Jan 3, 2020
@robskillington robskillington moved this from Reviewing to In progress in Release v0.15 Jan 3, 2020
Makefile Outdated
@@ -89,6 +89,7 @@ TOOLS := \
clone_fileset \
dtest \
verify_commitlogs \
verify_data_files \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be added to the license check? (Also comparator if you're adding that fix)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to FOSSA, looks like comparator is already there 👍

return ErrUsingReservedFieldName
}
if !utf8.Valid(tagValue) {
return fmt.Errorf("document contains invalid field value: %v", tagValue)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; add name in error log

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

func ValidateMetricTag(tag ident.Tag) error {
tagName := tag.Name.Bytes()
tagValue := tag.Value.Bytes()
if !utf8.Valid(tagName) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is this a more likely error scenario than the ReservedFieldNameID case? Also can probably wait to get the value bytes until after name passes validation

"name_as_string=%s, name_as_hex=%s"+
"value_as_string=%s, value_as_hex=%s",
err,
tag.Name.Bytes(), tag.Name.Bytes(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think first one of these should be .String()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether you give it a string or []byte it acts the same thankfully - avoids allocations if leave it as []byte so I usually don't provide as a string to the formatter. e.g.:
https://play.golang.org/p/-yhIOkLan9H

if !utf8.Valid(id.Bytes()) {
return readEntryResult{invalidID: true},
fmt.Errorf("invalid id: err=%s, as_string=%s, as_hex=%x",
"non-utf8", id.Bytes(), id.Bytes())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think first should be a string

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether you give it a string or []byte it acts the same thankfully - avoids allocations if leave it as []byte so I usually don't provide as a string to the formatter. e.g.:
https://play.golang.org/p/-yhIOkLan9H

@robskillington robskillington moved this from In progress to Reviewing in Release v0.15 Jan 4, 2020
// ValidateSeriesTag validates a series tag for use with m3ninx.
func ValidateSeriesTag(tag ident.Tag) error {
tagName := tag.Name.Bytes()
tagValue := tag.Value.Bytes()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: May be better to only take the value after name validation passes?

@@ -241,6 +241,11 @@ func NewTags(values ...Tag) Tags {
return Tags{values: values}
}

// Reset resets the tags for reuse.
func (t *Tags) Reset(values []Tag) {
t.values = values
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to take ownership of the tags?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed up, ident.ID does the ref tracking so we are ok here.

}

// Get all fileset files.
log.Info("discovering file sets",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meganit: could build up a fields := []zapcore.Fields{zap.Strings("namespaces", namespaces)} and append to it as we go?

return nil, fmt.Errorf("could not read dir names: %v", err)
}

results := make([]string, 0, len(entries))
Copy link
Collaborator

@arnikola arnikola Jan 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can do this in place without an alloc results := entries[:0] and keep the rest the same?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.


results := make([]string, 0, len(entries))
for _, p := range entries {
if p == "." || p == ".." || p == "./.." || p == "./" || p == "../" || p == "./../" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do if p[0] == '.' { continue } ? Or is there a chance there'll be valid files starting with .?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct they could be "./foo"

return err
}

defer reader.Close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be deferred earlier before the error above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close() is only valid after an Open(), hence doing it below the return


check, err := readEntry(id, tags, data, checksum)
data.Finalize() // Always finalize data.
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: reduce nesting with a continue?

if err == nil {
 continue
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.


// NOTE: we output to a new directory so that we don't clobber files.
writeFsOpts := fsOpts.SetFilePathPrefix(opts.fixDir)
writer, err := fs.NewWriter(writeFsOpts)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this defer writer.Close() and return nil at the end instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since writer.Close() does quite a few final steps, I'm going to continue to return an error if it returns an error.

To make sure we always close up I changed it to this:

	success := false
	defer func() {
		if !success {
			writer.Close()
		}
	}()

	// do work

	success = true
	return writer.Close()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good 👍

Comment on lines 423 to 434
// Need to remove invalid tags.
tagsBuffer = tagsBuffer[:0]
for _, tag := range currTags {
if err := convert.ValidateSeriesTag(tag); err != nil {
removedTags++
continue
}
tagsBuffer = append(tagsBuffer, tag)
}

// Make sure we write out the modified tags.
writeTags = tagsBuffer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this risk 2 series with the same tagset? Should probably skip the entire series instead IMO

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, can remove the series.

@arnikola arnikola moved this from Reviewing to Needs Merge in Release v0.15 Jan 5, 2020
@robskillington robskillington merged commit 5cd41e0 into master Jan 5, 2020
@robskillington robskillington deleted the r/verify-data-files-tool branch January 5, 2020 05:32
@arnikola arnikola moved this from Needs Merge to Done in Release v0.15 Jan 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

None yet

2 participants