Add bulk loader validator #3838
The bulk validator reports all parsing errors in the input files.
This tool can also be used to validate input files for live loader.
✅ A review job has been created and sent to the PullRequest network.
@animesh2049 you can click here to see the review status or cancel the code review job.
Left some inline feedback regarding error handling improvements and consistency.
Reviewed with ❤️ by PullRequest
It seems like all the logic contained in this PR is also available in the current bulk loader. Is there anything new that this validator provides?
Reviewable status: 0 of 4 files reviewed, 9 unresolved discussions (waiting on @animesh2049 and @manishrjain)
Yes, all the logic is already in the bulk loader, but since we want a separate validator tool, we thought it would be good to do it this way. What are your thoughts on this, @gitlw?
I had thought Manish wanted a more rigorous validation tool. So maybe check with him first. Thanks!
Reviewable status: 0 of 4 files reviewed, 9 unresolved discussions (waiting on @animesh2049 and @manishrjain)
Due to inactivity, PullRequest has cancelled this review job. You can reactivate the code review job from the PullRequest dashboard.
We should add a test in systest.
dgraph/cmd/bulk_validator/loader.go
Outdated
@@ -0,0 +1,122 @@
package bulkvalidator
Could just call it validator instead of bulk_validator. This could be used for live loader data as well.
dgraph/cmd/bulk_validator/loader.go
Outdated
"github.com/dgraph-io/badger/y"
"github.com/dgraph-io/dgraph/chunker"
remove newline here
dgraph/cmd/validator/loader.go
Outdated
"github.com/dgraph-io/dgraph/chunker"
"github.com/prometheus/common/log"

"github.com/dgraph-io/dgraph/x"
Move this import together with the other imports.
dgraph/cmd/validator/loader.go
Outdated
    x.Fatalf("No data files found in %s\n", ld.opt.DataFiles)
}

loadType := chunker.DataFormat(files[0], ld.opt.DataFormat)
Should we check here that all the files have the correct type?
dgraph/cmd/validator/loader.go
Outdated
if err == io.EOF {
    break
} else if err != nil {
    x.Check(err)
Will it make sense to continue here instead?
dgraph/cmd/validator/mapper.go
Outdated
for chunkBuf := range m.readerChunkCh {
    if err := chunker.Parse(chunkBuf.chunk); err != nil {
        m.foundError = true
Are we synchronizing access to foundError?
Reviewable status: 0 of 4 files reviewed, 11 unresolved discussions (waiting on @golangcibot, @mangalaman93, @manishrjain, and @pullrequest[bot])
dgraph/cmd/validator/mapper.go, line 29 at r2 (raw file):
Previously, pullrequest[bot] wrote…
Similar to the other comment regarding fmt vs log, I would potentially switch this to use log since it's being used concurrently.
Done.
dgraph/cmd/validator/mapper.go, line 28 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
Are we synchronizing access to foundError?
Done.
dgraph/cmd/bulk_validator/loader.go, line 56 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
printf: Println call has possible formatting directive %s (from govet)
Done.
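As a small illustration of the govet warning above: `fmt.Println` does not interpret formatting verbs, so a `%s` in its arguments is printed literally. The helper `describe` below is hypothetical; the message text mirrors the one quoted elsewhere in this review.

```go
package main

import "fmt"

// describe builds the message with Sprintf, so the %s verb is interpreted.
func describe(dir string) string {
	return fmt.Sprintf("No data files found in %s", dir)
}

func main() {
	// fmt.Println("No data files found in %s", "data/") would print the
	// literal "%s" followed by the argument; go vet flags that call.
	fmt.Println(describe("data/"))
}
```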
dgraph/cmd/validator/loader.go, line 61 at r2 (raw file):
Previously, pullrequest[bot] wrote…
Since some of the errors are being handled with x.Check, which appears to use log.Fatal, would it make sense to just use log.Fatalln instead of fmt.Println with an os.Exit, just so that all of the error output is using log?
Done.
dgraph/cmd/validator/loader.go, line 68 at r2 (raw file):
Previously, pullrequest[bot] wrote…
Could possibly add a newline onto the end of this printf to match the others
Done.
dgraph/cmd/validator/loader.go, line 75 at r2 (raw file):
Previously, pullrequest[bot] wrote…
Might simplify this to do:
go m.run(loadType, &mapperWg)
Although it actually looks like run() does everything except initializing the chunker on a separate goroutine, so it may not be necessary to do this on another goroutine at all?
Done.
dgraph/cmd/validator/loader.go, line 86 at r2 (raw file):
Previously, pullrequest[bot] wrote…
Using log here as well might be better since this is being done concurrently, as the log package locks around writing to std output, where fmt doesn't.
Done.
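A minimal sketch of the point about `log` and concurrency: a `log.Logger` serializes access to its writer, so it can be shared by goroutines even when the underlying writer (here a `bytes.Buffer`) is not safe for concurrent use. The function name `logConcurrently` and the message text are made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"strings"
	"sync"
)

// logConcurrently writes n messages from n goroutines through one logger
// and returns how many complete lines ended up in the buffer.
func logConcurrently(n int) int {
	var buf bytes.Buffer
	logger := log.New(&buf, "", 0) // the Logger guards writes to buf with a mutex
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			logger.Printf("error found in file %d", id)
		}(i)
	}
	wg.Wait()
	return strings.Count(buf.String(), "\n")
}

func main() {
	fmt.Println(logConcurrently(100))
}
```

Calling `fmt.Fprintf(&buf, ...)` from the same goroutines would be a data race on the buffer, since `fmt` adds no locking of its own.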
dgraph/cmd/validator/loader.go, line 92 at r2 (raw file):
Previously, pullrequest[bot] wrote…
It looks like x.Check does a log.Fatal, which I believe will actually exit without a panic, so this will skip your deferred functions above. Just wanted to check if that is okay? There is another usage below as well.
The deferred function cleanup() only closes the file. log.Fatal kills the program, so the file gets closed as well.
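For context, `log.Fatal` calls `os.Exit(1)`, and `os.Exit` does not run deferred functions. That is harmless for open file descriptors, which the operating system releases at exit, but it would skip deferred work such as flushing a buffered writer. One common alternative, sketched below with a hypothetical `process` function, is to return the error and let `main` decide to exit after the defers have run.

```go
package main

import (
	"errors"
	"fmt"
)

// process is a hypothetical stand-in for the chunking loop. It reports
// failures by returning an error, so its deferred cleanup always runs.
func process(fail bool, cleaned *bool) error {
	defer func() { *cleaned = true }() // stands in for closing the input file
	if fail {
		return errors.New("file format mismatch")
	}
	return nil
}

func main() {
	cleaned := false
	err := process(true, &cleaned)
	// A real main would call log.Fatal(err) here, after the defer has run.
	fmt.Println("error:", err, "cleanup ran:", cleaned)
}
```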
dgraph/cmd/validator/loader.go, line 12 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
Move this import together with other import.
Done.
dgraph/cmd/validator/loader.go, line 63 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
Should we check here that all the files have correct type?
It just takes the file type from the first input file. If the files are not all of the same type, the chunker will return an error while chunking.
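If an explicit up-front check were wanted instead, it could look roughly like the sketch below. Both `formatOf` and `checkSameFormat` are hypothetical helpers written for this example; `formatOf` simply looks at the file extension (ignoring a trailing `.gz`) and is not Dgraph's `chunker.DataFormat`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// formatOf is a hypothetical helper: it derives a format name from the
// file extension, ignoring a trailing ".gz".
func formatOf(name string) string {
	name = strings.TrimSuffix(name, ".gz")
	return strings.TrimPrefix(filepath.Ext(name), ".")
}

// checkSameFormat returns an error naming the first file whose format
// differs from that of files[0].
func checkSameFormat(files []string) error {
	if len(files) == 0 {
		return fmt.Errorf("no data files given")
	}
	want := formatOf(files[0])
	for _, f := range files[1:] {
		if got := formatOf(f); got != want {
			return fmt.Errorf("%s has format %q, expected %q", f, got, want)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkSameFormat([]string{"a.rdf", "b.rdf.gz"}))
	fmt.Println(checkSameFormat([]string{"a.rdf", "b.json"}))
}
```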
dgraph/cmd/validator/loader.go, line 99 at r3 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
Will it make sense to continue here instead?
I don't think so, because this error is likely to occur on a file format mismatch, i.e. when the chunker expects a different format.
Reviewed 1 of 4 files at r3.
Reviewable status: 1 of 4 files reviewed, 12 unresolved discussions (waiting on @animesh2049, @golangcibot, @manishrjain, and @pullrequest[bot])
dgraph/cmd/validator/mapper.go, line 29 at r4 (raw file):
for chunkBuf := range m.readerChunkCh {
    if err := chunker.Parse(chunkBuf.chunk); err != nil {
        atomic.CompareAndSwapUint32(&m.foundError, 0, 1)
I think it'd be okay to not print anything in case there is no error.
dgraph/cmd/validator/mapper.go, line 30 at r4 (raw file):
if err := chunker.Parse(chunkBuf.chunk); err != nil {
    atomic.CompareAndSwapUint32(&m.foundError, 0, 1)
    glog.Errorf("Error Found in file %s: %s\n", chunkBuf.filename, err)
found*
dgraph/cmd/validator/mapper.go, line 32 at r4 (raw file):
    glog.Errorf("Error Found in file %s: %s\n", chunkBuf.filename, err)
}
newline
dgraph/cmd/validator/mapper.go, line 36 at r4 (raw file):
}()
go func() {
comment
dgraph/cmd/validator/loader.go, line 68 at r2 (raw file):
Previously, animesh2049 (Animesh Chandra Pathak) wrote…
Done.
same here
dgraph/cmd/validator/run.go, line 59 at r4 (raw file):
}
loader := newLoader(opt)
no need for assignment.
This PR is unnecessarily complex. Instead of trying to copy the bulk loader structure, think about what is the minimum you need to make things work nicely. Needs refactoring.
Reviewable status: 1 of 4 files reviewed, 19 unresolved discussions (waiting on @animesh2049, @golangcibot, @manishrjain, and @pullrequest[bot])
dgraph/cmd/validator/mapper.go, line 1 at r5 (raw file):
package validator
License?
dgraph/cmd/validator/mapper.go, line 11 at r5 (raw file):
)
type mapper struct {
You don't need this object.
dgraph/cmd/validator/mapper.go, line 21 at r5 (raw file):
}
func (m *mapper) run(inputFormat chunker.InputFormat, wg *sync.WaitGroup) {
You only need this function.
dgraph/cmd/validator/loader.go, line 1 at r5 (raw file):
package validator
License?
dgraph/cmd/validator/loader.go, line 18 at r5 (raw file):
TmpDir        string
NumGoroutines int
CleanupTmp    bool
Do you need these options?
dgraph/cmd/validator/loader.go, line 33 at r5 (raw file):
}
type loader struct {
Probably don't need the loader object either.
dgraph/cmd/validator/loader.go, line 35 at r5 (raw file):
type loader struct {
    *state
    mappers []*mapper
Don't need these mapper objects.
dgraph/cmd/validator/loader.go, line 88 at r5 (raw file):
for {
    chunkBuf, err := chunker.Chunk(r)
    if chunkBuf != nil && chunkBuf.Len() > 0 {
This should happen after err has been handled, right?
dgraph/cmd/validator/run.go, line 1 at r5 (raw file):
package validator
License?
dgraph/cmd/validator/run.go, line 20 at r5 (raw file):
func init() {
    Validator.Cmd = &cobra.Command{
        Use: "validator",
validate
dgraph/cmd/validator/run.go, line 21 at r5 (raw file):
Validator.Cmd = &cobra.Command{
    Use:   "validator",
    Short: "Validate input file",
files
dgraph/cmd/validator/run.go, line 60 at r5 (raw file):
loader := newLoader(opt)
loader.mapStage()
Add a message at the end saying the validation is done or something.
Updates #3984