Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ndt sandbox async flush cleanup #523

Merged
merged 8 commits into from Jun 15, 2018

Conversation

gfr10598
Copy link
Contributor

@gfr10598 gfr10598 commented Jun 14, 2018

This PR:

  1. Adapts BQInserter.Flush to FlushSlice, and adds support for asynchronous flushing.
  2. Adds FlushAsync() which executes the BQ flush in a goroutine.
  3. Add BQInserter.AddRow(), which adds a row without flushing.
  4. Updates SS parser to use AddRow(), and FlushAsync()
  5. Updates and improves unit tests.
  6. Tweaks organization of BQInserter fields.
  7. Tweaks parameters in test case BQInserters.

This change is Reviewable

@gfr10598 gfr10598 force-pushed the ndt-sandbox-async-flush-cleanup branch from 6f1ac85 to 827738d Compare June 14, 2018 20:17
@coveralls
Copy link
Collaborator

coveralls commented Jun 14, 2018

Pull Request Test Coverage Report for Build 3303

  • 52 of 78 (66.67%) changed or added relevant lines in 2 files are covered.
  • 16 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.02%) to 67.78%

Changes Missing Coverage Covered Lines Changed/Added Lines %
parser/ss.go 2 6 33.33%
bq/insert.go 50 72 69.44%
Files with Coverage Reduction New Missed Lines %
bq/insert.go 16 60.98%
Totals Coverage Status
Change from base Build 3289: -0.02%
Covered Lines: 2049
Relevant Lines: 3023

💛 - Coveralls

@stephen-soltesz
Copy link
Contributor

Many of my questions are clarifying. Some are not.


Review status: 0 of 1 LGTMs obtained


bq/bq_test.go, line 199 at r2 (raw file):

	fakeUploader := fake.NewFakeUploader()
	in, e := bq.NewBQInserter(
		etl.InserterParams{"mlab-testing", "dataset", "test2", "", 5, 10 * time.Second, 1 * time.Second},

This struct is getting complex enough that the absence of field names obscures the readability. I looked at the version of InserterParams at HEAD to figure out what which field was the BufferSize. And, then discovered that this PR changes the order. Explicit field name declarations make this more resilient to struct changes and easier to interpret.


bq/insert.go, line 48 at r2 (raw file):

	// MaxPutRetryDelay is one minute.  So aggregate delay will be around 2 minutes.
	// Beyond this point, we likely have a serious problem so no point in continuing.
	MaxPutRetryDelay = time.Minute

It's fine if so, but just checking -- do these need to be public?


bq/insert.go, line 153 at r2 (raw file):

	// Those accesses are protected by the token.
	// We use a token instead of a mutex, because it is acquired in FlushAsync,
	// but released by the flusher goroutine.

I don't understand why mutex is unsuitable for this use case. What am I missing?


bq/insert.go, line 230 at r2 (raw file):

}

func (in *BQInserter) updateMetrics(tableBase string, err error) error {

Why add tableBase as a parameter? Is this approach used consistently within this file?


etl/etl.go, line 35 at r2 (raw file):

	InsertRow(data interface{}) error
	// InsertRows inserts multiple rows into the insert buffer.
	// Deprecated:

Deprecated in favor of what?


parser/ss.go, line 272 at r2 (raw file):

		if err == etl.ErrBufferFull {
			ss.inserter.FlushAsync()
			err = ss.inserter.InsertRow(ssTest)

Why not AddRow?


Comments from Reviewable

@gfr10598
Copy link
Contributor Author

PTAL


Review status: 0 of 1 LGTMs obtained


bq/bq_test.go, line 199 at r2 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

This struct is getting complex enough that the absence of field names obscures the readability. I looked at the version of InserterParams at HEAD to figure out what which field was the BufferSize. And, then discovered that this PR changes the order. Explicit field name declarations make this more resilient to struct changes and easier to interpret.

Ok. I'm actually conflicted about named params, because it allows you to accidentally default some fields, which has caused bugs for me several times, particularly after adding a field. I'll update these, though. Actually, I'll use a small helper function, too.


bq/insert.go, line 48 at r2 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

It's fine if so, but just checking -- do these need to be public?

Nope. Just a bad habit of mine, left over from using all caps for constants in C++.


bq/insert.go, line 153 at r2 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

I don't understand why mutex is unsuitable for this use case. What am I missing?

Hmm. I'm accustomed to mutex restrictions (or maybe conventions) requiring the same thread to lock and unlock the mutex. That is perhaps not true in golang, but it still seems natural to me, so I'm reluctant to violate that.

Actually, this discussion is interesting: golang/go#9201


bq/insert.go, line 230 at r2 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

Why add tableBase as a parameter? Is this approach used consistently within this file?

Oops. That was required in an earlier draft, but is no longer. I'll restore it.


etl/etl.go, line 35 at r2 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

Deprecated in favor of what?

Added comment. Thanks for pointing out.


parser/ss.go, line 272 at r2 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

Why not AddRow?

Because I missed this. 8-( Too bad the deprecation warnings aren't more prominent. Actually, I'm not even sure what tool produces them.

Hmm - a new tool:
go get -u honnef.co/go/tools/cmd/staticcheck
staticcheck ./...
go get -u honnef.co/go/tools/cmd/megacheck
megacheck ./...

It doesn't warn about deprecated tags, but does generate some other useful warning we should probably clean up.


Comments from Reviewable

@stephen-soltesz
Copy link
Contributor

:lgtm:


Review status: :shipit: complete! 1 of 1 LGTMs obtained


bq/bq_test.go, line 199 at r2 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…

Ok. I'm actually conflicted about named params, because it allows you to accidentally default some fields, which has caused bugs for me several times, particularly after adding a field. I'll update these, though. Actually, I'll use a small helper function, too.

In this case, I would strive for the following balance.

My understanding is that Go encourages having meaningful zero values. If zero values have caused bugs in the past, I would look to the logic that depends on structs and consider how zero values will be interpreted when first creating the code. I know that's easier to say than do consistently. As well, when structs are complex enough to require special handling to prevent zero values, maybe by assigning default values, in this case defining a New* function can solve that case. Which looks like what you've done. 👍 Thank you.


bq/insert.go, line 153 at r2 (raw file):

Previously, gfr10598 (Gregory Russell) wrote…

Hmm. I'm accustomed to mutex restrictions (or maybe conventions) requiring the same thread to lock and unlock the mutex. That is perhaps not true in golang, but it still seems natural to me, so I'm reluctant to violate that.

Actually, this discussion is interesting: golang/go#9201

Excellent. Carry on.


Comments from Reviewable

@gfr10598
Copy link
Contributor Author

Review status: :shipit: complete! 1 of 1 LGTMs obtained


bq/insert.go, line 153 at r2 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

Excellent. Carry on.

Done.


Comments from Reviewable

@gfr10598 gfr10598 merged commit 2b7fde1 into integration Jun 15, 2018
@gfr10598 gfr10598 deleted the ndt-sandbox-async-flush-cleanup branch June 15, 2018 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants