Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bigtable/cbt): cbt 'import' cmd to parse a .csv file and write to CBT #5072

Merged
merged 16 commits into from Nov 8, 2021

Conversation

@markduffett
Copy link
Contributor

@markduffett markduffett commented Nov 3, 2021

Add 'import' command to 'cbt' and unit/emulator tests

  • parses and imports .csv files only for now
  • should be accompanied/shortly followed by a tutorial + sample.csv
  • csv file formatting assumes a column family row is present unless flag 'column-family' is set, then it will use that family for all columns
  • configurable batch-size(500 rows) and worker threads(1) defaults

Runtime expectations:
2MB file with 12588 rows

  • 1s with 20 workers, 5s with the default 1

190MB file in 4s @ 47.5MB/s when using 'batch-size=5' 'workers=20'

  • 50%+ of 1 Node/tabletservers' CPU
  • 1288 rows, 1000 cols, 5kB data populated in 3% of cells
  • 200MB/s+ depending on mutation size and number of CBT nodes
@google-cla google-cla bot added the cla: yes label Nov 3, 2021
@crwilcox crwilcox changed the title cbt 'import' cmd to parse a .csv file and write to CBT feat(bigtable/cbt): cbt 'import' cmd to parse a .csv file and write to CBT Nov 3, 2021
@markduffett markduffett force-pushed the csv-to-cbt-importer branch 4 times, most recently from 9d8f2d3 to f767730 Nov 3, 2021
bigtable/cmd/cbt/cbt_test.go Outdated Show resolved Hide resolved
@crwilcox
Copy link
Collaborator

@crwilcox crwilcox commented Nov 4, 2021

Looks like another linter error showed up.

bigtable/cmd/cbt/cbt_test.go:384:10: if block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary)

Also, if you'd like, the script that is running is /internal/kokoro/vet.sh. You can get pretty close by running golint ./..., gofmt, goimports, though there are a few linter instances where we ignore failures (linting was added later)

@markduffett markduffett force-pushed the csv-to-cbt-importer branch from df22379 to 77b3f18 Nov 4, 2021
@markduffett markduffett force-pushed the csv-to-cbt-importer branch from 77b3f18 to e70df18 Nov 4, 2021
@markduffett markduffett requested a review from crwilcox Nov 4, 2021
bigtable/cmd/cbt/cbt.go Show resolved Hide resolved
bigtable/cmd/cbt/cbt.go Outdated Show resolved Hide resolved
bigtable/cmd/cbt/cbt_test.go Outdated Show resolved Hide resolved
bigtable/cmd/cbt/cbt_test.go Outdated Show resolved Hide resolved
bigtable/cmd/cbt/cbt_test.go Show resolved Hide resolved
@markduffett markduffett requested a review from crwilcox Nov 5, 2021
if !tc.fail && err != nil {
t.Errorf("parseCsvHeaders() failed. input:%+v, error:%s", tc, err)
continue
}
if tc.fail && err == nil {
t.Errorf("parseImportArgs() did not fail. input:%+v, error:%s", tc, err)
continue
}
if tc.fail {
continue
Copy link
Collaborator

@crwilcox crwilcox Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think more typically the tests just catch this via an OR or two separate conditionals, collapsing this a bit? I may be missing nuance here

Copy link
Contributor Author

@markduffett markduffett Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case would allow continuation since the fail is expected, but we don't want to do any comparisons below. It can't be paired with any of the above because they add errors were as this is expected.

I can put the if err == nil {t.Error} inside the tc.fail block, just wasn't sure if go-style preferred the nested ifs.

if tc.fail {
    if err == nil {
	t.Errorf("parseImportArgs() did not fail. input:%+v, error:%s", tc, err)
    }
    continue
}

Copy link
Collaborator

@crwilcox crwilcox Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this to me seems like we are trying to check if fail end err match. I don't see why they would have different messages. Also one of these has 'parseImportArgs' which I assume is leftover from a copy paste?

Are we in essence checking this

if tc.fail != (err != nil)

Alternatively, rather than storing tc.fail as a bool, store the err string to do a deeper check: possibly kind of like https://github.com/googleapis/google-cloud-go/blob/master/datastore/datastore_test.go#L1987

Copy link
Contributor Author

@markduffett markduffett Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to error matching pattern.

@crwilcox crwilcox merged commit 5a2ed6b into googleapis:master Nov 8, 2021
4 checks passed
telpirion pushed a commit that referenced this issue Jan 10, 2022
…o CBT (#5072)

Co-authored-by: Christopher Wilcox <crwilcox@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants