Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cmd/influx): influx cli can write CSV files #17398

Merged
merged 30 commits into from
Mar 30, 2020
Merged

Conversation

sranka
Copy link
Contributor

@sranka sranka commented Mar 24, 2020

Closes #17003 #17356

This PR

  • supports CSV files in 'influx write'
    • transformation of CSV data to line protocol is driven by CSV annotations
    • annotations described https://v2.docs.influxdata.com/v2.0/reference/syntax/annotated-csv/#annotations are all supported, so that flux query result can be written "as-is"
    • #datatype annotation is enhanced:
      • so you can mark non-field data with:
        • measurement, tag, time
        • ignore value in a column simply ignores the column
        • dateTime is either as a long (int64) number or RFC3339 format, this type is always used to represent time of measurement; or you can specify
          • dateTime:RFC3339 for RFC3339 time
          • dateTime:number to expect a long number
        • time value is also supported as an alias for the existing dateTime
      • a field data type can be used to let detect field's data type
      • default datatype for a column is =field= unless _field column is present (ignored then)
      • there can be at most 1 dateTime column
    • https://github.com/bonitoo-io/influxdb-csv-import/blob/master/README.md contains demonstrative CSV examples
  • adds new options to 'influx write":
  -f, --file string        The path to the file to import
      --format string      Input format, either lp (Line Protocol) or csv (Comma Separated Values). Defaults to lp unless '.csv' extension
  • a new "write dryrun" (i.e. 'write' command with 'dryrun' sub-command) can be used to validate and tune annotations in CSV files; parameters are the as in "write", but line protocol is written to stdout

  • keeps the existing functionality (writing line protocol data) without changes

    • the default input format is lp (line protocol) unless a file with ".csv" extension is supplied
  • CHANGELOG.md updated with a link to the PR (not the Issue)

  • Well-formatted commit messages

  • Rebased/mergeable

  • Tests pass

  • Documentation updated or issue created (provide link to issue/pr)

  • Signed CLA (if not already signed)

@sranka sranka changed the title feat: influx cli can write CSV files feat(cmd/influx): influx cli can write CSV files Mar 24, 2020
cmd/influx/main.go Outdated Show resolved Hide resolved
@sranka sranka requested a review from kelwang March 25, 2020 15:15
@kelwang kelwang requested a review from jsteenb2 March 25, 2020 15:20
@jsteenb2 jsteenb2 requested review from a team and sebito91 and removed request for a team March 25, 2020 15:24
cmd/influx/main.go Outdated Show resolved Hide resolved
cmd/influx/write.go Outdated Show resolved Hide resolved
@kelwang
Copy link
Contributor

kelwang commented Mar 25, 2020

with this pr, we can also close #17356

cmd/influx/write.go Outdated Show resolved Hide resolved
cmd/influx/write.go Outdated Show resolved Hide resolved
cmd/influx/write.go Outdated Show resolved Hide resolved
@sanderson
Copy link
Contributor

@sranka If using the linepart annotation in annotated CSV without the datatype annotations, should the time columns should always be an integer (not RFC3339) according to the precision the user specifies (just like line protocol) or will it convert RFC3339 timestamps into an integer for you?

@sanderson
Copy link
Contributor

@sranka Also, how do you handle string and boolean field values with the linepart annotation? Do you have to include a datatype annotation? Or is it smart enough to know?

@sranka
Copy link
Contributor Author

sranka commented Mar 26, 2020

@sanderson linepart annotation is now independent on datatype annotation, at least conceptually. linepart places column data in the line, datatype is just field data type.

Regarding the data type annotation:

  • if no data type is specified for a time column, it is detected from the data, so both number and RFC3339 string is supported, see https://github.com/influxdata/influxdb/blob/17003/writeFromCsv/write/csvTable.go#L349
  • if no data type is specified for a field column, the column data are copied to line protocol as-is
  • the datatype annotation for measurement and tag columns is ignored, the conversion to line protocol is given for these types

It might be simpler to introduce extra field, measurement or tag (and ignore) values in the datatype annotation so that the user could then supply a single annotation line (#datatype) to specify fields (with types) and non-field columns (measurement, time, tag and ignored). As I write this comment I tend to change the existing implementation this way and remove the linepart annotation. WDYT?

There is a wide area of enhancements that could be done regarding CSV, once this gets merged:

  • time formats may vary
  • header/annotation lines could be added from cli line without changing the CSV file
  • constant tags, field values or measurement name could be supplied as well
  • CSV with more time columns (such as https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases) can be supported OOTB
  • field types could be detected and enforced
  • better error reporting, a better indication of progress / success / failure
  • detection of data format without supplying a format flag
  • ...
    So this is just a first step to support CSV.

@russorat russorat requested a review from jsteenb2 March 27, 2020 18:15
Copy link
Contributor

@jsteenb2 jsteenb2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI looks good on my end. Would like to see @sebito91 or someone from @influxdata/storage-team take a gander at those bits

ctx = signals.WithStandardSignals(ctx)
if err := s.Write(ctx, orgID, bucketID, r); err != nil && err != context.Canceled {
return fmt.Errorf("failed to write data: %v", err)
}

return nil
}

func fluxWriteDryrunF(cmd *cobra.Command, args []string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent, CLI is looking good from my end. Thanks for running with feedback here 👍

@kelwang kelwang merged commit 5915fd3 into master Mar 30, 2020
@russorat russorat deleted the 17003/writeFromCsv branch March 30, 2020 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the ability to import an annotated Flux CSV response via the influx cli
5 participants