Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl]: Add functions for parsing CSV #30921

Closed
BinaryFissionGames opened this issue Jan 31, 2024 · 8 comments · Fixed by #31081
Closed

[pkg/ottl]: Add functions for parsing CSV #30921

BinaryFissionGames opened this issue Jan 31, 2024 · 8 comments · Fixed by #31081
Assignees
Labels
enhancement New feature or request pkg/ottl priority:p2 Medium processor/transform Transform processor

Comments

@BinaryFissionGames
Copy link
Contributor

Component(s)

pkg/ottl, processor/transform

Is your feature request related to a problem? Please describe.

Currently, there is no equivalent to stanza's CSV parsing operator in OTTL.

I would like to be able to parse CSV with OTTL, instead of using the logtransformprocessor which is planned to be eventually removed.

Describe the solution you'd like

Add 3 converter functions:

ParseCSV(target, header, delimiter, header_delimiter)
ParseCSVLazyQuotes(target, header, delimiter, header_delimiter)
ParseCSVIgnoreQuotes(target, header, delimiter, header_delimiter)

where:

  • target is a Getter that returns a string
  • header is a Getter that returns an array of string values (should accept either a string literal or a path expression)
  • delimiter is a string
  • and header_delimeter is a string

These functions all return a pcommon.Map struct that is the result of parsing the target string as CSV.

In terms of implementation, both ParseCSV and ParseCSVLazyQuotes will use encoding/csv to parse CSV. ParseCSVIgnoreQuotes will use strings.Split to parse. This is the same way that that the stanza package accomplishes CSV parsing.

The difference between the three functions is that:
ParseCSV will do typical CSV parsing
ParseCSVLazyQuotes will parse with the LazyQuotes option (see: https://pkg.go.dev/encoding/csv#Reader)
ParseCSVIgnoreQuotes will parse by completely ignoring quotes, just using strings.Split to split on the delimiter

Describe alternatives you've considered

ParseCSVIgnoreQuotes might be similar enough to Split that we could consider removing it.
What's nice about keeping this function is that it creates a map from the split, e.g.:

header: "Field1,Field2"
value: "Value 1,Value 2"

Turns into

Parsed: {"Field1": "Value 1", "Field2": "Value 2"}

And I don't think there's a currently a way to accomplish that column headers + row -< map operation currently, so I think it's still a valuable function.

Additional context

No response

@BinaryFissionGames BinaryFissionGames added enhancement New feature or request needs triage New item requiring triage labels Jan 31, 2024
@github-actions github-actions bot added pkg/ottl processor/transform Transform processor labels Jan 31, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@BinaryFissionGames
Copy link
Contributor Author

I'd like to take on adding these functions once we agree on design.

@TylerHelmuth
Copy link
Member

In this situation the data was not collected by the filelogreceiver correct?

I believe we could use Optional parameters to reduce the functionality to 1 function. I also think both delimiters could be Optional with a default value of ,

@TylerHelmuth TylerHelmuth added priority:p2 Medium and removed needs triage New item requiring triage labels Jan 31, 2024
@BinaryFissionGames
Copy link
Contributor Author

Yes, this would be in a situation where the log is received and is not able to be parsed with operators in one of the stanza-based operators (e.g. receiving direct OTLP data with the otlp receiver).

It's been a while since I've touched a lot of the OTTL stuff, I completely missed that there are optional parameters!

Would it make sense for the signature to look something like this?

ParseCSV(target, header, Optional[delimiter], Optional[header_delimiter], Optional[lazyQuotes], Optional[ignoreQuotes])

We could condense the lazyQuotes and ignoreQuotes options into a single mode enum (e.g. one of quotes, lazyQuotes, ignoreQuotes), since really they should be considered mutually exclusive. I think I'd probably prefer that:

ParseCSV(target, header, Optional[delimiter], Optional[header_delimiter], Optional[mode])

@TylerHelmuth
Copy link
Member

ParseCSV(target, header, Optional[delimiter], Optional[header_delimiter], Optional[mode])

This would be my preference. Annoyingly functions can't specify their own enums yet, a it would be a string, similar to strategy in merge_maps.

@TylerHelmuth
Copy link
Member

@evan-bradley please take a look at this proposal

@evan-bradley
Copy link
Contributor

This all sounds good to me.

@BinaryFissionGames
Copy link
Contributor Author

@TylerHelmuth @evan-bradley I've opened a PR for adding the ParseCSV converter, when either of you get a chance to look!

evan-bradley pushed a commit that referenced this issue Feb 16, 2024
**Description:**
* Adds a new ParseCSV converter that can parse CSV row strings.

**Link to tracking Issue:** Closes #30921

**Testing:**
* Unit tests
* Manually tested the examples with a local build of the collector

**Documentation:**
* Adds documentation for using the ParseCSV converter.
djaglowski pushed a commit to djaglowski/opentelemetry-collector-contrib that referenced this issue Feb 19, 2024
**Description:**
* Adds a new ParseCSV converter that can parse CSV row strings.

**Link to tracking Issue:** Closes open-telemetry#30921

**Testing:**
* Unit tests
* Manually tested the examples with a local build of the collector

**Documentation:**
* Adds documentation for using the ParseCSV converter.
djaglowski pushed a commit to djaglowski/opentelemetry-collector-contrib that referenced this issue Feb 19, 2024
**Description:**
* Adds a new ParseCSV converter that can parse CSV row strings.

**Link to tracking Issue:** Closes open-telemetry#30921

**Testing:**
* Unit tests
* Manually tested the examples with a local build of the collector

**Documentation:**
* Adds documentation for using the ParseCSV converter.
XinRanZhAWS pushed a commit to XinRanZhAWS/opentelemetry-collector-contrib that referenced this issue Mar 13, 2024
**Description:**
* Adds a new ParseCSV converter that can parse CSV row strings.

**Link to tracking Issue:** Closes open-telemetry#30921

**Testing:**
* Unit tests
* Manually tested the examples with a local build of the collector

**Documentation:**
* Adds documentation for using the ParseCSV converter.
nslaughter pushed a commit to nslaughter/opentelemetry-collector-contrib that referenced this issue May 23, 2024
**Description:**
* Adds a new ParseCSV converter that can parse CSV row strings.

**Link to tracking Issue:** Closes open-telemetry#30921

**Testing:**
* Unit tests
* Manually tested the examples with a local build of the collector

**Documentation:**
* Adds documentation for using the ParseCSV converter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request pkg/ottl priority:p2 Medium processor/transform Transform processor
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants