Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Improve CSV header row detection in find_file_structure #45099

Merged

Conversation

droberts195
Copy link
Contributor

When doing a fieldwise Levenshtein distance comparison
between CSV rows, this change ignores all fields that
have long values, not just the longest field.

This approach works better for CSV formats that have
multiple freeform text fields rather than just a single
"message" field.

Fixes #45047

When doing a fieldwise Levenshtein distance comparison
between CSV rows, this change ignores all fields that
have long values, not just the longest field.

This approach works better for CSV formats that have
multiple freeform text fields rather than just a single
"message" field.

Fixes elastic#45047
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@benwtrent benwtrent self-requested a review August 1, 2019 15:53
@droberts195 droberts195 merged commit 7c43894 into elastic:master Aug 1, 2019
@droberts195 droberts195 deleted the improve_csv_header_detection branch August 1, 2019 19:08
droberts195 added a commit that referenced this pull request Aug 2, 2019
When doing a fieldwise Levenshtein distance comparison
between CSV rows, this change ignores all fields that
have long values, not just the longest field.

This approach works better for CSV formats that have
multiple freeform text fields rather than just a single
"message" field.

Fixes #45047
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ML] find_file_structure not detecting CSV header with many long and highly variable field values
4 participants