[ML] Process delimited files like semi-structured text #56038

Merged
merged 3 commits into from
Jan 28, 2020
Conversation

droberts195
Contributor

Summary

Changes the file upload functionality to process delimited
files by splitting them into messages, then sending each
message to the ingest pipeline as a single field for further
processing in Elasticsearch.

The csv_importer has been removed and the old sst_importer
replaced with a similar message_importer that has been
enhanced to cover the edge cases required by delimited
file processing.
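
As an illustration of the edge case mentioned above (a newline inside a quoted field value), here is a minimal sketch of how text could be split into messages. It is not the actual message_importer code: the function name `splitIntoMessages` and its parameters are hypothetical, and the multiline start pattern is assumed to be supplied by the caller as a regex string.

```javascript
// Hypothetical sketch, not the real message_importer implementation.
// Splits raw text into { message } objects. A newline only ends a message
// if the following line matches the multiline start pattern; otherwise the
// newline is assumed to sit inside a field value and is kept in the message.
function splitIntoMessages(text, multilineStartPattern = null) {
  const startRegex =
    multilineStartPattern === null ? null : new RegExp(multilineStartPattern);
  const messages = [];
  const lines = text.split('\n');

  // Text ending in a newline yields a trailing empty element; drop it.
  if (lines[lines.length - 1] === '') {
    lines.pop();
  }

  let message = '';
  for (let i = 0; i < lines.length; i++) {
    message += lines[i];
    const isLastLine = i === lines.length - 1;
    if (isLastLine || startRegex === null || startRegex.test(lines[i + 1])) {
      // Legitimate end of line: strip a trailing carriage return and emit.
      messages.push({ message: message.replace(/\r$/, '') });
      message = '';
    } else {
      // Newline inside a field value: keep it and continue with the next line.
      message += '\n';
    }
  }
  return messages;
}

// Example call: the second record contains an embedded newline.
const sample = 'id,note\n1,"first\nsecond"\r\n2,plain\n';
const result = splitIntoMessages(sample, '^\\d+,');
```

With no pattern supplied, every newline ends a message, which matches plain line-per-record input.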

Previously the file upload functionality parsed CSV in the
browser. Parsing CSV in the ingest pipeline instead makes
the Kibana file upload functionality more easily
interchangeable with Filebeat, so the configurations it
creates can more easily be reused to import data with the
same structure repeatedly in production.
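
For context, an ingest pipeline that parses such a message field might look roughly like the sketch below. This is an assumption for illustration, not the pipeline that the file structure finder actually generates: the helper name `buildDelimitedPipeline` and the column names are hypothetical, while `csv` and `remove` are existing Elasticsearch ingest processors.

```javascript
// Hypothetical helper: builds an ingest pipeline definition that parses
// a delimited line held in the "message" field into named target fields.
function buildDelimitedPipeline(columnNames, separator = ',') {
  return {
    description: 'Parse delimited messages uploaded as a single field',
    processors: [
      {
        // The csv ingest processor splits one field into several.
        csv: {
          field: 'message',
          target_fields: columnNames,
          separator,
          ignore_missing: false,
        },
      },
      // Drop the raw line once it has been parsed (an illustrative choice).
      { remove: { field: 'message' } },
    ],
  };
}

// Example: column names here are made up for illustration.
const pipeline = buildDelimitedPipeline(['id', 'note']);
console.log(JSON.stringify(pipeline, null, 2));
```

Because the parsing lives in the pipeline rather than in the browser, the same definition could in principle be reused by any client that sends one line per `message` field.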

Companion to elastic/elasticsearch#51492

Checklist

Use strikethroughs to remove checklist items you don't feel are applicable to this PR.

- [ ] This was checked for cross-browser compatibility, including a check against IE11
- [ ] Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
- [ ] Documentation was added for features that require explanation or tutorials
- [ ] Unit or functional tests were updated or added to match the most common scenarios
- [ ] This was checked for keyboard-only and screenreader accessibility

For maintainers

@droberts195 droberts195 added release_note:enhancement enhancement New value added to drive a business result :ml v8.0.0 Feature:File and Index Data Viz ML file and index data visualizer v7.7.0 labels Jan 27, 2020
@droberts195 droberts195 requested a review from a team as a code owner January 27, 2020 17:32
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

// multiline_start_pattern regex
// if it does, it is a legitimate end of line and can be pushed into the list,
// if not, it must be a newline char inside a field value, so keep looking.
async read(text) {
Contributor

It doesn't seem like the method needs to be async.

Member

This is a leftover from the original CSV parsing library, which needed to be async.
It can go, but you'll also have to change the read method in ndjson_importer.js and remove the await from line 210 of import_view.js.

Comment on lines 62 to 68
if (this.multilineStartRegex === null || line.match(this.multilineStartRegex) !== null) {
  message = message.replace(/\r$/, '');
  data.push({ message });
  message = '';
} else {
  message += '\n';
}
Contributor

This looks the same as lines 39-50; it might deserve a small dedicated method.

Member

@jgowdyelastic jgowdyelastic left a comment

LGTM

@droberts195
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@droberts195 droberts195 merged commit 9fcbeb3 into elastic:master Jan 28, 2020
@droberts195 droberts195 deleted the use_csv_processor_in_file_structure_finder_ingest branch January 28, 2020 14:16
droberts195 added a commit that referenced this pull request Jan 28, 2020
gmmorris added a commit to gmmorris/kibana that referenced this pull request Jan 28, 2020
* master: (21 commits)
  [SIEM][Detection Engine] critical blocker updates to latest ECS version
  [Monitoring] Fix inaccuracies in logstash pipeline listing metrics (elastic#55868)
  Resetting errors and removing duplicates (elastic#56054)
  Add flag to opt out from sub url tracking (elastic#55672)
  [SIEM][Detection Engine] critical bug, fixes duplicate tags
  [ML] Anomaly Detection: Fix persist/restore of refreshInterval in globalState. (elastic#56113)
  [ML] Single Metric Viewer: Fix annnotations refresh. (elastic#56107)
  adapt ObjectToConfigAdapter.getFlattenedPaths to consider arrays as final values (elastic#56105)
  Add Appender.receiveAllLevels option to fix LegacyAppender (elastic#55752)
  [ML] Process delimited files like semi-structured text (elastic#56038)
  Charts plugin (combining ui/color_maps and EuiUtils) (elastic#55469)
  fix tutorial documentation (elastic#55996)
  [ML] Fix persist/restore of time/refreshInterval in data visualizer. (elastic#56122)
  [Index Management] Fix errors with validation (elastic#56072)
  [Index Management] Add try/catch when parsing index filter from URI (elastic#56051)
  [NP] add HTTP resources testing strategies (elastic#54908)
  [ML] Single Metric Viewer: Fix brush update on short recent timespans. (elastic#56125)
  [Uptime] Add timeout for slow process to skipped functional tests (elastic#56065)
  refactor (elastic#56121)
  Move tests in dashboard into appropriate folders (elastic#55304)
  ...
5 participants