packages/windows/data_stream/powershell_operations: don't split tokens on hyphen #1931

Merged
merged 1 commit into from
Nov 16, 2021

Conversation

efd6
Contributor

@efd6 efd6 commented Oct 17, 2021

What does this PR do?

This change replaces the simple tokenizer with a custom tokenizer that splits on non-word characters but does not treat the hyphen as a separator.
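
As a rough illustration of the intended tokenization, here is a minimal Python sketch. It is not the Elasticsearch implementation: `pattern_tokens` is a hypothetical helper, and because Python's `re` has no `&&` class intersection, the pattern from this PR, `[\W&&[^-]]+`, is rewritten as the equivalent `[^\w-]+`. The `re.ASCII` flag is an assumption made to mirror Java's default ASCII-only `\W`.

```python
import re

# Python's `re` has no `&&` class intersection, so the Java pattern
# [\W&&[^-]]+ (non-word characters except the hyphen) is written as [^\w-]+.
# re.ASCII is an assumption: it mirrors Java's default ASCII-only \W.
SEPARATORS = re.compile(r"[^\w-]+", re.ASCII)


def pattern_tokens(text: str) -> list[str]:
    """Lowercase the text, split on separators, and drop empty strings."""
    return [tok for tok in SEPARATORS.split(text.lower()) if tok]


print(pattern_tokens('Invoke-WebRequest -Uri "https://aka.ms/pscore6-docs"'))
# ['invoke-webrequest', '-uri', 'https', 'aka', 'ms', 'pscore6-docs']
```

The lowercasing step reflects the pattern analyzer's default behaviour; hyphenated cmdlet names and `-Parameter` switches survive as single tokens.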

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • If I'm introducing a new feature, I have modified the Kibana version constraint in my package's manifest.yml file to point to the latest Elastic stack release (e.g. ^7.13.0).

Author's Checklist

  • Confirm that this generates the correct output template.

How to test this PR locally

Related issues

Screenshots

@efd6 efd6 added bug Something isn't working, use only for issues Team:Security-External Integrations 7.16-candidate labels Oct 17, 2021
@elasticmachine

elasticmachine commented Oct 17, 2021

💚 Build Succeeded

Build stats

  • Start Time: 2021-11-16T00:52:11.679+0000

  • Duration: 19 min 33 sec

  • Commit: d8cd9c1

Test stats 🧪

Test Results
Failed 0
Passed 126
Skipped 0
Total 126

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@@ -105,6 +105,10 @@
example: "50d2dbda-7361-4926-a94d-d9eadfdb43fa"
- name: script_block_text
type: text
analyzer:
Contributor

I guess we don't support analyzers defined in fields files:

kibana_1 | {"type":"log","@timestamp":"2021-10-18T00:00:08+00:00","tags":["error","plugins","fleet"],"pid":1228,"message":"Error: Error installing windows 1.2.4: illegal_argument_exception: [illegal_argument_exception] Reason: composable template [logs-windows.powershell_operational] template after composition with component templates [logs-windows.powershell_operational@custom, .fleet_component_template-1] is invalid\n    at ensureInstalledPackage (/usr/share/kibana/x-pack/plugins/fleet/server/services/epm/packages/install.js:193:11)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (internal/process/task_queues.js:95:5)\n    at async Promise.all (index 0)\n    at PackagePolicyService.create (/usr/share/kibana/x-pack/plugins/fleet/server/services/package_policy.js:133:33)\n    at createPackagePolicyHandler (/usr/share/kibana/x-pack/plugins/fleet/server/routes/package_policy/handlers.js:109:27)\n    at Router.handle (/usr/share/kibana/src/core/server/http/router/router.js:163:30)\n    at handler (/usr/share/kibana/src/core/server/http/router/router.js:124:50)\n    at exports.Manager.execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/toolkit.js:60:28)\n    at Object.internals.handler (/usr/share/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20)\n    at exports.execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20)\n    at Request._lifecycle (/usr/share/kibana/node_modules/@hapi/hapi/lib/request.js:370:32)\n    at Request._execute (/usr/share/kibana/node_modules/@hapi/hapi/lib/request.js:279:9)"}

Did you try using the ingest pipeline to approach this problem?

Contributor Author

@efd6 efd6 Oct 18, 2021

I haven't. I thought a keyword analyzer here, with lowercasing and splitting on this pattern in the ingest pipeline, would work, but it does not.

Contributor

@ruflin Do you think we need to support analyzers or is there any workaround available?

Member

@andrewkroh andrewkroh Oct 19, 2021

Analyzers do appear to work, given that we allow settings to be declared in the data stream manifest.yml (source: https://github.com/elastic/package-spec/blob/a0687c0dc7a9da3fc540bdc7c5df2d7d84ae6713/versions/1/data_stream/manifest.spec.yml#L174-L176).

I tested this and it passed the system test.

diff --git a/packages/windows/data_stream/powershell_operational/fields/fields.yml b/packages/windows/data_stream/powershell_operational/fields/fields.yml
index 2049ba44..ae35dff3 100644
--- a/packages/windows/data_stream/powershell_operational/fields/fields.yml
+++ b/packages/windows/data_stream/powershell_operational/fields/fields.yml
@@ -105,10 +105,7 @@
       example: "50d2dbda-7361-4926-a94d-d9eadfdb43fa"
     - name: script_block_text
       type: text
-      analyzer:
-        powershell:
-          type: pattern
-          pattern: "[\\W&&[^-]]+"
+      analyzer: powershell_script_analyzer
       description: >
         Text of the executed script block.
 
diff --git a/packages/windows/data_stream/powershell_operational/manifest.yml b/packages/windows/data_stream/powershell_operational/manifest.yml
index 08b887b3..8eca400c 100644
--- a/packages/windows/data_stream/powershell_operational/manifest.yml
+++ b/packages/windows/data_stream/powershell_operational/manifest.yml
@@ -1,5 +1,13 @@
 type: logs
 title: Windows Powershell/Operational logs
+elasticsearch:
+  index_template:
+    settings:
+      analysis:
+        analyzer:
+          powershell_script_analyzer:
+            type: pattern
+            pattern: '[\W&&[^-]]+'
 streams:
   - input: winlog
     template_path: winlog.yml.hbs

The logs-windows.powershell_operational@settings component template is created as

[Screenshot: Screen Shot 2021-10-19 at 2 23 41 PM]

Contributor

Spec issue created: elastic/package-spec#238

@efd6 efd6 marked this pull request as ready for review October 20, 2021 06:32
@elasticmachine

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@andrewkroh
Member

I think we would want the same analyzer applied to these as well. Right?

@efd6
Contributor Author

efd6 commented Oct 21, 2021

The other question is whether the search analyzer should also be provided.

@andrewkroh
Member

andrewkroh commented Oct 26, 2021

> The other question is whether the search analyzer should also be provided.

I think it does need a search_analyzer. Otherwise a search term like Invoke-WebRequest will get split up and not match what has been indexed.

GET /.ds-logs-windows.powershell_operational-ep-2021.10.26-000001/_analyze
{
  "field" : "powershell.file.script_block_text",
  "text" : "(Invoke-WebRequest -Uri \"https://aka.ms/pscore6-docs\").Links.Href"
}
{
  "tokens" : [
    {
      "token" : "invoke-webrequest",
      "start_offset" : 1,
      "end_offset" : 18,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "-uri",
      "start_offset" : 19,
      "end_offset" : 23,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "https",
      "start_offset" : 25,
      "end_offset" : 30,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "aka",
      "start_offset" : 33,
      "end_offset" : 36,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "ms",
      "start_offset" : 37,
      "end_offset" : 39,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "pscore6-docs",
      "start_offset" : 40,
      "end_offset" : 52,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "links",
      "start_offset" : 55,
      "end_offset" : 60,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : "href",
      "start_offset" : 61,
      "end_offset" : 65,
      "type" : "word",
      "position" : 7
    }
  ]
}
GET /_analyze
{
  "text" : "Invoke-WebRequest"
}
{
  "tokens" : [
    {
      "token" : "invoke",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "webrequest",
      "start_offset" : 7,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
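
The mismatch shown by the two `_analyze` calls above can be sketched in Python. This is only an approximation under stated assumptions: `index_tokens` and `default_search_tokens` are hypothetical helpers, not Elasticsearch APIs, and the second one stands in for the default analyzer only to the extent that it also splits on the hyphen (the real standard analyzer differs on other input).

```python
import re


def index_tokens(text):
    """Hypothetical stand-in for the hyphen-preserving pattern analyzer."""
    return [t for t in re.split(r"[^\w-]+", text.lower(), flags=re.ASCII) if t]


def default_search_tokens(text):
    """Hypothetical stand-in for a search analyzer that also splits on hyphens."""
    return [t for t in re.split(r"\W+", text.lower(), flags=re.ASCII) if t]


script = '(Invoke-WebRequest -Uri "https://aka.ms/pscore6-docs").Links.Href'
indexed = set(index_tokens(script))
query = "Invoke-WebRequest"

# Query terms produced by the hyphen-splitting analyzer are absent from the index.
missing = [t for t in default_search_tokens(query) if t not in indexed]
# Analyzing the query the same way as the indexed text yields a matching term.
matched_same = all(t in indexed for t in index_tokens(query))

print(missing)       # ['invoke', 'webrequest']
print(matched_same)  # True
```

Under these assumptions, the query only finds the indexed term when search-time analysis keeps the hyphen, which is the concern raised in the comment.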

…s on hyphen

Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
@efd6 efd6 merged commit 4390268 into elastic:master Nov 16, 2021
@efd6 efd6 deleted the windows/powershell branch November 16, 2021 01:51
eyalkraft pushed a commit to build-security/integrations that referenced this pull request Mar 30, 2022
…s on hyphen (elastic#1931)

Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
Labels
7.16-candidate bug Something isn't working, use only for issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hyphens ignored in script_block_text field - Powershell Logs
4 participants