Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[O365] | Migrate to CEL input #6621

Merged
merged 30 commits into from Jul 19, 2023
Merged

Conversation

kcreddy
Copy link
Contributor

@kcreddy kcreddy commented Jun 20, 2023

What does this PR do?

Adds CEL input for fetching events from O365 Management API. Deprecates the existing o365audit input.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

Author's Checklist

  • [ ]

How to test this PR locally

  • Run elastic-package stack down && elastic-package build && elastic-package stack up -d -v && eval "$(elastic-package stack shellinit)" && elastic-package test system --generate -v
    Produces:
.......
2023/07/12 13:01:10 DEBUG found 0 hits in logs-o365.audit-ep data stream: index_not_found_exception: no such index [logs-o365.audit-ep] Status=404
2023/07/12 13:01:11 DEBUG found 0 hits in logs-o365.audit-ep data stream: index_not_found_exception: no such index [logs-o365.audit-ep] Status=404
2023/07/12 13:01:12 DEBUG found 0 hits in logs-o365.audit-ep data stream
2023/07/12 13:01:13 DEBUG found 27 hits in logs-o365.audit-ep data stream
........
........
2023/07/12 13:02:10 DEBUG found 0 hits in logs-o365.audit-ep data stream
2023/07/12 13:02:11 DEBUG found 0 hits in logs-o365.audit-ep data stream
2023/07/12 13:02:12 DEBUG found 0 hits in logs-o365.audit-ep data stream
2023/07/12 13:02:13 DEBUG found 0 hits in logs-o365.audit-ep data stream
2023/07/12 13:02:14 DEBUG found 6 hits in logs-o365.audit-ep data stream
.........
.........
--- Test results for package: o365 - START ---
╭─────────┬─────────────┬───────────┬───────────┬────────┬───────────────╮
│ PACKAGE │ DATA STREAM │ TEST TYPE │ TEST NAME │ RESULT │  TIME ELAPSED │
├─────────┼─────────────┼───────────┼───────────┼────────┼───────────────┤
│ o365    │ audit       │ system    │ o365audit │ PASS   │ 27.106824625s │
│ o365    │ audit       │ system    │ cel       │ PASS   │ 26.534162792s │
╰─────────┴─────────────┴───────────┴───────────┴────────┴───────────────╯
--- Test results for package: o365 - END   ---
Done

Related issues

Screenshots

@jamiehynds
Copy link

@kcreddy can you confirm the behaviour when a user of the current O365 integration updates to the new version that uses CEL? Will the switch to CEL be automatic without any user intervention or will httpjson be disabled and a user has re-enter their O365 API manually?

@kcreddy
Copy link
Contributor Author

kcreddy commented Jun 21, 2023

Hey @jamiehynds

Will the switch to CEL be automatic without any user intervention or will httpjson be disabled and a user has re-enter their O365 API manually?

Users would have to setup the new CEL input manually. The input parameters are pretty similar to what they have been using with O365 Audit input ie., Azure Client ID, Tenant ID, Secret, etc,. The package and datastream the events end up is also same. i.e., data_stream.dataset: o365.audit. In other words we are not creating a new package or a datastream.

The screenshot of setup looks like this:
Screenshot 2023-06-21 at 10 33 29 AM
Screenshot 2023-06-21 at 10 33 44 AM

can you confirm the behaviour when a user of the current O365 integration updates to the new version that uses CEL?

After getting suggestions from @andrewkroh, the behaviour would result in some duplicate events for users. That is because when the users upgrade the integration, the index rollover happens. So, even the existing fingerprint processor inside ingest pipeline on event's Id field would still create duplicates for last few days (up until 7 days back). We have the leverage to decide how far back we want to start pulling the events. We could also give the users ability to choose their start time. Reference
The behaviour still needs to be tested though.

@elasticmachine
Copy link

elasticmachine commented Jun 21, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-07-18T14:59:39.652+0000

  • Duration: 16 min 55 sec

Test stats 🧪

Test Results
Failed 0
Passed 25
Skipped 0
Total 25

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine
Copy link

elasticmachine commented Jun 21, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (1/1) 💚
Files 100.0% (1/1) 💚
Classes 100.0% (1/1) 💚
Methods 100.0% (16/16) 💚 39.024
Lines 80.994% (733/905) 👎 -18.865
Conditionals 100.0% (0/0) 💚

@jamiehynds
Copy link

@kcreddy is the user prompted to input the credentials once they've upgraded the integration? We need to minimise the chances of a user upgrade the package, saving it and not realising that ingestion has stopped until they enter their Client ID & Secret. If we enable the datastream by default, I presume they won't be able to save the integration and will have to populate the details before continuing? We should also make this obvious in the changelog too.

I'd suggest adjusting the description of the datastreams too:
Old datastream using O365 input: "Collect Office 365 Audit Logs - Deprecated"
New datastream using CEL: "Collect Office 365 Audit Logs via Graph API" (assuming we're using the Graph API?)

@kcreddy
Copy link
Contributor Author

kcreddy commented Jul 11, 2023

Hey @jamiehynds,
Below is how the upgrade scenario is going to look like for the user:

O365-upgrade.1.mp4

Basically, the user will have to upgrade the integration policy too once the integration is upgraded to latest. Once that integration policy is updated, and since the new CEL input is enabled by default, the user will have to provide details for the CEL input to start. I have updated the title and description in the previous o365audit input to Deprecated. I also updated the README docs with the steps to follow when upgrading the integration from < 1.17.0 to >= 1.17.0. Let me know if anything is missing or needs to be added/changed.

Also, during the tests, I noted that duplicates are not getting created. So, we are good there.

I didn't use the Graph API, since not all of the Office 365 and Azure AD activity events are available under Graph API. We are still using the Office 365 Management Activity API which is still actively supported.

@kcreddy
Copy link
Contributor Author

kcreddy commented Jul 12, 2023

/test

@kcreddy kcreddy marked this pull request as ready for review July 12, 2023 07:40
@kcreddy kcreddy requested a review from a team as a code owner July 12, 2023 07:40
@elasticmachine
Copy link

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@kcreddy kcreddy self-assigned this Jul 12, 2023
@jamiehynds
Copy link

@kcreddy thanks for putting the video together - really helpful.

When a user upgrades the the integration the 'upgrade integration policies' option is selected, but in this case the policy isn't updated. It's probably a bit misleading, but not sure we can solve for that.

After upgrading the integration, a user has to go to integration policies to enable the new input and then upgrade the policy. I worry that users may not inattentively know to go to integration policies after the integration upgrade.

I don't think people will know what CEL is. Could we adjust the wording of the new datastream to 'Collect Office 365 Audit Logs via Graph API' (or whatever API we're hitting)?

Am I correct in saying v1.17 of the integration will not include the deprecated input? So for new users using the O365 integration for the first time, will only have the option to use the CEL input, or will they also see the deprecated input too?

@jamiehynds
Copy link

Hey @jamiehynds, Below is how the upgrade scenario is going to look like for the user:

O365-upgrade.1.mp4
Basically, the user will have to upgrade the integration policy too once the integration is upgraded to latest. Once that integration policy is updated, and since the new CEL input is enabled by default, the user will have to provide details for the CEL input to start. I have updated the title and description in the previous o365audit input to Deprecated. I also updated the README docs with the steps to follow when upgrading the integration from < 1.17.0 to >= 1.17.0. Let me know if anything is missing or needs to be added/changed.

Also, during the tests, I noted that duplicates are not getting created. So, we are good there.

I didn't use the Graph API, since not all of the Office 365 and Azure AD activity events are available under Graph API. We are still using the Office 365 Management Activity API which is still actively supported.

@amitkanfer not sure if there's any improvements planned on the Fleet side, but this is a good example of a case where we have to make a signifiant change to an integration (changing an input) and difficult to inform users on the impact of the change. It's also a challenge to guide users on the steps to follow to ensure ingestion continues post-upgrade. In this case they have to disable the old data stream and re-enter their creds in a new data stream. While we try to keep these types of changes to a minimum, we find ourselves making them incases where a vendor deprecates an API or we have to switch inputs within packages. No action needed now, just an FYI.

@kcreddy
Copy link
Contributor Author

kcreddy commented Jul 17, 2023

Hey @jamiehynds
Update - the version is now 1.18.0. So, the old o365audit input is now at 1.17.0

When a user upgrades the the integration the 'upgrade integration policies' option is selected, but in this case the policy isn't updated. It's probably a bit misleading, but not sure we can solve for that.
After upgrading the integration, a user has to go to integration policies to enable the new input and then upgrade the policy. I worry that users may not inattentively know to go to integration policies after the integration upgrade.

That is in fact correct and is misleading the users to think that upgrade is successful without actually upgrading the integration policy. But I am not sure what we can do here rather than providing a README documentation. Maybe the Fleet UI could redirect the user automatically to integration policies tab when there is such Review field conflicts warning comes up.

I don't think people will know what CEL is. Could we adjust the wording of the new datastream to 'Collect Office 365 Audit Logs via Graph API' (or whatever API we're hitting)?

I updated the manifest.yml title to Collect Office 365 audit logs via the Management Activity API using CEL Input

Am I correct in saying v1.17 of the integration will not include the deprecated input? So for new users using the O365 integration for the first time, will only have the option to use the CEL input, or will they also see the deprecated input too?

New version v1.18 includes the deprecated input as well, but is disabled by default. So, new users also see the deprecated input as well. The CEL input is enabled by default. So, they need to configure the required parameters to save and make it run.


Also, I have now recorded a new video where I totally removed the deprecated input o365audit from the manifest.yml. Even now the users will have to manually upgrade the integration policy after upgrading the integration. But inside the policy UI, there is no deprecated input. It is bit cleaner, but the drawback could be that if something went wrong and users would like to continue reusing the old input, they would need to downgrade the integration version.

Here's the video of how it would look like without the deprecated input. Please let me know which of these you prefer, if the new integration version needs to contain or not contain the old deprecated input.

o365-upgrade-without-old-input.mp4

@jamiehynds
Copy link

Also, I have now recorded a new video where I totally removed the deprecated input o365audit from the manifest.yml. Even now the users will have to manually upgrade the integration policy after upgrading the integration. But inside the policy UI, there is no deprecated input. It is bit cleaner, but the drawback could be that if something went wrong and users would like to continue reusing the old input, they would need to downgrade the integration version.

Here's the video of how it would look like without the deprecated input. Please let me know which of these you prefer, if the new integration version needs to contain or not contain the old deprecated input.

@kcreddy while removing the deprecated input is certainly cleaner, I think given the popularity of the integration and the extent the the httpjson to CEL change, it makes more sense to have an escape hatch available, at least for a release or two as users transition to the upgraded integration. So I'd vote to keeping the deprecated input as fallback for now. WDYT @andrewkroh?

@andrewkroh
Copy link
Member

If the deprecated input remains for a release or two, then reverting to httpjson would be a matter or enabling httpjson and disabling cel via the existing toggle. The user would not need to enter any new config because the old config would be saved. Correct?

If the httpjson input was removed, what's the level of difficulty in downgrading the package version if something goes wrong?

@kcreddy
Copy link
Contributor Author

kcreddy commented Jul 18, 2023

If the deprecated input remains for a release or two, then reverting to httpjson would be a matter or enabling httpjson and disabling cel via the existing toggle. The user would not need to enter any new config because the old config would be saved. Correct?

The old input for this package is o365audit. But yes, I verified that the old config is being saved. Hence the user can just disabled CEL input and enable the old o365audit input.

If the httpjson input was removed, what's the level of difficulty in downgrading the package version if something goes wrong?

That would mean they would need to uninstall the integration and reinstall the older version. I found this tracking issue: elastic/kibana#131801. Currently there does not seem to be an easier way to downgrade the package versions in Fleet.

@andrewkroh
Copy link
Member

Because downgrading is complicated and the user would have lost their existing configuration for the o365audit input, I think keeping the input around for a brief transition period will help if there is some unknown issue.

Copy link
Contributor

@efd6 efd6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor issues only.

The CEL code issues can be a future optimisation if you prefer.

"content_created_at": { "temp" : list_contents_resp_body}.collate("temp.contentCreated").max(),
"next_page": (has(list_contents_resp.Header) && has(list_contents_resp.Header.NextPageUri) ) ? list_contents_resp.Header.NextPageUri[0] : (has(list_contents_resp.Header) && has(list_contents_resp.Header.Nextpageuri)) ? list_contents_resp.Header.Nextpageuri[0] : "",
// keep fetching more if (nextpageuri exists) or (max time returned date != today's date)
"want_more_content": ( (has(list_contents_resp.Header) && (has(list_contents_resp.Header.NextPageUri) && (list_contents_resp.Header.NextPageUri.size() > 0)) || (has(list_contents_resp.Header.Nextpageuri) && (list_contents_resp.Header.Nextpageuri.size() > 0)) ) || ( { "temp" : list_contents_resp_body}.collate("temp.contentCreated").max().split('T')[0] != now().format("2006-01-02") ) )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be made less expensive. The important thing here is that || and && are not short circuiting, so the collation is always evaluated, even when the first or second operands of the first two disjunctions are true. Normally, I don't worry about this, but in this case, collate is reasonably expensive and so we should avoid it if possible.

There's notes about this here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Dan for the suggestions. I will definitely take it as an enhancement.

Comment on lines +125 to +127
has(list_contents_resp.StatusCode)
&& list_contents_resp.StatusCode == 200
&& (list_contents_resp.Request.URL.parse_url().RawQuery.parse_query().endTime[0].split('T')[0] != now().format("2006-01-02"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do a similar thing here.

@kcreddy kcreddy merged commit f151ba8 into elastic:main Jul 19, 2023
4 checks passed
@elasticmachine
Copy link

Package o365 - 1.18.0 containing this change is available at https://epr.elastic.co/search?package=o365

gizas pushed a commit that referenced this pull request Sep 5, 2023
* Add DS

* Add inputs

* cel inp

* Update CEL program

* remove comments

* udpated prog

* upd

* upd

* update prog

* udpate state

* working

* update prog

* working default status

* update prog

* add want moer

* update system tests with pagination

* remove and add comments and lint

* update manifest with deprecated

* update readme

* explicit disable

* fix stack version and update readme

* update readme per suggestion

* address pr comments

* address pr comment - update variable

* change to now()

* refactor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Integration:o365
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[O365] Migrate to CEL input
5 participants