Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RAC] [RBAC] Adds bulk update route to rule registry and bulk update function to alerts client #106297

Merged
merged 9 commits into from Aug 9, 2021

Conversation

dhurley14
Copy link
Contributor

@dhurley14 dhurley14 commented Jul 20, 2021

Summary

Adds a bulk update route (POST /internal/rac/alerts/bulk_update) to the rule registry and bulkUpdate function to the alerts as data client.

To be used here: #107141

To test...
  1. Download apm server https://www.elastic.co/downloads/apm
  2. Create a sample application to send data to APM
// index.js
const apm = require('elastic-apm-node').start({
serviceName: 'demoapp',
  serverUrl: 'http://localhost:8200',
  // Set the service environment
  environment: 'production'
});

const express = require('express');
const app = express();
app.get('/', (req, res) => {
  throw Error('HELLO WORLD!!');
});

app.listen(3000, () => console.log('server running on 3000'));
  1. start elasticsearch, start kibana, start apm server (./apm-server -e) then start simple express server above
  2. create an apm error count rule (should generate alerts in .alerts-observability-apm
  3. execute x-pack/plugins/rule_registry/server/scripts/bulk_update_observability_alert_by_query.sh
  4. Should resolve with a 200

Checklist

Delete any items that are not applicable to this PR.

Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

Risk Probability Severity Mitigation/Notes
Multiple Spaces—unexpected behavior in non-default Kibana Space. Low High Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces.
Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. High Low Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure.
Code should gracefully handle cases when feature X or plugin Y are disabled. Medium High Unit tests will verify that any feature flag or plugin combination still results in our service operational.
See more potential risk examples

For maintainers

@peluja1012 peluja1012 mentioned this pull request Jul 20, 2021
13 tasks
@dhurley14 dhurley14 force-pushed the bulk_update_rbac_alerts branch 2 times, most recently from 2e993d2 to 8860188 Compare July 29, 2021 19:11
@dhurley14 dhurley14 changed the title wip undo me [RAC] [RBAC] Adds bulk update route to rule registry and bulk update function to alerts client Jul 29, 2021
@dhurley14 dhurley14 self-assigned this Jul 29, 2021
@dhurley14 dhurley14 marked this pull request as ready for review July 29, 2021 20:48
@dhurley14 dhurley14 requested review from a team as code owners July 29, 2021 20:48
@botelastic botelastic bot added the Team:APM All issues that need APM UI Team support label Jul 29, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/apm-ui (Team:apm)

@dhurley14
Copy link
Contributor Author

@elasticmachine merge upstream

Copy link
Contributor

@cauemarcondes cauemarcondes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dhurley14 dhurley14 requested a review from a team as a code owner July 30, 2021 21:30
@dhurley14
Copy link
Contributor Author

@elasticmachine merge upstream

Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alerting changes LGTM with a small suggestion.

@@ -285,7 +286,7 @@ export class AlertingAuthorization {
if (this.authorization && this.shouldCheckAuthorization()) {
const { username, authorizedRuleTypes } = await this.augmentRuleTypesWithAuthorization(
this.ruleTypeRegistry.list(),
[ReadOperations.Find],
[operation == null ? ReadOperations.Find : operation],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are changing the name to getAuthorizationFilter (which makes sense to me), I would make the parameter operation non-optional and not default to ReadOperations.Find. Maybe maintain backwards compatibility by creating a getFindAuthorizationFilter fn that calls this fn and passes in ReadOperations.Find?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome will do! Thanks for the review @ymao1

@@ -52,11 +55,26 @@ export interface UpdateOptions<Params extends AlertTypeParams> {
index: string;
}

export interface BulkUpdateOptions<Params extends AlertTypeParams> {
ids: string[] | undefined | null;
status: 'open' | 'closed';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know there's been a lot of discussion around what statuses can be. Haven't followed, but might want to double check that these are indeed the terms we're going with and maybe create the type for it to be shared?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alerts as data spreadsheet defines it as Must be one of {open, acknowledged, closed}. so I'll create an exported common type (maybe in the rule registry package where the technical terms are defined) and update that to include acknowledged

good catch!

Copy link
Contributor

@yctercero yctercero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Left some nits, but looks great. Thanks for all the work here!

});
return bulkUpdateResponse;
} catch (exc) {
this.logger.error(`error in fetchAlertAuthzAlertOperateAlert ${exc}`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be:

Suggested change
this.logger.error(`error in fetchAlertAuthzAlertOperateAlert ${exc}`);
this.logger.error(`error in fetchAlertAuditOperate ${exc}`);

return;
}

const config: EsQueryConfig = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to pull this out? Think we're using it in a couple places.


return alert;
return alert.hits.hits[0]._source;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a to do to remind us that we have to follow up to remove us off of _source?

@@ -128,12 +130,16 @@ describe('createLifecycleExecutor', () => {
{
fields: {
[ALERT_ID]: 'TEST_ALERT_0',
[OWNER]: 'CONSUMER',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update these to match so we no longer use OWNER and instead CONSUMER?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renaming this field should happen in another PR, but yes we do want to change OWNER to CONSUMER.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi: #107857

@dhurley14 dhurley14 enabled auto-merge (squash) August 3, 2021 15:50
@dhurley14 dhurley14 disabled auto-merge August 3, 2021 16:07
@jportner jportner self-requested a review August 3, 2021 18:56
Copy link
Contributor

@jportner jportner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For audit logging, we have two basic types of operations, write (update, delete, create) and read (get, find).

For the saved objects client, today we have these four paths:

Read operations success path:

  1. Check authorization
  2. Fetch object
  3. Add audit event success
  4. Return result

Read operations failure path:

  1. Check authorization
  2. Add audit event failure
  3. Return 403 error

Write operation success path:

  1. Check authorization
  2. Add audit event w/ outcome "unknown"
  3. Create/update/delete object
  4. Return result

Write operation failure path:

  1. Check authorization
  2. Add audit event w/ outcome "unknown"
  3. Create/update/delete object
  4. Return result

For consistency, I'd like for the alerts client to hew to these paths as much as possible.


It appears this PR is drafty/unfinished, which is fine, but it's very hard to review without unit tests in place. I went ahead and made a best effort at a first pass, but I'd like to ask that you add unit tests for the different positive/negative audit event scenarios. I would be happy to re-review at that point.

@dhurley14 dhurley14 requested a review from a team as a code owner August 3, 2021 22:50
@dhurley14 dhurley14 force-pushed the bulk_update_rbac_alerts branch 2 times, most recently from db2453b to 4dc9a29 Compare August 5, 2021 13:44
more WIP

I think this is working now

getting there..

working integration tests

fixes type check errors

fix jest

somehow this was deleted after rebasing...

fix early rejection error

ids array XOR query string in bulk update route

if ids, use elasticsearch bulk update method otherwise use update by query

fix tests

refresh true

fix tests

general cleanup

fixes apm jest test

fixes jest tests

code cleanup and fixed jest tests

WIP - trying to fix integration tests after refactor

fix tests, throw appropriate HTTP errors in alerts client

fix issues after rebasing with master

updates alerts client docs

general cleanup

fix trial integration tests

possibly fixes loader issue in package

updates getSafeSortIds param type to include 'null' in param type, renames getFindAuthorizationFilter -> getAuthorizationFilter and renames usage elsewhere, utilizes const for test in apm

make 'operation' parameter non-optional in getAuthorizationFilter and retain backwards compatibility by pointing the getFindAuthorizationFilter to getAuthorizationFilter and operation = ReadOperations.Find

adds an exported type to be used when setting a status on an alert

moves es query config out of client definition and into a function to allow for customized configurations

update by query should check for siem signals status field as well

return 200 when no alerts are found, not a 401 error, updates integration tests with stricter expects

updates alerts clients docs

fixing merge with master conflicts

adds missing required space ids field to mocked alerts to fix failing apm jest tests

implements suggested changes with auditing failures in bulk update by ids function, removes audit logging in catch blocks, cleans up other jest tests

execute ensure authorized in the search after loop, adds jest tests to make sure we log an error to the audit log for the case when ensureAuthorized throws an authorization error
Comment on lines 212 to 238
expect(auditLogger.log.mock.calls).toHaveLength(2);
expect(auditLogger.log.mock.calls[0][0]).toEqual({
message: `Failed attempt to update alert [id=${unsuccessfulAuthzHit}]`,
event: {
action: 'alert_update',
category: ['database'],
outcome: 'failure',
type: ['change'],
},
error: {
code: 'Error',
message: 'Unauthorized for fake.rule and apm',
},
});
expect(auditLogger.log.mock.calls[1][0]).toEqual({
message: `Failed attempt to update alert [id=${successfulAuthzHit}]`,
event: {
action: 'alert_update',
category: ['database'],
outcome: 'failure',
type: ['change'],
},
error: {
code: 'Error',
message: 'Unauthorized for fake.rule and apm',
},
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional nit: Jest provides specific functions to deal with these situations, e.g.,

expect(auditLogger.log).toHaveBeenCalledTimes(2);
expect(auditLogger.log).toHaveBeenNthCalledWith(
  1,
  {...}
);
expect(auditLogger.log).toHaveBeenNthCalledWith(
  2,
  {...}
);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Fixed here: ba1857e


expect(auditLogger.log).toHaveBeenCalledWith({
error: undefined,
event: { action: 'alert_get', category: ['database'], outcome: 'success', type: ['access'] },
message: 'User has accessed alert [id=1]',
event: { action: 'alert_get', category: ['database'], outcome: 'unknown', type: ['access'] },
Copy link
Contributor

@jportner jportner Aug 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get should never have outcome: 'unknown'; we know the outcome (we fetched the object and returned it to the user, this is a 'success' outcome).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Yes thank you. Good catch.

Fixed here: 36838ab

event: { action: 'alert_get', category: ['database'], outcome: 'success', type: ['access'] },
message: 'User has accessed alert [id=1]',
event: { action: 'alert_get', category: ['database'], outcome: 'unknown', type: ['access'] },
message: 'User is accessing alert [id=NoxgpHkBqbdrfX07MqXV]',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a test case for a get() that results in a successful audit event; what about an audit error when the user is unauthorized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think I actually removed the test case because with the get we were relying on a search executed with the authorization filter from the alerting framework to only give us back the data we were authorized to view, so there would be no authorization failure because we never attempted to access it in the first place.

I changed this so we are not solely rely on the authorization filter. We also call into ensureAuthorized on each search result (+ utilizing the authorization filter) so adding in an audit failure test makes sense again.

Fixed here: 36838ab

Copy link
Contributor

@jportner jportner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I think I fat-fingered this and didn't submit my entire review at once!
I think a couple minor things need to be changed and hopefully a few more test cases added but otherwise this is looking great. Thanks for all of the changes so far!

@@ -181,33 +187,26 @@ describe('update()', () => {
outcome: 'unknown',
type: ['change'],
},
message: 'User is updating alert [id=1]',
message: 'User is updating alert [id=NoxgpHkBqbdrfX07MqXV]',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a test case for an update() that results in a successful audit event; what about an audit error when the user is unauthorized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup very true. Added an audit failure test here: 36838ab

alertAuditEvent({
action: operationAlertAuditActionMap[operation],
id: item._id,
outcome: 'unknown',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in my other comment on the unit test: the outcome should only be 'unknown' if the action is an update.
If outcome is omitted and error is not present, the audit event is written as a a successful event (outcome: 'success')

Suggested change
outcome: 'unknown',
...(operation === WriteOperations.Update ? { outcome: 'unknown' } : {}),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here: 36838ab

alertAuditEvent({
action: operationAlertAuditActionMap[operation],
id,
...(operation === WriteOperations.Update ? { outcome: 'unknown' } : { operation }),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to pass operation in if this is not a write operation:

Suggested change
...(operation === WriteOperations.Update ? { outcome: 'unknown' } : { operation }),
...(operation === WriteOperations.Update ? { outcome: 'unknown' } : {}),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned this up here: 36838ab

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
observability 511.2KB 511.8KB +617.0B
securitySolution 6.5MB 6.5MB +617.0B
timelines 273.1KB 273.7KB +617.0B
total +1.8KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
infra 149.5KB 150.1KB +617.0B
uptime 35.1KB 35.7KB +617.0B
total +1.2KB
Unknown metric groups

API count

id before after diff
alerting 243 247 +4

API count missing comments

id before after diff
alerting 235 239 +4

References to deprecated APIs

id before after diff
alerting 23 26 +3

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @dhurley14

Copy link
Contributor

@jportner jportner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff, thanks again for being so responsive to feedback!

@dhurley14 dhurley14 merged commit ab43afa into elastic:master Aug 9, 2021
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Aug 9, 2021
…function to alerts client (elastic#106297)

Adds a bulk update route (POST /internal/rac/alerts/bulk_update) to the rule registry and bulkUpdate function to the alerts as data client.
@kibanamachine
Copy link
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Aug 10, 2021
…function to alerts client (#106297) (#107965)

Adds a bulk update route (POST /internal/rac/alerts/bulk_update) to the rule registry and bulkUpdate function to the alerts as data client.

Co-authored-by: Devin W. Hurley <devin.hurley@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated: Automatically backport this PR after it's merged Feature:RAC label obsolete release_note:enhancement Team:APM All issues that need APM UI Team support Theme: rac label obsolete v7.15.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants