[APM] Offer users upgrade to multi-metric job #119980

dgieselaar · 2021-11-30T15:48:03Z

elasticmachine · 2021-12-01T09:45:07Z

Pinging @elastic/apm-ui (Team:apm)

dgieselaar · 2021-12-01T09:46:13Z

@formgeist I did not implement the popover, partly due to time constraints but also because I'm wondering if it adds any value. It looks like it's mostly just an extended version of whatever is in the tooltip, and the user still has to navigate to the settings UI to be able to execute the upgrade. Or am I missing something here?

formgeist · 2021-12-01T11:35:38Z

> @formgeist I did not implement the popover, partly due to time constraints but also because I'm wondering if it adds any value. It looks like it's mostly just an extended version of whatever is in the tooltip, and the user still has to navigate to the settings UI to be able to execute the upgrade. Or am I missing something here?

I'm sorry, but I'm not following which popover you're referring to?

dgieselaar · 2021-12-01T11:37:05Z

@formgeist ah, I thought this was a popover, I was fooled by the highlighting 😄 :

I'll have a look at that one.

formgeist · 2021-12-01T11:38:44Z

> @formgeist ah, I thought this was a popover, I was fooled by the highlighting 😄
> I'll have a look at that one.

My spotlight screenshot feature is pretty convincing 😆

sorenlouv · 2021-12-03T12:06:13Z

x-pack/plugins/apm/common/anomaly_detection/get_anomaly_detection_setup_state.ts

+// eslint-disable-next-line @kbn/eslint/no-restricted-paths
+import { FETCH_STATUS } from '../../public/hooks/use_fetcher';
+// eslint-disable-next-line @kbn/eslint/no-restricted-paths
+import type { APIReturnType } from '../../public/services/rest/createCallApmApi';
+import { ENVIRONMENT_ALL } from '../environment_filter_values';


Why is this file in /common if it's only used on the client? Looks like those eslint warnings would go away if this was moved to /public.

sorenlouv · 2021-12-03T12:21:39Z

x-pack/plugins/apm/public/components/app/Settings/anomaly_detection/jobs_list_status.tsx

+    datafeedState === DATAFEED_STATE.STARTED ||
+    datafeedState === DATAFEED_STATE.STARTING;
+
+  const isClosed =


nit

Suggested change

-  const isClosed =
+  const jobIsClosed =

sorenlouv · 2021-12-03T12:22:03Z

x-pack/plugins/apm/public/components/app/Settings/anomaly_detection/jobs_list_status.tsx

+        )}
+      </EuiBadge>
+    );
+  } else if (!isClosed) {


nit: flat if's are a bit easier to parse when there are many of them

Suggested change

-  } else if (!isClosed) {
+  }
+  if (!isClosed) {
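To illustrate the nit, here is a minimal sketch of the flat style, where each branch returns early so no `else if` chain is needed. The `JobFlags` shape, the status names, and `getStatusLabel` are hypothetical simplifications for illustration; the real component renders badges from ML job and datafeed state.

```typescript
type JobStatus = "running" | "warning" | "closed";

// Hypothetical, simplified inputs (not the component's real props).
interface JobFlags {
  jobIsRunning: boolean;
  datafeedIsRunning: boolean;
  jobIsClosed: boolean;
}

function getStatusLabel(flags: JobFlags): JobStatus {
  // Each condition returns on its own, so the branches read top to bottom.
  if (flags.jobIsRunning && flags.datafeedIsRunning) {
    return "running";
  }

  if (!flags.jobIsClosed) {
    return "warning";
  }

  return "closed";
}
```

Because every branch returns, the order of the checks is the only thing a reader has to keep in mind.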

sorenlouv · 2021-12-03T12:29:57Z

x-pack/plugins/apm/public/components/app/Settings/anomaly_detection/update_jobs_callout.tsx

+      .catch(() => {
+        setLoading(false);
+      });


Shouldn't we show a warning toast if the job creation fails? And shouldn't setLoading(false) be called regardless of whether it failed?

Suggested change

-      .catch(() => {
-        setLoading(false);
-      });
+      .catch(() => {
+        core.notifications.toasts.addWarning({...})
+      })
+      .finally(() => {
+        setLoading(false);
+      })
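As a rough sketch of the behavior being asked for, the same flow can be written with `try`/`catch`/`finally`, which is equivalent to chaining `.catch()` and `.finally()`. The `createJobs`, `setLoading`, and `addWarning` parameters are hypothetical stand-ins for the real job-creation call, the component's state setter, and the toast service; the warning text is made up for illustration.

```typescript
// Hypothetical stand-ins are passed in so the sketch stays self-contained.
async function updateJobs(
  createJobs: () => Promise<void>,
  setLoading: (loading: boolean) => void,
  addWarning: (message: string) => void
): Promise<void> {
  setLoading(true);
  try {
    await createJobs();
  } catch (error) {
    // Surface the failure to the user instead of swallowing it silently.
    addWarning("Anomaly detection jobs could not be updated.");
  } finally {
    // Runs on success and on failure, so the spinner never gets stuck.
    setLoading(false);
  }
}
```

The point of the `finally` block is that the loading flag is reset in exactly one place, whatever the outcome.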

sorenlouv · 2021-12-03T12:40:55Z

x-pack/plugins/apm/public/context/anomaly_detection_jobs/anomaly_detection_jobs_context.tsx

+      if (!isAuthorized) {
+        return;
+      }


Will this fix #118711 ?

sorenlouv

Looks good to me!

We've had some problems with duplicate jobs for the same environment being created. Is that something you think will be solved with this PR?

What happens if you create two jobs for the same environment in two different spaces? Is that supported? Should it be?

sorenlouv · 2021-12-03T12:47:06Z

x-pack/plugins/apm/server/routes/settings/anomaly_detection/route.ts

    if (!isActivePlatinumLicense(context.licensing.license)) {
      throw Boom.forbidden(ML_ERRORS.INVALID_LICENSE);
    }


I'm wondering if we still need to check the license or if the check can be removed (I assume ML does this already)

sorenlouv · 2021-12-03T12:53:04Z

x-pack/plugins/apm/server/lib/anomaly_detection/get_ml_jobs_with_apm_group.ts

+      const [jobs, allJobStats, allDatafeedStats] = await Promise.all([
+        anomalyDetectors
+          .jobs(APM_ML_JOB_GROUP)
+          .then((response) => response.jobs),


Shouldn't 404's be caught here as well?

Suggested change

-          .then((response) => response.jobs),
+          .then((response) => response.jobs)
+          .catch(catch404),
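For readers unfamiliar with the pattern, a minimal sketch of what catching a 404 on a single promise in the chain could look like follows. The `HttpError` shape, this `catch404` body, and `fetchJobs` are assumptions for illustration; the PR's actual `catch404` helper is not shown in this thread and may behave differently.

```typescript
// Hypothetical error shape carrying an HTTP status code.
interface HttpError extends Error {
  statusCode?: number;
}

// Assumed behavior: treat "not found" as "no results" and rethrow anything else.
function catch404<T>(error: HttpError): T[] {
  if (error.statusCode === 404) {
    return [];
  }
  throw error;
}

// Attaching the handler to the individual request means a 404 from this call
// resolves to an empty list instead of rejecting a surrounding Promise.all.
function fetchJobs<T>(getJobs: () => Promise<{ jobs: T[] }>): Promise<T[]> {
  return getJobs()
    .then((response) => response.jobs)
    .catch((error: HttpError) => catch404<T>(error));
}
```

With the handler on each request, an outer `try`/`catch` is only needed for errors that should still propagate.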

sorenlouv · 2021-12-03T12:55:03Z

x-pack/plugins/apm/server/lib/anomaly_detection/get_ml_jobs_with_apm_group.ts

+    } catch (e) {
+      return catch404(e) as ApmMlJob[];
    }


If you add the catch above I think this can be removed

Suggested change

} catch (e) {

return catch404(e) as ApmMlJob[];

}

} catch (e) {

return catch404(e) as ApmMlJob[];

}

sorenlouv · 2021-12-03T13:04:57Z

x-pack/plugins/apm/server/routes/settings/anomaly_detection/route.ts

+    if (!setup.ml) {
+      throw Boom.forbidden(ML_ERRORS.ML_NOT_AVAILABLE);
+    }
+
+    const jobs = await getMlJobsWithAPMGroup(setup.ml?.anomalyDetectors);


There is a helper, getAnomalyDetectionJobs. All it does is call through to getMlJobsWithAPMGroup if setup.ml is defined; otherwise it throws, like here.

Question: should we use the helper here? If not, perhaps the helper is not so useful and we can delete it.
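A rough sketch of the helper as described above, to make the trade-off concrete. The `Setup` and `AnomalyDetectors` types, the `"apm"` group value, and the plain `Error` are simplified assumptions; the route in this PR throws `Boom.forbidden(ML_ERRORS.ML_NOT_AVAILABLE)` and works with richer job types.

```typescript
// Hypothetical, minimal stand-ins for the APM setup object and the ML client.
interface AnomalyDetectors {
  jobs: (group: string) => Promise<{ jobs: string[] }>;
}

interface Setup {
  ml?: { anomalyDetectors: AnomalyDetectors };
}

const APM_ML_JOB_GROUP = "apm"; // assumed value, for illustration only

function getMlJobsWithAPMGroup(anomalyDetectors: AnomalyDetectors): Promise<string[]> {
  return anomalyDetectors.jobs(APM_ML_JOB_GROUP).then((response) => response.jobs);
}

// The availability guard lives in one place, so callers do not repeat it.
async function getAnomalyDetectionJobs(setup: Setup): Promise<string[]> {
  if (!setup.ml) {
    throw new Error("ML is not available");
  }
  return getMlJobsWithAPMGroup(setup.ml.anomalyDetectors);
}
```

If every caller already performs the `setup.ml` check itself, the helper adds little; if not, routing calls through it keeps the check consistent.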

formgeist

Great work! I suggested a couple of changes. I also noticed that the tooltip on the header link gets stuck and stays visible on the next page.

formgeist · 2021-12-03T10:43:22Z

x-pack/plugins/apm/public/components/app/Settings/anomaly_detection/jobs_list_status.tsx

+          'xpack.apm.settings.anomalyDetection.jobList.warningStatusLabel',
+          {
+            defaultMessage:
+              'Job might not be running correctly. Go to the job management page in the ML app to get more information.',


Suggested change

-              'Job might not be running correctly. Go to the job management page in the ML app to get more information.',
+              'Job might be experiencing problems. Click the Manage Jobs link to learn more.',

Just felt the copy could use an update

formgeist · 2021-12-03T13:18:19Z

x-pack/plugins/apm/public/components/app/Settings/anomaly_detection/jobs_list.tsx

+            {i18n.translate(
+              'xpack.apm.settings.anomalyDetection.jobList.manageMlJobsButtonText',
+              {
+                defaultMessage: 'Manage ML Jobs',


Suggested change

-                defaultMessage: 'Manage ML Jobs',
+                defaultMessage: 'Manage jobs',

Just want to get rid of the ML label

formgeist · 2021-12-03T13:20:18Z

x-pack/plugins/apm/public/components/app/Settings/anomaly_detection/jobs_list.tsx

+          <EuiSwitch
+            checked={showLegacyJobs}
+            onChange={(e) => {
+              setShowLegacyJobs(e.target.checked);
+            }}
+            label={i18n.translate(
+              'xpack.apm.settings.anomalyDetection.jobList.showLegacyJobsCheckboxText',
+              {
+                defaultMessage: 'Show legacy jobs',
+              }
+            )}
+          />


Can we make the switch toggled on by default to show legacy jobs? I found that when I see the update callout and go to the page, I don't see my legacy jobs that need to be updated.

If we instead toggle it on by default, they'll show up as expected. The user can choose to toggle it off once they're done updating and make the table more manageable.

It now shows legacy jobs if there are no other jobs. That seems like a reasonable compromise, without having to implement some setting mechanism backed by localStorage.
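The compromise described above could be derived from the job list itself, roughly as sketched below. The `Job` shape and its `isLegacy` flag are hypothetical; how the PR actually distinguishes legacy jobs from current ones is not shown in this thread.

```typescript
// Hypothetical job shape; `isLegacy` is an assumed field for illustration.
interface Job {
  jobId: string;
  isLegacy: boolean;
}

// Show legacy jobs by default only when there are no other (current) jobs,
// so the initial state needs no persisted user setting.
function shouldShowLegacyJobsByDefault(jobs: Job[]): boolean {
  return !jobs.some((job) => !job.isLegacy);
}
```

The result would only seed the toggle's initial state; the user can still switch it afterwards.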

sorenlouv · 2021-12-03T13:28:53Z

x-pack/plugins/apm/public/components/app/Settings/anomaly_detection/update_jobs_callout.tsx

+          'xpack.apm.settings.anomalyDetection.jobsList.updateAvailableDescription',
+          {
+            defaultMessage:
+              'We have updated the anomaly detection jobs that provide insights into degraded performance and added detectors for throughput and failed transaction rate. If you choose to upgrade, we will create the new jobs and close the existing legacy jobs. The data shown in the APM app will automatically switch to the new.',


It's not super clear why the user should upgrade, or what the consequences of not upgrading are.
Perhaps we should mention transaction metrics directly:

We have detected that you are using transaction metrics (preaggregated transactions), which are not supported by your current ML jobs. Please upgrade to ensure your ML jobs work correctly. If you choose to upgrade, we will create the new jobs and close the existing legacy jobs. Your data from the legacy jobs will still be available to view in the ML app.

(wording can be improved)

IMO, I think the upgrade message clearly describes why they should upgrade and what happens after. We intentionally made the message more about the benefits of upgrading rather than trying to explain the technical aspects. I think your proposal will only make it more confusing to understand the why.

I agree that it's better to keep the implementation details away from the end user. Still, when reading the current message it feels like an optional upgrade, and not something we strongly urge users to do. ML team said that even if all transactions are sampled, they still wouldn't recommend showing transaction-based ML results on top of metric-based data (which is what will happen until people upgrade afaik).

Either way, this will be easy to change down the road so we can release it as-is and adapt if we get feedback.

> ML team said that even if all transactions are sampled, they still wouldn't recommend showing transaction-based ML results on top of metric-based data (which is what will happen until people upgrade afaik).

That information wasn't available when we chose to design the migration this way. If that's the case, I'd recommend we look into a stricter migration in the next iteration. That could still just involve changing the messaging and displays to say that we won't show the legacy job results in the app and will rely only on the new jobs' results. Do you think it's worth opening a new issue to track this?

I went back to see where ML said that we shouldn't mix metric events and transaction anomalies and found this from @sophiec20:

> think we would be safer with transaction charts showing transaction anomalies, and metric powered charts showing metric anomalies. The anomalies would probably be OK, but the expected bounds might be very different.

It's then followed up with an edit (that I had missed) saying:

> expected bounds are for mean transaction duration, therefore it should be ok to plot providing sampling is representative

The way I read this, we are okay to mix as long as the end user's sampling is representative.

> Do you think it's worth opening a new issue to track this?

No, let's mark this as resolved.

@dgieselaar quick question: is the ML job version tracked in our current telemetry? Just wondering if we can use telemetry to figure out whether our migration rate is too slow and then try another strategy to get users to convert. I also think that once we display expected bounds for all metrics, it'll make more sense to get jobs upgraded.

It's not. We would have to add it, which would require us to migrate to telemetry v2 first.

OK, so we should create a new issue to implement it even if we do it in another iteration. I think the telemetry would be nice to have 👍

dgieselaar · 2021-12-06T15:22:10Z

Created a follow-up ticket to address remaining feedback: #120503

walterra · 2021-12-07T09:37:20Z

x-pack/plugins/apm/common/anomaly_detection/apm_ml_job.ts

+ * 2.0; you may not use this file except in compliance with the Elastic License
+ * 2.0.
+ */
+import { DATAFEED_STATE, JOB_STATE } from '../../../ml/common/constants/states';


With the changed export I guess this can be imported from '../../../ml/common' now?

cheers, updated!

walterra

ML changes LGTM

kibana-ci · 2021-12-07T13:04:59Z

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id	before	after	diff
`apm`	1170	1172	+2

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id	before	after	diff
`ml`	279	281	+2

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`apm`	2.7MB	2.8MB	+5.6KB
`ml`	3.5MB	3.5MB	-1.5KB
total			+4.1KB

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id	before	after	diff
`apm`	45	46	+1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id	before	after	diff
`ml`	37.7KB	38.3KB	+621.0B

Unknown metric groups

API count

id	before	after	diff
`ml`	283	285	+2

History

💔 Build #11688 failed ec1139f
💔 Build #11664 failed a0e4a7b
💔 Build #11381 failed 9829514
💔 Build #10316 failed cb21911
💔 Build #10279 failed daeaef5
💔 Build #10271 failed 8025301

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

kibanamachine · 2021-12-07T13:28:02Z

💚 Backport successful

Status	Branch	Result
✅	8.0

This backport PR will be merged automatically after passing CI.

Co-authored-by: Dario Gieselaar <dario.gieselaar@elastic.co>

dgieselaar added 2 commits November 30, 2021 16:45

[APM] Offer users upgrade to multi-metric job

dff3019

Closes elastic#112502

Merge branch 'main' of github.com:elastic/kibana into ml-job-upgrade

d4785d2

dgieselaar added the release_note:enhancement, Team:APM, v8.0.0, auto-backport, and v8.1.0 labels Nov 30, 2021

dgieselaar added 2 commits December 1, 2021 09:35

Merge branch 'main' of github.com:elastic/kibana into ml-job-upgrade

8df6489

Fix API test, alignment

1437407

dgieselaar marked this pull request as ready for review December 1, 2021 09:45

dgieselaar requested review from a team as code owners December 1, 2021 09:45

Merge branch 'main' of github.com:elastic/kibana into ml-job-upgrade

daeaef5

dgieselaar force-pushed the ml-job-upgrade branch from 8025301 to daeaef5 Compare December 1, 2021 10:15

Fix type errors & tests

cb21911

sorenlouv reviewed Dec 3, 2021

sorenlouv approved these changes Dec 3, 2021

formgeist reviewed Dec 3, 2021

sorenlouv reviewed Dec 3, 2021

dgieselaar added 3 commits December 6, 2021 11:33

Merge branch 'main' of github.com:elastic/kibana into ml-job-upgrade

0fbfcee

Copy changes

b6263dc

Merge branch 'main' of github.com:elastic/kibana into ml-job-upgrade

48260c0

dgieselaar mentioned this pull request Dec 6, 2021

[APM] Follow-up on ML feedback #120503

Closed

Copy update

9829514

dgieselaar added 2 commits December 7, 2021 10:25

Merge branch 'main' of github.com:elastic/kibana into ml-job-upgrade

8c96208

Upate ML API usage in API test

a0e4a7b

walterra reviewed Dec 7, 2021

dgieselaar added 2 commits December 7, 2021 11:48

Update imports for ML constants

c8af5d5

Merge branch 'main' of github.com:elastic/kibana into ml-job-upgrade

ec1139f

walterra approved these changes Dec 7, 2021

dgieselaar added 2 commits December 7, 2021 12:43

Merge branch 'main' of github.com:elastic/kibana into ml-job-upgrade

adc3310

Fix i18n errors

042adae

dgieselaar merged commit 0ed5cb9 into elastic:main Dec 7, 2021

dgieselaar deleted the ml-job-upgrade branch December 7, 2021 13:24

kibanamachine mentioned this pull request Dec 7, 2021

[8.0] [APM] Offer users upgrade to multi-metric job (#119980) #120605

Merged

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Dec 7, 2021

[APM] Offer users upgrade to multi-metric job (elastic#119980)

7bbb72d

kibanamachine added a commit that referenced this pull request Dec 7, 2021

[APM] Offer users upgrade to multi-metric job (#119980) (#120605)

9b21224

Co-authored-by: Dario Gieselaar <dario.gieselaar@elastic.co>

Heenawter pushed a commit to Heenawter/kibana that referenced this pull request Dec 8, 2021

[APM] Offer users upgrade to multi-metric job (elastic#119980)

7978a0a

sorenlouv added the apm:test-plan-8.0.0 label Dec 21, 2021

TinLe pushed a commit to TinLe/kibana that referenced this pull request Dec 22, 2021

[APM] Offer users upgrade to multi-metric job (elastic#119980)

7dc1bbb

MiriamAparicio mentioned this pull request Dec 28, 2021

[APM] Not able to update legacy jobs after creating a multi-metrics job before updating them #122073

Closed

MiriamAparicio added the apm:test-plan-done label Dec 28, 2021

This was referenced Dec 29, 2021

[APM] Show legacy jobs toggle is off when there are multiple jobs (v2, v3) #122133

Closed

[APM] Anomaly detection job status badges #122136

Closed

gbamparop pushed a commit to gbamparop/kibana that referenced this pull request Jan 12, 2022

[APM] Offer users upgrade to multi-metric job (elastic#119980)

d8ea05e
