
Conversation

@stevemessick
Member

Needs testing, and tests. Some basic work is still not done, too.

@stevemessick stevemessick changed the title [WIP] Add more analytics Add more analytics Mar 14, 2022
@stevemessick
Member Author

This is ready for review.

Note that, due to changes in Testing, there will be compilation errors when using any IntelliJ older than the EAP. The Dart plugin API that is being used is apparently only available in EAP builds.

@stevemessick stevemessick mentioned this pull request Mar 24, 2022
}

@Override
public void computedErrors(String path, List<AnalysisError> list) {
Contributor

Is this getting called for every file in a project?
If it is called for more than each root path, I'm concerned we are logging too much.

Member Author
@stevemessick Mar 25, 2022

Not every file, but all Dart files plus some unexpected ones: all three AndroidManifest.xml files were checked, plus pubspec.yaml and analysis_options.yaml.

FlutterInitializer.getAnalytics().sendTiming(E2E_IJ_COMPLETION_TIME, FAILURE, e2eCompletionMS); // test: logE2ECompletionErrorMS()
}

private void logAnalysisError(@Nullable AnalysisError error) {
Contributor

I'm concerned this is logging too much. If you have 10,000 errors, we are still trying to log 100 events.
What are we trying to achieve by logging the error codes?

Member Author

Actually, we log 100 events plus the analysis time of every error in an open editor. So, that's potentially worse. We'd have to check with @jwren for design rationale.

Contributor

We need to log far fewer events. Logging analytics must not be the reason users get poor performance or complain that our tools use too much of their bandwidth. I would suggest changing all these analytics so that a single event, or a couple of events, summarize the state of all reported errors, rather than ever emitting events proportional to the number of errors or files.
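
A rough sketch of the shape being suggested, for illustration only (reportErrorSummary and the ERRORS/WARNINGS/HINTS constants are hypothetical; sendEventMetric and DAS_STATUS_EVENT_TYPE mirror the hunks quoted later in this thread; getSeverity() returning a severity string follows the analysis server protocol's AnalysisError):

// Tally every reported error locally, then send one summary event per severity
// for the whole analysis pass instead of one event per AnalysisError.
private void reportErrorSummary(Analytics analytics, Collection<List<AnalysisError>> allErrors) {
  int errorCount = 0, warningCount = 0, infoCount = 0;
  for (List<AnalysisError> fileErrors : allErrors) {
    for (AnalysisError error : fileErrors) {
      switch (error.getSeverity()) {          // severity arrives as a String in the protocol
        case "ERROR":   errorCount++; break;
        case "WARNING": warningCount++; break;
        default:        infoCount++; break;   // hints and lints both arrive as INFO
      }
    }
  }
  analytics.sendEventMetric(DAS_STATUS_EVENT_TYPE, ERRORS, errorCount);     // placeholder constant
  analytics.sendEventMetric(DAS_STATUS_EVENT_TYPE, WARNINGS, warningCount); // placeholder constant
  analytics.sendEventMetric(DAS_STATUS_EVENT_TYPE, HINTS, infoCount);       // placeholder constant
}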

Member Author

I think this will be straightforward to change. I'll drop the call to logAnalysisError for each error and clean up unused elements. We already log the summary info in serverStatus.

Member Author

Done! PTAL

public void computedErrors(String path, List<AnalysisError> list) {
assert list != null;
list.forEach(this::logAnalysisError);
pathToErrors.put(path, list);
Contributor

What happens if you restart the analysis server? Do we get a new event with new errors?
Wish we had a merged Dart and Flutter plugin so we didn't have to duplicate this logic that the analysis errors window already handles.

Member Author

Yes. It would make sense to check if the file had already been analyzed and, if the errors are the same, ignore it, I think.

Member Author

I added that test and asked for an opinion from jwren.

Member

There's no right choice. We can discuss; there are pros and cons to both (as with all things). If you don't send the information, you won't have the signal that users are re-analyzing, which is a signal in itself; but if you do send it, the logs will be fuller with the same information.

Contributor
@jacob314 left a comment

A few minor comments, then LGTM.
The main thing I'm worried about is ensuring we don't start logging too much and cause performance problems, particularly for cases where an IDE is already struggling with thousands of errors in the analysis errors view.

@stevemessick
Member Author

@jacob314 My main concern is that we are doing a lot of work that 90% of our users won't see for a year. The new Dart API is (or was -- I have not checked this week) only present in the EAP version of the Dart plugin.

BTW you have to click the Approve button now. LGTM in comments is no longer sufficient to merge. No hurry, I have not yet looked into your questions.

@stevemessick stevemessick force-pushed the more-analytics branch 2 times, most recently from 132f677 to fdacbf5 on March 28, 2022 at 21:09

public void computedErrors(String path, List<AnalysisError> list) {
assert list != null;
List<AnalysisError> existing = pathToErrors.get(path);
if (existing != null && existing.equals(list)) {
Member Author

@jwren Do you think this test makes sense? And will it be expensive?

Contributor
@jacob314 left a comment

lgtm

@stevemessick
Member Author

@jacob314 @jwren After looking at the log file and seeing just how much was getting sent, I decided to throttle everything to one transmission per minute. The cumulative error counts are sent when the project is closed, and every two hours, for back-end percentile analysis. Both of those intervals can be adjusted.
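
For reference, a throttle of the kind described might look roughly like this (an illustrative sketch only, not the PR's code; maybeReport matches the call shape in the hunks below, while REPORT_INTERVAL_MS and lastReportMillis are made-up names):

// Illustrative once-per-minute throttle around analytics sends.
private static final long REPORT_INTERVAL_MS = 60L * 1000L;  // one transmission per minute
private long lastReportMillis;

private void maybeReport(boolean force, java.util.function.Consumer<Analytics> reporter) {
  long now = System.currentTimeMillis();
  if (!force && now - lastReportMillis < REPORT_INTERVAL_MS) {
    return;  // too soon since the last send; drop this sample
  }
  lastReportMillis = now;
  reporter.accept(FlutterInitializer.getAnalytics());
}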

if (lintCount > 0) {
  analytics.sendEventMetric(DAS_STATUS_EVENT_TYPE, LINTS, lintCount); // test: serverStatus()
}
errorCount = warningCount = hintCount = lintCount = 0;
Contributor

Why do we zero out the counts? This seems wrong. I would expect these numbers to match the number of errors reported in the analysis server window. As is, I'm not clear on how I would interpret them.

Member Author

They are accumulated while analysis is active, then sent when analysis is complete. They need to be zeroed so the accumulated values are accurate.


void logE2ECompletionSuccessMS(long e2eCompletionMS) {
FlutterInitializer.getAnalytics().sendTiming(E2E_IJ_COMPLETION_TIME, SUCCESS, e2eCompletionMS); // test: logE2ECompletionSuccessMS()
maybeReport(true, (analytics) -> {
Contributor

For the completion time, rather than throttling, you could alternatively report a single metric on IntelliJ close that gives the P50, P90, and P95 times for the entire session.
That would make the completion-time numbers less noisy than filtering to report only one completion per minute.

Member Author

See the comment below for why we cannot do it when exiting.

if (IS_TESTING) {
  errorCount = warningCount = hintCount = lintCount = 1;
}
maybeReportErrorCounts();
Contributor

Have you manually verified that these numbers match what the analyzer window shows?

Member Author

It's been a while, but yes.

@jacob314
Contributor

Rather than rate limiting, I think we should emit summary statistics when the IntelliJ session is closed. The reason is that just rate limiting like this can cause the data to be a bit noisy and skewed.
For example: right now we'd oversample the very first autocompletion returned right when a user starts typing, which might not be representative of autocompletions when users are in the middle of typing. Imagine a user who types for a bit, then pauses for a minute to run a build or read code.

Example summary statistics format that I think would work better:

{
  event: 'autocompleteE2E',
  P50Time: 73,
  P90Time: 350,
  P95Time: 700,
  count: 2000,
}

This indicates that the user had 2000 autocomplete events with a P50 time of 73, P90 time of 350, and P95 time of 700.
That way few events are sent to analytics but there is enough data to compute P50, P90, or P95 times across users.
Fyi @jwren who might have some ideas based on how similar problems have been solved in g3.
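
For what it's worth, the local aggregation is cheap to compute; a sketch using nearest-rank percentiles (all of these names are hypothetical, and the event/field names simply echo the example format above):

// Nearest-rank percentile over a sorted list of completion times.
private static long percentile(List<Long> sortedTimesMs, double p) {
  int rank = (int) Math.ceil(p / 100.0 * sortedTimesMs.size());
  return sortedTimesMs.get(Math.max(rank - 1, 0));
}

// Send one summary event for the whole session instead of one event per completion.
private void sendCompletionSummary(Analytics analytics, List<Long> completionTimesMs) {
  if (completionTimesMs.isEmpty()) return;
  List<Long> sorted = new ArrayList<>(completionTimesMs);
  Collections.sort(sorted);
  analytics.sendEventMetric("autocompleteE2E", "P50Time", (int) percentile(sorted, 50));
  analytics.sendEventMetric("autocompleteE2E", "P90Time", (int) percentile(sorted, 90));
  analytics.sendEventMetric("autocompleteE2E", "P95Time", (int) percentile(sorted, 95));
  analytics.sendEventMetric("autocompleteE2E", "count", sorted.size());
}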

@stevemessick
Member Author

stevemessick commented Apr 12, 2022

We can't do all the computation and reporting at exit. IntelliJ enforces a strict, limited time for exit processing (i.e. project close). We can't know what else is going on, so relying on it would potentially limit the data we would collect.

@stevemessick
Member Author

how similar problems have been solved in g3

Percentiles are computed on the server. The same analytic events are used here as are used there.

My statistics are rusty, and my stats book doesn't even mention this, but I'm not sure you'd get the same percentiles by combining percentiles from samples as by computing the population percentiles on the server.
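
To make that concrete with an added illustration: if one client's completions take 10, 20, and 900 ms and another's take 30, 40, and 50 ms, the per-client medians are 20 ms and 40 ms (averaging to 30 ms), while the median of the pooled six samples is 35 ms, and the pooled P90 is pulled up toward the 900 ms outlier that neither per-client median reflects.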

Member
@jwren left a comment

LGTM

@jacob314
Contributor

We can't do all the computation and reporting at exit. IntelliJ enforces a strict, limited time for exit processing (i.e. project close). We can't know what else is going on, so relying on it would potentially limit the data we would collect.

OK, I'm fine with sampling for most things instead. Perhaps I'm being too conservative about how many analytics events we should send.

@jacob314
Contributor

I agree the median of the medians is not the same as the median of all the data, but in some ways it is the metric we really want.
For metrics like this, what I'm looking for is something I can conceptualize.
For example, the statement I want to make is that for 90% of our users, 90% of their completions take less than 200 ms. Framed like that, there is no harm in aggregating locally, since it is exactly what we want.

@jacob314
Contributor

lgtm
Let's land this and perhaps iterate on summarizing some of the metrics.

@stevemessick stevemessick merged commit d4072cf into master Apr 14, 2022
@stevemessick stevemessick deleted the more-analytics branch April 14, 2022 16:34