Skip to content
This repository has been archived by the owner. It is now read-only.

feat(metrics): Add metrics gathering. #1146

Merged
merged 1 commit into from May 23, 2014
Merged

feat(metrics): Add metrics gathering. #1146

merged 1 commit into from May 23, 2014

Conversation

@shane-tomlinson
Copy link
Member

@shane-tomlinson shane-tomlinson commented May 21, 2014

  • Use SpeedTrap to collect client side metrics
  • Add screen views and errors to an event stream.
  • Metrics are sent to the /metrics endpoint on document unload.
  • Data that is sent to the backend is filtered - only data we expect is sent.
  • auth-errors.toMessage now accepts forceMessage
@@ -80,6 +80,7 @@ function (

this._window = options.window || window;
this._window.router = this._router = options.router || new Router();

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

What's this all about?

return this._send(data, url, false);
},

getAllData: function () {

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

Data doesn't really provide anything useful. Getting rid of it seems worse.

@@ -111,6 +135,9 @@ function (
},

showView: function (viewToShow) {
var url = this.window.location.pathname;
this.metrics.events.capture('screen:' + urlToScreenName(url));

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

This needs a test

};
}

function urlToScreenName(url) {

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

This needs a test. Move to app/scripts/lib/url.

* file, You can obtain one at http://mozilla.org/MPL/2.0/. */


(function(define){ define(function (require,exports,module,undefined){

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

It would be awesome to move this to bower and out of this repo.

@@ -287,6 +298,7 @@ function (_, Backbone, $, p, Session, AuthErrors, FxaClient, Url, Strings, Ephem
this.hideSuccess();
this.$('.spinner').hide();

this.metrics.events.capture('error:' + this.errorCode(err));

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

Add a test.

@@ -317,6 +329,7 @@ function (_, Backbone, $, p, Session, AuthErrors, FxaClient, Url, Strings, Ephem
this.hideSuccess();
this.$('.spinner').hide();

this.metrics.events.capture('error:' + this.errorCode(err));

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

Add a test.

@@ -258,6 +265,10 @@ function (_, Backbone, $, p, Session, AuthErrors, FxaClient, Url, Strings, Ephem
return !!this._isSuccessVisible;
},

errorCode: function (err) {

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

This should be in auth-errors.js

var msg = t('Unknown account. <a href="/signup">Sign up</a>');
return self.displayErrorUnsafe(msg);
err.forceMessage = t('Unknown account. <a href="/signup">Sign up</a>');
return self.displayErrorUnsafe(err);
} else if (AuthErrors.is(err, 'USER_CANCELED_LOGIN')) {
// if user canceled login, just stop

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

Log cancellations.

var roundedDate = new Date();
roundedDate.setTime(roundedTime);

return roundedDate;

This comment has been minimized.

@pdehaan

pdehaan May 21, 2014
Contributor

I'm mildly lost. What are we doing here?
You just want the current date sans time?

Couldn't we just stomp the time bits using something like setHours(), or am I missing some edge cases?

var today = new Date();
today.setHours(0, 0, 0, 0);
console.log(today); // Wed May 21 2014 00:00:00 GMT-0700 (PDT)

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

We could, this is largely copy/paste code from Persona.

This comment has been minimized.

this._date = getDate();

this._context = Url.searchParam('context');
this._service = Url.searchParam('sync');

This comment has been minimized.

@pdehaan

pdehaan May 21, 2014
Contributor

Sorry, I had to read this a couple times and I should probably know the answer, but should the searchParam be "sync" here? Will this support other services in the future?

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

No, you are right! that points out a missing test.

@pdehaan pdehaan added the WIP label May 21, 2014
return filteredData;
},

toErrorEvent: function (err) {

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

This needs a better name. We aren't creating an event, but rather creating an error event identifier.

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

Where is this better, here or in AuthErrors?

@@ -53,7 +57,17 @@ function (

function showView(View, options) {
return function () {
this.showView(new View(options || {}));
options = options || {};

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

This line isn't needed since we are effectively repeating ourselves a few lines down.

@shane-tomlinson
Copy link
Member Author

@shane-tomlinson shane-tomlinson commented May 21, 2014

@kparlante - here is the initial client side work, and a dumb metrics endpoint on the server side that just logs the metrics to the console. Could you have a look and see what you think?

The format is similar to yesterday, but with the changes I wrote to you about. This is an example:

{
  "date": "2014-05-21T00:00:00.000Z",
  "navigationTiming": {
    "navigationStart": 0,
    "unloadEventStart": null,
    "unloadEventEnd": null,
    "redirectStart": 1,
    "redirectEnd": 1,
    "fetchStart": 1,
    "domainLookupStart": 1,
    "domainLookupEnd": 1,
    "connectStart": 1,
    "connectEnd": 1,
    "secureConnectionStart": null,
    "requestStart": 2,
    "responseStart": 3,
    "responseEnd": 3,
    "domLoading": 5,
    "domInteractive": 67,
    "domContentLoadedEventStart": 73,
    "domContentLoadedEventEnd": 77,
    "domComplete": 101,
    "loadEventStart": 101,
    "loadEventEnd": 107
  },
  "duration": 30237,
  "timers": {},
  "events": [
    {
      "type": "screen:signup",
      "offset": 362
    },
    {
      "type": "screen:signin",
      "offset": 6178
    },
    {
      "type": "error:102",
      "offset": 10249
    },
    {
      "type": "screen:signup",
      "offset": 12300
    },
    {
      "type": "screen:confirm",
      "offset": 17759
    }
  ]
}

Any time a screen is diaplayed, an entry is added to the event log.
Any time an error is shown, an entry is added to the event log of the format error:<errno>

I went through each screen and hooked up other interesting events:

  • complete_reset_password:link-damaged
  • complete_reset_password:link-expired
  • complete_signn_up:link-damaged
  • confirm:too-many-attempts
  • confirm:resend
  • confirm_reset_password:resend
  • login:canceled

I think that is a full list.

@zaach, @pdehaan, @ckarlof - could you do an initial review? It'd be nice to have the client side reporting results before I leave, if possible!

@@ -111,6 +133,9 @@ function (
},

showView: function (viewToShow) {
var path = this.window.location.pathname;
this.metrics.events.capture('screen:' + Url.pathToScreenName(path));

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

Should this be done here or in view instantiation?

return router.showView(signUpView);
})
.then(function () {
assert.ok($('#fxa-signup-header').length);
// if there is a back button, it can be shown now.
assert.equal(Session.canGoBack, true);

var events = metrics.getFilteredData().events;

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

Use TestHelpers.isEventLogged here.

@@ -19,9 +19,11 @@
<script data-main="main" src="/bower_components/requirejs/require.js"></script>
<script>
// Avoid caching files that are loaded with Require.JS
/*

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 21, 2014
Author Member

This is because of #1145 and can be reverted until we decide how to handle caching in dev.

@pdehaan
Copy link
Contributor

@pdehaan pdehaan commented May 21, 2014

@shane-tomlinson I looked through the code and it looked good (although I have no idea what I'm doing).
If I grab this branch locally and run it, how would I check the logged stats?


route.process = function(req, res) {

logger.error('metrics: %s', JSON.stringify(req.body, null, 2));

This comment has been minimized.

@pdehaan

pdehaan May 21, 2014
Contributor

Curious, why error (aka: server.metrics.ERROR) here?
I usually think of error as "OMG, SOMETHING IS WRONG" versus "hey, have some stats".

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 22, 2014
Author Member

@pdehaan - The reason is pretty mundane. I permanently turn on iterm2's search to see the ERROR highlighted as it goes by. I could just as easily search for another keyword, but I'm lazy and didn't think of it until you brought it up. This will be removed.

@pdehaan
Copy link
Contributor

@pdehaan pdehaan commented May 21, 2014

So far, so good. I'm seeing this in my stdout:

[2014-05-21T16:29:58.155Z] server.metrics.ERROR: metrics: {
  "date": "2014-05-21T00:00:00.000Z",
  "navigationTiming": {
    "navigationStart": 0,
    "unloadEventStart": 23,
    "unloadEventEnd": 30,
    "redirectStart": null,
    "redirectEnd": null,
    "fetchStart": 0,
    "domainLookupStart": 0,
    "domainLookupEnd": 0,
    "connectStart": 0,
    "connectEnd": 0,
    "secureConnectionStart": null,
    "requestStart": 9,
    "responseStart": 22,
    "responseEnd": 23,
    "domLoading": 33,
    "domInteractive": 48,
    "domContentLoadedEventStart": 48,
    "domContentLoadedEventEnd": 48,
    "domComplete": 105,
    "loadEventStart": 105,
    "loadEventEnd": 105
  },
  "duration": 62486,
  "timers": {},
  "events": [
    {
      "type": "screen:settings",
      "offset": 296
    },
    {
      "type": "screen:change_password",
      "offset": 1986
    },
    {
      "type": "error:103",
      "offset": 16827
    },
    {
      "type": "error:104",
      "offset": 27887
    },
    {
      "type": "screen:confirm",
      "offset": 42844
    },
    {
      "type": "confirm:resend",
      "offset": 44666
    },
    {
      "type": "confirm:too-many-attempts",
      "offset": 46420
    },
    {
      "type": "confirm:resend",
      "offset": 46421
    }
  ]
}
[2014-05-21T16:29:58.155Z] server.requests.INFO: POST /metrics 200 1
@pdehaan
Copy link
Contributor

@pdehaan pdehaan commented May 21, 2014

Sorry, more fun and noise for @shane-tomlinson's amusement:

{
  "date": "2014-05-21T00:00:00.000Z",
  "navigationTiming": {
    "navigationStart": 0,
    "unloadEventStart": 21,
    "unloadEventEnd": 26,
    "redirectStart": null,
    "redirectEnd": null,
    "fetchStart": 0,
    "domainLookupStart": 0,
    "domainLookupEnd": 0,
    "connectStart": 0,
    "connectEnd": 0,
    "secureConnectionStart": null,
    "requestStart": 11,
    "responseStart": 19,
    "responseEnd": 20,
    "domLoading": 29,
    "domInteractive": 49,
    "domContentLoadedEventStart": 49,
    "domContentLoadedEventEnd": 49,
    "domComplete": 90,
    "loadEventStart": 90,
    "loadEventEnd": 90
  },
  "duration": 117353,
  "timers": {},
  "events": [
    {
      "type": "screen:settings",
      "offset": 271
    },
    {
      "type": "screen:signin",
      "offset": 273
    },
    {
      "type": "error:Session expired. Sign in to continue.",
      "offset": 279
    },
    {
      "type": "screen:settings",
      "offset": 12021
    },
    {
      "type": "screen:change_password",
      "offset": 14395
    },
    {
      "type": "screen:settings",
      "offset": 25467
    },
    {
      "type": "screen:change_password",
      "offset": 31495
    },
    {
      "type": "screen:settings",
      "offset": 33307
    },
    {
      "type": "screen:signin",
      "offset": 39628
    },
    {
      "type": "error:103",
      "offset": 77301
    },
    {
      "type": "screen:settings",
      "offset": 82187
    },
    {
      "type": "screen:delete_account",
      "offset": 90479
    },
    {
      "type": "error:103",
      "offset": 98151
    },
    {
      "type": "error:103",
      "offset": 105475
    },
    {
      "type": "screen:signup",
      "offset": 111997
    }
  ]
}

Verified my account locally and doing a bit of password changing, signing out, signing back in, deleting accounts, etc.
One thing of note above may be this snippet, which seemed mildly-inconsistent:

    {
      "type": "error:Session expired. Sign in to continue.",
      "offset": 279
    }

Other things we can argue/hug-out are underscores-vs-dashes:

    {
      "type": "screen:change_password",
      "offset": 1986
    },
    {
      "type": "confirm:too-many-attempts",
      "offset": 46420
    },
    {
      "type": "screen:delete_account",
      "offset": 90479
    },
@shane-tomlinson
Copy link
Member Author

@shane-tomlinson shane-tomlinson commented May 22, 2014

"type": "error:Session expired. Sign in to continue.",

Awesome, you found a case I missed.

@shane-tomlinson
Copy link
Member Author

@shane-tomlinson shane-tomlinson commented May 22, 2014

"type": "error:Session expired. Sign in to continue.",

There is no error code for this error.

I made this error code 1002

@shane-tomlinson
Copy link
Member Author

@shane-tomlinson shane-tomlinson commented May 22, 2014

@pdehaan - I have normalized event names to use _

@pdehaan
Copy link
Contributor

@pdehaan pdehaan commented May 22, 2014

This gets a 👍 from me.

@zaach, @ckarlof - Can you two take a quick look at this Uber PR before @shane-tomlinson scampers off on holidays?

@shane-tomlinson
Copy link
Member Author

@shane-tomlinson shane-tomlinson commented May 22, 2014

I should probably squash these commits before we merge.

@ckarlof
Copy link
Contributor

@ckarlof ckarlof commented May 22, 2014

If it's ready for final review, let's get the WIP stuff off of it.

@shane-tomlinson shane-tomlinson changed the title WIP: feature: Add metrics gathering. feature: Add metrics gathering. May 22, 2014
@shane-tomlinson
Copy link
Member Author

@shane-tomlinson shane-tomlinson commented May 22, 2014

@ckarlof - wip removed.

@ckarlof ckarlof removed the WIP label May 22, 2014
@ckarlof ckarlof added this to the train-14 [ux] (Jun 2) milestone May 22, 2014
@pdehaan pdehaan changed the title feature: Add metrics gathering. feat: Add metrics gathering. May 22, 2014
@pdehaan
Copy link
Contributor

@pdehaan pdehaan commented May 22, 2014

Renaming title to "feat" (was "feature"), per #1123

@ckarlof
Copy link
Contributor

@ckarlof ckarlof commented May 22, 2014

Renaming title to "feat" (was "feature")

Yeah, "feature" won't make it in the changelog.

@zaach zaach changed the title feat: Add metrics gathering. feat(metrics): Add metrics gathering. May 22, 2014
@zaach
Copy link
Contributor

@zaach zaach commented May 22, 2014

@shane-tomlinson Yeah, thinking from the perspective of the changelog it doesn't seem right to mark a commit as a fix unless it's fixing an issue on master, rather than one that is self-contained in the branch. And we probably only want to include fixes that affect end users, QA, or OPs.

(also, we have refactor and test types, but no cleanup type: https://github.com/angular/angular.js/blob/master/CONTRIBUTING.md#type)

@zaach zaach added the strings label May 22, 2014
};

var CODE_TO_MESSAGES = {
// errors returned by the auth server
999: t('Unexpected error'),
110: t('Invalid authentication token in request signature'),
110: t('Invalid token'),

This comment has been minimized.

@zaach

zaach May 22, 2014
Contributor

I've added the strings label to this PR. 😎

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 22, 2014
Author Member

This is a string we already had, so there shouldn't need to be any new ones.

This comment has been minimized.

@zaach

zaach May 22, 2014
Contributor

Ah, indeed it is.

setTimeout(function () {
self.logEvent('inactivity:flush');
self.flush();
// self.logEvent('inactivity:continuation');

This comment has been minimized.

@zaach

zaach May 22, 2014
Contributor

👀 Should this be removed or uncommented?

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 22, 2014
Author Member

removed. will do.

@shane-tomlinson
Copy link
Member Author

@shane-tomlinson shane-tomlinson commented May 22, 2014

@zaach - about the commit log - I wasn't sure what to do there. Should we squash into one commit?

@@ -134,6 +142,13 @@ function () {
},

/**
* Return a metrics log identifier for the error.

This comment has been minimized.

@zaach

zaach May 22, 2014
Contributor

Kind of a high level code organization question, but would it better to have this as a method in the metrics module? Seems like it would be more cohesive.

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 23, 2014
Author Member

HA! I moved it out of metrics because of this comment.

This comment has been minimized.

@shane-tomlinson

shane-tomlinson May 23, 2014
Author Member

@zaach - moved to metrics.js

@zaach zaach removed the strings label May 22, 2014
@pdehaan
Copy link
Contributor

@pdehaan pdehaan commented May 22, 2014

Fixes #1119?

@kparlante
Copy link

@kparlante kparlante commented May 23, 2014

@shane-tomlinson : Looks like a great start -- looking forward to poking around the data. A few comments/questions:

  • It would probably be a good idea to have a config option to not collect data on production if we're not ready for privacy/legal/etc. reasons on the next train (like w/ persona).
  • What does the navigation timing data tell us? Aren't there several page loads going on (one for each screen)?
  • My initial thought was that we'd want to process these blobs before sending them on to heka and the rest of the data stack -- flatten into separate event entries, say. I need to think it through before having a recommendation though; logging as is sounds good for now.
    I'll spend some quality time with the content server while you're gone and have more to say when you get back!
@shane-tomlinson
Copy link
Member Author

@shane-tomlinson shane-tomlinson commented May 23, 2014

@kparlante - Thanks for the feedback - responses to your questions.

It would probably be a good idea to have a config option to not collect data on production if we're not ready for privacy/legal/etc. reasons on the next train (like w/ persona).

Great point, I'll add it today.

What does the navigation timing data tell us? Aren't there several page loads going on (one for each screen)?

Screen transitions are all done client side, so only one set of navigation timing data. More on navigation timing can be found on MDN. Basically, it's page load performance data, from the client's perspective.

My initial thought was that we'd want to process these blobs before sending them on to heka and the rest of the data stack -- flatten into separate event entries, say. I need to think it through before having a recommendation though; logging as is sounds good for now. I'll spend some quality time with the content server while you're gone and have more to say when you get back!

I have the /metrics endpoint to munge the data on the content server before sending it to Heka. This PR is to get the client side finished before I leave. I'll have to leave the munging up to someone else while I'm away - who wants it?

@shane-tomlinson
Copy link
Member Author

@shane-tomlinson shane-tomlinson commented May 23, 2014

Fixes #1119?

@pdehaan - this is only the client side component. There is a shell of a server side endpoint, but it needs to send data to Heka.

* Use SpeedTrap to collect client side metrics
* Add screen views and errors to an event stream.
* Metrics are sent to the `/metrics` endpoint on document unload or after 10 minutes of inactivity.
* Data that is sent to the backend is filtered - only data we expect is sent.
* Data collection sample rate is set by server configuration.

Other logged events:

* users who cancel login from the browser.
* complete_reset_password:link_damaged
* complete_reset_password:link_expired
* complete_signn_up:link_damaged
* confirm:too_many_attempts
* confirm:resend
* confirm_reset_password_resend
* login:canceled

Others changes:
* AuthErrors.toMessage now accepts `forceMessage`
* Change AuthErrors.toCode to accept an error object.
* Give `Session expired` an error code of 1002.

issue #1119
@zaach zaach merged commit 084fce0 into mozilla:master May 23, 2014
1 check passed
1 check passed
@shane-tomlinson
continuous-integration/travis-ci The Travis CI build passed
Details
@pdehaan
Copy link
Contributor

@pdehaan pdehaan commented May 23, 2014

I see this adds a fair bit of noise in das Travis Logs (see https://travis-ci.org/mozilla/fxa-content-server/builds/25912446)... Acceptable? Should metrics be disabled for Travis runs?

@pdehaan
Copy link
Contributor

@pdehaan pdehaan commented May 24, 2014

Do we need to specify a metrics sample rate in the fxa-content-server puppet-config's default.json.erb file for stage/prod?

I noticed we added one in server/config/local.json-dist:17-19 and server/config/awsbox.json:9-11 (although the sampleRate is set to "1", which I believe means "log everything" -- I couldn't easily see if the default was "0" which means "log nothing" once this lands on Stage).

/cc @jrgm @shane-tomlinson

@pdehaan
Copy link
Contributor

@pdehaan pdehaan commented May 24, 2014

Actually, it looks like the default is 0 when I tested locally w/:

  • $ grunt server
  • $ grunt server:dist

Removing this from my server/config/local.json stopped any metrics data from getting logged to my console:

  "metrics": {
    "sample_rate": 1
  }
@shane-tomlinson shane-tomlinson deleted the shane-tomlinson:metrics branch Jun 13, 2014
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants