Convert sites logged in report to database backend
The report now also displays the mean number of sites logged in,
rather than the median (closes #29).

The reasoning behind the change (from #29):

Report #1 is the median number of sites a user logs into with Persona.

As part of migrating to CouchDB as the backend (#27), finding the median
of the data series becomes a significantly harder technical challenge.
(To do it in a map/reduce framework requires a quick-select algorithm,
which there doesn't seem to be a good way to do in CouchDB.)

Alternatively, the median value for each day could be precalculated when
data arrives and then stored in the database. However, this would
require either a new database (cumbersome) or a change to the data
format and code of the current one (very undesirable).

Calculating the mean of the dataset, however, is much easier.

While the median is a more sensible value to look at (it is less
sensitive to outliers), it was agreed earlier that this report as a
whole is not hugely meaningful. The median value itself doesn't really
say anything. The only way we'd use it is to watch the number and hope
it trends up. For that purpose, however, the mean is just about as
good: we can watch its trend in the same way.
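
A minimal sketch of why the mean fits a map/reduce backend while the median does not. The data values and the helper names (`toStats`, `combineStats`, `mean`, `median`) are made up for illustration and are not part of this commit. The only assumption about CouchDB is that its built-in `_stats` reduce reports a `sum` and a `count`, which is what the new report code reads.

```javascript
// Partial results shaped like the sum/count fields of a _stats reduce.
function toStats(values) {
    return {
        sum: values.reduce(function(s, v) { return s + v; }, 0),
        count: values.length
    };
}

// Two partial results can be merged without seeing the raw values again.
function combineStats(a, b) {
    return { sum: a.sum + b.sum, count: a.count + b.count };
}

function mean(stats) {
    return stats.sum / stats.count;
}

// The median needs the full sorted list of values.
function median(values) {
    var sorted = values.slice().sort(function(x, y) { return x - y; });
    var mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Made-up data: one day's values, split into two chunks the way a
// reduce step might see them.
var chunkA = [1, 1, 2];
var chunkB = [3, 10];

var combinedMean = mean(combineStats(toStats(chunkA), toStats(chunkB)));
var directMean = mean(toStats(chunkA.concat(chunkB)));

var trueMedian = median(chunkA.concat(chunkB));
var medianOfMedians = median([median(chunkA), median(chunkB)]);

console.log(combinedMean === directMean);    // merged partials give the same mean
console.log(trueMedian === medianOfMedians); // partial medians do not merge this way
```

The partial medians here are 1 and 6.5, which cannot be combined into the true median of 2 without going back to the underlying values.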
nmalkin committed Jul 18, 2012
1 parent 288d260 commit e740a0a
Showing 3 changed files with 81 additions and 7 deletions.
26 changes: 26 additions & 0 deletions server/lib/db.js
@@ -119,6 +119,14 @@ var VIEWS = {
},

reduce: '_count'
+},
+
+sites: {
+map: function(doc) {
+emit(doc.date, doc.number_sites_logged_in);
+},
+
+reduce: '_stats'
}
};

@@ -241,6 +249,24 @@ var VIEWS = {
});
})();

+/** Initialize sites report */
+(function() {
+var getMapBySegment = function(segmentation) {
+return function(doc) {
+emit([doc.date, doc["---SEGMENTATION---"]], doc.number_sites_logged_in);
+}.toString().replace('---SEGMENTATION---', segmentation);
+
+};
+
+var segmentations = Object.keys(data.getSegmentations());
+segmentations.forEach(function(segmentation) {
+VIEWS['sites_' + segmentation] = {
+map: getMapBySegment(segmentation),
+reduce: '_stats'
+};
+});
+})();
+
/** Design document */
var DOCUMENT = {
views: VIEWS
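
A short sketch of what the placeholder substitution in `getMapBySegment` produces. The function is copied from the hunk above; the segmentation name `'Browser'` is hypothetical, since the real names come from `data.getSegmentations()`, which is not shown in this commit. The sketch assumes a JavaScript engine whose `Function.prototype.toString` returns the function's source text, as Node.js does.

```javascript
// Copied from the diff: serializes a template map function to source text,
// then substitutes the segmentation name for the placeholder.
var getMapBySegment = function(segmentation) {
    return function(doc) {
        // `emit` is supplied by CouchDB's view server; it is never called
        // here, because only the function's source text is used.
        emit([doc.date, doc["---SEGMENTATION---"]], doc.number_sites_logged_in);
    }.toString().replace('---SEGMENTATION---', segmentation);
};

// 'Browser' is a hypothetical segmentation name, used only for illustration.
var mapSource = getMapBySegment('Browser');

console.log(typeof mapSource); // the view's map is a string of source code
console.log(mapSource);        // the source now reads doc["Browser"]
```

Because the name is spliced into the source as plain text, this approach presumably relies on segmentation names containing no double quotes or other characters that would break the string literal.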
58 changes: 53 additions & 5 deletions server/lib/reports.js
@@ -82,11 +82,59 @@ function summaryReport(segmentation, start, end, aggregator, summarizer, callbac
* @see summaryReport for parameter documentation
*/
exports.sites = function(segmentation, start, end, callback) {
-summaryReport(segmentation, start, end, dateAggregator,
-function(dayData) { // to summarize each day's data,
-// find the median number of sites users are logged in to
-return aggregate.median(dayData.map(data.getNumberSitesLoggedIn));
-}, callback);
+var dbOptions = {
+group: true
+};
+
+if(segmentation) {
+if(start) {
+dbOptions.startkey = [ util.getDateStringFromUnixTime(start) ];
+// Note that the key is in an array, and we are omitting the second key (segment).
+}
+if(end) {
+dbOptions.endkey = [ util.getDateStringFromUnixTime(end) ];
+}
+
+db.view('sites_' + segmentation, dbOptions, function(response) {
+var result = {};
+response.forEach(function(row) {
+var date = row.key[0],
+segment = row.key[1],
+mean = row.value.sum / row.value.count;
+
+if(! (segment in result)) {
+result[segment] = [];
+}
+
+result[segment].push({
+category: date,
+value: mean
+});
+});
+
+callback(result);
+});
+} else {
+if(start) {
+dbOptions.startkey = util.getDateStringFromUnixTime(start);
+}
+if(end) {
+dbOptions.endkey = util.getDateStringFromUnixTime(end);
+}
+
+db.view('sites', dbOptions, function(response) {
+var dates = [];
+response.forEach(function(row) {
+var stats = row.value;
+dates.push({
+category: row.key, // date
+value: stats.sum / stats.count // mean
+});
+});
+
+callback({ Total: dates });
+});
+}
};
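
A sketch of how the one-element array keys in the segmented branch interact with CouchDB's key ordering. In CouchDB's view collation, a shorter array sorts before a longer array that it prefixes, and objects sort after strings. The comparator below is a simplified stand-in covering only those two rules, not CouchDB's actual implementation. The rows, dates, and segment names are made up, and the sketch assumes `util.getDateStringFromUnixTime` returns sortable date strings such as `2012-07-18`, which this commit does not show.

```javascript
// Simplified stand-in for CouchDB view collation (assumption: keys contain
// only strings and the {} sentinel; real CouchDB compares strings via ICU).
function compareElements(a, b) {
    var aIsObject = typeof a === 'object',
        bIsObject = typeof b === 'object';
    if (aIsObject || bIsObject) {
        // Any object sorts after any string.
        return (aIsObject ? 1 : 0) - (bIsObject ? 1 : 0);
    }
    return a < b ? -1 : (a > b ? 1 : 0);
}

function compareKeys(a, b) {
    var shared = Math.min(a.length, b.length);
    for (var i = 0; i < shared; i++) {
        var order = compareElements(a[i], b[i]);
        if (order !== 0) { return order; }
    }
    // Equal prefixes: the shorter array sorts first.
    return a.length - b.length;
}

function inRange(key, startkey, endkey) {
    return compareKeys(key, startkey) >= 0 && compareKeys(key, endkey) <= 0;
}

// Made-up view keys of the form [date, segment].
var keys = [
    ['2012-07-17', 'Firefox'],
    ['2012-07-18', 'Chrome'],
    ['2012-07-18', 'Firefox'],
    ['2012-07-19', 'Chrome']
];

var start = ['2012-07-17'];

// endkey as built in the diff: a bare one-element array.
var bareEnd = keys.filter(function(k) {
    return inRange(k, start, ['2012-07-18']);
});

// Common CouchDB idiom: {} as a high sentinel for the omitted second element.
var sentinelEnd = keys.filter(function(k) {
    return inRange(k, start, ['2012-07-18', {}]);
});

console.log(bareEnd.length, sentinelEnd.length);
```

Under these assumptions, the one-element `startkey` includes every segment of the start date, while the one-element `endkey` stops before any `[date, segment]` row of the end date. If the end date is meant to be inclusive, as it is in the unsegmented branch where keys are plain strings, an `endkey` of `[date, {}]` would cover it.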


4 changes: 2 additions & 2 deletions static/index.html
@@ -150,12 +150,12 @@ <h2>Can new users log in?</h2>
</div>

<div id="sites" class="tab-pane form-inline">
-<h2>Median Number of Sites Logged In</h2>
+<h2>Mean Number of Sites Logged In</h2>
<p>
If the user is authenticated to the Persona servers at any point during the interaction,
we report the number of distinct sites that the user has logged into recently using Persona.
</p>
-<p>This report presents the median number of sites logged in, for each day.</p>
+<p>This report presents the mean number of sites logged in, for each day.</p>

<div class="chart"></div>
<div class="timeline"></div>