New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New statistics, without calling out to Plot.ly. #61
Conversation
Operations should set a cron job to call `_maintenance_update_graph_data.sh` periodically throughout the day. When there's a more permament server we'll update the URL that script downloads from. Right now, my server produces an update once per day, and that's all this dataset has for resolution anyway, so it should be OK. I have more datasets in progress, particularly around continents, countries, and TLDs, but this conquers the first goal of fixing things to not be so suck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is terrific! I really like the overall idea here, both in terms of how we visually present the data, and how we get the data to the browser.
#!/bin/bash | ||
outFile="$(dirname $0)/js/graphdata.tsv" | ||
|
||
curl --silent https://ct.tacticalsecret.com/graphdata.tsv >${outFile} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you name graphdata.tsv certcounts.tsv or similar? That way if we add additional graphs using this method, we can distinguish them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
@@ -0,0 +1,356 @@ | |||
datestamp certsIssued certsActive fqdnsActive regDomainsActive | |||
2015-09-12 1 NULL NULL NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a bug in your export script? It looks like the fqdnsActive
and regDomainsActive
columns are frequently null when they shouldn't be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no problem, it's just expensive to calculate. The NULL values are ones where I have not rewound time to derive the data. I picked a series of dates ~45 days apart to provide historical context, and going forward the data will be day-to-day.
@@ -0,0 +1,74 @@ | |||
function parse_tsv(s, f) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment header to this file describing what it's used for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
dateFormat = /\d{4}-\d{2}-\d{2}/; | ||
numFormat = /\d+/; | ||
|
||
parse_tsv(this.responseText.trim(), function(row){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than trimming here, it would probably be cleaner to make parse_tsv robust to empty lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
numFormat = /\d+/; | ||
|
||
parse_tsv(this.responseText.trim(), function(row){ | ||
if (!dateFormat.exec(row[0])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dateFormat.test
would be better here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched.
} | ||
} | ||
|
||
Plotly.plot( timeline, traces, layout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stray space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
legend: { | ||
xanchor:"left", | ||
yanchor:"top", | ||
x:0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent, we should have a space after all colons in this section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
Plotly.plot( timeline, traces, layout); | ||
} | ||
|
||
window.onload = function () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking nit: This will wait for the whole document to be loaded, including images and scripts, before starting the fetch for graph data. It would be better to start the fetch immediately, and then if the event listener fires and the <div id="timeline">
doesn't yet exist, postpone until onload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. PR is welcome, I'm not a JS expert.
<a href="https://plot.ly/~letsencrypt/20/" target="_blank" title="Certificates by ~TLD" style="display: block; text-align: center;"><img src="https://plot.ly/~letsencrypt/20.png" alt="Certificates by ~TLD" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a> | ||
<script data-plotly="letsencrypt:20" src="https://plot.ly/embed.js" async></script> | ||
</div> | ||
<script src="/js/plotly-min.js"></script> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both of these scripts can by async, if you make the fix I suggested in the non-blocking nit above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To follow-up: This is dependent on the non-blocking nit, which I'd suggest we do in a follow-on. I've opened #62.
margin: { t: 0 }, | ||
yaxis: { title: 'Active Count' }, | ||
yaxis2: { | ||
title: 'Issued Per Day', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this title accurate? I think all of the numbers from the graphdata.tsv are actually cumulative rather than per-day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is accurate for yaxis2, it's covering tIssued
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm misunderstanding. Does tIssued
represent a count of the certificates actually issued on a given day, or a cumulative count? I think it's valuable to have graphs of each, but I don't think we should display daily counts on the same graph as cumulative ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, just looked at your demo and it makes more sense now. I do think we should split out the two data types (with different scales) onto two graphs. I definitely see an argument for putting them together, since the incremental data helps explain the cumulative data, but I think the different scales together makes it much harder to interpret the graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I added that in b2bb96c, with the compromise that I kept a combined graph at a different page (/stats-dashboard/
) - that's for mozilla/moz-corsica#71 , for we Mozillians and whomever else wants a fullscreen single-graph dash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, jsha!
@@ -0,0 +1,356 @@ | |||
datestamp certsIssued certsActive fqdnsActive regDomainsActive | |||
2015-09-12 1 NULL NULL NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no problem, it's just expensive to calculate. The NULL values are ones where I have not rewound time to derive the data. I picked a series of dates ~45 days apart to provide historical context, and going forward the data will be day-to-day.
#!/bin/bash | ||
outFile="$(dirname $0)/js/graphdata.tsv" | ||
|
||
curl --silent https://ct.tacticalsecret.com/graphdata.tsv >${outFile} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
margin: { t: 0 }, | ||
yaxis: { title: 'Active Count' }, | ||
yaxis2: { | ||
title: 'Issued Per Day', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is accurate for yaxis2, it's covering tIssued
Plotly.plot( timeline, traces, layout); | ||
} | ||
|
||
window.onload = function () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. PR is welcome, I'm not a JS expert.
<a href="https://plot.ly/~letsencrypt/20/" target="_blank" title="Certificates by ~TLD" style="display: block; text-align: center;"><img src="https://plot.ly/~letsencrypt/20.png" alt="Certificates by ~TLD" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a> | ||
<script data-plotly="letsencrypt:20" src="https://plot.ly/embed.js" async></script> | ||
</div> | ||
<script src="/js/plotly-min.js"></script> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
legend: { | ||
xanchor:"left", | ||
yanchor:"top", | ||
x:0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
} | ||
} | ||
|
||
Plotly.plot( timeline, traces, layout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
numFormat = /\d+/; | ||
|
||
parse_tsv(this.responseText.trim(), function(row){ | ||
if (!dateFormat.exec(row[0])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched.
dateFormat = /\d{4}-\d{2}-\d{2}/; | ||
numFormat = /\d+/; | ||
|
||
parse_tsv(this.responseText.trim(), function(row){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -0,0 +1,74 @@ | |||
function parse_tsv(s, f) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
c9883a7
to
02ea6e7
Compare
"demo": https://ct.tacticalsecret.com/ |
There was an off-by-one error in the summation math for the numActive column. Also, I've continued calculating old numActiveFQDNs and numActiveRegistereDomains for the dataset, though it takes time for each row. Updating this here is a little silly, since the maintenance task will update it anyway, but eh, why not have the latest when this merges to master? :) And also, I noticed the off-by-one error in the Github review, so it feels right to fix it.
- Also adds "stats-dashboard/" which contains the combined graph, as that's what I'd like to show internally in Mozilla. :) - Do an atomic update in _maintenance_update_graph_data.sh to avoid the case where during the curl execution the tsv would be blank. - Moves some styling into css/main.scss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looking good!
@@ -0,0 +1,129 @@ | |||
// stats.js is used by stats.md to download graph data from the webserver, | |||
// and then display it using plotly.js. Right now it displays a single graph. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"displays a single graph" is now out of date.
--- | ||
|
||
<div class="dashboard"> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an HTML comment describing why this is here and linking to moz-corsica, so future maintainers know there is a dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, accidentally hit the "Approve" button too early. There are still a couple of minor changes (above), but also this is blocked on Ops deployment work.
Also, could you take the step of rewinding time to fill in the history gaps before we commit this? The linear jumps on the graph make it looks like something noteworthy happened, when it didn't. |
This kicks off the TSV fetch without waiting for the whole document to finish loading. It also puts the "var" keyword on some variables.
# Download to a temp file, then move over-top the original, so that the update is | ||
# atomic from the perspective of the webserver | ||
curl --silent https://ct.tacticalsecret.com/cert-timeline.tsv >"${outFile}.bak" | ||
mv "${outFile}.bak" "${outFile}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mv should only happen if the curl is successful, so curl ... && mv ...
. Alternately you can set -o errexit
at the top of the script. I don't have strong opinions about which one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure; I debated even having this script, though - I figured Ops should pull this out, customize it, and delete it from the repo to avoid the chances of someone convincing us to merge a change with badness in it.
Still, updated in 0984a30. (correction: bad internet connection)
If you want to wait to merge, that's fine. I figured even with some missing data it was better than the existing graphs, and the NULLs will fill in over time with the maintenance task picking up the increased-fidelity data. Optimistically - at the rate it's currently going - they should all be filled in sometime mid-next-week, approximately September 28th. I'm afraid I don't have the time to optimize it more before it completes. I'll check back in after a few days and see if I should commit another intermediate state. |
7bcccd4
to
839867f
Compare
- Set errexit in _maintenance_update_graph_data.sh - Reset it to not-executable, and rename to indicate it shouldn't be directly run - Remove inaccurate comment in stats.js - Note reference to Mozilla in stats-dashboard. The exact tooling is likely to change, but at the time of this commit it was relative to https://github.com/mozilla/moz-corsica/
839867f
to
0984a30
Compare
Merged your slick stats-update-update in 4903f1f -- thank you! I knew there'd be a clean JS way to do such a thing, but all I knew to do was to bring in require.js so that stats.js and plotly.js would load in the right order, and then make an event listener in stats.js.... So yeah, thanks! |
Move Google Analytics to its own file and load it per guidance: https://developers.google.com/analytics/devguides/collection/analyticsjs/#alternative_async_tracking_snippet This in combination with PR #61 permits the following, much tighter CSP policy: > Content-Security-Policy "default-src 'self'; style-src 'unsafe-inline' 'self'; script-src 'unsafe-eval' 'self' https://www.google-analytics.com; img-src 'self' data: https://www.google-analytics.com;" This can land whenever, but the CSP policy shouldn't be updated until #61 lands. Additionally, the CSP policy should be updated with care and tested as well as possible before being left alone.
Move Google Analytics to its own file and load it per guidance: https://developers.google.com/analytics/devguides/collection/analyticsjs/#alternative_async_tracking_snippet This in combination with PR #61 permits the following, much tighter CSP policy: > Content-Security-Policy "default-src 'self'; style-src 'unsafe-inline' 'self'; script-src 'unsafe-eval' 'self' https://www.google-analytics.com; img-src 'self' data: https://www.google-analytics.com;" This can land whenever, but the CSP policy shouldn't be updated until #61 lands. Additionally, the CSP policy should be updated with care and tested as well as possible before being left alone.
That makes sure the Plotly script is fully loaded.
Operations should set a cron job to call
_maintenance_update_graph_data.sh
periodically throughout the day. When there's a more permament server we'll
update the URL that script downloads from.
Right now, my server produces an update once per day, and that's all this
dataset has for resolution anyway, so it should be OK.
There may need to be some git finangling to force-update around the constantly-changing
js/certcounts.tsv
, but feel free to move it around as seems appropriate.The plot.ly JS is MIT-licensed, so we're good to go on redistributing.
I have more datasets in progress, particularly around continents, countries, and
TLDs, but this conquers the first goal of fixing things to not be so suck.