Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New statistics, without calling out to Plot.ly. #61

Merged
merged 16 commits into from Oct 21, 2016
Merged

Conversation

jcjones
Copy link
Contributor

@jcjones jcjones commented Sep 17, 2016

Operations should set a cron job to call _maintenance_update_graph_data.sh
periodically throughout the day. When there's a more permament server we'll
update the URL that script downloads from.

Right now, my server produces an update once per day, and that's all this
dataset has for resolution anyway, so it should be OK.

There may need to be some git finangling to force-update around the constantly-changing js/certcounts.tsv, but feel free to move it around as seems appropriate.

The plot.ly JS is MIT-licensed, so we're good to go on redistributing.

I have more datasets in progress, particularly around continents, countries, and
TLDs, but this conquers the first goal of fixing things to not be so suck.

Operations should set a cron job to call `_maintenance_update_graph_data.sh`
periodically throughout the day. When there's a more permament server we'll
update the URL that script downloads from.

Right now, my server produces an update once per day, and that's all this
dataset has for resolution anyway, so it should be OK.

I have more datasets in progress, particularly around continents, countries, and
TLDs, but this conquers the first goal of fixing things to not be so suck.
Copy link
Contributor

@jsha jsha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is terrific! I really like the overall idea here, both in terms of how we visually present the data, and how we get the data to the browser.

#!/bin/bash
outFile="$(dirname $0)/js/graphdata.tsv"

curl --silent https://ct.tacticalsecret.com/graphdata.tsv >${outFile}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you name graphdata.tsv certcounts.tsv or similar? That way if we add additional graphs using this method, we can distinguish them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@@ -0,0 +1,356 @@
datestamp certsIssued certsActive fqdnsActive regDomainsActive
2015-09-12 1 NULL NULL NULL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a bug in your export script? It looks like the fqdnsActive and regDomainsActive columns are frequently null when they shouldn't be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no problem, it's just expensive to calculate. The NULL values are ones where I have not rewound time to derive the data. I picked a series of dates ~45 days apart to provide historical context, and going forward the data will be day-to-day.

@@ -0,0 +1,74 @@
function parse_tsv(s, f) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment header to this file describing what it's used for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

dateFormat = /\d{4}-\d{2}-\d{2}/;
numFormat = /\d+/;

parse_tsv(this.responseText.trim(), function(row){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than trimming here, it would probably be cleaner to make parse_tsv robust to empty lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

numFormat = /\d+/;

parse_tsv(this.responseText.trim(), function(row){
if (!dateFormat.exec(row[0])) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dateFormat.test would be better here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched.

}
}

Plotly.plot( timeline, traces, layout);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray space.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

legend: {
xanchor:"left",
yanchor:"top",
x:0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent, we should have a space after all colons in this section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Plotly.plot( timeline, traces, layout);
}

window.onload = function () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking nit: This will wait for the whole document to be loaded, including images and scripts, before starting the fetch for graph data. It would be better to start the fetch immediately, and then if the event listener fires and the <div id="timeline"> doesn't yet exist, postpone until onload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. PR is welcome, I'm not a JS expert.

<a href="https://plot.ly/~letsencrypt/20/" target="_blank" title="Certificates by ~TLD" style="display: block; text-align: center;"><img src="https://plot.ly/~letsencrypt/20.png" alt="Certificates by ~TLD" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a>
<script data-plotly="letsencrypt:20" src="https://plot.ly/embed.js" async></script>
</div>
<script src="/js/plotly-min.js"></script>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both of these scripts can by async, if you make the fix I suggested in the non-blocking nit above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To follow-up: This is dependent on the non-blocking nit, which I'd suggest we do in a follow-on. I've opened #62.

margin: { t: 0 },
yaxis: { title: 'Active Count' },
yaxis2: {
title: 'Issued Per Day',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this title accurate? I think all of the numbers from the graphdata.tsv are actually cumulative rather than per-day.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is accurate for yaxis2, it's covering tIssued

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm misunderstanding. Does tIssued represent a count of the certificates actually issued on a given day, or a cumulative count? I think it's valuable to have graphs of each, but I don't think we should display daily counts on the same graph as cumulative ones.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, just looked at your demo and it makes more sense now. I do think we should split out the two data types (with different scales) onto two graphs. I definitely see an argument for putting them together, since the incremental data helps explain the cumulative data, but I think the different scales together makes it much harder to interpret the graph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I added that in b2bb96c, with the compromise that I kept a combined graph at a different page (/stats-dashboard/) - that's for mozilla/moz-corsica#71 , for we Mozillians and whomever else wants a fullscreen single-graph dash.

Copy link
Contributor Author

@jcjones jcjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, jsha!

@@ -0,0 +1,356 @@
datestamp certsIssued certsActive fqdnsActive regDomainsActive
2015-09-12 1 NULL NULL NULL
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no problem, it's just expensive to calculate. The NULL values are ones where I have not rewound time to derive the data. I picked a series of dates ~45 days apart to provide historical context, and going forward the data will be day-to-day.

#!/bin/bash
outFile="$(dirname $0)/js/graphdata.tsv"

curl --silent https://ct.tacticalsecret.com/graphdata.tsv >${outFile}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

margin: { t: 0 },
yaxis: { title: 'Active Count' },
yaxis2: {
title: 'Issued Per Day',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is accurate for yaxis2, it's covering tIssued

Plotly.plot( timeline, traces, layout);
}

window.onload = function () {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. PR is welcome, I'm not a JS expert.

<a href="https://plot.ly/~letsencrypt/20/" target="_blank" title="Certificates by ~TLD" style="display: block; text-align: center;"><img src="https://plot.ly/~letsencrypt/20.png" alt="Certificates by ~TLD" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a>
<script data-plotly="letsencrypt:20" src="https://plot.ly/embed.js" async></script>
</div>
<script src="/js/plotly-min.js"></script>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

legend: {
xanchor:"left",
yanchor:"top",
x:0,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

}
}

Plotly.plot( timeline, traces, layout);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

numFormat = /\d+/;

parse_tsv(this.responseText.trim(), function(row){
if (!dateFormat.exec(row[0])) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched.

dateFormat = /\d{4}-\d{2}-\d{2}/;
numFormat = /\d+/;

parse_tsv(this.responseText.trim(), function(row){
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -0,0 +1,74 @@
function parse_tsv(s, f) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jcjones
Copy link
Contributor Author

jcjones commented Sep 19, 2016

"demo": https://ct.tacticalsecret.com/

There was an off-by-one error in the summation math for the numActive column.
Also, I've continued calculating old numActiveFQDNs and numActiveRegistereDomains
for the dataset, though it takes time for each row.

Updating this here is a little silly, since the maintenance task will update it
anyway, but eh, why not have the latest when this merges to master? :) And also,
I noticed the off-by-one error in the Github review, so it feels right to fix it.
- Also adds "stats-dashboard/" which contains the combined graph, as that's what
  I'd like to show internally in Mozilla. :)
- Do an atomic update in _maintenance_update_graph_data.sh to avoid the case
  where during the curl execution the tsv would be blank.
- Moves some styling into css/main.scss.
Copy link
Contributor

@jsha jsha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looking good!

@@ -0,0 +1,129 @@
// stats.js is used by stats.md to download graph data from the webserver,
// and then display it using plotly.js. Right now it displays a single graph.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"displays a single graph" is now out of date.

---

<div class="dashboard">

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an HTML comment describing why this is here and linking to moz-corsica, so future maintainers know there is a dependency.

Copy link
Contributor

@jsha jsha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, accidentally hit the "Approve" button too early. There are still a couple of minor changes (above), but also this is blocked on Ops deployment work.

@jsha
Copy link
Contributor

jsha commented Sep 20, 2016

Also, could you take the step of rewinding time to fill in the history gaps before we commit this? The linear jumps on the graph make it looks like something noteworthy happened, when it didn't.

This kicks off the TSV fetch without waiting for the whole document to finish
loading. It also puts the "var" keyword on some variables.
# Download to a temp file, then move over-top the original, so that the update is
# atomic from the perspective of the webserver
curl --silent https://ct.tacticalsecret.com/cert-timeline.tsv >"${outFile}.bak"
mv "${outFile}.bak" "${outFile}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mv should only happen if the curl is successful, so curl ... && mv .... Alternately you can set -o errexit at the top of the script. I don't have strong opinions about which one.

Copy link
Contributor Author

@jcjones jcjones Sep 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure; I debated even having this script, though - I figured Ops should pull this out, customize it, and delete it from the repo to avoid the chances of someone convincing us to merge a change with badness in it.

Still, updated in 0984a30. (correction: bad internet connection)

@jcjones
Copy link
Contributor Author

jcjones commented Sep 20, 2016

If you want to wait to merge, that's fine. I figured even with some missing data it was better than the existing graphs, and the NULLs will fill in over time with the maintenance task picking up the increased-fidelity data.

Optimistically - at the rate it's currently going - they should all be filled in sometime mid-next-week, approximately September 28th. I'm afraid I don't have the time to optimize it more before it completes. I'll check back in after a few days and see if I should commit another intermediate state.

  - Set errexit in _maintenance_update_graph_data.sh
  - Reset it to not-executable, and rename to indicate it shouldn't be directly
    run
  - Remove inaccurate comment in stats.js
  - Note reference to Mozilla in stats-dashboard. The exact tooling is likely
    to change, but at the time of this commit it was relative to
    https://github.com/mozilla/moz-corsica/
@jcjones
Copy link
Contributor Author

jcjones commented Sep 20, 2016

Merged your slick stats-update-update in 4903f1f -- thank you! I knew there'd be a clean JS way to do such a thing, but all I knew to do was to bring in require.js so that stats.js and plotly.js would load in the right order, and then make an event listener in stats.js.... So yeah, thanks!

jcjones added a commit that referenced this pull request Sep 23, 2016
Move Google Analytics to its own file and load it per guidance:

https://developers.google.com/analytics/devguides/collection/analyticsjs/#alternative_async_tracking_snippet

This in combination with PR #61 permits the following, much tighter CSP
policy:

> Content-Security-Policy "default-src 'self'; style-src 'unsafe-inline' 'self'; script-src 'unsafe-eval' 'self' https://www.google-analytics.com; img-src 'self' data: https://www.google-analytics.com;"

This can land whenever, but the CSP policy shouldn't be updated until #61 lands.
Additionally, the CSP policy should be updated with care and tested as well as
possible before being left alone.
@jcjones jcjones mentioned this pull request Sep 23, 2016
jsha pushed a commit that referenced this pull request Sep 23, 2016
Move Google Analytics to its own file and load it per guidance:

https://developers.google.com/analytics/devguides/collection/analyticsjs/#alternative_async_tracking_snippet

This in combination with PR #61 permits the following, much tighter CSP
policy:

> Content-Security-Policy "default-src 'self'; style-src 'unsafe-inline' 'self'; script-src 'unsafe-eval' 'self' https://www.google-analytics.com; img-src 'self' data: https://www.google-analytics.com;"

This can land whenever, but the CSP policy shouldn't be updated until #61 lands.
Additionally, the CSP policy should be updated with care and tested as well as
possible before being left alone.
@danjeffery danjeffery merged commit ccf2532 into master Oct 21, 2016
@jsha jsha deleted the stats_update branch July 31, 2017 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants