Commit

Update documentation
harterrt committed Apr 27, 2018
1 parent 08e6e4a commit 578c120
Showing 60 changed files with 6,161 additions and 1,412 deletions.
2 changes: 2 additions & 0 deletions archives.html
@@ -67,6 +67,8 @@ <h2><a href="https://blog.harterrt.com">Ryan T. Harter</a></h2>
<article>
<div class="article_text">
<dl>
<dt>Tue 24 April 2018</dt>
<dd><a href="https://blog.harterrt.com/dividing_hll.html">PSA: Don't use approximate counts for trends</a></dd>
<dt>Wed 28 March 2018</dt>
<dd><a href="https://blog.harterrt.com/coding_in_textboxes.html">Don't make me code in your text box!</a></dd>
<dt>Wed 28 February 2018</dt>
2 changes: 1 addition & 1 deletion author/ryan-harter.html
@@ -76,7 +76,7 @@ <h1><a href="https://blog.harterrt.com/lit-review.html">Literature Review: Writi
I&#8217;ve noticed writing documentation is a difficult thing to get right.
I haven&#8217;t seen any great example for a data product, either.
I don&#8217;t have much experience in this area,
so I decided to review </p></div></div></div>
so I decided to review ...</p></div></div></div>
</div>
</article>

60 changes: 28 additions & 32 deletions author/ryan-t-harter.html
@@ -65,6 +65,23 @@ <h2><a href="https://blog.harterrt.com">Ryan T. Harter</a></h2>
</p>
</header>

<article>
<div class="article_title">
<h1><a href="https://blog.harterrt.com/dividing_hll.html">PSA: Don't use approximate counts for trends</a></h1>
</div>
<div class="article_text">
<p>I got caught giving some bad advice this week,
so I decided to share here as penance.
TL;DR: approximate counts are approximate</p>
<hr />
<p>Counting stuff is hard.
We use probabilistic algorithms pretty frequently at Mozilla.
For example, when trying to get user counts,
we rely heavily on Presto's
<a href="https://prestodb.io/docs/current/functions/aggregate.html#approx_distinct">approx_distinct ...</a></p>
</div>
</article>
<hr />
<article>
<div class="article_title">
<h1><a href="https://blog.harterrt.com/coding_in_textboxes.html">Don't make me code in your text box!</a></h1>
@@ -76,7 +93,7 @@ <h1><a href="https://blog.harterrt.com/coding_in_textboxes.html">Don't make me c
My workflow looks like this:
Code a little, plot the data, what do you see?
Ah, outliers.
Code a little, plot the data </p>
Code a little, plot the data ...</p>
</div>
</article>
<hr />
@@ -89,7 +106,7 @@ <h1><a href="https://blog.harterrt.com/stages_e13n.html">The 5 Stages of Experim
Our team is spending a lot of effort trying to make Firefox experimentation feel easy.
But what happens after the experiment's been run?
There's <strong>not a clear process for taking experimental data and turning it into a decision</strong>.</p>
<p>I noted the importance …</p>
<p>I ...</p>
</div>
</article>
<hr />
@@ -103,7 +120,7 @@ <h1><a href="https://blog.harterrt.com/preferred_media.html">Asking Questions</a
I have been meaning to write a similar article for a while now.
His post finally pushed me over the edge. </p>
<p>Be sure to read Will's post first.
The rest of this article is an …</p>
The rest of this article is ...</p>
</div>
</article>
<hr />
@@ -115,7 +132,7 @@ <h1><a href="https://blog.harterrt.com/sdmb.html">Managing Someday-Maybe Project
<p>I have a problem managing projects I'm interested in but don't have time for.
For example, the <a href="/slack_alerts.html">CLI for generating slack alerts</a> I posted about last year.
Not really a priority, but helpful and not that complicated.
I sat on that project for about a year before I could finally …</p>
I sat on that project for about a year before I ...</p>
</div>
</article>
<hr />
@@ -129,8 +146,7 @@ <h1><a href="https://blog.harterrt.com/disqus.html">Removing Disqus</a></h1>
I added it because it was easy to do,
but I no longer think it's worth keeping.</p>
<p>If you'd like to share your thoughts,
feel free to shoot me an email at <code>harterrt</code> on gmail.
I …</p>
feel free to shoot me an email at <code>harterrt ...</code></p>
</div>
</article>
<hr />
@@ -144,7 +160,7 @@ <h1><a href="https://blog.harterrt.com/productivity_systems.html">Productivity S
but now it's grown to include the good bits from other systems.
It's involved, but I love it.</p>
<p>I get a lot of comments,
especially on the little black book I keep …</p>
especially on the little black ...</p>
</div>
</article>
<hr />
@@ -154,12 +170,12 @@ <h1><a href="https://blog.harterrt.com/slack_alerts.html">CLI for alerts via Sla
</div>
<div class="article_text">
<p>I finally got a chance to scratch an itch today.</p>
<h2>Problem</h2>
<h2 id="problem">Problem</h2>
<p>When working with bigger ETL jobs,
I frequently run into jobs that take hours to run.
I usually either step away from the computer
or work on something less important while the job runs.
I <strong>don't have a good …</strong></p>
I <strong>don't have a ...</strong></p>
</div>
</article>
<hr />
@@ -175,7 +191,7 @@ <h1><a href="https://blog.harterrt.com/experiments_are_releases.html">Experiment
Will has a
<a href="https://wlach.github.io/blog/2017/10/mission-control/">great write up here</a>
if you want to read more.</p>
<p>The key here is that the data has to be </p>
<p>The key here is that the data has to be ...</p>
</div>
</article>
<hr />
@@ -184,35 +200,15 @@ <h1><a href="https://blog.harterrt.com/experiments_are_releases.html">Experiment
<h1><a href="https://blog.harterrt.com/good_experiment_tools.html">Desirable features of experimentation tools</a></h1>
</div>
<div class="article_text">
<h2>Introduction</h2>
<h2 id="introduction">Introduction</h2>
<p>At Mozilla,
we're quickly climbing up our
<a href="https://cdn-images-1.medium.com/max/1600/1*7IMev5xslc9FLxr9hHhpFw.png">Data Science Hierarchy of Needs</a>
<sup>1</sup>.
I think the next big step for our data team
is to <strong>make experimentation feel natural</strong>.
There are a few components to this (e.g. training or culture)
but improving the <strong>tooling is going to be …</strong></p>
</div>
</article>
<hr />
<article>
<div class="article_title">
<h1><a href="https://blog.harterrt.com/dates.html">Submission Date vs Activity Date</a></h1>
</div>
<div class="article_text">
<p>My comments on
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1422892">Bug 1422892</a>
started to get long,
so I started untangling my thoughts here.</p>
<hr>
<p>From
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1422892">the bug</a>:</p>
<blockquote>
<p>We experimented with using <code>activity_date</code> instead of <code>submission_date</code>
when developing the <code>clients_daily</code> etl job.
We should summarize our findings and decide on
which of these measures we'd like to standardize against …</p></blockquote>
but improving the <strong>tooling is going to ...</strong></p>
</div>
</article>

75 changes: 45 additions & 30 deletions author/ryan-t-harter2.html
@@ -65,6 +65,26 @@ <h2><a href="https://blog.harterrt.com">Ryan T. Harter</a></h2>
</p>
</header>

<article>
<div class="article_title">
<h1><a href="https://blog.harterrt.com/dates.html">Submission Date vs Activity Date</a></h1>
</div>
<div class="article_text">
<p>My comments on
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1422892">Bug 1422892</a>
started to get long,
so I started untangling my thoughts here.</p>
<hr />
<p>From
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1422892">the bug</a>:</p>
<blockquote>
<p>We experimented with using <code>activity_date</code> instead of <code>submission_date</code>
when developing the <code>clients_daily</code> etl job.
We should summarize our findings and decide on
which of these measures we'd like to standardize ...</p></blockquote>
</div>
</article>
<hr />
<article>
<div class="article_title">
<h1><a href="https://blog.harterrt.com/okrs_and_4dx.html">OKRs and 4DX</a></h1>
@@ -75,7 +95,7 @@ <h1><a href="https://blog.harterrt.com/okrs_and_4dx.html">OKRs and 4DX</a></h1>
my team started using Objectives and Key Results (OKRs) for our planning.
It's been a learning process.
I had some prior experience with OKRs at Google,
but I've never felt like I was fully taking advantage of the …</p>
but I've never felt like I was fully taking ...</p>
</div>
</article>
<hr />
@@ -93,7 +113,7 @@ <h1><a href="https://blog.harterrt.com/new_tools.html">Evaluating New Tools</a><
or scale our knowledge
(<a href="https://medium.com/airbnb-engineering/scaling-knowledge-at-airbnb-875d73eff091">knoledge-repo</a>.
vs. <a href="https://www.gitbook.com/">gitbook</a>)</p>
<p>Most of these tools look like …</p>
<p>Most of these tools ...</p>
</div>
</article>
<hr />
@@ -108,11 +128,11 @@ <h1><a href="https://blog.harterrt.com/docs-style-guide.html">Documentation Styl
You can find the
<a href="https://github.com/mozilla/firefox-data-docs/pull/41">PR here</a>
but I figured it's worth sharing here as well.</p>
<h2>Style Guide</h2>
<h2 id="style-guide">Style Guide</h2>
<p>Articles should be written in
<a href="https://daringfireball.net/projects/markdown/syntax">Markdown</a>
(not <a href="http://asciidoctor.org/docs/asciidoc-syntax-quick-reference/">AsciiDoc</a>).
Markdown is usually …</p>
Markdown ...</p>
</div>
</article>
<hr />
@@ -123,11 +143,11 @@ <h1><a href="https://blog.harterrt.com/probes.html">Beer and Probes</a></h1>
<div class="article_text">
<p>Quick post to clear up some terminology.
But first, an analogy to clear up my thinking:</p>
<h2>Analogy</h2>
<h2 id="analogy">Analogy</h2>
<p>Temperature control is a big part of brewing beer.
Throughout the brewing process I use a thermometer
to measure the temperature of the soon-to-be beer.
Because I take several temperature readings throughout the </p>
Because I take several temperature readings throughout the ...</p>
</div>
</article>
<hr />
@@ -140,8 +160,7 @@ <h1><a href="https://blog.harterrt.com/bad-tools.html">Bad Tools are Insidious</
In the past, I've always been a data scientist -
a consumer of these tools.
I'm learning a lot.</p>
<p>Last quarter, I learned that bad tools are often hard to spot even when they're damaging productivity.
I sum this …</p>
<p>Last quarter, I learned that bad tools are often hard to spot even when they're damaging productivity ...</p>
</div>
</article>
<hr />
@@ -153,7 +172,7 @@ <h1><a href="https://blog.harterrt.com/is-moving-to-the-bay-area-worth-it.html">
<p>I came across <a href="http://blog.triplebyte.com/does-it-make-sense-for-programmers-to-move-to-the-bay-area">this article</a> on the front page of Hacker News yesterday.
The author argues that Bay Area housing prices may be high, but the salary increase probably makes it worth while.
The author pulls together some interesting data to make their point,
but I have major <strong>issues with </strong></p>
but I have major <strong>issues with ...</strong></p>
</div>
</article>
<hr />
@@ -166,7 +185,7 @@ <h1><a href="https://blog.harterrt.com/announcing-the-cross-sectional-dataset.ht
<p>The Cross Sectional dataset makes it easy to describe our users by providing
summary statistics for each client. Like the Longitudinal table, there's one
row for each client_id in a 1% sample of clients. However, the Cross Sectional
dataset simplifies your analysis …</p>
dataset simplifies ...</p>
</div>
</article>
<hr />
@@ -178,7 +197,7 @@ <h1><a href="https://blog.harterrt.com/meta-documentation.html">Meta Documentati
<p>You'll see a lot of posts coming down the line on documentation.</p>
<p>We surveyed our customers last quarter and asked where our data pipeline was lacking.
It turns out the most painful part of using our data pipeline, is reading the documentation.
I've been interesting in learning how to write …</p>
I've been interesting in learning how ...</p>
</div>
</article>
<hr />
@@ -187,30 +206,26 @@ <h1><a href="https://blog.harterrt.com/meta-documentation.html">Meta Documentati
<h1><a href="https://blog.harterrt.com/why-markdown.html">Why Markdown?</a></h1>
</div>
<div class="article_text">
<p>[TOC]</p>
<div class="toc">
<ul>
<li><a href="#better-process">Better Process</a></li>
<li><a href="#better-tools">Better Tools</a><ul>
<li><a href="#one-less-tool">One less tool</a></li>
</ul>
</li>
<li><a href="#the-documentation-sits-next-to-the-code">The documentation sits next to the code</a><ul>
<li><a href="#syncronization">Syncronization</a></li>
<li><a href="#discoverability">Discoverability</a></li>
</ul>
</li>
</ul>
</div>
<p>Last week I finished a <a href="https://github.com/mozilla/telemetry-batch-view/pull/128">pull
request</a> that moved
some documentation from <a href="https://wiki.mozilla.org/Telemetry/LongitudinalExamples">mozilla's
wiki</a> to a <a href="https://github.com/mozilla/telemetry-batch-view/blob/master/docs/longitudinal_examples.md">github
repository</a>.
It took a couple of hours of editing and toying with pandoc to get right, but
when I was done, I realized the benefits were difficult to see. So, I decided …</p>
</div>
</article>
<hr />
<article>
<div class="article_title">
<h1><a href="https://blog.harterrt.com/working-over-ssh.html">Working over SSH</a></h1>
</div>
<div class="article_text">
<p>[TOC]</p>
<h2>Introduction</h2>
<p>Working over SSH can be impossibly frustrating if you're not using the right tools.
I promised my teammates a write-up how I work over ssh.
Using these tools will make it significantly easier / more fun to work with a remote linux system.</p>
<h2>Tools</h2>
<h3><a href="https://tmux.github.io/">tmux</a></h3>
<p>For me, tmux is …</p>
It took a couple of hours of editing and toying with pandoc to get right ...</p>
</div>
</article>

44 changes: 38 additions & 6 deletions author/ryan-t-harter3.html
@@ -65,20 +65,52 @@ <h2><a href="https://blog.harterrt.com">Ryan T. Harter</a></h2>
</p>
</header>

<article>
<div class="article_title">
<h1><a href="https://blog.harterrt.com/working-over-ssh.html">Working over SSH</a></h1>
</div>
<div class="article_text">
<div class="toc">
<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#tools">Tools</a><ul>
<li><a href="#tmux">tmux</a><ul>
<li><a href="#session-persistence">Session Persistence</a></li>
<li><a href="#multiplexing">Multiplexing</a></li>
</ul>
</li>
<li><a href="#homeshick">Homeshick</a></li>
</ul>
</li>
</ul>
</div>
<h2 id="introduction">Introduction</h2>
<p>Working over SSH can be impossibly frustrating if you're not using the right tools.
I promised my teammates a write-up how I work over ssh.
Using these tools will make it significantly easier / more fun to work with a remote linux ...</p>
</div>
</article>
<hr />
<article>
<div class="article_title">
<h1><a href="https://blog.harterrt.com/strange-spark-error.html">Strange Spark Error</a></h1>
</div>
<div class="article_text">
<p>[TOC]</p>
<h1>Introduction</h1>
<div class="toc">
<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#the-bug">The Bug</a></li>
<li><a href="#fixes">Fixes?</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
</div>
<h1 id="introduction">Introduction</h1>
<p>I spend the better part of last week debugging a Spark error, so I figure it's worth writing up.</p>
<h1>The Bug</h1>
<h1 id="the-bug">The Bug</h1>
<p>I added the <a href="https://github.com/harterrt/spark-failure/blob/master/failure.scala">this very simple view</a> to our <a href="https://github.com/mozilla/telemetry-batch-view/tree/master/src/main/scala/com/mozilla/telemetry/views">batch views repository</a>.</p>
<div class="highlight"><pre><span></span><span class="k">package</span> <span class="nn">com.mozilla.telemetry.views</span>
<div class="codehilite"><pre><span></span><span class="k">package</span> <span class="nn">com.mozilla.telemetry.views</span>

<span class="k">import</span> <span class="nn">org.apache.spark.</span><span class="o">{</span><span class="nc">SparkConf</span><span class="o">,</span> <span class="nc">SparkContext</span><span class="o">}</span>
<span class="k">import</span> <span class="nn">org.apache.spark …</span></pre></div>
<span class="k">import</span> <span class="nn">org.apache.spark.</span><span class="o">{</span><span class="nc">SparkConf ...</span></pre></div>
</div>
</article>

2 changes: 1 addition & 1 deletion authors.html
@@ -68,7 +68,7 @@ <h2><a href="https://blog.harterrt.com">Ryan T. Harter</a></h2>
<div class="article_text">
<ul>
<li><a href="https://blog.harterrt.com/author/ryan-harter.html">Ryan Harter</a> (1)</li>
<li><a href="https://blog.harterrt.com/author/ryan-t-harter.html">Ryan T. Harter</a> (21)</li>
<li><a href="https://blog.harterrt.com/author/ryan-t-harter.html">Ryan T. Harter</a> (22)</li>
</ul>
</div>
</article>