Update documentation
ethanweed committed May 17, 2024
1 parent 7900c2c commit 4c6024d
Showing 2 changed files with 6 additions and 6 deletions.
2 changes: 1 addition & 1 deletion 05.05-anova2.html
@@ -2456,7 +2456,7 @@ <h3><span class="section-number">17.5.2. </span>ANOVA with binary factors as a r
</div>
<p>There’s a few interesting things to note here. First, notice that the intercept term is 43.5, which is close to the “group” mean of 42.5 observed for those two students who didn’t read the text or attend class. Second, notice that we have the regression coefficient of <span class="math notranslate nohighlight">\(b_1 = 18.0\)</span> for the attendance variable, suggesting that those students that attended class scored 18% higher than those who didn’t. So our expectation would be that those students who turned up to class but didn’t read the textbook would obtain a grade of <span class="math notranslate nohighlight">\(b_0 + b_1\)</span>, which is equal to <span class="math notranslate nohighlight">\(43.5 + 18.0 = 61.5\)</span>. Again, this is similar to the observed group mean of 62.5. You can verify for yourself that the same thing happens when we look at the students that read the textbook.</p>
<p>Actually, we can push a little further in establishing the equivalence of our ANOVA and our regression. Look at the <span class="math notranslate nohighlight">\(p\)</span>-values associated with the <code class="docutils literal notranslate"><span class="pre">attend</span></code> variable and the <code class="docutils literal notranslate"><span class="pre">reading</span></code> variable in the regression output. They’re identical to the ones we encountered earlier when running the ANOVA. This might seem a little surprising, since the test used when running our regression model calculates a <span class="math notranslate nohighlight">\(t\)</span>-statistic and the ANOVA calculates an <span class="math notranslate nohighlight">\(F\)</span>-statistic. However, if you can remember all the way back to the chapter on <a class="reference internal" href="04.02-probability.html#probability"><span class="std std-ref">probability</span></a>, I mentioned that there’s a relationship between the <span class="math notranslate nohighlight">\(t\)</span>-distribution and the <span class="math notranslate nohighlight">\(F\)</span>-distribution: if you have some quantity that is distributed according to a <span class="math notranslate nohighlight">\(t\)</span>-distribution with <span class="math notranslate nohighlight">\(k\)</span> degrees of freedom and you square it, then this new squared quantity follows an <span class="math notranslate nohighlight">\(F\)</span>-distribution whose degrees of freedom are 1 and <span class="math notranslate nohighlight">\(k\)</span>. We can check this with respect to the <span class="math notranslate nohighlight">\(t\)</span> statistics in our regression model. For the <code class="docutils literal notranslate"><span class="pre">attend</span></code> variable we get a <span class="math notranslate nohighlight">\(t\)</span> value of 4.648. If we square this number we end up with 21.604, which is identical to the corresponding <span class="math notranslate nohighlight">\(F\)</span> statistic in our ANOVA. <a class="reference external" href="https://www.youtube.com/watch?v=NsUFBm1uENs">I love it when a plan comes together.</a></p>
- <p>I mentioned there was a second reason I didn’t use <code class="docutils literal notranslate"><span class="pre">pingouin</span></code> for this example. This is because as far as I can tell, when performing an ANOVA <code class="docutils literal notranslate"><span class="pre">pingouin</span></code> always calculated not only the main effects, but also the interaction, thus giving slightly different results. In order to keep things simple (and maintain parity with <a class="reference external" href="http://learningstatisticswithr.com/">LSR</a>, I decided to go with <code class="docutils literal notranslate"><span class="pre">statsmodels</span></code> and not specify any interactions. Just to be sure though, let’s run the ANOVA with <code class="docutils literal notranslate"><span class="pre">pingouin</span></code>, and then run the regression in <code class="docutils literal notranslate"><span class="pre">statsmodels</span></code> with a little ANOVA dressing on top, and confirm that we get the same result:</p>
+ <p>I mentioned there was a second reason I didn’t use <code class="docutils literal notranslate"><span class="pre">pingouin</span></code> for this example. This is because as far as I can tell, when performing an ANOVA <code class="docutils literal notranslate"><span class="pre">pingouin</span></code> always calculates not only the main effects, but also the interaction, thus giving slightly different results. In order to keep things simple (and maintain parity with <a class="reference external" href="http://learningstatisticswithr.com/">LSR</a>, I decided to go with <code class="docutils literal notranslate"><span class="pre">statsmodels</span></code> and not specify any interactions. Just to be sure though, let’s run the ANOVA with <code class="docutils literal notranslate"><span class="pre">pingouin</span></code>, and then run the regression in <code class="docutils literal notranslate"><span class="pre">statsmodels</span></code> with a little ANOVA dressing on top, and confirm that we get the same result:</p>
<div class="cell docutils container">
<div class="cell_input docutils container">
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">pg</span><span class="o">.</span><span class="n">anova</span><span class="p">(</span><span class="n">dv</span><span class="o">=</span><span class="s1">&#39;grade&#39;</span><span class="p">,</span> <span class="n">between</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;attend&#39;</span><span class="p">,</span> <span class="s1">&#39;reading&#39;</span><span class="p">],</span> <span class="n">data</span><span class="o">=</span><span class="n">rtfm1</span><span class="p">)</span><span class="o">.</span><span class="n">round</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
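The two numerical checks described in the changed section above are easy to reproduce. The following is a minimal sketch, not taken from the book's notebooks: it plugs in the figures quoted in the prose (b0 = 43.5, b1 = 18.0, t = 4.648) and assumes 5 residual degrees of freedom for the no-interaction regression (an assumption; the exact value depends on the sample size in the rtfm example).

```python
# Minimal sketch of the checks described in the text; the numbers below are
# quoted from the prose, not recomputed from the rtfm data.
from scipy import stats

# 1. Cell means from the regression coefficients
b0, b1 = 43.5, 18.0              # intercept, coefficient for `attend`
print(b0 + b1)                   # 61.5 -- predicted grade: attended class, didn't read

# 2. The t / F relationship: squaring the t statistic gives the ANOVA F
t = 4.648                        # t statistic for `attend` in the regression
print(t ** 2)                    # 21.60..., matching the F statistic in the ANOVA

# The p-values agree too: P(F(1, k) > t^2) equals the two-sided P(|T(k)| > t).
k = 5                            # residual df -- an assumption, see the note above
print(2 * stats.t.sf(t, k))      # two-sided p from the t statistic
print(stats.f.sf(t ** 2, 1, k))  # p from the corresponding F statistic (identical)
```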
10 changes: 5 additions & 5 deletions _sources/05.05-anova2.ipynb
@@ -2997,12 +2997,12 @@
"id": "76fcaa5a-c7b3-497d-9fe3-5a01701e199f",
"metadata": {},
"source": [
"I mentioned there was a second reason I didn't use `pingouin` for this example. This is because as far as I can tell, when performing an ANOVA `pingouin` always calculated not only the main effects, but also the interaction, thus giving slightly different results. In order to keep things simple (and maintain parity with [LSR](http://learningstatisticswithr.com/), I decided to go with `statsmodels` and not specify any interactions. Just to be sure though, let's run the ANOVA with `pingouin`, and then run the regression in `statsmodels` with a little ANOVA dressing on top, and confirm that we get the same result:"
"I mentioned there was a second reason I didn't use `pingouin` for this example. This is because as far as I can tell, when performing an ANOVA `pingouin` always calculates not only the main effects, but also the interaction, thus giving slightly different results. In order to keep things simple (and maintain parity with [LSR](http://learningstatisticswithr.com/), I decided to go with `statsmodels` and not specify any interactions. Just to be sure though, let's run the ANOVA with `pingouin`, and then run the regression in `statsmodels` with a little ANOVA dressing on top, and confirm that we get the same result:"
]
},
{
"cell_type": "code",
"execution_count": 99,
"execution_count": 102,
"id": "ec480505-abcd-43fa-abdd-a5622915820a",
"metadata": {},
"outputs": [
@@ -3089,7 +3089,7 @@
"3 Residual 142.0 4 35.5 NaN NaN NaN"
]
},
"execution_count": 99,
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
@@ -3100,7 +3100,7 @@
},
{
"cell_type": "code",
"execution_count": 98,
"execution_count": 103,
"id": "b8cbf2e1-be49-4206-a2ea-35dc01501053",
"metadata": {
"scrolled": true
@@ -3174,7 +3174,7 @@
"Residual 142.0 4.0 NaN NaN"
]
},
"execution_count": 98,
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
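The notebook diff above only shows the outputs of the two comparison cells. For readers who want to re-run the comparison, a sketch of the statsmodels side might look like the following; the formula (with the interaction, to match the residual degrees of freedom of 4 in the tables) and the `anova_lm` arguments are assumptions based on the text, not lines copied from the notebook.

```python
# Hedged sketch of the "regression with a little ANOVA dressing on top":
# fit the two-way model as an OLS regression, then summarise it as an ANOVA
# table so it can be compared with the pingouin output shown earlier.
import statsmodels.api as sm
import statsmodels.formula.api as smf

# `rtfm1`, `grade`, `attend`, `reading` come from the pingouin call above;
# the interaction term and the sums-of-squares type are assumptions.
model = smf.ols('grade ~ attend * reading', data=rtfm1).fit()
print(sm.stats.anova_lm(model, typ=2).round(3))
```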
