# P-Values and Decision Making

Of all the mistakes I see data science students make, few are as pervasive or as important as the misinterpretation of p-values. And to be honest, I can't fault students. It is all too common for p-values to be misused in academic writing and textbooks. And in my own experience, the way most instructors communicate the problem with p-values leaves something to be desired.

This reading is my effort to communicate the problem with p-values in a way that I have found to be effective in my own classes. If you find it resonates (and/or would resonate better with some changes!) please don't hesitate to let me know.

## A Question

Below is a regression estimating the effects of a 2007 experiment in which a random sample of ultra-poor households in West Bengal were given livestock and a 30- or 40-week stipend of 90 rupees a week for 18 months. Development economists then tracked these households, along with a set of control households, for many years. The code below regresses household expenditures three years after the experiment began on whether a given household was "treated" (received the livestock and stipend).[^sample]

[^sample]: A small confession: I down-sampled the original data to get interesting p-values. The original study was very successful and the p-values on the full study were very, very small, which makes it harder to use as an example.

(Development economists measure income by what people are able to buy, grow, and consume every month, since poor households often don't earn a cash wage.) 

In [2]:
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf

pd.set_option("mode.copy_on_write", True)

# Load the down-sampled household data from the public MIDS_Data repository.
hh = pd.read_csv(
    "https://github.com/nickeubank/MIDS_Data/raw/"
    "refs/heads/master/cash_transfers/"
    "TUP_cash_transfers_pvalue_exercise.csv"
)

# Regress household expenditures (pc_exp_month_el2) on the treatment indicator.
model = smf.ols(formula="pc_exp_month_el2 ~ treatment", data=hh).fit()
model.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | pc_exp_month_el2 | R-squared: | 0.008 |
| Model: | OLS | Adj. R-squared: | 0.006 |
| Method: | Least Squares | F-statistic: | 3.467 |
| Date: | Wed, 05 Mar 2025 | Prob (F-statistic): | 0.0633 |
| Time: | 13:56:34 | Log-Likelihood: | -2070.4 |
| No. Observations: | 431 | AIC: | 4145.0 |
| Df Residuals: | 429 | BIC: | 4153.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 60.6431 | 2.081 | 29.140 | 0.000 | 56.553 | 64.733 |
| treatment | 5.3161 | 2.855 | 1.862 | 0.063 | -0.295 | 10.928 |

| | | | |
|---|---|---|---|
| Omnibus: | 192.196 | Durbin-Watson: | 2.066 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 1065.859 |
| Skew: | 1.866 | Prob(JB): | 3.56e-232 |
| Kurtosis: | 9.74 | Cond. No. | 2.7 |
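
If it helps to see where the numbers in the `treatment` row come from, the `t`, `P>|t|`, and confidence-interval columns can be rebuilt from `coef` and `std err` alone. The sketch below is an illustration, not part of the original analysis. It hard-codes the rounded values from the regression output, assumes `scipy` is installed (it is a dependency of `statsmodels`), and assumes `treatment` is coded 0/1, so that the intercept is the fitted mean for control households.

```python
from scipy import stats

# Values copied (already rounded) from the regression output above.
intercept = 60.6431
coef = 5.3161
std_err = 2.855
df_resid = 429  # "Df Residuals" in the output

# t statistic: the estimate expressed in units of its standard error.
t_stat = coef / std_err

# Two-sided p-value from a t distribution with df_resid degrees of freedom.
p_value = 2 * stats.t.sf(abs(t_stat), df_resid)

# 95% confidence interval: estimate +/- critical value * standard error.
t_crit = stats.t.ppf(0.975, df_resid)
ci_low = coef - t_crit * std_err
ci_high = coef + t_crit * std_err

# Assumption: treatment is a 0/1 indicator, so the fitted group means are
# the intercept (control) and intercept + coef (treated).
control_mean = intercept
treated_mean = intercept + coef

print(f"t statistic:  {t_stat:.3f}")
print(f"p-value:      {p_value:.3f}")
print(f"95% interval: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"fitted means: control {control_mean:.2f}, treated {treated_mean:.2f}")
```

Because the inputs are rounded, the printed values should agree with the regression output only to about three decimal places. This reproduces how the columns are computed and nothing more.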


So here is my question to you: 

**What is the probability that the *true* effect of this experiment was 0 and that the difference we see between the treated and control households is the result of chance variation?**

In other words, what is the probability, given this result, that the experiment didn't actually do anything?

Give it some thought, and once you have an answer, [follow this link to the next page.](./45_pvalues_and_decision_making_2)