**Replication of Table 7: Educational Outcomes and Television Coverage**

*To replicate the original paper’s results, we focus on recreating Table 7, which examines the relationship between exposure to educational television and later academic outcomes. Because constructing this table directly from IPUMS microdata proved infeasible, we rely on the authors’ replication datasets and reconstruct the analysis step by step, including the creation of state-level television coverage measures and the estimation of regression models linking coverage to standardized test scores.*

In [1]:
import pandas as pd
import numpy as np

cov = pd.read_stata("coverage.dta")
# Inspect dataset dimensions to confirm successful load
cov.shape

(14866, 15)

In [2]:
cov["covrate_raw"] = cov["covrate"]
# Preserve the original categorical coverage variable for transparency

In [3]:
cov["covrate_raw"].value_counts(dropna=False)
# Examine the distribution of coverage categories

covrate_raw
over 50%    8195
5-24%       4013
25-50%      2526
.            132
Name: count, dtype: int64

In [4]:
# Map categorical coverage bins to approximate midpoint values (shares of households);
# this approximates a continuous coverage intensity, as in the original paper.
covrate_map = {
    "5-24%": 0.15,
    "25-50%": 0.375,
    "over 50%": 0.75,
    ".": np.nan  # Stata's missing-value marker
}

cov["covrate"] = cov["covrate_raw"].map(covrate_map)

In [5]:
# Verify the mapping: counts should match the raw categories above
cov["covrate"].value_counts(dropna=False)

covrate
0.750    8195
0.150    4013
0.375    2526
NaN       132
Name: count, dtype: int64
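The hard-coded values above are approximate bin midpoints. A minimal sketch of deriving them from explicit bin bounds, so the assumption sits in one place; the `BIN_BOUNDS` dictionary and the helper `bin_midpoints` are hypothetical, and treating "over 50%" as 50-100% is an assumption (it is what the 0.75 value implies). The exact midpoint of 5-24% is 0.145, which the notebook rounds to 0.15.

```python
import numpy as np
import pandas as pd

# Hypothetical bounds for each coverage category, as shares of households.
# The 1.00 upper bound for "over 50%" is an assumption implied by the 0.75 midpoint.
BIN_BOUNDS = {
    "5-24%": (0.05, 0.24),
    "25-50%": (0.25, 0.50),
    "over 50%": (0.50, 1.00),
}

def bin_midpoints(bounds):
    """Return {label: midpoint} for each (lower, upper) pair."""
    return {label: (lo + hi) / 2 for label, (lo, hi) in bounds.items()}

mid = bin_midpoints(BIN_BOUNDS)
print(mid)

# Apply to a toy column; labels missing from the dict (such as ".") map to NaN
raw_labels = pd.Series(["over 50%", "5-24%", ".", "25-50%"])
numeric = raw_labels.map(mid)
print(numeric)
```

Keeping the bounds explicit also makes it easy to test how sensitive the later estimates are to the value assigned to the open-ended top bin.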

In [6]:
cov["state_clean"] = cov["state_hh"].str.strip().str.upper()
# Standardize state names to enable merging later
cov["tvhomes"] = pd.to_numeric(cov["tvhomes"], errors="coerce")
# Convert number of TV-owning households to numeric

cov = cov.dropna(subset=["covrate", "tvhomes", "state_clean"])
# Drop observations with missing key variables
cov.shape

(14733, 17)

In [7]:
cov["weighted_cov"] = cov["covrate"] * cov["tvhomes"]
# Create household-weighted coverage at the market level

# Aggregate to the state level:
# Numerator: total weighted coverage
# Denominator: total TV households
state_cov = (
    cov
    .groupby("state_clean", as_index=False)
    .agg(
        total_weighted_cov=("weighted_cov", "sum"),
        total_tvhomes=("tvhomes", "sum")
    )
)

state_cov["covrate_state"] = (
    state_cov["total_weighted_cov"] / state_cov["total_tvhomes"]
)

In [8]:
# Inspect the state-level coverage measure (only the last expression, head(), is displayed)
state_cov.shape
state_cov["covrate_state"].describe()
state_cov.head()

  state_clean  total_weighted_cov  total_tvhomes  covrate_state
0      ALABAMA           2195797.5      4345700.0       0.505281
1      ARIZONA           1342192.5      2034500.0       0.659716
2     ARKANSAS           1348440.0      2287100.0       0.589585
3   CALIFORNIA          26151727.5     41247900.0       0.634014
4     COLORADO           1588417.5      2304800.0       0.689178
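The aggregation in In [7] is a TV-household-weighted mean of market coverage within each state, $\bar c_s = \sum_m w_m c_m / \sum_m w_m$, with $w_m$ = `tvhomes` and $c_m$ = `covrate`. A small sketch on made-up market data, cross-checking the two-sum approach against `numpy.average` with weights:

```python
import numpy as np
import pandas as pd

# Made-up market-level data for two states
markets = pd.DataFrame({
    "state_clean": ["A", "A", "B", "B", "B"],
    "covrate": [0.75, 0.15, 0.375, 0.75, 0.15],
    "tvhomes": [300.0, 100.0, 50.0, 150.0, 200.0],
})

# Two-sum approach used in the notebook: sum(w * c) / sum(w) within each state
markets["weighted_cov"] = markets["covrate"] * markets["tvhomes"]
toy_state = markets.groupby("state_clean").agg(
    total_weighted_cov=("weighted_cov", "sum"),
    total_tvhomes=("tvhomes", "sum"),
)
toy_state["covrate_state"] = toy_state["total_weighted_cov"] / toy_state["total_tvhomes"]

# Cross-check against numpy's weighted average, state by state
check = markets.groupby("state_clean")[["covrate", "tvhomes"]].apply(
    lambda g: np.average(g["covrate"], weights=g["tvhomes"])
)
print(toy_state["covrate_state"])
```

Weighting by TV households means large markets dominate each state's average, which fits the goal of measuring the share of a state's TV households that had coverage.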


In [9]:
# Now loading the longitudinal student-level dataset used for achievement outcomes

df = pd.read_stata("allyrs_v1.dta")
df.shape
df.columns

Index(['RECDTYPE', 'SCHLTYPE', 'schoolid', 'STUDNTID', 'ID', 'censusregion',
       'TWINDATA', 'grade', 'DESIGNWT', 'grades_sofar',
       ...
       'att_acptlife_sryr', 'scale_acptlife_sryr', 'att_stsfy_self_sryr',
       'scale_stsfy_self_sryr', 'att_neg_self_sryr', 'daily_friends',
       'daily_read', 'daily_phone', 'daily_tlkprnt', 'hrsday_tv'],
      dtype='object', length=525)

In [15]:
# Create an indicator for students who were sophomores in 1980.
# These students were age-eligible for early childhood TV exposure.
df["soph_1980"] = (df["grade"] == "SOPHOMORE").astype(int)


In [16]:
df["grade"].value_counts(dropna=False)
df["soph_1980"].mean()

np.float64(0.5318812719927016)

In [19]:
df["state_clean"] = df["state"].astype(str).str.strip().str.upper()


In [20]:
df["state_clean"].value_counts().head()

state_clean
CALIFORNIA      6535
TEXAS           4588
NEW YORK        4371
ILLINOIS        4352
PENNSYLVANIA    3377
Name: count, dtype: int64
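One caveat with `astype(str)` in In [19]: missing values become the literal string "nan", which upper-casing turns into "NAN". Such rows still fail to match in the merge, so the coverage variable is unaffected, but the missing value is disguised as a state name. A sketch of a cleaning step that keeps missing values missing by using pandas' nullable `string` dtype; the helper `clean_state` is hypothetical:

```python
import numpy as np
import pandas as pd

def clean_state(s: pd.Series) -> pd.Series:
    """Strip and upper-case state names while leaving missing values as <NA>."""
    return s.astype("string").str.strip().str.upper()

raw_states = pd.Series([" california", "Texas ", np.nan])

naive = raw_states.astype(str).str.strip().str.upper()  # NaN becomes the string "NAN"
clean = clean_state(raw_states)                         # NaN stays missing (<NA>)
print(naive.tolist())
print(clean.tolist())
```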

In [21]:
# Merge state-level TV coverage into the student-level dataset
df = df.merge(
    state_cov[["state_clean", "covrate_state"]],
    on="state_clean",
    how="left"
)

In [22]:
df["covrate_state"].describe()
df["covrate_state"].isna().mean()

np.float64(0.0009774534080542162)
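The small missing share above can be traced to specific states with `merge(..., indicator=True)`, and `validate="m:1"` guards against a duplicated state key silently multiplying student rows. A sketch on toy data (the coverage values are made up):

```python
import pandas as pd

# Toy data with made-up coverage values
students = pd.DataFrame({"state_clean": ["CALIFORNIA", "TEXAS", "GUAM", "TEXAS"]})
toy_cov = pd.DataFrame({
    "state_clean": ["CALIFORNIA", "TEXAS"],
    "covrate_state": [0.6, 0.5],
})

merged = students.merge(
    toy_cov,
    on="state_clean",
    how="left",
    validate="m:1",   # raise MergeError if a state appears more than once in toy_cov
    indicator=True,   # add a _merge column: "both" or "left_only"
)

# Which student states found no coverage match, and how many students each
unmatched = merged.loc[merged["_merge"] == "left_only", "state_clean"].value_counts()
print(unmatched)
```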

**Construction of the Television Coverage × Sophomore Cohort Interaction**

**Interpretation:**
The cells below construct the key interaction variable for the Table 7 replication. The variable cov_x_soph interacts state-level television coverage (covrate_state) with an indicator for being a sophomore in 1980. For students in the sophomore cohort, the interaction equals the state's coverage rate; for all other students it is zero. Coverage therefore enters the interaction only for the age-eligible cohort whose outcomes identify the treatment effect.

In [23]:
# Interaction between state TV coverage and being in the treated cohort.
# For non-sophomores, this equals zero by construction.
df["cov_x_soph"] = df["covrate_state"] * df["soph_1980"]

In [25]:
df[["covrate_state", "soph_1980", "cov_x_soph"]].head(10)

   covrate_state  soph_1980  cov_x_soph
0       0.416667          1    0.416667
1       0.416667          0    0.000000
2       0.416667          0    0.000000
3       0.416667          1    0.416667
4       0.416667          0    0.000000
5       0.416667          0    0.000000
6       0.416667          0    0.000000
7       0.416667          0    0.000000
8       0.416667          1    0.416667
9       0.416667          1    0.416667


**Baseline Effect of Television Coverage on Math Achievement**

**Results and Interpretation:**
The regression below estimates a baseline OLS version of the core specification underlying Table 7 of the original paper, with HC1 heteroskedasticity-robust standard errors. The dependent variable is standardized math achievement, constructed by normalizing math percentile scores within sophomore status. The key explanatory variable is the interaction between state-level television coverage and an indicator for being a sophomore in 1980, which captures differential exposure to television for the treated cohort.

In this baseline specification, the coefficient on the coverage × sophomore interaction is negative and statistically insignificant. This contrasts with the original paper's findings and likely reflects the absence of the controls and fixed effects included in the published specification. Without demographic controls and regional adjustments, the model explains virtually none of the variation in math achievement (R² ≈ 0). A low R² does not by itself imply bias; the concern is that omitted determinants of achievement, such as demographics and region, may be correlated with state coverage and would then bias the interaction coefficient.

This result is therefore not interpreted as evidence against the paper's main conclusions. It serves as a diagnostic benchmark, showing that the raw interaction alone does not recover the treatment effect. In the subsequent specifications, introducing demographic controls, census region fixed effects, and additional outcome standardizations substantially changes both the magnitude and significance of the estimates, bringing the replication closer to the original results.
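For reference, the regressions in this section can be written as below, where $i$ indexes students and $s$ states, $\text{Cov}_s$ is `covrate_state`, $\text{Soph}_i$ is `soph_1980`, and $X_i$ collects the controls (`black`, `hisp`, `singleparentsr`, and census-region indicators); the baseline sets $\delta = 0$. The notebook's formulas correspond to the first line and omit a separate coverage main effect. The second line is the more common interaction form that includes $\lambda\,\text{Cov}_s$, so that $\beta$ does not also absorb score differences between high- and low-coverage states shared by both cohorts; it is offered as a possible robustness check, not as the paper's specification.

$$
\begin{aligned}
y_{is} &= \alpha + \beta\,(\text{Cov}_s \times \text{Soph}_i) + \gamma\,\text{Soph}_i + X_i'\delta + \varepsilon_{is} \\
y_{is} &= \alpha + \beta\,(\text{Cov}_s \times \text{Soph}_i) + \gamma\,\text{Soph}_i + \lambda\,\text{Cov}_s + X_i'\delta + \varepsilon_{is}
\end{aligned}
$$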

In [33]:
import statsmodels.formula.api as smf

# Baseline regression replicating the simplest Table 7 specification
model_math_std = smf.ols(
    "math_std ~ cov_x_soph + soph_1980",
    data=df
).fit(cov_type="HC1")

print(model_math_std.summary())

                            OLS Regression Results                            
Dep. Variable:               math_std   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                    0.2359
Date:                Fri, 12 Dec 2025   Prob (F-statistic):              0.790
Time:                        06:53:58   Log-Likelihood:                -70828.
No. Observations:               49920   AIC:                         1.417e+05
Df Residuals:                   49917   BIC:                         1.417e+05
Df Model:                           2                                         
Covariance Type:                  HC1                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0001      0.006      0.018      0.9

In [34]:
# Standardize outcomes within sophomore status groups
# This matches the paper’s normalization strategy
for v in ["math_pctsryr", "vocab_pctsryr", "read_pctsryr"]:
    df[v + "_std"] = (
        df.groupby("soph_1980")[v]
          .transform(lambda x: (x - x.mean()) / x.std())
    )
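An equivalent, vectorized form of this within-group z-score computes group means and standard deviations with built-in `transform` reductions instead of a Python lambda; both forms use pandas' default sample standard deviation (ddof=1) and leave missing scores as NaN. A sketch on toy data (the `toy` frame is illustrative only):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "soph_1980": [1, 1, 1, 0, 0, 0, 0],
    "math_pctsryr": [10.0, 50.0, 90.0, 20.0, np.nan, 40.0, 60.0],
})

g = toy.groupby("soph_1980")["math_pctsryr"]

# Vectorized: (x - group mean) / group SD
toy["math_std_vec"] = (toy["math_pctsryr"] - g.transform("mean")) / g.transform("std")

# The lambda version used in the notebook, for comparison
toy["math_std_lambda"] = g.transform(lambda x: (x - x.mean()) / x.std())
print(toy)
```

On large data the vectorized version avoids calling a Python function per group, though with only two cohort groups the speed difference is small.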

**Baseline Replication of Table 7: Television Coverage and Standardized Math Achievement**

**Results and Interpretation:**
This regression re-estimates the baseline specification with the newly standardized outcome math_pctsryr_std, regressing it on the coverage × sophomore interaction (cov_x_soph) and the sophomore indicator. The results match the previous regression: the interaction coefficient is negative and statistically insignificant (-0.060, robust SE 0.087, p = 0.49), and R² is approximately 0. This lack of precision and explanatory power is expected at this stage, since the original paper's results rely on additional controls and richer specifications; the subsequent regressions therefore add demographic and regional covariates to better align with the published findings.

In [35]:
smf.ols(
    "math_pctsryr_std ~ cov_x_soph + soph_1980",
    data=df
).fit(cov_type="HC1").summary()

                            OLS Regression Results
Dep. Variable:       math_pctsryr_std   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                    0.2359
Date:                Fri, 12 Dec 2025   Prob (F-statistic):              0.790
Time:                        06:54:04   Log-Likelihood:                -70828.
No. Observations:               49920   AIC:                         1.417e+05
Df Residuals:                   49917   BIC:                         1.417e+05
Df Model:                           2
Covariance Type:                  HC1
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0001      0.006      0.018      0.986      -0.012       0.013
cov_x_soph    -0.0597      0.087     -0.685      0.493      -0.231       0.111
soph_1980      0.0337      0.051      0.665      0.506      -0.066       0.133
------------------------------------------------------------------------------
Omnibus:                    10298.003   Durbin-Watson:                   1.621
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             2111.765
Skew:                           0.122   Prob(JB):                         0.00
Kurtosis:                       2.022   Cond. No.                         26.9


In [36]:
# Apply the same baseline specification to vocabulary and reading scores
for y in ["vocab_pctsryr_std", "read_pctsryr_std"]:
    print(smf.ols(f"{y} ~ cov_x_soph + soph_1980", data=df).fit(cov_type="HC1").summary())

In [46]:
controls = [
    "black",
    "hisp",
    "singleparentsr",
    "censusregion"
]
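The reported `Df Model: 13` in the controlled regressions suggests that `censusregion` is stored as a categorical (text) variable: the formula interface expands it into indicator columns with one omitted reference level, which is what "census region fixed effects" means here. The count works out as 2 coverage terms + 3 demographic controls + 8 region indicators, implying nine regions, assuming the demographic controls are numeric. A sketch of the same expansion with `pandas.get_dummies` on made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "censusregion": ["NEW ENGLAND", "PACIFIC", "MOUNTAIN", "PACIFIC", "NEW ENGLAND"],
    "black": [0, 1, 0, 0, 1],
})

# One indicator per region, dropping the first (alphabetical) level as the reference,
# mirroring the default treatment coding the formula interface applies to string columns
region_fe = pd.get_dummies(toy["censusregion"], prefix="region", drop_first=True, dtype=int)
design = pd.concat([toy[["black"]], region_fe], axis=1)
print(design)
```

With k regions this adds k - 1 columns. In a formula, writing `C(censusregion)` makes the categorical treatment explicit rather than relying on the column's dtype.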

**Math Achievement Regressions with State Television Coverage**

**Interpretation:**
This table reports OLS estimates replicating Table 7 of the paper, relating standardized math achievement to state-level television coverage and sophomore status in 1980, with demographic controls and census region fixed effects. The interaction between coverage and sophomore status is positive and statistically significant once these controls are included, indicating higher math scores for sophomores in high-coverage states relative to sophomores in low-coverage states. Differences in magnitude relative to the original results likely reflect sample construction and coding choices, but the qualitative pattern is consistent with the paper's findings.

In [47]:
# Estimate the controlled specification for math achievement
formula = "math_std ~ cov_x_soph + soph_1980 + " + " + ".join(controls)

model_math_std_controls = smf.ols(
    formula,
    data=df
).fit(cov_type="HC1")

print(model_math_std_controls.summary())

                            OLS Regression Results                            
Dep. Variable:               math_std   R-squared:                       0.139
Model:                            OLS   Adj. R-squared:                  0.138
Method:                 Least Squares   F-statistic:                     713.1
Date:                Fri, 12 Dec 2025   Prob (F-statistic):               0.00
Time:                        07:15:58   Log-Likelihood:                -66743.
No. Observations:               49678   AIC:                         1.335e+05
Df Residuals:                   49664   BIC:                         1.336e+05
Df Model:                          13                                         
Covariance Type:                  HC1                                         
                                         coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------------------------
Inte

In [49]:
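# Re-create the standardized outcomes under the short names used in the regressions
# (math_pctsryr -> math_std, etc.), again standardizing within sophomore status
for v in ["math_pctsryr", "vocab_pctsryr", "read_pctsryr"]:
    df[v.replace("_pctsryr", "_std")] = (
        df.groupby("soph_1980")[v]
        .transform(lambda x: (x - x.mean()) / x.std())
    )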

**Standardized Test Score Construction (Validation Step)**

**Interpretation:**
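This table checks the math, vocabulary, and reading scores after standardization within sophomore status groups: pooled across both cohorts, each has mean ≈ 0 and standard deviation ≈ 1, so the outcomes are on a common scale across subjects, as the Table 7 regressions require. Because pooled and within-group standardization both produce approximately these pooled moments, this summary is a consistency check; a summary grouped by soph_1980 is what confirms the within-group construction.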

In [50]:
df[["math_std", "vocab_std", "read_std"]].describe()


           math_std     vocab_std     read_std
count       49962.0       51028.0      50785.0
mean   6.016536e-08  6.159972e-08  -3.0797e-08
std         0.99999     0.9999902    0.9999902
min       -2.453766     -2.637763    -2.362017
25%      -0.8241798    -0.7397979   -0.6960598
50%      -0.1125148     0.1111522   0.01792209
75%       0.8054062     0.7663205    0.7477146
max        2.197716      1.802792     2.159867
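The pooled standard deviations sit slightly below 1 for a mechanical reason. After standardizing within each of the two cohort groups with the sample SD, each group's squared deviations sum to $n_g - 1$, so across $n$ non-missing scores the pooled sum is $n - 2$ and the pooled sample SD is $\sqrt{(n-2)/(n-1)}$. For the vocabulary and reading counts (51,028 and 50,785) this gives 0.9999902, and for math (49,962) about 0.99999, matching the table; the pooled mean is 0 up to floating-point error because each group mean is 0. A sketch on simulated scores that checks both the group-wise moments and this pooled formula:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "soph_1980": rng.integers(0, 2, size=1_000),
    "vocab_pctsryr": rng.uniform(1, 99, size=1_000),
})
# Give the cohorts different score levels so within-group standardization matters
toy.loc[toy["soph_1980"] == 1, "vocab_pctsryr"] += 10

g = toy.groupby("soph_1980")["vocab_pctsryr"]
toy["vocab_std"] = (toy["vocab_pctsryr"] - g.transform("mean")) / g.transform("std")

# Group-wise check: each cohort should have mean ~0 and SD ~1
by_group = toy.groupby("soph_1980")["vocab_std"].agg(["mean", "std"])
print(by_group)

# Pooled SD should equal sqrt((n - 2) / (n - 1)) with two groups
n = toy["vocab_std"].notna().sum()
pooled_sd = toy["vocab_std"].std()
expected_sd = np.sqrt((n - 2) / (n - 1))
print(pooled_sd, expected_sd)
```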


**Replication of Table 7: Television Coverage and Student Achievement**

**Key Results (Across All Outcomes):**
Across math, vocabulary, and reading test scores, the interaction between state-level television coverage and sophomore status in 1980 (cov_x_soph) is positive and statistically significant once demographic and regional controls are included. Students in the cohort exposed to higher television coverage during early childhood score higher across multiple subjects. The consistency of results across outcomes strengthens the credibility of the estimated relationship.

**Math Achievement (math_std) Interpretation:**
For sophomores, higher state television coverage is associated with significantly higher standardized math scores. The sophomore main effect (soph_1980) is negative, but because scores are standardized within cohort, it should be read as the sophomore gap at zero coverage, holding the controls fixed, rather than as an average difference between cohorts; the positive interaction indicates that this gap narrows as coverage rises. The large and statistically significant coefficients on the demographic controls confirm the importance of race, ethnicity, and family structure in explaining math performance.

**Vocabulary Achievement (vocab_std) Interpretation:**
The estimated coverage effect is largest for vocabulary. The positive and significant interaction term implies that early exposure to television is especially strongly associated with language skills. As in the math regression, the racial, ethnic, and family-structure controls are strong predictors of the outcome, and the coverage estimate remains positive and significant with them included.

**Reading Achievement (read_std) Interpretation:**
Reading scores show the same qualitative pattern: higher television coverage is linked to better standardized reading performance for sophomores. Although the overall explanatory power is slightly lower than for math and vocabulary, the matching sign and significance are consistent with television exposure affecting a broad set of academic skills rather than a single subject.

*Presenting three separate regressions allows the analysis to demonstrate that the relationship between television coverage and academic outcomes is systematic and robust, rather than driven by a single test score. This mirrors the structure and intent of Table 7 in the original paper.*

In [51]:
controls = [
    "black",
    "hisp",
    "singleparentsr",
    "censusregion"
]

# Run identical specifications for math, vocabulary, and reading
for y in ["math_std", "vocab_std", "read_std"]:
    formula = f"{y} ~ cov_x_soph + soph_1980 + " + " + ".join(controls)
    model = smf.ols(formula, data=df).fit(cov_type="HC1")
    print(f"\n=== {y} ===")
    print(model.summary())



=== math_std ===
                            OLS Regression Results                            
Dep. Variable:               math_std   R-squared:                       0.139
Model:                            OLS   Adj. R-squared:                  0.138
Method:                 Least Squares   F-statistic:                     713.1
Date:                Fri, 12 Dec 2025   Prob (F-statistic):               0.00
Time:                        07:20:19   Log-Likelihood:                -66743.
No. Observations:               49678   AIC:                         1.335e+05
Df Residuals:                   49664   BIC:                         1.336e+05
Df Model:                          13                                         
Covariance Type:                  HC1                                         
                                         coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------
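Rather than reading three full summaries, the key coefficient can be collected into one table. The sketch below does this with plain NumPy on simulated data, computing OLS coefficients and HC1 standard errors by hand, $V = \frac{n}{n-k}(X'X)^{-1}X'\operatorname{diag}(e_i^2)X(X'X)^{-1}$. The helper `ols_hc1` and all data are hypothetical, and the simulated effect sizes say nothing about the actual study. In the notebook itself, the same quantities would come from `model.params` and `model.bse` of each fitted statsmodels result.

```python
import numpy as np
import pandas as pd

def ols_hc1(y, X):
    """OLS coefficients and HC1 robust standard errors (hypothetical helper)."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    meat = (X * resid[:, None] ** 2).T @ X  # X' diag(e^2) X
    vcov = n / (n - k) * xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(vcov))

rng = np.random.default_rng(1)
n = 5_000
cov_state = rng.uniform(0.3, 0.7, size=n)  # simulated state coverage
soph = rng.integers(0, 2, size=n)          # simulated cohort indicator
X = np.column_stack([np.ones(n), cov_state * soph, soph])

rows = []
for name, true_beta in [("math_std", 0.5), ("vocab_std", 0.8), ("read_std", 0.3)]:
    # Simulated outcome with heteroskedastic noise (larger for the sophomore cohort)
    y = 0.1 + true_beta * X[:, 1] - 0.2 * soph + rng.normal(0, 1 + soph, size=n)
    beta, se = ols_hc1(y, X)
    rows.append({"outcome": name, "coef_cov_x_soph": beta[1], "se_hc1": se[1]})

results = pd.DataFrame(rows).set_index("outcome")
print(results.round(3))
```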