---
title: "Documenting Inequality"
bibliography: ../reading_list.bib
format:
  revealjs:
    theme: solarized
    transition: slide
    chalkboard:
        theme: whiteboard
        chalk-effect: 0.0
        chalk-width: 6
jupyter: julia-1.11
execute:
  cache: true
  freeze: auto
---


## Overview

:::{.incremental}
- In this section of the class we'll look at what the data tells us about labor market inequality in the United States
- Will look at trends in inequality *over time* and *over the life-cycle*
- Will focus on **labor market inequality**
- Some review of basic statistics along the way
- Later in the section will dig deeper into *policy implications*
:::

## Let's start with some data

:::{.incremental}
- Data: the *Current Population Survey* (CPS) 
- Will use the Annual Social and Economic Supplement (ASEC)
- Years: 1990-2024, Ages: 25-64
- Cleaned and harmonized by [IPUMS](https://cps.ipums.org/cps/index.shtml)
- Using annual wage income, hours, demographics
- Deflate using the Personal Consumption Expenditures (PCE) Price Index
- Consult the [data dictionary](../data/cps_00049.cbk)
:::


## Some Cleaning Steps


In [None]:
#| code-line-numbers: "4|5|6|7|8-9|10"
#| echo: true
#| output: false
using DataFrames, CSV, StatsPlots, DataFramesMeta, StatsBase
pce = CSV.read("../data/PCE_index.csv",DataFrame)

data = @chain CSV.read("../data/cps_00049.csv",DataFrame) begin
  @subset :INCWAGE .>0 #<- keep only if positive earnings
  @subset :WKSWORK1 .>10 #<- keep only if worked more than 10 weeks
  @transform :HOURS = :UHRSWORKLY .* :WKSWORK1 #<- calculate annual hours
  innerjoin(pce,on=:YEAR)
  @transform :INCWAGE = 100 * :INCWAGE ./ :PCE #<- deflate earnings (2017 base)
  @transform :WAGE = :INCWAGE ./ :HOURS #<- annual earnings / annual hours
end

## A Quick Look at Sample Sizes


In [None]:
#| echo: true

@chain data begin
  groupby(:YEAR)
  @combine :N = length(:INCWAGE)
end

## Most things we will estimate very precisely

e.g. gender wage gaps:

In [None]:
#| code-fold: true
#| echo: true
@chain data begin
  groupby([:YEAR,:SEX])
  @combine begin 
    :m_wage = mean(log.(:WAGE),weights(:ASECWT))
    :v_est = var(log.(:WAGE),weights(:ASECWT)) / length(:WAGE)
    end
  @transform :ci_width = 1.96 * sqrt.(:v_est)
  @df _ scatter(:YEAR,:m_wage,yerror=:ci_width,group=:SEX)
end
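
The error bars above follow the usual large-sample recipe. As a sketch, write $\hat{\mu}_{g}$ and $\hat{\sigma}^{2}_{g}$ for the weighted mean and variance of log wages in group $g$, and $N_{g}$ for its sample size (notation introduced here for illustration). The standard error and the 95% interval are then:

$$ \widehat{se}(\hat{\mu}_{g}) = \sqrt{\hat{\sigma}^{2}_{g} / N_{g}}, \qquad \hat{\mu}_{g} \pm 1.96 \times \widehat{se}(\hat{\mu}_{g}) $$

This treats each year-by-group cell as a simple random sample, so it is an approximation that ignores the survey design and the variation in weights.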

## Reviewing Population Weights (Exercise)

::: {.panel-tabset}

### Unweighted


In [None]:
#| echo: true
@chain data begin
    groupby(:YEAR)
    @combine :wage = mean(:WAGE)
    @df _ plot(:YEAR,:wage)
end

### Weighted


In [None]:
#| echo: true

@chain data begin
    groupby(:YEAR)
    @combine :wage = mean(:WAGE,weights(:ASECWT))
    @df _ plot(:YEAR,:wage)
end

### Earnings


In [None]:
#| echo: true

@chain data begin
    groupby(:YEAR)
    @combine :earn = mean(:INCWAGE,weights(:ASECWT))
    @df _ plot(:YEAR,:earn)
end

### Hours


In [None]:
#| echo: true

@chain data begin
    groupby(:YEAR)
    @combine :hours = mean(:HOURS,weights(:ASECWT))
    @df _ plot(:YEAR,:hours)
end

:::
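
As a reminder of what the weights do, here is a minimal sketch on made-up numbers (not CPS data). The vectors `wage` and `weight` are hypothetical; the only library calls are `mean` and `weights`, as used in the tabs above.

```julia
using Statistics, StatsBase

# Hypothetical toy sample: three respondents with hourly wages and survey weights.
# The third respondent represents twice as many people in the population.
wage   = [10.0, 20.0, 40.0]
weight = [1.0, 1.0, 2.0]

unweighted = mean(wage)                    # (10 + 20 + 40) / 3
weighted   = mean(wage, weights(weight))   # (10*1 + 20*1 + 40*2) / (1 + 1 + 2)

println((unweighted, weighted))
```

The weighted mean moves toward the third respondent's wage because that observation counts for more of the population.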

# Some Basic Facts About Inequality

## Trends in Inequality

:::{.incremental}

- How should we document inequality?
- Some natural statistics to analyze the distribution of earnings and wages
  - Variance (definition)
  - Percentiles (definition)

:::
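
One common way to fill in these two definitions, sketched with notation introduced here ($x_{i}$ for log wages, $\omega_{i}$ for the survey weight, $F$ for the weighted distribution function):

$$ \mathbb{V}[x] = \frac{\sum_{i}\omega_{i}(x_{i} - \bar{x})^{2}}{\sum_{i}\omega_{i}}, \qquad \bar{x} = \frac{\sum_{i}\omega_{i}x_{i}}{\sum_{i}\omega_{i}} $$

$$ q_{p} = \inf\{x : F(x) \geq p\}, \qquad F(x) = \frac{\sum_{i}\omega_{i}\mathbf{1}\{x_{i} \leq x\}}{\sum_{i}\omega_{i}} $$

The variance summarizes dispersion across the whole distribution, while a percentile gap such as $q_{0.9} - q_{0.1}$ isolates particular parts of it. Software implementations of weighted quantiles typically interpolate, so computed values can differ slightly from this definition.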

## Variance in log wages


In [None]:
#| echo: true

@chain data begin
    groupby(:YEAR)
    @combine :var_lw = var(log.(:WAGE),weights(:ASECWT))
    @df _ plot(:YEAR,:var_lw,linewidth=3)
end


## Percentiles of log wages


In [None]:
#| code-fold: true
#| echo: true
pctiles = @chain data begin
    groupby(:YEAR)
    @combine begin 
    :p_90_10 = quantile(log.(:WAGE),weights(:ASECWT),0.9) - quantile(log.(:WAGE),weights(:ASECWT),0.1)
    :p_75_25 = quantile(log.(:WAGE),weights(:ASECWT),0.75) - quantile(log.(:WAGE),weights(:ASECWT),0.25)
    end
    stack(Not(:YEAR))
    @df _ plot(:YEAR,:value,group=:variable,linewidth=3)
end

## Percentiles of log wages (normalized)


In [None]:
#| code-fold: true
#| echo: true
pctiles = @chain data begin
    groupby(:YEAR)
    @combine begin 
    :p_90_10 = quantile(log.(:WAGE),weights(:ASECWT),0.9) - quantile(log.(:WAGE),weights(:ASECWT),0.1)
    :p_75_25 = quantile(log.(:WAGE),weights(:ASECWT),0.75) - quantile(log.(:WAGE),weights(:ASECWT),0.25)
    end
    stack(Not(:YEAR))
    groupby(:variable)
    @transform :value = :value .- :value[1]
    @df _ plot(:YEAR,:value,group=:variable,linewidth=3)
end
plot!([1990,2024],[0.,0.],linestyle=:dash,color="grey",label=false)

## Individual percentiles of log wages (normalized)


In [None]:
#| code-fold: true
#| echo: true
pctiles = @chain data begin
    groupby(:YEAR)
    @combine begin 
    :p_90 = quantile(log.(:WAGE),weights(:ASECWT),0.9)
    :p_10 = quantile(log.(:WAGE),weights(:ASECWT),0.1)
    :p_75 = quantile(log.(:WAGE),weights(:ASECWT),0.75)
    :p_25 = quantile(log.(:WAGE),weights(:ASECWT),0.25)
    end
    stack(Not(:YEAR))
    groupby(:variable)
    @transform :value = :value .- :value[1]
    @df _ plot(:YEAR,:value,group=:variable,linewidth=3)
end
plot!([1990,2024],[0.,0.],linestyle=:dash,color="grey",label=false)

# Trends in Other Metrics

## Wage Gaps

:::{.incremental}

- While overall wage inequality is a main topic of interest, other metrics may matter for policy questions
- E.g. gaps by education, gender/sex, race
- Will look at these using the same data

:::

## Gender Wage Gap


In [None]:
#| code-fold: true
#| echo: true
#| code-line-numbers: 2-3|4|5|6

@chain data begin
  groupby([:YEAR,:SEX])
  @combine :m_wage = mean(log.(:WAGE),weights(:ASECWT))
  unstack(:YEAR,:SEX,:m_wage,renamecols=x->Symbol(:wage_, x))
  @transform :gender_gap = :wage_1 .- :wage_2
  @df _ plot(:YEAR,:gender_gap,label = "gender wage gap",linewidth=3.)
end

## Race Wage Gap


In [None]:
#| code-fold: true
#| echo: true

@chain data begin
  groupby([:YEAR,:RACE])
  @combine :m_wage = mean(log.(:WAGE),weights(:ASECWT))
  unstack(:YEAR,:RACE,:m_wage,renamecols=x->Symbol(:wage_, x))
  @transform :black_white_gap = :wage_100 .- :wage_200
  @df _ plot(:YEAR,:black_white_gap,label = "race gap: black-white",linewidth=3.)
end

How much of this is sampling error?

## Race Wage Gap


In [None]:
#| code-fold: true
#| echo: true
#| code-line-numbers:  5-6|8|9-10|12-13

@chain data begin
  @subset :RACE .<= 200
  groupby([:YEAR,:RACE])
  @combine begin 
    :m_wage = mean(log.(:WAGE),weights(:ASECWT))
    :v_est = var(log.(:WAGE),weights(:ASECWT)) / length(:WAGE)
  end
  stack()
  @transform :variable = string.(:variable,"_",:RACE)
  unstack(:YEAR,:variable,:value)
  @transform begin
    :black_white_gap = :m_wage_100 .- :m_wage_200
    :ci_width = 1.96*sqrt.(:v_est_100 .+ :v_est_200) #<- 95% CI half-width for the gap
  end
  @df _ scatter(:YEAR,:black_white_gap,yerror = :ci_width,label = "race gap: black-white")
end
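
The error bars here combine the two group-level variance estimates. A sketch of the reasoning, assuming the two group samples are independent (with $\hat{\mu}_{100}$ and $\hat{\mu}_{200}$ as shorthand, introduced here, for the two group means):

$$ \mathbb{V}[\hat{\mu}_{100} - \hat{\mu}_{200}] = \mathbb{V}[\hat{\mu}_{100}] + \mathbb{V}[\hat{\mu}_{200}] \quad \Rightarrow \quad \widehat{se} = \sqrt{\hat{v}_{100} + \hat{v}_{200}} $$

Here $\hat{v}$ corresponds to `v_est` in the code, and the plotted half-width is $1.96 \times \widehat{se}$.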

## Education Premia {.smaller}

- Define the "college premium" or "returns to college" as:
$$ \mathbb{E}[\log(\text{wage}) \mid \text{educ} \geq \text{degree}] - \mathbb{E}[\log(\text{wage}) \mid \text{educ} \geq \text{High School or Equiv}] $$
- Note: there are other definitions; see @Heathcote2023

## 4-Yr and 2-Yr College Premia


In [None]:
#| code-fold: true
#| echo: true
#| code-line-numbers:  2|8

@chain data begin
  @transform :educ = (:EDUC.>=70) .+ (:EDUC.>=90) .+ (:EDUC.>=110)
  groupby([:YEAR,:educ])
  @combine begin 
    :m_wage = mean(log.(:WAGE),weights(:ASECWT))
  end
  unstack(:YEAR,:educ,:m_wage,renamecols=x->Symbol(:wage_, x))
  @transform :two_yr = :wage_2 .- :wage_1 :four_yr = :wage_3 .- :wage_1
  @df _ begin
    plot(:YEAR,:two_yr,label="two-year prem",linewidth=3)
    plot!(:YEAR,:four_yr,label="four-year prem",linewidth=3)
  end
end
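
The education coding above counts how many thresholds an `EDUC` code clears. A minimal sketch of that trick follows. The thresholds 70, 90, and 110 come from the code above; the helper `educ_group`, the group labels, and the example codes are illustrative assumptions and should be checked against the data dictionary.

```julia
# Map an EDUC code to a group 0-3 by counting the thresholds it meets.
# Each comparison is a Bool, and Bools add up as integers in Julia.
educ_group(educ) = (educ >= 70) + (educ >= 90) + (educ >= 110)

# Assumed reading of the four groups (verify against the codebook).
labels = ["less than high school", "high school / some college",
          "two-year degree", "four-year degree or more"]

for code in (60, 73, 92, 111)   # hypothetical example codes
    g = educ_group(code)
    println(code, " => ", g, " (", labels[g + 1], ")")
end
```

Group 1 is the baseline, so the two premia are differences in mean log wages of groups 2 and 3 relative to group 1.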

## Trends in Single Parenthood


In [None]:
#| code-fold: true
#| echo: true
#| code-line-numbers:  2|4-6

test = @chain data begin
  @subset :NCHILD.>0
  groupby(:YEAR)
  @combine begin
    :divorce = mean(:MARST.==4)
    :never_married = mean(:MARST.==6)
    :single = mean(:MARST.>2)
  end
  stack(Not(:YEAR))
  @df _ plot(:YEAR,:value,group=:variable,linewidth=3)
end

# Trends over the life-cycle

## Life-cycle wage dynamics

:::{.incremental}
- So far we have seen changes in wage distributions and wage gaps over time
- But perhaps these statistics vary with age, too.
- We have 35 survey years of data (1990-2024) to study cohorts over time
- Beware the rank condition!
:::
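
The rank condition warning refers to the fact that age, year, and birth cohort are linearly dependent (cohort = year - age), so linear effects of all three cannot be separately identified. A minimal sketch on a made-up grid of years and ages (not the CPS sample) illustrates this:

```julia
using LinearAlgebra

# Hypothetical grid of survey years and ages; cohort is birth year.
obs    = [(year, age) for year in 1990:1994 for age in 25:29]
year   = [o[1] for o in obs]
age    = [o[2] for o in obs]
cohort = year .- age

# Design matrix: constant plus linear terms in age, year, and cohort.
X = hcat(ones(length(obs)), age, year, cohort)

# age - year + cohort is zero for every row, so the columns are dependent.
null_combo = X * [0.0, 1.0, -1.0, 1.0]

println(size(X, 2))                # number of columns
println(rank(X; rtol = 1e-10))     # numerical rank, with an explicit tolerance
println(all(iszero, null_combo))   # true when the combination vanishes exactly
```

Separating the three effects therefore requires a restriction, such as dropping one of them or grouping cohorts into coarser bins.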

## Calculating life-cycle profiles {.scrollable}


In [None]:
#| echo: true
#| code-line-numbers:  2|3-4

v_w = @chain data begin
  @transform :cohort = Int.(round.((:YEAR .- :AGE) ./ 10) .* 10)
  groupby([:cohort,:AGE])
  @combine :var_logw = var(log.(:WAGE),weights(:ASECWT))
end
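
A small sketch of how the cohort assignment above behaves, using a standalone helper `cohort` that mirrors the expression in the code. The alternative `cohort_floor` is an assumption introduced here for comparison and is not what the slide uses. Note that Julia's default `round` sends exact ties to the nearest even integer, so birth years ending in 5 do not all round the same way.

```julia
# Ten-year birth cohort as on the slide: birth year rounded to the nearest decade.
cohort(year, age) = Int(round((year - age) / 10) * 10)

# Hypothetical alternative: fixed decade bins (1960-1969, 1970-1979, ...).
cohort_floor(year, age) = 10 * fld(year - age, 10)

# Print birth year, rounded cohort, and floor-based cohort for a few cases.
for (year, age) in [(2000, 32), (2000, 38), (2000, 35), (2010, 35)]
    println((year - age, cohort(year, age), cohort_floor(year, age)))
end
```

With rounding, each cohort label is centered on the decade boundary rather than starting at it, which matters when comparing cohort profiles to decade-of-birth definitions used elsewhere.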

## Variance in log-wages increases with age


In [None]:
@df v_w plot(:AGE,:var_logw,group=:cohort,linewidth=4)

. . .

What do you think could be driving this pattern? Why does it matter? (Exercise at board on income processes)

# Summary: More Unequal We Stand?

## Overview: @Heathcote2023 {.smaller}

:::{.incremental}
- Heathcote, Perri, Violante, and Zhang (2023) update their 2010 analysis [@Heathcote2010] through 2021
- Use the **household budget constraint** as organizing framework
$$ c + s = \sum_{i=1}^{N}w_{i}h_{i} + d + b^{p} + b^{g} - \tau $$
- $d$: asset income, $b^{p}$/$b^{g}$: private/gov transfers, $\tau$: taxes
- Start with individual wages → add hours → add household members → add government
- Multiple data sources: CPS, ACS, PSID, CES, SCF
:::

## Key Finding 1: Wage Inequality Trends

:::: {.columns}
::: {.column width="55%"}
<!-- Extract Figure 7 from papers/more_unequal_we_stand.pdf (page 10) -->
![](../papers/figures/heathcote_fig7.png)
:::

::: {.column width="45%"}
::: {.incremental style="font-size: 0.8em;"}
- **At the top**: Steady growth in inequality since mid-1980s
- **At the bottom**: Remarkably stable since 2000
- College premium stopped growing after early 2000s
- Gender wage gap continues to shrink
:::
:::
::::

## Key Finding 2: The Role of Labor Supply

:::: {.columns}
::: {.column width="60%"}
<!-- Extract Figure 11 from papers/more_unequal_we_stand.pdf (page 14) -->
![Earnings, wages, and weeks worked in three slices of the earnings distribution [@Heathcote2023]](../papers/figures/heathcote_fig11.png)
:::

::: {.column width="40%"}
::: {.incremental style="font-size: 0.8em;"}
- **Top earners**: Inequality driven entirely by wages (weeks worked stable at ~50 per year)
- **Bottom earners**: Weeks worked fell 80% since 1967, wages fell 20%
- Employment differences are key driver at the bottom
:::
:::
::::

## Key Finding 3: The Declining Role of the Household {.smaller}

:::{.incremental}
- Define index of **household income pooling**:
$$ HP_{t} = \frac{\mathrm{var}(y_{it}) - \mathrm{var}(\bar{y}_{it})}{\mathrm{var}(y_{it})} $$
- $\bar{y}_{it}$: per-person income of the household that individual $i$ belongs to
- *Interpretation*: the share of variance in income reduced by pooling at household level
- In 1967, household income pooling reduced inequality by 60%
- By 2021, it reduced inequality by only 35%
- **Why the decline?** (decomposition exercise at board)
  - Narrowing gender earnings gap → women less reliant on spousal income
  - Rising correlation of earnings within couples → less insurance
- Single households increasingly disadvantaged
:::
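
To make the pooling index concrete, here is a minimal sketch on a made-up economy of four two-adult households. All numbers and variable names are hypothetical, and the sketch uses income in levels for simplicity, which may differ from the transformation used in the paper.

```julia
using Statistics

# Hypothetical toy economy: each tuple holds the incomes of the two adults
# in one household.
households = [(60.0, 20.0), (30.0, 30.0), (80.0, 0.0), (10.0, 50.0)]

# Individual incomes, stacked across households.
y = [inc for h in households for inc in h]

# Pooled incomes: every adult is assigned their household's per-person income.
ybar = [sum(h) / length(h) for h in households for _ in h]

# Share of the variance in individual income removed by pooling.
HP = (var(y) - var(ybar)) / var(y)
println(round(HP, digits = 3))
```

In this toy example most of the dispersion is within households, so pooling removes nearly all of it. If partners' incomes were more similar, `ybar` would track `y` more closely and the index would fall.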

## Key Finding 4: The Growing Role of Government

:::: {.columns}
::: {.column width="60%"}
<!-- Extract Figure 19 from papers/more_unequal_we_stand.pdf (page 21) -->
![Inequality reduction mechanisms [@Heathcote2023]](../papers/figures/heathcote_fig19.png)
:::

::: {.column width="40%"}
::: {.incremental style="font-size: 0.8em;"}
- Government redistribution (taxes + transfers) increasingly important
- Index: compare pre-tax income to post-tax-and-transfer income
:::
:::
::::

## Key Finding 5: Income vs. Consumption Inequality

![Income and consumption inequality [@Heathcote2023]](../papers/figures/heathcote_fig21.png)

::: {.incremental style="font-size: 0.5em;"}
- **Market income**: Large increase in inequality (especially at bottom during recessions)
- **Disposable income**: Much smaller increase (government smooths fluctuations)
- **Consumption**: Remarkably stable over entire period
:::

## Main Takeaways from @Heathcote2023

:::{.incremental}
1. **Inequality at top**: Driven by wage differentials, continues to grow (but more slowly)
2. **Inequality at bottom**: Driven by employment, dominated by cyclical fluctuations
3. **The household**: Declining importance as inequality-reducing mechanism
4. **The government**: Increasing role in redistribution and stabilization
5. **Consumption**: Remains remarkably stable despite income inequality trends
:::

## What's next? {.smaller}

- We've got a good picture of:
  1. Patterns of inequality over time and over the life-cycle
  2. The role that government transfers can play in offsetting it
- But we don't yet know:
  1. What are the **welfare implications** of these facts?
  2. How much would people value programs that insure / redistribute?
  3. What are the origins of this inequality and should we counteract with policy?
- These are the topics we will work on for the rest of this course

## 