# Application to the Demand for Cigarettes

Why are we interested in knowing the elasticity of demand for cigarettes?
1. Theory of optimal taxation: the optimal tax rate is inversely related to the elasticity of demand, because the deadweight loss is smaller when quantity responds less to the tax.
2. Externalities of smoking – role for government intervention to discourage smoking
- second-hand smoke (non-monetary)
- monetary externalities

## Panel data set
- Annual cigarette consumption, average prices paid by end consumer (including tax), personal income
- 48 continental US states, 1985-1995

## Estimation strategy
- Panel data let us control for unobserved state-level characteristics that affect the demand for cigarettes, as long as they do not vary over time.
- We still need IV estimation to handle the simultaneous causality bias that arises from the interaction of supply and demand.
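
To see why an instrument is needed, the following minimal Python sketch simulates a hypothetical market in which price and quantity are determined jointly by a demand curve and a supply curve. Everything in it (the function `simulate_market`, the parameters `beta` and `gamma`, the supply shifter `z`) is an illustrative assumption and not part of the cigarette data. Under these assumptions, OLS of quantity on price is biased, while using the supply shifter as an instrument recovers the demand slope.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def simulate_market(n, beta=-1.0, gamma=0.5, seed=0):
    """Draw n equilibrium (price, quantity, instrument) triples.

    Demand:  q = beta * p + u
    Supply:  p = gamma * q + z + v   (z shifts supply only, e.g. a tax)
    """
    rng = random.Random(seed)
    p, q, z = [], [], []
    for _ in range(n):
        u, v, zi = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        # Solve the two equations for the equilibrium price and quantity.
        pi = (gamma * u + zi + v) / (1 - gamma * beta)
        qi = beta * pi + u
        p.append(pi)
        q.append(qi)
        z.append(zi)
    return p, q, z


p, q, z = simulate_market(20000)
ols_slope = cov(q, p) / cov(p, p)  # biased: p is correlated with u
iv_slope = cov(q, z) / cov(p, z)   # consistent: z is uncorrelated with u
print(f"true slope: -1.0, OLS: {ols_slope:.3f}, IV: {iv_slope:.3f}")
```

With a single instrument and a single endogenous regressor, the IV estimator reduces to the ratio of two covariances, which is why no matrix algebra is needed here.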

## Fixed-effects model of cigarette demand
\begin{equation}
ln( Q_{it}^{cigarettes} ) = \alpha_i + \beta_1 ln( P_{it}^{cigarettes}  ) + \beta_2 ln(Income_{it}) + u_{it}
\end{equation}

- i = 1,…,48, t = 1985, 1986,…,1995
- $\alpha_i$ reflects unobserved omitted factors that vary across states but not over time, e.g. attitude towards smoking
- Still, $corr(ln(P_{it}^{cigarettes}),u_{it})$ is plausibly nonzero because of supply/demand interactions

Estimation strategy:
- Use panel data regression methods to eliminate $\alpha_i$

## The “changes” method (when T=2)

Rewrite the regression in “changes” form:

\begin{equation}
ln(Q_{i1995}^{cigarettes}) - ln(Q_{i1985}^{cigarettes}) = \beta_1 [ln(P_{i1995}^{cigarettes}) - ln(P_{i1985}^{cigarettes})] + \beta_2[ln(Income_{i1995}) - ln(Income_{i1985})] + [u_{i1995} - u_{i1985}]
\end{equation}

- Create “10-year change” variables, for example: 10-year change in log price
- Then estimate the demand elasticity by TSLS using 10-year changes in the instrumental variables
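
The effect of differencing can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated, the number of units (`n_states`) and the true coefficient (`beta`) are made up, and the regressor is correlated with the unit effect but not with the idiosyncratic error, so it isolates the fixed-effect problem and leaves the simultaneity problem aside.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)
beta = -1.0          # true coefficient (assumed for the simulation)
n_states = 20000     # hypothetical number of cross-sectional units

x1, x2, y1, y2 = [], [], [], []
for _ in range(n_states):
    alpha = rng.gauss(0, 1)                # time-invariant unit effect
    xa = alpha + rng.gauss(0, 1)           # regressor correlated with alpha
    xb = alpha + rng.gauss(0, 1)
    x1.append(xa)
    x2.append(xb)
    y1.append(alpha + beta * xa + rng.gauss(0, 1))
    y2.append(alpha + beta * xb + rng.gauss(0, 1))

# Pooled levels regression ignores alpha and is biased.
levels_slope = ols_slope(x1 + x2, y1 + y2)

# Differencing the two periods removes alpha from the equation.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
changes_slope = ols_slope(dx, dy)
print(f"levels: {levels_slope:.3f}, changes: {changes_slope:.3f}")
```

In this setup the levels regression converges to about -0.5 instead of -1, because the omitted unit effect is positively correlated with the regressor; the regression in changes is not affected.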


## Empirical Exercise in Stata

Note: This material is based on the [Companion Website for Stock and Watson's Introduction to Econometrics](https://wps.pearsoned.com/aw_stock_ie_3/178/45691/11696959.cw/index.html).

### Set Stata Magic in Python

In [1]:
%%capture
import os
os.chdir('/Program Files/Stata17/utilities')
from pystata import config
config.init('se')

### Import Data from Web

In [2]:
%%stata
* Open Stata file from web page book
use https://wps.pearsoned.com/wps/media/objects/11422/11696965/datasets3e/datasets/cig_ch12.dta, clear


. * Open Stata file from web page book
. use https://wps.pearsoned.com/wps/media/objects/11422/11696965/datasets3e/dat
> asets/cig_ch12.dta, clear

. 


In [3]:
%%stata
des


Contains data from https://wps.pearsoned.com/wps/media/objects/11422/11696965/d
> atasets3e/datasets/cig_ch12.dta
 Observations:            96                  
    Variables:             9                  29 Dec 2010 20:03
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
state           str9    %9s                   
year            float   %9.0g                 
cpi             float   %9.0g                 
pop             float   %9.0g               * population from RAs before 1990;
                                                from web for 1990-2000
packpc          float   %9.0g                 packs per capita = packs/pop
income          float   %9.0g                 state personal income (total,
                                                nominal)
tax          

### Create initial variables

In [4]:
%%stata
sort state year

* Generate variables
gen ravgprs = avgprs/cpi
label var ravgprs "real average price during fiscal year, including sales taxes"
gen rtax = tax/cpi
label var rtax "real average cigarette-specific tax during fiscal year"
gen rtaxs = taxs/cpi
label var rtaxs "real average total tax during fiscal year, including sales taxes"
gen rtaxso = rtaxs-rtax
label var rtaxso "real average sales tax per pack during fiscal year"
gen lpackpc = log(packpc)
gen lravgprs = log(ravgprs)

* Real Percapita State Income 
gen perinc = income/(pop*cpi)
gen lperinc = log(perinc)
encode state, gen(snum)

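* Create “10-year change” variables
* The data are sorted by state and year with two rows per state (1985, 1995),
* so on a 1995 row [_n-1] is the same state's 1985 row. The values on 1985 rows
* are not meaningful and are excluded later with "if year==1995".
g dlpackpc = log(packpc/packpc[_n-1])
g dlavgprs = log(avgprs/avgprs[_n-1])
g dlperinc = log(perinc/perinc[_n-1])
g drtaxs = rtaxs-rtaxs[_n-1]
g drtax = rtax-rtax[_n-1]
g drtaxso = rtaxso-rtaxso[_n-1]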


. sort state year

. 
. * Generate variables
. gen ravgprs = avgprs/cpi

.  label var ravgpr "real average price during fiscal year, including sales tax
> es"

. gen rtax = tax/cpi

.  label var rtax "real average Cig specifice tax during fiscal year"

. gen rtaxs = taxs/cpi

.  label var rtaxs "real average total tax during fiscal year,including sales t
> axes"

. gen rtaxso = rtaxs-rtax

.  label var rtaxso "real average sales tax per pack during fiscal year"

. gen lpackpc = log(packpc)

. gen lravgprs = log(ravgprs)

. 
. * Real Percapita State Income 
. gen perinc = income/(pop*cpi)

. gen lperinc = log(perinc)

. encode state, gen(snum)

. 
. * Create “10-year change” variables
. g dlpackpc = log(packpc/packpc[_n-1])
(1 missing value generated)

. g dlavgprs = log(avgprs/avgprs[_n-1])
(1 missing value generated)

. g dlperinc = log(perinc/perinc[_n-1])
(1 missing value generated)

. g drtaxs = rtaxs-rtaxs[_n-1]
(1 missing value generated)

. g drtax = rtax-rtax[_n-1]
(1 missing 
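
One detail of the cell above: `dlavgprs` is built from the nominal price `avgprs`, not the real price `ravgprs`. Assuming `cpi` is a national index that takes the same value in every state in a given year (an assumption about the data, not something shown in the output), the nominal and real log changes differ only by the constant log(cpi_1995/cpi_1985). That would shift the intercept but leave the slope coefficients unchanged. The sketch below checks the algebra with made-up numbers; none of the prices or CPI values come from `cig_ch12.dta`.

```python
import math

# Hypothetical nominal prices (cents per pack) for two states, and a
# hypothetical national CPI; none of these numbers come from cig_ch12.dta.
cpi = {1985: 1.08, 1995: 1.52}
avgprs = {
    "State A": {1985: 100.0, 1995: 180.0},
    "State B": {1985: 120.0, 1995: 200.0},
}

gaps = []
for state, price in avgprs.items():
    d_nominal = math.log(price[1995] / price[1985])
    d_real = math.log((price[1995] / cpi[1995]) / (price[1985] / cpi[1985]))
    gaps.append(d_nominal - d_real)
    print(f"{state}: nominal {d_nominal:.4f}, real {d_real:.4f}")

# The gap between nominal and real changes is the same for every state.
inflation_term = math.log(cpi[1995] / cpi[1985])
print(f"common gap: {inflation_term:.4f}")
```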

### Use TSLS to estimate the demand elasticity by using the “10-year changes” specification

In [5]:
%%stata
eststo clear
ivregress 2sls dlpackpc dlperinc (dlavgprs = drtaxso) if year==1995, r
est store IV1
qui: estat firststage
qui: mat fstat = r(singleresults)
qui: estadd scalar fs = fstat[1,4]
qui: estadd local IV "Sales Tax", replace


. eststo clear

. ivregress 2sls dlpackpc dlperinc (dlavgprs = drtaxso) if year==1995, r

Instrumental variables 2SLS regression            Number of obs   =         48
                                                  Wald chi2(2)    =      26.26
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.5499
                                                  Root MSE        =     .08803

------------------------------------------------------------------------------
             |               Robust
    dlpackpc | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    dlavgprs |  -.9380143   .2009132    -4.67   0.000    -1.331797   -.5442317
    dlperinc |   .5259693   .3287138     1.60   0.110     -.118298    1.170237
       _cons |   .2085492   .1260941     1.65   0.098    -.0385906     .455689
---

### Check instrument relevance: compute first-stage F

In [6]:
%%stata
reg dlavgprs drtaxso dlperinc if year==1995, r
test drtaxso


. reg dlavgprs drtaxso dlperinc if year==1995, r

Linear regression                               Number of obs     =         48
                                                F(2, 45)          =      16.84
                                                Prob > F          =     0.0000
                                                R-squared         =     0.5146
                                                Root MSE          =     .06334

------------------------------------------------------------------------------
             |               Robust
    dlavgprs | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     drtaxso |   .0254611   .0043876     5.80   0.000      .016624    .0342982
    dlperinc |  -.2241037   .2188815    -1.02   0.311    -.6649536    .2167463
       _cons |   .5321948   .0295315    18.02   0.000     .4727153    .5916742
-------------------------------------------

First-stage F = 33.7 > 10, so the instrument is not weak.
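
With a single instrument, the first-stage F-statistic is the square of the t-statistic on that instrument, so it can be recomputed from the coefficient and robust standard error on `drtaxso` in the regression output above. The short check below uses only those two reported numbers.

```python
# Coefficient and robust standard error on drtaxso, copied from the
# first-stage regression output above.
coef = 0.0254611
se = 0.0043876

t_stat = coef / se
f_stat = t_stat ** 2   # F for one restriction equals the squared t-statistic
print(f"t = {t_stat:.2f}, F = {f_stat:.1f}, F > 10: {f_stat > 10}")
```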

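Can we check instrument exogeneity? No: with one instrument and one endogenous regressor ($m = k$) the model is exactly identified, so there are no overidentifying restrictions to test.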

In [7]:
%%stata
ivregress 2sls dlpackpc dlperinc (dlavgprs = drtax) if year==1995, r
est store IV2
qui: estat firststage
qui: mat fstat = r(singleresults)
qui: estadd scalar fs = fstat[1,4]
qui: estadd local IV "Cig. specific tax", replace


. ivregress 2sls dlpackpc dlperinc (dlavgprs = drtax) if year==1995, r

Instrumental variables 2SLS regression            Number of obs   =         48
                                                  Wald chi2(2)    =      43.88
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.5197
                                                  Root MSE        =     .09094

------------------------------------------------------------------------------
             |               Robust
    dlpackpc | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    dlavgprs |  -1.342515   .2213997    -6.06   0.000     -1.77645   -.9085793
    dlperinc |   .4281457   .2892323     1.48   0.139    -.1387391    .9950305
       _cons |   .4502643   .1347968     3.34   0.001     .1860674    .7144613
---------------------

### What about two instruments?
#### Check instrument relevance: compute first stage F

In [8]:
%%stata
reg dlavgprs drtaxso drtax dlperinc if year==1995, r
test drtaxso drtax


. reg dlavgprs drtaxso drtax dlperinc if year==1995, r

Linear regression                               Number of obs     =         48
                                                F(3, 44)          =      66.68
                                                Prob > F          =     0.0000
                                                R-squared         =     0.7779
                                                Root MSE          =     .04333

------------------------------------------------------------------------------
             |               Robust
    dlavgprs | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     drtaxso |    .013457   .0031405     4.28   0.000     .0071277    .0197863
       drtax |   .0075734   .0008859     8.55   0.000     .0057879    .0093588
    dlperinc |  -.0289943   .1242309    -0.23   0.817    -.2793654    .2213767
       _cons |   .4919733   .0183233 

First-stage F = 88.62 > 10, so the instruments are not weak.

### What about two instruments (cig-only tax, sales tax)?

In [9]:
%%stata
ivregress 2sls dlpackpc dlperinc (dlavgprs = drtaxso drtax) if year==1995, r
est store IV3
qui: estat firststage
mat fstat = r(singleresults)


. ivregress 2sls dlpackpc dlperinc (dlavgprs = drtaxso drtax) if year==1995, r

Instrumental variables 2SLS regression            Number of obs   =         48
                                                  Wald chi2(2)    =      45.44
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.5466
                                                  Root MSE        =     .08836

------------------------------------------------------------------------------
             |               Robust
    dlpackpc | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    dlavgprs |  -1.202403   .1906896    -6.31   0.000    -1.576148   -.8286588
    dlperinc |   .4620299   .2995177     1.54   0.123    -.1250139    1.049074
       _cons |   .3665388   .1180414     3.11   0.002     .1351819    .5978957
-------------

With m>k, we can test the overidentifying restrictions…

### Test the overidentifying restrictions

In [10]:
%%stata
predict e, resid
reg e drtaxso drtax dlperinc if year==1995
test drtaxso drtax


. predict e, resid
(1 missing value generated)

. reg e drtaxso drtax dlperinc if year==1995

      Source |       SS           df       MS      Number of obs   =        48
-------------+----------------------------------   F(3, 44)        =      1.64
       Model |  .037769176         3  .012589725   Prob > F        =    0.1929
    Residual |  .336952289        44  .007658007   R-squared       =    0.1008
-------------+----------------------------------   Adj R-squared   =    0.0395
       Total |  .374721465        47  .007972797   Root MSE        =    .08751

------------------------------------------------------------------------------
           e | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     drtaxso |   .0127669   .0061587     2.07   0.044      .000355    .0251789
       drtax |  -.0038077   .0021179    -1.80   0.079     -.008076    .0004607
    dlperinc |  -.0934062   .297845

#### Compute the J-statistic, J = m*F, where F tests whether the coefficients on the instruments are zero

In [11]:
%%stata
dis "OverID stat:" r(df)*r(F)
dis "p-value:"  chiprob(r(df)-1,r(df)*r(F))


. dis "OverID stat:" r(df)*r(F)
OverID stat:4.9319853

. dis "p-value:"  chiprob(r(df)-1,r(df)*r(F))
p-value:.02636401

. 


## The correct degrees of freedom for the J-statistic is m–k:
$J = mF$

where $F$ is the homoskedasticity-only F-statistic testing the coefficients on $Z_{1i},…,Z_{mi}$ in a regression of the TSLS residuals on $Z_{1i},…,Z_{mi}, W_{1i},…,W_{ri}$.
- Under the null hypothesis that all the instruments are exogenous, $J$ has a chi-squared distribution with $m-k$ degrees of freedom.
- Here, $J = 4.93$ with $m - k = 2 - 1 = 1$ degree of freedom; the 5\% critical value is 3.84, so we reject at the 5\% significance level.
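
The p-value and critical value quoted here can be reproduced with the Python standard library. This is a sketch that relies on a shortcut valid only for one degree of freedom: a chi-squared variable with 1 d.f. is the square of a standard normal. The value of `j_stat` is copied from the Stata output above.

```python
import math
from statistics import NormalDist

j_stat = 4.9319853   # J = m * F, as displayed in the Stata output above
m, k = 2, 1          # two instruments, one endogenous regressor
df = m - k

# For 1 degree of freedom, a chi-squared variable is a squared standard
# normal, so P(chi2 > j) = erfc(sqrt(j / 2)). This only holds for df = 1.
p_value = math.erfc(math.sqrt(j_stat / 2))

# Likewise the 5% critical value is the squared 97.5th normal percentile.
critical_5pct = NormalDist().inv_cdf(0.975) ** 2

print(f"J = {j_stat:.2f}, df = {df}, p = {p_value:.4f}, "
      f"critical value = {critical_5pct:.2f}")
```

Using $m$ degrees of freedom instead of $m-k$ would give a larger p-value and could wrongly fail to reject.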

## Tabular summary of these results:

In [12]:
%%stata
esttab IV1 IV2 IV3, label se star(* 0.10 ** 0.05 *** 0.01) ///
title(2SLS Estimates of the Demand for Cigarettes Using Panel Data for U.S. States) ///
stats(IV fs Jtest pvalue N, ///
label("Instr. Variables" "F-statistic" "J-test" "P-value" "Observations"))


. esttab IV1 IV2 IV3, label se star(* 0.10 ** 0.05 *** 0.01) ///
> title(2SLS Estimates of the Demand of Cigarrets Using Panel Data for States i
> n USA) ///
> stats(IV fs Jtest pvalue N, ///
> label("Instr. Variables" "F-statistic" "J-test" "P-value" "Observations"))

2SLS Estimates of the Demand of Cigarrets Using Panel Data for States in USA
--------------------------------------------------------------------
                              (1)             (2)             (3)   
                         dlpackpc        dlpackpc        dlpackpc   
--------------------------------------------------------------------
dlavgprs                   -0.938***       -1.343***       -1.202***
                          (0.201)         (0.221)         (0.191)   

dlperinc                    0.526           0.428           0.462   
                          (0.329)         (0.289)         (0.300)   

Constant                    0.209*          0.450***        0.367***
                          (0.

## How should we interpret the J-test rejection?

- J-test rejects the null hypothesis that both the instruments are exogenous
- This means that either rtaxso is endogenous, or rtax is endogenous, or both
- The J-test doesn’t tell us which one. You must exercise judgment.
- Why might rtax (cig-only tax) be endogenous?
 - If people become more health conscious, there are fewer smokers and more political pressure for high cigarette taxes
 - If so, the cig-only tax is correlated with unobserved shifts in demand, i.e. it is endogenous
- This reasoning doesn’t apply to general sales tax
- ⇒ use just one instrument, the general sales tax

## The Demand for Cigarettes: Summary of Empirical Results

- Use the estimated elasticity based on TSLS with the general sales tax as the only instrument:
 - Elasticity = -.94, SE = .20
- This elasticity is surprisingly large (demand is not inelastic): a 1\% increase in prices reduces cigarette sales by nearly 1\%. This is much more elastic than conventional wisdom in the health economics literature.
- Why?
- This is a long-run (ten-year change) elasticity.
- What would you expect a short-run (one-year change) elasticity to be – more or less elastic?
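
To put the headline estimate in context, the sketch below rebuilds the 95% confidence interval from the point estimate and robust standard error on `dlavgprs` in the sales-tax-only TSLS output, and applies the elasticity to a hypothetical 10% price increase. The 10% figure is an illustration, not something taken from the data.

```python
from statistics import NormalDist

# Point estimate and robust SE on dlavgprs from the sales-tax-only TSLS output.
elasticity = -0.9380143
se = 0.2009132

z = NormalDist().inv_cdf(0.975)          # about 1.96
ci_low = elasticity - z * se
ci_high = elasticity + z * se

# Approximate effect of a hypothetical 10% price increase on quantity.
pct_change_q = elasticity * 10
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f}); "
      f"10% price rise -> about {pct_change_q:.1f}% change in packs per capita")
```

The interval excludes zero but includes -1, so the data do not rule out unit-elastic long-run demand.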

## Summary: IV Regression
- A valid instrument lets us isolate a part of X that is uncorrelated with u, and that part can be used to estimate the effect of a change in X on Y
- IV regression hinges on having valid instruments:
1. Relevance: check via first-stage F
2. Exogeneity: when $m > k$, test the overidentifying restrictions via the J-statistic