## Join mortality rates for sex, age, year and state (USMDB)

In [1]:
use "Z:\Daten\CPS\cps_00024.dta", clear

Remove individuals younger than 15 or older than 75 from the data set.

In [6]:
keep if inrange(age, 15, 75)

(18,340,670 observations deleted)


In [3]:
*sort statecensus year sex age
*keep year month cpsidp mish wtfinl age sex race bpl yrimmig citizen nativity empstat labforce occ ind durunemp durunem2 whyunemp uhrs* panlwt classwkr statecensus absent whyabsnt whyptlwk wkstat
*save, replace

In [7]:
save "Z:\Daten\CPS\cps20211012"

file Z:\Daten\CPS\cps20211012.dta saved


In [8]:
use "Z:\Daten\USMDB\usmdb.dta", clear
sort statecensus year sex age
save, replace




file Z:\Daten\USMDB\usmdb.dta saved


In [9]:
use "Z:\Daten\CPS\cps20211012.dta"
merge m:1 statecensus year sex age using "Z:\Daten\USMDB\usmdb.dta"



(note: variable sex was byte, now float to accommodate using data's values)
(note: variable age was byte, now float to accommodate using data's values)
(note: variable statecensus was byte, now float to accommodate using data's
       values)
(note: variable year was int, now float to accommodate using data's values)

    Result                           # of obs.
    -----------------------------------------
    not matched                     3,095,410
        from master                 2,680,585  (_merge==1)
        from using                    414,825  (_merge==2)

    matched                        52,698,471  (_merge==3)
    -----------------------------------------
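
The merge report above leaves open which master cells found no mortality rate. A minimal sketch of how that could be inspected, using made-up toy data in place of the CPS and USMDB files (the `id` variable, the `qx` values and the year 1983 are purely illustrative assumptions):

```stata
* Toy illustration with hypothetical values, not the CPS/USMDB files
clear
input byte statecensus int year byte sex byte age double qx
11 1990 1 30 0.0015
11 1990 2 30 0.0007
end
tempfile rates
save `rates'

clear
input long id byte statecensus int year byte sex byte age
1 11 1990 1 30
2 11 1990 2 30
3 11 1983 1 30
end

* many persons per (state, year, sex, age) cell, one rate per cell
merge m:1 statecensus year sex age using `rates'

* which master cells have no mortality rate?
tab year if _merge == 1

* drop rate cells that no person falls into
keep if inlist(_merge, 1, 3)
```

Tabulating `_merge == 1` by `year`, `age` or `statecensus` right after the merge shows whether the unmatched records cluster in particular years or states.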


In [3]:
tab age if _merge == 2

_merge not found


r(111);





In [6]:
save "Z:\Daten\CPS\cps20211012.dta", replace

file Z:\Daten\CPS\cps20211012.dta saved


In [1]:
use "Z:\Daten\CPS\cps20211012.dta", clear

In [5]:
keep if inlist(_merge, 1, 3)

(414,825 observations deleted)


In [7]:
tab _merge


                 _merge |      Freq.     Percent        Cum.
------------------------+-----------------------------------
        master only (1) |  2,680,585        4.84        4.84
            matched (3) | 52,698,471       95.16      100.00
------------------------+-----------------------------------
                  Total | 55,379,056      100.00


In [1]:
cd "Z:\Daten\CPS"
use "cps20211012.dta", clear


Z:\Daten\CPS



## Flow calc

In [None]:
/* loading data
cd "Z:\Daten\CPS"
use "cps20211012.dta", clear
keep if (inrange(age, 15, 75)) /*& empstat ~= 0) | (empstat == 0 & age == 15)*/

* create flag for redesign of survey in January 1994
gen reset = 0
replace reset = 1 if year==1994 & month==1

save, replace

* create date variable
gen date = mdy(month, 12, year)
format date %td
*/
keep cpsidp date mish sex race age

* next month's age, race and sex for validation
bysort cpsidp (date mish): gen lead_mish = mish[_n+1]
bysort cpsidp (date mish): gen lead_age = age[_n+1] if lead_mish - mish == 1
bysort cpsidp (date mish): gen lead_race = race[_n+1] if lead_mish - mish == 1
bysort cpsidp (date mish): gen lead_sex = sex[_n+1] if lead_mish - mish == 1

* validation
gen linked = .m
replace linked = 1 if lead_mish - mish == 1 & lead_race == race & lead_sex == sex & ((mish == 4 & lead_age - age >= 0 & lead_age - age <= 2) | (mish ~= 4 & lead_age - age >= 0 & lead_age - age <= 1))
tab linked
bysort cpsidp (mish): gen lead_linked = linked[_n+1]
tab linked lead_linked, missing
save check_link, replace
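
The validation logic above can be traced on a toy panel. This is a simplified sketch on invented records: it codes unlinked records as 0 instead of `.m` and omits the wider age tolerance for `mish == 4`.

```stata
* Toy panel with hypothetical values: person 2 changes sex between months
clear
input long cpsidp int year byte month byte mish byte sex int race byte age
1 2000 1 1 1 100 40
1 2000 2 2 1 100 40
1 2000 3 3 1 100 41
2 2000 1 1 2 200 30
2 2000 2 2 1 200 30
end
gen date = mdy(month, 12, year)
format date %td

* characteristics reported in the following interview
bysort cpsidp (date mish): gen lead_mish = mish[_n+1]
by cpsidp: gen lead_sex  = sex[_n+1]  if lead_mish - mish == 1
by cpsidp: gen lead_race = race[_n+1] if lead_mish - mish == 1
by cpsidp: gen lead_age  = age[_n+1]  if lead_mish - mish == 1

* a link is valid if sex and race agree and age rises by at most one year
gen byte linked = (lead_mish - mish == 1) & lead_sex == sex & lead_race == race & inrange(lead_age - age, 0, 1)
list cpsidp mish sex age linked, sepby(cpsidp)
```

In this toy panel the first two records of person 1 are linked, while person 2 fails the check because the reported sex changes.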

In [None]:
numlabel, add
tab empstat

gen armed_forces = empstat == 1

save, replace

In [5]:
drop reset

In [None]:
recode empstat (1/12 = 1) (20/22 = 2) (30/max = 3), gen(emp2)
label define emp2 1 "employed" 2 "unemployed" 3 "not in labor force"
label value emp2 emp2
gen armed_forces = empstat == 1

save "cps_20210812.dta", replace


(52491975 differences between empstat and emp2)




file cps_20210812.dta saved


q(x): probability of death between age x and age x+n

I refrain from simply removing niu observations because in a significant number of cases they are linked to a valid preceding or following entry for empstat. Imputation might offer a solution.

In [None]:
gen wmort = wtfinl * (1-qx)
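
Written out, the weight above scales each person's final weight by the probability of surviving the interval that q(x) refers to. The symbols $w_i$ (for `wtfinl`) and $x(i)$ (the life-table cell of person $i$) are notation introduced here only for illustration:

```latex
w^{\text{mort}}_{i} = w_{i}\,\bigl(1 - q_{x(i)}\bigr)
```

For example, a weight of 1,500 combined with $q_x = 0.002$ gives $1{,}500 \times 0.998 = 1{,}497$.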

In [None]:
bysort cpsidp (mish): gen flow = emp2*10 + emp2[_n+1]
label define flow 11 "EE" 12 "EU" 13 "EN" 21 "UE" 22 "UU" 23 "UN" 31 "NE" 32 "NU" 33 "NN" 1 "E" 2 "U" 3 "N" 0 "ooo" 10 "Eooo" 20 "Uooo" 30 "Nooo" 
label value flow flow


(10,593,542 missing values generated)
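
To make the two-digit coding concrete, here is a small sketch on invented records (the values are hypothetical and carry no empirical meaning). The tens digit is the current state and the units digit is the state in the person's next record.

```stata
* Toy panel: 1 = employed, 2 = unemployed, 3 = not in labor force
clear
input long cpsidp byte mish byte emp2
1 1 1
1 2 1
1 3 2
1 4 3
2 1 3
2 2 1
end

* flow = current state * 10 + state in the following record
bysort cpsidp (mish): gen flow = emp2*10 + emp2[_n+1]
label define flow 11 "EE" 12 "EU" 13 "EN" 21 "UE" 22 "UU" 23 "UN" 31 "NE" 32 "NU" 33 "NN"
label values flow flow
list, sepby(cpsidp)
```

In the toy data the last record of each person has no successor, so its flow is missing.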




In [3]:
describe


Contains data from Z:\Daten\CPS\cps_20210812.dta
  obs:    52,698,471                          
 vars:            46                          5 Oct 2021 21:47
--------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
year            float   %8.0g                 survey year
month           byte    %12.0g     MONTH      month
mish            byte    %8.0g      MISH       month in sample, household level
statecensus     float   %75.0g     STATECENSUS
                                              state (census code)
wtfinl          double  %12.0g                final basic weight
cpsidp          double  %12.0g                cpsid, person record
age             float   %23.0g     AGE        age
sex             float   %9.0g      SEX        sex
race            int     %58.0g     RACE   

# STOP!
https://cps.ipums.org/cps/cps_linking_documentation.shtml

The CPS has been redesigned several times, so some samples cannot be linked across months. Remove those links!
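
One possible way to drop such links is sketched below on invented records. It keeps a flow only when the next record is exactly one calendar month later and does not fall in a flagged redesign month. The variables `mdate`, `adjacent` and `into_reset` are hypothetical helpers, and the January 1994 flag merely mirrors the `reset` variable defined earlier; which month pairs actually have to be excluded should be taken from the IPUMS linking documentation.

```stata
* Toy panel with hypothetical values; mish 4 -> 5 is eight months apart
clear
input long cpsidp int year byte month byte mish byte emp2
1 1993 11 1 1
1 1993 12 2 1
1 1994  1 3 2
1 1994  2 4 2
1 1994 10 5 1
end
gen mdate = ym(year, month)
format mdate %tm
gen byte reset = year == 1994 & month == 1

* next record must be exactly one calendar month later ...
bysort cpsidp (mdate): gen byte adjacent = (mdate[_n+1] - mdate == 1)
* ... and must not be the first month after a redesign
by cpsidp: gen byte into_reset = (reset[_n+1] == 1)

by cpsidp: gen flow = emp2*10 + emp2[_n+1] if adjacent & !into_reset
list cpsidp mdate mish emp2 flow
```

Here the pair December 1993 to January 1994 and the eight-month gap between `mish` 4 and 5 both end up with a missing flow.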

In [None]:
tab flow


       flow |      Freq.     Percent        Cum.
------------+-----------------------------------
        ooo |     51,197        0.12        0.12
          E |      2,339        0.01        0.13
          U |        278        0.00        0.13
          N |        493        0.00        0.13
       Eooo |      1,456        0.00        0.13
         EE | 25,450,770       60.45       60.58
         EU |    426,737        1.01       61.59
         EN |    987,205        2.34       63.94
       Uooo |        141        0.00       63.94
         UE |    479,387        1.14       65.08
         UU |    828,661        1.97       67.04
         UN |    391,923        0.93       67.97
       Nooo |        561        0.00       67.98
         NE |    898,280        2.13       70.11
         NU |    399,379        0.95       71.06
         NN | 12,186,122       28.94      100.00
------------+-----------------------------------
      Total | 42,104,929      100.00


In [None]:
save, replace

https://www.econstor.eu/bitstream/10419/229653/1/GLO-DP-0781.pdf

Bernhardt, Robert; Munro, David; Wolcott, Erin (2021): How Does the Dramatic Rise of CPS Non-Response Impact Labor Market Indicators? GLO Discussion Paper No. 781, Global Labor Organization (GLO), Essen.

I follow their suggested approach here, since there are indications that labor force estimates are biased by non-response.

In [7]:
numlabel, add
tab empstat




                 employment status |      Freq.     Percent        Cum.
-----------------------------------+-----------------------------------
                            0. niu |     73,564        0.14        0.14
                   1. armed forces |    132,932        0.25        0.39
                       10. at work | 31,846,236       60.43       60.82
12. has job, not at work last week |  1,541,120        2.92       63.75
                    20. unemployed |  1,045,933        1.98       65.73
21. unemployed, experienced worker |  1,050,728        1.99       67.73
        22. unemployed, new worker |    100,183        0.19       67.92
            30. not in labor force |     13,978        0.03       67.94
               31. nilf, housework |  2,917,773        5.54       73.48
          32. nilf, unable to work |  1,595,647        3.03       76.51
                  33. nilf, school |  1,024,428        1.94       78.45
                   34. nilf, other |  8,028,656       15.24  

In [9]:
tab year if empstat == 0


survey year |      Freq.     Percent        Cum.
------------+-----------------------------------
       1984 |      7,993       10.87       10.87
       1985 |      7,584       10.31       21.17
       1986 |      7,823       10.63       31.81
       1987 |      7,716       10.49       42.30
       1988 |      7,872       10.70       53.00
       1989 |      7,402       10.06       63.06
       1990 |      7,297        9.92       72.98
       1991 |      7,194        9.78       82.76
       1992 |      6,426        8.74       91.49
       1993 |      6,256        8.50      100.00
       2003 |          1        0.00      100.00
------------+-----------------------------------
      Total |     73,564      100.00


In [13]:
drop flow
label drop flow
bysort cpsidp (mish): gen flow = emp2*10 + emp2[_n+1] if emp2 != 0 & emp2[_n+1] != 0
label define flow 11 "EE" 12 "EU" 13 "EN" 21 "UE" 22 "UU" 23 "UN" 31 "NE" 32 "NU" 33 "NN"
label value flow flow

bysort cpsidp (mish): gen flow_r = emp2[_n-1]*10 + emp2 if emp2 != 0 & emp2[_n-1] != 0
label define flow_r 11 "EE" 12 "EU" 13 "EN" 21 "UE" 22 "UU" 23 "UN" 31 "NE" 32 "NU" 33 "NN"
label value flow_r flow_r





(10,650,007 missing values generated)



(10,650,007 missing values generated)




In [11]:
tab emp2


    RECODE of empstat |
  (employment status) |      Freq.     Percent        Cum.
----------------------+-----------------------------------
                    0 |     73,564        0.14        0.14
          1. employed | 33,520,288       63.61       63.75
        2. unemployed |  2,196,844        4.17       67.92
3. not in labor force | 16,907,775       32.08      100.00
----------------------+-----------------------------------
                Total | 52,698,471      100.00


In [None]:
collapse (sum) n_flow = lnkfw1mwt, by(date sex age flow)

In [None]:
save "Z:\Daten\CPS\cps_flow.dta", replace

(note: file cps_flow_test.dta not found)
file cps_flow_test.dta saved


In [None]:
recode age (15 = .) (16/25 = 1) (26/35 = 2) (36/45 = 3) (46/55 = 4) (56/65 = 5) (66/75 = 6), gen(age_group)
label define age_group 0 "15" 1 "16-25" 2 "26-35" 3 "36-45" 4 "46-55" 5 "56-65" 6 "66-75"
label value age_group age_group


(600150 differences between age and age_group)




In [None]:
collapse (sum) n_flow = n_flow, by(age_group sex date flow)

In [None]:
save "Z:\Daten\CPS\cps_flow_agegroup.dta", replace

In [None]:
collapse (sum) n_flow = wmort, by(date sex age flow)

In [None]:
use "Z:\Daten\CPS\cps_flow.dta"

In [None]:
recode age (15 = .) (16/25 = 1) (26/35 = 2) (36/45 = 3) (46/55 = 4) (56/65 = 5) (66/75 = 6), gen(age_group)
label define age_group 0 "15" 1 "16-25" 2 "26-35" 3 "36-45" 4 "46-55" 5 "56-65" 6 "66-75"
label value age_group age_group


(600150 differences between age and age_group)




In [None]:
collapse (sum) n_flow = n_flow, by(age_group sex date flow)

In [None]:
cd "Z:\Daten\CPS\"

Z:\Daten\CPS


In [None]:
save cps_flow_agegroup, replace

file cps_flow_agegroup.dta saved


In [None]:
* next month's panel weight and employment state; linked == 1 marks a validated link to the following month
bysort cpsidp (mish): gen lead_panlwt = panlwt[_n+1]
bysort cpsidp (mish): gen lead_emp = emp2[_n+1] if linked == 1

* two-digit flow code: current state * 10 + next state (0 if there is no linked successor)
gen flow = emp2*10 + lead_emp
replace flow = emp2*10 if missing(lead_emp)

* Exclude niu values 0 from analysis (empstat == 0)

* keep if !inlist(flow, 0, 10, 20, 30)

/*
bysort date age sex flow: egen count_flows = sum(lead_panlwt)
bysort date age flow: egen count_flows_age = sum(lead_panlwt)
bysort date age: egen count_total_age = sum(lead_panlwt)
gen sh_flows_age = count_flows_age / count_total_age
*/


* replace age by year of birth
gen yob = year - age

save "cps_flows.dta", replace

In [None]:
use "cps_flows.dta"

bysort date yob sex flow: egen count_flows = total(lead_panlwt)
bysort date yob flow: egen count_flows_yob = total(lead_panlwt)
bysort date yob: egen count_total_yob = total(lead_panlwt)
gen sh_flows_yob = count_flows_yob / count_total_yob

keep if mish <= 2
bysort yob flow: egen mean_sh_flows_yob = mean(sh_flows_yob)
gen sh_delta_yob = sh_flows_yob - mean_sh_flows_yob


collapse (max) sh_flows_yob = sh_flows_yob sh_delta_yob = sh_delta_yob, by(date yob flow)

keep if date >= date("1994-01-01", "YMD")

save "cps_flows_yob.dta", replace




variable yob not found


r(111);
r(111);




