# The Career Decisions of Young Men

This notebook processes and explores the estimation sample used by Michael Keane and Kenneth Wolpin to study the career decisions of young men. 

> Keane, M. P. and Wolpin, K. I. (1997). [The career decisions of young men](http://www.journals.uchicago.edu/doi/10.1086/262080). *Journal of Political Economy*, 105(3), 473-522.

The sample is based on the [National Longitudinal Survey of Youth 1979 (NLSY79)](https://www.bls.gov/nls/nlsy79.htm) and is available for download [here](https://github.com/structDataset/career_decisions_data/blob/master/KW_97.raw).

## Preparations

We first perform some basic preparations on the original dataset to ease further processing.

In [5]:
import pandas as pd
import numpy as np

columns = ['Identifier', 'Age', 'Schooling', 'Choice', 'Wage']
# The np.int alias was removed from recent NumPy releases; the builtin int is equivalent.
dtype = {'Identifier': int, 'Age': int, 'Schooling': int, 'Choice': 'category'}

df = pd.DataFrame(np.genfromtxt('KW_97.raw'), columns=columns).astype(dtype)
df.set_index(['Identifier', 'Age'], inplace=True, drop=False)
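# Assigning to .cat.categories is no longer supported in recent pandas; rename the categories instead.
df['Choice'] = df['Choice'].cat.rename_categories(['Schooling', 'Home', 'White', 'Blue', 'Military'])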
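
To illustrate what the preparation step does, the following sketch applies the same column layout and choice labels to a small made-up array instead of `KW_97.raw`. The rows in `raw` and the names `toy` and `labels` are purely illustrative, and the explicit mapping assumes that the raw choice codes run from 1 to 5 in the order of the labels used above.

```python
import numpy as np
import pandas as pd

# Made-up rows in the layout Identifier, Age, Schooling, Choice, Wage.
# These values are NOT taken from the estimation sample.
raw = np.array([
    [1, 16, 10, 1, np.nan],
    [1, 17, 11, 4, 11000.0],
    [2, 16, 9, 2, np.nan],
    [2, 17, 9, 3, 9500.0],
    [2, 18, 9, 5, 9000.0],
])

columns = ['Identifier', 'Age', 'Schooling', 'Choice', 'Wage']
# Assumed coding of the raw choice variable (1 to 5).
labels = {1: 'Schooling', 2: 'Home', 3: 'White', 4: 'Blue', 5: 'Military'}

toy = pd.DataFrame(raw, columns=columns)
toy = toy.astype({'Identifier': int, 'Age': int, 'Schooling': int})

# Map the numeric codes to labels explicitly, so the result does not
# depend on which codes happen to appear in the data.
toy['Choice'] = pd.Categorical(
    toy['Choice'].astype(int).map(labels),
    categories=list(labels.values()),
)

# Index by individual and age, keeping both as regular columns as well.
toy = toy.set_index(['Identifier', 'Age'], drop=False)
print(toy)
```

Mapping the codes explicitly is a design choice of this sketch: renaming sorted categories, as in the cell above, gives the same labels only if all five codes occur in the data.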

## Basic Descriptives

We simply reproduce some basic descriptive statistics from the paper.

### Choice Probabilities

We reproduce the choice probabilities reported in Table 1.

In [6]:
# Produce the raw table
table_1 = pd.crosstab(index=df.Age, columns=df.Choice, margins=True)
# Produce relative frequencies (row percentages)
table_1_rel = table_1.div(table_1.All, axis=0) * 100

Defaulting to column but this will raise an ambiguity error in a future version
  grouped = data.groupby(keys)
Defaulting to column but this will raise an ambiguity error in a future version
  margin = data[rows + values].groupby(rows).agg(aggfunc)


In [7]:
table_1

| Age | Schooling | Home | White | Blue | Military | All |
|---:|---:|---:|---:|---:|---:|---:|
| 16 | 1178 | 145 | 4 | 45 | 1 | 1373 |
| 17 | 1014 | 197 | 15 | 113 | 20 | 1359 |
| 18 | 561 | 296 | 92 | 331 | 70 | 1350 |
| 19 | 420 | 293 | 115 | 406 | 107 | 1341 |
| 20 | 341 | 273 | 149 | 454 | 113 | 1330 |
| 21 | 275 | 257 | 170 | 498 | 106 | 1306 |
| 22 | 169 | 212 | 256 | 559 | 90 | 1286 |
| 23 | 105 | 185 | 336 | 546 | 68 | 1240 |
| 24 | 65 | 112 | 284 | 416 | 44 | 921 |
| 25 | 24 | 61 | 215 | 267 | 24 | 591 |


In [8]:
table_1_rel

| Age | Schooling | Home | White | Blue | Military | All |
|---:|---:|---:|---:|---:|---:|---:|
| 16 | 85.797524 | 10.560816 | 0.291333 | 3.277495 | 0.072833 | 100.0 |
| 17 | 74.613687 | 14.495953 | 1.103753 | 8.314937 | 1.47167 | 100.0 |
| 18 | 41.555556 | 21.925926 | 6.814815 | 24.518519 | 5.185185 | 100.0 |
| 19 | 31.319911 | 21.849366 | 8.57569 | 30.275913 | 7.97912 | 100.0 |
| 20 | 25.639098 | 20.526316 | 11.203008 | 34.135338 | 8.496241 | 100.0 |
| 21 | 21.056662 | 19.678407 | 13.016845 | 38.1317 | 8.116386 | 100.0 |
| 22 | 13.141524 | 16.485226 | 19.906687 | 43.468118 | 6.998445 | 100.0 |
| 23 | 8.467742 | 14.919355 | 27.096774 | 44.032258 | 5.483871 | 100.0 |
| 24 | 7.057546 | 12.160695 | 30.836048 | 45.168295 | 4.777416 | 100.0 |
| 25 | 4.060914 | 10.321489 | 36.379019 | 45.177665 | 4.060914 | 100.0 |
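
The row percentages can also be obtained in a single step with the `normalize='index'` option of `pd.crosstab`. The following is a minimal sketch on made-up observations (the frame `toy` is illustrative and not part of the estimation sample); it also recomputes one entry of the table above from the reported counts.

```python
import pandas as pd

# Made-up observations, NOT taken from the estimation sample.
toy = pd.DataFrame({
    'Age': [16, 16, 16, 16, 17, 17, 17, 17],
    'Choice': ['Schooling', 'Schooling', 'Schooling', 'Home',
               'Schooling', 'Blue', 'Blue', 'Home'],
})

# normalize='index' divides every row by its row total.
shares = pd.crosstab(index=toy['Age'], columns=toy['Choice'],
                     normalize='index') * 100
print(shares)

# Recompute the schooling share at age 16 from the counts reported above:
# 1178 of the 1373 observations at that age chose schooling.
share_school_16 = 1178 / 1373 * 100
print(round(share_school_16, 6))
```

Dividing by the `All` column, as done in the notebook, keeps the margin in the output, whereas this variant returns only the choice columns.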


### Average Real Wages

We reproduce the average real wages by occupation reported in Table 4 of the original paper. We split the information across two tables: the first reports mean wages and the second the number of wage observations by age and occupation.

In [9]:
table_4_mean = pd.crosstab(index=df.Age, columns=df.Choice, values=df.Wage, aggfunc='mean', margins=True)
table_4_mean = table_4_mean[['All', 'White', 'Blue', 'Military']]

Defaulting to column but this will raise an ambiguity error in a future version
  grouped = data.groupby(keys)
Defaulting to column but this will raise an ambiguity error in a future version
  margin = data[rows + values].groupby(rows).agg(aggfunc)


In [10]:
table_4_mean

| Age | All | White | Blue | Military |
|---:|---:|---:|---:|---:|
| 16 | 10217.740418 | 9320.762 | 10286.738758 | NaN |
| 17 | 11036.597108 | 10049.757071 | 11572.887893 | 9005.362615 |
| 18 | 12060.746019 | 11775.341338 | 12603.820833 | 10171.86815 |
| 19 | 12246.684578 | 12376.418072 | 12949.838227 | 9714.600108 |
| 20 | 13635.869637 | 13824.013219 | 14363.658471 | 10852.506971 |
| 21 | 14977.004406 | 15578.139155 | 15313.451473 | 12619.374667 |
| 22 | 17561.28014 | 20236.075551 | 16947.904935 | 13771.555541 |
| 23 | 18719.83594 | 20745.564706 | 17884.949782 | 14868.653698 |
| 24 | 20942.417442 | 24066.635884 | 19245.185944 | 15910.839514 |
| 25 | 22754.544937 | 24899.227802 | 21473.314696 | 17134.463455 |
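
The missing Military entry at age 16 reflects a cell without any observed wage, since pandas skips missing values when it computes a mean. The following sketch shows this behavior on made-up data (the frame `toy` and its values are illustrative only) and computes means and observation counts in a single `groupby` call as an alternative to two separate cross-tabulations.

```python
import numpy as np
import pandas as pd

# Made-up records, NOT taken from the estimation sample.
# Some wages are missing, as is the case for many records in the data.
toy = pd.DataFrame({
    'Age': [16, 16, 16, 17, 17],
    'Choice': ['Blue', 'Blue', 'White', 'Blue', 'White'],
    'Wage': [10000.0, np.nan, 9000.0, 12000.0, np.nan],
})

# 'mean' ignores missing wages and 'count' counts only non-missing ones.
summary = toy.groupby(['Age', 'Choice'])['Wage'].agg(['mean', 'count'])
print(summary)

# Reshape the means into the wide layout used above (ages in rows).
wide_mean = summary['mean'].unstack('Choice')
print(wide_mean)
```

A cell whose wages are all missing ends up with a count of zero and a mean of `NaN`.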


In [11]:
table_4_count = pd.crosstab(index=df.Age, columns=df.Choice, values=df.Wage, aggfunc='count', margins=True)
table_4_count = table_4_count[['All', 'White', 'Blue', 'Military']]

Defaulting to column but this will raise an ambiguity error in a future version
  grouped = data.groupby(keys)
Defaulting to column but this will raise an ambiguity error in a future version
  margin = data[rows + values].groupby(rows).agg(aggfunc)


In [12]:
table_4_count

| Age | All | White | Blue | Military |
|---:|---:|---:|---:|---:|
| 16 | 28.0 | 2.0 | 26.0 | 0.0 |
| 17 | 102.0 | 14.0 | 75.0 | 13.0 |
| 18 | 377.0 | 71.0 | 246.0 | 60.0 |
| 19 | 507.0 | 97.0 | 317.0 | 93.0 |
| 20 | 587.0 | 128.0 | 357.0 | 102.0 |
| 21 | 657.0 | 142.0 | 419.0 | 96.0 |
| 22 | 764.0 | 214.0 | 476.0 | 74.0 |
| 23 | 833.0 | 299.0 | 481.0 | 53.0 |
| 24 | 667.0 | 259.0 | 373.0 | 35.0 |
| 25 | 479.0 | 207.0 | 250.0 | 22.0 |
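
As a quick consistency check, the `All` column of the mean table should equal the average of the occupation-specific means weighted by the number of wage observations. The sketch below verifies this for ages 16 and 17 with figures copied from the two tables above; the helper `pooled_mean` is a hypothetical name introduced here for illustration.

```python
import math

# Means and wage-observation counts copied from the two tables above.
# Military is omitted at age 16 because it has no wage observations.
means = {
    16: {'White': 9320.762, 'Blue': 10286.738758},
    17: {'White': 10049.757071, 'Blue': 11572.887893, 'Military': 9005.362615},
}
counts = {
    16: {'White': 2, 'Blue': 26},
    17: {'White': 14, 'Blue': 75, 'Military': 13},
}
# Values of the 'All' column in the mean table.
reported_all = {16: 10217.740418, 17: 11036.597108}


def pooled_mean(cell_means, cell_counts):
    """Return the count-weighted average of the occupation-specific means."""
    total = sum(cell_counts[occ] * cell_means[occ] for occ in cell_means)
    return total / sum(cell_counts[occ] for occ in cell_means)


for age in (16, 17):
    value = pooled_mean(means[age], counts[age])
    matches = math.isclose(value, reported_all[age], abs_tol=1e-3)
    print(age, round(value, 3), matches)
```

The tolerance allows for the rounding of the displayed means to six decimal places.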
