In [None]:
#Run this cell
from datascience import *
from pandas import read_stata
import numpy as np
from pygrowup import Calculator

import matplotlib
matplotlib.use('Agg')  # newer Matplotlib releases no longer accept the warn= argument
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)

# Education of Parents 

## Data Prep

We are going to use some new variables in this lab: Mother & Father ID and Education Variables.
Import your full data set, select these variables, and rename them.
Also be sure to keep your hhid & personid variables.

In [None]:
# %cd ~/Child-Dev-2019/Data
# %ls
allvars = Table.read_table('AllJoined.csv')
allvars

In [None]:
new_vars = ['hhid', 'personid', '...', '...', '...']
selected_new_vars = allvars.select(new_vars)
selected_new_vars

In [None]:
names_4_new_vars = ['Household ID', 'Individual ID', 'Sex', 'Age in Years', 'Number of Months', 'Day of Birth', 'Month of Birth', 'Year of Birth', 'Weight', 'Height','Relationship to HH Head', 'Mother ID', 'Father ID']

selected_renamed = selected_new_vars.relabel(new_vars, names_4_new_vars)
selected_renamed

Replace the NaN values with -99 by converting the table to a pandas DataFrame and back again.

In [None]:
dataframe = selected_renamed.to_df()
altered = dataframe.fillna(-99)
selected_renamed = Table.from_df(altered)

Use the code from last week to create a master id for joining.

In [None]:
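# Python version of: generate uniqueid = hhid*1000 + individualid ('Master ID' is a placeholder name; match Lab 4)
selected_renamed['Master ID'] = selected_renamed['Household ID']*1000 + selected_renamed['Individual ID']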
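
The master ID arithmetic (household ID times 1000 plus the individual ID) can be sketched in plain Python. The function names below are hypothetical helpers, not part of the lab code, and the sketch assumes that no household has 1000 or more members.

```python
def make_master_id(hhid, person_id, multiplier=1000):
    """Combine a household ID and a within-household person ID into one number."""
    return hhid * multiplier + person_id


def split_master_id(master_id, multiplier=1000):
    """Recover (hhid, person_id); only valid when 0 <= person_id < multiplier."""
    return divmod(master_id, multiplier)


example_id = make_master_id(12, 3)
print(example_id)                   # 12003
print(split_master_id(example_id))  # (12, 3)

# A missing parent coded as -99 still produces a number, but not a real person's ID.
missing_parent_id = make_master_id(12, -99)
print(missing_parent_id)            # 11901
```

If any household in your data has individual IDs of 1000 or more, a larger multiplier would be needed to keep the IDs unique.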

Now pull in the data from the end of Lab 4.  Join it with the new data.

In [None]:
lab4 = Table.read_table('Lab4.csv')
generate uniqueid=hhid*...+individualid
# do the left join!
data=selected_renamed.join......

Check to make sure you have the same size roster as when you started.
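
One way to run this check is to compare row counts before and after the join. The sketch below uses pandas rather than the `datascience` Table, with toy data and hypothetical column names (`Master ID`, `Z-score`).

```python
import pandas as pd

# Toy roster: one row per person (column names are hypothetical).
roster = pd.DataFrame({
    "Master ID": [1001, 1002, 2001],
    "Age in Years": [34, 3, 5],
})

# Toy Lab 4 output: only the children have z-scores.
lab4 = pd.DataFrame({
    "Master ID": [1002, 2001],
    "Z-score": [-1.2, 0.4],
})

rows_before = len(roster)
# A left join keeps every roster row; people without a match get NaN.
merged = roster.merge(lab4, on="Master ID", how="left")
rows_after = len(merged)

print(rows_before, rows_after)
assert rows_after == rows_before, "the join changed the size of the roster"
```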

## Diving into the data

This week we will look at specific information about parents. Examine your data and find a household with a child whose mother and father both live in the household. This means that both Mother ID and Father ID are not missing (after the step above, missing values are coded -99). Looking at just this one family, can you determine the mother's age and the father's age? How about the mother's education and the father's education? Hint: Be sure to use the correct education variable (likely 'Years of Schooling' rather than 'Currently Enrolled in School').

Challenge yourself and try this activity with the kids who are living in the largest household in your data.

In [None]:
largest_hhold = data.group('Household ID').sort('count',descending = True).column('Household ID').item(0)
data.where(np.logical_and(data['Mother ID'] >= 0, data['Father ID'] >= 0)).where('Household ID', largest_hhold)
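
To see how the lookup works for a single family, here is a minimal sketch in pandas with a made-up three-person household. The values are invented for illustration. The idea is that a child's Mother ID and Father ID point to the Individual ID of another row in the same household.

```python
import pandas as pd

# One toy household: father (1), mother (2), child (3).
household = pd.DataFrame({
    "Individual ID": [1, 2, 3],
    "Age in Years": [40, 36, 7],
    "Education": [9, 12, 1],
    "Mother ID": [-99, -99, 2],
    "Father ID": [-99, -99, 1],
})

# Index by Individual ID so a parent can be looked up directly.
people = household.set_index("Individual ID")

child = people.loc[3]
mother = people.loc[child["Mother ID"]]
father = people.loc[child["Father ID"]]

print("Mother:", mother["Age in Years"], mother["Education"])
print("Father:", father["Age in Years"], father["Education"])
```

The parent joins later in this lab do the same lookup for every child at once.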

### Additional Preparation

You probably have two or more schooling variables.  Combine them into a single column "Education" to simplify your data.  

Note: Some data sets record schooling in two variables: the level attended and the number of years completed at that level. For example, if level is high school and years is 3, this person's years of schooling should be 11. The code below illustrates the kind of transformation you might have to do. Check your questionnaire! Recode this section based on your questions and your country's questionnaire!

In [None]:
def yearsextra(x):
    if x==1: 
        return 0
    elif x==2: 
        return 0
    elif x==3: 
        return 6
    elif x==4: 
        return 9  
    elif x==5: 
        return 12 
    elif x==7: 
        return 12
    elif x==6: 
        return 14
    elif x==8: 
        return 6
    else:
        return -99
  
data['Extra Years']=data.apply(yearsextra, 'Years of Schooling - Level')
data['Education']=data['Years of Schooling - Years']+data['Extra Years']
data['Education']= [data['Education'][i] if data['Education'][i] > 0 else data['Currently Enrolled in School'][i] for i in range(data.num_rows)]
data

We can do a quick check that the schooling values make sense and that no one has more years of education than years of life.  
<font color="Blue"> Item 1: Make a scatter plot comparing Age in Years to Education (without the -99 values). Identify the observations that are impossible.

In [None]:
data.where(data['Education']>0).where('Age in Years', are.below(20)).scatter('Education', 'Age in Years')
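
Besides reading the plot, the impossible observations can be listed directly. The sketch below uses pandas with toy data and flags anyone whose years of education exceed their age. This is one possible rule, and a stricter one (for example, allowing for a school starting age) may suit your country better.

```python
import pandas as pd

people = pd.DataFrame({
    "Age in Years": [6, 10, 15, 8],
    "Education": [1, 12, 9, -99],
})

# Drop the -99 missing codes before comparing.
valid = people[people["Education"] >= 0]

# More years of schooling than years of life cannot be right.
impossible = valid[valid["Education"] > valid["Age in Years"]]
print(impossible)
```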

Now that your Years of Education data looks good, drop the other education columns so you don't get confused ('Extra Years', 'Years of Schooling - Years', 'Years of Schooling - Level').  Keep 'Currently Enrolled in School'.

In [None]:
data=data.drop('Extra Years', 'Years of Schooling - Level', 'Years of Schooling - Years')
data
data.num_rows

We will now join in information about mothers.

In [None]:
# Make a Mother Master ID: 
data['Mother Master ID']=data['Household ID']*1000+data['Mother ID']

# Make a table of information we want about mothers
data_on_moms=data.select('Household ID', 'Individual ID', 'Age in Years', 'Education')
data_on_moms=data_on_moms.relabel(['Age in Years', 'Education'],["Mom's Age", "Mom's Education"])

# Give every person their own master ID so a child's Mother Master ID can be matched to it:
data_on_moms['Mother Master ID']=data_on_moms['Household ID']*1000+data_on_moms['Individual ID']

# Create a table with rows all -99 for children who do not have mothers in the household
data_on_moms_small=data_on_moms.group("Household ID")
data_on_moms_small=data_on_moms_small.select("Household ID")
data_on_moms_small['Individual ID']=-99
data_on_moms_small["Mom's Age"]=-99
data_on_moms_small["Mom's Education"]=-99
data_on_moms_small['Mother Master ID']=data_on_moms_small['Household ID']*1000+data_on_moms_small['Individual ID']

#append households with no mother IDs to created mother IDs
data_on_moms=data_on_moms.append(data_on_moms_small)
data_on_moms=data_on_moms.drop('Household ID', 'Individual ID')
data_on_moms

# Join the mother information to the child using the Mother Master ID
data=data.join('Mother Master ID', data_on_moms, 'Mother Master ID')
data=data.drop('Mother Master ID')
data.show(100)


Sometimes joining brings in extra lines of data.  Make sure you still have the same number of people that you started out with in the original table.
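
Extra rows usually appear when the join key is repeated in the lookup table. The following sketch (pandas, toy data) shows the effect and one way to spot the repeated keys. It illustrates the problem and is not the lab's own check.

```python
import pandas as pd

children = pd.DataFrame({"Mother Master ID": [1002, 1002, 2001]})

mothers = pd.DataFrame({
    "Mother Master ID": [1002, 2001, 2001],  # 2001 appears twice by mistake
    "Mom's Age": [30, 41, 41],
})

# Rows of the lookup table whose key is not unique.
duplicate_keys = mothers[mothers["Mother Master ID"].duplicated(keep=False)]
print(duplicate_keys)

# Each duplicate key multiplies the matching child rows.
joined = children.merge(mothers, on="Mother Master ID", how="left")
print(len(children), len(joined))  # 3 4

# Removing the duplicates restores the original number of rows.
deduplicated = mothers.drop_duplicates("Mother Master ID")
joined_clean = children.merge(deduplicated, on="Mother Master ID", how="left")
print(len(joined_clean))  # 3
```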

Examine your data.  Do you now have information about mothers for children who have mothers in the household?  Does it make sense (e.g., were children generally born when their mothers were older than 15)?  Make sure you do not have information about mothers for individuals who do not have mothers in the household.

<font color="Blue"> Item 2: Make a new variable: Mother's age when child was born.  What is the youngest age at which a mother had a baby in your data set?
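
The calculation is the mother's current age minus the child's current age, restricted to children whose mother is in the household. A minimal pandas sketch with toy data follows. The new column name `Mom's Age at Birth` is a placeholder.

```python
import pandas as pd

data = pd.DataFrame({
    "Age in Years": [3, 10, 30],
    "Mom's Age": [25, 27, -99],  # -99 means no mother in the household
})

# Keep only people with a known mother; otherwise -99 would distort the result.
has_mother = data[data["Mom's Age"] != -99].copy()
has_mother["Mom's Age at Birth"] = has_mother["Mom's Age"] - has_mother["Age in Years"]

youngest = has_mother["Mom's Age at Birth"].min()
print(youngest)
```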

Do the same for fathers: Dad's Age, Dad's Education

In [None]:
# Make a Father Master ID: 
data['Father Master ID']=data['Household ID']*1000+data['Father ID']

# Make a table of information we want about fathers
data_on_dads=data.select('Household ID', 'Individual ID', 'Age in Years', 'Education')
data_on_dads=data_on_dads.relabel(['Age in Years', 'Education'],["Dad's Age", "Dad's Education"])

# Give every person their own master ID so a child's Father Master ID can be matched to it:
data_on_dads['Father Master ID']=data_on_dads['Household ID']*1000+data_on_dads['Individual ID']

# Create a table with rows all -99 for children who do not have fathers in the household
data_on_dads_small=data_on_dads.group("Household ID")
data_on_dads_small=data_on_dads_small.select("Household ID")
data_on_dads_small['Individual ID']=-99
data_on_dads_small["Dad's Age"]=-99
data_on_dads_small["Dad's Education"]=-99
data_on_dads_small['Father Master ID']=data_on_dads_small['Household ID']*1000+data_on_dads_small['Individual ID']

#append households with no father IDs to created father IDs
data_on_dads=data_on_dads.append(data_on_dads_small)
data_on_dads=data_on_dads.drop('Household ID', 'Individual ID')
data_on_dads

# Join the father information to the child using the Father Master ID
data=data.join('Father Master ID', data_on_dads, 'Father Master ID')
data=data.drop('Father Master ID')
data.show(100)

Save your data.

<font color="Blue"> Item 2: Who is more educated: mothers or fathers?  Make a scatter plot of mothers' education and fathers' education. Explain how the graph supports your conclusion.

## Let's examine whether parents' education is associated with child outcomes.

One theory suggests that mother's education is more strongly correlated with nutrition than father's education.  <font color="Blue"> Item 3:  Make a scatter plot of z-scores (from Lab 4) by father's education level.  Make another by mother's education level. What is your impression? (Be sure to graph only the children and to eliminate the missing values.)

<font color="Blue"> Item 4: Let's see if we get a clearer picture by using mean: Calculate mean z-scores by mothers' and fathers' education levels.  Graph these and comment on your findings. 
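
Grouping by education level and averaging can be sketched as follows. This uses pandas with toy data. The `Z-score` column name is an assumption, so substitute whatever your Lab 4 z-score column is called.

```python
import pandas as pd

children = pd.DataFrame({
    "Mom's Education": [0, 0, 6, 6, 12, -99],
    "Z-score": [-2.0, -1.0, -1.0, 0.0, 0.5, -0.3],
})

# Exclude children whose mother's education is missing (-99).
known = children[children["Mom's Education"] != -99]

# One mean z-score per education level.
mean_z = known.groupby("Mom's Education")["Z-score"].mean()
print(mean_z)
```

The same steps apply to fathers by swapping in the father's education column.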

Z-scores are just for kids age 5 and under.  Let's check out how the older kids are doing. 
<font color="Blue"> Item 5:  Calculate percent of 12-15 year olds currently enrolled in school by mother's and father's education level.

<font color="Black"> Hint: After confirming enrollment values, make a small table only including children of those ages and then pivot enrollment around mother's and father's education levels.
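
As a rough sketch of the hint, the example below filters to ages 12 through 15 and computes the share enrolled at each level of mother's education. It uses pandas with toy data and assumes enrollment is coded 1 for yes and 0 for no, so confirm the coding in your own data first.

```python
import pandas as pd

data = pd.DataFrame({
    "Age in Years": [12, 13, 15, 14, 9, 12],
    "Currently Enrolled in School": [1, 0, 1, 1, 1, 1],  # assumed: 1 = yes, 0 = no
    "Mom's Education": [0, 0, 6, 6, 6, -99],
})

# Children aged 12 to 15 (inclusive) with a known mother's education.
in_age_range = data["Age in Years"].between(12, 15)
mother_known = data["Mom's Education"] != -99
teens = data[in_age_range & mother_known]

# The mean of a 0/1 column is the share enrolled; multiply by 100 for a percent.
percent_enrolled = (
    teens.groupby("Mom's Education")["Currently Enrolled in School"].mean() * 100
)
print(percent_enrolled)
```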

## Big challenge!
Can you determine the age and education of grandparents?  Specifically, the maternal grandmother, maternal grandfather, paternal grandmother, and paternal grandfather.  Hint: What if you had also included the mother's own Mother ID in the mother table used for joining?  Again, try looking at the largest family to see how this works before attempting the coding.
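
One possible approach, following the hint, is to carry each mother's own mother ID through the first join and then join a second time on it. The sketch below uses pandas with a toy three-generation household. All column names beyond those used in the lab (for example `Maternal Grandmother Master ID`) are hypothetical.

```python
import pandas as pd

# Toy household: grandmother (1001), mother (1002), child (1003).
people = pd.DataFrame({
    "Master ID": [1001, 1002, 1003],
    "Age in Years": [60, 35, 5],
    "Education": [3, 9, 0],
    "Mother Master ID": [-99, 1001, 1002],  # -99: no mother in the household
})

# Mother lookup table that also carries each mother's own mother ID.
moms = people.rename(columns={
    "Master ID": "Mother Master ID",
    "Age in Years": "Mom's Age",
    "Education": "Mom's Education",
    "Mother Master ID": "Maternal Grandmother Master ID",
})

# Grandmother lookup table.
grandmas = people[["Master ID", "Age in Years", "Education"]].rename(columns={
    "Master ID": "Maternal Grandmother Master ID",
    "Age in Years": "Maternal Grandmother's Age",
    "Education": "Maternal Grandmother's Education",
})

# First join: attach mother information (including her mother's ID).
with_moms = people.merge(moms, on="Mother Master ID", how="left")

# People without a mother get NaN; recode to -99 to match the lab's convention.
key = "Maternal Grandmother Master ID"
with_moms[key] = with_moms[key].fillna(-99).astype(int)

# Second join: attach maternal grandmother information.
with_grandmas = with_moms.merge(grandmas, on=key, how="left")
print(with_grandmas[["Master ID", "Mom's Age", "Maternal Grandmother's Age"]])
```

The other three grandparents would follow the same pattern, using the father table and the Father ID columns as needed.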