In [2]:
# run this cell
from datascience import *
from pandas import read_stata
import numpy as np

import matplotlib
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)

# Pygrowup
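We will install Pygrowup, a Python package that calculates children's growth z-scores using the World Health Organization (WHO) Child Growth Standards.  
To install it, go to the Jupyter home page, click New, and select Terminal.  
Then type: `pip install --user pygrowup`  
If the import in the next cell still fails after installing, you may need to restart the kernel.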

In [5]:
from pygrowup import Calculator
calculator = Calculator(adjust_height_data=False, adjust_weight_scores=False,
                       include_cdc=False, logger_name='pygrowup',
                       log_level='INFO')


In [None]:
data = Table.read_table('***Data Set from Lab 2***')
data

# Anthropometric measures
We are going to look at data on children's growth.

Growth charts are particularly useful because, instead of having to know what a child's height should be at each age, we can compare children of different ages using a single number. Children grow as they get older, but by how much should they be growing? Growth charts help us answer that question and compare across ages.

![](Lab_3_Image_1.jpg)

<font color="blue"> Item 1: Using your data, make a scatter plot of height in centimeters and age in months for children under 5 years (60 months).  Do not include missing values (-99).
> Visually inspect the graph: What is the average height/length at 1 month? At 24 months? At 5 years?  
> How does the spread of heights change as children age?

In [None]:
# make a scatter plot
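
Before plotting, the rows need to be filtered. The sketch below shows that filtering step in plain Python on hypothetical lists of ages and heights. It assumes missing values are coded as -99 (as stated in Item 1) and treats "under 5" as 60 months or less; the function name `plottable_points` is illustrative, not part of any library.

```python
MISSING = -99  # missing-value code used in this lab's data

def plottable_points(ages_months, heights_cm, max_age=60):
    """Return (age, height) pairs with no missing values and age <= max_age."""
    points = []
    for age, height in zip(ages_months, heights_cm):
        if age == MISSING or height == MISSING:
            continue  # skip rows with a missing age or height
        if 0 <= age <= max_age:
            points.append((age, height))
    return points

# Hypothetical example values, not real survey data.
ages = [1, 24, -99, 70, 59]
heights = [54.0, -99, 80.0, 110.0, 105.2]
print(plottable_points(ages, heights))  # [(1, 54.0), (59, 105.2)]
```

In the notebook, the same conditions can be applied to the Table with `where` (for example with `are.between_or_equal_to(0, 60)` on the age column) before calling `scatter`.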

![](Lab_3_Image_2.jpg)

Is your graph similar to the chart above? The chart shows the ranges of healthy growth. Like percentiles, z-scores are comparable across ages. The z-scores in the chart were generated from well-nourished, healthy children all over the world.
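
For intuition about what a z-score means here: the WHO growth standards summarize each age and sex with three parameters, L (skewness), M (median) and S (coefficient of variation), and convert a measurement x into a z-score with the LMS formula z = ((x / M)^L - 1) / (L * S). The sketch below implements only that formula. The L, M and S values in the example are made up for illustration, not taken from the WHO tables, and Pygrowup's own implementation (which looks up the real parameters) may treat extreme values differently.

```python
import math

def lms_z_score(x, L, M, S):
    """z-score of measurement x under the LMS (Box-Cox) method."""
    if L == 0:
        # Limit of the general formula as L approaches 0.
        return math.log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)

# Hypothetical parameters: median 100 cm, S = 0.05, L = 1 (no skew).
print(lms_z_score(100.0, 1, 100.0, 0.05))            # 0.0: a child exactly at the median
print(round(lms_z_score(90.0, 1, 100.0, 0.05), 6))   # -2.0
```

When L = 1 (the case for WHO length/height-for-age), the formula reduces to (x - M) / (M * S), the distance from the median in units of M * S. That is why a given z-score corresponds to roughly the same percentile at every age.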

<font color="blue"> Item 2: Calculate the average height for girls aged 0-2 months, 23-25 months, and 58-60 months. Using the growth chart above, what are the average girl's z-scores at these ages? Use the middle month when reading the chart (1 month, 2 years, 5 years). At which age are children in your population doing best, relative to healthy children around the world?

In [None]:
girls = ...  # Filter the table to be just girls
avg_0_to_2 = ...  # Average height for girls between 0 and 2 months
avg_23_to_25 = ...  # Average height for girls between 23 and 25 months
avg_58_to_60 = ...  # Average height for girls between 58 and 60 months
(avg_0_to_2, avg_23_to_25, avg_58_to_60)
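
The filtering logic behind these three averages can be sketched in plain Python. The sketch assumes each child is a dictionary with the keys 'Sex', 'Months Old' and 'Height' (matching the column names used later in this lab), that missing heights are coded -99, and that girls are coded 2. The lab only states that 1 means male, so check your codebook. `average_height` is a hypothetical helper.

```python
MISSING = -99

def average_height(rows, sex_code, min_month, max_month):
    """Mean height of children of one sex aged min_month..max_month (inclusive)."""
    heights = [
        row['Height']
        for row in rows
        if row['Sex'] == sex_code
        and min_month <= row['Months Old'] <= max_month
        and row['Height'] != MISSING
    ]
    if not heights:
        return None  # no usable measurements in this age window
    return sum(heights) / len(heights)

# Hypothetical rows, for illustration only.
rows = [
    {'Sex': 2, 'Months Old': 1, 'Height': 50.0},
    {'Sex': 2, 'Months Old': 2, 'Height': 54.0},
    {'Sex': 1, 'Months Old': 1, 'Height': 60.0},
    {'Sex': 2, 'Months Old': 1, 'Height': MISSING},
]
print(average_height(rows, sex_code=2, min_month=0, max_month=2))  # 52.0
```

With a `datascience` Table, the same steps map onto `where` (with predicates such as `are.between_or_equal_to`) followed by `np.mean` on the height column.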

Pygrowup uses the World Health Organization's growth-standard data to do these transformations for us. Keep only children under age 5 (60 months or less), then run the Pygrowup code for these children.
<font color="Blue"> Item 3:  What variables does Pygrowup generate?

In [None]:
# Replace missing values with placeholders so Pygrowup can run on every row.
# Later we will drop them, since they produce impossible z-score values.
# Missing (non-positive) heights become 1 cm; ages of 61 months or more become 59 months.
data['Height'] = data.apply(lambda height: height if height > 0 else 1, 'Height')
data['Months Old'] = data.apply(lambda months: months if months < 61 else 59, 'Months Old')
valid_children = data
valid_children.show(100)

In [None]:
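# Formula from Pygrowup: length/height-for-age z-score.
# Pygrowup expects sex as 'M' or 'F'; in this data, Sex == 1 is coded as male.
def compute_z_score(row):
    sex = 'M' if row.item('Sex') == 1 else 'F'
    return calculator.lhfa(row.item('Height'), row.item('Months Old'), sex)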

z_scores = valid_children.apply(compute_z_score)

# Now add a column "z_scores" to the 'valid_children' table

#After you've added the column, run this code on it.  This changes the data storage format to a friendlier one for our purposes.
valid_children['z_scores'] = valid_children.apply(float,'z_scores')

valid_children

Z-scores less than -4 or greater than 4 are implausible. Replace these with -99.
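
One way to do this replacement, sketched in plain Python on a list of z-scores (the helper name `flag_implausible` is hypothetical): any value below -4 or above 4 becomes -99, and values inside that range are kept.

```python
MISSING = -99

def flag_implausible(z_scores, limit=4.0):
    """Replace z-scores below -limit or above +limit with the missing code."""
    return [z if -limit <= z <= limit else MISSING for z in z_scores]

print(flag_implausible([-5.1, -2.3, 0.0, 4.0, 4.2]))  # [-99, -2.3, 0.0, 4.0, -99]
```

In the notebook, `valid_children.apply(lambda z: z if -4 <= z <= 4 else -99, 'z_scores')` would produce the same values as an array that can be stored back in the `z_scores` column.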

<font color="Blue"> Item 4: Calculate the average z-score for each month. Graph average z-score against age in months. Do you notice any trends?

In [None]:
# also add a line of code to eliminate z-scores with missing values.

rounded_months = valid_children.apply(round, '***Column Name***')
valid_children['***Column Name***'] = rounded_months
max_month = int(max(rounded_months))

averages = []
for month in range(max_month + 1):  # + 1 so the oldest month is included
    month_avg = float(np.mean(valid_children.where('***Column Name***', month)['***Column Name***']))
    averages.append(month_avg)

averages_table = Table().with_column('***Column Name***', averages).with_column('***Column Name***', range(max_month + 1))
averages_table.plot('***Column Name***')
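
The per-month averaging can also be written as a single pass that skips missing z-scores (-99). This plain-Python sketch takes parallel lists of ages and z-scores (hypothetical inputs) and returns a dictionary from rounded month to average z-score. Like the cell above, it uses Python's built-in `round`, which rounds exact halves to the nearest even number.

```python
from collections import defaultdict

MISSING = -99

def average_z_by_month(months, z_scores):
    """Average z-score for each rounded month, ignoring missing values."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for age, z in zip(months, z_scores):
        if z == MISSING:
            continue  # drop missing z-scores before averaging
        month = round(age)
        totals[month] += z
        counts[month] += 1
    return {month: totals[month] / counts[month] for month in sorted(counts)}

# Hypothetical ages (months) and z-scores.
print(average_z_by_month([0.4, 1.2, 0.6, 1.4, 2.0], [-1.0, -2.0, 0.5, -99, -0.5]))
# {0: -1.0, 1: -0.75, 2: -0.5}
```

Months with no valid measurements are simply absent from the result, whereas `np.mean` over an empty selection in the cell above returns `nan` (with a warning).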

<font color="Blue"> Item 5: What percentage of your population is stunted?  (Height for age z-score less than -2)
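
Stunting prevalence is the share of children with a valid z-score whose height-for-age z-score is below -2. Here is a plain-Python sketch (hypothetical helper name and inputs) that excludes the -99 missing code from both the numerator and the denominator:

```python
MISSING = -99

def percent_stunted(z_scores, cutoff=-2.0):
    """Percentage of non-missing z-scores strictly below the cutoff."""
    valid = [z for z in z_scores if z != MISSING]
    if not valid:
        return None  # no usable z-scores
    stunted = sum(1 for z in valid if z < cutoff)
    return 100 * stunted / len(valid)

print(percent_stunted([-2.5, -1.0, -99, 0.3, -2.0]))  # 25.0
```

Removing the -99 rows first matters: -99 is itself below -2, so counting rows with `are.below(-2)` on the unfiltered Table would treat every missing value as stunted.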

<font color="Blue"> Item 6: Calculate average z-scores for boys and girls.  How do these compare?  If there is a difference, do you think these are a result of gender discrimination, or just coincidentally different?  How would you test this?

In [None]:
boy_average = ...  # Average z-score for boys
girl_average = ...  # Average z-score for girls
(boy_average, girl_average)
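
One common way to test whether a boy-girl difference is larger than chance alone would produce is a permutation test. The idea is to shuffle the sex labels many times and count how often the shuffled difference in averages is at least as large as the observed one. The sketch below uses only the standard library, takes two hypothetical lists of z-scores (missing values already removed), and uses the absolute difference in means as the test statistic. It is one possible approach, not a method this lab prescribes.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, repetitions=5000, seed=0):
    """Return (observed |difference in means|, permutation p-value)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(repetitions):
        rng.shuffle(pooled)  # randomly reassign the group labels
        simulated = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if simulated >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / repetitions

# Hypothetical z-scores for illustration.
boys = [-1.8, -2.1, -0.9, -1.5, -2.4]
girls = [-1.2, -1.6, -0.7, -1.1, -1.9]
observed, p_value = permutation_test(boys, girls)
print(observed, p_value)
```

A small p-value (for example, below 0.05) suggests the difference is unlikely to be due to chance alone. It does not by itself show what causes the difference.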

Still have time before class ends? Repeat Items 4-6 with weight-for-age and/or BMI-for-age. How do these measures differ from height-for-age? What are the implications of these differences?

# Saving the zscores for future joining.
Your Lab 1 data has all the variables we worked with except the variables generated by Pygrowup. The table you have been working with has the z-scores and all the variables, but only for children age 5 and under. Let's save the z-score data in a way that lets us join it to the Lab 1 data later.

Make a table that includes household id, individual id and the new variables.

### Joining
We want to join the tables, linking with household id and individual id.  However, these are currently two separate columns.  We need to combine them into a single Master Id.

We need to assign a unique value to each pair of the form `(<household id>, <individual id>)`. One way to do this is to construct a number of the form `<household id>0<individual_id>`. Here, the 0 acts as a separator, telling us where the household id ends and where the individual id begins. A method to get a number of this form is to use the following equation: $$\mathrm{id} = \mathrm{household\_id} \times 10^{n} + \mathrm{individual\_id}$$ where $n$ is larger than the number of digits in the largest individual id. (Exercise: think about why this is true.)
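
The equation can be checked with a small sketch. `make_master_id` and `split_master_id` are hypothetical helpers, not part of the lab code. The first builds the id and refuses an individual id that needs more than n digits; the second recovers the pair with integer division.

```python
def make_master_id(household_id, individual_id, n):
    """Combine the two ids as household_id * 10**n + individual_id."""
    if not 0 <= individual_id < 10 ** n:
        raise ValueError("individual_id needs more than n digits")
    return household_id * 10 ** n + individual_id

def split_master_id(master_id, n):
    """Recover (household_id, individual_id) from a master id."""
    return divmod(master_id, 10 ** n)

master = make_master_id(1234, 5, n=2)
print(master)                        # 123405
print(split_master_id(master, n=2))  # (1234, 5)
```

The range check is what makes the id unique. If an individual id could have n or more digits, different pairs could collide: with n = 1, both (1, 12) and (2, 2) would give 22. Choosing n one larger than the digit count of the largest individual id is what leaves the visible 0 separator.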

We have provided rough code that computes this id and returns a new table with a Master ID column. However, you need to fill in the value of n. Use the blank cell below to find out how many digits the largest individual id in your roster has.
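
To turn the largest individual id into a value of n, count its digits and add one, following the rule above. A plain-Python sketch (`choose_n` is a hypothetical helper; in the notebook you could pass it the Individual ID column):

```python
def choose_n(individual_ids):
    """Number of digits in the largest id, plus one for the 0 separator."""
    largest = max(individual_ids)
    return len(str(int(largest))) + 1  # int() also handles ids stored as floats

print(choose_n([1, 7, 12]))  # 3
```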

In [None]:
max(roster_renamed["Individual ID"])

In [None]:
roster_renamed['Master ID']=roster_renamed['Household ID']*1000+roster_renamed['Individual ID']
roster_renamed

In [None]:
#Usage: <new_table> = append_master_id(<old_table>)
#Returns: a new table with column "Master ID" appended to the old table
#Make sure <old_table> has the columns "Household ID" and "Individual ID"

def append_master_id(table, household_id_label="Household ID", individual_id_label="Individual ID"):

    # Fill in the value of n. It should be 1 more than the number of digits
    # of the largest number in the Individual ID column.
    n = 3

    household_col = table[household_id_label]
    individual_col = table[individual_id_label]
    master_col = []
    for household_id, individual_id in zip(household_col, individual_col):
        # Shift the household id left by n digits, then add the individual id.
        master_col.append(household_id * 10**n + individual_id)
    return table.with_column('Master ID', master_col)
    

zscores_renamed = append_master_id(zscores_renamed)
zscores_renamed
#employment_renamed = append_master_id(employment_renamed).drop(['Household ID', 'Individual ID'])

# for instructor reference, any table that does not contain both a household ID AND
# individual ID, like the 'household' table, will not work with this function - RW

<font color="Blue"> Item 7: Confirm that there is now a unique id for each observation. Also, add one more cleaning check to make sure your dataset includes all the children under 5. Tell me what code you used.
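
A uniqueness check amounts to looking for ids that occur more than once. Here is a plain-Python sketch on a hypothetical list of ids (the helper name `duplicate_ids` is illustrative):

```python
from collections import Counter

def duplicate_ids(ids):
    """Return the ids that appear more than once, in sorted order."""
    counts = Counter(ids)
    return sorted(i for i, count in counts.items() if count > 1)

print(duplicate_ids([101, 102, 101, 201]))  # [101]
```

An empty list means every id is unique. With the Table, comparing `len(set(zscores_renamed['Master ID']))` to `zscores_renamed.num_rows` gives the same yes/no answer.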

Remove the columns Household ID and Individual ID from the table; we don't need them anymore since we have the Master ID.


Save your data as Lab_3.csv.

# Backup your notebooks!
Your labs build on each other.  If the server goes down, your work will be lost! Download all your labs to your computer to save a backup.