## Proportion of Education
The function `proportion_of_education` returns the proportion of children in the dataset whose mother's education level was less than high school (<12), high school (12), more than high school but not a college graduate (>12), or a college degree. The result is a dictionary keyed by education category.

In [1]:
import pandas as pd

def proportion_of_education():
    df = pd.read_csv('NISPUF17.csv')
    # EDUC1 holds the mother's education level as a code from 1 to 4,
    # in the same order as the dictionary keys below.
    edu = df['EDUC1']
    # The denominator is every child in the dataset.
    total = len(edu)
    # Count the rows in each category and divide by the total.
    result = {"less than high school": (edu == 1).sum() / total,
        "high school": (edu == 2).sum() / total,
        "more than high school but not college": (edu == 3).sum() / total,
        "college": (edu == 4).sum() / total}
    return result
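
As a cross-check on the counting logic above, the same proportions can be sketched with `value_counts(normalize=True)`. This is only an illustration: `toy`, `labels`, and `toy_result` are hypothetical names, and the data is made up rather than read from `NISPUF17.csv`.

```python
import pandas as pd

# Hypothetical toy data standing in for the EDUC1 column.
toy = pd.DataFrame({"EDUC1": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]})

# Assumed mapping from EDUC1 codes to the dictionary keys used above.
labels = {
    1: "less than high school",
    2: "high school",
    3: "more than high school but not college",
    4: "college",
}

# normalize=True divides each count by the number of non-missing rows.
shares = toy["EDUC1"].value_counts(normalize=True).sort_index()
toy_result = {labels[code]: float(share) for code, share in shares.items()}
print(toy_result)
```

Note that `value_counts` leaves missing values out of the denominator, whereas the function above divides by `len(edu)`. The two agree only when `EDUC1` has no missing entries.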

## Breastmilk fed and seasonal influenza vaccine injection
This function examines the relationship between being fed breastmilk as a child and receiving a seasonal influenza vaccine from a healthcare provider. It returns a tuple of the average number of influenza vaccine doses for children we know received breastmilk and for those we know did not.

In [None]:
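import pandas as pd

def average_influenza_doses():
    df = pd.read_csv('NISPUF17.csv')
    # CBF_01: 1 = child was fed breastmilk, 2 = child was not.
    # P_NUMFLU: number of seasonal influenza doses received.
    # Drop rows where either value is missing.
    temp = df[['CBF_01', 'P_NUMFLU']].dropna()
    # Average dose count among children known to have been breastfed.
    a = temp.loc[temp['CBF_01'] == 1, 'P_NUMFLU'].mean()
    # Average dose count among children known not to have been breastfed.
    b = temp.loc[temp['CBF_01'] == 2, 'P_NUMFLU'].mean()
    return (a, b)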
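
The two averages can also be computed in one pass with `groupby`. The sketch below uses made-up values, not the survey data, and `toy`, `clean`, `means`, and `toy_averages` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data: CBF_01 is 1 (breastfed) or 2 (not breastfed),
# P_NUMFLU is the number of flu doses, with some values missing.
toy = pd.DataFrame({
    "CBF_01":   [1, 1, 1, 2, 2, 1, 2],
    "P_NUMFLU": [2, 4, None, 1, 2, 0, None],
})

# Drop rows with a missing value before averaging.
clean = toy.dropna()

# One mean per CBF_01 code.
means = clean.groupby("CBF_01")["P_NUMFLU"].mean()
toy_averages = (float(means.loc[1]), float(means.loc[2]))
print(toy_averages)
```

In this toy data the rows with a missing dose count do not contribute to either average, which matches what the `dropna()` call in the function does.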

## Link between vaccine effectiveness and sex of the child
It would be interesting to see whether there is any evidence of a link between vaccine effectiveness and the sex of the child. Among children who were vaccinated against chickenpox (at least one varicella dose), this function calculates the ratio of the number who contracted chickenpox to the number who did not, separately for each sex.

The function returns a dictionary with the keys "male" and "female".

In [None]:
import pandas as pd

def chickenpox_by_sex():
    df = pd.read_csv('NISPUF17.csv')
    # Drop rows where any of the three values is missing.
    df = df[['SEX', 'HAD_CPOX', 'P_NUMVRC']].dropna()
    # Keep only children with at least one varicella dose.
    vaccinated = df[df['P_NUMVRC'] > 0]

    result = {}
    # SEX: 1 = male, 2 = female. HAD_CPOX: 1 = had chickenpox, 2 = did not.
    for label, code in (("male", 1), ("female", 2)):
        group = vaccinated[vaccinated['SEX'] == code]
        had = (group['HAD_CPOX'] == 1).sum()
        had_not = (group['HAD_CPOX'] == 2).sum()
        result[label] = had / had_not
    return result

## Correlation between having had chickenpox and the number of chickenpox vaccine doses given (varicella)
This function checks whether having had chickenpox is correlated with the number of varicella vaccine doses a child received, and returns the Pearson correlation coefficient.

Some notes on interpreting the answer. The had_chickenpox_column is either 1 (for yes) or 2 (for no), and the num_chickenpox_vaccine_column is the number of varicella vaccine doses a child has been given. A positive correlation (corr > 0) means that higher values of had_chickenpox_column (more no’s) go together with higher values of num_chickenpox_vaccine_column (more doses of vaccine). A negative correlation (corr < 0) indicates that having had chickenpox is associated with a higher number of vaccine doses.

Also, pval is the probability of observing a correlation between had_chickenpox_column and num_chickenpox_vaccine_column at least as extreme as the one found if the two columns were in fact uncorrelated. A small pval means that the observed correlation is highly unlikely to have occurred by chance. In this case, pval should be very small (it will end in e-18).

This isn’t really the full picture, since we are not looking at when the dose was given. It’s possible that children had chickenpox and then their parents went to get them the vaccine.
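
The sign convention described in the notes can be checked on toy data. The sketch below uses `Series.corr`, which computes the Pearson coefficient by default, in place of `scipy.stats.pearsonr`. All values and the names `had`, `doses_more_when_no`, `doses_more_when_yes`, `corr_positive`, and `corr_negative` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy column: 1 = had chickenpox, 2 = did not.
had = pd.Series([1, 1, 1, 2, 2, 2, 2, 2])

# Case A: children who did NOT have chickenpox (code 2) got more doses.
doses_more_when_no = pd.Series([0, 1, 0, 2, 1, 2, 2, 1])

# Case B: children who DID have chickenpox (code 1) got more doses.
doses_more_when_yes = pd.Series([2, 2, 1, 0, 1, 0, 0, 1])

# Series.corr defaults to the Pearson correlation coefficient.
corr_positive = had.corr(doses_more_when_no)
corr_negative = had.corr(doses_more_when_yes)

print(corr_positive > 0)   # more no's go with more doses
print(corr_negative < 0)   # having had chickenpox goes with more doses
```

Because "yes" is coded as the smaller number, the sign reads the opposite way from what one might first expect, which is why the notes spell it out.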

In [None]:
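def corr_chickenpox():
    import scipy.stats as stats
    import pandas as pd
    df = pd.read_csv('NISPUF17.csv')
    # Drop rows missing either value so NaNs do not reach pearsonr.
    df = df[['HAD_CPOX', 'P_NUMVRC']].dropna()
    # Keep only the yes (1) and no (2) responses.
    df = df[df['HAD_CPOX'] < 3]
    # pearsonr returns the correlation coefficient and its p-value.
    corr, pval = stats.pearsonr(df['HAD_CPOX'], df['P_NUMVRC'])
    return corr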