# Research Question:
## Was there a statistically significant difference between the mean age of children who survived $(\bar{x}_{survivingChildrenAge})$ and the mean age of children who did not survive $(\bar{x}_{nonSurvivingChildrenAge})$? What would be the effect size?

$$H_0 : \bar{x}_{survivingChildrenAge} = \bar{x}_{nonSurvivingChildrenAge}$$

$$H_A : \bar{x}_{survivingChildrenAge} \neq \bar{x}_{nonSurvivingChildrenAge}$$

In [5]:
# Import useful packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from scipy import stats
%matplotlib inline

#Read data
titanic_dataframe = pd.read_csv('../Data/train.csv')

# Children dataframes separated by survival
surviving_children_dataframe = titanic_dataframe.loc[(titanic_dataframe['Age'] <= 12) & (titanic_dataframe['Survived'] == 1)]
non_surviving_children_dataframe = titanic_dataframe.loc[(titanic_dataframe['Age'] <= 12) & (titanic_dataframe['Survived'] == 0)]

In [7]:
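def difference_of_means_test(data1, data2, tails):
    """Print the p-value of an unpooled-variance t-test and Cohen's d.

    tails: 1 for a one-tailed test, 2 for a two-tailed test.
    """
    # Sample sizes and sample means
    n1 = len(data1)
    n2 = len(data2)
    x1 = np.mean(data1)
    x2 = np.mean(data2)

    # Sample standard deviations (ddof=1 applies Bessel's correction)
    s1 = np.std(data1, ddof=1)
    s2 = np.std(data2, ddof=1)

    # Standard error of the difference, without assuming equal variances
    SE = np.sqrt(s1**2/n1 + s2**2/n2)
    Tscore = np.abs(x2 - x1) / SE
    # Conservative degrees of freedom: smaller sample size minus one
    df = min(n1, n2) - 1
    pvalue = tails * stats.t.cdf(-Tscore, df)

    # Effect size: difference of means over the pooled standard deviation
    SDpooled = np.sqrt((s1**2*(n1-1) + s2**2*(n2-1)) / (n1+n2-2))
    Cohensd = (x2 - x1) / SDpooled

    print('p-value = ', pvalue)
    print("Cohen's d = ", Cohensd)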

In [8]:
difference_of_means_test(surviving_children_dataframe['Age'], non_surviving_children_dataframe['Age'], 2)

p-value =  0.0144683553951
Cohen's d =  0.653303382193


**The difference between the two means is statistically significant, since the p-value ($\approx 0.014$) is less than $\alpha = 0.05$.**
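
As a sanity check, the hand-written test can be compared against SciPy's built-in Welch t-test. The sketch below is an illustration only: the helper name `welch_cross_check` and the synthetic ages are assumptions, not part of the original notebook. Note that `difference_of_means_test` uses the conservative $df = \min(n_1, n_2) - 1$, whereas `scipy.stats.ttest_ind(..., equal_var=False)` uses the Welch-Satterthwaite degrees of freedom, so the SciPy p-value should be no larger than the one printed above, not identical to it.

```python
import numpy as np
from scipy import stats


def welch_cross_check(sample_a, sample_b):
    """Return (t, p) from SciPy's two-sided Welch t-test, dropping NaNs first.

    Hypothetical helper, added here only to cross-check the manual test.
    """
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    a = a[~np.isnan(a)]
    b = b[~np.isnan(b)]
    # equal_var=False selects Welch's test (variances are not pooled)
    result = stats.ttest_ind(a, b, equal_var=False)
    return result.statistic, result.pvalue


# Synthetic ages for illustration only (NOT the Titanic data)
rng = np.random.default_rng(0)
group_a = rng.uniform(0, 12, size=40)
group_b = rng.uniform(0, 12, size=30)

t_stat, p_value = welch_cross_check(group_a, group_b)
print('t =', t_stat, 'p-value =', p_value)
```

In the notebook itself, the same helper could be called with `surviving_children_dataframe['Age']` and `non_surviving_children_dataframe['Age']`.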

## Analyzing practical significance
Cohen's d, computed in the previous step, measures the effect size, i.e., the practical significance of the difference.
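
For reference, the pooled-standard-deviation form of Cohen's d that `difference_of_means_test` computes can be written as follows, with subscript 1 denoting the surviving children and subscript 2 the non-surviving children (the order in which the two samples were passed to the function):

$$s_{pooled} = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}, \qquad d = \frac{\bar{x}_2 - \bar{x}_1}{s_{pooled}}$$

With this sign convention, the positive value $d \approx 0.65$ indicates that the non-surviving children were, on average, older than the surviving children.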

The table below contains descriptors for magnitudes of d = 0.01 to 2.0, as initially suggested by Cohen and expanded by Sawilowsky.

Effect size | d | Reference
--- | --- | ---
Very small | 0.01 | Sawilowsky, 2009
Small | 0.20 | Cohen, 1988
Medium | 0.50 | Cohen, 1988
Large | 0.80 | Cohen, 1988
Very large | 1.20 | Sawilowsky, 2009
Huge | 2.0 | Sawilowsky, 2009

**The effect size $(d = 0.65)$ falls between *Medium* $(0.50)$ and *Large* $(0.80)$ according to the classification above.**
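
The table lookup can also be done programmatically. The following is a minimal sketch, assuming the thresholds in the table above are treated as lower bounds of each category; the names `EFFECT_SIZE_THRESHOLDS` and `bracket_effect_size` are hypothetical and not part of the original notebook.

```python
# Thresholds copied from the table above (Cohen, 1988; Sawilowsky, 2009)
EFFECT_SIZE_THRESHOLDS = [
    (0.01, 'Very small'),
    (0.20, 'Small'),
    (0.50, 'Medium'),
    (0.80, 'Large'),
    (1.20, 'Very large'),
    (2.0, 'Huge'),
]


def bracket_effect_size(d):
    """Return the (lower, upper) descriptors that bracket |d|.

    Either element is None when |d| lies outside the range of the table.
    """
    magnitude = abs(d)
    lower = None
    upper = None
    for threshold, label in EFFECT_SIZE_THRESHOLDS:
        if magnitude >= threshold:
            lower = label      # largest threshold not exceeding |d| so far
        elif upper is None:
            upper = label      # first threshold strictly above |d|
    return lower, upper


print(bracket_effect_size(0.653303382193))  # ('Medium', 'Large')
```

The absolute value is used because the descriptors refer to the magnitude of the effect; the sign of d only reflects the order in which the two groups were subtracted.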