<a href="https://colab.research.google.com/github/kofibeans/APM1111-FA6/blob/main/Aquino_APM1111_FA6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **APM1111 Formative Assessment 6**

**Instruction**

Provide an R or Python (with markdown) file for the independent samples t-test on the Invisibility Cloak data set.

*   The file should contain all the code and/or explanations for Assumptions 1 to 5 and for the computation.
*   Present the results of the analysis by making a short report.

### **Import Library and Dataset**
Import the Python libraries required for the analysis and create the dataset.

In [None]:
# import library
import pandas as pd # data processing, csv file i/o (e.g. pd.read_csv)
import scipy.stats as stats # statistical operations

from IPython.display import display # visualization tool

!pip install researchpy
import researchpy as rp # statistical operations



In [None]:
# create the dataset
data = {
    'Participant': list(range(1, 25)),
    'Cloak': [0]*12 + [1]*12,
    'Mischief': [3, 1, 5, 4, 6, 4, 6, 2, 0, 5, 4, 5, 4, 3, 6, 6, 8, 5, 5, 4, 2, 5, 7, 5]
}

df = pd.DataFrame(data)
df.style

Unnamed: 0,Participant,Cloak,Mischief
0,1,0,3
1,2,0,1
2,3,0,5
3,4,0,4
4,5,0,6
5,6,0,4
6,7,0,6
7,8,0,2
8,9,0,0
9,10,0,5


### **Assumption Check**
The results of the t-test are valid only if its assumptions hold, so each assumption is checked before the test is interpreted.

**Assumption 1 - Normality**

Check the normality assumption with the Shapiro-Wilk test, applied to each group separately. This can be done with the `shapiro()` function from scipy.stats.



In [None]:
# split the Mischief scores by group and run the Shapiro-Wilk test on each
group_0 = df[df['Cloak'] == 0]['Mischief']
group_1 = df[df['Cloak'] == 1]['Mischief']

shapiro_0 = stats.shapiro(group_0)
shapiro_1 = stats.shapiro(group_1)

# print the results
print("Shapiro-Wilk test for normality:")
print("Group 0: p-value =", shapiro_0.pvalue)
print("Group 1: p-value =", shapiro_1.pvalue)

if shapiro_0.pvalue > 0.05 and shapiro_1.pvalue > 0.05:
    print("\nAns. The data is normally distributed.")
else:
    print("\nAns. The data is not normally distributed.")

Shapiro-Wilk test for normality:
Group 0: p-value = 0.23144638538360596
Group 1: p-value = 0.9361882209777832

Ans. The data is normally distributed.
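The Shapiro-Wilk p-values are easier to read next to the group summaries they describe. The following is a minimal sketch, assuming the same 24 scores as the dataset above; the name `group_stats` is introduced here for illustration only.

```python
import pandas as pd

# Same data as the notebook: 12 participants without a cloak (0), 12 with one (1)
df = pd.DataFrame({
    'Cloak': [0] * 12 + [1] * 12,
    'Mischief': [3, 1, 5, 4, 6, 4, 6, 2, 0, 5, 4, 5,
                 4, 3, 6, 6, 8, 5, 5, 4, 2, 5, 7, 5],
})

# Per-group count, mean, sample SD, minimum and maximum of Mischief
group_stats = df.groupby('Cloak')['Mischief'].agg(['count', 'mean', 'std', 'min', 'max'])
print(group_stats.round(2))
```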


**Assumption 2 - Homogeneity of variances**

Check the homogeneity of variances assumption using Levene's test. This can be completed using the `levene()` method from scipy.stats.

In [None]:
# calculate
levene = stats.levene(group_0, group_1)

# print the results
print("Levene's test for homogeneity of variances: p-value = ", levene.pvalue)

if levene.pvalue > 0.05:
  print("\nAns. The variances are equal.")
else:
  print("\nAns. The variances are not equal.")

Levene's test for homogeneity of variances: p-value =  0.6086825890641905

Ans. The variances are equal.
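The outcome of Levene's test determines which version of the t-test is appropriate. The sketch below is one possible way to encode that choice, assuming the usual 0.05 cutoff: it uses Student's t-test when the variances look equal and Welch's t-test (`equal_var=False`) otherwise. The names `no_cloak`, `cloak`, `equal_var`, and `test_name` are illustrative and not part of the original notebook.

```python
import scipy.stats as stats

# Mischief scores from the notebook's dataset, split by group
no_cloak = [3, 1, 5, 4, 6, 4, 6, 2, 0, 5, 4, 5]
cloak = [4, 3, 6, 6, 8, 5, 5, 4, 2, 5, 7, 5]

levene = stats.levene(no_cloak, cloak)

# Student's t-test when Levene's test is not significant, Welch's otherwise
equal_var = levene.pvalue > 0.05
result = stats.ttest_ind(no_cloak, cloak, equal_var=equal_var)

test_name = "Student's" if equal_var else "Welch's"
print(f"{test_name} t-test: t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```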


**Assumption 3 - Independent Samples**

This assumption is met if the two groups consist of different participants, so that each score belongs to only one group.

> Ans. The samples are independent.


**Assumption 4 - Random Sampling**

This assumption is met if the participants were sampled randomly from the population. It depends on how the data was collected and cannot be tested from the scores themselves.

> Ans. The data is randomly sampled.

**Assumption 5 - Level of Measurement**

This assumption is met if the dependent variable (Mischief) is measured on an interval or ratio scale.

> Ans. The data is measured on an interval or ratio scale.

### **Independent T-Test Using ResearchPy**
Calculate the t-test for the means of the two independent samples in the Invisibility Cloak data using `rp.ttest()` from ResearchPy.

This function returns two data frames: one containing the summary statistics and the other containing the results of the statistical test.

In [None]:
# summary and results information for Mischief, split by Cloak group
summary, results = rp.ttest(group_0, group_1, group1_name='No Cloak', group2_name='Cloak')

# print the summary, rounded to two decimals
summary.round(2)

Unnamed: 0,Variable,N,Mean,SD,SE,95% Conf.,Interval
0,No Cloak,12.0,3.75,1.91,0.55,2.53,4.97
1,Cloak,12.0,5.0,1.65,0.48,3.95,6.05
2,combined,24.0,4.38,1.86,0.38,3.59,5.16


In [None]:
# print the results
results.style

Unnamed: 0,Independent t-test,results
0,Difference (No Cloak - Cloak) =,-1.25
1,Degrees of freedom =,22.0
2,t =,-1.7135
3,Two side test p value =,0.1007
4,Difference < 0 p value =,0.0503
5,Difference > 0 p value =,0.9497
6,Cohen's d =,-0.6995
7,Hedge's g =,-0.6754
8,Glass's delta1 =,-0.6535
9,Point-Biserial r =,-0.3431


**Presenting the results**

On average, participants with an invisibility cloak scored higher on Mischief (*M* = 5.00, *SD* = 1.65, *n* = 12) than participants without one (*M* = 3.75, *SD* = 1.91, *n* = 12). The difference of -1.25 (No Cloak - Cloak), 95% CI [-2.76, 0.26], was not statistically significant, *t*(22) = -1.71, *p* = .101, *d* = -0.70.

### **Independent T-Test Using Scipy.Stats**

Calculate the t-test for the means of the two independent samples in the Invisibility Cloak data using `stats.ttest_ind()` from scipy.stats.

This function conducts the independent samples t-test and returns only the t statistic and its associated p-value.
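Because only the t statistic and p-value are returned, the effect size and confidence interval have to be worked out separately. For reference, these are the standard textbook formulas for the equal-variance (pooled) test, written with subscript 0 for the group without a cloak and subscript 1 for the group with one. The notation is an assumption chosen to match the variable names `n_0` and `n_1` used in this notebook.

```latex
s_p^2 = \frac{(n_0 - 1)\,s_0^2 + (n_1 - 1)\,s_1^2}{n_0 + n_1 - 2},
\qquad
t = \frac{\bar{x}_0 - \bar{x}_1}{s_p \sqrt{\dfrac{1}{n_0} + \dfrac{1}{n_1}}},
\qquad
d = \frac{\bar{x}_0 - \bar{x}_1}{s_p}
```

The test has $n_0 + n_1 - 2$ degrees of freedom, and the 95% confidence interval for the mean difference is $(\bar{x}_0 - \bar{x}_1) \pm t_{0.975,\,n_0 + n_1 - 2}\; s_p \sqrt{1/n_0 + 1/n_1}$. Note that Cohen's d divides by the pooled standard deviation $s_p$, not by the pooled variance $s_p^2$.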

In [None]:
# t-statistic and p-value (Student's t-test, equal variances assumed)
t_statistic, p_value = stats.ttest_ind(group_0, group_1)

# print the results
print("t-statistic: ", round(t_statistic, 4))
print("p-value: ", round(p_value, 4))

t-statistic:  -1.7135
p-value:  0.1007


**Presenting the results**

In [None]:
# compute the effect size (Cohen's d) and the 95% CI of the mean difference
n_0, n_1 = len(group_0), len(group_1)
dof = n_0 + n_1 - 2
mean_diff = group_0.mean() - group_1.mean()
# pooled standard deviation: square root of the pooled variance
pooled_std = (((n_0 - 1) * group_0.var() + (n_1 - 1) * group_1.var()) / dof) ** 0.5
effect_size = mean_diff / pooled_std
# margin of error: critical t value times the standard error of the difference
margin = stats.t.ppf(0.975, dof) * pooled_std * (1 / n_0 + 1 / n_1) ** 0.5

# print the result
print("Mischief was higher with a cloak (M = {:.2f}, SD = {:.2f}, n = {})".format(group_1.mean(), group_1.std(), n_1))
print("than without one (M = {:.2f}, SD = {:.2f}, n = {}), but not significantly so:".format(group_0.mean(), group_0.std(), n_0))
print("t({}) = {:.2f}, p = {:.3f}, d = {:.2f}, 95% CI [{:.2f}, {:.2f}]".format(dof, t_statistic, p_value, effect_size, mean_diff - margin, mean_diff + margin))

Mischief was higher with a cloak (M = 5.00, SD = 1.65, n = 12)
than without one (M = 3.75, SD = 1.91, n = 12), but not significantly so:
t(22) = -1.71, p = 0.101, d = -0.70, 95% CI [-2.76, 0.26]
