# Two-way ANOVA


- As an illustration of a two-factor factorial experiment, we will consider a
study involving the Common Admission Test (CAT), a standardized test
used by graduate schools of business to evaluate an applicant’s ability to
pursue a graduate program in that field.
- Scores on the CAT range from 200 to 800, with higher scores implying
higher aptitude.

- In an attempt to improve students’ performance on the CAT, a major
university is considering offering the following three CAT preparation
programs.
    1. A three-hour review session covering the types of questions generally
asked on the CAT.
    1. A one-day program covering relevant exam material, along with the taking
and grading of a sample exam.
    1. An intensive 10-week course involving the identification of each student’s
weaknesses and the setting up of individualized programs for
improvement.

### Factor 1: three treatments

- One factor in this study is the CAT preparation program, which has three
treatments:
    - Three-hour review,
    - One-day program, and
    - 10-week course.
- Before selecting the preparation program to adopt, further study will be
conducted to determine how the proposed programs affect CAT scores.




### Factor 2: three treatments

- The CAT is usually taken by students from three colleges:
    - the College of Business,
    - the College of Engineering, and
    - the College of Arts and Sciences.
- Therefore, a second factor of interest in the experiment is whether a
student’s undergraduate college affects the CAT score.
- This second factor, undergraduate college, also has three treatments:
    - Business,
    - Engineering, and
    - Arts and Sciences.
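
Crossing the two factors gives 3 × 3 = 9 treatment combinations. As a minimal sketch, the cells of the design can be enumerated as below. The variable names are illustrative, the level labels are taken from the lists above, and the two-per-cell figure assumes the 18 observations reported in the model summary are spread evenly over the cells.

```python
from itertools import product

# Levels of each factor, as listed in the text above.
prep_programs = ["three-hour review", "one-day program", "10-week course"]
colleges = ["business", "engineering", "arts and sciences"]

# Every (program, college) pair is one treatment combination (cell).
cells = list(product(prep_programs, colleges))

for program, college in cells:
    print(f"{program:18s} x {college}")

n_cells = len(cells)
n_obs = 18  # "No. Observations" in the model summary; balance is an assumption
print(n_cells, "cells;", n_obs // n_cells, "observations per cell if balanced")
```

Having more than one observation per cell is what makes it possible to estimate the interaction between program and college separately from the error term.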


In [2]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.formula.api import ols

In [5]:
df = pd.read_excel("/data/github/data analytics/data files/CAT_data.xlsx")

In [6]:
df


| | prep_pro | college | value |
|---|---|---|---|
| 0 | three hour review | business | 500 |
| 1 | three hour review | business | 580 |
| 2 | three hour review | engineering | 540 |
| 3 | three hour review | engineering | 460 |
| 4 | three hour review | arts and sciences | 480 |
| 5 | three hour review | arts and sciences | 400 |
| 6 | one day program | business | 460 |
| 7 | one day program | business | 540 |
| 8 | one day program | engineering | 560 |
| 9 | one day program | engineering | 620 |


In [8]:
model = ols('value ~ C(prep_pro) + C(college) + C(prep_pro):C(college)', data = df).fit()
model.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | value | R-squared: | 0.759 |
| Model: | OLS | Adj. R-squared: | 0.545 |
| Method: | Least Squares | F-statistic: | 3.548 |
| Date: | Sat, 14 Mar 2020 | Prob (F-statistic): | 0.0384 |
| Time: | 12:07:26 | Log-Likelihood: | -88.591 |
| No. Observations: | 18 | AIC: | 195.2 |
| Df Residuals: | 9 | BIC: | 203.2 |
| Df Model: | 8 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 445.0000 | 33.208 | 13.400 | 0.000 | 369.878 | 520.122 |
| C(prep_pro)[T.one day program] | 5.0000 | 46.963 | 0.106 | 0.918 | -101.238 | 111.238 |
| C(prep_pro)[T.three hour review] | -5.0000 | 46.963 | -0.106 | 0.918 | -111.238 | 101.238 |
| C(college)[T.business] | 135.0000 | 46.963 | 2.875 | 0.018 | 28.762 | 241.238 |
| C(college)[T.engineering] | 145.0000 | 46.963 | 3.088 | 0.013 | 38.762 | 251.238 |
| C(prep_pro)[T.one day program]:C(college)[T.business] | -85.0000 | 66.416 | -1.280 | 0.233 | -235.244 | 65.244 |
| C(prep_pro)[T.three hour review]:C(college)[T.business] | -35.0000 | 66.416 | -0.527 | 0.611 | -185.244 | 115.244 |
| C(prep_pro)[T.one day program]:C(college)[T.engineering] | -5.0000 | 66.416 | -0.075 | 0.942 | -155.244 | 145.244 |
| C(prep_pro)[T.three hour review]:C(college)[T.engineering] | -85.0000 | 66.416 | -1.280 | 0.233 | -235.244 | 65.244 |

| | | | |
|---|---|---|---|
| Omnibus: | 16.43 | Durbin-Watson: | 2.984 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 2.333 |
| Skew: | 0.0 | Prob(JB): | 0.311 |
| Kurtosis: | 1.236 | Cond. No. | 13.9 |
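
The `[T....]` labels indicate treatment (dummy) coding, so each fitted cell mean is the intercept plus whichever coefficients apply to that cell. The sketch below redoes that arithmetic by hand with the coefficients copied from the summary. The dictionary and function names are illustrative. It assumes the baseline cell (all dummies zero) is arts and sciences under the program that appears in no coefficient, presumably the 10-week course, whose exact label is not shown in the output.

```python
# Coefficients copied from the regression summary above.
intercept = 445.0
prep_effect = {"one day program": 5.0, "three hour review": -5.0}
college_effect = {"business": 135.0, "engineering": 145.0}
interaction = {
    ("one day program", "business"): -85.0,
    ("three hour review", "business"): -35.0,
    ("one day program", "engineering"): -5.0,
    ("three hour review", "engineering"): -85.0,
}


def fitted_cell_mean(prep, college):
    """Intercept plus every dummy coefficient that is switched on for the cell.

    Levels missing from the dictionaries are baseline levels and add 0.
    """
    return (
        intercept
        + prep_effect.get(prep, 0.0)
        + college_effect.get(college, 0.0)
        + interaction.get((prep, college), 0.0)
    )


# Compare with the raw scores visible in the data listing.
print(fitted_cell_mean("three hour review", "business"))     # 540.0 = mean(500, 580)
print(fitted_cell_mean("three hour review", "engineering"))  # 500.0 = mean(540, 460)
print(fitted_cell_mean("one day program", "engineering"))    # 590.0 = mean(560, 620)
```

Because the model includes every main effect and interaction term, the fitted value for a cell coincides with that cell's observed mean, which is why the results match the averages of the listed scores.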


In [9]:
anova_table = sm.stats.anova_lm(model, typ=2)
anova_table

| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(prep_pro) | 6100.0 | 2.0 | 1.382872 | 0.299436 |
| C(college) | 45300.0 | 2.0 | 10.269521 | 0.004757 |
| C(prep_pro):C(college) | 11200.0 | 4.0 | 1.269521 | 0.350328 |
| Residual | 19850.0 | 9.0 | | |
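
Each F statistic in the table is the term's mean square (sum of squares divided by its degrees of freedom) divided by the residual mean square. A minimal sketch of that arithmetic, using only the numbers printed above, is given below. The variable names are illustrative, and the R-squared line assumes a balanced design so that the listed sums of squares add up to the total.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table above.
anova_rows = {
    "C(prep_pro)": (6100.0, 2),
    "C(college)": (45300.0, 2),
    "C(prep_pro):C(college)": (11200.0, 4),
}
ss_resid, df_resid = 19850.0, 9

ms_resid = ss_resid / df_resid  # residual mean square (MSE)

f_stats = {}
for name, (ss, df) in anova_rows.items():
    f_stats[name] = (ss / df) / ms_resid  # mean square of the term / MSE
    print(f"{name:24s} F = {f_stats[name]:.6f}")

# R-squared from the same table: 1 - SS_residual / SS_total
# (assumes the sums of squares partition the total, i.e. a balanced design).
ss_total = ss_resid + sum(ss for ss, _ in anova_rows.values())
r_squared = 1 - ss_resid / ss_total
print(f"R-squared = {r_squared:.3f}")
```

Read against a 0.05 significance level (an assumed threshold, not stated in the notebook), only `C(college)` has a p-value below the cutoff (0.004757). The preparation program (0.299436) and the interaction (0.350328) do not, so these data give no evidence that the programs differ or that their effect depends on the college.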
