**State the null and alternative hypotheses**

H0: Changing the power of the plasma beam has no effect on the etching rate of the machine (the mean etching rate is the same at 160 W, 180 W, and 200 W).

H1: Changing the power of the plasma beam has an effect on the etching rate of the machine (at least one mean etching rate differs).

**What is the significance level?**

The significance level is alpha = 0.05.

In [1]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [3]:
data = pd.read_excel('anova_lab_data.xlsx', sheet_name='data_collected')
data

Unnamed: 0,Power,Etching Rate
0,160 W,5.43
1,180 W,6.24
2,200 W,8.79
3,160 W,5.71
4,180 W,6.71
5,200 W,9.2
6,160 W,6.22
7,180 W,5.98
8,200 W,7.9
9,160 W,6.01


In [13]:
data.columns =  ["power","etching_rate"]
data

Unnamed: 0,power,etching_rate
0,160 W,5.43
1,180 W,6.24
2,200 W,8.79
3,160 W,5.71
4,180 W,6.71
5,200 W,9.2
6,160 W,6.22
7,180 W,5.98
8,200 W,7.9
9,160 W,6.01


In [15]:
# Clean the power column: drop the "W" unit (a trailing space remains)
data['power'] = data['power'].str.replace("W",'')

In [16]:
data['power']

0     160 
1     180 
2     200 
3     160 
4     180 
5     200 
6     160 
7     180 
8     200 
9     160 
10    180 
11    200 
12    160 
13    180 
14    200 
Name: power, dtype: object

In [17]:
# Change the power column to integers (int() tolerates the trailing space)
data['power'] = data['power'].astype(int)

In [20]:
data.describe()

Unnamed: 0,power,etching_rate
count,15.0,15.0
mean,180.0,6.782667
std,16.903085,1.228643
min,160.0,5.43
25%,160.0,5.845
50%,180.0,6.24
75%,200.0,7.725
max,200.0,9.2


In [19]:
data.groupby('power').mean()

Unnamed: 0_level_0,etching_rate
power,Unnamed: 1_level_1
160,5.792
180,6.238
200,8.318


As a first insight, the 200 W group has a noticeably higher mean etching rate (8.318 versus 5.792 and 6.238). Whether that difference is significant has to be tested statistically.

**What are the degrees of freedom of model, error terms, and total DoF?**

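With k = 3 power levels and N = 15 observations:

Model (between-groups) DoF: df1 = k - 1 = 3 - 1 = 2

Error (within-groups) DoF: df2 = N - k = 15 - 3 = 12

Total DoF = N - 1 = 15 - 1 = 14 (= df1 + df2)

The error mean square (error sum of squares divided by df2) is reported as mean_sq in the ANOVA table computed with statsmodels.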
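The degrees of freedom follow directly from the experimental design. A minimal sketch, assuming 3 power levels with 5 runs each (the names `k` and `n_per_group` are illustrative and do not come from the notebook):

```python
# Degrees of freedom for a balanced one-way ANOVA.
# Assumed design: 3 power levels (160, 180, 200 W), 5 runs per level.
k = 3
n_per_group = 5
n_total = k * n_per_group

df_model = k - 1        # between-groups
df_error = n_total - k  # within-groups (residual)
df_total = n_total - 1

# The model and error DoF must add up to the total DoF.
assert df_model + df_error == df_total
print(df_model, df_error, df_total)
```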


In [21]:
# Using ANOVA as we have more than 2 samples:
import statsmodels.api as sm
from statsmodels.formula.api import ols

model = ols('etching_rate ~ C(power)',data=data).fit()
sm.stats.anova_lm(model)

Unnamed: 0,df,sum_sq,mean_sq,F,PR(>F)
C(power),2.0,18.176653,9.088327,36.878955,8e-06
Residual,12.0,2.95724,0.246437,,


With df1 = 2 and df2 = 12, the F table gives a critical value of 3.89 at alpha = 0.05.
Since 0.000008 << 0.05 and 36.88 > 3.89, we reject the null hypothesis: changing the power has an impact on the etching rate. The next step is to find out which power level makes the difference. As seen earlier, 200 W appears to have the largest effect, so let's compare this sample with the other two samples using t-tests.
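Instead of reading a printed table, the critical value and the p-value can be checked numerically. A minimal sketch using `scipy.stats.f`, with the F statistic copied from the ANOVA output above (the variable names are illustrative):

```python
from scipy.stats import f

alpha = 0.05
df_model, df_error = 2, 12
f_stat = 36.878955  # F value copied from the anova_lm output

# Upper-tail critical value and p-value of the F distribution.
f_crit = f.ppf(1 - alpha, df_model, df_error)
p_value = f.sf(f_stat, df_model, df_error)

print(f"critical F = {f_crit:.4f}")
print(f"p-value    = {p_value:.1e}")
print("reject H0:", f_stat > f_crit)
```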

In [22]:
data.pivot(columns='power').describe()

Unnamed: 0_level_0,etching_rate,etching_rate,etching_rate
power,160,180,200
count,5.0,5.0,5.0
mean,5.792,6.238,8.318
std,0.319875,0.434304,0.669604
min,5.43,5.66,7.55
25%,5.59,5.98,7.9
50%,5.71,6.24,8.15
75%,6.01,6.6,8.79
max,6.22,6.71,9.2
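
These per-group summaries are enough to reproduce the ANOVA table by hand, which makes the sums of squares less of a black box. A sketch in plain Python, assuming a balanced design with 5 runs per power level; the means and standard deviations are copied from the table above, and the variable names are illustrative:

```python
# Per-group summaries copied from the describe() output above.
n = 5  # runs per power level
means = {160: 5.792, 180: 6.238, 200: 8.318}
stds = {160: 0.319875, 180: 0.434304, 200: 0.669604}

k = len(means)
grand_mean = sum(means.values()) / k  # valid because group sizes are equal

# Between-group (model) and within-group (residual) sums of squares.
ss_model = n * sum((m - grand_mean) ** 2 for m in means.values())
ss_error = (n - 1) * sum(s ** 2 for s in stds.values())

# Mean squares: each sum of squares divided by its degrees of freedom.
ms_model = ss_model / (k - 1)
ms_error = ss_error / (k * n - k)
f_stat = ms_model / ms_error

print(f"SS model = {ss_model:.4f}, SS error = {ss_error:.4f}")
print(f"MS model = {ms_model:.4f}, MS error = {ms_error:.4f}")
print(f"F = {f_stat:.2f}")
```

Up to rounding of the copied summaries, the results should agree with the `anova_lm` output (sum_sq of about 18.18 and 2.96, F of about 36.88).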


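Again, the mean at power = 200 is clearly the highest. Its standard deviation (0.67) is the largest of the three groups, but it is still small compared with the gaps between the means (about 2.1 to 2.5). So 200 W is the natural group to compare against the others.

So here we can define the null and alternative hypotheses again, once for each comparison (200 W vs. 160 W and 200 W vs. 180 W):

H0: The mean etching rate at power 200 is equal to the mean etching rate at the other power level.

H1: The mean etching rate at power 200 is greater than the mean etching rate at the other power level.

Furthermore, it is a right-tail problem because we want to see whether increasing the power increases the etching rate.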

In [23]:
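from scipy.stats import ttest_ind

# Reference sample: etching rates measured at 200 W
power_a = data[data['power'] == 200]['etching_rate']

# Compare 200 W against every power level. ttest_ind is two-sided by default,
# so halve the p-value for the right-tailed test when the statistic is positive.
# The 200-vs-200 row is only a sanity check (t = 0, p = 1).
for level in data['power'].unique():
    power_b = data[data['power'] == level]['etching_rate']
    print(level, ttest_ind(power_a, power_b))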

160 Ttest_indResult(statistic=7.611403634613074, pvalue=6.237977344615716e-05)
180 Ttest_indResult(statistic=5.827496614588661, pvalue=0.0003926796476049085)
200 Ttest_indResult(statistic=0.0, pvalue=1.0)


**Summary:** Considering alpha = 0.05, a one-tailed test, and df = 5 + 5 - 2 = 8, the critical t statistic is 1.860.
Compared to power160, power200 increases the etching rate significantly (we reject the null hypothesis: 7.61 > 1.860).
Compared to power180, power200 also increases the etching rate significantly (we reject the null hypothesis: 5.83 > 1.860).
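
The critical value and the one-sided p-values can also be computed rather than looked up. For a pooled two-sample t-test with five runs per group, the degrees of freedom are 5 + 5 - 2 = 8. The sketch below assumes equal variances and a balanced design, and works from the per-group means and standard deviations reported by `describe()` rather than the raw data; the variable names are illustrative:

```python
from scipy.stats import t, ttest_ind_from_stats

alpha = 0.05
n = 5           # runs per power level (assumed balanced)
df = n + n - 2  # degrees of freedom of the pooled two-sample t-test

t_crit = t.ppf(1 - alpha, df)  # right-tailed critical value

# (mean, std) per power level, copied from the describe() output.
summary = {160: (5.792, 0.319875), 180: (6.238, 0.434304), 200: (8.318, 0.669604)}
mean_200, std_200 = summary[200]

results = {}
for level in (160, 180):
    mean_b, std_b = summary[level]
    res = ttest_ind_from_stats(mean_200, std_200, n, mean_b, std_b, n, equal_var=True)
    p_one_sided = t.sf(res.statistic, df)  # P(T > t) for the right-tailed test
    results[level] = (res.statistic, p_one_sided)
    print(level, round(res.statistic, 3), p_one_sided, res.statistic > t_crit)

print("critical t:", round(t_crit, 3))
```

Because both t statistics are positive, each one-sided p-value is half of the two-sided p-value printed by `ttest_ind`.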