**Research question:** Does changing the power of the plasma beam affect the etching rate?


**Null Hypothesis (H0):** The mean etching rate is the same at every power level.


**Alternative Hypothesis (Ha):** At least one power level has a mean etching rate that differs from the others.

**Significance level:** α = 0.05 (there's a 5% chance of rejecting the null hypothesis when it is true)

**ANOVA:** We will fit an ols model from the statsmodels.formula.api module and pass it to sm.stats.anova_lm (from statsmodels.api) to obtain the F-statistic and its associated p-value.

**Interpret the results:** If the ANOVA is statistically significant (p-value smaller than α), we reject the null hypothesis and conclude that at least one power level has a different mean etching rate. ANOVA alone does not tell us which levels differ.

**Degrees of freedom:** The number of degrees of freedom of the model is k-1, with k = number of power levels being analyzed.
The degrees of freedom for the error term is n-k, where n = total number of observations. 
The total degrees of freedom is n-1.
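
As a rough illustration of how these degrees of freedom enter the F-statistic, here is a minimal pure-Python sketch. The function name `one_way_anova` and the values in `toy_groups` are made up for this example; they are not the lab data.

```python
def one_way_anova(groups):
    """Return (df_model, df_error, df_total, f_stat) for a list of samples."""
    k = len(groups)                      # number of groups (power levels)
    n = sum(len(g) for g in groups)      # total number of observations
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_model, df_error, df_total = k - 1, n - k, n - 1
    f_stat = (ss_between / df_model) / (ss_within / df_error)
    return df_model, df_error, df_total, f_stat


# Hypothetical toy data: 3 groups of 3 observations each
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
df_model, df_error, df_total, f_stat = one_way_anova(toy_groups)
print(df_model, df_error, df_total, f_stat)
```

With k = 3 groups and n = 9 observations, the sketch gives 2, 6 and 8 degrees of freedom, and df_model + df_error always equals df_total.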

In [9]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
import statsmodels.api as sm
from statsmodels.formula.api import ols
from scipy.stats import ttest_ind

In [2]:
data=pd.read_excel('/Users/pedro/Desktop/Ironhack/Unit 07/Labs/lab-inferential-statistics-anova/files_for_lab/anova_lab_data.xlsx')

In [3]:
data=data.rename(columns={'Power ':'power','Etching Rate': 'etching_rate'})

In [4]:
data

Unnamed: 0,power,etching_rate
0,160 W,5.43
1,180 W,6.24
2,200 W,8.79
3,160 W,5.71
4,180 W,6.71
5,200 W,9.2
6,160 W,6.22
7,180 W,5.98
8,200 W,7.9
9,160 W,6.01


In [5]:
data.groupby('power')['etching_rate'].mean()

power
160 W    5.792
180 W    6.238
200 W    8.318
Name: etching_rate, dtype: float64

In [6]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

model = ols('etching_rate ~ C(power)', data=data).fit()
sm.stats.anova_lm(model)

Unnamed: 0,df,sum_sq,mean_sq,F,PR(>F)
C(power),2.0,18.176653,9.088327,36.878955,8e-06
Residual,12.0,2.95724,0.246437,,
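
The table can be checked by hand: each mean square is the sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The sketch below copies the numbers from the output above; the variable names are our own, and the p-value is the rounded figure shown in the table, not a recomputed one.

```python
# Values copied from the anova_lm output above
ss_model, df_model = 18.176653, 2
ss_resid, df_resid = 2.95724, 12

ms_model = ss_model / df_model   # mean square for C(power)
ms_resid = ss_resid / df_resid   # mean square for the residual
f_stat = ms_model / ms_resid     # should match the F column

alpha = 0.05
p_value = 8e-06                  # rounded value as displayed in PR(>F)
reject_null = p_value < alpha

print(round(ms_model, 6), round(ms_resid, 6), round(f_stat, 4), reject_null)
```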


- The p-value of 8 × 10^-6 is far below our significance level of 0.05, so we reject the null hypothesis.
- We can conclude that at least one power level has a mean etching rate that differs from the others.
- To find out which levels differ, and in particular whether 200 W gives a higher etching rate, we can run a t-test on each pair of power levels (3 pairs in total).

In [8]:
data.pivot(columns='power').describe().head(3)

etching_rate by power,160 W,180 W,200 W
count,5.0,5.0,5.0
mean,5.792,6.238,8.318
std,0.319875,0.434304,0.669604


In [17]:
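# Use 180 W as the reference group and compare it against every power level.
# Note: the loop also compares 180 W with itself (statistic 0, p-value 1)
# and never tests 160 W against 200 W directly.
power_a = data.loc[data['power'] == '180 W', 'etching_rate']

for power in data['power'].unique():
    power_b = data.loc[data['power'] == power, 'etching_rate']
    print(power, ttest_ind(power_a, power_b))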

160 W Ttest_indResult(statistic=1.84892009935179, pvalue=0.10164495449539465)
180 W Ttest_indResult(statistic=0.0, pvalue=1.0)
200 W Ttest_indResult(statistic=-5.827496614588661, pvalue=0.0003926796476049085)
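
The loop above uses 180 W as the only reference, so 160 W vs 200 W is never tested. A minimal sketch of testing all three pairs, with a Bonferroni adjustment for running several tests, could look like this. The helper `pairwise_ttests` is our own name, the Bonferroni step is an addition that the original analysis does not use, and `samples` contains only the ten rows visible in the data preview, so the numbers it produces will not match the full 15-row results.

```python
from itertools import combinations
from scipy.stats import ttest_ind

# Only the rows visible in the data preview above (the full data has 5 per group)
samples = {
    '160 W': [5.43, 5.71, 6.22, 6.01],
    '180 W': [6.24, 6.71, 5.98],
    '200 W': [8.79, 9.20, 7.90],
}


def pairwise_ttests(samples, alpha=0.05):
    """Run an independent t-test on every pair of groups.

    Returns {(a, b): (statistic, p_value, p_adjusted, significant)} where
    p_adjusted is the Bonferroni-corrected p-value (capped at 1).
    """
    pairs = list(combinations(samples, 2))
    results = {}
    for a, b in pairs:
        res = ttest_ind(samples[a], samples[b])
        p_value = float(res.pvalue)
        p_adjusted = min(p_value * len(pairs), 1.0)
        results[(a, b)] = (float(res.statistic), p_value, p_adjusted, p_adjusted < alpha)
    return results


results = pairwise_ttests(samples)
for pair, (stat, p, p_adj, significant) in results.items():
    print(pair, round(stat, 3), round(p, 4), round(p_adj, 4), significant)
```

In the notebook itself, the same function could be fed with the full groups, for example by building the dictionary from data.groupby('power')['etching_rate'].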


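Taking 180 W as the reference, the etching rate at 200 W is significantly higher (p ≈ 0.0004), while the difference between 160 W and 180 W is not significant at α = 0.05 (p ≈ 0.10). The group means rise with power (5.79, 6.24 and 8.32), but within the range analysed only the step up to 200 W is statistically supported, and 160 W vs 200 W was not tested directly.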