In [1]:
import pandas as pd
import scipy.stats as stats

## Part I

**Context**
Suppose you are working as an analyst in a microprocessor chip manufacturing plant. You have been asked to analyze how the power (in watts) of the plasma beam affects a plasma etching process. Data has been collected and provided to you so that you can test whether changing the power of the plasma beam has any effect on the machine's etching rate. You will run an ANOVA to check whether the mean etching rate differs across the power levels.

**Statistical Test Setup**

Since there is a single independent variable (power), this is a one-way ANOVA.


**Null Hypothesis**

There is no difference between the mean etching rates of the different power groups.


**Alternative Hypothesis**

The mean etching rate of at least one power group differs from the mean etching rate of the other groups.


**Level of Significance (alpha)**

We set alpha to 0.05.

In [2]:
data = pd.read_excel('anova_lab_data.xlsx')

In [3]:
data

Unnamed: 0,Power,Etching Rate
0,160 W,5.43
1,180 W,6.24
2,200 W,8.79
3,160 W,5.71
4,180 W,6.71
5,200 W,9.2
6,160 W,6.22
7,180 W,5.98
8,200 W,7.9
9,160 W,6.01


In ANOVA, degrees of freedom (DoF) are a measure of the amount of information in the data used to estimate the parameters of the model. There are three types of DoF used in ANOVA:
- Degrees of freedom of the model
- Degrees of freedom of error terms
- Total degrees of freedom

**Degrees of freedom of the model:** 

This is the number of independent parameters the model estimates beyond the overall mean. In one-way ANOVA the model has one mean per group, but once the overall mean is fixed only k - 1 of the group means are free to vary, so the degrees of freedom of the model is the number of groups minus one (df_model = k - 1, where k is the number of groups).

**Degrees of freedom of error terms:**

This is the number of independent pieces of information available to estimate the variation that the model does not explain, that is, the variability of observations within each group. Each group uses up one degree of freedom for its estimated mean, so it is computed as the total number of observations minus the number of groups (df_error = N - k, where N is the total number of observations).

**Total degrees of freedom:**

This is the total amount of information in the data and is computed as the sum of the degrees of freedom of the model and the degrees of freedom of error terms, which is the same as the number of observations minus one (df_total = df_model + df_error = N - 1).

Note, the **degrees of freedom of the model** and **error terms** are used to compute the **F statistic**, which is used to test the significance of the model.
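To make the link between the degrees of freedom and the F statistic concrete, here is a minimal sketch that computes F by hand as the ratio of the between-group mean square to the within-group mean square. It uses only the first nine rows displayed above (three per power level), so it is an illustration and its numbers will not match the full-data result. The names `groups`, `ss_between`, `ss_within` and `f_manual` are introduced here for illustration only.

```python
import numpy as np
import scipy.stats as stats

# Illustrative subset: the first nine rows displayed above (three per power level).
# This is NOT the full dataset, so the resulting F differs from the notebook's result.
groups = {
    '160 W': np.array([5.43, 5.71, 6.22]),
    '180 W': np.array([6.24, 6.71, 5.98]),
    '200 W': np.array([8.79, 9.20, 7.90]),
}

k = len(groups)                                   # number of groups
n_total = sum(len(g) for g in groups.values())    # total number of observations
grand_mean = np.concatenate(list(groups.values())).mean()

# Between-group sum of squares: how far each group mean is from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

df_model = k - 1
df_error = n_total - k

# F is the ratio of the two mean squares (sum of squares / degrees of freedom)
f_manual = (ss_between / df_model) / (ss_within / df_error)

# Cross-check against SciPy's one-way ANOVA on the same subset
f_scipy, _ = stats.f_oneway(*groups.values())
print(f_manual, f_scipy)
```

The two printed values should agree up to floating-point error, which shows that `stats.f_oneway` is dividing each sum of squares by the corresponding degrees of freedom described above.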

In [4]:
data.columns = data.columns.str.strip()

In [5]:
dof_model = len(data['Power'].unique()) - 1
print('Degrees of freedom of the model:', dof_model)

Degrees of freedom of the model: 2


In [6]:
dof_error = data.shape[0] - len(data['Power'].unique())
print('Degrees of freedom of error terms:', dof_error)

Degrees of freedom of error terms: 12


In [7]:
print('Total degrees of freedom:', dof_model + dof_error)

Total degrees of freedom: 14
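As a quick sanity check, the three quantities can be wrapped in one helper and compared against the identity df_total = N - 1. This is a sketch: `anova_dof` is a hypothetical helper name, and `demo` is a stand-in frame that only mimics the shape implied by the outputs above (3 power levels, 15 rows), not the real measurements.

```python
import pandas as pd

def anova_dof(df, group_col):
    """Return (df_model, df_error, df_total) for a one-way ANOVA on group_col."""
    k = df[group_col].nunique()   # number of groups
    n = len(df)                   # total number of observations
    return k - 1, n - k, n - 1

# Hypothetical stand-in with the same shape as the lab data: 3 groups, 15 rows
demo = pd.DataFrame({'Power': ['160 W', '180 W', '200 W'] * 5})

dof_model, dof_error, dof_total = anova_dof(demo, 'Power')
# The model and error degrees of freedom must add up to N - 1
assert dof_model + dof_error == dof_total
print(dof_model, dof_error, dof_total)  # 2 12 14
```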


## Part II

In [8]:
echting_per_power_group = [data.loc[data['Power'] == p_type, 'Etching Rate'] for p_type in data['Power'].unique()]

In [9]:
len(echting_per_power_group)

3

In [10]:
statistic, pvalue = stats.f_oneway(*echting_per_power_group)

In [11]:
res = 'statistically significant' if pvalue < 0.05 else 'statistically insignificant'
print('statistic of', statistic, 'with', res, 'p value of', pvalue)

statistic of 36.87895470100505 with statistically significant p value of 7.506584272358903e-06
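The p-value above can be tied back to the degrees of freedom from Part I: it is the upper-tail probability of an F distribution with (df_model, df_error) = (2, 12) degrees of freedom, evaluated at the observed statistic. The sketch below recomputes it with `stats.f.sf` and also derives the critical F value at alpha = 0.05 with `stats.f.ppf`. The variable names `f_stat`, `p_value` and `f_crit` are introduced here for illustration.

```python
import scipy.stats as stats

alpha = 0.05
f_stat = 36.87895470100505   # F statistic reported above
dfn, dfd = 2, 12             # df_model and df_error from Part I

# Upper-tail probability of F(dfn, dfd) at the observed statistic
p_value = stats.f.sf(f_stat, dfn, dfd)

# Smallest F that would be significant at the chosen alpha (about 3.89 here)
f_crit = stats.f.ppf(1 - alpha, dfn, dfd)

print('p value:', p_value)
print('critical F:', f_crit)
print('reject null hypothesis:', f_stat > f_crit)
```

Comparing the statistic with the critical value and comparing the p-value with alpha are two views of the same decision, so both lead to rejecting the null hypothesis here.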


### Conclusion

Based on the p-value of approximately **7.5e-06** (0.0000075), which is well below our alpha of 0.05, we reject the null hypothesis in favour of the alternative hypothesis: the mean etching rate of at least one power group differs from that of the other groups.