In [1]:
import pandas as pd
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
import seaborn as sns

### One-tailed t-test ###

In a packing plant, a machine packs cartons with jars. A new machine is expected to pack faster on average than the machine currently in use. To test this hypothesis, the time each machine takes to pack ten cartons was recorded. The results, in seconds, are in the file files_for_lab/machine.txt. Assuming the conditions for a t-test are met, does the data provide sufficient evidence that one machine is faster than the other?

In [None]:
### H0: Both machines have the same mean packing time
### H1: The new machine has a lower mean packing time (it packs faster)

In [18]:
# Tab-separated, UTF-16 file; header=0 replaces the file's own header row with the names given here
data = pd.read_csv("files_for_lab/machine.txt", encoding="utf-16", sep="\t", names=["New", "Old"], header=0)

In [23]:
data.head()

Unnamed: 0,New,Old
0,42.1,42.7
1,41.0,43.6
2,41.3,43.8
3,41.8,43.3
4,42.4,42.5


In [22]:
# alternative="less" tests whether mean(New) < mean(Old); equal variances are assumed by default
st.ttest_ind(data["New"], data["Old"], alternative="less")

Ttest_indResult(statistic=-3.3972307061176026, pvalue=0.0016055712503872579)

In [None]:
### The p-value (about 0.0016) is below 0.05, so H0 is rejected: the data supports
### the claim that the new machine packs faster, even with only 10 observations per machine.
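
To make the one-tailed test above less of a black box, here is a minimal sketch that computes the pooled-variance t statistic and its left-tail p-value by hand and compares them with `st.ttest_ind`. It uses only the five rows displayed by `data.head()`, so its numbers will not match the full ten-row result; the variable names (`new`, `old`, `t_manual`, `p_manual`) are illustrative.

```python
import numpy as np
import scipy.stats as st

# First five rows shown by data.head(); the full file has ten rows per machine.
new = np.array([42.1, 41.0, 41.3, 41.8, 42.4])
old = np.array([42.7, 43.6, 43.8, 43.3, 42.5])

n1, n2 = len(new), len(old)
df = n1 + n2 - 2

# Pooled sample variance (assumes both machines have equal variance).
pooled_var = ((n1 - 1) * new.var(ddof=1) + (n2 - 1) * old.var(ddof=1)) / df

# t statistic for the difference in means, New minus Old.
t_manual = (new.mean() - old.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-tailed ("less") p-value: probability of a t value at least this negative.
p_manual = st.t.cdf(t_manual, df=df)

t_scipy, p_scipy = st.ttest_ind(new, old, alternative="less")
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```

Both lines should print the same pair of values, which shows that `alternative="less"` simply takes the left tail of the t distribution with `n1 + n2 - 2` degrees of freedom.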

### Matched Pairs Test ###

In this challenge we compare dependent samples describing our Pokemon (file files_for_lab/pokemon.csv). The goal is to see whether there is a significant difference between each Pokemon's defense and attack scores. Our null hypothesis is that the defense and attack scores are equal. Compare the two columns to see whether the difference between them is statistically significant, and comment on your result.

In [None]:
### H0: The mean Attack and Defense scores are equal
### H1: The mean Attack and Defense scores are different

In [24]:
data = pd.read_csv('./files_for_lab/pokemon.csv')
data.head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


In [25]:
# Paired (dependent-samples) test: each Pokemon contributes one Attack and one Defense value
st.ttest_rel(data["Attack"], data["Defense"])

Ttest_relResult(statistic=4.325566393330478, pvalue=1.7140303479358558e-05)

In [None]:
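### The p-value (about 1.7e-05) is far below 0.05, so H0 is rejected: Attack and Defense
### differ significantly. The positive t statistic means Attack is higher than Defense on average.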
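
A matched pairs test is equivalent to a one-sample t-test on the per-row differences. The sketch below illustrates this with the five rows displayed by `data.head()` only, so the values differ from the full-data result; the array names (`attack`, `defense`, `diff`) are illustrative.

```python
import numpy as np
import scipy.stats as st

# Attack and Defense for the five Pokemon shown by data.head().
attack = np.array([49, 62, 82, 100, 52])
defense = np.array([49, 63, 83, 123, 43])

# Paired test on the two columns.
t_rel, p_rel = st.ttest_rel(attack, defense)

# One-sample test of the differences against a mean of zero.
diff = attack - defense
t_one, p_one = st.ttest_1samp(diff, 0)

print(t_rel, p_rel)
print(t_one, p_one)
```

The two calls should report identical statistics and p-values, which is why the test only makes sense when the rows of both columns refer to the same individuals.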

### Inferential statistics - ANOVA ###

In [None]:
### H0: The mean etching rate is the same at every power level
### H1: At least one power level has a different mean etching rate

In [30]:
data = pd.read_excel("./files_for_lab/anova_lab_data.xlsx")
data

Unnamed: 0,Power,Etching Rate
0,160 W,5.43
1,180 W,6.24
2,200 W,8.79
3,160 W,5.71
4,180 W,6.71
5,200 W,9.2
6,160 W,6.22
7,180 W,5.98
8,200 W,7.9
9,160 W,6.01


In [34]:
data["Power "].value_counts()

160 W    5
180 W    5
200 W    5
Name: Power , dtype: int64

In [36]:
group_df = data.groupby('Power ')['Etching Rate'].agg(power_mean='mean').reset_index()
group_df

Unnamed: 0,Power,power_mean
0,160 W,5.792
1,180 W,6.238
2,200 W,8.318


In [43]:
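# One sample of etching rates per power level (the column name 'Power ' has a trailing space)
groups = [data[data['Power '] == power]['Etching Rate'] for power in ("160 W", "180 W", "200 W")]
# One-way ANOVA across the three groups; index [1] is the p-value
st.f_oneway(*groups)[1]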


7.506584272358903e-06

In [None]:
### The p-value (about 7.5e-06) is well below 0.05, so H0 is rejected: power has a significant
### effect on etching rate (at least one power level has a different mean).
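
As a rough check on what `st.f_oneway` computes, the sketch below builds the F statistic from the between-group and within-group sums of squares. It uses only the ten rows displayed in the table above (the full sheet has five measurements per power level), so the result will not equal the p-value reported for the full data; all variable names are illustrative.

```python
import numpy as np
import scipy.stats as st

# The ten rows displayed above, grouped by power level.
samples = [
    np.array([5.43, 5.71, 6.22, 6.01]),  # 160 W
    np.array([6.24, 6.71, 5.98]),        # 180 W
    np.array([8.79, 9.20, 7.90]),        # 200 W
]

all_values = np.concatenate(samples)
grand_mean = all_values.mean()
k = len(samples)            # number of groups
n_total = all_values.size   # total number of observations

# Variation of the group means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in samples)
# Variation of the observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in samples)

# F is the ratio of the two mean squares; the p-value is the right tail of the F distribution.
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = st.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = st.f_oneway(*samples)
print(f_manual, p_manual)
print(f_scipy, p_scipy)
```

A large F means the group means are spread out much more than the noise within each group would explain. ANOVA alone does not say which power levels differ; that would need a follow-up comparison.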