# What is ANOVA testing?
* testing one numeric feature across multiple groups
* testing for a difference in means between multiple groups
* comparing all of the groups in a single test instead of many pairwise tests
* producing one p-value for the whole set of groups
* limiting the false-positive problems that come with running multiple tests
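
To see why multiple tests are a problem, here is a minimal sketch of how the chance of at least one false positive grows with the number of pairwise tests. It assumes the tests are independent and each uses alpha = 0.05, which is a simplification. The helper name `family_wise_error_rate` is made up for illustration.

```python
from math import comb

def family_wise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests

num_groups = 5                    # e.g. five body-styles
num_pairs = comb(num_groups, 2)   # 10 pairwise comparisons for 5 groups
fwer = family_wise_error_rate(num_pairs)
print(f"{num_pairs} pairwise tests -> family-wise error rate of about {fwer:.2f}")
```

With ten separate tests, the chance of a spurious "significant" result is far above the 5% we set for a single test, which is the motivation for one overall ANOVA.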

H0: all means are equal

HA: at least one mean is different

ANOVA is limited, insofar as it doesn't say which group is different.

Tukey testing fills that gap by doing pairwise comparisons across all the groups.
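
As a rough illustration of what the ANOVA test measures, the F statistic compares the variation between group means to the variation within groups. The sketch below computes it by hand on made-up toy numbers. The function name `one_way_f` and the data are invented for this example, and in practice `scipy.stats.f_oneway` does this for you.

```python
from statistics import mean

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_f(toy_groups)
print(f_stat)  # about 13 for this toy data
```

A large F means the group means are spread out relative to the noise inside each group, which is evidence against H0.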

# Objectives
YWBAT (you will be able to)
* apply ANOVA testing for various groups on a dataset
* apply Tukey testing for various groups on a dataset
* use plotly to plot multiple distributions at one time


# Outline
* load in data
* plot distributions using plotly
* conduct anova test
* conduct tukey test

In [None]:
import warnings
warnings.filterwarnings('ignore')
import numpy as np 
import pandas as pd

import scipy.stats as scs
import statsmodels.api as sm
from statsmodels.formula.api import ols


from statsmodels.stats.multicomp import pairwise_tukeyhsd, MultiComparison

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px 
import plotly.figure_factory as ff
#^^^ if this throws an error run the code in the cell below
# let it finish
# then rerun this cell when it is finished

In [None]:
# import sys
# !conda install --yes --prefix {sys.prefix} plotly

In [None]:
# load in data from the gzip file
df = pd.read_csv("../data/BNG_autoData.csv.gz")
df.head()

In [None]:
df.columns

In [None]:
# plot the distributions of horsepower colored by num-of-doors
fig = px.histogram(df, 
                   x="horsepower", 
                   color="num-of-doors", 
                   opacity=0.5,
                   hover_data=['fuel-system', 'bore'])
fig.show()

In [None]:
# draw equal-sized random samples of horsepower for two-door and four-door cars
num_samples = 100
group_labels = ['two', 'four']
two_door_horsepower = df.loc[df['num-of-doors']=='two', 'horsepower'].sample(n=num_samples)
four_door_horsepower = df.loc[df['num-of-doors']=='four', 'horsepower'].sample(n=num_samples)


In [None]:
hist_data = [two_door_horsepower, four_door_horsepower]

fig = ff.create_distplot(hist_data, group_labels)
fig.show()

In [None]:
# repeat this with coloring by fuel-type
fig = px.histogram(df, 
                   x="horsepower", 
                   color="fuel-type", 
                   opacity=0.5,
                   hover_data=['fuel-system', 'bore'])
fig.show()

In [None]:
# repeat for body-style
fig = px.histogram(df, 
                   x="horsepower", 
                   color="body-style", 
                   opacity=0.5,
                   hover_data=['fuel-system', 'bore'])
fig.show()

In [None]:
df['body-style'].unique()

In [None]:
# conduct a one way anova test with horsepower by body-style
# make sure you're writing the H0 and HA

# H0: means are all equal
# HA: at least one mean is different
hatchback_hp = df.loc[df['body-style']=='hatchback', 'horsepower']
sedan_hp = df.loc[df['body-style']=='sedan', 'horsepower']
wagon_hp = df.loc[df['body-style']=='wagon', 'horsepower']
hardtop_hp = df.loc[df['body-style']=='hardtop', 'horsepower']
convertible_hp = df.loc[df['body-style']=='convertible', 'horsepower']


scs.f_oneway(hatchback_hp, sedan_hp, wagon_hp, hardtop_hp, convertible_hp)
# a p-value close to zero means we reject H0: at least one body-style has a different mean horsepower
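
The one-variable-per-group pattern above can also be written with a `groupby`, which scales to any number of categories. This is a minimal sketch on an invented DataFrame (`toy`, with three made-up body-styles and simulated horsepower) so it runs without the CSV. Swapping in `df` should work the same way, assuming the column names match.

```python
import numpy as np
import pandas as pd
import scipy.stats as scs

# invented data: three body-styles with different true mean horsepower
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "body-style": np.repeat(["sedan", "wagon", "hatchback"], 50),
    "horsepower": np.concatenate([
        rng.normal(100, 10, 50),
        rng.normal(105, 10, 50),
        rng.normal(130, 10, 50),
    ]),
})

# one array of horsepower values per body-style
groups = [g.to_numpy() for _, g in toy.groupby("body-style")["horsepower"]]

# unpack the list so each group is passed as a separate sample
f_stat, p_value = scs.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.3g}")
```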

# What are your conclusions?
The ANOVA tells us that mean horsepower differs for at least one body-style. The next step is to investigate which particular body-styles differ from each other.

## Tukey Testing on body-style

In [None]:
multicompare_body_style = MultiComparison(df['horsepower'], df['body-style'])

In [None]:
tukey_hsd = multicompare_body_style.tukeyhsd(alpha=.05)

In [None]:
simple_table = tukey_hsd.summary()
simple_table

## Tukey Analysis Conclusion
The only pairwise comparisons where we failed to reject the null hypothesis (means are equal) were **convertible vs. hatchback** and **hardtop vs. wagon**, so those pairs have a **similar mean horsepower**. For every other pair of body-styles, the Tukey test rejected the null hypothesis, meaning their mean horsepower differs significantly.
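
Rather than reading the table by eye, the pairs can be pulled out programmatically. The sketch below uses invented data (groups `a`, `b`, `c`) and assumes your statsmodels version exposes the summary rows through `.data` and the decisions through the `reject` attribute of the Tukey result.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import MultiComparison

# invented data: groups a and b share a mean, group c is clearly higher
rng = np.random.default_rng(1)
values = np.concatenate([
    rng.normal(100, 5, 40),
    rng.normal(100, 5, 40),
    rng.normal(130, 5, 40),
])
labels = np.repeat(["a", "b", "c"], 40)

result = MultiComparison(values, labels).tukeyhsd(alpha=0.05)

# first row of the summary table is the header, the rest are the comparisons
table = result.summary()
tukey_df = pd.DataFrame(table.data[1:], columns=table.data[0])

# result.reject is a boolean array with one entry per pairwise comparison
not_rejected = tukey_df.loc[~result.reject, ["group1", "group2"]]
rejected = tukey_df.loc[result.reject, ["group1", "group2"]]
print("similar means:\n", not_rejected)
print("different means:\n", rejected)
```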

In [None]:
# Repeat this for fuel-type

In [None]:
# Conduct Tukey test for horsepower by fuel-type

In [None]:
# What are your conclusions

# What did we learn?
* Student's t-test is preferred when sample sizes are small
* chaining the `.sample()` method to a dataframe/series to get a sample of data
* play with plotly some more
* hot hand fallacy
* Tukey testing for multiple comparisons
* method for Tukey testing
* use a Tukey test when the ANOVA indicates differences in means; a Bonferroni correction is another way to account for multiple comparisons
* organizing data for a Tukey test
* using `scipy.stats` for a one-way ANOVA
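
For comparison with Tukey, here is a minimal sketch of the Bonferroni correction mentioned above: run every pairwise t-test, but compare each p-value against alpha divided by the number of tests. The data and group names are invented for illustration.

```python
from itertools import combinations

import numpy as np
import scipy.stats as scs

# invented data: groups a and b share a mean, group c is clearly higher
rng = np.random.default_rng(2)
samples = {
    "a": rng.normal(100, 5, 40),
    "b": rng.normal(100, 5, 40),
    "c": rng.normal(130, 5, 40),
}

alpha = 0.05
pairs = list(combinations(samples, 2))
corrected_alpha = alpha / len(pairs)  # Bonferroni: split alpha across all tests

decisions = {}
for g1, g2 in pairs:
    _, p_value = scs.ttest_ind(samples[g1], samples[g2])
    decisions[(g1, g2)] = bool(p_value < corrected_alpha)
    print(f"{g1} vs {g2}: p = {p_value:.3g}, reject H0 = {decisions[(g1, g2)]}")
```

Bonferroni is simple but tends to be more conservative than Tukey when there are many groups.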