# Statistical analysis of biovolume BBM

First we import all relevant libraries, then load our biovolume data (biovolume_BBM.xlsx). The resulting dataframe is displayed below.

Since we have several groups defined by a single independent variable (temperature) and want to compare their means, we use a one-way ANOVA. Before running it, we need to check that its assumptions are not violated: normality within each group and equal variances across groups. These are tested with the Shapiro-Wilk test and Levene's test, respectively.

In [11]:
import numpy as np
from scipy import stats
import pandas as pd
import scikit_posthocs as sp

In [2]:
df1 = pd.read_excel('biovolume_BBM.xlsx')  # Load the Excel file into a dataframe
display(df1)  # Show the dataframe

Unnamed: 0,BBM_10,BBM_15,BBM_20,BBM_25
0,1726000000,2463000000,2296000000,4258000000
1,1756000000,2775000000,1876000000,3819000000
2,1422000000,2097000000,1828000000,5049000000


In [3]:
# Put each dataframe column into its own list
BBM_10 = df1['BBM_10'].tolist()
BBM_15 = df1['BBM_15'].tolist()
BBM_20 = df1['BBM_20'].tolist()
BBM_25 = df1['BBM_25'].tolist()
data = [BBM_10, BBM_15, BBM_20, BBM_25]
print(data)

[[1726000000, 1756000000, 1422000000], [2463000000, 2775000000, 2097000000], [2296000000, 1876000000, 1828000000], [4258000000, 3819000000, 5049000000]]


In [4]:
stats.levene(BBM_10, BBM_15, BBM_20, BBM_25, center='median')  # Median-centered (Brown-Forsythe) variant

LeveneResult(statistic=0.7495201570154548, pvalue=0.552557316227051)

In [5]:
print(stats.shapiro(BBM_10),
stats.shapiro(BBM_15),
stats.shapiro(BBM_20),
stats.shapiro(BBM_25))

ShapiroResult(statistic=0.8167734742164612, pvalue=0.1552039533853531) ShapiroResult(statistic=0.9978899359703064, pvalue=0.9122374057769775) ShapiroResult(statistic=0.8260316848754883, pvalue=0.17828816175460815) ShapiroResult(statistic=0.9734259843826294, pvalue=0.6872667074203491)
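
Both sets of p-values above can be read against a significance level. The sketch below assumes a level of 0.05 (the same value passed as `alpha` to `pairwise_tukeyhsd` later in the notebook) and simply copies the p-values printed above, rounded to four decimals. The names `alpha`, `levene_p`, and `shapiro_p` are illustrative and not part of the original analysis.

```python
# Assumed significance level (matches alpha=0.05 used for Tukey's HSD below)
alpha = 0.05

# p-values copied from the Levene and Shapiro-Wilk outputs above (rounded)
levene_p = 0.5526
shapiro_p = {
    'BBM_10': 0.1552,
    'BBM_15': 0.9122,
    'BBM_20': 0.1783,
    'BBM_25': 0.6873,
}

# A p-value above alpha means the test gives no evidence against the assumption
equal_variance_ok = levene_p > alpha
normality_ok = {group: p > alpha for group, p in shapiro_p.items()}
assumptions_ok = equal_variance_ok and all(normality_ok.values())

print('Equal variances not rejected:', equal_variance_ok)
print('Normality not rejected:', normality_ok)
print('Proceed with one-way ANOVA:', assumptions_ok)
```

With only three replicates per group, both tests have little power, so a non-significant result is weak evidence that the assumptions hold rather than proof.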


In [6]:
stats.f_oneway(BBM_10, BBM_15, BBM_20, BBM_25)

F_onewayResult(statistic=29.575190870370392, pvalue=0.00011128041304519821)
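
To make the F statistic above less of a black box, the following sketch recomputes it by hand in plain Python as the ratio of the between-group mean square to the within-group mean square. It illustrates the textbook formula and is not meant to describe how `stats.f_oneway` is implemented internally; the helper `mean` and the variable names are chosen for illustration only.

```python
# Data copied from the dataframe above
groups = {
    'BBM_10': [1726000000, 1756000000, 1422000000],
    'BBM_15': [2463000000, 2775000000, 2097000000],
    'BBM_20': [2296000000, 1876000000, 1828000000],
    'BBM_25': [4258000000, 3819000000, 5049000000],
}

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())

# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

df_between = len(groups) - 1               # number of groups minus one
df_within = len(all_values) - len(groups)  # observations minus number of groups

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f_stat)  # should agree with the statistic reported by stats.f_oneway
```

A large F means the group means differ by much more than the scatter within each group would suggest, which is what the small p-value above reflects.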

In [7]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd

In [8]:
# Long-format dataframe for Tukey's HSD: one row per measurement, labelled by group
df2 = pd.DataFrame({'score': [1726000000, 1756000000, 1422000000,
                              2463000000, 2775000000, 2097000000,
                              2296000000, 1876000000, 1828000000,
                              4258000000, 3819000000, 5049000000],
                    'group': np.repeat(['BBM_10', 'BBM_15', 'BBM_20', 'BBM_25'], repeats=3)})

In [9]:
tukey = pairwise_tukeyhsd(endog=df2['score'],
                          groups=df2['group'],
                          alpha=0.05)


In [10]:
print(tukey)

            Multiple Comparison of Means - Tukey HSD, FWER=0.05             
group1 group2     meandiff    p-adj       lower            upper      reject
----------------------------------------------------------------------------
BBM_10 BBM_15  810333333.3333 0.1248  -205937337.4528 1826604004.1195  False
BBM_10 BBM_20  365333333.3333 0.6588  -650937337.4528 1381604004.1195  False
BBM_10 BBM_25 2740666666.6667  0.001  1724395995.8805 3756937337.4528   True
BBM_15 BBM_20    -445000000.0 0.5303 -1461270670.7861  571270670.7861  False
BBM_15 BBM_25 1930333333.3333 0.0013   914062662.5472 2946604004.1195   True
BBM_20 BBM_25 2375333333.3333  0.001  1359062662.5472 3391604004.1195   True
----------------------------------------------------------------------------
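
The `meandiff` column in the table above is the mean of `group2` minus the mean of `group1`. A minimal sketch that reproduces this column in plain Python, using the data from the dataframe above; the names `groups`, `means`, and `mean_diffs` are illustrative.

```python
from itertools import combinations

# Data copied from the dataframe above
groups = {
    'BBM_10': [1726000000, 1756000000, 1422000000],
    'BBM_15': [2463000000, 2775000000, 2097000000],
    'BBM_20': [2296000000, 1876000000, 1828000000],
    'BBM_25': [4258000000, 3819000000, 5049000000],
}

means = {name: sum(values) / len(values) for name, values in groups.items()}

# Same pair order as the Tukey table: each group against every later group
mean_diffs = {
    (g1, g2): means[g2] - means[g1]
    for g1, g2 in combinations(groups, 2)
}

for (g1, g2), diff in mean_diffs.items():
    print(f'{g1} vs {g2}: {diff:.4f}')
```

In the table, only the three comparisons involving BBM_25 have `reject` set to True. At a family-wise error rate of 0.05, BBM_25 therefore differs from each of the other groups, while BBM_10, BBM_15, and BBM_20 cannot be distinguished from one another.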
