<h1>Statistics - Python Tutorial Part 5</h1>
<h4>By: Heather S. Deter</h4>

Python is a good tool for running statistics on large amounts of data, and a well-designed script can be reused across multiple datasets.

There are a lot of different options for how to run statistics. What's important to learn here is not so much the precise methodology (you can Google which function to use), but how to approach statistical analysis from a programming perspective.



<h3>Importing data</h3>
The first thing to do when you need to run statistics is to import your data, usually from a comma-separated values (CSV) file. NumPy <a>http://www.numpy.org/</a> and pandas <a>http://pandas.pydata.org/</a> are two libraries that are key for importing CSV files and handling data.

In [None]:
#run this cell

import numpy as np
import pandas as pd

We are going to use some data from a 2015 study on seedlings in Fish Creek. It is important to know the following when you need to import (or export) a file. <br>
1) What is your working directory? Your working directory is the directory your Jupyter Home tab is running from.<br>
2) Where is your file relative to your working directory?<br>
If you are confused about directories check out this explanation: <a>https://www.computerhope.com/jargon/d/director.htm</a><br><br>
While there are a number of different ways to import data, one way is to use numpy. <a>https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html</a><br>
Another method is to use pandas, which is what we will be using.<br> <a>https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html</a>
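
The two questions above can be checked directly from Python. The following is a minimal sketch, not part of the original tutorial: it writes a tiny stand-in file (the file name `example_seedlings.csv` and its values are made up for illustration) so that it runs anywhere, then shows the working directory and imports the file with pandas.

```python
import os
import tempfile

import pandas as pd

# 1) What is the working directory? Relative paths start from here.
print(os.getcwd())

# Build a tiny stand-in CSV so the example runs anywhere.
# The file name and the values are made up for illustration.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, 'example_seedlings.csv')
with open(csv_path, 'w') as f:
    f.write('Ecotype,Max Height (cm)\nLocal,10.5\nSouthern,12.0\n')

# 2) Where is the file? Confirm the path exists before importing it.
print(os.path.exists(csv_path))

# read_csv uses the first row as column headers by default
example = pd.read_csv(csv_path)
print(example.head())

# Check that the numeric column was imported as a number, not as text
print(example.dtypes)
```

Checking `os.path.exists` first makes a wrong path easier to spot than reading the error that `read_csv` raises.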

In [None]:
#import a csv file

#set to the path of your CSV file (absolute, or relative to the working directory)
CSVname = 'C:/Users/owner/Documents/research/scripts/learn-python/FishCreekseedlings2015.csv'

#import the CSV file
##read_csv assumes the values are separated by commas by default
##the first row is used as the column headers, and numeric columns are imported as numbers
DATA = pd.read_csv(CSVname)

#print the pandas DataFrame
print(DATA)

<h3>T-tests</h3>
Now we can run a t-test using scipy, but first we have to import scipy.

In [None]:
#run this cell

from scipy import stats

We are going to run a Student's t-test. <a>https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html</a>

In [None]:
#separate the data by ecotype (Southern and Local)
Southern = DATA[DATA['Ecotype']=='Southern']
Local = DATA[DATA['Ecotype']=='Local']

#run the t-test on the Max Height (cm) column
ttest = stats.ttest_ind(Local['Max Height (cm)'],Southern['Max Height (cm)'])
print(ttest)
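
The result returned by `stats.ttest_ind` can also be read field by field. The sketch below uses randomly generated stand-in heights (not the Fish Creek data) to show how to pull out the t-statistic and the p-value, and how to switch to Welch's t-test with `equal_var=False` when the two groups may not share the same variance.

```python
import numpy as np
from scipy import stats

# Stand-in data: randomly generated heights, NOT the Fish Creek measurements
rng = np.random.default_rng(0)
local_heights = rng.normal(loc=20.0, scale=3.0, size=30)
southern_heights = rng.normal(loc=23.0, scale=3.0, size=30)

# Student's t-test (assumes equal variances, which is the default)
student = stats.ttest_ind(local_heights, southern_heights)
print('Student t =', student.statistic, 'p =', student.pvalue)

# Welch's t-test (does not assume equal variances)
welch = stats.ttest_ind(local_heights, southern_heights, equal_var=False)
print('Welch   t =', welch.statistic, 'p =', welch.pvalue)
```

Storing the p-value in its own variable is handy when a script loops over several datasets and needs to collect the results.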

<h3>One-way ANOVA</h3>
One way to perform a one-way ANOVA is to use scipy's f_oneway. <a>https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html</a>

In [None]:
#run a one-way ANOVA
statsanova = stats.f_oneway(Local['Max Height (cm)'],Southern['Max Height (cm)'])
print(statsanova)
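
With only two groups, a one-way ANOVA and a Student's t-test answer the same question: the F-statistic equals the square of the t-statistic, and the two p-values match. The sketch below checks this on randomly generated stand-in data (not the Fish Creek data).

```python
import numpy as np
from scipy import stats

# Stand-in data: two randomly generated groups
rng = np.random.default_rng(1)
group_a = rng.normal(20.0, 3.0, size=25)
group_b = rng.normal(22.0, 3.0, size=25)

t_res = stats.ttest_ind(group_a, group_b)   # Student's t-test
f_res = stats.f_oneway(group_a, group_b)    # one-way ANOVA

# For two groups, F should equal t squared and the p-values should agree
print('t^2 =', t_res.statistic ** 2, 'F =', f_res.statistic)
print('t-test p =', t_res.pvalue, 'ANOVA p =', f_res.pvalue)
```

ANOVA becomes the more useful tool once there are three or more groups, because `f_oneway` accepts any number of samples as arguments.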

Another method is to use statsmodels. <a>https://www.statsmodels.org/devel/examples/notebooks/generated/ols.html</a>

In [None]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

In [None]:
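#first we have to rename the columns to remove spaces and parentheses, which the formula below cannot handle
##this assumes the file has exactly these two columns, in this order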
DATA.columns = ['Ecotype','Max_Height']

#now we run an ANOVA using the ordinary least squares method
results = ols('Max_Height ~ Ecotype', data=DATA).fit()

#print the regression summary (F-statistic and Prob (F-statistic) give the ANOVA result)
print(results.summary())

#retrieve the p-values for the model coefficients
print('\nPvalues')
print(results.pvalues)

<h2>Practice Problem</h2>
<h3>Problem 1</h3>
Go through another dataset, Walley energy. Import the csv file (Walleyenergy.csv) and calculate the average and standard deviation for both near-shore and open-water energy. Run a t-test between the two groups and print the p-value.

Here's some help for calculating the mean and standard deviation with pandas: <a>https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html</a> <b>Hint:</b> try searching for mean and std under methods.
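
As a rough illustration of the `mean` and `std` methods, the sketch below uses a small made-up table. The column names `Habitat` and `Energy` are hypothetical and will probably not match the columns in Walleyenergy.csv. Note that pandas calculates the sample standard deviation (`ddof=1`) by default, while `np.std` defaults to the population standard deviation (`ddof=0`).

```python
import pandas as pd

# Made-up values; the column names are hypothetical
toy = pd.DataFrame({
    'Habitat': ['near', 'near', 'near', 'open', 'open', 'open'],
    'Energy': [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

# Mean and standard deviation of one column
print(toy['Energy'].mean())
print(toy['Energy'].std())

# Mean and standard deviation for each group at once
summary = toy.groupby('Habitat')['Energy'].agg(['mean', 'std'])
print(summary)
```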


In [None]:
##Write code below to answer the above question
#Be sure to comment your code (explain what each section is doing in comments)





