# HW3 problems

### 1. Cumulative upwelling index

_Note:_ More information on loops can be found in Chapter 3 of
Python For Everybody.

NOAA publishes an upwelling index for different sites along the coast.

[http://www.pfeg.noaa.gov/products/PFEL/modeled/indices/upwelling/NA/what_is_upwell.html](http://www.pfeg.noaa.gov/products/PFEL/modeled/indices/upwelling/NA/what_is_upwell.html)

For January 2017, the following daily averages were calculated (in units of cubic meters per second along each 100 meters of coastline; positive values are upwelling-favorable):

[88., 11., -164.,-16., 82., -53., -321., -257., 1., -43., 21., 67., 45., 54., 41., 12., 1., -134., -9., -6., -122., -94., 22., -6., 8., 10., -7., -3., 14., 5.]

You are interested in the cumulative upwelling index. 

Write a function that uses a for loop to calculate and print the cumulative sum of an arbitrary 1-D array. Use this function to print the cumulative upwelling index for January 2017 (in units of cubic meters per 100 meters of coastline). Do not use `np.cumsum` in your function, but feel free to use it to check your results. Use the function in the code block below and display your results for the cumulative upwelling index in January 2017.

_Submission format:_ Update the `cumulativesum` function in the [myfuncs.py](myfuncs.py) file in this repository, upload it to your HW3 repository and commit the changes.

_Grading criteria:_ Your function gives correct output. There is a docstring that describes the function. There are comments in your code.  Your function works for an array of any length. Correct units are displayed.
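A minimal sketch of a loop-based cumulative sum follows, assuming the function should return a NumPy array so the result can be scaled by 86400 s/day. The name `cumulativesum_sketch` is hypothetical; the graded `cumulativesum` lives in `myfuncs.py` and is not reproduced here. Printing is left to the caller, and the result is checked against `np.cumsum`, as the problem allows.

```python
import numpy as np


def cumulativesum_sketch(values):
    """Return the running (cumulative) sum of a 1-D sequence.

    Hypothetical stand-in for myfuncs.cumulativesum, written with an
    explicit for loop instead of np.cumsum.

    Parameters
    ----------
    values : sequence of numbers (any length, including zero)

    Returns
    -------
    numpy.ndarray of floats with the same length as `values`
    """
    values = np.asarray(values, dtype=float)
    running = np.zeros(len(values))  # output array, one entry per input value
    total = 0.0                      # running total carried through the loop
    for i, value in enumerate(values):
        total += value               # add the current value to the total
        running[i] = total           # store the cumulative sum up to index i
    return running


upwelling_index = [88., 11., -164., -16., 82., -53., -321.,
                   -257., 1., -43., 21., 67., 45., 54., 41.,
                   12., 1., -134., -9., -6., -122., -94., 22.,
                   -6., 8., 10., -7., -3., 14., 5.]

cui = cumulativesum_sketch(upwelling_index) * 86400  # m^3/100m coastline
print(cui[:3])
print(np.allclose(cui, np.cumsum(upwelling_index) * 86400))  # True
```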

```python
import numpy as np
from scipy import stats
import myfuncs
```

```python
upwelling_index = [88., 11., -164., -16., 82., -53., -321.,
                   -257., 1., -43., 21., 67., 45., 54., 41.,
                   12., 1., -134., -9., -6., -122., -94., 22.,
                   -6., 8., 10., -7., -3., 14., 5.]

# use the cumulativesum function in myfuncs.py
cui = myfuncs.cumulativesum(upwelling_index)  # cumulative sum of daily means, m^3/s/100m coastline
cui = np.asarray(cui) * 86400  # 86400 s/day converts to m^3/100m coastline (asarray avoids list repetition)

print('January 2017 cumulative upwelling index in m^3/100m coastline =', cui)
```

```
January 2017 cumulative upwelling index in m^3/100m coastline = [  7603200.   8553600.  -5616000.  -6998400.     86400.  -4492800.
 -32227200. -54432000. -54345600. -58060800. -56246400. -50457600.
 -46569600. -41904000. -38361600. -37324800. -37238400. -48816000.
 -49593600. -50112000. -60652800. -68774400. -66873600. -67392000.
 -66700800. -65836800. -66441600. -66700800. -65491200. -65059200.]
```


### 2. One-way ANOVA function

Write a function that performs a one-way analysis of variance for an array with _J_ groups (in columns) and _N_ samples in each group (in rows). You may assume that the data set is balanced, i.e. there are the same number of observations in each group.

The function should return the F statistic and the p-value. You can calculate the p-value with the help of the `stats.f.cdf` function.

```
from scipy import stats
stats.f.cdf(F,dfn,dfd)
```
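which returns the value of the cumulative F distribution at the F statistic, given the degrees of freedom in the numerator (dfn) and the degrees of freedom in the denominator (dfd). Because large F values are evidence against the null hypothesis, the p-value is the upper-tail probability, `1 - stats.f.cdf(F,dfn,dfd)`.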
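As a quick check of the p-value step, the sketch below converts an F statistic to an upper-tail p-value with `stats.f.cdf` and with the equivalent survival function `stats.f.sf`. The inputs are illustrative: F = 10.8 is the value reported for the MgO data further down, and dfn = 2, dfd = 9 assume three groups with four samples each. When dfn = 2 the F distribution has a closed-form upper tail, (1 + 2F/dfd)^(-dfd/2), which gives an independent check.

```python
from scipy import stats

F = 10.8   # illustrative F statistic
dfn = 2    # assumed: J - 1 with J = 3 groups
dfd = 9    # assumed: J * (N - 1) with N = 4 samples per group

p_from_cdf = 1 - stats.f.cdf(F, dfn, dfd)  # upper-tail probability via the CDF
p_from_sf = stats.f.sf(F, dfn, dfd)        # same quantity; more accurate when p is tiny

# closed form of the upper tail, valid only when dfn == 2
p_closed_form = (1 + 2 * F / dfd) ** (-dfd / 2)

print(p_from_cdf, p_from_sf, p_closed_form)
```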

Do not use `stats.f_oneway` in your function, but feel free to use this function to check your work.

Use the function to test the null hypothesis of no difference between sample means, for the MgO example in Table 10.1 of McKillup and Dyar (using the csv file included in this repository). Use if-else statements to print whether the null hypothesis can be rejected with 95% and 99% confidence.

_Submission format:_ Update the `anova` function in the [myfuncs.py](myfuncs.py) file in this repository, upload it to your HW3 repository and commit the changes. Define the null hypothesis being tested, and use the function in the space below to determine whether the null hypothesis can be rejected.

_Grading criteria:_ Your function should work for any balanced array of data with groups in columns and samples in rows. Your function should have a docstring that describes the inputs and outputs. Your function should use descriptive variable names.
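A minimal sketch of a balanced one-way ANOVA, assuming groups in columns and samples in rows as the problem specifies. The function name `anova_oneway_sketch` and the random test array are hypothetical; the result is compared with `stats.f_oneway`, which the problem allows for checking.

```python
import numpy as np
from scipy import stats


def anova_oneway_sketch(data):
    """One-way ANOVA for a balanced 2-D array.

    Parameters
    ----------
    data : array_like, shape (N, J)
        N samples (rows) in each of J groups (columns).

    Returns
    -------
    F : float
        Ratio of the between-group to the within-group mean square.
    p : float
        Upper-tail probability of F under the null hypothesis of equal group means.
    """
    data = np.asarray(data, dtype=float)
    n_samples, n_groups = data.shape
    grand_mean = data.mean()
    group_means = data.mean(axis=0)  # one mean per column (group)

    # between-group sum of squares: spread of group means about the grand mean
    ss_between = n_samples * np.sum((group_means - grand_mean) ** 2)
    # within-group sum of squares: spread of samples about their own group mean
    ss_within = np.sum((data - group_means) ** 2)

    df_between = n_groups - 1
    df_within = n_groups * (n_samples - 1)

    F = (ss_between / df_between) / (ss_within / df_within)
    p = 1 - stats.f.cdf(F, df_between, df_within)
    return F, p


rng = np.random.default_rng(0)
example = rng.normal(loc=[5.0, 5.5, 6.0], scale=1.0, size=(4, 3))  # 4 samples x 3 groups

F_sketch, p_sketch = anova_oneway_sketch(example)
F_check, p_check = stats.f_oneway(*example.T)  # each column passed as a separate group
print(F_sketch, p_sketch)
print(F_check, p_check)
```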

```python
# samples in rows, sampling locations (groups) in columns
data = np.genfromtxt('MgO_Maine.csv', skip_header=1, delimiter=',')
```

```python
# use the anova_oneway function in myfuncs.py
F, p = myfuncs.anova_oneway(data)
```

```
F-statistic = 10.8 p-value = 0.00405830677724
```


Display the results below. State the null hypothesis and determine whether the null hypothesis should be accepted or rejected.

```python
print('Null Hypothesis = mean wt. % of MgO in tourmalines from Mount Mica, Sebago Batholith, and Black Mountain are the same')
# Alternative Hypothesis = mean wt. % of MgO in tourmalines from at least one location is different
print('F-statistic =', F, 'p-value =', p)

# accept or reject Ho?
# degrees of freedom from the balanced data array (samples in rows, groups in columns)
n_samples, n_groups = data.shape
dfn = n_groups - 1                # between-group df (2 for this data set)
dfd = n_groups * (n_samples - 1)  # within-group df (9 for this data set)

for confidence, alpha in [(95, 0.05), (99, 0.01)]:
    # critical F: 4.26 at alpha = 0.05 and 8.02 at alpha = 0.01 for df = (2, 9)
    F_crit = stats.f.ppf(1 - alpha, dfn, dfd)

    print(f'With {confidence}% confidence and using the critical F-value:')
    if F > F_crit:
        print('Reject the null hypothesis')
    else:
        print('Fail to reject the null hypothesis')

    print(f'With {confidence}% confidence and using the p-value:')
    if p < alpha:
        print('Reject the null hypothesis')
    else:
        print('Fail to reject the null hypothesis')
```

```
Null Hypothesis = mean wt. % of MgO in tourmalines from Mount Mica, Sebago Batholith, and Black Mountain are the same
F-statistic = 10.8 p-value = 0.00405830677724
With 95% confidence and using the critical F-value:
Reject the null hypothesis
With 95% confidence and using the p-value:
Reject the null hypothesis
With 99% confidence and using the critical F-value:
Reject the null hypothesis
With 99% confidence and using the p-value:
Reject the null hypothesis
```


### 3. Short problems



Fill in code to solve the short problems below. Use any function in `scipy.stats` or other libraries.

##### a. Comparing respiration rates

Water column respiration rates are measured in dark bottle incubations at two different stations on an oceanographic cruise. Three replicates are taken at each station. The values (in units of mL/L d$^{-1}$) are given below:

Station A: `[0.45, 0.77, 0.71]`

Station B: `[0.54, 0.43, 0.36]`

Use an appropriate statistical test to determine whether there is a significant difference in the mean respiration rate between the two stations.

```python
StationA = [0.45, 0.77, 0.71]  # mL/L/d
StationB = [0.54, 0.43, 0.36]  # mL/L/d

# Welch's t-test (does not assume equal variances at the two stations)
t, p = stats.ttest_ind(StationA, StationB, equal_var=False)

print('Null Hypothesis = mean respiration rate at Station A is equal to mean respiration rate at Station B.')
print('t-statistic =', t, ', p-value =', p)

if p < 0.05:
    print('Since p =', p, '< 0.05, there is a significant difference in the mean respiration rate between the two stations.')
else:
    print('Since p =', p, '>= 0.05, there is no significant difference in the mean respiration rate between the two stations.')
```

```
Null Hypothesis = mean respiration rate at Station A is equal to mean respiration rate at Station B.
t-statistic = 1.79685824471 , p-value = 0.168617556762
Since p = 0.168617556762 >= 0.05, there is no significant difference in the mean respiration rate between the two stations.
```
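For Welch's test the degrees of freedom are not simply n - 1 = 2; they follow from the Welch-Satterthwaite approximation, which is what `stats.ttest_ind(..., equal_var=False)` uses for its p-value. The sketch below computes that df and the matching two-tailed 95% critical t-value for the station data, then rebuilds the p-value to confirm it agrees with SciPy.

```python
import numpy as np
from scipy import stats

StationA = np.array([0.45, 0.77, 0.71])  # mL/L/d
StationB = np.array([0.54, 0.43, 0.36])  # mL/L/d

# squared standard error of each mean (sample variance / n)
se2_A = StationA.var(ddof=1) / StationA.size
se2_B = StationB.var(ddof=1) / StationB.size

# Welch's t-statistic
t_welch = (StationA.mean() - StationB.mean()) / np.sqrt(se2_A + se2_B)

# Welch-Satterthwaite degrees of freedom (generally not an integer)
df_welch = (se2_A + se2_B) ** 2 / (
    se2_A ** 2 / (StationA.size - 1) + se2_B ** 2 / (StationB.size - 1)
)

# two-tailed critical value and p-value at this df
t_crit = stats.t.ppf(0.975, df_welch)
p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)

print('Welch df =', df_welch)
print('critical t-value =', t_crit)
print('t-statistic =', t_welch, ', p-value =', p_welch)
```

With df of roughly 3.05, the critical value sits a little below the tabulated 3.18 for 3 df, still well above t of about 1.80, so the conclusion of no significant difference is unchanged.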


##### b. Comparing two years of current meter records

January means of alongshore velocity ($V$) from current meter data off the coast of British Columbia are reported in the literature for two different years. The means and standard deviations of daily averaged values in January are:

Year 1: $\bar{V_1} = 23 \pm 3 \text{ cm/s}$

Year 2: $\bar{V_2} = 20 \pm 2 \text{ cm/s}$

Test the null hypothesis that the means are the same between these two years, with 95% confidence. You may assume that each daily average is an independent sample. State any other assumptions that you make in your analysis.

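```python
from scipy import stats

# Assumptions: daily averages are independent (given), approximately normally
# distributed, and the two years have equal variances (pooled t-test, equal_var=True).

# mean and standard deviation (cm/s), year 1:
meanyr1 = 23
sdyr1 = 3

# mean and standard deviation (cm/s), year 2:
meanyr2 = 20
sdyr2 = 2

N = 31  # sample size for both data sets: 31 January daily means

# two-tailed t-statistic & p-value from the summary statistics:
t, p = stats.ttest_ind_from_stats(meanyr1, sdyr1, N, meanyr2, sdyr2, N, equal_var=True)

# critical t-value for 95% confidence (two-tailed, pooled df = 2N - 2):
critt = stats.t.ppf(0.975, 2*N - 2)
print(f'Critical t-value (two-tailed, df = {2*N - 2}) is {critt:.3f}')
print('t-statistic =', t, 'p-value =', p)
if abs(t) > critt and p < 0.05:
    print('high t-value and low p-value --> Reject the null hypothesis')
else:
    print('Fail to reject the null hypothesis')
```

```
Critical t-value (two-tailed, df = 60) is 2.000
t-statistic = 4.6326599769 p-value = 1.99001461725e-05
high t-value and low p-value --> Reject the null hypothesis
```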
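Because the pooled test assumes equal variances (s = 3 vs. 2 cm/s here), one way to check sensitivity to that assumption is to rerun `stats.ttest_ind_from_stats` with `equal_var=False` (Welch's test). The sketch also rebuilds the pooled t-statistic by hand from the summary statistics. With equal sample sizes the pooled and Welch t-statistics coincide; only the degrees of freedom, and hence the p-value, differ.

```python
import numpy as np
from scipy import stats

mean1, sd1 = 23.0, 3.0  # year 1, cm/s
mean2, sd2 = 20.0, 2.0  # year 2, cm/s
N = 31                  # January daily means per year

# pooled (equal-variance) t-statistic by hand
sp2 = ((N - 1) * sd1**2 + (N - 1) * sd2**2) / (2 * N - 2)  # pooled variance
t_by_hand = (mean1 - mean2) / np.sqrt(sp2 * (1 / N + 1 / N))

t_pooled, p_pooled = stats.ttest_ind_from_stats(mean1, sd1, N, mean2, sd2, N, equal_var=True)
t_welch, p_welch = stats.ttest_ind_from_stats(mean1, sd1, N, mean2, sd2, N, equal_var=False)

print('t by hand =', t_by_hand)
print('pooled: t =', t_pooled, 'p =', p_pooled)
print('Welch:  t =', t_welch, 'p =', p_welch)
```

Both p-values fall far below 0.05, so rejecting the null hypothesis does not hinge on the equal-variance assumption.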


##### c. Power analysis and experimental design

You are studying the effects of a marine reserve on juvenile rockfish. Previous literature indicates that the juveniles of the species you are studying have a standard length of 70 +/- 30 mm (mean +/- standard deviation). The marine reserve will allow you to collect 20 fish for scientific purposes.

If your target power is 80% and your confidence level is 95%, what is the minimum difference in mean length you can expect to observe in the marine reserve? You can assume that the fish lengths are normally distributed.

What is the probability of not observing a significant effect of this magnitude if there actually is one?

```python
from statsmodels.stats import power

# target power = 0.8
# alpha for 95% confidence = 0.05
# sample size = 20
# solve for the standardized effect size of a two-sided one-sample t-test
effectsize = power.tt_solve_power(power=0.8, alpha=0.05, nobs=20)

# rockfish mean length and standard deviation (mm):
rflength = 70
rfsd = 30

# effect size = |meanlength - 70| / 30
meanlength = rflength + rfsd*effectsize

diff = meanlength - rflength
print('Minimum difference in mean length you can expect to observe is', round(diff, 1), 'mm')

# probability of a Type II error (beta) = 1 - power
beta = 1 - 0.8
print(f'The probability of not observing a significant effect if there actually is one is {beta:.0%}')
```

```
Minimum difference in mean length you can expect to observe is 19.8 mm
The probability of not observing a significant effect if there actually is one is 20%
```
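As a cross-check on `tt_solve_power`, the power of a two-sided one-sample t-test can be written with SciPy's noncentral t distribution and solved for the effect size with a root finder. This is a sketch assuming the same design as above (one sample of 20 fish compared with the literature mean, alpha = 0.05); the helper name `one_sample_power` is hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq

nobs = 20           # fish collected
alpha = 0.05        # 95% confidence
target_power = 0.8
sd = 30.0           # literature standard deviation, mm


def one_sample_power(effect_size, nobs, alpha):
    """Power of a two-sided one-sample t-test for a standardized effect size."""
    df = nobs - 1
    nc = effect_size * np.sqrt(nobs)         # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value
    # probability that the test statistic lands in either rejection region
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)


# effect size at which power equals the target
d_min = brentq(lambda d: one_sample_power(d, nobs, alpha) - target_power, 0.01, 3.0)

print('effect size =', d_min)
print('minimum detectable difference =', d_min * sd, 'mm')
print('Type II error rate (beta) =', 1 - target_power)
```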
