# F- and t-tests in Python

To compare the data from your different indicators, we first need to process it a little (to get averages and standard deviations, at least), then set up the F-test and t-tests to make a decision. You already did this in Lab 1, so you can reuse that code.

In [None]:
# Just like we imported some extra math functions before, we're going to import some 
# extra math and statistical functions here
import math
import numpy as np
import scipy.stats as stats

# we need the average and standard deviation for each of our data sets. 
# Convert them into csv files and save them in the same folder as this notebook
data = np.genfromtxt('SP20_exp3.csv', dtype=float, delimiter=',', names=True) 

#using the names = True in our import command has told python that all of our columns have names in the first row 
# and we can use those names to call the data!


# ddof=1 gives the sample standard deviation (divides by n-1), which is what the F- and t-tests below expect.
# Without it, np.nanstd divides by n and reports a slightly smaller value.
average_BB = np.nanmean(data['BB'])
s_BB = np.nanstd(data['BB'], ddof=1)

print ("the average HCl concentration calculated using bromothymol blue is " + str(average_BB) + " +/- " + str(s_BB) + " M")

average_MR = np.nanmean(data['MR'])
s_MR = np.nanstd(data['MR'], ddof=1)

print ("the average HCl concentration calculated using methyl red is " + str(average_MR) + " +/- " + str(s_MR) + " M")

average_BG = np.nanmean(data['BG'])
s_BG = np.nanstd(data['BG'], ddof=1)

print ("the average HCl concentration calculated using Bromocresol Green is " + str(average_BG) + " +/- " + str(s_BG) + " M")

average_Ph = np.nanmean(data['Ph'])
s_Ph = np.nanstd(data['Ph'], ddof=1)

print ("the average HCl concentration calculated using Phenolphthalein is " + str(average_Ph) + " +/- " + str(s_Ph) + " M")

# We're just missing the last two indicators! 
# Fill in your own code to print out the average and standard deviation for thymolphthalein and methyl orange.





Make sure that the output from above matches a calculation you did by hand.  
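If your hand calculation and the notebook disagree slightly, the usual culprit is the divisor in the standard deviation: a hand (or Excel `STDEV`) calculation divides by n-1, while `np.nanstd` divides by n unless you pass `ddof=1`. The sketch below shows the difference on a small made-up array; the numbers are hypothetical and are not lab data.

```python
import numpy as np

# Hypothetical concentrations (M) with one blank entry, for illustration only
values = np.array([0.101, 0.099, 0.102, np.nan, 0.098])

n = np.count_nonzero(~np.isnan(values))   # number of real measurements
mean = np.nanmean(values)                 # average, ignoring the blank
s_sample = np.nanstd(values, ddof=1)      # sample standard deviation (divides by n-1)
s_population = np.nanstd(values)          # default ddof=0 divides by n

# The same sample standard deviation, written out the way you would by hand
clean = values[~np.isnan(values)]
s_by_hand = np.sqrt(np.sum((clean - mean) ** 2) / (n - 1))

print("n =", n)
print("mean =", mean)
print("s with ddof=1 =", s_sample, " by hand =", s_by_hand)
print("s with ddof=0 =", s_population)
```

The `ddof=1` value should agree with the by-hand line; the `ddof=0` value will always be a little smaller.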

We might also want the 95% confidence interval. We calculated this back in the very first Experiment 1 post-lab notebook:

In [None]:
# Since our data arrays are all different sizes, we have to make sure we remove any blank rows (with "nan" in them)
# from our calculation of the size.  You can use a similar statement to calculate n for each indicator.
n = data.size - np.isnan(data['BB']).sum()

# the first input is the cumulative probability: a two-sided 95% interval leaves 2.5% in each tail, so we ask for 0.975.
# the second input is degrees of freedom (n-1)
t = stats.t.ppf(0.975, n-1)

CI_BB = s_BB*t/math.sqrt(n)

print ("[HCl] calculated using bromothymol blue is " + str(average_BB) + " +/- " + str(CI_BB) + " M, at the 95% confidence interval")
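Since you will need n, s, and the confidence interval for every indicator, it can help to wrap the steps in a small function. This is only a sketch: the function name `confidence_interval` and the demo array are made up for illustration, and it assumes you want a two-sided interval with the sample standard deviation.

```python
import math
import numpy as np
import scipy.stats as stats

def confidence_interval(values, confidence=0.95):
    """Return (mean, half_width) of a two-sided confidence interval, ignoring NaN."""
    clean = values[~np.isnan(values)]      # drop blank entries
    n = clean.size
    s = np.std(clean, ddof=1)              # sample standard deviation
    # two-sided interval: split the leftover probability between both tails
    t = stats.t.ppf(1 - (1 - confidence) / 2, n - 1)
    return np.mean(clean), s * t / math.sqrt(n)

# Hypothetical data, not from the lab
demo = np.array([0.101, 0.099, 0.102, np.nan, 0.098])
mean, half_width = confidence_interval(demo)
print("[HCl] = " + str(mean) + " +/- " + str(half_width) + " M (95% confidence)")
```

With your own data you could call it as `confidence_interval(data['BB'])`, and likewise for each of the other columns.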

## F-test
We want to know whether we can use all of the indicators interchangeably. Are they all giving us essentially the same answer, or are some of the indicators producing results that are significantly different from the others? We will eventually want to know whether the means are the same, but there are several ways to compare means. First, we need to know whether the standard deviations are similar, because that decides which t-test to use!

The F-test is a simple test:

$$ F_{calculated} = (s_{1}^{2}/s_{2}^{2}) $$

Note that $ s_{1} $ must be the larger standard deviation, so you should always have an F value greater than 1!  Let's start by comparing the two datasets that have the closest means.
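If you no longer have your Lab 1 code handy, the calculation can be sketched roughly as follows. The function name `f_test` and the input numbers are hypothetical. The sketch also assumes a one-tailed critical value at 95% confidence from `scipy.stats.f.ppf`; check which F table your course uses before relying on that choice.

```python
import scipy.stats as stats

def f_test(s_a, n_a, s_b, n_b, alpha=0.05):
    """Return (F_calc, F_crit), always putting the larger s in the numerator."""
    if s_a >= s_b:
        s_big, n_big, s_small, n_small = s_a, n_a, s_b, n_b
    else:
        s_big, n_big, s_small, n_small = s_b, n_b, s_a, n_a
    F_calc = s_big ** 2 / s_small ** 2
    # degrees of freedom: numerator first, then denominator
    F_crit = stats.f.ppf(1 - alpha, n_big - 1, n_small - 1)
    return F_calc, F_crit

# Hypothetical standard deviations and sample sizes, not lab data
F_calc, F_crit = f_test(0.002, 5, 0.004, 6)
if F_calc > F_crit:
    print("The standard deviations are significantly different")
else:
    print("The standard deviations are not significantly different")
```

Swapping inside the function means you never have to remember which data set has the larger standard deviation, and F_calc can never come out below 1.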


In [None]:
# Paste in your F-test code from Lab 1.  Remember to add in a test to make sure that F_calc is > 1.

## Using the F-test to choose a t-test
Next, we use the F-test result to make a decision. This is where if/else statements come in: they let the code choose which t-test to run.

We have two possible methods for calculating our t value.
1. If the variance of the two data sets is the same, then we can use:

$$ {\displaystyle t_{calc}={\frac {{\bar {x}}_{1}-{\bar {x}}_{2}}{s_{pooled}\cdot {\sqrt {{\frac {1}{n_{1}}}+{\frac {1}{n_{2}}}}}}}} $$

where $$ {\displaystyle s_{pooled}={\sqrt {\frac {\left(n_{1}-1\right)s_{1}^{2}+\left(n_{2}-1\right)s_{2}^{2}}{n_{1}+n_{2}-2}}}} $$


In this case, the degrees of freedom are $ \mathrm{d.o.f.} = n_{1} + n_{2} - 2 $

2. If the variance of the two data sets is different, then we must use:
$$ {\displaystyle t_{calc}={\frac {{\bar {x}}_{1}-{\bar {x}}_{2}}{{\sqrt {{\frac {s_{1}^{2}}{n_{1}}}+{\frac {s_{2}^{2}}{n_{2}}}}}}}} $$

and the degrees of freedom equation is a little more complicated:

$$ {\displaystyle \mathrm {d.o.f.} ={\frac {\left({\frac {s_{1}^{2}}{n_{1}}}+{\frac {s_{2}^{2}}{n_{2}}}\right)^{2}}{{\frac {\left(s_{1}^{2}/n_{1}\right)^{2}}{n_{1}-1}}+{\frac {\left(s_{2}^{2}/n_{2}\right)^{2}}{n_{2}-1}}}}.} $$
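The two cases above translate into code fairly directly. The sketch below is one possible version, not necessarily how your Lab 1 code was written; the function names `t_pooled` and `t_unequal` and the input numbers are hypothetical. Both functions take the absolute value of the difference so that t_calc is always positive.

```python
import math

def t_pooled(mean1, s1, n1, mean2, s2, n2):
    """Case 1: similar variances. Returns (t_calc, dof)."""
    s_p = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    t_calc = abs(mean1 - mean2) / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t_calc, n1 + n2 - 2

def t_unequal(mean1, s1, n1, mean2, s2, n2):
    """Case 2: different variances. Returns (t_calc, dof)."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    t_calc = abs(mean1 - mean2) / math.sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_calc, dof

# Hypothetical means, standard deviations and sample sizes (not lab data)
t1, dof1 = t_pooled(0.100, 0.002, 5, 0.104, 0.002, 5)
t2, dof2 = t_unequal(0.100, 0.002, 5, 0.104, 0.002, 5)
print("pooled:  t_calc =", t1, " dof =", dof1)
print("unequal: t_calc =", t2, " dof =", dof2)
```

When the two standard deviations and sample sizes happen to be equal, as in this made-up example, both methods give the same t_calc and the same degrees of freedom, which is a handy sanity check. In general the second formula gives a non-integer number of degrees of freedom.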


In [None]:
# Paste in your F-test IF statements and dof and t calculation code from Lab 1 here.  
# Remember to add a test to make sure that t_calc is positive!

## Using the t-test to make a decision
Now we have another decision to make, using that t_calc. If t_calc is greater than t_critical, the means are significantly different, and therefore the two indicators are NOT producing the same answer. We would not want to combine those data sets, or use those two indicators interchangeably.


To make this decision, again, we'll use scipy to pull the right critical value. Then write your own if/else statement to print out whether the two data sets have similar means or different means.
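As a reminder of what that lookup can look like, here is a minimal sketch. It assumes a two-sided test at 95% confidence; the function name `means_differ` and the t_calc and degrees-of-freedom values are made up for illustration.

```python
import scipy.stats as stats

def means_differ(t_calc, dof, confidence=0.95):
    """Return (decision, t_crit) for a two-sided t-test."""
    # two-sided: half of the leftover probability sits in each tail
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, dof)
    return bool(abs(t_calc) > t_crit), t_crit

# Hypothetical values, not lab data
different, t_crit = means_differ(3.16, 8)
if different:
    print("t_calc > t_critical (" + str(t_crit) + "): the means are significantly different")
else:
    print("t_calc <= t_critical (" + str(t_crit) + "): the means are not significantly different")
```

`stats.t.ppf` accepts non-integer degrees of freedom, so the same function works for the unequal-variance case too.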

In [None]:
# Paste in your t-test code from Lab 1 here!

Repeat this process for another pair of indicators! 

# Manipulating Titration Data

We will discuss how to get your data formatted to use it in these post-lab calculations!

First, let's just take a look at the pH as a function of titrant added, and make sure our data has imported correctly!


In [None]:
import math
import matplotlib.pyplot as plt
import numpy as np


# This is how we'll import our data; it should be saved as a .csv file (but NOT UTF-8 csv), 
# in the same folder as this notebook. 
# Make sure you have volume in the first column and pH in the second column, with no headings on the data!

csv = np.genfromtxt ('sample_titration.csv', delimiter=",")
volume = csv[:,0]   # first column: volume of titrant added
pH = csv[:,1]       # second column: pH

plt.plot(volume, pH, 'ro')

# Add labels on the x and y axis, always including units.
plt.xlabel('titrant added (mL)')
plt.ylabel('pH')


#let's just take a look!
plt.show()
      
print(csv)

## Finding equivalence points

Now, paste in your code from PythonLab 1 for taking the derivative of the titration curve data.  Note that the slope will be negative, so you may want to transform the data to positive numbers ... or use the np.argmin function to find the minimum for a negative peak.
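If you need a refresher, the idea can be sketched like this. The sketch uses `np.gradient` on a made-up, smooth titration curve with its endpoint placed at 12.5 mL; your PythonLab 1 code may have used a different approach (for example `np.diff`), and the data here are synthetic, not lab data.

```python
import numpy as np

# Made-up titration curve: pH falls through an endpoint at 12.5 mL
volume = np.arange(0.0, 25.5, 0.5)                 # mL of titrant added
pH = 7.0 - 4.0 * np.tanh((volume - 12.5) / 0.5)

# First derivative dpH/dV; passing volume lets np.gradient handle uneven spacing
dpH_dV = np.gradient(pH, volume)

# pH is falling here, so the steepest point is the most negative slope
endpoint_index = np.argmin(dpH_dV)
endpoint_volume = volume[endpoint_index]

print("Steepest slope at " + str(endpoint_volume) + " mL")
```

For a curve where pH rises instead, `np.argmax(dpH_dV)` plays the same role, and `np.argmax(np.abs(dpH_dV))` works in either direction.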

In [None]:
# Import code here!


Notice that the sharp peaks are at the equivalence points! Remember, a derivative is just a measure of how quickly your function is changing, so its magnitude is largest where the curve is steepest. This makes the equivalence points a lot easier to see! 
Paste in your code from PythonLab 1 to do this.

In [None]:
# Import code here!

Does this endpoint match a visual inspection of the graph?  If so, record this endpoint volume in your ELN.  If not, check your raw data for a repeated reading (two identical points in a row).  If you find one, you can delete it in Excel and re-import, or use numpy's delete function. 
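A repeated reading matters because a zero volume step makes the slope ΔpH/ΔV undefined. One way to find and drop such points with `np.delete` is sketched below; the arrays are made up for illustration and the variable names `repeats`, `volume_clean` and `pH_clean` are hypothetical.

```python
import numpy as np

# Made-up data with one repeated reading (the point at index 3 repeats index 2)
volume = np.array([0.0, 1.0, 2.0, 2.0, 3.0, 4.0])
pH = np.array([11.0, 10.8, 10.5, 10.5, 9.9, 6.0])

# np.diff compares each point with the one before it; a repeat has zero change in both
is_repeat = (np.diff(volume) == 0) & (np.diff(pH) == 0)
repeats = np.where(is_repeat)[0] + 1      # +1 points at the second copy

# np.delete returns new arrays with those indices removed
volume_clean = np.delete(volume, repeats)
pH_clean = np.delete(pH, repeats)

print("Removed indices:", repeats)
print(volume_clean)
print(pH_clean)
```

Remember to delete the same indices from both arrays so that volume and pH stay lined up.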



## Gran Plots

Our second method for finding endpoints is the Gran plot. This works by highlighting the region near the equivalence point and zeroing in on the slope of the line as it approaches the endpoint. <b> It is helpful to already have an idea where your endpoint is, roughly, before you begin this section. </b>


Gran plots can be helpful, because they take advantage of additional points leading up to the endpoint. Since the pH meter is least accurate near the endpoint, including these additional points in your calculation can help reduce some of the error coming from the sluggish kinetics of the pH meter itself.

<b>Note that the Gran plot is done differently depending on whether you are titrating an acid or a base.</b>  When titrating a weak base with a strong acid (in other words, the strong acid is up in the burette), we need to graph $ V \times 10^{pH} $ vs. $ V $, where $ V $ is the volume of titrant added.
When titrating a weak acid with a strong base, we need to graph $ V \times 10^{-pH} $ vs. $ V $.
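The two cases differ only in the sign of the exponent, so they can be handled by one small helper. This is a sketch under that assumption; the function name `gran_function` and its `titrant` argument are hypothetical, and the demo numbers are chosen only to make the arithmetic easy to check.

```python
import numpy as np

def gran_function(volume, pH, titrant):
    """Return the Gran plot y-values for the given titrant type."""
    if titrant == "strong acid":      # weak base in the flask
        return volume * 10.0 ** pH
    if titrant == "strong base":      # weak acid in the flask
        return volume * 10.0 ** (-pH)
    raise ValueError("titrant must be 'strong acid' or 'strong base'")

# Easy-to-check demo values, not lab data
demo_volume = np.array([1.0, 2.0])
demo_pH = np.array([2.0, 3.0])

gran_acid = gran_function(demo_volume, demo_pH, "strong acid")
gran_base = gran_function(demo_volume, demo_pH, "strong base")
print(gran_acid)
print(gran_base)
```

You would then plot the result against volume, for example `plt.plot(volume, gran_function(volume, pH, "strong acid"), 'ro')`.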


In [None]:
# Import and modify your Gran plot code here.

The endpoint is where this graph begins to approach zero, in the steepest part of the curve. To fit this to a line, ideally we would use a linear regression on the linear data just before the endpoint (in this case, what data should you choose?)

In [None]:
# Import and modify your code here from PythonLab 1 for trimming down the Gran plot data to the linear region.

We'd like to get the most linear data close to the endpoint -- not including that last bit of curvature on the right.  Adjust the trim and start points to lose any curved data on either extreme of the plot.
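To see why the trimming matters, here is a sketch on synthetic Gran data: a straight line that reaches zero at 12.5 mL, with some artificial curvature added to the first two and last two points. The names `start` and `trim` follow the wording above, but how your PythonLab 1 code defines them is an assumption. The fit uses `np.polyfit` just to keep the example short.

```python
import numpy as np

# Synthetic Gran data (not lab data): ideal line crosses zero at 12.5 mL
volume = np.arange(8.0, 13.0, 0.5)        # 8.0, 8.5, ..., 12.5 mL
Gran = 2.0 * (12.5 - volume)
Gran[:2] -= 0.5                           # fake curvature at the left end
Gran[-2:] += 0.3                          # fake curvature at the right end

start = 2    # number of points to drop from the left
trim = 2     # number of points to drop from the right
volume_trim = volume[start:len(volume) - trim]
Gran_trim = Gran[start:len(Gran) - trim]

# Compare the x-intercept (-b/m) with and without trimming
m_all, b_all = np.polyfit(volume, Gran, 1)
m_trim, b_trim = np.polyfit(volume_trim, Gran_trim, 1)
print("x-intercept, all points:     " + str(-b_all / m_all))
print("x-intercept, trimmed points: " + str(-b_trim / m_trim))
```

Writing the slice as `len(volume) - trim` rather than `-trim` means that setting `trim = 0` keeps every point on the right instead of returning an empty array.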

Once this is done, we want the x-intercept of this plot, which will be our first equivalence point!

Remember, if $ y= mx +b $ we solve for the x-intercept by plugging in a zero for y, and solve for x. 
Rearranging, you'll get $$ x_{intercept} = \frac{-b}{m} $$

In [None]:
# getting the equation of the line

import scipy.stats

# The linear regression function in the scipy stats module returns 5 values: slope, intercept, the correlation
# coefficient r (square it to get R-squared), and then a p-value p and the standard error of the slope s_m
# We'll ignore the last two for the moment, since all we really need right now is the equation of the line
m, b, r, p, s_m = scipy.stats.linregress(volume_trim, Gran_trim)

# solve for the x-intercept (where y = 0)

x_intercept = -b / m

print ("The 1st equivalence point determined by the Gran plot is " + str(x_intercept) + " mL") 

Record this endpoint in your ELN, too.  How different is this value from the 1st equivalence point determined by the derivative method?  
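One simple way to put a number on "how different" is the percent difference relative to the average of the two values. The sketch below assumes that definition; the function name `percent_difference` and the two endpoint volumes are made up for illustration.

```python
def percent_difference(a, b):
    """Percent difference between two values, relative to their average."""
    return abs(a - b) / ((a + b) / 2) * 100

# Hypothetical endpoint volumes in mL, not lab data
derivative_endpoint = 12.40
gran_endpoint = 12.60

difference = percent_difference(derivative_endpoint, gran_endpoint)
print("The two endpoints differ by " + str(round(difference, 2)) + " %")
```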

## Submission Instructions


Attach this completed Jupyter notebook file to your Results and Analysis page.