# Welch's T-Test Exercise

In [2]:
import numpy
import scipy.stats
import pandas

Perform a t-test on two sets of baseball data (left-handed and right-handed hitters).

You will be given a CSV file with three columns: a player's `name`, their `handedness` (`L` for left-handed or `R` for right-handed), and their career batting average (`avg`).
    
Write a function that reads the CSV file into a pandas data frame and runs Welch's t-test on the two cohorts defined by handedness.
    
One cohort should be a data frame of right-handed batters, and the other a data frame of left-handed batters.
    
We have included the `scipy.stats` library to help you write or implement Welch's t-test: http://docs.scipy.org/doc/scipy/reference/stats.html
    
At a significance level of 0.05 (that is, 95% confidence), if there is no statistically significant difference between the two cohorts, return a tuple consisting of `True` and the tuple returned by `scipy.stats.ttest_ind`.  
    
If there is a statistically significant difference, return a tuple consisting of `False` and the tuple returned by `scipy.stats.ttest_ind`.
    
 For example, the tuple that you return may look like:
 ```
 (False, (9.93570222, 0.000023))
 ```
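
For reference, Welch's statistic can also be computed by hand and checked against `scipy.stats.ttest_ind(..., equal_var=False)`. The following is a minimal sketch on made-up samples: the helper name `welch_t` and the data are assumptions for illustration and are not part of the exercise.

```python
import numpy
import scipy.stats


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    a = numpy.asarray(a, dtype=float)
    b = numpy.asarray(b, dtype=float)
    # Variance of each sample mean, using the unbiased (ddof=1) sample variance
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / numpy.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return t, dof


# Made-up samples, for illustration only
rng = numpy.random.default_rng(0)
sample_a = rng.normal(0.27, 0.03, size=40)
sample_b = rng.normal(0.25, 0.05, size=60)

t_manual, dof = welch_t(sample_a, sample_b)
# Two-sided p-value from the t distribution's survival function
p_manual = 2 * scipy.stats.t.sf(abs(t_manual), dof)

t_scipy, p_scipy = scipy.stats.ttest_ind(sample_a, sample_b, equal_var=False)
```

Unlike Student's t-test, this form does not pool the two variances, which is why `equal_var=False` must be passed to `ttest_ind`.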

In [15]:
def compare_averages(filename):
    df = pandas.read_csv(filename)
    
    # Lefties
    df_l = df[df['handedness'] == 'L']
    
    # Righties
    df_r = df[df['handedness'] == 'R']
    
    # Run the Welch's t-test
    t_statistic, p_value = scipy.stats.ttest_ind(df_l['avg'], df_r['avg'], equal_var=False)
    
    # Is the p-value greater than 0.05 (the significance level)?
    # If so, we cannot conclude that the two cohorts differ.
    no_difference = p_value > 0.05
    
    # Return the flag together with the nested (t, p) tuple
    return (no_difference, (t_statistic, p_value))

compare_averages('baseball_stats.csv')

(False, (9.9357022262420944, 3.8102742258887383e-23))
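
To try the same steps without `baseball_stats.csv`, the sketch below feeds a small in-memory CSV to `pandas.read_csv`, which accepts file-like objects as well as file names. The data, the function name `compare_averages_sketch`, and the `alpha` parameter are all hypothetical and only illustrate the decision rule.

```python
import io

import pandas
import scipy.stats

# Made-up data for illustration only (not taken from baseball_stats.csv)
csv_text = """name,handedness,avg
A,L,0.30
B,L,0.31
C,L,0.29
D,L,0.32
E,L,0.30
F,R,0.20
G,R,0.21
H,R,0.19
I,R,0.22
J,R,0.20
"""


def compare_averages_sketch(source, alpha=0.05):
    """Run Welch's t-test on the 'L' and 'R' cohorts read from `source`."""
    df = pandas.read_csv(source)
    lefties = df[df['handedness'] == 'L']['avg']
    righties = df[df['handedness'] == 'R']['avg']
    t_statistic, p_value = scipy.stats.ttest_ind(lefties, righties, equal_var=False)
    # True means "no statistically significant difference at level alpha"
    return (bool(p_value > alpha), (float(t_statistic), float(p_value)))


result = compare_averages_sketch(io.StringIO(csv_text))
```

Because the two made-up cohorts are far apart relative to their spread, the first element of `result` is `False`, matching the convention used above.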