## Homework 8
### Due Friday, 12/6/2024
### SPOT Evaluations
In class, we used some of Dr. Morsony's SPOT data to look at different versions of a t-test and ANOVA.  We also used them to look at what standard deviation actually means for discrete (rather than continuous) data.

We saw that, at least for one class, the answers to the questions are statistically the same.  And the differences got even smaller if we corrected for the data being discreet.

In this assignment, I'd like to see if we can go further and try to see is there really any information in these SPOT surveys?

In the SPOT_surveys directory are .csv files for 5 classes.  We're going to take a look at all of them.

(There is also a bunch more classes with PDF files only.  10 bonus points (1 free homework) to anyone who can figure out how to extract the data from these files.  At minimum, I'd like to get the n, av., and dev. for each question, but ideally I'd like the number of 1s, 2s, 3s, for each question as well from the histograms.)

## Question A

a) First, let's just see if the question responses are the same or different for each class.  Load each class (probably with a loop) and use the different ANOVA tests (ANOVA, Alexander Govern, and Kurskal) for all 9 questions in each class.  Do any of the classes have significant results (p<0.05)?

In [13]:
import numpy as np
from matplotlib import pyplot as plt
from scipy import stats as scipy_stats
import glob
import os
from math import log, ceil

In [3]:
survey_directory = "/home/stefin/Documents/csu-stan/cs4010/homework-8/SPOT_surveys/"

csv_files = glob.glob(survey_directory + '*.csv')
csv_files = np.asarray(csv_files)
num_questions = 9
significant_results = []

for file in csv_files:
    print(f'File: {file}')
    data = np.genfromtxt(
        file,
        skip_header=1,
        usecols=np.arange(2, 2 + num_questions),
        delimiter=',',
        encoding='windows-1252',
        filling_values=np.nan
    )
    
    questions_data = []
    for q in range(num_questions):
        score = data[:, q]
        valid_scores = score[(score >= 1) & (~np.isnan(score))]
        questions_data.append(valid_scores)
    
    anova_result = scipy_stats.f_oneway(*questions_data)
    print(f'ANOVA Result: {anova_result}')
    
    kruskal_result = scipy_stats.kruskal(*questions_data)
    print(f'Kruskal-Wallis Result: {kruskal_result}')
    
    alexander_result = scipy_stats.alexandergovern(*questions_data)
    print(f'Alexander Govern Result: {alexander_result}\n')
    
    # Check for significant results (p-value < 0.05)
    if anova_result.pvalue < 0.05:
        print(f'ANOVA: Significant result detected (p-value = {anova_result.pvalue})')
        significant_results.append((file, 'ANOVA', anova_result.pvalue))
    else:
        print(f'ANOVA: No significant result (p-value = {anova_result.pvalue})')
    
    if kruskal_result.pvalue < 0.05:
        print(f'Kruskal-Wallis: Significant result detected (p-value = {kruskal_result.pvalue})')
        significant_results.append((file, 'Kruskal-Wallis', kruskal_result.pvalue))
    else:
        print(f'Kruskal-Wallis: No significant result (p-value = {kruskal_result.pvalue})')

    if alexander_result.pvalue < 0.05:
        print(f'Alexander Govern: Significant result detected (p-value = {alexander_result.pvalue})')
        significant_results.append((file, 'Alexander Govern', alexander_result.pvalue))
    else:
        print(f'Alexander Govern: No significant result (p-value = {alexander_result.pvalue})')
    
    print(f'------------------------------------------------------------\n')

if significant_results:
    for result in significant_results:
        print(f'File: {result[0]}, Test: {result[1]}, p-value: {result[2]}')
else:
    print('No significant results found in any class.')

File: /home/stefin/Documents/csu-stan/cs4010/homework-8/SPOT_surveys/Spring_2024-ASTR3000-001.csv
ANOVA Result: F_onewayResult(statistic=np.float64(0.8630019405881474), pvalue=np.float64(0.5485311902636981))
Kruskal-Wallis Result: KruskalResult(statistic=np.float64(7.4683616164839055), pvalue=np.float64(0.4870423405035633))
Alexander Govern Result: AlexanderGovernResult(statistic=np.float64(8.41263598960705), pvalue=np.float64(0.39423455225713155))

ANOVA: No significant result (p-value = 0.5485311902636981)
Kruskal-Wallis: No significant result (p-value = 0.4870423405035633)
Alexander Govern: No significant result (p-value = 0.39423455225713155)
------------------------------------------------------------

File: /home/stefin/Documents/csu-stan/cs4010/homework-8/SPOT_surveys/Spring_2024-ASTR2100-001.csv
ANOVA Result: F_onewayResult(statistic=np.float64(1.3594591512412386), pvalue=np.float64(0.21663296127570275))
Kruskal-Wallis Result: KruskalResult(statistic=np.float64(11.1973745692430

## Question B

b) In class we saw an example of including the discrete error correction for Welch's t-test by using random trials for many test and averaging the resulting p-values.  Extend this to the ANOVA testing (ANOVA, Alexander Govern, and Kurskal) trials.  Then run the responses for each class through.  How do the new (average) p-values compare to the p-values from part a)?

In [4]:
significant_results = []
num_trials = 1000
measurement_error_std = 0.5

for file in csv_files:
    print(f'File: {file}')
    
    data = np.genfromtxt(
        file,
        skip_header=1,
        usecols=np.arange(2, 2 + num_questions),
        delimiter=',',
        encoding='windows-1252',
        filling_values=np.nan
    )
    
    questions_data = []
    for q in range(num_questions):
        score = data[:, q]
        valid_scores = score[(score >= 1) & (~np.isnan(score))]
        questions_data.append(valid_scores)
    
    anova_result = scipy_stats.f_oneway(*questions_data)
    kruskal_result = scipy_stats.kruskal(*questions_data)
    alexander_result = scipy_stats.alexandergovern(*questions_data)
    
    p_values = {
        'ANOVA': [],
        'Kruskal-Wallis': [],
        'Alexander Govern': []
    }
    
    for trial in range(num_trials):
        simulated_data = []
        for q in range(num_questions):
            noise = np.random.normal(loc=0.0, scale=measurement_error_std, size=questions_data[q].shape)
            simulated_scores = questions_data[q] + noise
            simulated_data.append(simulated_scores)
        
        sim_anova = scipy_stats.f_oneway(*simulated_data)
        p_values['ANOVA'].append(sim_anova.pvalue)
        
        sim_kruskal = scipy_stats.kruskal(*simulated_data)
        p_values['Kruskal-Wallis'].append(sim_kruskal.pvalue)
        
        sim_alexander = scipy_stats.alexandergovern(*simulated_data)
        p_values['Alexander Govern'].append(sim_alexander.pvalue)
    
    average_p_values = {}
    for test in p_values:
        valid_pvals = [p for p in p_values[test] if not np.isnan(p)]
        average_p = np.mean(valid_pvals)
        average_p_values[test] = average_p
    
    tests = [
        ('ANOVA', anova_result.pvalue),
        ('Kruskal-Wallis', kruskal_result.pvalue),
        ('Alexander Govern', alexander_result.pvalue)
    ]
    
    for test_name, original_p in tests:
        avg_p = average_p_values.get(test_name)
        print(f'{test_name}: Original p-value = {original_p:.4f}, Average p-value after correction = {avg_p:.4f}')
        if avg_p < 0.05:
            print(f'{test_name}: Significant after correction.')
            significant_results.append((file, test_name, avg_p))
        else:
            print(f'{test_name}: Not significant after correction.')
    
    print(f'------------------------------------------------------------\n')

if significant_results:
    for result in significant_results:
        print(f'File: {result[0]}, Test: {result[1]}, Average p-value: {result[2]:.4f}')
else:
    print('No significant results found in any class after error correction.')

File: /home/stefin/Documents/csu-stan/cs4010/homework-8/SPOT_surveys/Spring_2024-ASTR3000-001.csv
ANOVA: Original p-value = 0.5485, Average p-value after correction = 0.5341
ANOVA: Not significant after correction.
Kruskal-Wallis: Original p-value = 0.4870, Average p-value after correction = 0.4998
Kruskal-Wallis: Not significant after correction.
Alexander Govern: Original p-value = 0.3942, Average p-value after correction = 0.4647
Alexander Govern: Not significant after correction.
------------------------------------------------------------

File: /home/stefin/Documents/csu-stan/cs4010/homework-8/SPOT_surveys/Spring_2024-ASTR2100-001.csv
ANOVA: Original p-value = 0.2166, Average p-value after correction = 0.3395
ANOVA: Not significant after correction.
Kruskal-Wallis: Original p-value = 0.1908, Average p-value after correction = 0.3309
Kruskal-Wallis: Not significant after correction.
Alexander Govern: Original p-value = 0.2946, Average p-value after correction = 0.3747
Alexander Go

## Question C

c) We'd also like to know if there are differences between classes.  There's three ways I cna think to do this.  One would be to take all the responses to all the questions for each class and do and ANOVA for the 5 classes.  But the problem is the respones in each class aren't independant, so lumping them together isn't really valid.  (So don't do this.)

A second was is to treat the respones to each question as a data set (so you have 5*9=45 data sets) and do a big ANOVA (etc.) on all of them.  Give this a try, using the methods in part a) and b).  What results to you get?

In [5]:
significant_results = []

for q in range(num_questions):
    question_number = q + 1
    print(f'Question {question_number}')
    
    questions_data = []
    for file in csv_files:
        data = np.genfromtxt(
            file,
            skip_header=1,
            usecols=np.arange(2, 2 + num_questions),
            delimiter=',',
            encoding='windows-1252',
            filling_values=np.nan
        )
        
        score = data[:, q]
        valid_scores = score[(score >= 1) & (~np.isnan(score))]
        questions_data.append(valid_scores)
    
    anova_result = scipy_stats.f_oneway(*questions_data)
    kruskal_result = scipy_stats.kruskal(*questions_data)
    alexander_result = scipy_stats.alexandergovern(*questions_data)
    
    p_values = {
        'ANOVA': [],
        'Kruskal-Wallis': [],
        'Alexander Govern': []
    }
    
    for trial in range(num_trials):
        simulated_data = []
        for group in questions_data:
            noise = np.random.normal(loc=0.0, scale=measurement_error_std, size=group.shape)
            simulated_scores = group + noise
            simulated_data.append(simulated_scores)
        
        sim_anova = scipy_stats.f_oneway(*simulated_data)
        p_values['ANOVA'].append(sim_anova.pvalue)
        
        sim_kruskal = scipy_stats.kruskal(*simulated_data)
        p_values['Kruskal-Wallis'].append(sim_kruskal.pvalue)
        
        sim_alexander = scipy_stats.alexandergovern(*simulated_data)
        p_values['Alexander Govern'].append(sim_alexander.pvalue)
    
    average_p_values = {}
    for test in p_values:
        valid_pvals = [p for p in p_values[test] if not np.isnan(p)]
        average_p = np.mean(valid_pvals)
        average_p_values[test] = average_p
    
    tests = [
        ('ANOVA', anova_result.pvalue),
        ('Kruskal-Wallis', kruskal_result.pvalue),
        ('Alexander Govern', alexander_result.pvalue)
    ]
    
    for test_name, original_p in tests:
        avg_p = average_p_values.get(test_name)
        print(f'{test_name}: Original p-value = {original_p:.4f}, Average p-value after correction = {avg_p:.4f}')
        if avg_p < 0.05:
            print(f'{test_name}: Significant after correction.')
            significant_results.append((f'Question {question_number}', test_name, avg_p))
        else:
            print(f'{test_name}: Not significant after correction.')
    
    print(f'------------------------------------------------------------\n')

print('Summary of Significant Results after Error Correction (average p-value < 0.05):')
if significant_results:
    for result in significant_results:
        print(f'Question: {result[0]}, Test: {result[1]}, Average p-value: {result[2]:.4f}')
else:
    print('No significant results found in any question after error correction.')

Question 1
ANOVA: Original p-value = 0.3327, Average p-value after correction = 0.4056
ANOVA: Not significant after correction.
Kruskal-Wallis: Original p-value = 0.3279, Average p-value after correction = 0.4285
Kruskal-Wallis: Not significant after correction.
Alexander Govern: Original p-value = 0.1227, Average p-value after correction = 0.2967
Alexander Govern: Not significant after correction.
------------------------------------------------------------

Question 2
ANOVA: Original p-value = 0.5899, Average p-value after correction = 0.5564
ANOVA: Not significant after correction.
Kruskal-Wallis: Original p-value = 0.6571, Average p-value after correction = 0.5828
Kruskal-Wallis: Not significant after correction.
Alexander Govern: Original p-value = 0.7051, Average p-value after correction = 0.5897
Alexander Govern: Not significant after correction.
------------------------------------------------------------

Question 3
ANOVA: Original p-value = 0.4046, Average p-value after corre

## Question D

d) The third way would be to treat each question seperately and compare the classes for that question with an ANOA.  Give this a try.  What p-values do you get for each question?  And there any questions where (at least) one class gave a significantly different response?

In [12]:
significant_results = []

for q in range(num_questions):
    question_number = q + 1
    print(f'Question {question_number}')
    
    questions_data = []
    for file in csv_files:
        data = np.genfromtxt(
            file,
            skip_header=1,
            usecols=np.arange(2, 2 + num_questions),
            delimiter=',',
            encoding='windows-1252',
            filling_values=np.nan
        )
        
        score = data[:, q]
        valid_scores = score[(score >= 1) & (~np.isnan(score))]
        questions_data.append(valid_scores)
    
    anova_result = scipy_stats.f_oneway(*questions_data)
    print(f'ANOVA Result: {anova_result}')
    
    p_value = anova_result.pvalue
    if p_value < 0.05:
        print(f'Significant difference detected (p-value = {p_value:.4f})')
        significant_results.append((question_number, 'ANOVA', p_value))
    else:
        print(f'No significant difference (p-value = {p_value:.4f})')
    
    print(f'------------------------------------------------------------\n')

# Summary of significant results
print('Summary of Significant Results (p-value < 0.05) per Question:')
if significant_results:
    for result in significant_results:
        print(f'Question {result[0]}: Test = {result[1]}, p-value = {result[2]:.4f}')
else:
    print('No significant differences found in any question across classes.')

Question 1
ANOVA Result: F_onewayResult(statistic=np.float64(1.1665633139618128), pvalue=np.float64(0.3326562047470449))
No significant difference (p-value = 0.3327)
------------------------------------------------------------

Question 2
ANOVA Result: F_onewayResult(statistic=np.float64(0.7067442564135814), pvalue=np.float64(0.5898958230273519))
No significant difference (p-value = 0.5899)
------------------------------------------------------------

Question 3
ANOVA Result: F_onewayResult(statistic=np.float64(1.0163744207967158), pvalue=np.float64(0.4045681731857064))
No significant difference (p-value = 0.4046)
------------------------------------------------------------

Question 4
ANOVA Result: F_onewayResult(statistic=np.float64(1.7668202823070094), pvalue=np.float64(0.14476552601743292))
No significant difference (p-value = 0.1448)
------------------------------------------------------------

Question 5
ANOVA Result: F_onewayResult(statistic=np.float64(1.097110824839193), pvalue

## Question E

e) Really what were doing isn't totally valid anyway - the reponses to the SPOT survey aren't actually numbers, they're strongly agree, agree, etc.  Really, a 1 or 2 is a "postive" reponse and an 4 or 5 is a "negative" response, and a 3 is "neutral".  If we group "postive" and "negative" repsones, and irgnore neutral respones (which is not a great idea), we can treat the repsones as a binomial distribution.  Doing this, we can estimate how many respones we need in a class to get useful data.  For example, how many students would need to respond for you to be confident more students gave positive respones than negative responses?  You can calculate this using the Chernoff_trials, if you pick a good value for p, delta, and epsilon.  What might some good values be?  How many respones are needed for those values?

(A one-sided version of the Chernoff_tails calculation might actually be more useful here, but don't worry about coming up with one for the homework.)

In [17]:
p = 0.6
delta = 0.05
epsilon = 0.05

def Chernoff_trials(p,delta,epsilon):
    # delta is the accuracy we want
    # epsilon is the probbility of obtaining the accuracy

    # We want the probability to equal epsilon for n trials

    # epsilon = 2*e^[−n*(delta^2)/(2*p*(1-p))]

    # ln(epsilon) = 2*p*(1-p) / (n*delta^2) * ln(2)

    # n = 2*p*(1-p)/(delta^2) * ln(2/epsilon)

    n = 2*p*(1-p)/(delta**2) * np.log(2/epsilon)

    return(n)

required_sample_size = calculate_sample_size(p, delta, epsilon)

print(f'Required sample size to be confident that more students gave positive responses than negative responses:')
print(f'p (True Proportion) = {p}')
print(f'delta (Error Tolerance) = {delta}')
print(f'epsilon (Failure Probability) = {epsilon}')
print(f'n (Sample Size) = {required_sample_size}')

Required sample size to be confident that more students gave positive responses than negative responses:
p (True Proportion) = 0.6
delta (Error Tolerance) = 0.05
epsilon (Failure Probability) = 0.05
n (Sample Size) = 3995


## Question F

f) Based on your results, do you think the survey questions in SPOT contain much (statistically valid) information?

Based on the actual sample sizes compared to the required ~4,000 responses, the SPOT survey questions don't contain much statistically valid information.