<a href="https://colab.research.google.com/github/nathanbollig/vet-graduate-expectations-survey/blob/main/SVM_WVMA_specialists.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Veterinary graduate expectations survey

This notebook compares graduate expectations between SVM specialists and WVMA specialists. Start by uploading the data to the working directory. Two files are required:

1.   `SVM.xlsx`: SVM graduate expectations survey results
2.   `WVMA.xlsx`: WVMA graduate expectations survey results
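
Before running the remaining cells, it can help to confirm that both files are actually present. The following is a minimal sketch using only the standard library; the helper name `missing_inputs` is an illustration and not part of the original notebook.

```python
from pathlib import Path

# File names taken from the list above
REQUIRED_FILES = ("SVM.xlsx", "WVMA.xlsx")


def missing_inputs(directory=".", required=REQUIRED_FILES):
    """Return the required file names that are absent from `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


missing = missing_inputs()
if missing:
    print("Upload these files before continuing:", ", ".join(missing))
```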

In [1]:
! pip install xlsxwriter


In [2]:
import pandas as pd
import numpy as np
from scipy.stats import kruskal

## Read in SVM data

In [3]:
# Use top row as header and skip second header row
svm = pd.read_excel('SVM.xlsx', header=0, skiprows=lambda x: x in [1])  

In [4]:
#svm.head(3)

In [5]:
# Read in questions from second header row and associate with column names
question_svm = {}

top_rows = pd.read_excel('SVM.xlsx', nrows=2) 

for col in list(top_rows.columns):
    question_svm[col] = top_rows.iloc[0][col]

## Read in WVMA data

In [6]:
# Use top row as header and skip second header row
wvma = pd.read_excel('WVMA.xlsx', header=0, skiprows=lambda x: x in [1])  

# Read in questions from second header row and associate with column names
question_wvma = {}

top_rows_wvma = pd.read_excel('WVMA.xlsx', nrows=2) 

for col in list(top_rows_wvma.columns):
    question_wvma[col] = top_rows_wvma.iloc[0][col]

In [7]:
def encode_expectation(response_string):
    """Map a survey response to an ordinal code: -1 (empty) or 0 to 4."""
    # Already encoded, e.g. the column was processed by an earlier call
    if isinstance(response_string, (int, np.integer)):
        return int(response_string)

    # Encode missing values (NaN or None) as -1
    if not isinstance(response_string, str):
        if pd.isnull(response_string):
            return -1
        raise ValueError('Unexpected response type: %r' % (response_string,))

    # Encode string. 'indirect supervision' must be tested before
    # 'direct supervision' because the latter is a substring of the former.
    s = response_string.lower()
    if 'no expectation' in s:
        return 0
    elif 'with assistance' in s:
        return 1
    elif 'indirect supervision' in s:
        return 3
    elif 'direct supervision' in s:
        return 2
    elif 'independently' in s:
        return 4
    else:
        raise ValueError('Expected performance response was not formatted as expected: %r' % (response_string,))
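
The order of the substring checks in `encode_expectation` matters, because "direct supervision" is contained in "indirect supervision". One way to make that ordering explicit is a table-driven variant. This is only a sketch: `LEVELS`, `encode_level`, and the sample response strings are hypothetical and do not come from the survey files.

```python
# Phrase-to-code table; "indirect supervision" is listed before
# "direct supervision" so that the longer phrase is matched first.
LEVELS = [
    ("no expectation", 0),
    ("with assistance", 1),
    ("indirect supervision", 3),
    ("direct supervision", 2),
    ("independently", 4),
]


def encode_level(text):
    """Return the code of the first phrase found in `text` (case-insensitive)."""
    lowered = text.lower()
    for phrase, code in LEVELS:
        if phrase in lowered:
            return code
    raise ValueError("Unrecognized response: %r" % text)


# Made-up response strings, for illustration only
print(encode_level("Perform with indirect supervision"))
print(encode_level("Perform with direct supervision"))
```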

Generalists and specialists are distinguished by the answer to `wvma.Q49`, which is equivalent to `svm.Q59`. We assume that an empty answer corresponds to a generalist and a non-empty answer to a specialist. This notebook keeps only the specialists.
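
The rule can be illustrated without the survey files. The sketch below treats `None` and NaN as empty answers, which is how pandas `notnull()` behaves; the helper `is_specialist` and the short list of answers are assumptions made for this example.

```python
import math
from collections import Counter


def is_specialist(answer):
    """True when the credential answer is present (not None and not NaN)."""
    if answer is None:
        return False
    if isinstance(answer, float) and math.isnan(answer):
        return False
    return True


# Toy answers, not taken from the survey data
answers = [
    "American College of Veterinary Surgeons",
    float("nan"),
    None,
    "Other non-AVMA recognized specialty credentials",
]

groups = Counter("specialist" if is_specialist(a) else "generalist" for a in answers)
print(groups)
```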

In [9]:
from collections import Counter
Counter(wvma.Q49)

Counter({'American Board of Veterinary Practitioners': 3,
         'American College of Theriogenologists': 2,
         'American College of Veterinary Anesthesia and Analgesia': 1,
         'American College of Veterinary Anesthesia and Analgesia,American College of Veterinary Dermatology,American College of Veterinary Emergency & Critical Care,American College of Veterinary Internal Medicine,American College of Veterinary Ophthalmologists,American College of Veterinary Surgeons,American Veterinary Dental College': 1,
         'American College of Veterinary Internal Medicine': 2,
         'American College of Veterinary Nutrition': 1,
         'American College of Veterinary Pathologists': 1,
         'American College of Veterinary Surgeons': 1,
         'American Veterinary Dental College': 1,
         'Other non-AVMA recognized specialty credentials': 9,
         nan: 153})

In [11]:
Counter(svm.Q59)

Counter({'American Board of Veterinary Practitioners': 2,
         'American College of Veterinary Anesthesia and Analgesia': 2,
         'American College of Veterinary Clinical Pharmacology,American College of Veterinary Internal Medicine': 1,
         'American College of Veterinary Dermatology': 1,
         'American College of Veterinary Emergency & Critical Care': 2,
         'American College of Veterinary Internal Medicine': 7,
         'American College of Veterinary Internal Medicine,American College of Veterinary Radiology': 1,
         'American College of Veterinary Nutrition': 1,
         'American College of Veterinary Ophthalmologists': 2,
         'American College of Veterinary Pathologists': 4,
         'American College of Veterinary Preventive Medicine,American College of Veterinary Radiology': 1,
         'American College of Veterinary Radiology': 3,
         'American College of Veterinary Surgeons': 5,
         'American College of Zoological Medicine': 3,
    

In [12]:
# Keep only specialists (respondents with a non-empty credential answer)
wvma = wvma[wvma['Q49'].notnull()].copy()
svm = svm[svm['Q59'].notnull()].copy()

In [13]:
len(svm)

38

In [14]:
len(wvma)

22

In [15]:
# Filter dataframe to only companion animal respondents (may have responded to other species too)
ca_svm = svm[svm['Q1'].str.contains('Companion Animal (canine and/or feline)', na=False, regex=False)].copy()

In [16]:
# Filter dataframe to only companion animal respondents (may have responded to other species too)
ca_wvma = wvma[wvma['Q1'].str.contains('Companion Animal (canine and/or feline)', na=False, regex=False)].copy()

## Question analysis code

In [17]:
def analyze_question(question_number, filtered_svm_df, filtered_wvma_df, n_subquestions, alpha=0.05, verbose=True):
    """
    Perform an analysis of a given question on a species-filtered dataframe.

    Note: the answer columns of both dataframes are encoded in place.

    Inputs:
        question_number: main question number to analyze
        filtered_svm_df: svm dataframe filtered to respondents with the desired species area
        filtered_wvma_df: wvma dataframe filtered to respondents with the desired species area
        n_subquestions: number of subquestions in the main question
        alpha: significance level for the statistical test
        verbose: if True, print a one-line summary of the pooled result

    Outputs:
        table: summary table with one row per subquestion
        (pooled_stat, pooled_p, pooled_diff_mean): tuple of statistics describing output of Kruskal test on data pooled across subquestions
        svm_pooled: list of svm responses pooled across subquestions
        wvma_pooled: list of wvma responses pooled across subquestions
        sig_count: number of subquestions with significant difference detected (between svm and wvma responses), according to Kruskal test applied at subquestion level

    """

    svm_counts = np.zeros((n_subquestions, 6), dtype=int) # Row for each question, column for empty (-1), 0, 1, 2, 3, and 4 responses
    wvma_counts = np.zeros((n_subquestions, 6), dtype=int) # Row for each question, column for empty (-1), 0, 1, 2, 3, and 4 responses
    rows = []
    svm_pooled = []
    wvma_pooled = []
    sig_count = 0

    for i in range(1, n_subquestions+1):
        qkey = "Q" + str(question_number) + "_" + str(i)
        qstring = question_wvma[qkey].split('-')[2]

        # Encoding (modifies the dataframes in place)
        filtered_svm_df[qkey] = filtered_svm_df[qkey].apply(encode_expectation)
        filtered_wvma_df[qkey] = filtered_wvma_df[qkey].apply(encode_expectation)

        # svm tally
        counts = filtered_svm_df[qkey].value_counts(dropna=False)
        for key in counts.keys():
            svm_counts[i-1][key+1] += counts[key] # question index is 1-based; keys range from -1 to 4
        counts = svm_counts[i-1][1:] # counts of 0, 1, 2, 3, and 4
        svm_num_responses = np.sum(counts)
        svm_mean = (0*counts[0] + 1*counts[1] + 2*counts[2] + 3*counts[3] + 4*counts[4]) / svm_num_responses

        # wvma tally
        counts = filtered_wvma_df[qkey].value_counts(dropna=False)
        for key in counts.keys():
            wvma_counts[i-1][key+1] += counts[key]
        counts = wvma_counts[i-1][1:] # counts of 0, 1, 2, 3, and 4
        wvma_num_responses = np.sum(counts)
        wvma_mean = (0*counts[0] + 1*counts[1] + 2*counts[2] + 3*counts[3] + 4*counts[4]) / wvma_num_responses
        
        # Get data
        svm_data = list(filtered_svm_df[qkey])
        wvma_data = list(filtered_wvma_df[qkey])

        # Remove empty values from data
        svm_data = [x for x in svm_data if x != -1]
        wvma_data = [x for x in wvma_data if x != -1]

        assert(svm_num_responses == len(svm_data))
        assert(wvma_num_responses == len(wvma_data))

        # compare samples
        stat, p = kruskal(svm_data, wvma_data)

        # Determine significance
        if p > alpha:
            sig = ""
        else:
            sig = "*"
            sig_count += 1

        # Cache for pooled data
        svm_pooled.extend(svm_data)
        wvma_pooled.extend(wvma_data)

        # Cache for table of results
        row = [qstring] + list(svm_counts[i-1]) + [svm_mean, svm_num_responses] + list(wvma_counts[i-1]) + [wvma_mean, wvma_num_responses, svm_mean-wvma_mean, stat, p, sig]
        rows.append(row)

    # Assemble table of results
    table = pd.DataFrame(rows, columns=["Subquestion", "svm: empty", "svm: 0", "svm: 1", "svm: 2", "svm: 3", "svm: 4", "svm: avg", "svm: num responses", "wvma: empty", "wvma: 0", "wvma: 1", "wvma: 2", "wvma: 3", "wvma: 4", "wvma: avg", "wvma: num responses", "Diff Mean (svm-wvma)", "stat", "pval", "sig"])

    # Apply Kruskal test to pooled data
    pooled_stat, pooled_p = kruskal(svm_pooled, wvma_pooled)
    pooled_diff_mean = np.mean(svm_pooled) - np.mean(wvma_pooled)

    # Print
    if verbose:
        print('Pooled Q%s: stat=%.3f, p=%.2e, diff_mean (svm-wvma)=%.3f, sig_subq=%s/%s' % (question_number, pooled_stat, pooled_p, pooled_diff_mean, sig_count, n_subquestions))

    return table, (pooled_stat, pooled_p, pooled_diff_mean), svm_pooled, wvma_pooled, sig_count
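
For intuition about the `stat` and `pval` columns: with two samples, the Kruskal-Wallis H statistic is a comparison of rank sums, referred to a chi-square distribution with one degree of freedom. The sketch below works through that computation with the standard library only. It is an illustration, not SciPy's implementation; `kruskal_two_groups` and the toy scores are made up for this example, and the result should agree with `scipy.stats.kruskal` on the same inputs.

```python
import math
from collections import Counter


def kruskal_two_groups(a, b):
    """Kruskal-Wallis H and p-value for two samples, with tie correction.

    Undefined (division by zero) when every value is identical.
    """
    pooled = sorted(a + b)
    n = len(pooled)

    # Assign each distinct value the mean of the ranks it occupies
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # H = 12 / (n (n + 1)) * sum(R_g^2 / n_g) - 3 (n + 1)
    h = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in (a, b))
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    ties = Counter(pooled)
    h /= 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)

    # Chi-square survival function with 1 degree of freedom
    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Toy encoded responses (0 to 4), not survey data
h, p = kruskal_two_groups([4, 4, 3, 4], [2, 3, 1, 2])
print("H=%.3f, p=%.3f" % (h, p))
```

Because survey responses take only five values, ties are the norm here, which is why the tie correction is included.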

In [18]:
table, subq_pooled_result, svm_data, wvma_data, sig_count = analyze_question(16, ca_svm, ca_wvma, n_subquestions=25)

Pooled Q16: stat=131.976, p=1.51e-30, diff_mean (svm-wvma)=0.673, sig_subq=18/25


In [19]:
table

Unnamed: 0,Subquestion,svm: empty,svm: 0,svm: 1,svm: 2,svm: 3,svm: 4,svm: avg,svm: num responses,wvma: empty,wvma: 0,wvma: 1,wvma: 2,wvma: 3,wvma: 4,wvma: avg,wvma: num responses,Diff Mean (svm-wvma),stat,pval,sig
0,Obtain history and perform complete PE,2,1,0,0,0,30,3.870968,31,1,0,0,0,3,11,3.785714,14,0.085253,3.564845,0.059015,
1,Perform ophthalmic exam,2,2,0,2,6,21,3.419355,31,1,1,2,2,3,6,2.785714,14,0.633641,2.980227,0.084287,
2,Perform otoscopic exam,2,2,0,1,1,27,3.645161,31,1,1,1,1,3,8,3.142857,14,0.502304,4.230777,0.039697,*
3,Perform neurologic exam,2,2,0,1,5,23,3.516129,31,1,1,2,5,4,2,2.285714,14,1.230415,14.09235,0.000174,*
4,Perform orthopedic exam,2,2,0,1,5,23,3.516129,31,1,2,1,2,6,3,2.5,14,1.016129,10.301885,0.001329,*
5,Develop problem list and rank order different...,2,0,0,1,5,25,3.774194,31,1,0,1,1,6,6,3.214286,14,0.559908,6.515718,0.010693,*
6,Develop and interpret diagnostic plan,2,1,0,1,4,25,3.677419,31,1,0,2,1,7,4,2.928571,14,0.748848,10.408985,0.001254,*
7,Develop treatment plan,2,2,0,1,6,22,3.483871,31,1,1,2,2,5,4,2.642857,14,0.841014,7.138841,0.007543,*
8,Calculate medication dosage,2,2,0,0,1,28,3.709677,31,1,1,0,0,3,10,3.5,14,0.209677,2.244015,0.134132,
9,Write prescription,2,2,0,0,0,29,3.741935,31,1,1,1,0,3,9,3.285714,14,0.456221,5.354341,0.020671,*


## Group analysis code

In [20]:
# cache data across all groups
group_data = []
group_columns = ["Group", "Pooled stat", "Pooled p", "Pooled diff_mean (svm-wvma)", "Num questions", "Fraction of sig questions", "Pooled num svm reponses", "Pooled num wvma responses"]

In [21]:
# cache tables
output_tables = []
output_tables_sheet_names = []

# cache subquestion table data
output_subq_data = []

In [22]:
# Input info about question group

question_list = [16,17,7,8,9,10,11,12]
n_subq_list = [25,10,25,8,4,12,13,3]
question_strings = ['Medical Procedures',
                    'Preventive Medicine/Population Health Procedures',
                    'Surgical Procedures', 
                    'Anesthetic Procedures', 
                    'Reproductive Procedures',
                    'Diagnostic Imaging Procedures',
                    'Clinical Pathology Procedures',
                    'Diagnostic Necropsy Procedures']

assert(len(question_list) == len(n_subq_list))
assert(len(n_subq_list) == len(question_strings))

In [23]:
# Code to analyze all questions within the group

def analyze_group(question_list, n_subq_list, question_strings, filtered_svm_df, filtered_wvma_df, alpha=0.05):
    svm_pooled = [] # now pooling over entire group
    wvma_pooled = []
    rows = []
    sig_count = 0
    subq_tables = []
    subq_tables_names = []

    for i in range(len(question_list)):
        question_number = question_list[i]
        n_subquestions = n_subq_list[i]
        question_string = question_strings[i]

        # Run analysis
        table, subq_pooled_result, svm_data, wvma_data, sig_subq = analyze_question(question_number, filtered_svm_df, filtered_wvma_df, n_subquestions, verbose=False)
        pooled_stat, pooled_p, pooled_diff_mean = subq_pooled_result
        svm_num_responses = len(svm_data)
        wvma_num_responses = len(wvma_data)

        # Cache procedure tables
        subq_tables.append(table)
        subq_tables_names.append('Q'+str(question_number))

        # Pool
        svm_pooled.extend(svm_data)
        wvma_pooled.extend(wvma_data)

        # Determine significance
        if pooled_p > alpha:
            sig = ""
        else:
            sig = "*"
            sig_count += 1

        # Cache data for group summary
        row = ['Q'+str(question_number), question_string, pooled_stat, pooled_p, sig, pooled_diff_mean, n_subquestions, "%i/%i" % (sig_subq,n_subquestions), svm_num_responses, wvma_num_responses]
        rows.append(row)

    # Assemble table of results
    group_table = pd.DataFrame(rows, columns=["Question number", "Category", "Pooled stat", "Pooled p", "Sig", "Pooled Diff Mean (svm-wvma)", "Num subquestions", "Fraction of sig subquestions", "Pooled num svm responses", "Pooled num wvma responses"])                     

    # Apply Kruskal test to pooled data
    pooled_stat, pooled_p = kruskal(svm_pooled, wvma_pooled)
    pooled_diff_mean = np.mean(svm_pooled) - np.mean(wvma_pooled)

    # Print
    print('Group result (all questions): stat=%.3f, p=%.2e, diff_mean (svm-wvma)=%.3f, sig_subq=%s/%s' % (pooled_stat, pooled_p, pooled_diff_mean, sig_count, len(question_list)))

    return group_table, (pooled_stat, pooled_p, pooled_diff_mean), sig_count, len(question_list), len(svm_pooled), len(wvma_pooled), (subq_tables, subq_tables_names)

In [24]:
group_table, pooled_q_stats, sig_count, n_questions, svm_responses, wvma_responses, subq_data  = analyze_group(question_list, n_subq_list, question_strings, ca_svm, ca_wvma)
pooled_stat, pooled_p, pooled_diff_mean = pooled_q_stats
group_data.append(["Companion Animal", pooled_stat, pooled_p, pooled_diff_mean, n_questions, "%i/%i" % (sig_count,n_questions), svm_responses, wvma_responses])
output_tables.append(group_table)
output_tables_sheet_names.append("Companion Animal")
output_subq_data.append(subq_data)
group_table

Group result (all questions): stat=283.129, p=1.56e-63, diff_mean (svm-wvma)=0.734, sig_subq=7/8


Unnamed: 0,Question number,Category,Pooled stat,Pooled p,Sig,Pooled Diff Mean (svm-wvma),Num subquestions,Fraction of sig subquestions,Pooled num svm responses,Pooled num wvma responses
0,Q16,Medical Procedures,131.975896,1.514469e-30,*,0.672521,25,18/25,774,350
1,Q17,Preventive Medicine/Population Health Procedures,45.461919,1.556356e-11,*,0.987436,10,7/10,300,130
2,Q7,Surgical Procedures,40.866033,1.630302e-10,*,0.615926,25,10/25,748,348
3,Q8,Anesthetic Procedures,78.179539,9.408805999999999e-19,*,1.020833,8,8/8,240,96
4,Q9,Reproductive Procedures,3.518576,0.06068457,,0.5125,4,1/4,120,48
5,Q10,Diagnostic Imaging Procedures,20.34153,6.477919e-06,*,0.651389,12,4/12,360,144
6,Q11,Clinical Pathology Procedures,41.616496,1.110523e-10,*,0.756288,13,6/13,389,153
7,Q12,Diagnostic Necropsy Procedures,16.710683,4.353516e-05,*,1.155556,3,2/3,90,36


In [25]:
# Filter dataframes to only special species respondents (may have responded to other species too)
ss_svm = svm[svm['Q1'].str.contains('Special Species', na=False, regex=False)].copy()
ss_wvma = wvma[wvma['Q1'].str.contains('Special Species', na=False, regex=False)].copy()

In [26]:
# Input info about question group

question_list = [43, 44, 45, 46, 48, 49, 50]
n_subq_list = [20, 9, 11, 8, 6, 13, 3]
question_strings = ['Medical Procedures',
                    'Preventive Medicine/Population Health Procedures',
                    'Surgical Procedures', 
                    'Anesthetic Procedures', 
                    'Diagnostic Imaging Procedures',
                    'Clinical Pathology Procedures',
                    'Diagnostic Necropsy Procedures']

assert(len(question_list) == len(n_subq_list))
assert(len(n_subq_list) == len(question_strings))

In [27]:
group_table, pooled_q_stats, sig_count, n_questions, svm_responses, wvma_responses, subq_data  = analyze_group(question_list, n_subq_list, question_strings, ss_svm, ss_wvma)
pooled_stat, pooled_p, pooled_diff_mean = pooled_q_stats
group_data.append(["Special Species", pooled_stat, pooled_p, pooled_diff_mean, n_questions, "%i/%i" % (sig_count,n_questions), svm_responses, wvma_responses])
output_tables.append(group_table)
output_tables_sheet_names.append("Special Species")
output_subq_data.append(subq_data)
group_table

Group result (all questions): stat=152.706, p=4.44e-35, diff_mean (svm-wvma)=1.699, sig_subq=5/7


Unnamed: 0,Question number,Category,Pooled stat,Pooled p,Sig,Pooled Diff Mean (svm-wvma),Num subquestions,Fraction of sig subquestions,Pooled num svm responses,Pooled num wvma responses
0,Q43,Medical Procedures,100.719342,1.059855e-23,*,2.775,20,18/20,180,40
1,Q44,Preventive Medicine/Population Health Procedures,9.584002,0.001962801,*,1.217474,9,0/9,81,13
2,Q45,Surgical Procedures,19.568316,9.706563e-06,*,1.544444,11,3/11,99,20
3,Q46,Anesthetic Procedures,25.295257,4.919168e-07,*,2.097222,8,2/8,72,16
4,Q48,Diagnostic Imaging Procedures,3.796339,0.05136477,,0.796296,6,1/6,54,12
5,Q49,Clinical Pathology Procedures,23.191372,1.466538e-06,*,1.094017,13,2/13,117,26
6,Q50,Diagnostic Necropsy Procedures,0.002369,0.9611833,,-0.055556,3,0/3,27,6


In [28]:
# Filter dataframes to only food animal respondents (may have responded to other species too)
fa_svm = svm[svm['Q1'].str.contains('Food Animal', na=False, regex=False)].copy()
fa_wvma = wvma[wvma['Q1'].str.contains('Food Animal', na=False, regex=False)].copy()

In [29]:
# Input info about question group

question_list = [20, 18, 25, 24, 21, 19, 23, 22, 27]
n_subq_list = [8, 27, 16, 10, 20, 11, 12, 3, 5]
question_strings = ['Handling and Husbandry Procedures',
                    'Medical Procedures',
                    'Surgical Procedures',
                    'Anesthetic Procedures',
                    'Preventive Medicine/Population Health Procedures',
                    'Reproductive Procedures',
                    'Clinical Pathology Procedures',
                    'Diagnostic Necropsy Procedures',
                    'Diagnostic Imaging Procedures']


assert(len(question_list) == len(n_subq_list))
assert(len(n_subq_list) == len(question_strings))

In [30]:
group_table, pooled_q_stats, sig_count, n_questions, svm_responses, wvma_responses, subq_data  = analyze_group(question_list, n_subq_list, question_strings, fa_svm, fa_wvma)
pooled_stat, pooled_p, pooled_diff_mean = pooled_q_stats
group_data.append(["Food Animal", pooled_stat, pooled_p, pooled_diff_mean, n_questions, "%i/%i" % (sig_count,n_questions), svm_responses, wvma_responses])
output_tables.append(group_table)
output_tables_sheet_names.append("Food Animal")
output_subq_data.append(subq_data)
group_table

Group result (all questions): stat=43.546, p=4.14e-11, diff_mean (svm-wvma)=0.448, sig_subq=5/9


Unnamed: 0,Question number,Category,Pooled stat,Pooled p,Sig,Pooled Diff Mean (svm-wvma),Num subquestions,Fraction of sig subquestions,Pooled num svm responses,Pooled num wvma responses
0,Q20,Handling and Husbandry Procedures,0.008226,0.9277346,,0.058333,8,0/8,80,48
1,Q18,Medical Procedures,34.591274,4.067254e-09,*,0.645972,27,1/27,244,162
2,Q25,Surgical Procedures,0.001074,0.9738506,,0.065086,16,0/16,145,80
3,Q24,Anesthetic Procedures,6.789354,0.009170309,*,0.69191,10,1/10,89,50
4,Q21,Preventive Medicine/Population Health Procedures,12.181936,0.0004825453,*,0.488889,20,0/20,180,100
5,Q19,Reproductive Procedures,0.259993,0.6101253,,0.237374,11,0/11,99,44
6,Q23,Clinical Pathology Procedures,32.689945,1.080949e-08,*,0.595405,12,5/12,107,48
7,Q22,Diagnostic Necropsy Procedures,1.298275,0.2545285,,0.339744,3,0/3,26,12
8,Q27,Diagnostic Imaging Procedures,9.398026,0.002172192,*,1.233333,5,0/5,45,20


In [31]:
# Filter dataframes to only equine respondents (may have responded to other species too)
eq_svm = svm[svm['Q1'].str.contains('Equine', na=False, regex=False)].copy()
eq_wvma = wvma[wvma['Q1'].str.contains('Equine', na=False, regex=False)].copy()

In [32]:
# Input info about question group

question_list = [28, 29, 30, 31, 32, 33, 34, 35, 36]
n_subq_list = [7, 24, 8, 8, 15, 9, 11, 3, 5]
question_strings = ['Handling and Husbandry Procedures',
                    'Medical Procedures',
                    'Surgical Procedures',
                    'Anesthetic Procedures',
                    'Preventive Medicine/Population Health Procedures',
                    'Reproductive Procedures',
                    'Clinical Pathology Procedures',
                    'Diagnostic Necropsy Procedures',
                    'Diagnostic Imaging Procedures']


assert(len(question_list) == len(n_subq_list))
assert(len(n_subq_list) == len(question_strings))

In [33]:
group_table, pooled_q_stats, sig_count, n_questions, svm_responses, wvma_responses, subq_data  = analyze_group(question_list, n_subq_list, question_strings, eq_svm, eq_wvma)
pooled_stat, pooled_p, pooled_diff_mean = pooled_q_stats
group_data.append(["Equine", pooled_stat, pooled_p, pooled_diff_mean, n_questions, "%i/%i" % (sig_count,n_questions), svm_responses, wvma_responses])
output_tables.append(group_table)
output_tables_sheet_names.append("Equine")
output_subq_data.append(subq_data)
group_table

Group result (all questions): stat=107.585, p=3.31e-25, diff_mean (svm-wvma)=1.010, sig_subq=6/9


Unnamed: 0,Question number,Category,Pooled stat,Pooled p,Sig,Pooled Diff Mean (svm-wvma),Num subquestions,Fraction of sig subquestions,Pooled num svm responses,Pooled num wvma responses
0,Q28,Handling and Husbandry Procedures,1.547991,0.2134322,,0.589286,7,0/7,56,28
1,Q29,Medical Procedures,60.849325,6.161463e-15,*,1.237208,24,12/24,171,96
2,Q30,Surgical Procedures,13.748581,0.0002089786,*,1.303571,8,2/8,56,32
3,Q31,Anesthetic Procedures,19.688348,9.11555e-06,*,1.553571,8,4/8,56,32
4,Q32,Preventive Medicine/Population Health Procedures,14.318119,0.0001543719,*,0.87381,15,0/15,105,60
5,Q33,Reproductive Procedures,10.009923,0.001556991,*,0.912698,9,2/9,63,36
6,Q34,Clinical Pathology Procedures,19.998032,7.752192e-06,*,0.730064,11,0/11,76,33
7,Q35,Diagnostic Necropsy Procedures,3.195811,0.07382716,,0.952381,3,0/3,21,9
8,Q36,Diagnostic Imaging Procedures,0.202901,0.652389,,-0.242857,5,0/5,35,14


## Group Summary

In [34]:
group_summary_table = pd.DataFrame(group_data, columns=group_columns)

In [35]:
ALPHA = 0.05

pvals = list(group_summary_table['Pooled p'])

sigs = []
for p in pvals:
  if p > ALPHA:
      sig = ""
  else:
      sig = "*"
  sigs.append(sig)

group_summary_table.insert(loc=3, column='Sig', value=sigs)

In [36]:
# Add group summary to the beginning of output tables
output_tables.insert(0, group_summary_table)
output_tables_sheet_names.insert(0, "Group summary")

In [37]:
group_summary_table

Unnamed: 0,Group,Pooled stat,Pooled p,Sig,Pooled diff_mean (svm-wvma),Num questions,Fraction of sig questions,Pooled num svm reponses,Pooled num wvma responses
0,Companion Animal,283.128781,1.562406e-63,*,0.73365,8,7/8,3021,1305
1,Special Species,152.705843,4.441818e-35,*,1.699332,7,5/7,630,133
2,Food Animal,43.546044,4.141005e-11,*,0.447816,9,5/9,1015,564
3,Equine,107.585175,3.313546e-25,*,1.009624,9,6/9,639,340


# Generate tables

We will generate the following tables using pooled data from these experiments:

1.   `summary_s.xlsx`: Group summary table and a table for procedure sets (questions) within each group.
2.   `companion_animal_s.xlsx`: Tables for all procedures within the companion animal group.
3.   `special_species_s.xlsx`: Tables for all procedures within the special species group.
4.   `food_animal_s.xlsx`: Tables for all procedures within the food animal group.
5.   `equine_s.xlsx`: Tables for all procedures within the equine group.
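
The per-group file names follow the group labels used for the sheet names. As a small sketch, a hypothetical helper (`workbook_name`, not part of the notebook) can derive each file name from its label, so that an export loop does not have to rely on a hand-ordered list of names.

```python
def workbook_name(group, suffix="_s.xlsx"):
    """Turn a group label such as 'Companion Animal' into its workbook name."""
    return group.strip().lower().replace(" ", "_") + suffix


# Group labels as they appear in the group summary
groups = ["Companion Animal", "Special Species", "Food Animal", "Equine"]
files = {group: workbook_name(group) for group in groups}

for group, name in files.items():
    print(group, "->", name)
```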



## Summary

In [38]:
writer = pd.ExcelWriter('summary_s.xlsx', engine='xlsxwriter')

for i,table in enumerate(output_tables):
    sheet_name = output_tables_sheet_names[i]
    table.to_excel(writer, sheet_name=sheet_name, index=False)

    # Auto-adjust columns widths
    for column in table:
        column_width = max(table[column].astype(str).map(len).max(), len(column))
        col_idx = table.columns.get_loc(column)
        writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)

writer.close()  # ExcelWriter.save() was removed in pandas 2.0; close() also writes the file

## All procedures

In [39]:
for i, file in enumerate(['companion_animal_s.xlsx', 'special_species_s.xlsx', 'food_animal_s.xlsx', 'equine_s.xlsx']):
    subq_data = output_subq_data[i]
    subq_tables, subq_tables_names = subq_data

    # Loop through tables
    writer = pd.ExcelWriter(file, engine='xlsxwriter')

    for j, table in enumerate(subq_tables):  # j avoids shadowing the outer loop index i
        sheet_name = subq_tables_names[j]
        table.to_excel(writer, sheet_name=sheet_name, index=False)

        # Auto-adjust columns widths
        for column in table:
            column_width = max(table[column].astype(str).map(len).max(), len(column))
            col_idx = table.columns.get_loc(column)
            writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)

    writer.close()  # ExcelWriter.save() was removed in pandas 2.0; close() also writes the file