<a href="https://colab.research.google.com/github/nathanbollig/vet-graduate-expectations-survey/blob/main/WVMA_generalist_specialist.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Veterinary graduate expectations survey

This notebook evaluates differences in graduate expectations between specialists and generalists within the WVMA. Start by uploading the data to the working directory. One file is required:

 * `WVMA.xlsx`: WVMA graduate expectations survey results

In [11]:
! pip install xlsxwriter



In [12]:
import pandas as pd
import numpy as np
from scipy.stats import kruskal

## Read in WVMA data

In [13]:
# Use top row as header and skip second header row
wvma = pd.read_excel('WVMA.xlsx', header=0, skiprows=lambda x: x in [1])  

# Read in questions from second header row and associate with column names
question_wvma = {}

top_rows_wvma = pd.read_excel('WVMA.xlsx', nrows=2) 

for col in list(top_rows_wvma.columns):
    question_wvma[col] = top_rows_wvma.iloc[0][col]

In [14]:
def encode_expectation(response_string):
    """Map a survey response to an ordinal code: -1 (empty) or 0-4."""
    # Already-encoded values pass through unchanged, so the analysis can be re-run
    if isinstance(response_string, (int, np.integer)):
        return response_string

    # Encode nan values as -1
    if isinstance(response_string, float) and np.isnan(response_string):
        return -1

    # Encode string. 'indirect supervision' must be tested before
    # 'direct supervision' because the latter is a substring of the former.
    s = str(response_string).lower()
    if 'no expectation' in s:
        return 0
    elif 'with assistance' in s:
        return 1
    elif 'indirect supervision' in s:
        return 3
    elif 'direct supervision' in s:
        return 2
    elif 'independently' in s:
        return 4
    else:
        print(response_string)
        raise ValueError('Expected performance response was not formatted as expected.')
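The order of the substring checks matters: "direct supervision" is contained in "indirect supervision", so the longer phrase has to be tested first. A minimal table-driven sketch of the same mapping follows. `SCALE`, `encode`, and the sample response wording are hypothetical stand-ins, since the exact survey strings are not shown here.

```python
import math

# Ordered (substring, code) pairs. 'indirect supervision' precedes
# 'direct supervision' because the latter is a substring of the former.
SCALE = [
    ('no expectation', 0),
    ('with assistance', 1),
    ('indirect supervision', 3),
    ('direct supervision', 2),
    ('independently', 4),
]


def encode(response):
    """Return -1 for a missing response, otherwise the first matching code."""
    if isinstance(response, float) and math.isnan(response):
        return -1
    text = str(response).lower()
    for needle, code in SCALE:
        if needle in text:
            return code
    raise ValueError('Unrecognized response: %r' % (response,))


# Hypothetical response strings, for illustration only
print(encode('Perform under indirect supervision'))  # 3
print(encode('Perform under direct supervision'))    # 2
print(encode(float('nan')))                          # -1
```

Keeping the scale in one ordered list makes the precedence rule visible in a single place.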

Generalists and specialists are defined by the answer to Q49. We will assume that an empty answer corresponds to a generalist, and a non-empty answer is a specialist.
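As a small illustration of that rule, the sketch below splits a toy table on whether `Q49` is empty. The `toy` dataframe and its `respondent` column are made up for the example; only the `Q49` column name and the credential strings come from the survey.

```python
import numpy as np
import pandas as pd

# Hypothetical four-respondent table; NaN in Q49 means no credential reported
toy = pd.DataFrame({
    'respondent': ['a', 'b', 'c', 'd'],
    'Q49': [
        np.nan,
        'American College of Veterinary Surgeons',
        np.nan,
        'Other non-AVMA recognized specialty credentials',
    ],
})

is_specialist = toy['Q49'].notnull()
toy_generalist = toy[~is_specialist].copy()
toy_specialist = toy[is_specialist].copy()

# Every respondent lands in exactly one group
print(len(toy_generalist), len(toy_specialist))
```

Note that under this rule a respondent who skipped Q49 for any reason is counted as a generalist.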

In [15]:
from collections import Counter
Counter(wvma.Q49)

Counter({'American Board of Veterinary Practitioners': 3,
         'American College of Theriogenologists': 2,
         'American College of Veterinary Anesthesia and Analgesia': 1,
         'American College of Veterinary Anesthesia and Analgesia,American College of Veterinary Dermatology,American College of Veterinary Emergency & Critical Care,American College of Veterinary Internal Medicine,American College of Veterinary Ophthalmologists,American College of Veterinary Surgeons,American Veterinary Dental College': 1,
         'American College of Veterinary Internal Medicine': 2,
         'American College of Veterinary Nutrition': 1,
         'American College of Veterinary Pathologists': 1,
         'American College of Veterinary Surgeons': 1,
         'American Veterinary Dental College': 1,
         'Other non-AVMA recognized specialty credentials': 9,
         nan: 153})

In [16]:
# Form generalist and specialist datasets
generalist = wvma[wvma['Q49'].isnull()].copy()
specialist = wvma[wvma['Q49'].notnull()].copy()

In [17]:
len(generalist)

153

In [18]:
len(specialist)

22

In [19]:
# Filter dataframe to only companion animal respondents (may have responded to other species too)
ca_specialist = specialist[specialist['Q1'].str.contains('Companion Animal (canine and/or feline)', na=False, regex=False)].copy()

In [20]:
# Filter dataframe to only companion animal respondents (may have responded to other species too)
ca_generalist = generalist[generalist['Q1'].str.contains('Companion Animal (canine and/or feline)', na=False, regex=False)].copy()

## Question analysis code

In [24]:
def analyze_question(question_number, filtered_specialist_df, filtered_generalist_df, n_subquestions, alpha=0.05, verbose=True):
    """
    Perform an analysis of a given question on a species-filtered dataframe.
    
    Inputs:
        question_number: main question number to analyze
        filtered_specialist_df: specialist dataframe filtered to respondents with the desired species area (question columns are encoded in place)
        filtered_generalist_df: generalist dataframe filtered to respondents with the desired species area (question columns are encoded in place)
        n_subquestions: number of subquestions in the main question
        alpha: significance level for the statistical test
        verbose: if True, print the pooled result

    Prints a summary of results.

    Outputs:
        table: summary table
        (pooled_stat, pooled_p, pooled_diff_mean): tuple of statistics describing output of Kruskal test on data pooled across subquestions
        specialist_data: list of pooled specialist data
        generalist_data: list of pooled generalist data
        sig_count: number of subquestions with significant difference detected (between specialist and generalist responses), according to Kruskal test applied at subquestion level

    """

    specialist_counts = np.zeros((n_subquestions, 6), dtype=int) # Row for each question, column for empty (-1), 0, 1, 2, 3, and 4 responses
    generalist_counts = np.zeros((n_subquestions, 6), dtype=int) # Row for each question, column for empty (-1), 0, 1, 2, 3, and 4 responses
    rows = []
    specialist_pooled = []
    generalist_pooled = []
    sig_count = 0

    for i in range(1, n_subquestions+1):
        qkey = "Q" + str(question_number) + "_" + str(i)
        qstring = question_wvma[qkey].split('-')[2]

        # Encoding
        filtered_specialist_df[qkey] = filtered_specialist_df[qkey].apply(encode_expectation)
        filtered_generalist_df[qkey] = filtered_generalist_df[qkey].apply(encode_expectation)

        # specialist tally
        counts = filtered_specialist_df[qkey].value_counts(dropna=False)
        for key in counts.keys():
            specialist_counts[i-1][key+1] += counts[key] # question index is 1-based; keys range from -1 to 4
        counts = specialist_counts[i-1][1:] # counts of 0, 1, 2, 3, and 4
        specialist_num_responses = np.sum(counts)
        specialist_mean = (0*counts[0] + 1*counts[1] + 2*counts[2] + 3*counts[3] + 4*counts[4]) / specialist_num_responses

        # generalist tally
        counts = filtered_generalist_df[qkey].value_counts(dropna=False)
        for key in counts.keys():
            generalist_counts[i-1][key+1] += counts[key]
        counts = generalist_counts[i-1][1:] # counts of 0, 1, 2, 3, and 4
        generalist_num_responses = np.sum(counts)
        generalist_mean = (0*counts[0] + 1*counts[1] + 2*counts[2] + 3*counts[3] + 4*counts[4]) / generalist_num_responses
        
        # Get data
        specialist_data = list(filtered_specialist_df[qkey])
        generalist_data = list(filtered_generalist_df[qkey])

        # Remove empty values from data
        specialist_data = [x for x in specialist_data if x != -1]
        generalist_data = [x for x in generalist_data if x != -1]

        assert(specialist_num_responses == len(specialist_data))
        assert(generalist_num_responses == len(generalist_data))

        # compare samples
        stat, p = kruskal(specialist_data, generalist_data)

        # Determine significance
        if p > alpha:
            sig = ""
        else:
            sig = "*"
            sig_count += 1

        # Cache for pooled data
        specialist_pooled.extend(specialist_data)
        generalist_pooled.extend(generalist_data)

        # Cache for table of results
        row = [qstring] + list(specialist_counts[i-1]) + [specialist_mean, specialist_num_responses] + list(generalist_counts[i-1]) + [generalist_mean, generalist_num_responses, specialist_mean-generalist_mean, stat, p, sig]
        rows.append(row)

    # Assemble table of results
    table = pd.DataFrame(rows, columns=["Subquestion", "specialist: empty", "specialist: 0", "specialist: 1", "specialist: 2", "specialist: 3", "specialist: 4", "specialist: avg", "specialist: num responses", "generalist: empty", "generalist: 0", "generalist: 1", "generalist: 2", "generalist: 3", "generalist: 4", "generalist: avg", "generalist: num responses", "Diff Mean (specialist-generalist)", "stat", "pval", "sig"])

    # Apply Kruskal test to pooled data
    pooled_stat, pooled_p = kruskal(specialist_pooled, generalist_pooled)
    pooled_diff_mean = np.mean(specialist_pooled) - np.mean(generalist_pooled)

    # Print
    if verbose:
        print('Pooled Q%s: stat=%.3f, p=%.2e, diff_mean (specialist-generalist)=%.3f, sig_subq=%s/%s' % (question_number, pooled_stat, pooled_p, pooled_diff_mean, sig_count, n_subquestions))

    return table, (pooled_stat, pooled_p, pooled_diff_mean), specialist_pooled, generalist_pooled, sig_count

In [25]:
table, subq_pooled_result, specialist_data, generalist_data, sig_count = analyze_question(16, ca_specialist, ca_generalist, n_subquestions=25)

Pooled Q16: stat=34.427, p=4.43e-09, diff_mean (specialist-generalist)=-0.409, sig_subq=4/25
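Each comparison above is a two-sample Kruskal-Wallis test on the encoded 0-4 responses. As a rough illustration of what the statistic measures, the sketch below computes the tie-corrected H by hand for two groups and converts it to a p-value using the chi-square distribution with one degree of freedom. `kruskal_two_groups` and the toy samples are hypothetical; the notebook itself relies on `scipy.stats.kruskal`, and this is not a copy of that implementation.

```python
import math
from collections import Counter


def kruskal_two_groups(a, b):
    """Tie-corrected Kruskal-Wallis H and p-value for two lists (illustrative).

    Assumes both lists are non-empty and the pooled data are not all identical.
    """
    pooled = a + b
    n = len(pooled)

    # Assign each distinct value the average of the ranks it occupies
    rank_of = {}
    tie_sizes = []
    position = 0
    for value, count in sorted(Counter(pooled).items()):
        rank_of[value] = position + (count + 1) / 2
        tie_sizes.append(count)
        position += count

    # H = 12 / (n (n + 1)) * sum(R_g^2 / n_g) - 3 (n + 1)
    total = 0.0
    for group in (a, b):
        rank_sum = sum(rank_of[x] for x in group)
        total += rank_sum ** 2 / len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Ordinal survey data are heavily tied, so apply the tie correction
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error

    # Survival function of chi-square with 1 degree of freedom
    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Hypothetical encoded responses (0-4 scale)
specialist_toy = [2, 3, 3, 4, 1, 2]
generalist_toy = [4, 4, 3, 4, 3, 4, 2, 4]
h, p = kruskal_two_groups(specialist_toy, generalist_toy)
print('H=%.3f, p=%.3f' % (h, p))
```

Because the test works on ranks rather than raw values, it only assumes the 0-4 codes are ordered, not that the steps between them are equal.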


In [26]:
table

| | Subquestion | specialist: empty | specialist: 0 | specialist: 1 | specialist: 2 | specialist: 3 | specialist: 4 | specialist: avg | specialist: num responses | generalist: empty | generalist: 0 | generalist: 1 | generalist: 2 | generalist: 3 | generalist: 4 | generalist: avg | generalist: num responses | Diff Mean (specialist-generalist) | stat | pval | sig |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Obtain history and perform complete PE | 1 | 0 | 0 | 0 | 3 | 11 | 3.785714 | 14 | 15 | 0 | 0 | 0 | 9 | 76 | 3.894118 | 85 | -0.108403 | 1.312677 | 0.25191 | |
| 1 | Perform ophthalmic exam | 1 | 1 | 2 | 2 | 3 | 6 | 2.785714 | 14 | 14 | 0 | 3 | 12 | 26 | 45 | 3.313953 | 86 | -0.528239 | 1.580448 | 0.208696 | |
| 2 | Perform otoscopic exam | 1 | 1 | 1 | 1 | 3 | 8 | 3.142857 | 14 | 14 | 0 | 0 | 2 | 17 | 67 | 3.755814 | 86 | -0.612957 | 3.868337 | 0.049205 | * |
| 3 | Perform neurologic exam | 1 | 1 | 2 | 5 | 4 | 2 | 2.285714 | 14 | 14 | 0 | 5 | 12 | 33 | 36 | 3.162791 | 86 | -0.877076 | 8.209553 | 0.004167 | * |
| 4 | Perform orthopedic exam | 1 | 2 | 1 | 2 | 6 | 3 | 2.5 | 14 | 14 | 0 | 4 | 16 | 30 | 36 | 3.139535 | 86 | -0.639535 | 2.915934 | 0.087709 | |
| 5 | Develop problem list and rank order different... | 1 | 0 | 1 | 1 | 6 | 6 | 3.214286 | 14 | 14 | 0 | 2 | 4 | 35 | 45 | 3.430233 | 86 | -0.215947 | 0.695279 | 0.404374 | |
| 6 | Develop and interpret diagnostic plan | 1 | 0 | 2 | 1 | 7 | 4 | 2.928571 | 14 | 15 | 0 | 5 | 10 | 50 | 20 | 3.0 | 85 | -0.071429 | 0.0 | 1.0 | |
| 7 | Develop treatment plan | 1 | 1 | 2 | 2 | 5 | 4 | 2.642857 | 14 | 15 | 0 | 5 | 11 | 51 | 18 | 2.964706 | 85 | -0.321849 | 0.41391 | 0.519992 | |
| 8 | Calculate medication dosage | 1 | 1 | 0 | 0 | 3 | 10 | 3.5 | 14 | 15 | 0 | 1 | 4 | 12 | 68 | 3.729412 | 85 | -0.229412 | 0.537362 | 0.463528 | |
| 9 | Write prescription | 1 | 1 | 1 | 0 | 3 | 9 | 3.285714 | 14 | 15 | 0 | 2 | 2 | 17 | 64 | 3.682353 | 85 | -0.396639 | 1.07867 | 0.298995 | |


## Group analysis code

In [27]:
# cache data across all groups
group_data = []
group_columns = ["Group", "Pooled stat", "Pooled p", "Pooled diff_mean (specialist-generalist)", "Num questions", "Fraction of sig questions", "Pooled num specialist reponses", "Pooled num generalist responses"]

In [28]:
# cache tables
output_tables = []
output_tables_sheet_names = []

# cache subquestion table data
output_subq_data = []

In [29]:
# Input info about question group

question_list = [16,17,7,8,9,10,11,12]
n_subq_list = [25,10,25,8,4,12,13,3]
question_strings = ['Medical Procedures',
                    'Preventive Medicine/Population Health Procedures',
                    'Surgical Procedures', 
                    'Anesthetic Procedures', 
                    'Reproductive Procedures',
                    'Diagnostic Imaging Procedures',
                    'Clinical Pathology Procedures',
                    'Diagnostic Necropsy Procedures']

assert(len(question_list) == len(n_subq_list))
assert(len(n_subq_list) == len(question_strings))

In [32]:
# Code to analyze all questions within the group

def analyze_group(question_list, n_subq_list, question_strings, filtered_specialist_df, filtered_generalist_df, alpha=0.05):
    specialist_pooled = [] # now pooling over entire group
    generalist_pooled = []
    rows = []
    sig_count = 0
    subq_tables = []
    subq_tables_names = []

    for i in range(len(question_list)):
        question_number = question_list[i]
        n_subquestions = n_subq_list[i]
        question_string = question_strings[i]

        # Run analysis
        table, subq_pooled_result, specialist_data, generalist_data, sig_subq = analyze_question(question_number, filtered_specialist_df, filtered_generalist_df, n_subquestions, verbose=False)
        pooled_stat, pooled_p, pooled_diff_mean = subq_pooled_result
        specialist_num_responses = len(specialist_data)
        generalist_num_responses = len(generalist_data)

        # Cache procedure tables
        subq_tables.append(table)
        subq_tables_names.append('Q'+str(question_number))

        # Pool
        specialist_pooled.extend(specialist_data)
        generalist_pooled.extend(generalist_data)

        # Determine significance
        if pooled_p > alpha:
            sig = ""
        else:
            sig = "*"
            sig_count += 1

        # Cache data for group summary
        row = ['Q'+str(question_number), question_string, pooled_stat, pooled_p, sig, pooled_diff_mean, n_subquestions, "%i/%i" % (sig_subq,n_subquestions), specialist_num_responses, generalist_num_responses]
        rows.append(row)

    # Assemble table of results
    group_table = pd.DataFrame(rows, columns=["Question number", "Category", "Pooled stat", "Pooled p", "Sig", "Pooled Diff Mean (specialist-generalist)", "Num subquestions", "Fraction of sig subquestions", "Pooled num specialist responses", "Pooled num generalist responses"])                     

    # Apply Kruskal test to pooled data
    pooled_stat, pooled_p = kruskal(specialist_pooled, generalist_pooled)
    pooled_diff_mean = np.mean(specialist_pooled) - np.mean(generalist_pooled)

    # Print
    print('Group result (all questions): stat=%.3f, p=%.2e, diff_mean (specialist-generalist)=%.3f, sig_subq=%s/%s' % (pooled_stat, pooled_p, pooled_diff_mean, sig_count, len(question_list)))

    return group_table, (pooled_stat, pooled_p, pooled_diff_mean), sig_count, len(question_list), len(specialist_pooled), len(generalist_pooled), (subq_tables, subq_tables_names)

In [33]:
group_table, pooled_q_stats, sig_count, n_questions, specialist_responses, generalist_responses, subq_data  = analyze_group(question_list, n_subq_list, question_strings, ca_specialist, ca_generalist)
pooled_stat, pooled_p, pooled_diff_mean = pooled_q_stats
group_data.append(["Companion Animal", pooled_stat, pooled_p, pooled_diff_mean, n_questions, "%i/%i" % (sig_count,n_questions), specialist_responses, generalist_responses])
output_tables.append(group_table)
output_tables_sheet_names.append("Companion Animal")
output_subq_data.append(subq_data)
group_table

Group result (all questions): stat=202.126, p=7.18e-46, diff_mean (specialist-generalist)=-0.576, sig_subq=7/8


| | Question number | Category | Pooled stat | Pooled p | Sig | Pooled Diff Mean (specialist-generalist) | Num subquestions | Fraction of sig subquestions | Pooled num specialist responses | Pooled num generalist responses |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Q16 | Medical Procedures | 34.426729 | 4.426031e-09 | * | -0.408771 | 25 | 4/25 | 350 | 2107 |
| 1 | Q17 | Preventive Medicine/Population Health Procedures | 59.772407 | 1.064859e-14 | * | -0.997019 | 10 | 7/10 | 130 | 800 |
| 2 | Q7 | Surgical Procedures | 41.587245 | 1.127263e-10 | * | -0.514017 | 25 | 12/25 | 348 | 1968 |
| 3 | Q8 | Anesthetic Procedures | 54.205118 | 1.806158e-13 | * | -0.875702 | 8 | 6/8 | 96 | 623 |
| 4 | Q9 | Reproductive Procedures | 3.452634 | 0.06315105 | | -0.481872 | 4 | 0/4 | 48 | 308 |
| 5 | Q10 | Diagnostic Imaging Procedures | 8.600648 | 0.003360434 | * | -0.383182 | 12 | 0/12 | 144 | 898 |
| 6 | Q11 | Clinical Pathology Procedures | 23.648996 | 1.156062e-06 | * | -0.559368 | 13 | 5/13 | 153 | 972 |
| 7 | Q12 | Diagnostic Necropsy Procedures | 17.290647 | 3.207619e-05 | * | -1.064444 | 3 | 3/3 | 36 | 225 |


In [34]:
# Filter dataframes to only special species respondents (may have responded to other species too)
ss_specialist = specialist[specialist['Q1'].str.contains('Special Species', na=False, regex=False)].copy()
ss_generalist = generalist[generalist['Q1'].str.contains('Special Species', na=False, regex=False)].copy()

In [35]:
# Input info about question group

question_list = [43, 44, 45, 46, 48, 49, 50]
n_subq_list = [20, 9, 11, 8, 6, 13, 3]
question_strings = ['Medical Procedures',
                    'Preventive Medicine/Population Health Procedures',
                    'Surgical Procedures', 
                    'Anesthetic Procedures', 
                    'Diagnostic Imaging Procedures',
                    'Clinical Pathology Procedures',
                    'Diagnostic Necropsy Procedures']

assert(len(question_list) == len(n_subq_list))
assert(len(n_subq_list) == len(question_strings))

In [36]:
group_table, pooled_q_stats, sig_count, n_questions, specialist_responses, generalist_responses, subq_data  = analyze_group(question_list, n_subq_list, question_strings, ss_specialist, ss_generalist)
pooled_stat, pooled_p, pooled_diff_mean = pooled_q_stats
group_data.append(["Special Species", pooled_stat, pooled_p, pooled_diff_mean, n_questions, "%i/%i" % (sig_count,n_questions), specialist_responses, generalist_responses])
output_tables.append(group_table)
output_tables_sheet_names.append("Special Species")
output_subq_data.append(subq_data)
group_table

Group result (all questions): stat=67.694, p=1.91e-16, diff_mean (specialist-generalist)=-1.084, sig_subq=4/7


| | Question number | Category | Pooled stat | Pooled p | Sig | Pooled Diff Mean (specialist-generalist) | Num subquestions | Fraction of sig subquestions | Pooled num specialist responses | Pooled num generalist responses |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Q43 | Medical Procedures | 69.238149 | 8.726371e-17 | * | -2.218333 | 20 | 11/20 | 40 | 300 |
| 1 | Q44 | Preventive Medicine/Population Health Procedures | 9.464357 | 0.002095029 | * | -1.108832 | 9 | 0/9 | 13 | 135 |
| 2 | Q45 | Surgical Procedures | 5.794421 | 0.01607711 | * | -0.766667 | 11 | 0/11 | 20 | 165 |
| 3 | Q46 | Anesthetic Procedures | 16.355611 | 5.25005e-05 | * | -1.5625 | 8 | 1/8 | 16 | 112 |
| 4 | Q48 | Diagnostic Imaging Procedures | 0.427132 | 0.5133994 | | 0.309524 | 6 | 0/6 | 12 | 84 |
| 5 | Q49 | Clinical Pathology Procedures | 3.31415 | 0.06868585 | | -0.296703 | 13 | 2/13 | 26 | 182 |
| 6 | Q50 | Diagnostic Necropsy Procedures | 0.800258 | 0.3710161 | | 0.547619 | 3 | 0/3 | 6 | 42 |


In [37]:
# Filter dataframes to only food animal respondents (may have responded to other species too)
fa_specialist = specialist[specialist['Q1'].str.contains('Food Animal', na=False, regex=False)].copy()
fa_generalist = generalist[generalist['Q1'].str.contains('Food Animal', na=False, regex=False)].copy()

In [38]:
# Input info about question group

question_list = [20, 18, 25, 24, 21, 19, 23, 22, 27]
n_subq_list = [8, 27, 16, 10, 20, 11, 12, 3, 5]
question_strings = ['Handling and Husbandry Procedures',
                    'Medical Procedures',
                    'Surgical Procedures',
                    'Anesthetic Procedures',
                    'Preventive Medicine/Population Health Procedures',
                    'Reproductive Procedures',
                    'Clinical Pathology Procedures',
                    'Diagnostic Necropsy Procedures',
                    'Diagnostic Imaging Procedures']


assert(len(question_list) == len(n_subq_list))
assert(len(n_subq_list) == len(question_strings))

In [39]:
group_table, pooled_q_stats, sig_count, n_questions, specialist_responses, generalist_responses, subq_data  = analyze_group(question_list, n_subq_list, question_strings, fa_specialist, fa_generalist)
pooled_stat, pooled_p, pooled_diff_mean = pooled_q_stats
group_data.append(["Food Animal", pooled_stat, pooled_p, pooled_diff_mean, n_questions, "%i/%i" % (sig_count,n_questions), specialist_responses, generalist_responses])
output_tables.append(group_table)
output_tables_sheet_names.append("Food Animal")
output_subq_data.append(subq_data)
group_table

Group result (all questions): stat=45.726, p=1.36e-11, diff_mean (specialist-generalist)=-0.472, sig_subq=5/9


| | Question number | Category | Pooled stat | Pooled p | Sig | Pooled Diff Mean (specialist-generalist) | Num subquestions | Fraction of sig subquestions | Pooled num specialist responses | Pooled num generalist responses |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Q20 | Handling and Husbandry Procedures | 3.324374 | 0.06826 | | -0.488192 | 8 | 1/8 | 48 | 247 |
| 1 | Q18 | Medical Procedures | 19.382545 | 1.1e-05 | * | -0.513691 | 27 | 7/27 | 162 | 835 |
| 2 | Q25 | Surgical Procedures | 0.117188 | 0.732104 | | -0.156123 | 16 | 0/16 | 80 | 494 |
| 3 | Q24 | Anesthetic Procedures | 6.131953 | 0.013276 | * | -0.661935 | 10 | 0/10 | 50 | 310 |
| 4 | Q21 | Preventive Medicine/Population Health Procedures | 15.988218 | 6.4e-05 | * | -0.569094 | 20 | 1/20 | 100 | 618 |
| 5 | Q19 | Reproductive Procedures | 1.812648 | 0.178191 | | -0.485337 | 11 | 1/11 | 44 | 341 |
| 6 | Q23 | Clinical Pathology Procedures | 9.074016 | 0.002593 | * | -0.353829 | 12 | 0/12 | 48 | 370 |
| 7 | Q22 | Diagnostic Necropsy Procedures | 3.278007 | 0.070214 | | -0.41129 | 3 | 0/3 | 12 | 93 |
| 8 | Q27 | Diagnostic Imaging Procedures | 11.335003 | 0.000761 | * | -1.174026 | 5 | 2/5 | 20 | 154 |


In [40]:
# Filter dataframes to only equine respondents (may have responded to other species too)
eq_specialist = specialist[specialist['Q1'].str.contains('Equine', na=False, regex=False)].copy()
eq_generalist = generalist[generalist['Q1'].str.contains('Equine', na=False, regex=False)].copy()

In [42]:
# Input info about question group

question_list = [28, 29, 30, 31, 32, 33, 34, 35, 36]
n_subq_list = [7, 24, 8, 8, 15, 9, 11, 3, 5]
question_strings = ['Handling and Husbandry Procedures',
                    'Medical Procedures',
                    'Surgical Procedures',
                    'Anesthetic Procedures',
                    'Preventive Medicine/Population Health Procedures',
                    'Reproductive Procedures',
                    'Clinical Pathology Procedures',
                    'Diagnostic Necropsy Procedures',
                    'Diagnostic Imaging Procedures']


assert(len(question_list) == len(n_subq_list))
assert(len(n_subq_list) == len(question_strings))

In [43]:
group_table, pooled_q_stats, sig_count, n_questions, specialist_responses, generalist_responses, subq_data  = analyze_group(question_list, n_subq_list, question_strings, eq_specialist, eq_generalist)
pooled_stat, pooled_p, pooled_diff_mean = pooled_q_stats
group_data.append(["Equine", pooled_stat, pooled_p, pooled_diff_mean, n_questions, "%i/%i" % (sig_count,n_questions), specialist_responses, generalist_responses])
output_tables.append(group_table)
output_tables_sheet_names.append("Equine")
output_subq_data.append(subq_data)
group_table

Group result (all questions): stat=148.700, p=3.33e-34, diff_mean (specialist-generalist)=-1.093, sig_subq=7/9


| | Question number | Category | Pooled stat | Pooled p | Sig | Pooled Diff Mean (specialist-generalist) | Num subquestions | Fraction of sig subquestions | Pooled num specialist responses | Pooled num generalist responses |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Q28 | Handling and Husbandry Procedures | 6.445203 | 0.01112523 | * | -0.931973 | 7 | 1/7 | 28 | 147 |
| 1 | Q29 | Medical Procedures | 65.968115 | 4.582769e-16 | * | -1.172589 | 24 | 11/24 | 96 | 503 |
| 2 | Q30 | Surgical Procedures | 18.876364 | 1.394701e-05 | * | -1.309524 | 8 | 2/8 | 32 | 168 |
| 3 | Q31 | Anesthetic Procedures | 19.151742 | 1.207273e-05 | * | -1.363095 | 8 | 2/8 | 32 | 168 |
| 4 | Q32 | Preventive Medicine/Population Health Procedures | 41.215898 | 1.363094e-10 | * | -1.356667 | 15 | 4/15 | 60 | 300 |
| 5 | Q33 | Reproductive Procedures | 21.634058 | 3.299408e-06 | * | -1.288889 | 9 | 2/9 | 36 | 180 |
| 6 | Q34 | Clinical Pathology Procedures | 2.267377 | 0.1321234 | | -0.339394 | 11 | 0/11 | 33 | 220 |
| 7 | Q35 | Diagnostic Necropsy Procedures | 4.641966 | 0.03119924 | * | -1.079096 | 3 | 1/3 | 9 | 59 |
| 8 | Q36 | Diagnostic Imaging Procedures | 0.037262 | 0.8469323 | | 0.065538 | 5 | 0/5 | 14 | 97 |


## Group Summary

In [46]:
group_summary_table = pd.DataFrame(group_data, columns=group_columns)

In [47]:
ALPHA = 0.05

pvals = list(group_summary_table['Pooled p'])

sigs = []
for p in pvals:
  if p > ALPHA:
      sig = ""
  else:
      sig = "*"
  sigs.append(sig)

group_summary_table.insert(loc=3, column='Sig', value=sigs)

In [48]:
# Add group summary to the beginning of output tables
output_tables.insert(0, group_summary_table)
output_tables_sheet_names.insert(0, "Group summary")

In [49]:
group_summary_table

| | Group | Pooled stat | Pooled p | Sig | Pooled diff_mean (specialist-generalist) | Num questions | Fraction of sig questions | Pooled num specialist reponses | Pooled num generalist responses |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Companion Animal | 202.12595 | 7.176521e-46 | * | -0.576258 | 8 | 7/8 | 1305 | 7901 |
| 1 | Special Species | 67.693649 | 1.909777e-16 | * | -1.083879 | 7 | 4/7 | 133 | 1020 |
| 2 | Food Animal | 45.726158 | 1.359946e-11 | * | -0.471653 | 9 | 5/9 | 564 | 3462 |
| 3 | Equine | 148.70029 | 3.334673e-34 | * | -1.092671 | 9 | 7/9 | 340 | 1842 |


# Generate tables

We will generate the following tables using pooled data from these experiments:

1.   `summary_sg.xlsx`: Group summary table and a table for procedure sets (questions) within each group.
2.   `companion_animal_sg.xlsx`: Tables for all procedures within the companion animal group.
3.   `special_species_sg.xlsx`: Tables for all procedures within the special species group.
4.   `food_animal_sg.xlsx`: Tables for all procedures within the food animal group.
5.   `equine_sg.xlsx`: Tables for all procedures within the equine group.



## Summary

In [50]:
writer = pd.ExcelWriter('summary_sg.xlsx', engine='xlsxwriter')

for i,table in enumerate(output_tables):
    sheet_name = output_tables_sheet_names[i]
    table.to_excel(writer, sheet_name=sheet_name, index=False)

    # Auto-adjust columns widths
    for column in table:
        column_width = max(table[column].astype(str).map(len).max(), len(column))
        col_idx = table.columns.get_loc(column)
        writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)

writer.close()  # ExcelWriter.save() was removed in pandas 2.0; close() writes the file
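The column-width heuristic used when writing each sheet takes the longer of the header and the widest rendered cell. A minimal sketch of that calculation on its own, without writing a file, is given below; `content_widths` and `toy_table` are hypothetical names introduced for the example.

```python
import pandas as pd


def content_widths(table):
    """Map each column name to the length of its longest cell or header."""
    widths = {}
    for column in table.columns:
        if len(table) > 0:
            longest_cell = int(table[column].astype(str).map(len).max())
        else:
            longest_cell = 0  # an empty table falls back to the header width
        widths[column] = max(longest_cell, len(str(column)))
    return widths


# Hypothetical two-row table
toy_table = pd.DataFrame({
    'Group': ['Equine', 'Food Animal'],
    'Pooled p': [0.5, 0.0125],
})
print(content_widths(toy_table))  # {'Group': 11, 'Pooled p': 8}
```

Widths are measured on the string form of each value, so a float with many digits can make a column wider than its display formatting in Excel would need.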

## All procedures

In [51]:
for i, file in enumerate(['companion_animal_sg.xlsx', 'special_species_sg.xlsx', 'food_animal_sg.xlsx', 'equine_sg.xlsx']):
    subq_data = output_subq_data[i]
    subq_tables, subq_tables_names = subq_data

    # Loop through tables
    writer = pd.ExcelWriter(file, engine='xlsxwriter')

    for j, table in enumerate(subq_tables):  # j avoids shadowing the outer loop index
        sheet_name = subq_tables_names[j]
        table.to_excel(writer, sheet_name=sheet_name, index=False)

        # Auto-adjust columns widths
        for column in table:
            column_width = max(table[column].astype(str).map(len).max(), len(column))
            col_idx = table.columns.get_loc(column)
            writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)

    writer.close()  # ExcelWriter.save() was removed in pandas 2.0; close() writes the file