<a href="https://colab.research.google.com/github/nathanbollig/vet-graduate-expectations-survey/blob/main/analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Veterinary graduate expectations survey

Start by uploading the data into the working directory. Two files are required:

1.   `SVM.xlsx`: SVM graduate expectations survey results
2.   `WVMA.xlsx`: WVMA graduate expectations survey results
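
Before running the rest of the notebook, it can help to confirm that both files are actually present. The following is a minimal sketch, assuming the files sit in the current working directory; the helper `missing_files` is a hypothetical name and not part of the original analysis.

```python
from pathlib import Path

# File names come from the list above; the helper itself is hypothetical.
REQUIRED_FILES = ["SVM.xlsx", "WVMA.xlsx"]

def missing_files(directory=".", required=REQUIRED_FILES):
    """Return the required file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]

missing = missing_files()
if missing:
    print("Upload these files before continuing:", ", ".join(missing))
else:
    print("All required files found.")
```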

## Set up

In [59]:
! pip install xlsxwriter



In [60]:
import pandas as pd
import numpy as np
from scipy.stats import kruskal

### Read in SVM data

In [61]:
# Use top row as header and skip second header row
svm = pd.read_excel('SVM.xlsx', header=0, skiprows=lambda x: x in [1])  

# Read in questions from second header row and associate with column names
question_svm = {}

top_rows = pd.read_excel('SVM.xlsx', nrows=2) 

for col in list(top_rows.columns):
    question_svm[col] = top_rows.iloc[0][col]

### Read in WVMA data

In [62]:
# Use top row as header and skip second header row
wvma = pd.read_excel('WVMA.xlsx', header=0, skiprows=lambda x: x in [1])  

# Read in questions from second header row and associate with column names
question_wvma = {}

top_rows_wvma = pd.read_excel('WVMA.xlsx', nrows=2) 

for col in list(top_rows_wvma.columns):
    question_wvma[col] = top_rows_wvma.iloc[0][col]
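
Both files use the same two-header-row layout: the first row holds column codes and the second holds the question text. The sketch below illustrates that reading pattern on a small in-memory CSV instead of an Excel file, so it runs without the survey data. The CSV contents are invented for illustration, and `pd.read_csv` stands in for `pd.read_excel` as an assumption that the `header`, `skiprows`, and `nrows` arguments behave the same way here.

```python
import io
import pandas as pd

# Hypothetical miniature export: line 0 holds column codes, line 1 holds question text.
raw = (
    "Q1,Q7_1\n"
    "Species area,Group - Category - Procedure name\n"
    "Equine,Perform Independently\n"
    "Food Animal (bovine),Perform with Assistance\n"
)

# Responses: keep line 0 as the header and drop the question-text line (index 1).
responses = pd.read_csv(io.StringIO(raw), header=0, skiprows=lambda x: x in [1])

# Question text: the first data row is the second header row.
top_rows = pd.read_csv(io.StringIO(raw), nrows=2)
questions = {col: top_rows.iloc[0][col] for col in top_rows.columns}

print(responses)
print(questions)
```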

### Set Analysis Parameters

In [63]:
ALPHA = 0.05

In [64]:
"""
The `analysis_mode` variable specifies which two main populations are being compared in this analysis. Possible values are:
    0 - SVM vs. WVMA
    1 - SVM specialists vs. WVMA specialists
    2 - WVMA specialists vs. WVMA generalists
"""

analysis_mode = 0

In [65]:
"""
Technical or non-technical.

The nontechnical questions are Q13, Q14, and Q15 for all species categories.
"""
nontechnical = False

Run the following to set up the notebook for this analysis.

In [66]:
# Preparation for SVM vs. WVMA
if analysis_mode == 0:
    pop1 = svm.copy()
    pop2 = wvma.copy()
    pop1_str = "SVM"
    pop2_str = "WVMA"
    file_suffix = ""

# Preparation for SVM specialists vs. WVMA specialists
if analysis_mode == 1:
    pop1 = svm[svm['Q59'].notnull()].copy()
    pop2 = wvma[wvma['Q49'].notnull()].copy()
    pop1_str = "SVM"
    pop2_str = "WVMA"
    file_suffix = "_s"

# Preparation for WVMA specialists vs. WVMA generalists
if analysis_mode == 2:
    pop1 = wvma[wvma['Q49'].notnull()].copy()
    pop2 = wvma[wvma['Q49'].isnull()].copy()
    pop1_str = "specialist"
    pop2_str = "generalist"
    file_suffix = "_sg"

# Adjust file suffix for nontechnical analysis
if nontechnical == True:
    file_suffix = file_suffix + "_nontechnical"

### Counts of species area

Let's look at the counts of species area (`Q1`) in each population. First, note that this question allowed multiple responses, which appear as a comma-delimited list. The code below counts how many times each species appears, taking into account the possibility of multiple responses.

In [67]:
from collections import defaultdict

pop1_counts = defaultdict(int) # start each count at zero by default

for entry in list(pop1.Q1):
    if isinstance(entry, str):
        species_list = entry.split(',')
        for species in species_list:
            pop1_counts[species] += 1
    elif np.isnan(entry) == True:
        pop1_counts["empty"] += 1

print("*** %s Survey ***" % (pop1_str,))
for key, val in pop1_counts.items():
    print("%s: %i" % (key, val))

*** pop1 Survey ***
Food Animal (bovine): 6
Companion Animal (canine and/or feline): 15
Equine: 5
Special Species (ex. exotic companion animals): 2
empty: 1


In [68]:
pop2_counts = defaultdict(int) # start each count at zero by default

for entry in list(pop2.Q1):
    if isinstance(entry, str):
        species_list = entry.split(',')
        for species in species_list:
            pop2_counts[species] += 1
    elif np.isnan(entry) == True:
        pop2_counts["empty"] += 1

print("*** %s Survey ***" % (pop2_str,))
for key, val in pop2_counts.items():
    print("%s: %i" % (key, val))

*** pop2 Survey ***
Companion Animal (canine and/or feline): 100
Food Animal (bovine): 42
Equine: 25
Special Species (ex. exotic companion animals): 19
empty: 28
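
The two counting cells above repeat the same loop. As a sketch only, the tally could be written once with `collections.Counter`. The function name `count_species` and the example entries are hypothetical. Unlike the cells above, this variant strips whitespace around each item and treats every non-string entry as empty.

```python
from collections import Counter

def count_species(entries):
    """Count species mentions in comma-delimited multi-response entries."""
    counts = Counter()
    for entry in entries:
        if isinstance(entry, str):
            # Strip whitespace so "A, B" and "A,B" are tallied identically.
            counts.update(part.strip() for part in entry.split(",") if part.strip())
        else:
            # Anything that is not a string (for example NaN) counts as empty.
            counts["empty"] += 1
    return counts

# Hypothetical responses, not taken from the survey files.
example = ["Equine,Food Animal (bovine)", "Equine", float("nan"), "Food Animal (bovine), Equine"]
for key, val in count_species(example).items():
    print("%s: %i" % (key, val))
```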


### Note about organization

There are several levels of organization in our interpretation of this data.

 * `Group`: One of the 4 species groups (companion animal, special species, food animal, or equine)
     * `Question`: A group of procedures in a category such as "Medical Procedures" or "Surgical Procedures"
          * `Sub-question`: A particular procedure

We can perform analysis at the sub-question level, or pool upwards to the question or group level. I will do all of this below.
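
To make the three levels concrete, here is a small sketch of how they map onto column names. The `Q<question>_<i>` naming follows the keys built later in `analyze_question`, and the two companion animal entries are taken from the technical question list used later in this notebook. The dictionary layout and the helper `subquestion_keys` are illustrative assumptions.

```python
def subquestion_keys(question_number, n_subquestions):
    """Column names for every sub-question, e.g. Q7_1 ... Q7_25."""
    return [f"Q{question_number}_{i}" for i in range(1, n_subquestions + 1)]

# Group -> question number -> (category, number of sub-questions).
hierarchy = {
    "Companion Animal": {
        7: ("Surgical Procedures", 25),
        8: ("Anesthetic Procedures", 8),
    },
}

for group, questions in hierarchy.items():
    for number, (category, n_sub) in questions.items():
        keys = subquestion_keys(number, n_sub)
        print(group, "|", category, "|", keys[0], "...", keys[-1])
```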





## Question analysis

Let's encode the expectation response in the following way:

 * 0: No Expectation to Perform Procedure

 * 1: Perform with Assistance (assist with portions of procedure)
 
 * 2: Perform with Direct Supervision (present in room during procedure)

 * 3: Perform with Indirect Supervision (available in building or by phone if needed)

 * 4: Perform Independently

In [69]:
def encode_expectation(response_string):
    if isinstance(response_string, int) == True:
        return response_string
    
    # Encode nan values as -1
    if isinstance(response_string, str) == False:
        if np.isnan(response_string) == True:
            return -1
    
    # Encode string
    s = response_string.lower()
    if s.find('no expectation') > -1:
        return 0
    elif s.find('with assistance') > -1:
        return 1
    elif s.find('indirect supervision') > -1:
        return 3
    elif s.find('direct supervision') > -1:
        return 2
    elif s.find('independently') > -1:
        return 4
    else:
        print(response_string)
        raise ValueError('Expected performance response was not formatted as expected.')
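
The order of the checks in `encode_expectation` matters: the text "indirect supervision" contains "direct supervision" as a substring, so the indirect case has to be tested first. The sketch below shows this with a table-driven variant. `RULES` and `encode_label` are hypothetical names, and this variant handles strings only.

```python
# Ordered (substring, code) pairs; "indirect supervision" must come before
# "direct supervision" because the former contains the latter.
RULES = [
    ("no expectation", 0),
    ("with assistance", 1),
    ("indirect supervision", 3),
    ("direct supervision", 2),
    ("independently", 4),
]

def encode_label(text):
    """Return the code of the first rule whose substring occurs in `text`."""
    s = text.lower()
    for needle, code in RULES:
        if needle in s:
            return code
    raise ValueError(f"Unrecognized response: {text!r}")

# The substring overlap that makes the ordering necessary:
print("direct supervision" in "perform with indirect supervision")
print(encode_label("Perform with Indirect Supervision (available in building or by phone if needed)"))
print(encode_label("Perform with Direct Supervision (present in room during procedure)"))
```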

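The function below packages the analysis of a single question (tallies, means, and Kruskal-Wallis tests for each sub-question and for the pooled responses) into reusable code.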

In [70]:
def analyze_question(question_number, filtered_pop1_df, filtered_pop2_df, n_subquestions, alpha=0.05, verbose=True):
    """
    Perform an analysis of a given question on a species-filtered dataframe.
    
    Inputs:
        question_number: main question number to analyze
        filtered_pop1_df: pop1 dataframe filtered to respondents with the desired species area
        filtered_pop2_df: pop2 dataframe filtered to respondents with the desired species area
        n_subquestions: number of subquestions in the main question
        alpha: significance level for the statistical test
        verbose: if True, print the pooled result for this question

    Prints a summary of results.

    Outputs:
        table: summary table
        (pooled_stat, pooled_p, pooled_diff_mean): tuple of statistics describing output of Kruskal test on data pooled across subquestions
        pop1_data: list of pooled pop1 data
        pop2_data: list of pooled pop2 data
        sig_count: number of subquestions with significant difference detected (between pop1 and pop2 responses), according to Kruskal test applied at subquestion level

    """

    pop1_counts = np.zeros((n_subquestions, 6), dtype=int) # Row for each question, column for empty (-1), 0, 1, 2, 3, and 4 responses
    pop2_counts = np.zeros((n_subquestions, 6), dtype=int) # Row for each question, column for empty (-1), 0, 1, 2, 3, and 4 responses
    rows = []
    pop1_pooled = []
    pop2_pooled = []
    sig_count = 0

    for i in range(1, n_subquestions+1):
        qkey = "Q" + str(question_number) + "_" + str(i)
        qstring = question_svm[qkey].split('-')[2] # could refer to question_svm or question_wvma

        # Encoding
        filtered_pop1_df[qkey] = filtered_pop1_df[qkey].apply(lambda x: encode_expectation(x))
        filtered_pop2_df[qkey] = filtered_pop2_df[qkey].apply(lambda x: encode_expectation(x))

        # pop1 tally
        counts = filtered_pop1_df[qkey].value_counts(dropna=False)
        for key in counts.keys():
            pop1_counts[i-1][key+1] += counts[key] # question index is 1-based; keys range from -1 to 4
        counts = pop1_counts[i-1][1:] # counts of 0, 1, 2, 3, and 4
        pop1_num_responses = np.sum(counts)
        pop1_mean = (0*counts[0] + 1*counts[1] + 2*counts[2] + 3*counts[3] + 4*counts[4]) / pop1_num_responses

        # pop2 tally
        counts = filtered_pop2_df[qkey].value_counts(dropna=False)
        for key in counts.keys():
            pop2_counts[i-1][key+1] += counts[key]
        counts = pop2_counts[i-1][1:] # counts of 0, 1, 2, 3, and 4
        pop2_num_responses = np.sum(counts)
        pop2_mean = (0*counts[0] + 1*counts[1] + 2*counts[2] + 3*counts[3] + 4*counts[4]) / pop2_num_responses
        
        # Get data
        pop1_data = list(filtered_pop1_df[qkey])
        pop2_data = list(filtered_pop2_df[qkey])

        # Remove empty values from data
        pop1_data = [x for x in pop1_data if x != -1]
        pop2_data = [x for x in pop2_data if x != -1]

        assert(pop1_num_responses == len(pop1_data))
        assert(pop2_num_responses == len(pop2_data))

        # compare samples
        if len(pop1_data) >= 5 and len(pop2_data) >= 5:
            stat, p = kruskal(pop1_data, pop2_data)
        else:
            stat = 0
            p = 1

        # Determine significance
        if p > alpha:
            sig = ""
        else:
            sig = "*"
            sig_count += 1

        # Cache for pooled data
        pop1_pooled.extend(pop1_data)
        pop2_pooled.extend(pop2_data)

        # Cache for table of results
        row = [qstring] + list(pop1_counts[i-1]) + [pop1_mean, pop1_num_responses] + list(pop2_counts[i-1]) + [pop2_mean, pop2_num_responses, pop1_mean-pop2_mean, stat, p, sig]
        rows.append(row)

    # Assemble table of results
    table = pd.DataFrame(rows, columns=["Subquestion", pop1_str+": empty", pop1_str+": 0", pop1_str+": 1", 
                                        pop1_str+": 2", pop1_str+": 3", pop1_str+": 4", pop1_str+": avg", pop1_str+": num responses", 
                                        pop2_str+": empty", pop2_str+": 0", pop2_str+": 1", pop2_str+": 2", pop2_str+": 3", pop2_str+": 4", pop2_str+": avg", pop2_str+": num responses", 
                                        "Diff Mean ("+pop1_str+"-"+pop2_str+")", "stat", "pval", "sig"])

    # Apply Kruskal test to pooled data
    pooled_stat, pooled_p = kruskal(pop1_pooled, pop2_pooled)
    pooled_diff_mean = np.mean(pop1_pooled) - np.mean(pop2_pooled)

    # Print
    if verbose == True:
        print('Pooled Q%s: stat=%.3f, p=%.2e, diff_mean (%s-%s)=%.3f, sig_subq=%s/%s' % (question_number, pooled_stat, pooled_p, pop1_str, pop2_str, pooled_diff_mean, sig_count, n_subquestions))

    return table, (pooled_stat, pooled_p, pooled_diff_mean), pop1_pooled, pop2_pooled, sig_count
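
The core comparison inside `analyze_question` is a Kruskal-Wallis test on two lists of encoded responses, skipped when either list has fewer than 5 entries. A minimal sketch of that step in isolation is shown here. The helper `compare`, the constant `MIN_RESPONSES`, and the example data are hypothetical; only `scipy.stats.kruskal` and the threshold of 5 come from the code above.

```python
from scipy.stats import kruskal

MIN_RESPONSES = 5  # same threshold as in analyze_question

def compare(sample_a, sample_b, alpha=0.05):
    """Return (stat, p, significant) for two lists of encoded responses."""
    if len(sample_a) < MIN_RESPONSES or len(sample_b) < MIN_RESPONSES:
        # Too few responses: report no detectable difference without testing.
        return 0.0, 1.0, False
    stat, p = kruskal(sample_a, sample_b)
    return stat, p, bool(p <= alpha)

# Hypothetical encoded responses (0-4), not survey data.
low = [0, 1, 1, 0, 2, 1, 0, 1]
high = [3, 4, 4, 3, 4, 2, 4, 3]
stat, p, significant = compare(low, high)
print(f"stat={stat:.3f}, p={p:.3g}, significant={significant}")
```

Note that a result is marked significant when `p <= alpha`, which mirrors the `if p > alpha` branch used in the function above.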


## Group Analysis

In [71]:
# Code to analyze all questions within the group

def analyze_group(question_list, n_subq_list, question_strings, filtered_pop1_df, filtered_pop2_df, alpha=0.05):
    pop1_pooled = [] # now pooling over entire group
    pop2_pooled = []
    rows = []
    sig_count = 0
    subq_tables = []
    subq_tables_names = []

    for i in range(len(question_list)):
        question_number = question_list[i]
        n_subquestions = n_subq_list[i]
        question_string = question_strings[i]

        # Run analysis
        table, subq_pooled_result, pop1_data, pop2_data, sig_subq = analyze_question(question_number, filtered_pop1_df, filtered_pop2_df, n_subquestions, verbose=False, alpha=alpha)
        pooled_stat, pooled_p, pooled_diff_mean = subq_pooled_result
        pop1_num_responses = len(pop1_data)
        pop2_num_responses = len(pop2_data)

        # Cache procedure tables
        subq_tables.append(table)
        subq_tables_names.append('Q'+str(question_number))

        # Pool
        pop1_pooled.extend(pop1_data)
        pop2_pooled.extend(pop2_data)

        # Determine significance
        if pooled_p > alpha:
            sig = ""
        else:
            sig = "*"
            sig_count += 1

        # Cache data for group summary
        row = ['Q'+str(question_number), question_string, pooled_stat, pooled_p, sig, pooled_diff_mean, n_subquestions, "%i/%i" % (sig_subq,n_subquestions), pop1_num_responses, pop2_num_responses]
        rows.append(row)

    # Assemble table of results
    group_table = pd.DataFrame(rows, columns=["Question number", "Category", "Pooled stat", "Pooled p", "Sig", "Pooled Diff Mean (%s-%s)"%(pop1_str, pop2_str), "Num subquestions", "Fraction of sig subquestions", "Pooled num %s responses"%(pop1_str,), "Pooled num %s responses"%(pop2_str,)])                     

    # Apply Kruskal test to pooled data
    pooled_stat, pooled_p = kruskal(pop1_pooled, pop2_pooled)
    pooled_diff_mean = np.mean(pop1_pooled) - np.mean(pop2_pooled)

    # Print
    print('Group result (all questions): stat=%.3f, p=%.2e, diff_mean (%s-%s)=%.3f, sig_subq=%s/%s' % (pooled_stat, pooled_p, pop1_str, pop2_str, pooled_diff_mean, sig_count, len(question_list)))

    return group_table, (pooled_stat, pooled_p, pooled_diff_mean), sig_count, len(question_list), len(pop1_pooled), len(pop2_pooled), (subq_tables, subq_tables_names)

In [72]:
# cache data across all groups
group_data = []
group_columns = ["Group", "Pooled stat", "Pooled p", "Pooled diff_mean (%s-%s)"%(pop1_str,pop2_str), "Num questions", "Fraction of sig questions", "Pooled num %s responses"%(pop1_str,), "Pooled num %s responses"%(pop2_str,)]

In [73]:
# cache tables
output_tables = []
output_tables_sheet_names = []

# cache subquestion table data
output_subq_data = []

### Companion Animal Group

In [74]:
# Filter dataframe to only companion animal respondents (may have responded to other species too)
ca_pop1 = pop1[pop1['Q1'].str.contains('Companion Animal (canine and/or feline)', na=False, regex=False)].copy()

In [75]:
# Filter dataframe to only companion animal respondents (may have responded to other species too)
ca_pop2 = pop2[pop2['Q1'].str.contains('Companion Animal (canine and/or feline)', na=False, regex=False)].copy()
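
The `regex=False` argument in the filters above is important because the species label contains parentheses, which a regular expression would interpret as a group rather than as literal characters. The sketch below demonstrates the difference on a small hypothetical `Q1` column; the example values are invented.

```python
import warnings
import pandas as pd

label = "Companion Animal (canine and/or feline)"

# Hypothetical Q1 responses, including a multi-select entry and a missing value.
q1 = pd.Series([label + ",Equine", "Equine", None])

# Literal substring match, as used in the notebook.
literal = q1.str.contains(label, na=False, regex=False)

with warnings.catch_warnings():
    # pandas warns that the pattern contains a match group; silence it for the demo.
    warnings.simplefilter("ignore")
    as_regex = q1.str.contains(label, na=False, regex=True)

print(literal.tolist())
print(as_regex.tolist())
```

With `regex=True` the parentheses are not matched literally, so the first entry is missed. `na=False` makes missing responses count as non-matches in both cases.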

In [76]:
# Input info about question group

if nontechnical == False:
    question_list = [16,17,7,8,9,10,11,12]
    n_subq_list = [25,10,25,8,4,12,13,3]
    question_strings = ['Medical Procedures',
                        'Preventive Medicine/Population Health Procedures',
                        'Surgical Procedures', 
                        'Anesthetic Procedures', 
                        'Reproductive Procedures',
                        'Diagnostic Imaging Procedures',
                        'Clinical Pathology Procedures',
                        'Diagnostic Necropsy Procedures']
else:
    question_list = [13,14,15]
    n_subq_list = [11,6,8]
    question_strings = ['Communication practices',
                        'Professional and business practices',
                        'Ethics and professional practices']

assert(len(question_list) == len(n_subq_list))
assert(len(n_subq_list) == len(question_strings))

In [77]:
group_table, pooled_q_stats, sig_count, n_questions, pop1_responses, pop2_responses, subq_data  = analyze_group(question_list, n_subq_list, question_strings, ca_pop1, ca_pop2, alpha=ALPHA)
pooled_stat, pooled_p, pooled_diff_mean = pooled_q_stats
group_data.append(["Companion Animal", pooled_stat, pooled_p, pooled_diff_mean, n_questions, "%i/%i" % (sig_count,n_questions), pop1_responses, pop2_responses])
output_tables.append(group_table)
output_tables_sheet_names.append("Companion Animal")
output_subq_data.append(subq_data)
group_table

Group result (all questions): stat=24.292, p=8.28e-07, diff_mean (specialist-generalist)=-0.405, sig_subq=3/3


| | Question number | Category | Pooled stat | Pooled p | Sig | Pooled Diff Mean (specialist-generalist) | Num subquestions | Fraction of sig subquestions | Pooled num specialist responses | Pooled num generalist responses |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Q13 | Communication practices | 17.491567 | 2.9e-05 | * | -0.453786 | 11 | 2/11 | 109 | 791 |
| 1 | Q14 | Professional and business practices | 7.945675 | 0.00482 | * | -0.542508 | 6 | 0/6 | 60 | 436 |
| 2 | Q15 | Ethics and professional practices | 4.217986 | 0.039998 | * | -0.237153 | 8 | 0/8 | 80 | 576 |


### Special Species Group

In [78]:
# Filter dataframes to only special species respondents (may have responded to other species too)
ss_pop1 = pop1[pop1['Q1'].str.contains('Special Species', na=False, regex=False)].copy()
ss_pop2 = pop2[pop2['Q1'].str.contains('Special Species', na=False, regex=False)].copy()

In [79]:
# Input info about question group

if nontechnical == False:
    question_list = [43, 44, 45, 46, 48, 49, 50]
    n_subq_list = [20, 9, 11, 8, 6, 13, 3]
    question_strings = ['Medical Procedures',
                        'Preventive Medicine/Population Health Procedures',
                        'Surgical Procedures', 
                        'Anesthetic Procedures', 
                        'Diagnostic Imaging Procedures',
                        'Clinical Pathology Procedures',
                        'Diagnostic Necropsy Procedures']
else:
    question_list = [13,14,15]
    n_subq_list = [11,6,8]
    question_strings = ['Communication practices',
                        'Professional and business practices',
                        'Ethics and professional practices']
                        
assert(len(question_list) == len(n_subq_list))
assert(len(n_subq_list) == len(question_strings))

In [80]:
group_table, pooled_q_stats, sig_count, n_questions, pop1_responses, pop2_responses, subq_data  = analyze_group(question_list, n_subq_list, question_strings, ss_pop1, ss_pop2, alpha=ALPHA)
pooled_stat, pooled_p, pooled_diff_mean = pooled_q_stats
group_data.append(["Special Species", pooled_stat, pooled_p, pooled_diff_mean, n_questions, "%i/%i" % (sig_count,n_questions), pop1_responses, pop2_responses])
output_tables.append(group_table)
output_tables_sheet_names.append("Special Species")
output_subq_data.append(subq_data)
group_table

Group result (all questions): stat=21.350, p=3.83e-06, diff_mean (specialist-generalist)=-1.480, sig_subq=2/3


| | Question number | Category | Pooled stat | Pooled p | Sig | Pooled Diff Mean (specialist-generalist) | Num subquestions | Fraction of sig subquestions | Pooled num specialist responses | Pooled num generalist responses |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Q13 | Communication practices | 21.823513 | 3e-06 | * | -2.146853 | 11 | 0/11 | 11 | 143 |
| 1 | Q14 | Professional and business practices | 9.284847 | 0.002311 | * | -1.844156 | 6 | 0/6 | 6 | 77 |
| 2 | Q15 | Ethics and professional practices | 1.30041 | 0.254138 | | -0.278846 | 8 | 0/8 | 8 | 104 |


### Food Animal Group

In [81]:
# Filter dataframes to only food animal respondents (may have responded to other species too)
fa_pop1 = pop1[pop1['Q1'].str.contains('Food Animal', na=False, regex=False)].copy()
fa_pop2 = pop2[pop2['Q1'].str.contains('Food Animal', na=False, regex=False)].copy()

In [82]:
# Input info about question group

if nontechnical == False:
    question_list = [20, 18, 25, 24, 21, 19, 23, 22, 27]
    n_subq_list = [8, 27, 16, 10, 20, 11, 12, 3, 5]
    question_strings = ['Handling and Husbandry Procedures',
                        'Medical Procedures',
                        'Surgical Procedures',
                        'Anesthetic Procedures',
                        'Preventive Medicine/Population Health Procedures',
                        'Reproductive Procedures',
                        'Clinical Pathology Procedures',
                        'Diagnostic Necropsy Procedures',
                        'Diagnostic Imaging Procedures']
else:
    question_list = [13,14,15]
    n_subq_list = [11,6,8]
    question_strings = ['Communication practices',
                        'Professional and business practices',
                        'Ethics and professional practices']

assert(len(question_list) == len(n_subq_list))
assert(len(n_subq_list) == len(question_strings))

In [83]:
group_table, pooled_q_stats, sig_count, n_questions, pop1_responses, pop2_responses, subq_data  = analyze_group(question_list, n_subq_list, question_strings, fa_pop1, fa_pop2, alpha=ALPHA)
pooled_stat, pooled_p, pooled_diff_mean = pooled_q_stats
group_data.append(["Food Animal", pooled_stat, pooled_p, pooled_diff_mean, n_questions, "%i/%i" % (sig_count,n_questions), pop1_responses, pop2_responses])
output_tables.append(group_table)
output_tables_sheet_names.append("Food Animal")
output_subq_data.append(subq_data)
group_table

Group result (all questions): stat=1.412, p=2.35e-01, diff_mean (specialist-generalist)=-0.321, sig_subq=0/3


| | Question number | Category | Pooled stat | Pooled p | Sig | Pooled Diff Mean (specialist-generalist) | Num subquestions | Fraction of sig subquestions | Pooled num specialist responses | Pooled num generalist responses |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Q13 | Communication practices | 0.061488 | 0.80416 | | -0.326057 | 11 | 0/11 | 44 | 329 |
| 1 | Q14 | Professional and business practices | 0.732074 | 0.392212 | | -0.336111 | 6 | 0/6 | 24 | 180 |
| 2 | Q15 | Ethics and professional practices | 1.014992 | 0.31371 | | -0.302083 | 8 | 0/8 | 32 | 240 |


### Equine Group

In [84]:
# Filter dataframes to only equine respondents (may have responded to other species too)
eq_pop1 = pop1[pop1['Q1'].str.contains('Equine', na=False, regex=False)].copy()
eq_pop2 = pop2[pop2['Q1'].str.contains('Equine', na=False, regex=False)].copy()

In [85]:
# Input info about question group

if nontechnical == False:
    question_list = [28, 29, 30, 31, 32, 33, 34, 35, 36]
    n_subq_list = [7, 24, 8, 8, 15, 9, 11, 3, 5]
    question_strings = ['Handling and Husbandry Procedures',
                        'Medical Procedures',
                        'Surgical Procedures',
                        'Anesthetic Procedures',
                        'Preventive Medicine/Population Health Procedures',
                        'Reproductive Procedures',
                        'Clinical Pathology Procedures',
                        'Diagnostic Necropsy Procedures',
                        'Diagnostic Imaging Procedures']
else:
    question_list = [13,14,15]
    n_subq_list = [11,6,8]
    question_strings = ['Communication practices',
                        'Professional and business practices',
                        'Ethics and professional practices']

assert(len(question_list) == len(n_subq_list))
assert(len(n_subq_list) == len(question_strings))

In [86]:
group_table, pooled_q_stats, sig_count, n_questions, pop1_responses, pop2_responses, subq_data  = analyze_group(question_list, n_subq_list, question_strings, eq_pop1, eq_pop2, alpha=ALPHA)
pooled_stat, pooled_p, pooled_diff_mean = pooled_q_stats
group_data.append(["Equine", pooled_stat, pooled_p, pooled_diff_mean, n_questions, "%i/%i" % (sig_count,n_questions), pop1_responses, pop2_responses])
output_tables.append(group_table)
output_tables_sheet_names.append("Equine")
output_subq_data.append(subq_data)
group_table

Group result (all questions): stat=13.455, p=2.44e-04, diff_mean (specialist-generalist)=-0.744, sig_subq=2/3


| | Question number | Category | Pooled stat | Pooled p | Sig | Pooled Diff Mean (specialist-generalist) | Num subquestions | Fraction of sig subquestions | Pooled num specialist responses | Pooled num generalist responses |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Q13 | Communication practices | 10.265039 | 0.001356 | * | -0.781818 | 11 | 0/11 | 33 | 220 |
| 1 | Q14 | Professional and business practices | 4.246328 | 0.039335 | * | -1.047222 | 6 | 0/6 | 18 | 120 |
| 2 | Q15 | Ethics and professional practices | 2.271741 | 0.131752 | | -0.464583 | 8 | 0/8 | 24 | 160 |


## Group Summary

In [87]:
group_summary_table = pd.DataFrame(group_data, columns=group_columns)

In [88]:
pvals = list(group_summary_table['Pooled p'])

sigs = []
for p in pvals:
    if p > ALPHA:
        sig = ""
    else:
        sig = "*"
    sigs.append(sig)

group_summary_table.insert(loc=3, column='Sig', value=sigs)

In [89]:
# Add group summary to the beginning of output tables
output_tables.insert(0, group_summary_table)
output_tables_sheet_names.insert(0, "Group summary")

In [90]:
group_summary_table

| | Group | Pooled stat | Pooled p | Sig | Pooled diff_mean (specialist-generalist) | Num questions | Fraction of sig questions | Pooled num specialist responses | Pooled num generalist responses |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Companion Animal | 24.291944 | 8.278459e-07 | * | -0.404834 | 3 | 3/3 | 249 | 1803 |
| 1 | Special Species | 21.350275 | 3.825658e-06 | * | -1.48 | 3 | 2/3 | 25 | 324 |
| 2 | Food Animal | 1.412319 | 0.2346718 | | -0.320561 | 3 | 0/3 | 100 | 749 |
| 3 | Equine | 13.454947 | 0.000244361 | * | -0.744 | 3 | 2/3 | 75 | 500 |


# Generate tables

We will generate the following types of tables using pooled data from these experiments:

1.   `summary.xlsx`: Group summary table and a table for procedure sets (questions) within each group.
2.   `companion_animal.xlsx`: Tables for all procedures within the companion animal group.
3.   `special_species.xlsx`: Tables for all procedures within the special species group.
4.   `food_animal.xlsx`: Tables for all procedures within the food animal group.
5.   `equine.xlsx`: Tables for all procedures within the equine group.
6.   `summary_nontechnical_allspecies.xlsx`: Summary table for responses to procedures (subquestions) pooled across species areas. Applicable only to non-technical questions.



## Summary

In [91]:
writer = pd.ExcelWriter('summary%s.xlsx'%(file_suffix,), engine='xlsxwriter')

for i,table in enumerate(output_tables):
    sheet_name = output_tables_sheet_names[i]
    table.to_excel(writer, sheet_name=sheet_name, index=False)

    # Auto-adjust columns widths
    for column in table:
        column_width = max(table[column].astype(str).map(len).max(), len(column))
        col_idx = table.columns.get_loc(column)
        writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)

# close() writes the workbook to disk; ExcelWriter.save() was removed in pandas 2.0
writer.close()
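
The same write-and-resize loop appears in each of the export cells in this section. As a sketch under stated assumptions, it could be factored into two helpers. `column_widths` and `write_tables` are hypothetical names, the demo table is invented, and `write_tables` assumes the `xlsxwriter` package installed at the top of the notebook.

```python
import pandas as pd

def column_widths(table):
    """Width per column: the longest cell or the header, whichever is longer."""
    return {
        column: max(int(table[column].astype(str).map(len).max()), len(str(column)))
        for column in table.columns
    }

def write_tables(path, tables, sheet_names):
    """Write each table to its own sheet; the with-block closes the file on exit."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        for table, sheet_name in zip(tables, sheet_names):
            table.to_excel(writer, sheet_name=sheet_name, index=False)
            widths = column_widths(table)
            for col_idx, column in enumerate(table.columns):
                writer.sheets[sheet_name].set_column(col_idx, col_idx, widths[column])

# Hypothetical table used only to show the width calculation.
demo = pd.DataFrame({"Group": ["Equine", "Food Animal"], "Pooled p": [0.5, 0.25]})
print(column_widths(demo))
```

With these helpers, the summary export would reduce to a call such as `write_tables('summary%s.xlsx' % (file_suffix,), output_tables, output_tables_sheet_names)`. The width calculation assumes each table has at least one row.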

## All Species Summary

This output table only applies to the nontechnical analyses, for which the relevant questions appear in all species areas.

In [92]:
if nontechnical == True:

    group_table, pooled_q_stats, sig_count, n_questions, pop1_responses, pop2_responses, subq_data = analyze_group(question_list, n_subq_list, question_strings, pop1, pop2, alpha=ALPHA)
    subq_tables, subq_tables_names = subq_data

    # Loop through tables
    writer = pd.ExcelWriter('summary%s_allspecies.xlsx'%(file_suffix,), engine='xlsxwriter')

    for i,table in enumerate(subq_tables):
        sheet_name = subq_tables_names[i]
        table.to_excel(writer, sheet_name=sheet_name, index=False)

        # Auto-adjust columns widths
        for column in table:
            column_width = max(table[column].astype(str).map(len).max(), len(column))
            col_idx = table.columns.get_loc(column)
            writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)

    # close() writes the workbook to disk; ExcelWriter.save() was removed in pandas 2.0
    writer.close()

Group result (all questions): stat=8.354, p=3.85e-03, diff_mean (specialist-generalist)=-0.224, sig_subq=1/3


## All procedures

In [93]:
for i, file in enumerate(['companion_animal%s.xlsx'%(file_suffix,), 'special_species%s.xlsx'%(file_suffix,), 'food_animal%s.xlsx'%(file_suffix,), 'equine%s.xlsx'%(file_suffix,)]):
    subq_data = output_subq_data[i]
    subq_tables, subq_tables_names = subq_data

    # Loop through tables
    writer = pd.ExcelWriter(file, engine='xlsxwriter')

    # Use j for the inner index so it does not shadow the outer loop variable i
    for j, table in enumerate(subq_tables):
        sheet_name = subq_tables_names[j]
        table.to_excel(writer, sheet_name=sheet_name, index=False)

        # Auto-adjust columns widths
        for column in table:
            column_width = max(table[column].astype(str).map(len).max(), len(column))
            col_idx = table.columns.get_loc(column)
            writer.sheets[sheet_name].set_column(col_idx, col_idx, column_width)

    # close() writes the workbook to disk; ExcelWriter.save() was removed in pandas 2.0
    writer.close()