<a href="https://colab.research.google.com/github/nathanbollig/vet-graduate-expectations-survey/blob/main/WVMA_table.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Veterinary graduate expectations survey - WVDL table

Start by uploading `WVMA.xlsx` into the working directory. The purpose of this notebook is to produce the following table: shows the top 10 tasks (wrt level of expected independence) in all four emphasis areas from the WVMA results.

## Set up

In [1]:
import pandas as pd
import numpy as np

### Read in WVMA data

In [2]:
# Use top row as header and skip second header row
wvma = pd.read_excel('WVMA.xlsx', header=0, skiprows=lambda x: x in [1])  

# Read in questions from second header row and associate with column names
question_wvma = {}

top_rows_wvma = pd.read_excel('WVMA.xlsx', nrows=2) 

for col in list(top_rows_wvma.columns):
    question_wvma[col] = top_rows_wvma.iloc[0][col]

### Encoding

Let's encode the expectation response in the following way:

 * 0: No Expectation to Perform Procedure

 * 1: Perform with Assistance (assist with portions of procedure)
 
 * 2: Perform with Direct Supervision (present in room during procedure)

 * 3: Perform with Indirect Supervision (available in building or by phone if needed)

 * 4: Perform Independently

In [3]:
def encode_expectation(response_string):
    if isinstance(response_string, str) == False:
        return response_string
    
    # Encode string
    s = response_string.lower()
    if s.find('no expectation') > -1:
        return 0
    elif s.find('with assistance') > -1:
        return 1
    elif s.find('indirect supervision') > -1:
        return 3
    elif s.find('direct supervision') > -1:
        return 2
    elif s.find('independently') > -1:
        return 4
    else:
        return response_string

In [4]:
for col in wvma.columns:
    wvma[col] = wvma[col].apply(lambda x: encode_expectation(x))

### Filtering columns

In [5]:
# Drop all initial columns that aren't task questions
wvma = wvma.iloc[:, 23:]

In [6]:
# Drop all questions that refer to species categories
cols = [c for c in wvma.columns if '_' in c]
wvma = wvma[cols]

In [7]:
wvma

Unnamed: 0,Q16_1,Q16_2,Q16_3,Q16_4,Q16_5,Q16_6,Q16_7,Q16_8,Q16_9,Q16_10,Q16_11,Q16_12,Q16_13,Q16_14,Q16_15,Q16_16,Q16_17,Q16_18,Q16_19,Q16_20,Q16_21,Q16_22,Q16_23,Q16_24,Q16_25,Q17_1,Q17_2,Q17_3,Q17_4,Q17_5,Q17_6,Q17_7,Q17_8,Q17_9,Q17_10,Q7_1,Q7_2,Q7_3,Q7_4,Q7_5,...,Q34_5,Q34_6,Q34_7,Q34_8,Q34_9,Q34_10,Q34_11,Q35_1,Q35_2,Q35_3,Q36_1,Q36_2,Q36_3,Q36_4,Q36_5,Q14_1,Q14_2,Q14_3,Q14_4,Q14_5,Q14_6,Q13_1,Q13_2,Q13_3,Q13_4,Q13_5,Q13_6,Q13_7,Q13_8,Q13_9,Q13_10,Q13_11,Q15_1,Q15_2,Q15_3,Q15_4,Q15_5,Q15_6,Q15_7,Q15_8
0,4.0,4.0,4.0,4.0,4.0,4.0,3.0,3.0,4.0,4.0,4.0,4.0,3.0,3.0,3.0,4.0,4.0,4.0,4.0,3.0,4.0,4.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,4.0,3.0,4.0,4.0,3.0,4.0,3.0,4.0,...,,,,,,,,,,,,,,,,3.0,3.0,4.0,3.0,1.0,1.0,4.0,4.0,4.0,4.0,4.0,4.0,3.0,4.0,3.0,3.0,4.0,4.0,3.0,4.0,3.0,3.0,3.0,3.0,3.0
1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,3.0,4.0,4.0,2.0,3.0,3.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,3.0,4.0,4.0,4.0,0.0,4.0,4.0,4.0
2,4.0,3.0,4.0,3.0,3.0,3.0,3.0,4.0,4.0,4.0,4.0,3.0,4.0,4.0,3.0,4.0,3.0,4.0,4.0,4.0,4.0,3.0,3.0,3.0,3.0,2.0,3.0,3.0,3.0,3.0,3.0,4.0,4.0,3.0,3.0,3.0,2.0,4.0,3.0,4.0,...,,,,,,,,,,,,,,,,4.0,4.0,4.0,3.0,2.0,2.0,4.0,4.0,4.0,4.0,3.0,3.0,3.0,3.0,2.0,3.0,3.0,0.0,3.0,3.0,3.0,0.0,3.0,3.0,2.0
3,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,3.0,4.0,3.0,4.0,4.0,4.0,4.0,3.0,4.0,4.0,4.0,0.0,4.0,4.0,3.0,3.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,4.0,3.0,3.0,...,,,,,,,,,,,,,,,,3.0,0.0,4.0,0.0,0.0,1.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,3.0,3.0,4.0,3.0,3.0,4.0,4.0,4.0
4,4.0,1.0,4.0,2.0,2.0,3.0,3.0,3.0,4.0,4.0,1.0,2.0,2.0,2.0,2.0,4.0,2.0,4.0,2.0,2.0,1.0,1.0,1.0,2.0,1.0,2.0,3.0,2.0,2.0,1.0,3.0,3.0,3.0,3.0,3.0,1.0,2.0,1.0,2.0,3.0,...,,,,,,,,,,,,,,,,1.0,1.0,3.0,1.0,1.0,1.0,4.0,4.0,3.0,3.0,3.0,3.0,1.0,4.0,0.0,3.0,4.0,1.0,1.0,1.0,4.0,4.0,4.0,4.0,4.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
170,3.0,2.0,3.0,1.0,1.0,3.0,2.0,2.0,2.0,3.0,2.0,0.0,3.0,3.0,1.0,3.0,0.0,3.0,2.0,2.0,3.0,2.0,2.0,2.0,2.0,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
171,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
172,4.0,4.0,3.0,2.0,2.0,3.0,4.0,4.0,4.0,1.0,2.0,2.0,3.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,3.0,4.0,4.0,4.0,1.0,1.0,4.0,1.0,2.0,2.0,2.0,1.0,4.0,2.0,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
173,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


## Compute aggregate measures over columns

In [8]:
# Average across all columns
avg = wvma.mean(axis=0)

In [9]:
# Count of non-null responses across all columns
count = wvma.count(axis=0)

## Compute emphasis area for each column

In [10]:
def area_for_column_name(colname):
    colname = colname.split('Q')[1] # take after Q
    colname = colname.split('_')[0] # take before '_'
    q = int(colname)

    if q in [16,17,7,8,9,10,11,12]:
        return "Companion Animal"
    elif q in [20, 18, 25, 24, 21, 19, 23, 22, 27]:
        return "Food Animal"
    elif q in [43, 44, 45, 46, 48, 49, 50]:
        return "Special Species"
    elif q in [28, 29, 30, 31, 32, 33, 34, 35, 36]:
        return "Equine"
    else: 
        return "" # Not a question in an emphasis area


In [11]:
areas = {}
for c in avg.index:
    area = area_for_column_name(c)
    areas[c] = area

In [12]:
areas = pd.Series(areas, index=avg.index)

## Get question text

In [13]:
questions = {}
for c in avg.index:
    q_string = question_wvma[c].split('-')[2]
    questions[c] = q_string

questions = pd.Series(questions, index=avg.index)

## Assemble a table

In [14]:
df = pd.concat([areas, questions, avg, count], axis=1)
df.columns = ['Emphasis area', 'Question', 'Mean expected independence', 'Number of respondents']

In [15]:
# Drop rows without an emphasis area
df = df[df[df.columns[0]] != ""]

In [16]:
df

Unnamed: 0,Emphasis area,Question,Mean expected independence,Number of respondents
Q16_1,Companion Animal,Obtain history and perform complete PE,3.878788,99
Q16_2,Companion Animal,Perform ophthalmic exam,3.240000,100
Q16_3,Companion Animal,Perform otoscopic exam,3.670000,100
Q16_4,Companion Animal,Perform neurologic exam,3.040000,100
Q16_5,Companion Animal,Perform orthopedic exam,3.050000,100
...,...,...,...,...
Q36_1,Equine,Determine machine settings for radiographs,2.956522,23
Q36_2,Equine,Take radiographs,3.045455,22
Q36_3,Equine,Interpret radiographic images,2.681818,22
Q36_4,Equine,Perform ultrasound (non,2.181818,22


## Table operations

We will now group by emphasis area, sort by mean expected independence within each group, and filter to top 5 within each group.

In [17]:
df = df.sort_values([df.columns[0], df.columns[2]],ascending=[True, False]).groupby(df.columns[0]).head(5)

## Save table

In [18]:
df.to_csv("WVMA_most_independent_tasks.csv")

# Next Steps


In [19]:
# Copy files to Drive
!cp *.xlsx drive/MyDrive/survey_test/