## **PULSE SURVEY 11 VALIDATION** 

1. Check the Question Stem Total column for at least 3 single-select questions
2. Check the Count and Demographic Value Total columns for each demographic for at least 2 single-select questions
    - Note that Reporting College and Multiple Ethnicities are double-counting demographics: each student with multiple majors/ethnicities is counted once in each unique category. For example, a student in both L&S and CNR is counted as 2 responses, one from L&S and one from CNR. As a result, the Demographic Value Totals for these demographics add up to more than the Question Stem Total
3. Check the Count and "Demographic Value Total, by Undergrad Grad" columns for one non-double-counting demographic and one double-counting demographic for at least 2 single-select questions (preferably questions that haven’t been checked yet)
4. Check that each Question Stem Id matches its Question Stem/Item and Question Response
    - Use the Pulse Survey Content documents for this (the .docx files must be downloaded to be viewed)
    - While doing this, make sure the text looks correct
5. Repeat the checks above for multi-select questions
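
The double-counting note in step 2 can be seen on a toy frame. This is a minimal sketch, assuming (as the list-valued Reporting College and Multiple Ethnicities columns in the RAW_SURVEY preview suggest) that each double-counting column holds a Python list per student; the respondents and colleges below are made up.

```python
import pandas as pd

# Toy data (hypothetical): three respondents, R1 belongs to two reporting colleges
students = pd.DataFrame({
    'ResponseId': ['R1', 'R2', 'R3'],
    'Q1': ['Yes', 'No', 'Yes'],
    'Reporting College': [['L&S', 'CNR'], ['L&S'], ['COE']],
})

# Question Stem Total: every respondent who answered is counted once
stem_total = students['Q1'].count()

# explode() gives one row per (respondent, college) pair, so R1 appears twice
exploded = students.explode('Reporting College')
demo_value_totals = exploded.groupby('Reporting College')['Q1'].count()

print(stem_total)                   # 3
print(demo_value_totals.to_dict())  # {'CNR': 1, 'COE': 1, 'L&S': 2}
print(demo_value_totals.sum())      # 4, more than the stem total
```

The same `explode` call is what the count-checking functions further down rely on for the two double-counting demographics.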

### **Demographic Categories**
**Double counting**
- Reporting College
- Multiple Ethnicities

**Non-double-counting**
- Undergrad Grad
- Derived Residency Desc
- Entry Status Desc
- Ucb Level1 Ethnic Rollup Desc
- Ucb Level2 Ethnic Rollup Desc
- Low-income Status
- First Gen College
- Person Gender Desc
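
Because the check functions decide whether to explode a demographic by exact string comparison, a stray space or typo in a category name silently changes how it is counted. A minimal guard, assuming the data source's Demographic Category column should use exactly the names listed in DEMOGRAPHIC_COLUMNS; `unknown_categories` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Category names as listed in DEMOGRAPHIC_COLUMNS
DOUBLE_COUNTING = {'Reporting College', 'Multiple Ethnicities'}
NON_DOUBLE_COUNTING = {'Undergrad Grad', 'Derived Residency Desc', 'Entry Status Desc',
                       'Ucb Level1 Ethnic Rollup Desc', 'Ucb Level2 Ethnic Rollup Desc',
                       'Low-income Status', 'First Gen College', 'Person Gender Desc'}

def unknown_categories(data_source, known=DOUBLE_COUNTING | NON_DOUBLE_COUNTING):
    """Demographic Category labels in data_source that match no known name exactly."""
    found = set(data_source['Demographic Category'].dropna().unique())
    return sorted(found - set(known))

# Toy data: the last label has a trailing space, so it is not 'Multiple Ethnicities'
toy = pd.DataFrame({'Demographic Category': ['Undergrad Grad', 'Reporting College',
                                             'Multiple Ethnicities ']})
print(unknown_categories(toy))  # ['Multiple Ethnicities ']
```

Run against DATA_SOURCE, an empty list means every category is one the checks below know how to handle.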


In [295]:
import os

from sklearn.pipeline import Pipeline, FeatureUnion
import pandas as pd
import numpy as np
from IPython.display import display

os.chdir('/Users/roselee/VCUEDataTeam/Pulse Survey Data Source Generation/')

In [296]:
%run cleaning_transformers.ipynb

In [297]:
%run multiselect_counter_transformers.ipynb

In [298]:
DATA_SOURCE = pd.read_csv('11_ps_data_source.csv')
RAW_SURVEY = pd.read_csv('pulse_survey_11_raw_data.csv')



## 1. Clean raw data

In [299]:
# data cleaning variables
COLUMNS_TO_REMOVE = ['RecordedDate'] ## may need to add:'PHQ2SCORE', 'GAD2SCORE', 'PHQ2', 'GAD2'
UNGRAD_GRAD_COL = 'UNGRADGRADCD' ## may need to replace
RESIDENCY_COL = 'RESIDENCY' ## may need to replace
ENTRY_STATUS_COL = 'ENTRYSTATUSDESC' ## may need to replace
ETH_LEVEL1_COL = 'LEVEL1ETH' ## may need to replace
ETH_LEVEL2_COL = 'LEVEL2ETH' ## may need to replace
VALUES_TO_NULLIFY = [-99, '-99', -1, '-1', -999, '-999', 'Not selected'] ## may need to replace

############# OPTIONAL: use ONLY if Reporting College cols look like a stem id #############
# rename reporting college columns to avoid them getting treated as a question
RAW_SURVEY = RAW_SURVEY.rename(columns={'REPORTCOLLEGE1':'Reporting College - First Plan',
                                        'REPORTCOLLEGE2':'Reporting College - Second Plan',
                                        'REPORTCOLLEGE3':'Reporting College - Third Plan'})
############################################################################################
COLLEGE_COLS = RAW_SURVEY.columns[RAW_SURVEY.columns.str.contains('Reporting College')]
MULTI_ETH_COLS = ['African American / Black',
                  'Asian / Asian American',
                  'Hispanic / Latinx',
                  'International',
                  'American Indian / Alaska Native',
                  'Pacific Islander',
                  'Southwest Asian / North African',
                  'White / Caucasian',
                  'No Response']

# counting variables
QUESTION_DESC = RAW_SURVEY.loc[[0]]  # first row: question text
DATA = RAW_SURVEY[1:]                # remaining rows: student responses
DEMOGRAPHIC_COLUMNS = ['Undergrad Grad',
                       'Derived Residency Desc',
                       'Entry Status Desc',
                       'Ucb Level1 Ethnic Rollup Desc',
                       'Ucb Level2 Ethnic Rollup Desc',
                       'Low-income Status',
                       'First Gen College',
                       'Person Gender Desc',
                       'Reporting College',
                       'Multiple Ethnicities']

cleaning_pipeline = Pipeline([
    # drop null responses, remove duplicates and columns, make all missing/irrelevant values nan
    ('null rows remover', RemoveNullRowsTransformer()),
    ('values nullifier', ReplaceValuesTransformer(values_to_replace=VALUES_TO_NULLIFY)),
    ('duplicates remover', RemoveFirstDuplicateTransformer()),
    ('irrelevant columns remover', RemoveColumnsTransformer(columns_to_remove=COLUMNS_TO_REMOVE)),
    # rename column names
    ('undergrad grad col renamer', RenameColumnTransformer(UNGRAD_GRAD_COL, 'Undergrad Grad')),
    ('residency col renamer', RenameColumnTransformer(RESIDENCY_COL, 'Derived Residency Desc')),
    ('entry status col renamer', RenameColumnTransformer(ENTRY_STATUS_COL, 'Entry Status Desc')),
    ('ethnic lvl1 col renamer', RenameColumnTransformer(ETH_LEVEL1_COL, 'Ucb Level1 Ethnic Rollup Desc')),
    ('ethnic lvl2 col renamer', RenameColumnTransformer(ETH_LEVEL2_COL, 'Ucb Level2 Ethnic Rollup Desc')),
    # rename dataframe values
    ('undergrad value renamer', RelabelColumnTransformer(column_to_relabel='Undergrad Grad', new_label='U')),
    ('grad value renamer', RelabelColumnTransformer(column_to_relabel='Undergrad Grad', new_label='G')),
    ('first-year entry value renamer', RelabelColumnTransformer(column_to_relabel='Entry Status Desc', new_label='First-year')),
    # replace ADVANCED STANDING with NaN for all grad students
    ('advanced standing grad nullifier', ReplaceStringWithNaNTransformer(standing_col='Entry Status Desc')),
    # create columns for double counting demographics & mental health scores
    ('reporting clg col generator', UniqueStringListTransformer(columns_to_list=COLLEGE_COLS, unique_col_list='Reporting College')),
    ('multiple eth col generator', UniqueStringListTransformer(columns_to_list=MULTI_ETH_COLS, unique_col_list='Multiple Ethnicities')),
    ('depression col generator', AddColumnsTransformer(column_1='MHLTH1', column_2='MHLTH2', new_column='PHQ2', binary_column='DEPRESSION')),
    ('anxiety col generator', AddColumnsTransformer(column_1='MHLTH3', column_2='MHLTH4', new_column='GAD2', binary_column='ANXIETY'))
])
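
The two `AddColumnsTransformer` steps come from `cleaning_transformers.ipynb`, which isn't reproduced here. For orientation only, here is a sketch of the standard PHQ-2 / GAD-2 screen: each of two items is scored 0-3 (Not at all, Several days, More than half the days, Nearly every day), the two scores are summed, and a sum of 3 or more is a positive screen. The helper name, the cutoff, and the YES label are assumptions (only NO appears in the DEPRESSION/ANXIETY columns of the RAW_SURVEY preview); the notebook's transformer may compute these columns differently. In that preview, "Several days" + "Not at all" gives a score of 1.0, which is consistent with this scoring.

```python
import pandas as pd

# Assumed standard PHQ-2 / GAD-2 item scoring (0-3 per item)
FREQUENCY_SCORES = {
    'Not at all': 0,
    'Several days': 1,
    'More than half the days': 2,
    'Nearly every day': 3,
}

def add_screen_columns(df, col_1, col_2, score_col, flag_col, cutoff=3):
    """Hypothetical helper: score two items, sum them, flag sums >= cutoff as 'YES'."""
    out = df.copy()
    item_scores = out[[col_1, col_2]].apply(lambda s: s.map(FREQUENCY_SCORES))
    # min_count=2: the score stays NaN unless both items were answered
    out[score_col] = item_scores.sum(axis=1, min_count=2)
    out[flag_col] = out[score_col].map(
        lambda x: pd.NA if pd.isna(x) else ('YES' if x >= cutoff else 'NO'))
    return out

toy = pd.DataFrame({'MHLTH1': ['Several days', 'Nearly every day', None],
                    'MHLTH2': ['Not at all', 'More than half the days', 'Several days']})
result = add_screen_columns(toy, 'MHLTH1', 'MHLTH2', 'PHQ2', 'DEPRESSION')
print(result[['PHQ2', 'DEPRESSION']])
```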

In [300]:
RAW_SURVEY = cleaning_pipeline.fit_transform(DATA)

In [301]:
DATA_SOURCE['Count'] = pd.to_numeric(DATA_SOURCE['Count'], downcast="integer")
DATA_SOURCE.head(2)

Unnamed: 0,Question Stem Id,Question Item Id,Demographic Category,Demographic Value,Undergrad Grad,Question Response,Count,Question Item,Question Stem,Demographic Value Total,"Demographic Value Total, by Undergrad Grad",Question Stem Total,Question Item Total
0,ADV_UG,ADV_UG,Undergrad Grad,U,U,No,2269,,"During this academic year (since the beginning of the Fall 21 semester), have you consulted with an academic advisor in your major or college?",5690,5690,5690,5690
1,ADV_UG,ADV_UG,Undergrad Grad,U,U,Yes,3421,,"During this academic year (since the beginning of the Fall 21 semester), have you consulted with an academic advisor in your major or college?",5690,5690,5690,5690
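
For step 4, it can help to pull every Question Stem Id and its stem text out of DATA_SOURCE into one short table before opening the .docx content documents. A small sketch; `stem_text_table` is a hypothetical helper, and it also flags stem ids that carry more than one distinct stem text, which would point to a labeling problem. The toy question texts are made up.

```python
import pandas as pd

def stem_text_table(data_source):
    """Distinct (Question Stem Id, Question Stem) pairs, plus ids with more than one text."""
    pairs = (data_source[['Question Stem Id', 'Question Stem']]
             .drop_duplicates()
             .sort_values('Question Stem Id')
             .reset_index(drop=True))
    n_texts = pairs.groupby('Question Stem Id')['Question Stem'].nunique()
    inconsistent = n_texts[n_texts > 1].index.tolist()
    return pairs, inconsistent

toy = pd.DataFrame({
    'Question Stem Id': ['ADV_UG', 'ADV_UG', 'HOUS_INSEC', 'HOUS_INSEC'],
    'Question Stem': ['Advisor question', 'Advisor question',
                      'Housing question A', 'Housing question B'],
})
pairs, inconsistent = stem_text_table(toy)
print(inconsistent)  # ['HOUS_INSEC']
```

On the real data, `stem_text_table(DATA_SOURCE)` would give one row per stem to compare against the content documents.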


In [302]:
RAW_SURVEY.head(2)

Unnamed: 0,ResponseId,EDUCNONEXAMLEVEL,EDUCNONEXAMLEVELCD,UGENTRYSTATUS,REGSTATUSDESC,GENDER,SHORTETHNICDESC,TYPE,Undergrad Grad,LowSocioEconomicStatusFlg,NeitherParent4yrClgDegFlg,Pulse10cmp,ACADPLANNM1,ACADPLANNM2,ACADPLANNM3,CNR,CHE,COE,CED,CLS,BUS,GSE,GSJ,SPP,SOI,LAW,OPT,SPH,SSW,ADV_UG,ADV_MODES_UG_1,ADV_MODES_UG_2,ADV_MODES_UG_3,ADV_MODES_UG_4,ADV_MODES_UG_5,ADV_MODES_UG_6,ADV_MODES_UG_7,ADV_LAST_UG,ADV_IMPACT_UG,ADV_BASIC_MODES_UG_1,ADV_BASIC_MODES_UG_2,ADV_BASIC_MODES_UG_3,ADV_BASIC_MODES_UG_4,ADV_BASIC_MODES_UG_5,ADV_BASIC_RANK_UG_1,ADV_BASIC_RANK_UG_2,ADV_BASIC_RANK_UG_3,ADV_BASIC_RANK_UG_4,ADV_BASIC_RANK_UG_5,ADV_COMPLEX_MODES_UG_1,ADV_COMPLEX_MODES_UG_2,ADV_COMPLEX_MODES_UG_3,ADV_COMPLEX_MODES_UG_4,ADV_COMPLEX_MODES_UG_5,ADV_COMPLEX_RANK_UG_1,ADV_COMPLEX_RANK_UG_2,ADV_COMPLEX_RANK_UG_3,ADV_COMPLEX_RANK_UG_4,ADV_COMPLEX_RANK_UG_5,ADV_MET_G,ADV_AMT_G,ADV_TYPICAL_G,ADV_RECENT_G,ADV_IMPACT_G,HOUS_INSEC,HOUS_PLACES_1,HOUS_PLACES_2,HOUS_PLACES_3,HOUS_PLACES_4,HOUS_PLACES_5,HOUS_PLACES_6,HOUS_PLACES_7,HOUS_PLACES_8,HOUS_PLACES_9,HOUS_PLACES_10,HOUS_PLACES_11,HOUS_PLACES_12,HOUS_AMT,HOUS_WORRY,HOUS_FAR,HOUS_COMMUTE,COMM_LEADERS,COVID_COMFORT_1,COVID_COMFORT_2,COVID_BEHAVIOR1_1,COVID_BEHAVIOR1_2,COVID_BEHAVIOR1_3,COVID_BEHAVIOR1_4,COVID_BEHAVIOR1_5,COVID_BEHAVIOR2_1,COVID_BEHAVIOR2_2,COVID_BEHAVIOR2_3,COVID_BEHAVIOR2_4,COVID_BEHAVIOR2_5,COVID_BEHAVIOR2_6,COVID_BEHAVIOR2_7,COVID_BEHAVIOR2_8,COVID_MASK_1,COVID_MASK_2,MHLTH1,MHLTH2,MHLTH3,MHLTH4,PHQ2SCORE,GAD2SCORE,PHQ2,GAD2,Semester Year Name Concat,African American / Black,Asian / Asian American,Hispanic / Latinx,International,American Indian / Alaska Native,Pacific Islander,Southwest Asian / North African,White / Caucasian,No Response,First Gen College,Person Gender Desc,Entry Status Desc,Derived Residency Desc,Ucb Level1 Ethnic Rollup Desc,Ucb Level2 Ethnic Rollup Desc,Reporting College - First Plan,Reporting College - Second Plan,Reporting College - Third Plan,Low-income Status,Reporting College,Multiple Ethnicities,DEPRESSION,ANXIETY
1,R_2c0AYPeQkvJtmJ2,Senior,Senior,Transfer,Continuing Student,Female,International,Undergraduate,U,,,0,Electrical Eng & Comp Sci BS,,,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2022 Spring,,Asian / Asian American,,International,,,,,,Not first-generation college,Woman,ADVANCED STANDING,International,International,International,College of Engineering,,,Not low-income,[College of Engineering],"[Asian / Asian American, International]",,
3,R_2rqJppW04QcQABh,Doctoral not advanced to candidacy,Doctoral (not advanced to candidacy),,Continuing Student,Male,Chinese,Graduate Student,G,,,1,Materials Science & Eng PhD,,,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,One time,Email,Email,The mode of contact had a positive impact,No,,,,,,,,,,,,,,Never,1 mile to 2 miles,Rarely,Email,About the same comfort,A little more uncomfortable,Not important,Very important,Not important,Slightly important,Moderately important,Important,Slightly important,Moderately important,Slightly important,Slightly important,Slightly important,Important,Moderately important,Very comfortable,Somewhat comfortable,Several days,Not at all,Several days,Not at all,1.0,1.0,1.0,1.0,2022 Spring,,Asian / Asian American,,,,,,,,Not first-generation college,Man,FIRST TIME IN PROGRAM,CA Resident,Asian,Asian,College of Engineering,,,,[College of Engineering],[Asian / Asian American],NO,NO


In [303]:
STEM_ID = DATA_SOURCE['Question Stem Id'].unique()
STEM_ID

array(['ADV_UG', 'ADV_MODES_UG_1', 'ADV_MODES_UG_2', 'ADV_MODES_UG_3',
       'ADV_MODES_UG_4', 'ADV_MODES_UG_5', 'ADV_MODES_UG_6',
       'ADV_MODES_UG_7', 'ADV_LAST_UG', 'ADV_IMPACT_UG',
       'ADV_BASIC_RANK_UG_1', 'ADV_BASIC_RANK_UG_2',
       'ADV_BASIC_RANK_UG_3', 'ADV_BASIC_RANK_UG_4',
       'ADV_BASIC_RANK_UG_5', 'ADV_COMPLEX_RANK_UG_1',
       'ADV_COMPLEX_RANK_UG_2', 'ADV_COMPLEX_RANK_UG_3',
       'ADV_COMPLEX_RANK_UG_4', 'ADV_COMPLEX_RANK_UG_5', 'ADV_MET_G',
       'ADV_AMT_G', 'ADV_TYPICAL_G', 'ADV_RECENT_G', 'ADV_IMPACT_G',
       'HOUS_INSEC', 'HOUS_AMT', 'HOUS_WORRY', 'HOUS_FAR', 'HOUS_COMMUTE',
       'COMM_LEADERS', 'COVID_COMFORT_1', 'COVID_COMFORT_2',
       'COVID_BEHAVIOR1_1', 'COVID_BEHAVIOR1_2', 'COVID_BEHAVIOR1_3',
       'COVID_BEHAVIOR1_4', 'COVID_BEHAVIOR1_5', 'COVID_BEHAVIOR2_1',
       'COVID_BEHAVIOR2_2', 'COVID_BEHAVIOR2_3', 'COVID_BEHAVIOR2_4',
       'COVID_BEHAVIOR2_5', 'COVID_BEHAVIOR2_6', 'COVID_BEHAVIOR2_7',
       'COVID_BEHAVIOR2_8', 'COVID_MAS

In [304]:
MULTI_SELECT = ['ADV_BASIC_MODES_UG', 'ADV_COMPLEX_MODES_UG', 'HOUS_PLACES']

SINGLE_SELECT = [stem_id for stem_id in STEM_ID if stem_id not in MULTI_SELECT]

SINGLE_DEMOS = ['Undergrad Grad', 'Derived Residency Desc',
                'Entry Status Desc', 'Ucb Level1 Ethnic Rollup Desc',
                'Ucb Level2 Ethnic Rollup Desc', 'Low-income Status',
                'First Gen College', 'Person Gender Desc']

DOUBLE_DEMOS = ['Multiple Ethnicities', 'Reporting College']
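
Step 5 needs the individual item columns behind each multi-select stem in MULTI_SELECT. A small sketch, assuming item columns follow the `<stem>_<number>` pattern visible in the RAW_SURVEY header (for example `HOUS_PLACES_1` through `HOUS_PLACES_12`); `item_columns` is a hypothetical helper. Anchoring the regex on the full stem avoids accidental prefix matches, and sorting numerically keeps `_10` after `_9`.

```python
import re

def item_columns(columns, stem):
    """Return the '<stem>_<n>' columns for a multi-select stem, ordered by n."""
    pattern = re.compile(rf'^{re.escape(stem)}_(\d+)$')
    matches = []
    for col in columns:
        m = pattern.match(col)
        if m:
            matches.append((int(m.group(1)), col))
    # numeric sort keeps HOUS_PLACES_10 after HOUS_PLACES_9
    return [col for _, col in sorted(matches)]

# A subset of column names from the RAW_SURVEY header above
sample_cols = ['HOUS_INSEC', 'HOUS_PLACES_1', 'HOUS_PLACES_2', 'HOUS_PLACES_10',
               'HOUS_PLACES_9', 'HOUS_AMT', 'ADV_BASIC_MODES_UG_1', 'ADV_BASIC_RANK_UG_1']
print(item_columns(sample_cols, 'HOUS_PLACES'))
# ['HOUS_PLACES_1', 'HOUS_PLACES_2', 'HOUS_PLACES_9', 'HOUS_PLACES_10']
print(item_columns(sample_cols, 'ADV_BASIC_MODES_UG'))
# ['ADV_BASIC_MODES_UG_1']
```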

## 2. Check the Question Stem Total column for at least 3 single select questions

In [305]:
# completed function
def check_qstem_total(qstems):
    for qstem in qstems:
        print('_____', qstem, '_____')
        # finding data source value for question stem total
        # (exact match, so a stem id is never matched as a substring of a longer one)
        allstemtotal = DATA_SOURCE[DATA_SOURCE['Question Stem Id'] == qstem]
        stemtotal = allstemtotal.loc[allstemtotal['Demographic Category'] == 'Undergrad Grad',
                                     ['Question Item Id', 'Question Stem Total']]
        stemtotal = stemtotal.drop_duplicates(ignore_index=True)
        data_source_val = stemtotal['Question Stem Total'].iloc[0]

        # finding raw survey value for question stem total (count() skips NaN)
        raw_survey_val = RAW_SURVEY[qstem].count()

        print('DATA SOURCE:', data_source_val)
        print('RAW SURVEY:', raw_survey_val)
        print('Equal?:', data_source_val == raw_survey_val)
        print("\n")

# check multiple stem totals 
qstems = ['ADV_UG', 'ADV_MODES_UG_1', 'ADV_MODES_UG_2', 'ADV_MODES_UG_3']
check_qstem_total(qstems)

_____ ADV_UG _____
DATA SOURCE: 5690
RAW SURVEY: 5690
Equal?: True


_____ ADV_MODES_UG_1 _____
DATA SOURCE: 3421
RAW SURVEY: 3421
Equal?: True


_____ ADV_MODES_UG_2 _____
DATA SOURCE: 3421
RAW SURVEY: 3421
Equal?: True


_____ ADV_MODES_UG_3 _____
DATA SOURCE: 3421
RAW SURVEY: 3421
Equal?: True
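
`check_qstem_total` prints one block per stem, which works for a handful of stems but gets hard to scan across all of SINGLE_SELECT. A hedged variant, `qstem_total_report` (a hypothetical name, not part of the notebook), gathers the same comparison into a DataFrame; it takes the two tables as arguments so it can be tried on toy data, and like `check_qstem_total` it assumes the stem total can be read from the Undergrad Grad rows.

```python
import pandas as pd

def qstem_total_report(data_source, raw_survey, qstems):
    """Compare Question Stem Total (data source) with non-null answers (raw survey)."""
    rows = []
    for qstem in qstems:
        # same row selection as check_qstem_total: exact stem id, Undergrad Grad rows
        ds_rows = data_source[(data_source['Question Stem Id'] == qstem)
                              & (data_source['Demographic Category'] == 'Undergrad Grad')]
        ds_val = ds_rows['Question Stem Total'].iloc[0] if len(ds_rows) else None
        # count() skips NaN, so this is the number of students who answered
        raw_val = raw_survey[qstem].count() if qstem in raw_survey.columns else None
        equal = ds_val is not None and raw_val is not None and ds_val == raw_val
        rows.append({'Question Stem Id': qstem, 'Data Source': ds_val,
                     'Raw Survey': raw_val, 'Equal': bool(equal)})
    return pd.DataFrame(rows)

# Toy tables (hypothetical values)
toy_ds = pd.DataFrame({'Question Stem Id': ['ADV_UG', 'ADV_UG', 'HOUS_INSEC'],
                       'Demographic Category': ['Undergrad Grad'] * 3,
                       'Question Stem Total': [3, 3, 2]})
toy_raw = pd.DataFrame({'ADV_UG': ['Yes', 'No', None, 'Yes'],
                        'HOUS_INSEC': ['No', None, None, None]})
report = qstem_total_report(toy_ds, toy_raw, ['ADV_UG', 'HOUS_INSEC'])
print(report)
```

On the real tables, a call such as `qstem_total_report(DATA_SOURCE, RAW_SURVEY, SINGLE_SELECT)` followed by filtering on `~report['Equal']` would list the stems that need a closer look; stems without a matching RAW_SURVEY column come back with a missing Raw Survey value.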




In [306]:
qstem = 'ADV_UG'

#### DATA SOURCE STEM TOTAL 

In [307]:
allstemtotal = DATA_SOURCE[DATA_SOURCE['Question Stem Id'] == qstem]
allstemtotal.head(2)

Unnamed: 0,Question Stem Id,Question Item Id,Demographic Category,Demographic Value,Undergrad Grad,Question Response,Count,Question Item,Question Stem,Demographic Value Total,"Demographic Value Total, by Undergrad Grad",Question Stem Total,Question Item Total
0,ADV_UG,ADV_UG,Undergrad Grad,U,U,No,2269,,"During this academic year (since the beginning of the Fall 21 semester), have you consulted with an academic advisor in your major or college?",5690,5690,5690,5690
1,ADV_UG,ADV_UG,Undergrad Grad,U,U,Yes,3421,,"During this academic year (since the beginning of the Fall 21 semester), have you consulted with an academic advisor in your major or college?",5690,5690,5690,5690


In [308]:
stemtotal = allstemtotal.loc[allstemtotal['Demographic Category'] == 'Undergrad Grad',
                             ['Question Item Id', 'Question Stem Total']]
stemtotal = stemtotal.drop_duplicates(ignore_index=True)
stemtotal['Question Stem Total'].iloc[0]

5690

#### RAW SURVEY STEM TOTAL

In [309]:
RAW_SURVEY[qstem].count()

5690

In [310]:
# make sure the count above is accurate, i.e. it only includes real answers and no values that should have been nullified
RAW_SURVEY[qstem].value_counts()

Yes    3421
No     2269
Name: ADV_UG, dtype: int64
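
The `value_counts()` cross-check matters because `count()` only skips NaN: any sentinel code that survived cleaning would be counted as an answer. A toy illustration, assuming the values nullifier behaves like `Series.replace` with the VALUES_TO_NULLIFY list (the notebook's `ReplaceValuesTransformer` itself isn't shown, so this is a stand-in, not its implementation); the answers are made up.

```python
import numpy as np
import pandas as pd

# Toy answers (hypothetical) before cleaning: sentinel codes are still present
answers = pd.Series(['Yes', 'No', -99, 'Not selected', np.nan, 'Yes'])

# count() skips only NaN, so the sentinels inflate the total
before = answers.count()

# values from VALUES_TO_NULLIFY replaced with NaN, as the values nullifier is meant to do
cleaned = answers.replace([-99, '-99', -1, '-1', -999, '-999', 'Not selected'], np.nan)
after = cleaned.count()

print(before, after, cleaned.value_counts().sum())  # 5 3 3
```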

In [311]:
RAW_SURVEY.head(3)

Unnamed: 0,ResponseId,EDUCNONEXAMLEVEL,EDUCNONEXAMLEVELCD,UGENTRYSTATUS,REGSTATUSDESC,GENDER,SHORTETHNICDESC,TYPE,Undergrad Grad,LowSocioEconomicStatusFlg,NeitherParent4yrClgDegFlg,Pulse10cmp,ACADPLANNM1,ACADPLANNM2,ACADPLANNM3,CNR,CHE,COE,CED,CLS,BUS,GSE,GSJ,SPP,SOI,LAW,OPT,SPH,SSW,ADV_UG,ADV_MODES_UG_1,ADV_MODES_UG_2,ADV_MODES_UG_3,ADV_MODES_UG_4,ADV_MODES_UG_5,ADV_MODES_UG_6,ADV_MODES_UG_7,ADV_LAST_UG,ADV_IMPACT_UG,ADV_BASIC_MODES_UG_1,ADV_BASIC_MODES_UG_2,ADV_BASIC_MODES_UG_3,ADV_BASIC_MODES_UG_4,ADV_BASIC_MODES_UG_5,ADV_BASIC_RANK_UG_1,ADV_BASIC_RANK_UG_2,ADV_BASIC_RANK_UG_3,ADV_BASIC_RANK_UG_4,ADV_BASIC_RANK_UG_5,ADV_COMPLEX_MODES_UG_1,ADV_COMPLEX_MODES_UG_2,ADV_COMPLEX_MODES_UG_3,ADV_COMPLEX_MODES_UG_4,ADV_COMPLEX_MODES_UG_5,ADV_COMPLEX_RANK_UG_1,ADV_COMPLEX_RANK_UG_2,ADV_COMPLEX_RANK_UG_3,ADV_COMPLEX_RANK_UG_4,ADV_COMPLEX_RANK_UG_5,ADV_MET_G,ADV_AMT_G,ADV_TYPICAL_G,ADV_RECENT_G,ADV_IMPACT_G,HOUS_INSEC,HOUS_PLACES_1,HOUS_PLACES_2,HOUS_PLACES_3,HOUS_PLACES_4,HOUS_PLACES_5,HOUS_PLACES_6,HOUS_PLACES_7,HOUS_PLACES_8,HOUS_PLACES_9,HOUS_PLACES_10,HOUS_PLACES_11,HOUS_PLACES_12,HOUS_AMT,HOUS_WORRY,HOUS_FAR,HOUS_COMMUTE,COMM_LEADERS,COVID_COMFORT_1,COVID_COMFORT_2,COVID_BEHAVIOR1_1,COVID_BEHAVIOR1_2,COVID_BEHAVIOR1_3,COVID_BEHAVIOR1_4,COVID_BEHAVIOR1_5,COVID_BEHAVIOR2_1,COVID_BEHAVIOR2_2,COVID_BEHAVIOR2_3,COVID_BEHAVIOR2_4,COVID_BEHAVIOR2_5,COVID_BEHAVIOR2_6,COVID_BEHAVIOR2_7,COVID_BEHAVIOR2_8,COVID_MASK_1,COVID_MASK_2,MHLTH1,MHLTH2,MHLTH3,MHLTH4,PHQ2SCORE,GAD2SCORE,PHQ2,GAD2,Semester Year Name Concat,African American / Black,Asian / Asian American,Hispanic / Latinx,International,American Indian / Alaska Native,Pacific Islander,Southwest Asian / North African,White / Caucasian,No Response,First Gen College,Person Gender Desc,Entry Status Desc,Derived Residency Desc,Ucb Level1 Ethnic Rollup Desc,Ucb Level2 Ethnic Rollup Desc,Reporting College - First Plan,Reporting College - Second Plan,Reporting College - Third Plan,Low-income Status,Reporting College,Multiple Ethnicities,DEPRESSION,ANXIETY
1,R_2c0AYPeQkvJtmJ2,Senior,Senior,Transfer,Continuing Student,Female,International,Undergraduate,U,,,0,Electrical Eng & Comp Sci BS,,,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2022 Spring,,Asian / Asian American,,International,,,,,,Not first-generation college,Woman,ADVANCED STANDING,International,International,International,College of Engineering,,,Not low-income,[College of Engineering],"[Asian / Asian American, International]",,
3,R_2rqJppW04QcQABh,Doctoral not advanced to candidacy,Doctoral (not advanced to candidacy),,Continuing Student,Male,Chinese,Graduate Student,G,,,1,Materials Science & Eng PhD,,,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Yes,One time,Email,Email,The mode of contact had a positive impact,No,,,,,,,,,,,,,,Never,1 mile to 2 miles,Rarely,Email,About the same comfort,A little more uncomfortable,Not important,Very important,Not important,Slightly important,Moderately important,Important,Slightly important,Moderately important,Slightly important,Slightly important,Slightly important,Important,Moderately important,Very comfortable,Somewhat comfortable,Several days,Not at all,Several days,Not at all,1.0,1.0,1.0,1.0,2022 Spring,,Asian / Asian American,,,,,,,,Not first-generation college,Man,FIRST TIME IN PROGRAM,CA Resident,Asian,Asian,College of Engineering,,,,[College of Engineering],[Asian / Asian American],NO,NO
2,R_2Y33BFrKqhGk0ZO,Sophomore,Sophomore,New Freshman,Continuing Student,Female,White,Undergraduate,U,,,1,Letters & Sci Undeclared UG,,,,,,,1.0,,,,,,,,,,Yes,0/None,0/None,0/None,1 time,0/None,0/None,1 time,Email,The mode of contact had a positive impact,Selected,Selected,Selected,Selected,Selected,4.0,3.0,2.0,1.0,5.0,,,Selected,Selected,,,,1.0,2.0,,,,,,,No,,,,,,,,,,,,,,Rarely,On campus or less than one mile,Rarely,Going through the ASUC/GA (student government),A little more comfortable,A lot more comfortable,Slightly important,Moderately important,Slightly important,Slightly important,Slightly important,Moderately important,Slightly important,Moderately important,Slightly important,Slightly important,Slightly important,Moderately important,Slightly important,Very comfortable,Somewhat comfortable,Not at all,Not at all,Several days,Not at all,0.0,1.0,0.0,1.0,2022 Spring,,,,,,,,White / Caucasian,,Not first-generation college,Woman,First-year,Out of State Domestic,White,White,College of Letters and Science,,,Not low-income,[College of Letters and Science],[White / Caucasian],NO,NO


## 3. Check Count and Demographic Value Totals column for each demographic for at least 2 single select questions

In [312]:
# completed function (one demographic category)
def check_count_onedemo(qitem, demo, double_count_demo=False):
    # finding data source values #
    ds_counts = DATA_SOURCE[DATA_SOURCE['Question Item Id'] == qitem]
    ds_counts = ds_counts[ds_counts['Demographic Category'] == demo][['Demographic Value', 'Demographic Value Total', 'Undergrad Grad', 'Count', 'Question Response']]
    ds_counts = ds_counts.sort_index(axis=1).sort_values(by=['Demographic Value', 'Undergrad Grad', 'Count', 'Question Response']).reset_index(drop=True)

    # finding raw survey values #
    # copy so the helper column below is not written back into RAW_SURVEY
    raw = RAW_SURVEY.copy()
    if double_count_demo:
        # one row per (student, category) pair for list-valued demographics
        raw = raw.explode(demo)
    raw['ID DUPLICATE'] = raw[qitem]
    raw_piv = pd.pivot_table(raw, values=qitem, index=['Undergrad Grad', demo, 'ID DUPLICATE'], aggfunc='count')
    raw_piv = raw_piv.reset_index().rename(columns={demo: 'Demographic Value', qitem: 'Count', 'ID DUPLICATE': 'Question Response'})

    # make demographic value total col: sum of Count for each demographic value
    demo_totals = raw_piv.groupby('Demographic Value')['Count'].sum()
    raw_piv['Demographic Value Total'] = raw_piv['Demographic Value'].map(demo_totals)

    raw_piv = raw_piv.sort_index(axis=1).sort_values(by=['Demographic Value', 'Undergrad Grad', 'Count', 'Question Response']).reset_index(drop=True)

    print('DATA SOURCE: ')
    display(ds_counts)
    print("\n")
    print('RAW SURVEY: ')
    display(raw_piv)


# completed function (all demographic categories for ONE QUESTION ITEM)
def check_count_alldemo(qitem, demo_cats):
    for demo in demo_cats:
        print('DEMOGRAPHIC VALUE:', demo)
        if demo in ['Reporting College', 'Multiple Ethnicities']:
            # double-counting demographics are stored as lists and must be exploded
            check_count_onedemo(qitem, demo, double_count_demo=True)
        else:
            check_count_onedemo(qitem, demo)
        print("\n")
        
demo_cat = [#'Undergrad Grad',
            'Derived Residency Desc',
            'Entry Status Desc',
            'Ucb Level1 Ethnic Rollup Desc',
            'Ucb Level2 Ethnic Rollup Desc',
            'Low-income Status',
            'First Gen College',
            'Person Gender Desc',
            'Reporting College',
            'Multiple Ethnicities']

# if the DATA SOURCE and RAW SURVEY tables don't match (typically because of cleaning or low counts), set qitem and demo in the cells below and inspect the two dataframes

In [313]:
check_count_alldemo('HOUS_INSEC', demo_cat) 

DEMOGRAPHIC VALUE: Derived Residency Desc
DATA SOURCE: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,27,CA Resident,5437,Yes,G
1,1028,CA Resident,5437,No,G
2,398,CA Resident,5437,Yes,U
3,3984,CA Resident,5437,No,U
4,100,International,1641,Yes,G
5,926,International,1641,No,G
6,103,International,1641,Yes,U
7,512,International,1641,No,U
8,11,Out of State Domestic,1054,Yes,G
9,362,Out of State Domestic,1054,No,G




RAW SURVEY: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,27,CA Resident,5437,Yes,G
1,1028,CA Resident,5437,No,G
2,398,CA Resident,5437,Yes,U
3,3984,CA Resident,5437,No,U
4,100,International,1641,Yes,G
5,926,International,1641,No,G
6,103,International,1641,Yes,U
7,512,International,1641,No,U
8,11,Out of State Domestic,1054,Yes,G
9,362,Out of State Domestic,1054,No,G




DEMOGRAPHIC VALUE: Entry Status Desc
DATA SOURCE: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,213,ADVANCED STANDING,1540,Yes,U
1,1327,ADVANCED STANDING,1540,No,U
2,1,DOCTORAL,62,Yes,G
3,61,DOCTORAL,62,No,G
4,153,FIRST TIME IN PROGRAM,2776,Yes,G
5,2623,FIRST TIME IN PROGRAM,2776,No,G
6,338,First-year,4137,Yes,U
7,3799,First-year,4137,No,U
8,-1,LIMITED,-1,No,G
9,-1,LIMITED,-1,Yes,G




RAW SURVEY: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,213,ADVANCED STANDING,1540,Yes,U
1,1327,ADVANCED STANDING,1540,No,U
2,1,DOCTORAL,62,Yes,G
3,61,DOCTORAL,62,No,G
4,153,FIRST TIME IN PROGRAM,2776,Yes,G
5,2623,FIRST TIME IN PROGRAM,2776,No,G
6,338,First-year,4137,Yes,U
7,3799,First-year,4137,No,U
8,4,LIMITED,4,No,G
9,2,MASTERS,2,No,G




DEMOGRAPHIC VALUE: Ucb Level1 Ethnic Rollup Desc
DATA SOURCE: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,19,Asian,2716,Yes,G
1,471,Asian,2716,No,G
2,189,Asian,2716,Yes,U
3,2037,Asian,2716,No,U
4,100,International,1641,Yes,G
5,926,International,1641,No,G
6,103,International,1641,Yes,U
7,512,International,1641,No,U
8,17,Underrepresented Minority,1911,Yes,G
9,402,Underrepresented Minority,1911,No,G




RAW SURVEY: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,19,Asian,2716,Yes,G
1,471,Asian,2716,No,G
2,189,Asian,2716,Yes,U
3,2037,Asian,2716,No,U
4,100,International,1641,Yes,G
5,926,International,1641,No,G
6,103,International,1641,Yes,U
7,512,International,1641,No,U
8,17,Underrepresented Minority,1911,Yes,G
9,402,Underrepresented Minority,1911,No,G




DEMOGRAPHIC VALUE: Ucb Level2 Ethnic Rollup Desc
DATA SOURCE: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,6,African American,313,Yes,G
1,109,African American,313,No,G
2,29,African American,313,Yes,U
3,169,African American,313,No,U
4,19,Asian,2716,Yes,G
5,471,Asian,2716,No,G
6,189,Asian,2716,Yes,U
7,2037,Asian,2716,No,U
8,10,Chicano/Latino,1541,Yes,G
9,270,Chicano/Latino,1541,No,G




RAW SURVEY: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,6,African American,313,Yes,G
1,109,African American,313,No,G
2,29,African American,313,Yes,U
3,169,African American,313,No,U
4,19,Asian,2716,Yes,G
5,471,Asian,2716,No,G
6,189,Asian,2716,Yes,U
7,2037,Asian,2716,No,U
8,10,Chicano/Latino,1541,Yes,G
9,270,Chicano/Latino,1541,No,G




DEMOGRAPHIC VALUE: Low-income Status
DATA SOURCE: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,227,Low-income,1815,Yes,U
1,1588,Low-income,1815,No,U
2,325,Not low-income,3863,Yes,U
3,3538,Not low-income,3863,No,U




RAW SURVEY: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,227,Low-income,1815,Yes,U
1,1588,Low-income,1815,No,U
2,325,Not low-income,3863,Yes,U
3,3538,Not low-income,3863,No,U




DEMOGRAPHIC VALUE: First Gen College
DATA SOURCE: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,17,First-generation college,2258,Yes,G
1,239,First-generation college,2258,No,G
2,258,First-generation college,2258,Yes,U
3,1744,First-generation college,2258,No,U
4,68,Not first-generation college,4835,Yes,G
5,1236,Not first-generation college,4835,No,G
6,273,Not first-generation college,4835,Yes,U
7,3258,Not first-generation college,4835,No,U
8,54,Unknown,1046,Yes,G
9,847,Unknown,1046,No,G




RAW SURVEY: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,15,First-generation college,2254,Yes,G
1,237,First-generation college,2254,No,G
2,258,First-generation college,2254,Yes,U
3,1744,First-generation college,2254,No,U
4,29,N,29,No,G
5,68,Not first-generation college,4806,Yes,G
6,1207,Not first-generation college,4806,No,G
7,273,Not first-generation college,4806,Yes,U
8,3258,Not first-generation college,4806,No,U
9,3,U,20,Yes,G




DEMOGRAPHIC VALUE: Person Gender Desc
DATA SOURCE: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,2,Decline to State,177,Yes,G
1,22,Decline to State,177,No,G
2,13,Decline to State,177,Yes,U
3,140,Decline to State,177,No,U
4,-1,Different Identity,-1,No,U
5,-1,Different Identity,-1,Yes,U
6,76,Man,3570,Yes,G
7,1286,Man,3570,No,G
8,222,Man,3570,Yes,U
9,1986,Man,3570,No,U




RAW SURVEY: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,2,Decline to State,177,Yes,G
1,22,Decline to State,177,No,G
2,13,Decline to State,177,Yes,U
3,140,Decline to State,177,No,U
4,4,Different Identity,4,No,U
5,76,Man,3570,Yes,G
6,1286,Man,3570,No,G
7,222,Man,3570,Yes,U
8,1986,Man,3570,No,U
9,1,Nonbinary,65,Yes,G




DEMOGRAPHIC VALUE: Reporting College
DATA SOURCE: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,2,College of Chemistry,316,Yes,G
1,115,College of Chemistry,316,No,G
2,15,College of Chemistry,316,Yes,U
3,184,College of Chemistry,316,No,U
4,45,College of Engineering,1362,Yes,G
5,658,College of Engineering,1362,No,G
6,48,College of Engineering,1362,Yes,U
7,611,College of Engineering,1362,No,U
8,9,College of Environmental Design,264,Yes,G
9,103,College of Environmental Design,264,No,G




RAW SURVEY: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,2,College of Chemistry,316,Yes,G
1,115,College of Chemistry,316,No,G
2,15,College of Chemistry,316,Yes,U
3,184,College of Chemistry,316,No,U
4,45,College of Engineering,1362,Yes,G
5,658,College of Engineering,1362,No,G
6,48,College of Engineering,1362,Yes,U
7,611,College of Engineering,1362,No,U
8,9,College of Environmental Design,264,Yes,G
9,103,College of Environmental Design,264,No,G




DEMOGRAPHIC VALUE: Multiple Ethnicities
DATA SOURCE: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,6,African American / Black,330,Yes,G
1,117,African American / Black,330,No,G
2,31,African American / Black,330,Yes,U
3,176,African American / Black,330,No,U
4,1,American Indian / Alaska Native,105,Yes,G
5,30,American Indian / Alaska Native,105,No,G
6,10,American Indian / Alaska Native,105,Yes,U
7,64,American Indian / Alaska Native,105,No,U
8,39,Asian / Asian American,3450,Yes,G
9,608,Asian / Asian American,3450,No,G




RAW SURVEY: 


Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,6,African American / Black,330,Yes,G
1,117,African American / Black,330,No,G
2,31,African American / Black,330,Yes,U
3,176,African American / Black,330,No,U
4,1,American Indian / Alaska Native,105,Yes,G
5,30,American Indian / Alaska Native,105,No,G
6,10,American Indian / Alaska Native,105,Yes,U
7,64,American Indian / Alaska Native,105,No,U
8,39,Asian / Asian American,3450,Yes,G
9,608,Asian / Asian American,3450,No,G






In [314]:
qitem = 'HOUS_INSEC'
demo = 'First Gen College'

#### DATA SOURCE COUNTS DF

In [315]:
ds_counts = DATA_SOURCE[DATA_SOURCE['Question Item Id']== qitem]
ds_counts = ds_counts[ds_counts['Demographic Category'] == demo][['Demographic Value', 'Demographic Value Total', 'Undergrad Grad', 'Count', 'Question Response']]
ds_counts = ds_counts.sort_index(axis=1).sort_values(by = ['Demographic Value', 'Undergrad Grad', 'Count', 'Question Response']).reset_index(drop=True)
ds_counts

Unnamed: 0,Count,Demographic Value,Demographic Value Total,Question Response,Undergrad Grad
0,17,First-generation college,2258,Yes,G
1,239,First-generation college,2258,No,G
2,258,First-generation college,2258,Yes,U
3,1744,First-generation college,2258,No,U
4,68,Not first-generation college,4835,Yes,G
5,1236,Not first-generation college,4835,No,G
6,273,Not first-generation college,4835,Yes,U
7,3258,Not first-generation college,4835,No,U
8,54,Unknown,1046,Yes,G
9,847,Unknown,1046,No,G


#### RAW SURVEY COUNTS DF

In [316]:
# work on a copy so RAW_SURVEY itself is not modified
raw = RAW_SURVEY.copy()
# uncomment the line below if demo is a double-counting demographic
# raw = raw.explode(demo)
raw['ID DUPLICATE'] = raw[qitem]
raw_piv = pd.pivot_table(raw, values=qitem, index=['Undergrad Grad', demo, 'ID DUPLICATE'], aggfunc='count')

raw_piv = raw_piv.reset_index().rename(columns={demo: 'Demographic Value', qitem: 'Count', 'ID DUPLICATE': 'Question Response'})

# make demographic value total col: sum of Count for each demographic value
demo_totals = raw_piv.groupby('Demographic Value')['Count'].sum()
raw_piv['Demographic Value Total'] = raw_piv['Demographic Value'].map(demo_totals)

# replace low counts with -1
#raw_piv['Count'] = raw_piv['Count'].apply(lambda x: -1 if x < 11 else x)

raw_piv = raw_piv.sort_index(axis=1).sort_values(by=['Demographic Value', 'Undergrad Grad', 'Count', 'Question Response']).reset_index(drop=True)
raw_piv

| | Count | Demographic Value | Demographic Value Total | Question Response | Undergrad Grad |
|---|---|---|---|---|---|
| 0 | 15 | First-generation college | 2254 | Yes | G |
| 1 | 237 | First-generation college | 2254 | No | G |
| 2 | 258 | First-generation college | 2254 | Yes | U |
| 3 | 1744 | First-generation college | 2254 | No | U |
| 4 | 29 | N | 29 | No | G |
| 5 | 68 | Not first-generation college | 4806 | Yes | G |
| 6 | 1207 | Not first-generation college | 4806 | No | G |
| 7 | 273 | Not first-generation college | 4806 | Yes | U |
| 8 | 3258 | Not first-generation college | 4806 | No | U |
| 9 | 3 | U | 20 | Yes | G |
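
The `Demographic Value Total` column is the sum of `Count` over every row that shares a `Demographic Value`. A minimal sketch, using the four first-generation rows copied from the data-source output above, shows that total computed with `groupby(...).transform('sum')`, which writes each group's sum back onto every row of that group:

```python
import pandas as pd

# Four rows copied from the data-source output for 'First-generation college'
rows = pd.DataFrame({
    'Demographic Value': ['First-generation college'] * 4,
    'Undergrad Grad': ['G', 'G', 'U', 'U'],
    'Question Response': ['Yes', 'No', 'Yes', 'No'],
    'Count': [17, 239, 258, 1744],
})

# Sum Count within each Demographic Value and broadcast it back onto each row
rows['Demographic Value Total'] = (
    rows.groupby('Demographic Value')['Count'].transform('sum')
)
print(rows['Demographic Value Total'].unique())  # [2258]
```

The result, 17 + 239 + 258 + 1744 = 2258, matches the data source's total for this value.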


In [317]:
ds_counts.astype(str).equals(raw_piv.astype(str))

False
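
`equals` only reports that the two frames differ somewhere. One way to locate the differing rows is an outer merge on the key columns with `indicator=True`, keeping rows that appear on only one side or whose counts disagree. The helper `diff_counts` and the toy frames below are hypothetical (not part of the original notebook); the counts are copied from the outputs above:

```python
import pandas as pd

KEYS = ['Demographic Value', 'Undergrad Grad', 'Question Response']

def diff_counts(ds: pd.DataFrame, raw: pd.DataFrame, keys=KEYS) -> pd.DataFrame:
    """Hypothetical helper: rows whose Count differs or that exist in only one frame."""
    merged = ds.merge(raw, on=keys, how='outer',
                      suffixes=(' (data source)', ' (raw)'), indicator=True)
    # NaN != number is True, so one-sided rows are also flagged by the Count test
    mismatch = (merged['_merge'] != 'both') | (
        merged['Count (data source)'] != merged['Count (raw)']
    )
    return merged[mismatch].reset_index(drop=True)


ds = pd.DataFrame({
    'Demographic Value': ['First-generation college', 'First-generation college', 'Unknown'],
    'Undergrad Grad': ['G', 'G', 'G'],
    'Question Response': ['Yes', 'No', 'Yes'],
    'Count': [17, 239, 54],
})
raw = pd.DataFrame({
    'Demographic Value': ['First-generation college', 'First-generation college', 'N'],
    'Undergrad Grad': ['G', 'G', 'G'],
    'Question Response': ['Yes', 'No', 'No'],
    'Count': [15, 237, 29],
})
result = diff_counts(ds, raw)
print(result)
```

Here all four rows are flagged: the two first-generation graduate rows disagree on `Count`, `Unknown` exists only in the data source, and `N` exists only in the raw survey.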

## 4. Check Count and Demographic Value Totals, by Undergrad Grad column for one non-double-counting demographic and one double-counting demographic for at least 2 single-select questions
Preferably use questions that haven't been checked yet.

In [318]:
# completed function (one demographic category)
def check_count_ug_onedemo(qitem, demo, double_count_demo=False):
    # finding data source values #
    ds_counts = DATA_SOURCE[DATA_SOURCE['Question Item Id'] == qitem]
    ds_counts = ds_counts[ds_counts['Demographic Category'] == demo][['Demographic Value', 'Demographic Value Total, by Undergrad Grad', 'Undergrad Grad', 'Count', 'Question Response']]
    ds_counts = ds_counts.sort_index(axis=1).sort_values(by=['Demographic Value', 'Undergrad Grad', 'Count', 'Question Response']).reset_index(drop=True)

    # finding raw survey values #
    # work on a copy so the global RAW_SURVEY is never modified
    raw = RAW_SURVEY.copy()
    if double_count_demo:
        # one row per (student, demographic value) pair
        raw = raw.explode(demo)
    # duplicate the question column so it can be both a pivot index level and the counted value
    raw['ID DUPLICATE'] = raw[qitem]
    raw_piv = pd.pivot_table(raw, values=qitem, index=['Undergrad Grad', demo, 'ID DUPLICATE'], aggfunc='count')
    raw_piv = raw_piv.reset_index().rename(columns={demo: 'Demographic Value', qitem: 'Count', 'ID DUPLICATE': 'Question Response'})

    # make demographic value total by ug col: sum of Count within each (Demographic Value, Undergrad Grad) pair
    raw_piv['Demographic Value Total, by Undergrad Grad'] = raw_piv.groupby(['Demographic Value', 'Undergrad Grad'])['Count'].transform('sum')

    # sort columns
    raw_piv = raw_piv.sort_index(axis=1).sort_values(by=['Demographic Value', 'Undergrad Grad', 'Count', 'Question Response']).reset_index(drop=True)

    print('DATA SOURCE: ')
    display(ds_counts)
    print("\n")
    print('RAW SURVEY: ')
    display(raw_piv)

# completed function (every listed demographic category for ONE QUESTION ITEM)
def check_count_ug_alldemo(qitem, demo_vals):
    for demo in demo_vals:
        print('DEMOGRAPHIC CATEGORY:', demo)
        if demo in ['Reporting College', 'Multiple Ethnicities']:
            check_count_ug_onedemo(qitem, demo, double_count_demo=True)
        else:
            check_count_ug_onedemo(qitem, demo)
        print("\n")

In [319]:
check_count_ug_onedemo(qitem, demo, double_count_demo = False)

DATA SOURCE:

| | Count | Demographic Value | Demographic Value Total, by Undergrad Grad | Question Response | Undergrad Grad |
|---|---|---|---|---|---|
| 0 | 17 | First-generation college | 256 | Yes | G |
| 1 | 239 | First-generation college | 256 | No | G |
| 2 | 258 | First-generation college | 2002 | Yes | U |
| 3 | 1744 | First-generation college | 2002 | No | U |
| 4 | 68 | Not first-generation college | 1304 | Yes | G |
| 5 | 1236 | Not first-generation college | 1304 | No | G |
| 6 | 273 | Not first-generation college | 3531 | Yes | U |
| 7 | 3258 | Not first-generation college | 3531 | No | U |
| 8 | 54 | Unknown | 901 | Yes | G |
| 9 | 847 | Unknown | 901 | No | G |

RAW SURVEY:

| | Count | Demographic Value | Demographic Value Total, by Undergrad Grad | Question Response | Undergrad Grad |
|---|---|---|---|---|---|
| 0 | 15 | First-generation college | 252 | Yes | G |
| 1 | 237 | First-generation college | 252 | No | G |
| 2 | 258 | First-generation college | 2002 | Yes | U |
| 3 | 1744 | First-generation college | 2002 | No | U |
| 4 | 29 | N | 29 | No | G |
| 5 | 68 | Not first-generation college | 1275 | Yes | G |
| 6 | 1207 | Not first-generation college | 1275 | No | G |
| 7 | 273 | Not first-generation college | 3531 | Yes | U |
| 8 | 3258 | Not first-generation college | 3531 | No | U |
| 9 | 3 | U | 20 | Yes | G |
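
Assuming every respondent has exactly one Undergrad Grad value, the by-level totals for a demographic value should add back up to its overall Demographic Value Total. A short sketch of that cross-check, using totals copied from the two data-source outputs above (256 + 2002 = 2258 and 1304 + 3531 = 4835):

```python
import pandas as pd

# 'Demographic Value Total, by Undergrad Grad' from the data-source output,
# one row per (Demographic Value, Undergrad Grad) pair
by_ug = pd.DataFrame({
    'Demographic Value': ['First-generation college', 'First-generation college',
                          'Not first-generation college', 'Not first-generation college'],
    'Undergrad Grad': ['G', 'U', 'G', 'U'],
    'Demographic Value Total, by Undergrad Grad': [256, 2002, 1304, 3531],
})
# 'Demographic Value Total' from the earlier (not split by level) data-source output
overall = pd.Series({'First-generation college': 2258,
                     'Not first-generation college': 4835})

summed = by_ug.groupby('Demographic Value')['Demographic Value Total, by Undergrad Grad'].sum()
# Aligns on Demographic Value; a value missing from either side shows up as nonzero
diff = summed.sub(overall, fill_value=0)
mismatches = diff[diff != 0]
print(mismatches.empty)  # True
```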


In [320]:
qitem = 'HOUS_INSEC'
demo = 'Reporting College'  # a double-counting demographic
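
For a double-counting demographic, the raw survey is exploded so that a student listed under several values contributes one row per value. A toy sketch, assuming the demographic column holds Python lists (the respondents and colleges below are hypothetical), shows why the Demographic Value Totals then add up to more than the number of responses:

```python
import pandas as pd

# Hypothetical respondents; student 1 is in two colleges
survey = pd.DataFrame({
    'Student': [1, 2, 3],
    'Reporting College': [['L&S', 'CNR'], ['L&S'], ['College of Engineering']],
    'HOUS_INSEC': ['No', 'Yes', 'No'],
})

# explode() gives each (student, college) pair its own row
exploded = survey.explode('Reporting College')
value_totals = exploded.groupby('Reporting College')['HOUS_INSEC'].count()

print(len(survey))          # 3 responses to the question stem
print(value_totals.sum())   # 4, because student 1 is counted once in L&S and once in CNR
```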


#### DATA SOURCE COUNTS DF BY UG

In [321]:
ds_counts = DATA_SOURCE[DATA_SOURCE['Question Item Id']== qitem]
ds_counts = ds_counts[ds_counts['Demographic Category'] == demo][['Demographic Value', 'Demographic Value Total, by Undergrad Grad', 'Undergrad Grad', 'Count', 'Question Response']]
ds_counts = ds_counts.sort_index(axis=1).sort_values(by = ['Demographic Value', 'Undergrad Grad', 'Count', 'Question Response']).reset_index(drop=True)
ds_counts

| | Count | Demographic Value | Demographic Value Total, by Undergrad Grad | Question Response | Undergrad Grad |
|---|---|---|---|---|---|
| 0 | 2 | College of Chemistry | 117 | Yes | G |
| 1 | 115 | College of Chemistry | 117 | No | G |
| 2 | 15 | College of Chemistry | 199 | Yes | U |
| 3 | 184 | College of Chemistry | 199 | No | U |
| 4 | 45 | College of Engineering | 703 | Yes | G |
| 5 | 658 | College of Engineering | 703 | No | G |
| 6 | 48 | College of Engineering | 659 | Yes | U |
| 7 | 611 | College of Engineering | 659 | No | U |
| 8 | 9 | College of Environmental Design | 112 | Yes | G |
| 9 | 103 | College of Environmental Design | 112 | No | G |


#### RAW SURVEY COUNTS DF BY UG

In [322]:
# explode into a new frame (not RAW_SURVEY itself) so later checks still see one row per student
raw = RAW_SURVEY.explode(demo)
# duplicate the question column so it can be both a pivot index level and the counted value
raw['ID DUPLICATE'] = raw[qitem]
raw_piv = pd.pivot_table(raw, values=qitem, index=['Undergrad Grad', demo, 'ID DUPLICATE'], aggfunc='count')

raw_piv = raw_piv.reset_index().rename(columns={demo: 'Demographic Value', qitem: 'Count', 'ID DUPLICATE': 'Question Response'})

# make demographic value total by ug col: sum of Count within each (Demographic Value, Undergrad Grad) pair
raw_piv['Demographic Value Total, by Undergrad Grad'] = raw_piv.groupby(['Demographic Value', 'Undergrad Grad'])['Count'].transform('sum')

raw_piv = raw_piv.sort_index(axis=1).sort_values(by=['Demographic Value', 'Undergrad Grad', 'Count', 'Question Response']).reset_index(drop=True)
raw_piv

| | Count | Demographic Value | Demographic Value Total, by Undergrad Grad | Question Response | Undergrad Grad |
|---|---|---|---|---|---|
| 0 | 2 | College of Chemistry | 117 | Yes | G |
| 1 | 115 | College of Chemistry | 117 | No | G |
| 2 | 15 | College of Chemistry | 199 | Yes | U |
| 3 | 184 | College of Chemistry | 199 | No | U |
| 4 | 45 | College of Engineering | 703 | Yes | G |
| 5 | 658 | College of Engineering | 703 | No | G |
| 6 | 48 | College of Engineering | 659 | Yes | U |
| 7 | 611 | College of Engineering | 659 | No | U |
| 8 | 9 | College of Environmental Design | 112 | Yes | G |
| 9 | 103 | College of Environmental Design | 112 | No | G |


## 5. Check that each Question Stem Id matches their Question Stem/Item & Question Response

In [323]:
def check_qstem_qitem():
    STEM_IDS = DATA_SOURCE['Question Stem Id'].unique()
    for qstem in STEM_IDS:
        # rows whose Question Item Id contains the stem id (plain, case-insensitive substring match)
        rows = DATA_SOURCE[DATA_SOURCE['Question Item Id'].str.contains(qstem, case=False, regex=False, na=False)]
        qstem_str = rows['Question Stem'].unique()
        qitem_str = rows['Question Item'].unique()

        print('########', qstem, '########')
        print('QUESTION STEM:', qstem_str)
        print("\n")
        print('QUESTION ITEM:', qitem_str)
        print("\n")

check_qstem_qitem()

######## ADV_UG ########
QUESTION STEM: ['During this academic year (since the beginning of the Fall 21 semester), have you consulted with an academic advisor in your major or college?']


QUESTION ITEM: [nan]


######## ADV_MODES_UG_1 ########
QUESTION STEM: ['During this academic year (since the beginning of the Fall 21 semester), have you consulted with an academic advisor in your major or college? -  Help/reception desk, in-person']


QUESTION ITEM: [nan]


######## ADV_MODES_UG_2 ########
QUESTION STEM: ['During this academic year (since the beginning of the Fall 21 semester), have you consulted with an academic advisor in your major or college? -  Help desk by Zoom or phone']


QUESTION ITEM: [nan]


######## ADV_MODES_UG_3 ########
QUESTION STEM: ['During this academic year (since the beginning of the Fall 21 semester), have you consulted with an academic advisor in your major or college? -  One-on-one meeting, in-person']


QUESTION ITEM: [nan]


######## ADV_MODES_UG_4 #######
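
`str.contains` is a substring match, so a stem id such as `ADV_MODES_UG_1` would also pick up an item id like `ADV_MODES_UG_10` if one exists. A stricter sketch, assuming item ids are either the stem id itself or the stem id followed by `_<digits>` (an assumed naming scheme, not confirmed by the data source), uses `str.fullmatch`. The helper `items_for_stem` is hypothetical:

```python
import re

import pandas as pd

def items_for_stem(item_ids: pd.Series, stem_id: str) -> pd.Series:
    """Hypothetical helper: True where the item id is stem_id or stem_id + '_<digits>'."""
    # re.escape keeps any special characters in the id literal
    pattern = re.escape(stem_id) + r'(_\d+)?'
    return item_ids.str.fullmatch(pattern, case=False, na=False)


ids = pd.Series(['ADV_MODES_UG_1', 'ADV_MODES_UG_10', 'ADV_UG'])
print(ids.str.contains('ADV_MODES_UG_1').tolist())       # [True, True, False]
print(items_for_stem(ids, 'ADV_MODES_UG_1').tolist())    # [True, False, False]
```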

In [324]:
rows = DATA_SOURCE[DATA_SOURCE['Question Item Id'].str.contains(qstem, case=False, regex=False, na=False)]
qstem_str = rows['Question Stem'].unique()
qitem_str = rows['Question Item'].unique()[0]

# make sure there is only one question stem for each question stem id
if len(qstem_str) != 1:
    print('!!!! ERROR: MULTIPLE QUESTION STEMS FOR ONE QUESTION STEM ID !!!!')
    # ex: the question item is not properly separated from the stem
    # ex: 'During this academic year (since the beginning of the Fall 21 semester), have you consulted with an academic advisor in your major or college? -  Help/reception desk, in-person'
    # instead of: 'During this academic year (since the beginning of the Fall 21 semester), have you consulted with an academic advisor in your major or college?'

print(qstem_str)
print(qitem_str)

['During this academic year (since the beginning of the Fall 21 semester), have you consulted with an academic advisor in your major or college?']
nan
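
The outputs above show stems such as `ADV_MODES_UG_1` where the item text is still attached to the stem after ` - ` and `Question Item` is `nan`. A sketch of how such text could be split, assuming the item is whatever follows the last ` - ` (this would mis-split any real stem or item that itself contains a spaced hyphen). The helper `split_stem_item` is hypothetical:

```python
from typing import Optional, Tuple

def split_stem_item(text: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split 'stem -  item' into (stem, item).

    Assumes the item follows the LAST ' - ' in the text;
    returns (text, None) when no separator is present.
    """
    stem, sep, item = text.rpartition(' - ')
    if not sep:
        return text, None
    return stem.strip(), item.strip()


combined = ('During this academic year (since the beginning of the Fall 21 semester), '
            'have you consulted with an academic advisor in your major or college? '
            '-  Help/reception desk, in-person')
stem, item = split_stem_item(combined)
print(stem)
print(item)  # Help/reception desk, in-person
```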