# Step 1: Preprocess data

Load data from:
ACCESS Advising Evaluation (Responses).xlsx

Clean data:
* remove unused columns
* remove text from column names
* filter data for selected month
* replace NaN with 'No Response'

## 1. Install openpyxl
Install openpyxl so that pandas can read .xlsx files with pd.read_excel. In a terminal, enter: ```pip install pandas openpyxl```
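
To confirm the install worked before loading the file, a quick check like the one below may help. This is a minimal sketch: `is_installed` is a hypothetical helper name, not part of pandas or openpyxl.

```python
# check that pandas and the Excel engine are available in this environment
import importlib.util


def is_installed(package_name):
    """Return True if the package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


for name in ('pandas', 'openpyxl'):
    if is_installed(name):
        print(name, '-> installed')
    else:
        print(name, '-> missing, run: pip install ' + name)
```

If either package shows as missing, rerun the pip command above in the same environment that runs the notebook.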

## 2. Import libraries
Recommended: pandas (imported as pd)

In [1]:
# import libraries

import pandas as pd

## 3. Load data
Use pd.read_excel to load the Excel file.

In [2]:
# load data

df = pd.read_excel('ACCESS Advising Evaluation (Responses).xlsx')

In [3]:
# check data

print(df)  # df is already a DataFrame, so no need to wrap it in pd.DataFrame

                   Timestamp           Advisor  \
0    2017-08-07 16:33:40.020             Kalei   
1    2017-08-07 16:33:42.351             Kalei   
2    2017-08-08 08:25:01.505             Jason   
3    2017-08-08 09:57:45.599             Kalei   
4    2017-08-08 12:00:22.898             Jason   
...                      ...               ...   
6358 2026-01-08 10:47:33.973  Sunhee Kim Fujii   
6359 2026-01-08 11:14:56.960  Sunhee Kim Fujii   
6360 2026-01-08 13:27:37.492  Sunhee Kim Fujii   
6361 2026-01-08 16:03:10.874  Sunhee Kim Fujii   
6362 2026-01-09 06:55:56.012   Jennifer Oshiro   

     Please evaluate your advising session  [I am more aware of the opportunities and options available to me.]  \
0                                    Strongly Agree (5)                                                           
1                                    Strongly Agree (5)                                                           
2                                             Agree (4
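
Printing the whole DataFrame gets long with wide survey columns. A shorter check could look like the sketch below. The `sample` DataFrame is a small stand-in built by hand (an assumption for illustration), not the real Excel data.

```python
import pandas as pd

# hypothetical stand-in for the loaded survey responses
sample = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2017-08-07 16:33:40', '2017-08-08 08:25:01']),
    'Advisor': ['Kalei', 'Jason'],
})

print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # Timestamp should be a datetime type for the month filter later
print(sample.head())   # first rows only
```

With the real data, the same three calls on `df` should show whether Timestamp was parsed as a datetime column.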

## 4. Clean data (remove unused columns)
pandas.DataFrame.drop (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html)

```df.drop(columns=cols_to_remove)```

In [None]:
# drop columns


In [None]:
# check data
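
The drop step could look something like the sketch below. The column names 'Email Address' and 'Student ID' are made up for illustration, since the actual unused columns are not listed here. Replace them with the real ones from `df.columns`.

```python
import pandas as pd

# hypothetical sample; 'Email Address' and 'Student ID' are assumed column names
sample = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2025-12-01 12:51:05']),
    'Advisor': ['Nanette Miles'],
    'Email Address': ['student@example.com'],
    'Student ID': [12345],
})

cols_to_remove = ['Email Address', 'Student ID']

# drop returns a new DataFrame, so assign the result back
sample = sample.drop(columns=cols_to_remove)

print(sample.columns.tolist())  # ['Timestamp', 'Advisor']
```

By default, drop raises a KeyError if a listed column does not exist. Passing `errors='ignore'` skips missing names instead.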

## 5. Clean data (remove text from column names)
pandas.Series.str.replace (https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html)

```df.columns.str.replace(text_to_remove, '') # replace text_to_remove with empty string```

```df.columns.str.strip() # remove extra spaces from column names```

In [7]:
# get column names

print(df.columns)

# remove text from column names
# different section titles and different endings

# regex=False treats '[', ']' and '.' as literal characters in every pandas version
df.columns = df.columns.str.replace('Please evaluate your advising session', '', regex=False)
df.columns = df.columns.str.replace('Please evaluate the advisor', '', regex=False)
df.columns = df.columns.str.replace('Overall advising appointment', '', regex=False)
df.columns = df.columns.str.replace('[', '', regex=False)
df.columns = df.columns.str.replace(']', '', regex=False)
df.columns = df.columns.str.replace('.', '', regex=False)
df.columns = df.columns.str.replace(':', '', regex=False)

# remove extra spaces from column names

df.columns = df.columns.str.strip()

Index(['Timestamp', 'Advisor',
       '  I am more aware of the opportunities and options available to me',
       '  I am better able to select courses and evaluate my academic progress',
       '  I feel more confident about deciding my next steps',
       ' The advisor was informative and knowledgeable',
       ' The advisor was respectful and listened carefully to what I shared',
       ' I was satisfied with how my advisor handled my questions',
       ' It was easy to talk to my advisor',
       ' My overall evaluation of this advising appointment is',
       ' My overall evaluation of this advisor is',
       'Please feel free to make additional comments of the above questions'],
      dtype='object')


In [8]:
# check data

print(df.columns)

Index(['Timestamp', 'Advisor',
       'I am more aware of the opportunities and options available to me',
       'I am better able to select courses and evaluate my academic progress',
       'I feel more confident about deciding my next steps',
       'The advisor was informative and knowledgeable',
       'The advisor was respectful and listened carefully to what I shared',
       'I was satisfied with how my advisor handled my questions',
       'It was easy to talk to my advisor',
       'My overall evaluation of this advising appointment is',
       'My overall evaluation of this advisor is',
       'Please feel free to make additional comments of the above questions'],
      dtype='object')
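
The same cleanup can also be written as one function applied to every column name. This is a sketch of an alternative, not the approach used in the cell above: `clean_column_name` is a hypothetical helper, and the third raw column name is an assumed example of the original format.

```python
import re

import pandas as pd

# section titles copied from the cleaning cell above
section_titles = [
    'Please evaluate your advising session',
    'Please evaluate the advisor',
    'Overall advising appointment',
]


def clean_column_name(name):
    """Remove section titles and the characters [ ] . : then trim spaces."""
    for title in section_titles:
        name = name.replace(title, '')
    name = re.sub(r'[\[\].:]', '', name)  # character class: [ ] . :
    return name.strip()


raw_columns = pd.Index([
    'Timestamp',
    'Please evaluate your advising session  [I am more aware of the opportunities and options available to me.]',
    'Please evaluate the advisor [It was easy to talk to my advisor.]',  # assumed raw form
])

cleaned = raw_columns.map(clean_column_name)
print(cleaned.tolist())
```

With the real data, this would be used as `df.columns = df.columns.map(clean_column_name)`.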


## 6. Clean data (filter data for selected month)
pandas.Series.dt (https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.html)

```df['Timestamp'].dt.year == filter_year```

```df['Timestamp'].dt.month == filter_month```

In [9]:
# filter data for selected month

filter_year = 2025
filter_month = 12

df = df[df['Timestamp'].dt.year == filter_year]
df = df[df['Timestamp'].dt.month == filter_month]
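
The two filters can also be combined into a single boolean mask with `&`. A minimal sketch on hand-made timestamps (the `sample` rows and single-letter advisor labels are assumptions for illustration):

```python
import pandas as pd

# hypothetical sample covering months before, inside, and after the target month
sample = pd.DataFrame({
    'Timestamp': pd.to_datetime([
        '2024-12-15 09:00:00',
        '2025-11-30 23:59:59',
        '2025-12-01 12:51:05',
        '2025-12-09 23:04:13',
        '2026-01-08 10:47:33',
    ]),
    'Advisor': ['A', 'B', 'C', 'D', 'E'],
})

filter_year = 2025
filter_month = 12

# each comparison needs its own parentheses when combined with &
in_month = (sample['Timestamp'].dt.year == filter_year) & (sample['Timestamp'].dt.month == filter_month)
selected = sample[in_month]

print(selected['Advisor'].tolist())  # ['C', 'D']
```

Checking the year as well as the month matters here: the December 2024 row has the right month but is correctly left out.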

In [10]:
# check data

print(df)

                   Timestamp           Advisor  \
6350 2025-12-01 12:51:05.322     Nanette Miles   
6351 2025-12-02 11:40:59.414  Marilou Matsuura   
6352 2025-12-03 06:13:57.428         AJ Simpao   
6353 2025-12-03 20:36:38.047         AJ Simpao   
6354 2025-12-04 04:49:57.400  Marilou Matsuura   
6355 2025-12-04 07:41:58.238     Keiko Knudson   
6356 2025-12-05 23:37:46.999        Jason Higa   
6357 2025-12-09 23:04:13.640     Keiko Knudson   

     I am more aware of the opportunities and options available to me  \
6350                                 Strongly Agree (5)                 
6351                                 Strongly Agree (5)                 
6352                                 Strongly Agree (5)                 
6353                                 Strongly Agree (5)                 
6354                                        Neutral (3)                 
6355                                 Strongly Agree (5)                 
6356                                 S

## 7. Clean data (replace NaN with 'No Response')
pandas.DataFrame.fillna (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html)

```df = df.fillna(no_response_text)```

In [None]:
# replace NaN with 'No Response'

df = df.fillna('No Response')

In [None]:
# check data

print(df)
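
Instead of reading the full printout, the result can be checked by counting missing values before and after. A sketch on a small hand-made sample (the rows are assumptions, the column names come from the cleaned data above):

```python
import numpy as np
import pandas as pd

# hypothetical sample with some unanswered questions
sample = pd.DataFrame({
    'Advisor': ['Jason Higa', 'Keiko Knudson'],
    'It was easy to talk to my advisor': ['Strongly Agree (5)', np.nan],
    'Please feel free to make additional comments of the above questions': [np.nan, np.nan],
})

missing_before = int(sample.isna().sum().sum())
sample = sample.fillna('No Response')
missing_after = int(sample.isna().sum().sum())

print(missing_before, missing_after)  # 3 0

# number of 'No Response' entries per column
no_response_counts = (sample == 'No Response').sum()
print(no_response_counts)
```

On the real data, `df.isna().sum()` before the fillna cell shows which questions were skipped most often.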

## 🎉 Step 1 complete!