#### Cleaning the DataFrame

In [1]:
# read in the dataframe
import pandas as pd
df = pd.read_csv('trace_data_stores/Section 4 Trace Surveys.csv')

### Observing NaN Values

In [2]:
# count the NaN values in the entire DataFrame and in each column
total_nan = df.isna().sum().sum()
nan_per_column = df.isna().sum()

print("Total NaN values:", total_nan)
print("NaN values per column:\n", nan_per_column)

Total NaN values: 297480
NaN values per column:
 Instructor                                                                 0
Course Title                                                               0
Section                                                                    0
Course ID                                                                  0
Online course materials were organized to help me navigate th…           194
Online interactions with my instructor created a sense of conne…         194
Online course interactions created a sense of community and…             194
I had the necessary computer skills and technology to success…           194
The syllabus was accurate and helpful in delineating expectati…          194
Required and additional course materials were helpful in achie…          194
In-class sessions were helpful for learning.                             194
Out-of-class assignments and/or fieldwork were helpful for lear…         194
This course was intellectua
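
The two counts above come from chaining `isna()` with `sum()`. A minimal sketch on a made-up three-row frame (the column names `q1` and `q2` are hypothetical stand-ins for the survey questions) shows what each step returns, plus a per-row count that the cell above does not compute:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the survey table; names and values are invented
toy = pd.DataFrame({
    "Instructor": ["A", "B", "C"],
    "q1": [0.1, np.nan, 0.3],
    "q2": [np.nan, np.nan, 0.2],
})

# isna() gives a boolean frame; summing it counts the True cells
nan_per_column = toy.isna().sum()        # Series: one count per column
total_nan = int(nan_per_column.sum())    # scalar: count across the whole frame

# rows that contain at least one NaN (not computed in the notebook cell)
rows_with_nan = int(toy.isna().any(axis=1).sum())

print(nan_per_column)
print("total:", total_nan, "rows affected:", rows_with_nan)
```

A per-row count can help judge whether the missing values are spread thinly across many surveys or concentrated in a few.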

### Observing Which Trace Surveys Are Causing NaNs

In [3]:
# select the surveys whose first online-course question is NaN
bad_traces = df[df['Online course materials were organized to help me navigate th…'].isna()]
bad_traces

Unnamed: 0,Instructor,Course Title,Section,Course ID,Online course materials were organized to help me navigate th…,Online interactions with my instructor created a sense of conne…,Online course interactions created a sense of community and…,I had the necessary computer skills and technology to success…,The syllabus was accurate and helpful in delineating expectati…,Required and additional course materials were helpful in achie…,...,The section instructor provided feedback that was timely and v…,I would recommend this section instructor to other students.,The section instructor treated students with respect.,The section instructor acknowledged and took effective action…,The section instructor was able to address my content questio…,The section instructor displayed enthusiasm for the course.,The amount of interaction with my section instructor met my le…,How would you rate the use of guest speakers in this course?,How would you rate the quality of our class discussions?,How would you rate the use guest speakers in this course?
26,"Russo, Anthony",Financl Accounting Reporting,01,61313,,,,,,,...,0.1,-0.3,-0.1,0.2,0.0,0.1,0.0,,,
27,"Ruff, Michael",Financl Accounting Reporting,03,61373,,,,,,,...,-0.1,0.1,0.0,0.0,0.1,0.0,0.0,,,
86,"McCarty, Paulette",Introduction to Business,01,61314,,,,,,,...,-0.1,0.1,0.1,0.1,0.3,0.1,0.1,,,
87,"Muscolino, Vincent",Introduction to Business,01,61314,,,,,,,...,-0.1,0.1,0.1,0.1,0.3,0.1,0.1,,,
88,"Coleman, Kerry",Introduction to Business,02,61374,,,,,,,...,0.0,0.3,-0.1,0.0,-0.2,0.0,0.0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8450,"McCullough, Robert",Introduction to Marketing,06,11613,,,,,,,...,0.1,-0.3,0.7,0.1,0.3,0.5,-0.4,,,
8451,"Sieloff, Susan",Introduction to Marketing,06,11613,,,,,,,...,0.1,-0.3,0.7,0.1,0.3,0.5,-0.4,,,
8452,"Wright, Frederick",Introduction to Marketing,07,11637,,,,,,,...,0.8,1.0,0.4,0.1,0.3,0.6,0.9,,,
8453,"Sieloff, Susan",Introduction to Marketing,07,11637,,,,,,,...,0.8,1.0,0.4,0.1,0.3,0.6,0.9,,,
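
Several question columns report the same NaN count, which suggests (but does not prove) that the same surveys are missing all of them. One way to check is to compare "missing any" with "missing all" across that group of columns. This is only a sketch on invented data; `q1`, `q2`, `q3`, and the `group` list are hypothetical names:

```python
import numpy as np
import pandas as pd

# Invented data: q1 and q2 are missing together in rows 1 and 3
toy = pd.DataFrame({
    "Instructor": ["A", "B", "C", "D"],
    "q1": [0.1, np.nan, 0.3, np.nan],
    "q2": [0.0, np.nan, 0.2, np.nan],
    "q3": [0.5, 0.4, 0.1, 0.2],
})

group = ["q1", "q2"]                           # columns with equal NaN counts
missing_any = toy[group].isna().any(axis=1)    # NaN in at least one column
missing_all = toy[group].isna().all(axis=1)    # NaN in every column

# If the masks match, filtering on a single column flags the same rows
same_rows = bool((missing_any == missing_all).all())
bad_index = toy.index[missing_all].tolist()

print(same_rows, bad_index)
```

If the two masks differ on the real data, filtering on a single column would miss some incomplete surveys.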


### Make a CSV with All Surveys and then Export

In [4]:
# replace the NaNs with 0 so missing answers contribute nothing to the total
total_df = df.fillna(0)

# sum all question columns (column 4 onward) into a total score, rounded to one decimal
total_df['Professor Score'] = total_df.iloc[:, 4:].sum(axis=1).round(1)

# remove unneeded columns; change column names; export the csv with all surveys
total_df = total_df.iloc[:, [0, 1, 3, -1]]
total_df.columns = ['instructor', 'course_title', 'course_id', 'professor_score']
# total_df.to_csv('trace_data_stores/All Section 4 Trace Surveys.csv', index=False)
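
The scoring step above can be sketched on a small invented frame that keeps the same column layout (four identifying columns followed by question columns). The sketch assumes a missing answer should contribute nothing to the total, and it also shows that pandas' `sum` skips NaN by default, so on numeric question columns the two approaches give the same totals. All values and the `q1`/`q2` names are hypothetical:

```python
import numpy as np
import pandas as pd

# Invented frame: four identifying columns, then two question columns
toy = pd.DataFrame({
    "Instructor": ["A", "B"],
    "Course Title": ["X", "Y"],
    "Section": ["01", "02"],
    "Course ID": [1, 2],
    "q1": [0.1, np.nan],
    "q2": [0.2, 0.3],
})

# Approach used in the notebook: fill NaN with 0, then sum the questions
filled_score = toy.fillna(0).iloc[:, 4:].sum(axis=1).round(1)

# Equivalent here: sum() ignores NaN by default (skipna=True)
skipna_score = toy.iloc[:, 4:].sum(axis=1).round(1)

print(filled_score.tolist(), skipna_score.tolist())
```

One difference worth keeping in mind: `fillna(0)` also fills any NaN in the identifying columns, while relying on `skipna` leaves the original frame untouched.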

### Make a CSV without the Surveys Causing NaNs and then Export

In [5]:
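# filter the dataframe to remove the surveys flagged in bad_traces
cleaned_df = df[~df.index.isin(bad_traces.index)]

# keep only the first 23 columns, dropping the trailing columns that are causing NaNs;
# copy so the new column below is added to an independent frame, not a slice of df
cleaned_df = cleaned_df.iloc[:, :23].copy()

# sum the question columns (column 4 onward) to get the total score
cleaned_df['Professor Score'] = cleaned_df.iloc[:, 4:].sum(axis=1).round(1)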

# remove unneeded columns; change column names; export to a csv with the filtered dataframe
cleaned_df = cleaned_df.iloc[:, [0, 1, 3, -1]]
cleaned_df.columns = ['instructor', 'course_title', 'course_id', 'professor_score']
# cleaned_df.to_csv('trace_data_stores/Filtered Section 4 Trace Surveys.csv', index=False)
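
As an alternative to filtering by index, the same row removal could be written with `dropna(subset=...)`. The sketch below is not the notebook's method; it uses invented data, and `q1`, `q2`, and `extra` are hypothetical column names. The explicit `.copy()` keeps the later column assignment from acting on a slice of the original frame:

```python
import numpy as np
import pandas as pd

# Invented frame: 'extra' stands in for the trailing, mostly empty columns
toy = pd.DataFrame({
    "Instructor": ["A", "B", "C"],
    "Course Title": ["X", "Y", "Z"],
    "Section": ["01", "02", "03"],
    "Course ID": [1, 2, 3],
    "q1": [0.1, np.nan, 0.3],
    "q2": [0.2, 0.5, -0.1],
    "extra": [np.nan, np.nan, np.nan],
})

# Drop rows missing q1, keep the first six columns, and take a real copy
cleaned = toy.dropna(subset=["q1"]).iloc[:, :6].copy()

# Sum the question columns (position 4 onward) into a total score
cleaned["Professor Score"] = cleaned.iloc[:, 4:].sum(axis=1).round(1)

print(cleaned[["Instructor", "Professor Score"]])
```

When the index is unique, `dropna(subset=[col])` removes the same rows as excluding the index of the rows where `col` is NaN.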