## Load the model
Load the trained model from the other Colab notebook, "616Model.ipynb". To do this:

- Run the "616Model.ipynb" notebook (Runtime > Run all).
- In the Files pane on the left, download the model file named "Student_Churn_Prediction_model.pickle".
- Add the downloaded file to the sample_data folder of this Colab notebook.
- Run the code block below to load the model into this notebook's environment.

In [None]:
import pickle as pkl

# Load the trained churn model; the with-block closes the file after reading
with open('sample_data/Student_Churn_Prediction_model.pickle', 'rb') as f:
    model_pred_churn = pkl.load(f)
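
The load step fails with a bare traceback if the pickle was never uploaded. As a minimal sketch, the check below wraps the load in a helper that reports the missing file clearly. The name `load_model` is hypothetical and not part of the original notebook.

```python
import os
import pickle as pkl

def load_model(path):
    """Load a pickled object, failing early if the upload step was skipped.

    `load_model` is an illustrative helper, not part of the original notebook.
    """
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"{path} not found; add the pickle to the sample_data folder first."
        )
    # The with-block closes the file handle once the object is loaded
    with open(path, 'rb') as f:
        return pkl.load(f)
```

In this notebook it could be called as `model_pred_churn = load_model('sample_data/Student_Churn_Prediction_model.pickle')`.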

## Format the new Student Data
To convert the new student data into a format the model can read:
- Run the productionized cleaning R Markdown file ('231128-Project3-FinalProduct.RMD') with the necessary raw data files, provided by Alan Clift and cleaned using Dr. Farmer's methods:

  1) Student Attributes.csv

  2) BA_majors.csv

  3) BA Major Students - Minors.xlsx

  4) BA Major Students - Thematic Sequence ISA2.xlsx

- Download the resulting CSV file of cleaned student data, named "StudentData_NoFlag.csv".

- Add this file to the sample_data folder of this project.

- Run the lines below to read in this file.

In [None]:
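import pandas as pd
import numpy as np

# Note: this cell reads the cleaned data from the project's GitHub repository.
# To use the copy added to sample_data instead, pass 'sample_data/StudentData_NoFlag.csv'.
df = pd.read_csv('https://raw.githubusercontent.com/mwalstad/Project-3-ISA-616/main/StudentData_NoFlag')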
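
The formatting code in the next section refers to several columns by name, so a quick check after reading the file can catch a mismatched export early. The sketch below is illustrative only: the column list is copied from the code in this notebook, and `missing_columns` is a hypothetical helper.

```python
import pandas as pd

# Columns the formatting code in this notebook refers to by name
REQUIRED_COLUMNS = [
    'Unnamed: 0', 'Student.ID', 'Term.Code', 'Cohort.Term',
    'Enrolled.Student.Count', 'Major.1', 'Major.2', 'Major.3',
    'Minor', 'Thematic.Sequence.Title', 'Final Letter Grade Group',
]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `frame`.

    Hypothetical helper, not part of the original notebook.
    """
    return [col for col in required if col not in frame.columns]
```

Calling `missing_columns(df)` should return an empty list when the cleaned file has the expected layout.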

## Format the Data into the Model's format
Run the code blocks below

In [None]:
display(df.head(5))

Unnamed: 0.1,Unnamed: 0,Student.ID,Term.Code,Gender,First.Generation.Indicator,Cum.UG.Crs.GPA,Cohort.Term,Enrolled.Student.Count,Major.1,Major.2,Major.3,Minor,Thematic.Sequence.Title,Course,Final Letter Grade Group
0,1,8071760,202410,Male,Not First Generation College Stdnt,,202410.0,1.0,BA Major,,,,,,
1,2,8085468,202410,Male,Not First Generation College Stdnt,,202410.0,1.0,BA Major,,,,,,
2,3,8085741,202410,Male,Not First Generation College Stdnt,,202410.0,1.0,BA Major,,,,,,
3,4,8088041,202410,Male,Not First Generation College Stdnt,,202410.0,1.0,BA Major,,,,,,
4,5,8088162,202410,Male,Not First Generation College Stdnt,,202410.0,1.0,BA Major,,,,,,


In [None]:
# Drop columns that are not used as model inputs
df.drop('Unnamed: 0', inplace = True, axis = 1)
df.drop('Enrolled.Student.Count', axis = 1, inplace = True)

# Replace missing or infinite cohort terms with 0 so the column can be cast to int
is_finite = ~df['Cohort.Term'].isin([np.nan, np.inf, -np.inf])
df.loc[~is_finite, 'Cohort.Term'] = 0
df['Cohort.Term'] = df['Cohort.Term'].astype(int)

## Add dummy variables
df['is_BA_Major'] = np.where(
    (df['Major.1'] == 'BA Major') |
    (df['Major.2'] == 'BA Major') |
    (df['Major.3'] == 'BA Major'),
    1,
    0
)
df['is_BA_Thematic'] = np.where(df['Thematic.Sequence.Title'] == 'ISA2 Applied Business Statistics', 1, 0)
df['is_BA_minor'] = np.where(df['Minor'] == 'Business Analytics', 1, 0)

## Count As, Bs, Cs, Ds
for grade in ['A', 'B', 'C', 'D']:
    count_col = f'{grade}_Grade_Count'
    # Count the rows with this letter grade for each student
    df_grade_count = df[df['Final Letter Grade Group'] == grade].groupby('Student.ID').size().reset_index(name=count_col)
    # Merge the count back to the original DataFrame
    df = pd.merge(df, df_grade_count, on='Student.ID', how='left')
    # Students with no such grade get NaN from the left merge; set those to 0
    df[count_col] = df[count_col].fillna(0)

## Condense Data to Student
df.drop(['Major.1','Major.2', 'Major.3', 'Minor', 'Thematic.Sequence.Title'], axis = 1, inplace = True)
# Group by 'Student.ID' and keep only the first row for each group
Student_set = df.groupby('Student.ID').first().reset_index()
df = pd.get_dummies(Student_set)

## Remove Null Values
df = df.dropna()


## Construct X to make predictions on
# Keep the IDs aside so predictions can be matched back to students
studentIndex = df['Student.ID']
X = df.drop(['Student.ID','Term.Code'], axis = 1)
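
Because `pd.get_dummies` creates columns only for the categories present in the new data, `X` can end up with different columns than the model saw during training. The sketch below shows one way to line them up with `DataFrame.reindex`. It assumes the training column names are available from somewhere; `align_features` and `expected_columns` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

def align_features(X, expected_columns):
    """Return X with exactly `expected_columns`, in that order.

    Columns missing from X are added and filled with 0 (a dummy variable
    for a category absent from the new data); extra columns are dropped.
    Hypothetical helper, not part of the original notebook.
    """
    return X.reindex(columns=list(expected_columns), fill_value=0)

# Toy example: the new data lacks 'Gender_Female' and has an unseen extra column
toy = pd.DataFrame({'Gender_Male': [1, 1], 'Extra': [5, 6]})
aligned = align_features(toy, ['Gender_Female', 'Gender_Male'])
```

If the pickled model happens to be a scikit-learn estimator fitted on a DataFrame, `model_pred_churn.feature_names_in_` would supply `expected_columns`. That is an assumption about the model, not something this notebook confirms.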

## Predict the Churn of new Student data
Run the code block below.

In [None]:
pred = model_pred_churn.predict(X)
display(len(studentIndex))
display(len(pred))

709

709
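
The two matching lengths confirm that there is one prediction per student. As a sketch of a possible next step, the IDs and predictions can be combined into a single table. The helper `build_results` and the column name `Predicted_Churn` are hypothetical choices, not part of the original notebook.

```python
import pandas as pd

def build_results(student_ids, predictions):
    """Pair each student ID with its prediction in one DataFrame.

    Hypothetical helper; 'Predicted_Churn' is an illustrative column name.
    """
    if len(student_ids) != len(predictions):
        raise ValueError("student_ids and predictions must have the same length")
    # Convert to plain lists so the pairing is by position, not by index label
    return pd.DataFrame({
        'Student.ID': list(student_ids),
        'Predicted_Churn': list(predictions),
    })
```

In this notebook it could be called as `results = build_results(studentIndex, pred)`, and `results.to_csv(...)` would then save the table for sharing.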