## HEXAD vs student activity

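This notebook explores how HEXAD traits relate to student activity. Each section fits one regression model of an activity measure on students’ HEXAD traits and prior knowledge.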
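
The setup cell standardizes the pre-test score and the six HEXAD scales, which puts every coefficient in the later models on a per-standard-deviation scale. `ols_models.standardize_columns` is project code that is not shown in this notebook; the sketch below assumes it applies a plain z-score with the sample standard deviation, and the helper name `standardize` is hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample SD (n - 1 denominator); an assumption
    return [(x - mu) / sigma for x in values]

# Hypothetical raw scores, for illustration only.
scores = [2.0, 4.0, 6.0]
z = standardize(scores)
print(z)
```

After this transformation a coefficient describes the change associated with a one-standard-deviation increase in the predictor, which makes the HEXAD scales comparable with each other.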


In [None]:
%load_ext autoreload
%autoreload 2

import sys
sys.path.append("..")

import pandas as pd
import src.modeling.ols_model as ols_models
import src.modeling.negative_binomial_model as nb_models
import src.modeling.binomial_model as bin_models
import src.modeling.logistic_model as logistic_models

# Load and scale data
df = pd.read_csv("../data/preprocessed/student_time_features_2021_2024.csv")
scale_cols = ['pre_test', 'HEXAD_P', 'HEXAD_S', 'HEXAD_F', 'HEXAD_A', 'HEXAD_D', 'HEXAD_R']
df = ols_models.standardize_columns(df, scale_cols)

## Solved tasks

This model examines how students’ HEXAD traits and prior knowledge relate to the number of programming tasks they completed. It helps identify which traits are associated with higher or lower task-solving activity.
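
Count models such as the negative binomial are usually fitted with a log link, so a coefficient is easier to read once it is exponentiated into an incidence rate ratio (IRR). The sketch below assumes a log link and uses a made-up coefficient; it is not a result from this notebook, and `incidence_rate_ratio` is a hypothetical helper.

```python
import math

def incidence_rate_ratio(coef):
    """Convert a log-link coefficient into a multiplicative effect on the expected count."""
    return math.exp(coef)

# Hypothetical coefficient for a standardized predictor (illustration only).
coef = 0.10
irr = incidence_rate_ratio(coef)
percent_change = (irr - 1) * 100
print(f"IRR = {irr:.3f} ({percent_change:+.1f}% expected tasks per +1 SD)")
```

Under these assumptions, a coefficient of 0.10 corresponds to roughly 10.5% more solved tasks for each standard deviation increase in the trait.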

In [None]:
# Select model
model_name = "solved_tasks_main"
formula = nb_models.get_nb_formula_by_name(model_name)

# Fit the selected model
model = nb_models.fit_negative_binomial(df, formula)

In [None]:
formula

In [None]:
model.summary()

## Participation

This model examines how students’ HEXAD traits and prior knowledge relate to whether they participated (submitted code at least once). It helps identify which traits are associated with a higher or lower likelihood of participation.
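
Logistic regression coefficients are on the log-odds scale. A minimal sketch of how to turn them into an odds ratio and into predicted probabilities is given here; the intercept and slope are invented for illustration and do not come from the fitted model.

```python
import math

def sigmoid(log_odds):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical intercept and slope on the log-odds scale (illustration only).
intercept, slope = 0.0, 0.5

odds_ratio = math.exp(slope)                # change in odds per +1 SD
p_average = sigmoid(intercept)              # predictor at its mean (z = 0)
p_plus_one_sd = sigmoid(intercept + slope)  # predictor one SD above its mean

print(f"odds ratio: {odds_ratio:.2f}")
print(f"P(participated) at mean: {p_average:.2f}, at +1 SD: {p_plus_one_sd:.2f}")
```

The odds ratio is constant across students, while the change in probability depends on where the student starts, so it helps to report both.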

In [None]:
df = logistic_models.add_participation_variable(df)

In [None]:
model_name = "participated_main"
formula = logistic_models.get_logit_formula_by_name(model_name)
model = logistic_models.fit_logit_model(df, formula)

In [None]:
formula

In [None]:
model.summary()

## Late work

This model examines how students’ HEXAD traits and prior knowledge relate to the likelihood of being a late worker (i.e., submitting most work near the deadline). It helps identify which traits are associated with a higher or lower tendency to submit tasks late.
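
The exact rule lives in `logistic_models.add_late_worker_variable`, which is not shown here. One plausible reading, used purely as an assumption, is that a student is flagged when the share of their submissions made near the deadline reaches the threshold. The function and input names below are hypothetical.

```python
def is_late_worker(n_late, n_total, threshold=0.9):
    """Flag a student whose share of near-deadline submissions reaches the threshold."""
    if n_total == 0:
        return False  # no submissions, so no late-work pattern to flag
    return n_late / n_total >= threshold

# Hypothetical (near-deadline submissions, total submissions) per student.
students = {"s1": (9, 10), "s2": (5, 10), "s3": (0, 0)}
flags = {sid: is_late_worker(late, total) for sid, (late, total) in students.items()}
print(flags)
```

If the project function instead uses a quantile or a time-based cutoff, the flag would be built differently, so the threshold of 0.9 should be read against the actual implementation.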

In [None]:
late_work_threshold = 0.9
df = logistic_models.add_late_worker_variable(df, late_work_threshold)

In [None]:
model_name = "late_work_main"
formula = logistic_models.get_logit_formula_by_name(model_name)
model = logistic_models.fit_logit_model(df, formula)

In [None]:
formula

In [None]:
model.summary()

## Shallow engagement

This model examines how students’ HEXAD traits and prior knowledge relate to the likelihood of showing shallow engagement, defined as being among the least active 25% of students (based on `n_days_active`). It helps identify which traits are associated with lower levels of consistent activity.
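
The bottom-quartile definition can be sketched as a quantile cutoff on the number of active days. This is an illustration in plain Python, assuming linear interpolation for the quantile and an inclusive comparison (`<=`); `logistic_models.add_shallow_engagement_variable` may handle ties differently, and the helper names are hypothetical.

```python
def quantile(values, q):
    """Quantile with linear interpolation between the two nearest order statistics."""
    data = sorted(values)
    pos = (len(data) - 1) * q
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] * (1 - frac) + data[upper] * frac

def flag_shallow_engagement(days_active, threshold=0.25):
    """Return 1 for students at or below the threshold quantile, else 0."""
    cutoff = quantile(days_active, threshold)
    return [int(d <= cutoff) for d in days_active]

# Hypothetical active-day counts for nine students.
days_active = [1, 2, 3, 4, 5, 6, 7, 8, 9]
flags = flag_shallow_engagement(days_active)
print(flags)
```

With many tied values at the cutoff, an inclusive comparison can flag more than 25% of students, which is worth checking with a simple count of the resulting flag.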

In [None]:
engagement_threshold = 0.25
df = logistic_models.add_shallow_engagement_variable(df, engagement_threshold)

In [None]:
model_name = "shallow_engagement_main"
formula = logistic_models.get_logit_formula_by_name(model_name)
model = logistic_models.fit_logit_model(df, formula)

In [None]:
formula

In [None]:
model.summary()

## Solve rate

This model examines how students’ HEXAD traits and prior knowledge relate to their solve rate, defined as the proportion of successfully completed tasks out of all attempts. It helps identify which traits are associated with more effective problem solving during task submissions.
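
A binomial model of a rate needs the counts behind it, not only the proportion, so that a student with 1 of 2 successful attempts carries less weight than one with 50 of 100. The binomial GLM in statsmodels accepts the response as two columns of successes and failures. Whether `bin_models.prepare_solve_rate_data` builds exactly this layout is an assumption, as are the names and numbers below.

```python
def to_success_failure(n_solved, n_attempts):
    """Turn (solved, attempts) into the (successes, failures) pair a binomial model expects."""
    if n_solved > n_attempts:
        raise ValueError("solved count cannot exceed attempts")
    return (n_solved, n_attempts - n_solved)

# Hypothetical (solved, attempts) counts per student.
records = [(1, 2), (50, 100), (0, 5)]
endog = [to_success_failure(solved, attempts) for solved, attempts in records]
solve_rates = [s / (s + f) for s, f in endog]

print(endog)
print(solve_rates)
```

The first two students have the same solve rate of 0.5, but the second contributes fifty times as many trials to the likelihood.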

In [None]:
# Prepare data
endog, exog = bin_models.prepare_solve_rate_data(df)

# Fit binomial model
model = bin_models.fit_binomial_model(endog, exog)

In [None]:
model.summary()

## First day

This model examines how students’ HEXAD traits and prior knowledge relate to the timing of their first code submission. It helps identify which traits are associated with earlier or later initial engagement in the course.
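
The cells in this section keep only students with `first_day` greater than 0 and then compare the variance of the outcome with its mean. A Poisson model assumes the two are equal, so a variance well above the mean is the usual motivation for a negative binomial model. `nb_models.check_overdispersion` is not shown; the sketch assumes it returns the flag, the mean, and the sample variance in that order, as the calling code suggests.

```python
from statistics import mean, variance

def check_overdispersion(counts):
    """Return (is_overdispersed, mean, variance) for a list of counts."""
    mean_val = mean(counts)
    var_val = variance(counts)  # sample variance (n - 1 denominator)
    return var_val > mean_val, mean_val, var_val

# Hypothetical first-submission days, for illustration only.
first_days = [1, 1, 2, 3, 13]
is_over, mean_val, var_val = check_overdispersion(first_days)
print(f"Mean: {mean_val:.2f}, Variance: {var_val:.2f}, overdispersed: {is_over}")
```

This is a rough screen on the raw outcome. It ignores the predictors, so it suggests a model family but does not test it formally.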

In [None]:
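# Drop students whose first_day is 0 or less (the "zero-day" users).
# Note: this overwrites df, so all later sections also use the filtered data.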
df = df[df["first_day"] > 0].copy()

In [None]:
is_over, mean_val, var_val = nb_models.check_overdispersion(df, count_col="first_day")
print(f"Mean: {mean_val:.2f}, Variance: {var_val:.2f}")
if is_over:
    print("➡ Overdispersion likely (variance > mean); NB model appropriate")

# Select the model, then get its formula and fit it
model_name = "first_day_main"
formula = nb_models.get_nb_formula_by_name(model_name)
model = nb_models.fit_negative_binomial(df, formula)

In [None]:
formula

In [None]:
model.summary()

## Median day of activity

This model examines how students’ HEXAD traits and prior knowledge relate to the median day of their activity during the course. It helps identify which traits are associated with earlier or later patterns of sustained engagement.
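
`median_day_of_activity` is computed in preprocessing that is not part of this notebook. As a sketch, assuming it is the median of the course-day indices on which a student submitted code, with each active day counted once:

```python
from statistics import median

def median_day_of_activity(active_days):
    """Median course day across a student's distinct active days."""
    unique_days = sorted(set(active_days))
    return median(unique_days)

# Two hypothetical students, each with five active days ending on day 40.
early = median_day_of_activity([1, 2, 3, 4, 40])
late = median_day_of_activity([5, 30, 35, 38, 40])
print(early, late)
```

Both students are active on the same number of days, yet the medians differ, which is why this measure captures when activity is concentrated and not how much of it there is.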

In [None]:
is_over, mean_val, var_val = nb_models.check_overdispersion(df, count_col="median_day_of_activity")
print(f"Mean: {mean_val:.2f}, Variance: {var_val:.2f}")
if is_over:
    print("➡  Potential overdispersion detected (variance > mean)")

In [None]:
model_name = "median_day_main"
formula = nb_models.get_nb_formula_by_name(model_name)
model = nb_models.fit_negative_binomial(df, formula)

In [None]:
formula

In [None]:
model.summary()

## Days of activity

This model examines how students’ HEXAD traits and prior knowledge relate to the number of days on which they submitted code on the platform during the course. It helps identify which traits are associated with more consistent engagement over time.
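
A minimal sketch of how a days-active count can be derived from submission timestamps, assuming a day counts as active when it contains at least one submission. The function below is hypothetical and only mirrors the column name `n_days_active`; the actual feature is built in preprocessing.

```python
from datetime import datetime

def n_days_active(timestamps):
    """Count distinct calendar days with at least one submission."""
    return len({ts.date() for ts in timestamps})

# Hypothetical submission timestamps for one student.
submissions = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 21, 30),
    datetime(2024, 1, 3, 14, 15),
]
days = n_days_active(submissions)
print(days)
```

Several submissions on the same day count once, so the measure reflects how spread out the activity is and not its volume.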

In [None]:
is_over, mean_val, var_val = nb_models.check_overdispersion(df, count_col="n_days_active")
print(f"Mean: {mean_val:.2f}, Variance: {var_val:.2f}")
if is_over:
    print("➡  Potential overdispersion detected (variance > mean)")

In [None]:
model_name = "days_of_activity_main"
formula = nb_models.get_nb_formula_by_name(model_name)
model = nb_models.fit_negative_binomial(df, formula)

In [None]:
formula

In [None]:
model.summary()

## Weeks of activity

This model examines how students’ HEXAD traits and prior knowledge relate to the number of weeks they were active during the course. It helps identify which traits are associated with sustained engagement across multiple weeks.
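
The weekly count can be sketched in the same spirit, here by grouping active dates into ISO calendar weeks. This is an assumption: the preprocessing may instead count weeks relative to the course start, which would give different boundaries. The function is hypothetical and only mirrors the column name `n_weeks_active`.

```python
from datetime import date

def n_weeks_active(active_dates):
    """Count distinct (ISO year, ISO week) pairs among the active dates."""
    return len({tuple(d.isocalendar())[:2] for d in active_dates})

# Hypothetical active dates for one student.
active_dates = [
    date(2024, 1, 1),   # Monday, ISO week 1
    date(2024, 1, 3),   # same week
    date(2024, 1, 8),   # ISO week 2
    date(2024, 1, 20),  # ISO week 3
]
weeks = n_weeks_active(active_dates)
print(weeks)
```

Pairing the week number with the ISO year keeps weeks from different years apart, which matters for data that spans 2021 to 2024.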

In [None]:
is_over, mean_val, var_val = nb_models.check_overdispersion(df, count_col="n_weeks_active")
print(f"Mean: {mean_val:.2f}, Variance: {var_val:.2f}")
if is_over:
    print("➡  Potential overdispersion detected (variance > mean)")

In [None]:
model_name = "weeks_of_activity_main"
formula = nb_models.get_nb_formula_by_name(model_name)
model = nb_models.fit_negative_binomial(df, formula)

In [None]:
formula

In [None]:
model.summary()