## Baseline Recommendation System (EDNet KT4)

This notebook implements two simple baselines for next-item recommendation
on student interaction sequences from the EDNet dataset: a global
most-frequent recommender and a per-student history-based recommender.

The goal is to establish a lower bound on performance before applying
sequential deep learning models (e.g., GRU-based Knowledge Tracing).


In [2]:
import pandas as pd
from collections import Counter, defaultdict

processed_csv = r"C:\Users\MASTER\OneDrive\Desktop\projects for master\large projects that prove that you a researcher\Intelligent Recommendation System for Coding Courses Based on Learner Behavior\Data\data\processed\ednet_sequences.csv"

df = pd.read_csv(processed_csv)
print(f"Loaded {len(df)} interactions")
df.head()


Loaded 7729 interactions


Unnamed: 0,timestamp,subject_id,item_id,is_correct
0,1567413540117,u12531,q3605,False
1,1567413573276,u12531,q4895,False
2,1567413619332,u12531,q5365,False
3,1567413640139,u12531,q5577,False
4,1567413670061,u12531,q869,False


In [3]:
student_sequences = (
    df.sort_values(["subject_id", "timestamp"])
    .groupby("subject_id")["item_id"]
    .apply(list)
    .to_dict()
)

print(f"Total students: {len(student_sequences)}")


Total students: 147
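
To make the shape of `student_sequences` concrete, here is a minimal sketch on a toy frame. The column names match the CSV above, but the rows and the names `toy` and `toy_sequences` are made up for illustration. It shows that sorting by `subject_id` and `timestamp` before grouping yields one time-ordered item list per student.

```python
import pandas as pd

# Toy interactions, deliberately out of time order (all values are made up).
toy = pd.DataFrame({
    "timestamp": [30, 10, 20, 5, 15],
    "subject_id": ["u1", "u1", "u1", "u2", "u2"],
    "item_id": ["q3", "q1", "q2", "q9", "q1"],
})

# Same pipeline as the cell above: sort, group per student, collect lists.
toy_sequences = (
    toy.sort_values(["subject_id", "timestamp"])
    .groupby("subject_id")["item_id"]
    .apply(list)
    .to_dict()
)

print(toy_sequences)
# {'u1': ['q1', 'q2', 'q3'], 'u2': ['q9', 'q1']}
```
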


In [4]:
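# Count item frequency across all interactions and keep the 10 most common.
all_items = df["item_id"].tolist()
global_top_items = [i for i, _ in Counter(all_items).most_common(10)]

def recommend_global(history=None, k=10):
    # Same list for every student; history is accepted only so the signature
    # matches the other recommender.
    return global_top_items[:k]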


In [5]:
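def recommend_from_history(history, k=5):
    # Most frequent items in this student's own history (ties keep first-seen order).
    # It can only return items the student has already attempted.
    counts = Counter(history)
    return [i for i, _ in counts.most_common(k)]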
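
As a quick illustration of how the history-based recommender behaves, the sketch below runs it on a made-up history (`toy_history` is hypothetical, not taken from the dataset). It ranks items by how often the student attempted them and never proposes an item outside the history.

```python
from collections import Counter

def recommend_from_history(history, k=5):
    counts = Counter(history)
    return [i for i, _ in counts.most_common(k)]

# Hypothetical history for one student (item IDs are made up).
toy_history = ["q1", "q2", "q1", "q3", "q2", "q1"]

top2 = recommend_from_history(toy_history, k=2)
print(top2)  # ['q1', 'q2']  (q1 appears 3 times, q2 twice, q3 once)

# Every recommendation comes from the history itself.
all_recs = recommend_from_history(toy_history)
print(set(all_recs) <= set(toy_history))  # True
```
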


In [6]:
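def recall_at_k(sequences, recommender, k=5):
    """Fraction of next-item predictions whose true item is in the top-k list.

    With a single target per step this equals hit rate@k.
    """
    hits, total = 0, 0

    for seq in sequences.values():
        # A sequence needs at least one history item and one target.
        if len(seq) < 2:
            continue

        # Slide over the sequence: predict item i+1 from items 0..i.
        for i in range(len(seq) - 1):
            history = seq[:i+1]
            true_next = seq[i+1]

            # The recommender is called with its own default k, so values of
            # k above that default are silently capped.
            recs = recommender(history)[:k]
            hits += int(true_next in recs)
            total += 1

    return hits / total if total > 0 else 0.0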
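
The evaluation loop can be checked by hand on a tiny input. The sketch below repeats the two functions so that it runs on its own, and uses made-up sequences (`toy_sequences` is hypothetical). Student `u2` has a single interaction and is skipped. Student `u1` contributes three prediction steps, of which the last two are hits at k=1.

```python
from collections import Counter

def recommend_from_history(history, k=5):
    counts = Counter(history)
    return [i for i, _ in counts.most_common(k)]

def recall_at_k(sequences, recommender, k=5):
    hits, total = 0, 0
    for seq in sequences.values():
        if len(seq) < 2:
            continue
        for i in range(len(seq) - 1):
            history = seq[:i + 1]
            true_next = seq[i + 1]
            recs = recommender(history)[:k]
            hits += int(true_next in recs)
            total += 1
    return hits / total if total > 0 else 0

# Made-up sequences: u1 repeats q1, u2 is too short to evaluate.
toy_sequences = {
    "u1": ["q1", "q2", "q1", "q1"],
    "u2": ["q5"],
}

# Step 1: history [q1]         -> recs [q1], target q2 -> miss
# Step 2: history [q1, q2]     -> recs [q1], target q1 -> hit
# Step 3: history [q1, q2, q1] -> recs [q1], target q1 -> hit
toy_recall = recall_at_k(toy_sequences, recommend_from_history, k=1)
print(f"{toy_recall:.4f}")  # 0.6667
```
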


In [8]:
recall_global_1 = recall_at_k(student_sequences, lambda h: global_top_items, k=1)
recall_global_5 = recall_at_k(student_sequences, lambda h: global_top_items, k=5)

recall_history_1 = recall_at_k(student_sequences, recommend_from_history, k=1)
recall_history_5 = recall_at_k(student_sequences, recommend_from_history, k=5)

# "MF" in the labels below stands for "most frequent", not matrix factorization.
print(f"Global MF Recall@1: {recall_global_1:.4f}")
print(f"Global MF Recall@5: {recall_global_5:.4f}")
print(f"History MF Recall@1: {recall_history_1:.4f}")
print(f"History MF Recall@5: {recall_history_5:.4f}")


Global MF Recall@1: 0.0038
Global MF Recall@5: 0.0179
History MF Recall@1: 0.0152
History MF Recall@5: 0.0367


## Discussion

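The global most-frequent baseline performs poorly (Recall@1 = 0.0038,
Recall@5 = 0.0179), which suggests that item popularity alone carries little
signal for next-item prediction on this sample (7,729 interactions,
147 students). Because `global_top_items` is counted over the full dataset,
including the items being predicted, these figures are if anything optimistic.

The history-based baseline roughly quadruples Recall@1 (0.0152) and doubles
Recall@5 (0.0367), but remains low in absolute terms. By construction it can
only hit when the next item is one the student has already attempted, so it
never predicts an unseen item. Both results motivate models that use the order
of interactions, such as recurrent neural networks, in the next stage.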
