# Virtual Study Buddy – Matching Algorithm

This notebook builds the logic for pairing students based on shared attributes like availability, preferred subjects, GPA, and more. It includes both a default algorithm and a customizable version driven by user preferences.

## Set Up

### Import Libraries

In [97]:
# Import Libraries
import sqlite3 
import pandas as pd

### Connect to Database

In [98]:
# Connect to the SQLite database
# Adjust this relative path if the notebook is run from a different directory.
database = '../data/processed/study_buddy.db'
conn = sqlite3.connect(database)
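
`sqlite3.connect` creates a new, empty database file when the path does not point to an existing one, so a wrong path only surfaces later as a "no such table" error. A minimal sketch of a sanity check follows; the helper name `list_tables` and the in-memory demo tables are illustrative assumptions, not part of the project.

```python
import sqlite3

def list_tables(conn):
    """Return the names of all tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# Demo on an in-memory database; in the notebook, pass the existing `conn` instead.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE students (student_id TEXT PRIMARY KEY)")
demo_conn.execute("CREATE TABLE subjects (subject_id INTEGER PRIMARY KEY)")
print(list_tables(demo_conn))
```

An empty list returned for the real connection would suggest that the path is wrong.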

### Load Data

In [99]:
# Load tables into DataFrames
df_students = pd.read_sql("SELECT * FROM students", conn)
df_subjects = pd.read_sql("SELECT * FROM subjects", conn)
df_student_subjects = pd.read_sql("SELECT * FROM student_subjects", conn)
df_study_days = pd.read_sql("SELECT * FROM study_days", conn)
df_utc_study_days = pd.read_sql("SELECT * FROM utc_study_days", conn)


## Default Mode Matching Logic

This section defines the default matching logic for the Virtual Study Buddy App. Every pair of distinct students receives an additive score built from three components:

1. **Overlapping UTC availability days**: one point for each study day (in UTC) that both students share.
2. **Shared study subjects**: one point for each preferred subject the two students have in common.
3. **Similar study styles**: one additional point when both students report the same study style.

None of the components is a hard requirement: a pair with no shared subject can still rank highly if its day overlap is large. The result is a ranked list of suggested peer matches generated automatically, without requiring user input or customization.
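
The score used later in this section adds the three overlaps together. A minimal sketch on two hypothetical profiles follows; the names `additive_score`, `alice`, and `bob`, and all of their values, are made up for illustration.

```python
def additive_score(profile_a, profile_b):
    """Sum shared subjects, shared days, and a 0/1 study-style bonus."""
    shared_subjects = len(profile_a["subjects"] & profile_b["subjects"])
    shared_days = len(profile_a["days"] & profile_b["days"])
    same_style = 1 if profile_a["style"] == profile_b["style"] else 0
    return shared_subjects + shared_days + same_style

# Hypothetical profiles in the same shape as the notebook's student_profiles entries
alice = {"subjects": {"Math", "Physics"}, "days": {"Mon", "Wed", "Fri"}, "style": "visual"}
bob = {"subjects": {"Physics", "History"}, "days": {"Wed", "Fri"}, "style": "visual"}

print(additive_score(alice, bob))  # 1 shared subject + 2 shared days + 1 style = 4
```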

In [100]:
# Merge student-subject relationships into one DataFrame
df_merged_subjects = pd.merge(df_student_subjects, df_subjects, on="subject_id", how="left")
df_student_with_subjects = pd.merge(df_merged_subjects, df_students, on="student_id", how="left")
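
A left merge keeps rows whose key has no partner in the right table and fills the missing columns with `NaN`, which can go unnoticed. The sketch below shows one way to surface such rows using the `validate` and `indicator` parameters of `pd.merge`; the demo frames and their values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for df_subjects and df_student_subjects
df_subjects_demo = pd.DataFrame(
    {"subject_id": [1, 2], "subject_name": ["Math", "Physics"]}
)
df_student_subjects_demo = pd.DataFrame(
    {"student_id": ["stu1", "stu1", "stu2"], "subject_id": [1, 2, 3]}
)

merged = pd.merge(
    df_student_subjects_demo,
    df_subjects_demo,
    on="subject_id",
    how="left",
    validate="many_to_one",  # raises MergeError if subject_id repeats in the right table
    indicator=True,          # adds a _merge column: "both" or "left_only"
)

# Rows whose subject_id has no entry in the subjects table
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["student_id", "subject_id"]])
```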



### Create subject name mapping and build student profiles

In [101]:
# Create a subject_id -> subject_name dictionary
subject_map = df_subjects.set_index("subject_id")["subject_name"].to_dict()

# Add a subject_name column to the student_subjects table
df_student_subjects["subject_name"] = df_student_subjects["subject_id"].map(subject_map)

# Build a profile for each student: subjects, UTC study days, study style, personality type, and GPA
student_profiles = {}

for sid in df_students["student_id"]:
    subjects = set(df_student_subjects[df_student_subjects["student_id"] == sid]["subject_name"])
    days = set(df_utc_study_days[df_utc_study_days["student_id"] == sid]["utc_day"])
    style = df_students[df_students["student_id"] == sid]["study_style"].values[0]
    personality = df_students[df_students["student_id"] == sid]["personality_type"].values[0] 
    gpa_goal = df_students[df_students["student_id"] == sid]["GPA"].values[0] 
    
    student_profiles[sid] = {
        "subjects": subjects,
        "days": days,
        "style": style,
        "personality": personality,
        "GPA": gpa_goal
    }

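
The loop above filters the full table once per student. As a possible alternative, the per-student sets can be built in a single pass with `groupby`. This is a sketch on made-up data; `demo_subjects` and `subjects_by_student` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for df_student_subjects after subject names are mapped
demo_subjects = pd.DataFrame({
    "student_id": ["stu1", "stu1", "stu2"],
    "subject_name": ["Math", "Physics", "Math"],
})

# One set of subject names per student, keyed by student_id
subjects_by_student = (
    demo_subjects.groupby("student_id")["subject_name"].apply(set).to_dict()
)
print(subjects_by_student)

# Students with no rows are absent from the dict, so fall back to an empty set
print(subjects_by_student.get("stu3", set()))
```

The same pattern would apply to the UTC study days table.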

### Compute match scores

In [102]:
# Score every ordered pair of distinct students on shared subjects, shared days, and study style
match_results = []

student_ids = df_students["student_id"].tolist()

for sid in student_ids:
    sid_subjects = student_profiles[sid]["subjects"]
    sid_days = student_profiles[sid]["days"]
    sid_style = student_profiles[sid]["style"]
    
    for partner_id in student_ids:
        if sid == partner_id:
            continue  # Skip matching with self
        
        partner_subjects = student_profiles[partner_id]["subjects"]
        partner_days = student_profiles[partner_id]["days"]
        partner_style = student_profiles[partner_id]["style"]
        
        subject_match = len(sid_subjects & partner_subjects)
        day_match = len(sid_days & partner_days)
        style_match = 1 if sid_style == partner_style else 0
        
        total_score = subject_match + day_match + style_match  # Simple additive score
        
        match_results.append({
            "student_id": sid,
            "potential_match": partner_id,
            "subject_overlap": subject_match,
            "day_overlap": day_match,
            "style_match": style_match,
            "total_score": total_score
        })
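
Because the score is symmetric, the nested loop computes every pair twice. One possible variation, sketched here on hypothetical profiles, scores each unordered pair once with `itertools.combinations` and records the row in both directions.

```python
from itertools import combinations

# Hypothetical profiles in the same shape as student_profiles
profiles = {
    "stu1": {"subjects": {"Math"}, "days": {"Mon", "Tue"}, "style": "visual"},
    "stu2": {"subjects": {"Math", "Art"}, "days": {"Tue"}, "style": "visual"},
    "stu3": {"subjects": {"Art"}, "days": {"Fri"}, "style": "auditory"},
}

pair_rows = []
for a, b in combinations(profiles, 2):
    pa, pb = profiles[a], profiles[b]
    subject_overlap = len(pa["subjects"] & pb["subjects"])
    day_overlap = len(pa["days"] & pb["days"])
    style_match = int(pa["style"] == pb["style"])
    total_score = subject_overlap + day_overlap + style_match

    # Record the pair in both directions so each student keeps a full candidate list
    for student, partner in ((a, b), (b, a)):
        pair_rows.append({
            "student_id": student,
            "potential_match": partner,
            "subject_overlap": subject_overlap,
            "day_overlap": day_overlap,
            "style_match": style_match,
            "total_score": total_score,
        })

print(len(pair_rows))  # 3 unordered pairs x 2 directions = 6 rows
```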



### Convert to DataFrame and get top matches

In [103]:
# Convert match results to a DataFrame
match_df = pd.DataFrame(match_results)

# Sort matches by student and by highest score
top_matches = match_df.sort_values(by=["student_id", "total_score"], ascending=[True, False])

# Show the top 5 matches for each student (head() returns 5 rows per group by default)
top_matches.groupby("student_id").head()


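| | student_id | potential_match | subject_overlap | day_overlap | style_match | total_score |
|---|---|---|---|---|---|---|
| 46 | stu1000 | stu1047 | 0 | 5 | 1 | 6 |
| 81 | stu1000 | stu1082 | 1 | 4 | 1 | 6 |
| 48 | stu1000 | stu1049 | 1 | 3 | 1 | 5 |
| 50 | stu1000 | stu1051 | 1 | 3 | 1 | 5 |
| 53 | stu1000 | stu1054 | 0 | 4 | 1 | 5 |
| ... | ... | ... | ... | ... | ... | ... |
| 23717 | stu1154 | stu1001 | 0 | 4 | 1 | 5 |
| 23728 | stu1154 | stu1012 | 1 | 4 | 0 | 5 |
| 23735 | stu1154 | stu1019 | 0 | 4 | 1 | 5 |
| 23760 | stu1154 | stu1044 | 0 | 4 | 1 | 5 |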
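
Called without an argument, `head()` returns five rows per group. To limit the output to a different number of matches per student, the count can be passed explicitly. A small sketch on made-up scores follows; `demo_matches` and `top_n` are illustrative names.

```python
import pandas as pd

# Hypothetical match scores in the same shape as match_df
demo_matches = pd.DataFrame({
    "student_id": ["stu1"] * 4 + ["stu2"] * 2,
    "potential_match": ["stu2", "stu3", "stu4", "stu5", "stu1", "stu3"],
    "total_score": [6, 4, 5, 2, 6, 1],
})

top_n = 3  # number of matches to keep per student

top_demo = (
    demo_matches
    .sort_values(by=["student_id", "total_score"], ascending=[True, False])
    .groupby("student_id")
    .head(top_n)  # keeps at most top_n rows per student, in sorted order
)
print(top_demo)
```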


## Custom Matching Based on User Preferences 
This section introduces a more flexible matching approach where students can be paired based on selected priorities. Instead of relying solely on availability and shared subjects, this logic allows users to emphasize specific traits that matter to them.

**Two core functions are included:**

- `custom_match(user_id, preferences)`: Scores one student against every other student using the criteria the user enables: subject overlap, shared study days, same study style, same GPA value, and same personality type. The `preferences` argument is a dictionary of booleans, one per criterion, and the function returns the three highest-scoring matches.

- `generate_all_custom_matches(preferences)`: Calls `custom_match` for every student in the dataset with the same set of preferences and collects the results in one DataFrame.

This structure allows future integration with the frontend (e.g., checkboxes in an HTML form), enabling each student to customize how they want to be matched. Matching results are scored and returned as a ranked DataFrame, making them easy to analyze, display, or export.
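
As a rough illustration of the frontend integration, the sketch below turns submitted checkbox fields into the boolean dictionary that `custom_match` expects. It assumes the form uses the preference keys as field names and that checked boxes arrive with the HTML default value `"on"` while unchecked boxes are omitted. The helper `preferences_from_form` is hypothetical.

```python
# Preference keys read by custom_match
PREFERENCE_KEYS = ("subjects", "days", "style", "GPA", "personality")

def preferences_from_form(form_data):
    """Build the boolean preferences dict from submitted form fields.

    Assumption: a checked box is submitted as key -> "on" and an
    unchecked box is missing from form_data entirely.
    """
    return {key: form_data.get(key) == "on" for key in PREFERENCE_KEYS}

# Hypothetical submission with only two boxes checked
submitted = {"subjects": "on", "style": "on"}
print(preferences_from_form(submitted))
```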

In [104]:
# Function to generate custom matches for a single user based on selected preferences
def custom_match(user_id, preferences):
    """
    Matches a given student with others based on selected matching preferences.
    
    Args:
        user_id (str): The ID of the student to match, e.g. 'stu1000'.
        preferences (dict): A dictionary of booleans specifying which criteria to score on.
            Example: {'subjects': True, 'days': False, 'style': True, 'GPA': True, 'personality': False}
    
    Returns:
        list[dict]: Up to three matches, sorted by total score in descending order,
        each with the individual match components.
    """
    # Return an empty list (the same type as a normal result) if the user is unknown
    if user_id not in student_profiles:
        print(f"User {user_id} not found.")
        return []

    user_profile = student_profiles[user_id]
    user_subjects = user_profile['subjects']
    user_days = user_profile['days']
    user_style = user_profile['style']
    user_goal = user_profile.get('GPA', None) 
    user_personality = user_profile['personality'] 


    results = []

    for partner_id in student_profiles:
        if partner_id != user_id:  # Avoid matching with self
            partner = student_profiles[partner_id]
            score = 0

            # Match based on selected preferences
            if preferences.get('subjects'):
                score += len(user_subjects & partner['subjects'])

            if preferences.get('days'):
                score += len(user_days & partner['days'])

            if preferences.get('style'):
                score += 1 if user_style == partner['style'] else 0

            if preferences.get('GPA'):
                score += 1 if user_goal == partner.get('GPA', None) else 0

            if preferences.get('personality'):
                score += 1 if user_personality == partner['personality'] else 0

            results.append({
                'student_id': user_id,
                'match_id': partner_id,
                'subject_overlap': len(user_subjects & partner['subjects']),
                'day_overlap': len(user_days & partner['days']),
                'style_match': user_style == partner['style'],
                'goal_match': user_goal == partner.get('GPA', None),
                'personality_match': user_personality == partner['personality'],
                'total_score': score
            })

    # Return the top 3 matches sorted by score
    sorted_results = sorted(results, key=lambda x: x['total_score'], reverse=True)
    return sorted_results[:3]


In [105]:
# Function to generate top matches for all users using custom preferences
def generate_all_custom_matches(preferences):
    """
    Applies the custom_match function to all students in the dataset.
    
    Args:
        preferences (dict): Matching preferences selected by user.
        
    Returns:
        DataFrame of all top matches across users.
    """
    all_matches = []

    for user_id in student_profiles:
        matches = custom_match(user_id, preferences)
        all_matches.extend(matches)

    return pd.DataFrame(all_matches)


In [106]:
# Define sample user preferences for matching
user_preferences = {
    'subjects': True,
    'days': True,
    'style': True,
    'GPA': True, 
    'personality': True
}

# Generate top matches for all users
custom_matches_df = generate_all_custom_matches(user_preferences)

# Display the result
custom_matches_df


| | student_id | match_id | subject_overlap | day_overlap | style_match | goal_match | personality_match | total_score |
|---|---|---|---|---|---|---|---|---|
| 0 | stu1000 | stu1047 | 0 | 5 | True | False | False | 6 |
| 1 | stu1000 | stu1082 | 1 | 4 | True | False | False | 6 |
| 2 | stu1000 | stu1090 | 0 | 4 | True | False | True | 6 |
| 3 | stu1001 | stu1019 | 1 | 4 | True | False | False | 6 |
| 4 | stu1001 | stu1026 | 0 | 4 | True | True | False | 6 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 460 | stu1153 | stu1025 | 1 | 3 | True | False | False | 5 |
| 461 | stu1153 | stu1044 | 0 | 4 | True | False | False | 5 |
| 462 | stu1154 | stu1001 | 0 | 4 | True | False | False | 5 |
| 463 | stu1154 | stu1012 | 1 | 4 | False | False | False | 5 |
