# Machine Learning Project (Recommender System)
### Noah Choate & Eli Ritchie

## Overview

College students across the nation often struggle to manage their personal finances because they generally have little experience with money. Tuition, housing, food, and other costs add up quickly and can leave students in difficult financial situations. To help students who lack budgeting experience, we created a tool that takes user input for every category of expense and income and recommends which spending categories to reduce. We used LightFM as our machine learning model and added logic that accounts for practical constraints when generating output: tuition is a fixed amount that cannot be cut, and housing, health and wellness, and personal care are categories in which expenses are harder to cut. The research question we set out to answer is: which areas of spending can college students reduce to improve their financial standing? This tool could be helpful to students who need financial guidance and are not experienced at budgeting.

### Data Information

#### About the Data
For this assignment, we use a Kaggle dataset that contains undergraduate student income and spending data. It covers income and financial aid as well as expenses ranging from tuition down to personal care and school supplies. To give a better understanding of each column in the dataframe, the first 5 rows are printed below. The dataset contains this information for 1,000 students.

#### Loading in the Libraries
We started by importing pandas to load and manipulate the dataset, along with NumPy for numerical computing. We then imported `LabelEncoder` from scikit-learn to encode the categorical labels as integers for the machine learning algorithm. The recommendation system itself is built with LightFM; `lightfm.data.Dataset` is the helper class that maps our student spending data into the matrices LightFM expects. Finally, we imported `drive` from Google Colab to load our Excel file from Google Drive.


In [6]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from lightfm import LightFM
from lightfm.data import Dataset
from google.colab import drive
drive.mount('/content/drive')

df = pd.read_excel("/content/drive/My Drive/ColabNotebooks/DSC3344/Data/student_spending_data.xlsx")
df.head()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


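| | monthly_income | financial_aid | tuition | housing | food | transportation | books_supplies | entertainment | personal_care | technology | health_wellness | miscellaneous |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 958 | 270 | 5939 | 709 | 296 | 123 | 188 | 41 | 78 | 134 | 127 | 72 |
| 1 | 1006 | 875 | 4908 | 557 | 365 | 85 | 252 | 74 | 92 | 226 | 129 | 68 |
| 2 | 734 | 928 | 3051 | 666 | 220 | 137 | 99 | 130 | 23 | 239 | 112 | 133 |
| 3 | 617 | 265 | 4935 | 652 | 289 | 114 | 223 | 99 | 30 | 163 | 105 | 55 |
| 4 | 810 | 522 | 3887 | 825 | 372 | 168 | 194 | 48 | 71 | 88 | 71 | 104 |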


#### Preparing the Data

Before building the recommendation model, we identified the spending categories: every expense column is listed under `spending_categories`. We then added a `user_id` column that uniquely identifies each student (row) in the dataframe.

Next comes the actual preparation of the dataframe. We converted the data to long format, which splits each student's row into separate rows with one category each. This (user, category, value) layout is the typical input for a recommender system. We then encoded the category names as integers that the model can read.
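To make the wide-to-long reshaping concrete, here is a minimal sketch on a hypothetical two-student frame. The values are copied from the first two rows shown earlier, and only three of the spending columns are included; the variable names `wide` and `long_df` are ours.

```python
import pandas as pd

# Hypothetical two-student frame with three of the spending columns
wide = pd.DataFrame({
    "user_id": [0, 1],
    "housing": [709, 557],
    "food": [296, 365],
    "tuition": [5939, 4908],
})

# One row per (user, category) pair; rows are stacked column by column
# in the order given by value_vars
long_df = wide.melt(id_vars=["user_id"], value_vars=["housing", "food", "tuition"],
                    var_name="category", value_name="spending")

print(long_df)
```

Each student now contributes one row per category, so two students and three categories give six rows.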

In [7]:
# Define the relevant spending categories (including tuition) and exclude income
spending_categories = ['housing', 'food', 'transportation', 'books_supplies', 'entertainment',
                       'personal_care', 'technology', 'health_wellness', 'miscellaneous', 'tuition']

# Add a user_id column (one integer id per row, i.e. per student)
df['user_id'] = np.arange(0, len(df))

# Transform the data into long format (User, Category, Spending)
data_long = df.melt(id_vars=['user_id'], value_vars=spending_categories,
                      var_name='category', value_name='spending')

# Encode categories as integers (user_id is already an integer)
category_encoder = LabelEncoder().fit(data_long['category'])
data_long['category_encoded'] = category_encoder.transform(data_long['category'])

# Display the transformed dataset
data_long.tail()

| | user_id | category | spending | category_encoded |
|---|---|---|---|---|
| 9995 | 995 | tuition | 3688 | 9 |
| 9996 | 996 | tuition | 3380 | 9 |
| 9997 | 997 | tuition | 3497 | 9 |
| 9998 | 998 | tuition | 3649 | 9 |
| 9999 | 999 | tuition | 5965 | 9 |
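The integer codes come from `LabelEncoder`, which sorts the class names before numbering them, so the codes follow alphabetical order rather than the order of `spending_categories`. A small sketch of the resulting mapping and of decoding it again (the names `encoder`, `code_for`, and `decoded` are ours):

```python
from sklearn.preprocessing import LabelEncoder

spending_categories = ['housing', 'food', 'transportation', 'books_supplies', 'entertainment',
                       'personal_care', 'technology', 'health_wellness', 'miscellaneous', 'tuition']

encoder = LabelEncoder().fit(spending_categories)

# classes_ is sorted, so position i in classes_ is the integer code i
code_for = {name: code for code, name in enumerate(encoder.classes_)}
print(code_for)

# inverse_transform maps integer codes back to the original names
decoded = encoder.inverse_transform([9, 4])
print(list(decoded))
```

This is why `tuition` appears with code 9 in the output above even though it is the last entry of `spending_categories` only by coincidence of spelling.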


#### Recommendation System

To build the recommendation system, we created a LightFM `Dataset` object and fitted it with the user IDs and encoded categories from the long-format data prepared above. Next, we built the user-category interaction matrix from (user, category, spending) triples. `build_interactions` returns two matrices: an interactions matrix that marks which user-category pairs were observed, and a weights matrix that holds the spending amounts. The code below keeps the first and discards the second. Lastly, we initialized and trained the LightFM model with `loss='warp'` (Weighted Approximate-Rank Pairwise), a loss designed for ranking tasks, which we use here to rank the categories for each user.

In [8]:
# Create a LightFM dataset object
dataset = Dataset()

# Fit the dataset with user IDs and categories (spending only, no income)
dataset.fit(data_long['user_id'], data_long['category_encoded'])

# Build interactions (user-category spending matrix)
(interactions, _) = dataset.build_interactions(
    [(row['user_id'], row['category_encoded'], row['spending']) for _, row in data_long.iterrows()]
)

# Initialize the LightFM model
model = LightFM(loss='warp')  # WARP is good for ranking/recommendation tasks

# Train the model
model.fit(interactions, epochs=30, num_threads=2)

<lightfm.lightfm.LightFM at 0x7a95f24656f0>
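To illustrate what the interaction and weight matrices contain, the sketch below rebuilds them with plain NumPy for two students and three categories. It is not LightFM's implementation: `build_matrices` is a hypothetical helper, the matrices are dense rather than sparse, and it assumes (as we understand `Dataset` to behave) that internal row and column indices are assigned in order of first appearance.

```python
import numpy as np

def build_matrices(triples):
    """Turn (user, item, weight) triples into dense interaction and weight matrices."""
    user_index, item_index = {}, {}
    for user, item, _ in triples:
        # Assign the next free row/column the first time an id is seen
        user_index.setdefault(user, len(user_index))
        item_index.setdefault(item, len(item_index))

    interactions = np.zeros((len(user_index), len(item_index)), dtype=np.float32)
    weights = np.zeros_like(interactions)
    for user, item, weight in triples:
        row, col = user_index[user], item_index[item]
        interactions[row, col] = 1.0   # this (user, category) pair was observed
        weights[row, col] = weight     # how much was spent
    return interactions, weights, item_index

# Encoded categories: housing=4, food=2, tuition=9 (alphabetical codes)
triples = [(0, 4, 709), (1, 4, 557), (0, 2, 296), (1, 2, 365), (0, 9, 5939), (1, 9, 4908)]
interactions, weights, item_index = build_matrices(triples)

print(item_index)
print(interactions)
print(weights)
```

Two things are worth noticing under these assumptions. The column order follows the order in which categories first appear in the long-format data, not their encoded values. And because every student has an amount in every category, the binary interactions matrix is all ones; the spending amounts appear only in the weights.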

#### Recommendation Function

The function below predicts a score for every spending category for a given user. It then adjusts the results so that the system never recommends cutting tuition, which usually cannot be cut, and it scales down the scores for housing, health and wellness, and personal care, which cannot be cut easily. A minimum spending threshold of $40 ensures that the system does not recommend cuts in categories where the user spends less than this monthly amount. The recommendations are then produced by sorting the remaining categories by adjusted score and choosing the top 3 as the areas where spending should be reduced. The function returns the category names so that the output is user-friendly.
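The filtering and ranking steps can be sketched on a handful of hypothetical scores. All numbers below are made up for illustration, only five categories are shown, and the sketch picks the top 2 instead of the top 3 to keep the example small.

```python
import numpy as np

categories = np.array(["housing", "food", "entertainment", "technology", "tuition"])
scores = np.array([0.3, 0.8, 0.7, 0.5, 1.5])          # hypothetical model scores
spending = {"housing": 700, "food": 300, "entertainment": 30,
            "technology": 130, "tuition": 5000}       # hypothetical amounts

# Keep categories at or above the $40 threshold, and never tuition
keep = np.array([spending[c] >= 40 and c != "tuition" for c in categories])
kept_scores = scores[keep]
kept_names = categories[keep]

# argsort on the negated scores orders indices from highest to lowest score
top = kept_names[np.argsort(-kept_scores)[:2]]
print(list(top))
```

Here `entertainment` is dropped because its spending is below the threshold and `tuition` is dropped unconditionally, so the ranking is taken over `housing`, `food`, and `technology` only.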

In [9]:
# Define a function to recommend categories for spending reduction
def recommend_spending_reduction(model, user_id, spending_categories, user_spending, num_recommendations=3, min_spending_threshold=40):

    # Predict scores for all categories for the given user
    scores = model.predict(user_id, np.arange(len(spending_categories)))

    # Shift scores so the minimum is zero: raw LightFM scores can be negative,
    # and scaling a negative score by 0.25 would raise it instead of lowering it
    scores = scores - scores.min()

    # Penalize essential categories so they are less likely to be recommended for cuts
    essential_categories = ['housing', 'health_wellness', 'personal_care']
    for i, category in enumerate(spending_categories):
        if category in essential_categories:
            scores[i] *= 0.25

    # Filter out categories where spending is too low or that should never be recommended for cuts (like tuition)
    valid_categories = [i for i, category in enumerate(spending_categories)
                        if user_spending[category] >= min_spending_threshold and category != 'tuition']

    # Only consider valid categories for recommendations
    filtered_scores = scores[valid_categories]
    filtered_categories = np.array(spending_categories)[valid_categories]

    # Get the categories with the highest adjusted scores
    top_recommendations = np.argsort(-filtered_scores)[:num_recommendations]

    # Look up the names of the recommended categories
    recommended_categories = filtered_categories[top_recommendations]

    return recommended_categories
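One subtlety with a multiplicative penalty is that it only lowers a score when the score is positive. LightFM scores are unbounded and can be negative, in which case multiplying by 0.25 moves the score toward zero and ranks the category higher. The sketch below uses hypothetical scores to show the effect, along with one possible remedy: shifting the scores so the minimum is zero before scaling.

```python
import numpy as np

raw = np.array([-2.0, -1.0, -3.0])         # hypothetical scores; index 0 is "essential"
essential = np.array([True, False, False])

# Penalty applied directly: -2.0 becomes -0.5, so the essential category now ranks first
naive = raw.copy()
naive[essential] *= 0.25

# Shift so the lowest score is zero, then scale: the penalty lowers the score as intended
shifted = raw - raw.min()
shifted[essential] *= 0.25

print(naive)     # essential category has the highest value
print(shifted)   # non-essential index 1 stays on top
```

Other remedies are possible, such as subtracting a fixed amount instead of scaling; the shift is simply the smallest change that keeps the 0.25 factor meaningful.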

#### User Input and Recommendations

The final cell asks the user to enter their spending and income data. With that information, the recommendations are generated and printed at the end of the cell.

After collecting the user's input, we create a NumPy array, `new_user_interactions`, that holds the spending amounts in the order of `spending_categories`. The array is not passed to the model in the code below; the entered amounts affect the result only through the minimum spending threshold.

Finally, `new_user_id` selects which user the model scores. It is set to 1 for demonstration, which is the ID of an existing student in the training data, and the recommendations produced for that ID are printed.

In [None]:
# Collect the user's spending by category (including tuition) and their income sources
def get_user_input():
    user_spending = {}
    print("Enter your spending for the following categories (including tuition):")
    for category in spending_categories:
        user_spending[category] = float(input(f"Spending for {category}: "))

    # Gather income sources
    print("\nEnter your income sources:")
    income = float(input("Monthly Income: "))
    loans = float(input("Loans (monthly amount): "))
    parental_support = float(input("Money from Parents/Guardians (monthly amount): "))
    financial_aid = float(input("Financial aid (monthly amount): "))

    # Total income calculation
    total_income = income + loans + parental_support + financial_aid
    print(f"\nTotal available income: {total_income}")

    return user_spending, total_income

# Simulate user input
user_spending, total_income = get_user_input()

# Collect the spending amounts in spending_categories order (not passed to the model below)
new_user_interactions = np.array([user_spending[category] for category in spending_categories])

# Predict and recommend areas to reduce spending
new_user_id = 1  # ID of an existing student, used here for demonstration purposes
recommended_categories = recommend_spending_reduction(model, new_user_id, spending_categories, user_spending)

# Display the personalized recommendations
print("\nBased on your spending, you should consider reducing spending on:")
for category in recommended_categories:
    print(f"- {category}")

Enter your spending for the following categories (including tuition):
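The input cell converts each reply with `float(input(...))`, which raises an error on replies such as `1,200` or an empty line. A minimal sketch of a more forgiving prompt is shown below; `parse_amount` and `ask_amount` are hypothetical helpers, not part of the notebook, and the accepted formats (thousands separators, a leading dollar sign) are our assumptions about what a user might type.

```python
import math

def parse_amount(text):
    """Convert user-typed text such as "$1,200" or " 45.50 " to a non-negative float."""
    cleaned = text.strip().replace(",", "").lstrip("$")
    amount = float(cleaned)  # raises ValueError for empty or non-numeric text
    if not math.isfinite(amount) or amount < 0:
        raise ValueError("amount must be a finite, non-negative number")
    return amount

def ask_amount(prompt, read=input):
    """Keep asking until the reply parses; `read` can be replaced for testing."""
    while True:
        try:
            return parse_amount(read(prompt))
        except ValueError:
            print("Please enter a non-negative number, for example 250 or 250.75.")
```

With these helpers, a line such as `user_spending[category] = ask_amount(f"Spending for {category}: ")` would re-prompt instead of stopping the cell on a typo.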


#### Conclusion

Using LightFM, we built a recommender system that takes a user's estimated income and spending across a variety of categories and recommends which categories the user may need to cut back on, while giving less weight to categories that are hard to reduce. This addresses the research question we set out at the start of this assignment. A tool like this matters because it can help users change poor spending habits and end up in a better financial position than before, which could ultimately mean more financial freedom. The tool is aimed at college students: the data used for the model consists of typical college students' spending and income, and the tool accounts for tuition and financial aid. It targets the problem of poor financial literacy among college students by recommending where budget cuts should be made based on their own inputs.
