🎮 Steam Game Recommender System using Spark MLlib

📖 Project Overview

This project builds a collaborative filtering recommender system using Apache Spark MLlib on a dataset collected from the Steam gaming platform.
The system leverages implicit feedback data (purchases and playtime) to uncover latent characteristics of both users and games, generating personalized game recommendations.

Developed in Databricks Community Edition, the project demonstrates big data processing, distributed machine learning, and recommendation system design.

🗂️ Dataset

Source: Steam game interaction dataset.
Features:
- member_id → unique user ID
- game → game title
- behavior → purchase or play indicator
- value → implicit rating (hours played or purchase flag)
Data was cleaned, transformed, and formatted into a user–item–rating matrix for ALS.
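
As a rough illustration of that transformation, the plain-Python sketch below collapses raw interaction rows into one rating per user-game pair. The sample rows, the `to_ratings` helper, and the rule that hours played override the purchase flag are assumptions made for illustration, not the notebook's actual logic.

```python
# Hypothetical sample rows in the dataset's column order:
# (member_id, game, behavior, value)
rows = [
    (101, "Game X", "purchase", 1.0),
    (101, "Game X", "play", 12.5),
    (101, "Game Y", "purchase", 1.0),
    (102, "Game Y", "play", 3.0),
    (102, "Game Y", "purchase", 1.0),
]


def to_ratings(rows):
    """Return {(member_id, game): rating} with one entry per user-game pair."""
    ratings = {}
    for member_id, game, behavior, value in rows:
        key = (member_id, game)
        # Assumed rule: hours played override the purchase flag for the same pair.
        if behavior == "play" or key not in ratings:
            ratings[key] = float(value)
    return ratings


ratings = to_ratings(rows)
for (member_id, game), rating in sorted(ratings.items()):
    print(member_id, game, rating)
```

A purchase with no recorded playtime keeps its flag value of 1.0, so owned-but-unplayed games still count as weak positive signals under this assumed rule.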

🔑 Methodology

The project follows a standard machine learning pipeline:

Data Import & Preparation
- Load CSV into Spark DataFrame.
- Clean and preprocess data (remove nulls, rename columns, encode users/games).
- Split dataset into training and testing sets.
Model Training
- Implement Alternating Least Squares (ALS) from Spark MLlib.
- ALS is chosen because it scales to large, sparse user–game interaction data and supports implicit feedback.
Evaluation
- Use Root Mean Square Error (RMSE) to evaluate prediction accuracy.
- Hyperparameter tuning performed on:
  - Rank (latent factors)
  - Regularization parameter (λ)
  - Number of iterations
Recommendation Generation
- Top-N game recommendations generated for each user.
- Predictions compared against test data.

✨ Features

Big Data Handling: Uses Apache Spark for distributed processing of large datasets.
Collaborative Filtering: ALS algorithm captures hidden patterns between users and games.
Implicit Feedback Modeling: Works with playtime and purchase data instead of explicit ratings.
Hyperparameter Tuning: Tunes rank, regularization, and iteration count to minimize RMSE.
Personalized Recommendations: Produces tailored suggestions per user.
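
Spark's ALS exposes an implicit-feedback mode through the `implicitPrefs` and `alpha` parameters, which the example snippet in the Usage section does not set. That mode is based on the preference/confidence formulation of Hu, Koren, and Volinsky, where an interaction becomes a binary preference weighted by a confidence that grows with the observed value. The plain-Python sketch below illustrates only that weighting idea; the helper name, the `alpha` value, and the sample hours are assumptions, and this is not Spark's internal code.

```python
def preference_and_confidence(hours, alpha=1.0):
    """Map observed playtime to a binary preference and a confidence weight."""
    # Any positive playtime counts as "the user likes this game".
    preference = 1.0 if hours > 0 else 0.0
    # More hours means more confidence in that preference.
    confidence = 1.0 + alpha * hours
    return preference, confidence


for hours in [0.0, 0.5, 40.0]:
    p, c = preference_and_confidence(hours, alpha=0.5)
    print(f"hours={hours:>5} preference={p} confidence={c}")
```

Under this formulation the model predicts a preference strength rather than hours played, so RMSE against raw playtime is no longer a natural metric and ranking metrics become more appropriate.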

⚙️ Usage

Prerequisites

Databricks Community Edition OR local Spark setup.
Python 3.x with PySpark.

Running the Notebook

Import the provided notebook file into Databricks.
Attach to a Spark cluster.
Run cells sequentially:
- Data import and preprocessing
- Model training with ALS
- Evaluation and recommendation generation

Example Code Snippet

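from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS

# `training` and `test` are the DataFrames produced by the train/test split.
# ALS needs numeric IDs, so user_id and game_id are the integer-encoded
# versions of member_id and game.
als = ALS(
    userCol="user_id",
    itemCol="game_id",
    ratingCol="value",
    rank=10,                   # number of latent factors
    maxIter=10,                # number of ALS iterations
    regParam=0.1,              # regularization parameter (λ)
    coldStartStrategy="drop"   # drop NaN predictions for unseen users/games
)

model = als.fit(training)

# Evaluate on the held-out test set
predictions = model.transform(test)
evaluator = RegressionEvaluator(metricName="rmse", labelCol="value", predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print(f"Root-mean-square error = {rmse}")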
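
For reference, RMSE is the square root of the mean squared difference between actual and predicted values. The stdlib-only sketch below computes it by hand on made-up numbers; the `rmse` helper and the sample values are illustrative assumptions, not output from the model.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: errors are 2, -1, -2, so the mean squared error is 3.
actual = [12.5, 1.0, 3.0]
predicted = [10.5, 2.0, 5.0]
print(rmse(actual, predicted))
```

Because the errors are squared, a few users with very high playtime can dominate the score, which is worth keeping in mind when `value` holds raw hours.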

📊 Results

ALS successfully generated personalized recommendations for Steam users.
Model performance was measured by RMSE, and tuning the hyperparameters improved accuracy.
Example recommendation output:
- User A → Recommended games: Game X, Game Y, Game Z
- User B → Recommended games: Game P, Game Q

🔒 Data Privacy & Ethics

Dataset used for educational purposes only.
No personally identifiable information (PII) is included.
Recommendations are derived only from implicit behavioral signals (purchases and playtime), not from personal attributes.

✅ Conclusion

This project demonstrates how Spark MLlib’s ALS algorithm can be applied to real-world datasets to build a scalable recommender system.
It shows:

How to preprocess large datasets in Spark.
How to implement and tune ALS for collaborative filtering.
How to evaluate recommendations using RMSE.

Future Improvements:

Integrate content-based filtering (hybrid recommender).
Deploy as an API for real-time recommendations.
Extend evaluation with ranking metrics (Precision@K, MAP).
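
As a starting point for the ranking-metric extension, Precision@K can be computed per user as the share of the top K recommendations that the user actually interacted with in the test set. The helper and sample data below are a hypothetical sketch using the placeholder game names from the Results section.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical ranked recommendations and held-out games for one user.
recommended = ["Game X", "Game Y", "Game Z", "Game P"]
relevant = {"Game Y", "Game P", "Game Q"}

print(precision_at_k(recommended, relevant, 3))
print(precision_at_k(recommended, relevant, 4))
```

Averaging this value over all users gives a single score for the model. MAP builds on the same idea by also rewarding relevant games that appear earlier in the list.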

👨‍💻 Author

Ibinabo Orifama
Module: Big Data Tools & Techniques (BDTT) – Task 2
