In [0]:
# What is the ALS Model?
ALS (Alternating Least Squares) is a popular algorithm used in recommendation systems, particularly for collaborative filtering. It predicts user preferences for items (e.g., movies, products) based on historical user-item interactions (e.g., ratings, purchases). The ALS model is efficient and scalable, making it suitable for large datasets.

# How ALS Works
ALS is based on matrix factorization, where the interaction matrix is decomposed into two smaller matrices:

User Matrix (𝑈): Represents the preferences of each user for latent features.
Item Matrix (𝑉): Represents the properties of each item based on latent features.
The goal is to approximate the interaction matrix 𝑅 (e.g., user-item ratings) as $R \approx U V^{T}$.

# Step-by-Step Process
# 1. Input Data
A sparse matrix R is created, where rows represent users, columns represent items, and the values represent interactions (e.g., ratings or clicks).
Missing values indicate no interaction between a user and an item.
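
As a rough illustration of this step, the sketch below builds a small dense stand-in for 𝑅 from the same (user_id, product_id, rating) triples used as training data in the PySpark example. The names `user_index`, `item_index`, and `observed` are assumptions made for this sketch, and NaN is used to mark missing interactions; a production system would use a sparse format instead.

```python
import numpy as np

# Training triples (user_id, product_id, rating) from the PySpark example.
triples = [
    (1, 101, 4.0), (1, 102, 5.0), (1, 103, 3.0),
    (2, 101, 2.0), (2, 102, 4.0), (3, 103, 5.0),
    (3, 104, 3.0), (4, 101, 3.5), (4, 104, 4.5),
]

# Map raw IDs to contiguous row and column indices.
user_ids = sorted({u for u, _, _ in triples})
item_ids = sorted({i for _, i, _ in triples})
user_index = {u: k for k, u in enumerate(user_ids)}
item_index = {i: k for k, i in enumerate(item_ids)}

# NaN marks "no interaction" between a user and an item.
R = np.full((len(user_ids), len(item_ids)), np.nan)
for u, i, r in triples:
    R[user_index[u], item_index[i]] = r

observed = ~np.isnan(R)  # boolean mask of the known interactions
print(R)
```

Only 9 of the 16 cells are filled here; real interaction matrices are usually far sparser.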

# 2. Matrix Factorization
ALS factorizes R into two matrices:
𝑈 (Users × Features): User preferences.
𝑉 (Items × Features): Item attributes.
The latent features capture relationships between users and items.
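
To make the shapes concrete, here is a minimal NumPy sketch with randomly initialized factors. The sizes (4 users, 4 items, 2 latent features) and the random values are assumptions chosen only for illustration, not learned factors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_features = 4, 4, 2  # illustrative sizes

U = rng.normal(size=(n_users, n_features))  # one row of latent preferences per user
V = rng.normal(size=(n_items, n_features))  # one row of latent attributes per item

R_hat = U @ V.T  # approximation of R, shape (n_users, n_items)

# A single predicted entry is the dot product of a user row and an item row.
single = U[1] @ V[2]
print(R_hat.shape, np.isclose(R_hat[1, 2], single))
```

With far fewer features than users or items, the two factor matrices hold many fewer numbers than the full interaction matrix.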

# 3. Alternating Optimization
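ALS alternates between two steps: fix 𝑈 and solve for 𝑉, then fix 𝑉 and solve for 𝑈.
With one matrix held fixed, each step is an ordinary regularized least-squares problem that minimizes the squared error between the predicted and actual ratings over the observed entries:

$$\min_{U,V} \sum_{(u,i)\ \text{observed}} \left(r_{ui} - \mathbf{u}_u^{T}\mathbf{v}_i\right)^2 + \lambda\left(\sum_u \lVert \mathbf{u}_u \rVert^2 + \sum_i \lVert \mathbf{v}_i \rVert^2\right)$$

Here $\mathbf{u}_u$ and $\mathbf{v}_i$ are the rows of 𝑈 and 𝑉 for user $u$ and item $i$, and $\lambda$ is the regularization strength.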
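
The alternating updates can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not Spark's implementation: it uses a plain L2 penalty, a dense matrix with NaN for missing entries, and the hypothetical helper names `als` and `objective`. The ratings are the training triples from the PySpark example.

```python
import numpy as np

def objective(R, observed, U, V, reg):
    """Regularized squared error over the observed entries only."""
    err = (R - U @ V.T)[observed]
    return float(err @ err + reg * ((U ** 2).sum() + (V ** 2).sum()))

def als(R, observed, n_features=2, reg=0.1, n_iters=10, seed=0):
    """Alternate closed-form ridge solves for user rows and item rows."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, n_features))
    V = rng.normal(scale=0.1, size=(n_items, n_features))
    ridge = reg * np.eye(n_features)
    for _ in range(n_iters):
        # Fix V and solve a small least-squares problem for each user.
        for u in range(n_users):
            idx = observed[u]
            if idx.any():
                Vu = V[idx]
                U[u] = np.linalg.solve(Vu.T @ Vu + ridge, Vu.T @ R[u, idx])
        # Fix U and solve a small least-squares problem for each item.
        for i in range(n_items):
            idx = observed[:, i]
            if idx.any():
                Ui = U[idx]
                V[i] = np.linalg.solve(Ui.T @ Ui + ridge, Ui.T @ R[idx, i])
    return U, V

# Rows are users 1-4, columns are products 101-104; NaN means no rating.
R = np.array([
    [4.0, 5.0, 3.0, np.nan],
    [2.0, 4.0, np.nan, np.nan],
    [np.nan, np.nan, 5.0, 3.0],
    [3.5, np.nan, np.nan, 4.5],
])
observed = ~np.isnan(R)
U, V = als(R, observed)
print(np.round(U @ V.T, 2))
```

Because each half-step solves its subproblem exactly, the objective cannot increase from one iteration to the next, which is why a fixed, modest number of iterations is often enough.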

# 4. Cold Start Handling
For new users or items with no data, ALS cannot make predictions directly. Strategies like default recommendations or content-based filtering are often combined with ALS.
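
One simple default-recommendation strategy is to fall back to the most highly rated items when a user has no learned factors. The sketch below assumes that approach; `popular_items`, `recommend_with_fallback`, and the `model_recs` dictionary are hypothetical names for illustration, and the ratings are the training triples from the PySpark example.

```python
from collections import defaultdict

def popular_items(ratings, top_n=3):
    """Rank items by mean rating, breaking ties by item ID."""
    totals = defaultdict(lambda: [0.0, 0])  # item -> [rating sum, rating count]
    for _, item, rating in ratings:
        totals[item][0] += rating
        totals[item][1] += 1
    means = {item: s / n for item, (s, n) in totals.items()}
    return sorted(means, key=lambda item: (-means[item], item))[:top_n]

def recommend_with_fallback(user_id, model_recs, ratings, top_n=3):
    """Use model recommendations when available, else popular items."""
    if user_id in model_recs:
        return model_recs[user_id][:top_n]
    return popular_items(ratings, top_n)

train_data = [
    (1, 101, 4.0), (1, 102, 5.0), (1, 103, 3.0),
    (2, 101, 2.0), (2, 102, 4.0), (3, 103, 5.0),
    (3, 104, 3.0), (4, 101, 3.5), (4, 104, 4.5),
]
model_recs = {1: [102, 101, 104]}  # stand-in for recommendations from a trained model

print(recommend_with_fallback(1, model_recs, train_data))   # known user
print(recommend_with_fallback(99, model_recs, train_data))  # new user, falls back
```

Mean rating is a crude popularity signal on so few ratings; a real fallback would likely also weigh the number of ratings per item.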

# 5. Predictions
After convergence, the model predicts the missing values in 𝑅 by computing $U V^{T}$.
Recommendations are generated by selecting, for each user, the items with the highest predicted values.
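
The ranking step can be sketched as follows. The helper name `top_n_items` and the small prediction matrix are made up for illustration; the sketch assumes that items a user has already interacted with should be excluded from that user's list.

```python
import numpy as np

def top_n_items(R_hat, observed, item_ids, n=2):
    """Return, per user, up to n unseen item IDs ranked by predicted value."""
    scores = np.where(observed, -np.inf, R_hat)  # mask items already interacted with
    order = np.argsort(-scores, axis=1)          # best-scoring columns first
    return [
        [item_ids[j] for j in row[:n] if np.isfinite(scores[u, j])]
        for u, row in enumerate(order)
    ]

# Made-up predictions for 2 users and 3 items, for illustration only.
R_hat = np.array([[1.0, 5.0, 3.0],
                  [4.0, 2.0, 6.0]])
observed = np.array([[False, True, False],
                     [False, False, True]])

recs = top_n_items(R_hat, observed, item_ids=[101, 102, 103])
print(recs)
```

The first user's highest score belongs to an item they already rated, so it is skipped and the next-best unseen items are returned.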

# Advantages of ALS
Scalability: Efficiently handles large, sparse datasets.
Parallelizable: Well-suited for distributed computing frameworks like Apache Spark.
Regularization: Reduces overfitting by adding a penalty for large values in 𝑈 and 𝑉.
Flexibility: Works with implicit and explicit feedback.


In [0]:
# Recommend Whey Protein Products
We'll recommend whey protein products using a small sample dataset of user ratings.

# Step-by-Step Example with PySpark Code
# 1. Sample Dataset
The dataset holds user ratings for various whey protein products. The rows below illustrate the format; the training and test sets used by the code are defined in the next cell.

| user_id | product_id | rating |
|---------|------------|--------|
| 1       | 101        | 4.5    |
| 1       | 102        | 3.0    |
| 2       | 101        | 5.0    |
| 2       | 103        | 2.0    |
| 3       | 102        | 4.0    |
| 3       | 104        | 3.5    |

In [0]:
from pyspark.ml.recommendation import ALS
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, FloatType

# Reuse the notebook's Spark session, or create one when running standalone
spark = SparkSession.builder.getOrCreate()

# Define schema
schema = StructType([
    StructField("user_id", IntegerType(), True),
    StructField("product_id", IntegerType(), True),
    StructField("rating", FloatType(), True)
])

# Training Data
train_data = [
    (1, 101, 4.0), (1, 102, 5.0), (1, 103, 3.0),
    (2, 101, 2.0), (2, 102, 4.0), (3, 103, 5.0),
    (3, 104, 3.0), (4, 101, 3.5), (4, 104, 4.5)
]
train_df = spark.createDataFrame(train_data, schema)

# Test Data
test_data = [
    (1, 104, 4.0), (2, 103, 3.5), (3, 102, 4.5), (4, 102, 3.0)
]
test_df = spark.createDataFrame(test_data, schema)


In [0]:
# Step 4: Build ALS Model
# ALS (Alternating Least Squares) is a collaborative filtering algorithm for recommendation systems.
als = ALS(
    maxIter=10,                # maximum number of ALS iterations
    regParam=0.1,              # regularization strength: trades training error against overfitting
    userCol="user_id",         # column holding the unique user IDs
    itemCol="product_id",      # column holding the unique item (product) IDs
    ratingCol="rating",        # column holding the user ratings
    coldStartStrategy="drop"   # drop rows with NaN predictions (users or items unseen in training)
)

# Train the ALS model on train_df, the DataFrame of user-item interactions
# with columns user_id, product_id, and rating.
model = als.fit(train_df)


In [0]:
# Step 5: Make Predictions
predictions = model.transform(test_df)

# Show predictions
predictions.show()


+-------+----------+------+----------+
|user_id|product_id|rating|prediction|
+-------+----------+------+----------+
|      1|       104|   4.0| 3.3963487|
|      2|       103|   3.5| 1.7770505|
|      3|       102|   4.5| 3.1851737|
|      4|       102|   3.0| 3.0493019|
+-------+----------+------+----------+
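
To put a number on how close these predictions are, the root mean squared error (RMSE) can be computed by hand from the four rows above. This is a plain-Python sketch using the values copied from the table, not part of the original notebook.

```python
import math

# (actual rating, predicted rating) pairs copied from the prediction table above.
pairs = [
    (4.0, 3.3963487),
    (3.5, 1.7770505),
    (4.5, 3.1851737),
    (3.0, 3.0493019),
]

squared_errors = [(actual - predicted) ** 2 for actual, predicted in pairs]
rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
print(f"RMSE = {rmse:.3f}")
```

The error comes out a little above one rating point, which is unsurprising for a model trained on nine ratings. In Spark itself the same metric would typically come from `RegressionEvaluator` in `pyspark.ml.evaluation` with `metricName="rmse"`, `labelCol="rating"`, and `predictionCol="prediction"`.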



In [0]:
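# Step 6: Recommend Top 3 Products for Each User
# Note: the lists can include products a user has already rated (e.g., user 1 with products 102 and 101).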
user_recommendations = model.recommendForAllUsers(3)
user_recommendations.show(truncate=False)


+-------+-----------------------------------------------------+
|user_id|recommendations                                      |
+-------+-----------------------------------------------------+
|1      |[{102, 4.937282}, {101, 3.8121293}, {104, 3.3963487}]|
|2      |[{102, 3.860107}, {101, 2.076191}, {103, 1.7770505}] |
|3      |[{103, 4.853014}, {102, 3.1851737}, {104, 3.0128176}]|
|4      |[{104, 4.347753}, {103, 3.5297143}, {101, 3.4983032}]|
+-------+-----------------------------------------------------+



In [0]:
# Step 7: Recommend Top 3 Users for Each Product (Optional)
product_recommendations = model.recommendForAllItems(3)
product_recommendations.show(truncate=False)

+----------+------------------------------------------------+
|product_id|recommendations                                 |
+----------+------------------------------------------------+
|101       |[{1, 3.8121293}, {4, 3.4983032}, {3, 2.5929391}]|
|102       |[{1, 4.937282}, {2, 3.860107}, {3, 3.1851737}]  |
|103       |[{3, 4.853014}, {4, 3.5297143}, {1, 3.0178907}] |
|104       |[{4, 4.347753}, {1, 3.3963487}, {3, 3.0128176}] |
+----------+------------------------------------------------+

