<h1>TikTok Algorithm with Machine Learning</h1>

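The TikTok algorithm can be broken down into two types of recommendation algorithm, collaborative filtering and content-based filtering, which can ultimately be combined into a single ensemble-style algorithm. Collaborative filtering recommends items based on the behavior of users with similar tastes, while content-based filtering recommends items whose features resemble those a user has already engaged with.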
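One simple way to form such an ensemble is a weighted blend of the two models' scores. The sketch below is only an illustration under stated assumptions: the function names, the example scores, and the 0.6 weight are hypothetical, and nothing here is claimed to be how TikTok actually combines its signals.

```python
def min_max(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant list carries no ranking signal, so map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def blend_scores(cf_scores, cb_scores, weight=0.5):
    """Weighted average of collaborative and content-based scores.

    Both lists are rescaled first so that neither dominates purely
    because of its units (e.g. ratings in 1-5 vs. similarities in 0-1).
    """
    cf = min_max(cf_scores)
    cb = min_max(cb_scores)
    return [weight * a + (1.0 - weight) * b for a, b in zip(cf, cb)]


# Hypothetical scores for four candidate videos
cf_scores = [4.9, 2.0, 3.5, 1.0]  # e.g. predicted ratings from a collaborative model
cb_scores = [0.2, 0.9, 0.8, 0.1]  # e.g. similarity to videos the user already liked

hybrid = blend_scores(cf_scores, cb_scores, weight=0.6)

# Candidate indices ordered from best to worst blended score
ranking = sorted(range(len(hybrid)), key=lambda i: hybrid[i], reverse=True)
print(ranking)
```

The weight controls how much the blend trusts collaborative signals over content signals; in practice it would be tuned on held-out data rather than fixed by hand.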



<h2>Collaborative Filtering</h2>

In [1]:
from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator

# Initialize Spark session
spark = SparkSession.builder \
    .appName("ALSRecommendation") \
    .getOrCreate()

# Read the text file into an RDD
lines = spark.read.text("data.txt").rdd

# Assumed line format: userId::itemId::rating (the rating is stored in the "like" column)
parts = lines.map(lambda row: row.value.split("::"))
liked_data = parts.map(lambda p: Row(userId=int(p[0]), itemId=int(p[1]), like=float(p[2])))

# Create DataFrame
liked_df = spark.createDataFrame(liked_data)

# Split into training and test sets
(training, test) = liked_df.randomSplit([0.75, 0.25])

# Build the ALS model
als = ALS(
    maxIter=4,
    regParam=0.05,
    userCol="userId",
    itemCol="itemId",
    ratingCol="like",
    coldStartStrategy="drop"
)

# Train the model
model = als.fit(training)

# Make predictions
predictions = model.transform(test)

# Evaluate the model
evaluator = RegressionEvaluator(
    metricName="rmse",
    labelCol="like",
    predictionCol="prediction"
)

rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))

# Recommend top 20 items for each user
user_recommendations = model.recommendForAllUsers(20)
user_recommendations.show()


Root-mean-square error = 2.996586135029316
+------+--------------------+
|userId|     recommendations|
+------+--------------------+
|    10|[{106, 4.981884},...|
|     1|[{102, 4.9233437}...|
|     2|[{104, 4.4804144}...|
|     4|[{104, 1.994873},...|
|     5|[{101, 4.9790974}...|
|     6|[{104, 3.989746},...|
|     7|[{102, 5.024801},...|
|     8|[{106, 4.4915133}...|
|     9|[{101, 4.033677},...|
+------+--------------------+
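To make the idea behind alternating least squares more concrete, here is a minimal pure-Python sketch restricted to a single latent factor per user and item. It is not Spark's implementation, which uses a higher rank, a distributed solver, and its own regularization scheme; the function name `als_rank1` and the toy ratings are hypothetical and chosen only for illustration.

```python
def als_rank1(ratings, n_users, n_items, reg=0.05, iters=50):
    """Rank-1 ALS on (user, item, value) triples with 0-based ids.

    Each rating is approximated by u[user] * v[item]. With one side held
    fixed, the other side has a closed-form ridge-regression solution.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iters):
        # Fix the item factors and solve for every user factor.
        for i in range(n_users):
            seen = [(j, r) for (a, j, r) in ratings if a == i]
            num = sum(r * v[j] for j, r in seen)
            den = reg + sum(v[j] ** 2 for j, _ in seen)
            u[i] = num / den
        # Fix the user factors and solve for every item factor.
        for j in range(n_items):
            seen = [(i, r) for (i, b, r) in ratings if b == j]
            num = sum(r * u[i] for i, r in seen)
            den = reg + sum(u[i] ** 2 for i, _ in seen)
            v[j] = num / den
    return u, v


# Toy data generated from user factors [2, 1] and item factors [1, 2, 3];
# the rating of user 0 for item 2 (true value 6) is deliberately left out.
ratings = [(0, 0, 2.0), (0, 1, 4.0), (1, 0, 1.0), (1, 1, 2.0), (1, 2, 3.0)]
u, v = als_rank1(ratings, n_users=2, n_items=3)

# Predicted rating for the held-out (user 0, item 2) pair
held_out_prediction = u[0] * v[2]
print(held_out_prediction)
```

Because of the regularization term, the prediction for the held-out pair lands slightly below the true value rather than matching it exactly.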



<h2>Content-Based Filtering</h2>

In [6]:
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

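# Load the feature table; every column must be numeric
data = pd.read_csv("data.csv")

# Pairwise cosine similarity between the rows of `data`
alg = cosine_similarity(data)
alg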

array([[1.        , 0.99040455, 0.99901581, 0.99865319],
       [0.99040455, 1.        , 0.99555612, 0.98196475],
       [0.99901581, 0.99555612, 1.        , 0.99538834],
       [0.99865319, 0.98196475, 0.99538834, 1.        ]])
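A similarity matrix like this one becomes a recommender once it is used to look up the nearest neighbors of something a user already liked. The sketch below assumes that each row of `data.csv` describes one video, which the source does not state; the helper `most_similar` is a hypothetical name, and the matrix is copied from the output above.

```python
def most_similar(similarity, index, k=2):
    """Return the k row indices most similar to `index`, excluding itself."""
    scores = [(s, j) for j, s in enumerate(similarity[index]) if j != index]
    scores.sort(reverse=True)  # highest similarity first
    return [j for _, j in scores[:k]]


# Cosine similarities copied from the cell output above
similarity = [
    [1.0, 0.99040455, 0.99901581, 0.99865319],
    [0.99040455, 1.0, 0.99555612, 0.98196475],
    [0.99901581, 0.99555612, 1.0, 0.99538834],
    [0.99865319, 0.98196475, 0.99538834, 1.0],
]

# If a user liked row 0, the closest candidates are rows 2 and 3
print(most_similar(similarity, 0, k=2))
```

All values in this matrix are above 0.98, so the rows are nearly parallel and the ranking rests on very small differences; scaling or centering the features before computing similarities may give a more discriminating result.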