## Question 4
### Answer
One potential improvement is a hybrid recommendation model that combines collaborative filtering with content-based signals. Given item features and an open-source LLM, we can encode each item's metadata (descriptions, categories, and tags) into dense semantic embeddings. Combining these embeddings with historical user-item interactions lets the model fall back on content similarity when interaction data is sparse, which should improve relevance in cold-start scenarios and for items with limited interaction history. The approach improves overall personalization and puts the newly available infrastructure and tools to use.
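
As a rough illustration of the hybrid idea, the sketch below blends an item-item collaborative-filtering similarity matrix with cosine similarity over content embeddings. Everything in it is an assumption made for illustration: the function names `cosine_similarity_matrix` and `hybrid_similarity`, the smoothing constant `k`, the weighting rule `n / (n + k)`, and the toy vectors that stand in for LLM-generated embeddings. It shows one possible blending scheme, not the method implemented in the `cf` or `mf` modules.

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def hybrid_similarity(cf_sim, content_emb, interaction_counts, k=5.0):
    """Blend CF and content similarity, row by row.

    Row i uses CF weight w_i = n_i / (n_i + k), where n_i is the number of
    interactions for item i. Items with few interactions therefore lean on
    content similarity. `k` is a hypothetical smoothing constant.
    """
    cf_sim = np.asarray(cf_sim, dtype=float)
    counts = np.asarray(interaction_counts, dtype=float)
    content_sim = cosine_similarity_matrix(np.asarray(content_emb, dtype=float))
    w = (counts / (counts + k))[:, None]  # column vector, one weight per row
    return w * cf_sim + (1.0 - w) * content_sim


# Toy data (assumed): item 2 is a cold-start item with no interactions,
# so its CF similarities are all zero.
cf_sim = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 0.0],
])
# Stand-ins for LLM embeddings of item metadata; item 2 resembles item 0.
content_emb = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 0.1],
])
interaction_counts = np.array([95, 45, 0])

content_sim = cosine_similarity_matrix(content_emb)
hybrid = hybrid_similarity(cf_sim, content_emb, interaction_counts)
print(np.round(hybrid, 3))
```

In this toy setup item 2 has no interactions, so its row falls back entirely to content similarity and ranks item 0 as its nearest neighbour. Because the weight depends on the query item, the blended matrix is not symmetric.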

In [6]:
import pandas as pd
import os
from cf import ItemBasedCF
from mf import MFRecommender

In [7]:
def load_data(data_path='ml-latest-small'):
    """
    Load MovieLens dataset
    
    Parameters:
        data_path: Dataset path
        
    Returns:
        ratings_df: Rating data
        movies_df: Movie data
    """
    ratings_file = os.path.join(data_path, 'ratings.csv')
    movies_file = os.path.join(data_path, 'movies.csv')
    
    ratings_df = pd.read_csv(ratings_file)
    movies_df = pd.read_csv(movies_file)
    
    return ratings_df, movies_df
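
The model results later in this notebook are printed as raw `movieId` values, and `movies_df` is otherwise used only for its row count. A small helper along the following lines could translate IDs into titles. The name `ids_to_titles` and the three-row frame are hypothetical; the frame is a toy stand-in for `movies.csv`, which in MovieLens has `movieId`, `title`, and `genres` columns.

```python
import pandas as pd


def ids_to_titles(movie_ids, movies_df):
    """Map movieId values to titles, preserving the input order."""
    titles = movies_df.set_index("movieId")["title"]
    # int() also normalizes NumPy integer scalars such as np.int64
    return [titles.get(int(m), f"<unknown {int(m)}>") for m in movie_ids]


# Toy stand-in for movies_df (assumed subset of movies.csv)
toy_movies_df = pd.DataFrame({
    "movieId": [260, 1196, 1210],
    "title": [
        "Star Wars: Episode IV - A New Hope (1977)",
        "Star Wars: Episode V - The Empire Strikes Back (1980)",
        "Star Wars: Episode VI - Return of the Jedi (1983)",
    ],
})

looked_up = ids_to_titles([1196, 1210, 9999], toy_movies_df)
print(looked_up)
```

With the real data, the same call would take `movies_df` as returned by `load_data()` and a list such as the output of `get_similar_movies`.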

In [8]:
def item_based_cf_results(ratings_df):
    """
    Run the item-based collaborative filtering model and print its results

    Parameters:
        ratings_df: Rating data

    Returns:
        cf_model: Fitted ItemBasedCF model
    """
    print("Training item-based collaborative filtering model...")
    cf_model = ItemBasedCF()
    cf_model.fit(ratings_df)
    
    # Find the 10 most similar movies for the given movie IDs
    target_movies = [260, 1407, 4993]
    
    print("\nItem-based collaborative filtering model results:")
    for movie_id in target_movies:
        similar_movies = cf_model.get_similar_movies(movie_id, top_n=10)
        print(f"10 most similar movies to movie {movie_id}: {similar_movies}")
    
    return cf_model

In [9]:
def matrix_factorization_results(ratings_df):
    """
    Run the matrix factorization model and print its results

    Parameters:
        ratings_df: Rating data

    Returns:
        mf_model: Fitted MFRecommender model
    """
    print("\nTraining matrix factorization model...")
    mf_model = MFRecommender(n_factors=20, n_epochs=5)  # Reduce training epochs to speed up demonstration
    mf_model.fit(ratings_df)
    
    # Recommend top 10 movies for given users
    target_users = [1, 2, 3]
    
    print("\nMatrix factorization model results:")
    for user_id in target_users:
        recommended_movies = mf_model.recommend_for_user(user_id, top_n=10)
        print(f"10 movies recommended for user {user_id}: {recommended_movies}")
    
    return mf_model


In [10]:
# Load data
ratings_df, movies_df = load_data()
print(f"Loaded {len(ratings_df)} rating records and {len(movies_df)} movies")

# Run item-based collaborative filtering model
cf_model = item_based_cf_results(ratings_df)

# Run matrix factorization model
mf_model = matrix_factorization_results(ratings_df)

print("\nTask completed!")

Loaded 100836 rating records and 9742 movies
Training item-based collaborative filtering model...

Item-based collaborative filtering model results:
10 most similar movies to movie 260: [1196, 1210, 1198, 2571, 1291, 1270, 2628, 1240, 858, 2028]
10 most similar movies to movie 1407: [1717, 2710, 1387, 1573, 2115, 3499, 1517, 2502, 1994, 1393]
10 most similar movies to movie 4993: [7153, 5952, 6539, 2571, 4306, 2959, 4226, 5349, 3578, 33794]

Training matrix factorization model...
Epoch 1/5, Loss: 0.9811
Epoch 2/5, Loss: 0.9481
Epoch 3/5, Loss: 0.9315
Epoch 4/5, Loss: 0.9267
Epoch 5/5, Loss: 0.9265

Matrix factorization model results:
10 movies recommended for user 1: [np.int64(6818), np.int64(53), np.int64(26810), np.int64(6835), np.int64(5746), np.int64(93022), np.int64(40491), np.int64(136503), np.int64(136834), np.int64(6442)]
10 movies recommended for user 2: [np.int64(6818), np.int64(93008), np.int64(142444), np.int64(53), np.int64(107771), np.int64(118894), np.int64(6835), np.int