
## **PROJECT TITLE:** A COMPREHENSIVE APPROACH TO ADDRESS THE COLD START PROBLEM IN RECOMMENDER SYSTEMS

**Initial Results and Code**

**Submitted by:** Md Shamsul Arif Khan

**Student ID:** 501140715


**Supervisor Name:** Ceni Babaoglu

**Course Code:** CIND820

**Date of Submission:** November 15, 2023

**Importing necessary libraries and tools**

In [1]:
import pandas as pd
import numpy as np
from zipfile import ZipFile
import urllib.request
import os
from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse.linalg import svds
from scipy.sparse import csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer

**Loading the datasets**

In [2]:
# Link for the MovieLens small dataset
url = 'http://files.grouplens.org/datasets/movielens/ml-latest-small.zip'
file_name = 'movielens_small.zip'

# Download and extract datasets
if not os.path.exists(file_name):
    urllib.request.urlretrieve(url, file_name)
    with ZipFile(file_name, 'r') as zip_ref:
        zip_ref.extractall()

# Load datasets into Pandas DataFrames
movies = pd.read_csv('ml-latest-small/movies.csv')
ratings = pd.read_csv('ml-latest-small/ratings.csv')
tags = pd.read_csv('ml-latest-small/tags.csv')
links = pd.read_csv('ml-latest-small/links.csv')

**Processing the Datasets**

In [3]:
# Merge tags into movies DataFrame
# (left join: a movie with several tags appears once per tag, so titles can repeat)
movies = pd.merge(movies, tags, on='movieId', how='left')

# Merge links with movies DataFrame
movies = pd.merge(movies, links, on='movieId', how='left')

# Replace missing tags with empty strings and turn the '|' genre separator into spaces
movies['tag'] = movies['tag'].fillna('')
movies['genres'] = movies['genres'].str.replace('|', ' ', regex=False)

# Combine relevant information for movie features
movies['features'] = movies['genres'] + ' ' + movies['tag']

# Split data into training and test sets for collaborative filtering
train_data, test_data = train_test_split(ratings, test_size=0.2, random_state=42)

# Create a user-item matrix for collaborative filtering
train_user_item_matrix = train_data.pivot_table(index='userId', columns='movieId', values='rating').fillna(0)

# Convert the DataFrame into a sparse matrix
train_user_item_matrix_sparse = csr_matrix(train_user_item_matrix.values)
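
The left merge of `tags` into `movies` above yields one row per (movie, tag) pair, so a movie with several tags occupies several rows. A minimal sketch of one way to keep a single row per movie, assuming it is acceptable to join each movie's tags into one string before merging. The toy frames and the helper name `merge_tags_once` are illustrative and not part of the notebook:

```python
import pandas as pd

def merge_tags_once(movies_df, tags_df):
    """Return movies_df with one row per movie and all of its tags joined by spaces."""
    # Collapse the tags table to one row per movieId before merging.
    tags_per_movie = (
        tags_df.groupby('movieId')['tag']
        .agg(lambda s: ' '.join(s.astype(str)))
        .reset_index()
    )
    merged = movies_df.merge(tags_per_movie, on='movieId', how='left')
    merged['tag'] = merged['tag'].fillna('')  # movies without tags get an empty string
    return merged

# Toy data (hypothetical, not taken from MovieLens)
toy_movies = pd.DataFrame({
    'movieId': [1, 2, 3],
    'title': ['A', 'B', 'C'],
    'genres': ['Comedy', 'Drama', 'Action'],
})
toy_tags = pd.DataFrame({
    'movieId': [1, 1, 3],
    'tag': ['funny', 'pixar', 'explosions'],
})

toy_merged = merge_tags_once(toy_movies, toy_tags)
print(toy_merged)
```

With this shape, each title maps to exactly one row, which also keeps a later similarity matrix at one row per movie.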



**Building recommender system using Collaborative Filtering method**

In [4]:
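# Collaborative filtering using matrix factorization (truncated SVD)
# Unrated movies were filled with 0 above, so the factorization treats them as ratings of 0
num_factors = 50
U, sigma, Vt = svds(train_user_item_matrix_sparse, k=num_factors)
sigma = np.diag(sigma)  # svds returns the singular values as a 1-D array
predicted_ratings = np.dot(np.dot(U, sigma), Vt)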
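
As a small illustration of the factorization step, the sketch below applies `svds` to a toy ratings matrix and rebuilds a rank-`k` approximation from the three factors. The matrix values and the helper name `low_rank_approximation` are made up for the example. Note that `svds` needs `k` to be smaller than both matrix dimensions.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

def low_rank_approximation(matrix, k):
    """Rebuild a rank-k approximation of `matrix` from its truncated SVD."""
    U, sigma, Vt = svds(csr_matrix(matrix, dtype=float), k=k)
    return U @ np.diag(sigma) @ Vt

# Toy user-item matrix (hypothetical ratings, 0 = not rated)
toy_ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 0, 0, 1],
    [0, 0, 5, 4, 0],
    [1, 0, 4, 5, 0],
], dtype=float)

approx_k1 = low_rank_approximation(toy_ratings, k=1)
approx_k3 = low_rank_approximation(toy_ratings, k=3)

# Keeping more factors can only reduce the reconstruction error.
error_k1 = np.linalg.norm(toy_ratings - approx_k1)
error_k3 = np.linalg.norm(toy_ratings - approx_k3)
print(f"Frobenius error with k=1: {error_k1:.3f}, with k=3: {error_k3:.3f}")
```

The number of factors trades reconstruction accuracy against generalization, so a value such as 50 is a tuning choice rather than a fixed rule.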

In [5]:
# Function to recommend movies based on collaborative filtering
def collaborative_filtering_recommendations(user_id, predicted_ratings, num_recommendations=10):
    # Row lookup assumes user IDs are contiguous and start at 1
    user_ratings = predicted_ratings[user_id - 1]
    sorted_indices = user_ratings.argsort()[::-1]
    # Movies the user has already rated in the training data
    user_seen_movies = set(train_user_item_matrix.columns[train_user_item_matrix.loc[user_id].gt(0)])

    recommended_movies = []
    for idx in sorted_indices:
        # Assumes column position idx corresponds to movieId idx + 1, which holds
        # only when movie IDs are contiguous and start at 1;
        # train_user_item_matrix.columns[idx] is the movieId stored in that column
        movie_id = idx + 1
        if movie_id not in user_seen_movies:
            movie_info = movies[movies['movieId'] == movie_id]['title'].values
            if len(movie_info) > 0:
                movie_title = movie_info[0]
                recommended_movies.append((movie_title, user_ratings[idx]))
                if len(recommended_movies) >= num_recommendations:
                    break

    return recommended_movies

In [6]:
# Collaborative Filtering Example
user_id_collab = 1
collab_recommended_movies = collaborative_filtering_recommendations(user_id_collab, predicted_ratings)
print(f"Collaborative Filtering Recommendations for User {user_id_collab}:")
for idx, (movie, rating) in enumerate(collab_recommended_movies, start=1):
    print(f"{idx}. {movie} (Predicted Rating: {rating})")


Collaborative Filtering Recommendations for User 1:
1. That Darn Cat (1997) (Predicted Rating: 5.015913227046896)
2. Muppet Christmas Carol, The (1992) (Predicted Rating: 4.870666524016725)
3. Perfect World, A (1993) (Predicted Rating: 4.757768102863207)
4. Fear and Loathing in Las Vegas (1998) (Predicted Rating: 4.679545505282457)
5. Inspector General, The (1949) (Predicted Rating: 4.659823207461228)
6. Interview with the Vampire: The Vampire Chronicles (1994) (Predicted Rating: 4.458921745950045)
7. Wild Reeds (Les roseaux sauvages) (1994) (Predicted Rating: 4.231743713145592)
8. 8 Seconds (1994) (Predicted Rating: 3.97254336381257)
9. American Buffalo (1996) (Predicted Rating: 3.9537390735075695)
10. Crow: City of Angels, The (1996) (Predicted Rating: 3.6317791961621655)
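
The function above converts a column position to a movie ID with `idx + 1`, which is only correct when movie IDs are contiguous and start at 1. A hedged sketch of an alternative lookup follows, assuming the prediction array keeps the same row and column order as the pivot table. The toy data and the helper name `top_unseen_movie_ids` are illustrative only:

```python
import numpy as np
import pandas as pd

def top_unseen_movie_ids(user_id, user_item, predictions, n=2):
    """Return (movieId, score) pairs for the n best-scoring movies the user has not rated."""
    row = user_item.index.get_loc(user_id)           # row position of this user
    scores = predictions[row]
    seen = set(user_item.columns[user_item.loc[user_id] > 0])
    ranked = []
    for col in np.argsort(scores)[::-1]:             # best score first
        movie_id = user_item.columns[col]            # column position -> stored movieId
        if movie_id not in seen:
            ranked.append((int(movie_id), float(scores[col])))
        if len(ranked) == n:
            break
    return ranked

# Toy pivot table with non-contiguous user and movie IDs (hypothetical values)
toy_user_item = pd.DataFrame(
    [[5.0, 0.0, 0.0, 3.0],
     [0.0, 4.0, 0.0, 0.0]],
    index=pd.Index([10, 20], name='userId'),
    columns=pd.Index([1, 47, 260, 1196], name='movieId'),
)
toy_predictions = np.array([
    [4.8, 3.9, 4.5, 3.1],
    [2.0, 4.1, 1.5, 3.3],
])

print(top_unseen_movie_ids(10, toy_user_item, toy_predictions))
# → [(260, 4.5), (47, 3.9)]
```

Looking IDs up through the pivot table's `index` and `columns` keeps the result valid even when some users or movies are absent from the training split.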


**Building recommender system using Content-based Filtering method**

In [7]:
# Compute the similarity matrix using cosine similarity for content-based filtering
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies['features'].values.astype('U'))
item_similarity = cosine_similarity(tfidf_matrix, tfidf_matrix)

# Function to recommend movies based on content-based filtering
def content_based_recommendations(movie_title, similarity_matrix, num_recommendations=10):
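    # Use the first row whose title matches (a title can occupy several rows after the tags merge)
    movie_index = movies[movies['title'] == movie_title].index.values[0]
    similar_scores = similarity_matrix[movie_index]
    # Skip only the single top-scoring row; other rows of the same movie are not filtered out
    similar_movies_indices = similar_scores.argsort()[::-1][1:]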
    similar_movies = movies.iloc[similar_movies_indices]
    return similar_movies[['title', 'genres', 'imdbId']]


# Content-Based Filtering Example
movie_title_content = 'Toy Story (1995)'
content_recommended_movies = content_based_recommendations(movie_title_content, item_similarity)
print("\nContent-Based Filtering Recommendations:")
print(content_recommended_movies.head(10))




Content-Based Filtering Recommendations:
                                                   title  \
1                                       Toy Story (1995)   
3214                                  Toy Story 2 (1999)   
3217                                  Toy Story 2 (1999)   
2484                                Bug's Life, A (1998)   
8672                                           Up (2009)   
4633                               Monsters, Inc. (2001)   
11499                                       Moana (2016)   
3966                    Emperor's New Groove, The (2000)   
9544   Asterix and the Vikings (Astérix et les Viking...   
10948                           The Good Dinosaur (2015)   

                                            genres   imdbId  
1      Adventure Animation Children Comedy Fantasy   114709  
3214   Adventure Animation Children Comedy Fantasy   120363  
3217   Adventure Animation Children Comedy Fantasy   120363  
2484           Adventure Animation Children Comed
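
To illustrate the content-based step on data where each movie occupies exactly one row, here is a minimal sketch with a hypothetical four-movie catalogue. The helper name `similar_titles` and the feature strings are assumptions made for the example. The query movie is excluded by its index rather than by assuming it sorts first.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical catalogue with a default RangeIndex (label == row position)
toy_catalogue = pd.DataFrame({
    'title': ['Film A', 'Film B', 'Film C', 'Film D'],
    'features': [
        'animation children comedy',
        'animation children adventure',
        'horror thriller',
        'comedy romance',
    ],
})

toy_tfidf = TfidfVectorizer().fit_transform(toy_catalogue['features'])
toy_similarity = cosine_similarity(toy_tfidf)

def similar_titles(title, catalogue, similarity, n=2):
    """Return the n titles most similar to `title`, excluding the title itself."""
    query = catalogue.index[catalogue['title'] == title][0]
    order = similarity[query].argsort()[::-1]        # most similar first
    order = [i for i in order if i != query][:n]     # drop the query row explicitly
    return catalogue['title'].iloc[order].tolist()

print(similar_titles('Film A', toy_catalogue, toy_similarity))
# → ['Film B', 'Film D']
```

Film B shares two feature terms with Film A and Film D shares one, which is why they rank in that order.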

**Building recommender system using Hybrid Filtering method**

In [8]:
# Function for hybrid recommendations combining collaborative and content-based approaches
# (this definition is replaced by the one in the next cell before it is first called)
def hybrid_recommendations(user_id, movie_title, num_recommendations=10):
    collab_recommended = collaborative_filtering_recommendations(user_id, predicted_ratings)
    content_recommended = content_based_recommendations(movie_title, item_similarity)

    # Merge recommendations from both models
    hybrid_recommendations = []
    collab_titles = [title for title, _ in collab_recommended]
    for idx, (title, _) in enumerate(collab_recommended):
        if title not in collab_titles:
            hybrid_recommendations.append((title, idx+1))

    content_titles = [title for title in content_recommended['title']]
    for idx, (title, _) in enumerate(content_recommended.values):
        if title not in content_titles:
            hybrid_recommendations.append((title, idx+1))

    hybrid_recommendations = sorted(hybrid_recommendations, key=lambda x: x[1])
    return [movie[0] for movie in hybrid_recommendations[:num_recommendations]]



In [9]:
# Function for hybrid recommendations combining collaborative and content-based approaches
def hybrid_recommendations(user_id, movie_title, num_recommendations=10):
    collab_recommended = collaborative_filtering_recommendations(user_id, predicted_ratings)
    content_recommended = content_based_recommendations(movie_title, item_similarity)

    hybrid_recommendations = []
    collab_titles = [title for title, _ in collab_recommended]
    # Every title in this loop comes from collab_titles, so the condition is never
    # true and no collaborative recommendation is added to the result
    for idx, (title, _) in enumerate(collab_recommended):
        if title not in collab_titles:
            hybrid_recommendations.append((title, idx+1))

    # Add content-based titles that the collaborative model did not suggest,
    # ranked by their position in the content-based list
    content_titles = [title for title in content_recommended['title']]
    for rank, title in enumerate(content_titles, start=1):
        if title not in collab_titles:
            hybrid_recommendations.append((title, rank))

    hybrid_recommendations = sorted(hybrid_recommendations, key=lambda x: x[1])
    return [movie[0] for movie in hybrid_recommendations[:num_recommendations]]



# Hybrid Filtering Example
user_id_hybrid = 1
hybrid_recommended_movies = hybrid_recommendations(user_id_hybrid, movie_title_content)
print(f"\nHybrid Recommendations for User {user_id_hybrid} based on '{movie_title_content}':")
for idx, movie in enumerate(hybrid_recommended_movies, start=1):
    print(f"{idx}. {movie}")


Hybrid Recommendations for User 1 based on 'Toy Story (1995)':
1. Toy Story (1995)
2. Toy Story 2 (1999)
3. Toy Story 2 (1999)
4. Bug's Life, A (1998)
5. Up (2009)
6. Monsters, Inc. (2001)
7. Moana (2016)
8. Emperor's New Groove, The (2000)
9. Asterix and the Vikings (Astérix et les Vikings) (2006)
10. The Good Dinosaur (2015)
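
The hybrid output above coincides with the content-based list. One possible way to blend two ranked lists, shown only as a sketch and not as the notebook's method, is a round-robin merge that alternates between the sources and skips duplicates. The function name `interleave_rankings` and the sample titles are hypothetical.

```python
from itertools import zip_longest

def interleave_rankings(first, second, n=10):
    """Alternate between two ranked lists, keeping order and dropping duplicates."""
    merged, seen = [], set()
    for pair in zip_longest(first, second):   # pads the shorter list with None
        for title in pair:
            if title is not None and title not in seen:
                seen.add(title)
                merged.append(title)
    return merged[:n]

# Hypothetical ranked outputs of the two recommenders
collab_toy = ['Movie A', 'Movie B', 'Movie C']
content_toy = ['Movie B', 'Movie D']

print(interleave_rankings(collab_toy, content_toy, n=4))
# → ['Movie A', 'Movie B', 'Movie D', 'Movie C']
```

A round-robin merge gives both sources equal weight. For a cold-start user with no ratings, the first list could simply be empty, in which case the result falls back to the content-based ranking.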


**Creating output for the initial codes and results**

In [11]:
# Import the tools
import nbconvert

# To upload the Initial Codes and results file ( "Initial Codes and results.ipynb" ) from PC
from google.colab import files
uploaded = files.upload()

# To generate the pdf file (the path contains spaces, so it must be quoted)
!jupyter nbconvert --to pdf "/content/Initial Codes and results.ipynb"