MovieRecommenderSystem

An automatic movie recommendation system project

Overview

This project was started for the course "Advanced Techniques for Data Analysis" of the Master's Degree in Engineering and Computer Science at the University of Messina.

TODO

Log

2023/01/21 11:50 - Created the project
2023/01/21 11:55 - Created section "Main approaches for creating recommender systems"
2023/01/21 12:15 - Created sections "Technologies" and "Datasets"
2023/01/21 12:20 - Created section "References"
2023/01/21 12:50 - Created section "Useful links"
2023/01/21 13:20 - Created section "Problem formulation"
2023/01/21 13:50 - Added more sections
2023/01/21 14:40 - Added more sections
2023/01/21 20:30 - Added React, Flask, MongoDB, Docker Compose inside "app" folder
2023/01/26 20:20 - Created job system (mockup)
2023/01/29 15:00 - Finishing Iteration 0
2023/01/29 17:20 - Problem formulation
2023/01/31 18:45 - Uploaded first notebook
2023/01/31 10:00 - Switching away from Google Colab because of runtime limitations

Problem formulation

Recommendation systems are a subclass of information filtering systems.

A recommender system uses:

a dataset of items, like videos, songs, books, movies, items of clothing, e-commerce products etc.
a dataset of user interactions (e.g. reads, likes, buys) with some or all of those items

in order to predict which items are most relevant to a particular user.

More formally, it is defined by:

$C$: the set of all users
$S$: the set of all possible items that can be recommended
$U$: a utility function $U : C \times S \rightarrow R$, where $R$ is a totally ordered set of utility values (e.g. ratings) [3]
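These definitions lead to the usual way of stating the recommendation task. For each user, choose the item that maximizes the utility, where $R$ is the totally ordered set of utility values (e.g. a rating scale). This is the common formulation in the literature. The symbol $s'_c$ is introduced here only for illustration.

$$\forall c \in C, \quad s'_c = \arg\max_{s \in S} U(c, s)$$

In practice, $U$ is known only on the small subset of $C \times S$ where users have actually rated or interacted with items. The core problem is therefore to estimate $U$ on the remaining pairs.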

Peculiarities of the recommendation problem

Recommendation differs enough from classification and regression to pose its own unique challenges.

Novelty should sometimes be taken into account

TODO

Diversity should be taken into account

Users tend to be more satisfied when recommendation lists are diverse rather than monotonous. [4]
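The sketch below illustrates how diversity can be measured, in the spirit of [4]. It computes an intra-list diversity score: the average pairwise dissimilarity of the items in a recommendation list. Several choices are assumptions made for this example: describing items by genre sets, using Jaccard similarity, the function names and the movie data.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Jaccard similarity between two sets of item features (e.g. genres)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def intra_list_diversity(recommended, features):
    """Average pairwise dissimilarity (1 - similarity) over a recommendation list.

    0.0 means every pair of items has identical features,
    1.0 means no two items share any feature.
    """
    pairs = list(combinations(recommended, 2))
    if not pairs:
        return 0.0  # a list with fewer than two items has no pairs to compare
    total = sum(1.0 - jaccard_similarity(features[i], features[j]) for i, j in pairs)
    return total / len(pairs)


# Hypothetical genre data, for illustration only
genres = {
    "Alien": {"horror", "sci-fi"},
    "Aliens": {"action", "sci-fi"},
    "Amelie": {"comedy", "romance"},
}
```

For example, a list made of "Alien" and "Aliens" scores lower than one made of "Alien" and "Amelie", because the first pair shares a genre. A re-ranking step could use such a score to trade a little predicted relevance for a less monotonous list.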
Serendipity should be taken into account

Users want to be surprised by item recommendations; they do not want them to be boring or predictable. [5]

For example, recommending a movie directed by the user's favorite director is generally not considered serendipitous: the user would likely have discovered that movie on their own. [6]

Often, recommender systems cannot be fully offline

TODO

In production environments, recommender systems should be mindful of presentation bias

TODO

In production environments, recommender systems should be robust

TODO

The future is not always like the past

TODO

Main approaches for creating recommender systems

Collaborative filtering

Collaborative filtering systems use a database of preferences for items by users in order to predict additional items a new user might like. They do not rely on the features of users and items for predictions. [1]
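The only input collaborative filtering needs is the "database of preferences", usually seen as a sparse user × item matrix. The minimal sketch below builds that matrix from (user, item, rating) triples. It also shows how little of the matrix is typically filled in, and which cells a CF method has to predict. The function names and the sample data are hypothetical.

```python
from collections import defaultdict


def build_rating_matrix(interactions):
    """Group (user, item, rating) triples into a sparse matrix: {user: {item: rating}}."""
    matrix = defaultdict(dict)
    for user, item, rating in interactions:
        matrix[user][item] = rating
    return dict(matrix)


def density(matrix, n_items):
    """Fraction of the user x item matrix that is actually filled in."""
    n_ratings = sum(len(row) for row in matrix.values())
    return n_ratings / (len(matrix) * n_items)


def items_to_predict(matrix, user, all_items):
    """Items the user has not rated yet: the cells CF has to fill in."""
    return [item for item in all_items if item not in matrix.get(user, {})]


# Hypothetical interactions; note that no item or user features are used
interactions = [("alice", "Alien", 5), ("alice", "Amelie", 3), ("bob", "Alien", 4)]
matrix = build_rating_matrix(interactions)
```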

Techniques

Memory-based

Neighborhood-based

Neighborhood-based techniques can be described as automating word of mouth: a user's opinion of an item is estimated from the ratings that similar users gave to it (or that the user gave to similar items). They are generally very simple to implement; in the least sophisticated case, there is only one hyperparameter, the number of neighbors. They also have good explainability and require no training phase. [6]
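A minimal user-based sketch of this idea follows, assuming cosine similarity computed over co-rated items. This is one common choice, not necessarily the one the project will adopt. The number of neighbors k is the only hyperparameter. There is no training step: predictions are computed directly from the stored ratings. The function names and sample ratings are hypothetical.

```python
import math


def cosine_similarity(r_u, r_v):
    """Cosine similarity between two users' rating dicts, over co-rated items only."""
    common = r_u.keys() & r_v.keys()
    dot = sum(r_u[i] * r_v[i] for i in common)
    norm_u = math.sqrt(sum(r_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(r_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_rating(ratings, user, item, k=2):
    """Similarity-weighted average of the ratings given to `item` by the k users
    most similar to `user`. Returns None if no positively similar neighbor rated it."""
    candidates = [
        (cosine_similarity(ratings[user], r_v), r_v[item])
        for v, r_v in ratings.items()
        if v != user and item in r_v
    ]
    neighbors = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    weight = sum(sim for sim, _ in neighbors if sim > 0)
    if weight == 0:
        return None
    return sum(sim * r for sim, r in neighbors if sim > 0) / weight


# Hypothetical ratings on a 1-5 scale
ratings = {
    "alice": {"Alien": 5, "Aliens": 4},
    "bob": {"Alien": 5, "Aliens": 4, "Heat": 2},
    "carol": {"Alien": 1, "Aliens": 5, "Heat": 5},
}
```

With k=1, Alice's predicted rating for "Heat" comes entirely from Bob, whose tastes match hers. With k=2, Carol's higher rating pulls the prediction up, weighted by her lower similarity. This makes the prediction easy to explain ("users like you rated it...").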

Top-N

Model-based

Bayesian belief nets
Clustering
Singular value decomposition (SVD)
Principal component analysis (PCA)
Sparse factor analysis
Neural networks
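Several of the model-based techniques above factorize the user-item rating matrix. In recommender systems, "SVD" often does not mean an exact algebraic SVD, which needs a complete matrix. It usually means a latent-factor model trained with stochastic gradient descent on the observed ratings only, an approach popularized during the Netflix Prize. The sketch below implements that SGD variant plus a global-mean term. The hyperparameter values, function names and sample data are assumptions chosen for a small example, not tuned values.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_mf(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit r_ui ~ mu + p_u . q_i with SGD on the observed (user, item, rating) triples.

    Returns (mu, P, Q): the global mean and the user/item latent factor vectors.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u, _, _ in ratings}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for _, i, _ in ratings}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - (mu + dot(P[u], Q[i]))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q


def predict(model, user, item):
    mu, P, Q = model
    return mu + dot(P[user], Q[item])


def rmse(model, ratings):
    return math.sqrt(sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings) / len(ratings))


# Hypothetical sparse ratings: only observed cells are used for training
ratings = [
    ("alice", "Alien", 5), ("alice", "Aliens", 4), ("alice", "Amelie", 1),
    ("bob", "Alien", 4), ("bob", "Amelie", 2), ("bob", "Heat", 4),
    ("carol", "Amelie", 5), ("carol", "Aliens", 1), ("carol", "Heat", 2),
    ("dave", "Alien", 5), ("dave", "Heat", 5),
]
model = train_mf(ratings)
```

Unlike the neighborhood approach, this model needs a training phase. Once trained, however, predicting any (user, item) cell is a single dot product.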

Content-based filtering

Content-based recommender systems analyze item metadata, like movie names, movie descriptions and tags, and find regularities in the content. They rely on the features of users and items for predictions. [1]
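One common way to find regularities in item text is TF-IDF vectors compared with cosine similarity, chosen here only as an example technique. A user profile can be built by averaging the vectors of the movies the user liked. The whitespace tokenizer, the movie descriptions and the function names below are hypothetical. A real pipeline would also need proper tokenization and stop-word handling.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each item id to a sparse TF-IDF vector {term: weight} of its text."""
    tokenized = {item: text.lower().split() for item, text in docs.items()}
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    return {
        item: {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in Counter(tokens).items()}
        for item, tokens in tokenized.items()
    }


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend_by_content(docs, liked, n=2):
    """Rank unseen items by similarity to the mean TF-IDF vector of the liked items."""
    vectors = tfidf_vectors(docs)
    profile = Counter()
    for item in liked:
        for t, w in vectors[item].items():
            profile[t] += w / len(liked)
    candidates = [i for i in docs if i not in liked]
    return sorted(candidates, key=lambda i: cosine(profile, vectors[i]), reverse=True)[:n]


# Hypothetical item metadata
docs = {
    "Alien": "space horror alien crew",
    "Aliens": "space marines alien war",
    "Amelie": "paris romance comedy",
    "Heat": "heist crime los angeles",
}
```

Because only item features are used, this approach can recommend a movie that no other user has rated yet. Collaborative filtering cannot do that.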

Techniques

TODO

Hybrid approaches

Hybrid approaches combine content-based filtering techniques with those based on collaborative filtering. [2]
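A weighted hybrid is one simple way to combine the two families. The scores from a collaborative recommender and a content-based recommender are normalized, then blended linearly. This is only one of several possible hybridization strategies and not necessarily the one used in [2]. The min-max normalization, the default weight alpha=0.5, the function names and the scores are assumptions for illustration.

```python
def min_max_normalize(scores):
    """Rescale a {item: score} dict to [0, 1]; constant scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def weighted_hybrid(cf_scores, content_scores, alpha=0.5):
    """Rank items by alpha * CF score + (1 - alpha) * content score (both normalized).

    Items missing from one source get 0 for that source; ties keep alphabetical order.
    """
    cf = min_max_normalize(cf_scores) if cf_scores else {}
    cb = min_max_normalize(content_scores) if content_scores else {}
    items = sorted(cf.keys() | cb.keys())
    blended = {i: alpha * cf.get(i, 0.0) + (1 - alpha) * cb.get(i, 0.0) for i in items}
    return sorted(items, key=blended.get, reverse=True)


# Hypothetical outputs of the two recommenders (different scales on purpose)
cf_scores = {"Alien": 4.5, "Heat": 3.0, "Amelie": 2.0}
content_scores = {"Amelie": 0.9, "Aliens": 0.8, "Alien": 0.1}
```

Normalizing first matters because predicted ratings and cosine similarities live on different scales. Tuning alpha then decides how much each source contributes. Setting alpha=1.0 falls back to pure collaborative filtering.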

Techniques

TODO

Performance metrics

TODO

Project scope and approach

TODO

Iterations

Iteration 0

Iteration 0 consists of a mock frontend written in React and a mock REST API written in Python using the Flask framework.

It is meant as a rough first draft that serves as the template for the web app project.

It contains neither an actual Machine Learning model nor a data scraper.

Job requests are submitted through a frontend form that accepts the Letterboxd profile name to be scraped.

<div className="Home AppPage">
  <h2>Find movies you will like!</h2>
  <div className="AppContainer">
    <form className="AppForm" onSubmit={handleSubmit(onSubmit)}>
      <div className="AppFormGroup">
        <label htmlFor="name">
          Insert your Letterboxd profile name:
        </label>
        <input
          type="text"
          {...register("name")}
        />
      </div>
      {canSubmit? <div className="AppFormGroup">
        <button className="AppButton" type="submit">
          <PlayArrowIcon />
          Submit
        </button>
      </div> : ""}
    </form>
  </div>
  {jobStatus === null ? "" : <>
    <h3>Job status</h3>
    <div className="AppContainer">
      <JobStatusBar />
    </div>
  </>}
  {(jobResponse === null || jobResponse.length === 0) ? "" : <>
    <h3>Check out your results</h3>
    <div className="AppContainer">
      <ResponseSection />
    </div>
  </>}
</div>

A unique job ID is generated in the corresponding Flask endpoint and sent back to the frontend.

Jobs are stored in a MongoDB collection.

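import uuid

from flask import jsonify, request

# app, API_PREFIX, my_mongo_connect, Job, JOB_STATUSES, q and process_job are defined elsewhere in the project (not shown)
@app.route(API_PREFIX + "/request", methods=["POST"])
def post_request():
    my_mongo_connect()
    # silent=True returns None instead of raising on a missing or malformed JSON body
    data = request.get_json(silent=True) or {}
    if "name" not in data:
        return jsonify({"error": "Missing field name in request body"}), 400
    profile_name = data["name"]
    job_id = str(uuid.uuid4())
    # Persist the job in MongoDB with its initial status before handing it to a worker
    job_doc = Job(job_id=job_id, profile_name=profile_name, status=JOB_STATUSES[0], response=[])
    job_doc.save()
    # Enqueue the job on the Redis queue; a worker runs process_job asynchronously
    q.enqueue(process_job, {"id": job_id, "profile_name": profile_name})
    return jsonify({"id": job_id})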

Jobs are passed to workers through a Redis queue for asynchronous execution. A worker script, also written in Python, simulates processing the job request. It steps through the phases listed in JOB_STATUSES and attaches a list of mock suggestions at the last phase.

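import time

# my_mongo_connect, Job, Suggestion, JOB_STATUSES, DEFAULT_SUGGESTIONS and MOCK_WAIT are defined elsewhere in the project (not shown)
def process_job(job):
    my_mongo_connect()
    for i, job_status in enumerate(JOB_STATUSES):
        # Re-read the job on every step so that a stop request is noticed promptly
        job_doc = Job.objects(job_id=job["id"]).first()
        if job_doc is None or job_doc.stopped:
            return
        if i == 0:
            continue  # the endpoint already saved the initial status
        job_doc.status = job_status
        if i == len(JOB_STATUSES) - 1:
            # Last phase: attach the mock suggestions as the job response
            job_doc.response = [Suggestion(name=movie["name"], score=movie["score"]) for movie in DEFAULT_SUGGESTIONS]
        job_doc.save()
        time.sleep(MOCK_WAIT)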
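The producer/worker pattern above can be tried without Redis or MongoDB. The stdlib-only sketch below uses queue.Queue in place of the Redis queue, a dict in place of the Job collection, and a thread in place of the worker process. This is a swapped-in simplification for illustration, not how the project runs. The status names other than "DONE", the mock suggestion and all function names are hypothetical.

```python
import queue
import threading
import uuid

# Hypothetical status names; only "DONE" appears in the project description
JOB_STATUSES = ["QUEUED", "SCRAPING", "PREDICTING", "DONE"]

jobs = {}                  # stand-in for the MongoDB Job collection
jobs_lock = threading.Lock()
job_queue = queue.Queue()  # stand-in for the Redis queue


def submit(profile_name):
    """Plays the role of the POST request endpoint: store the job, then enqueue it."""
    job_id = str(uuid.uuid4())
    with jobs_lock:
        jobs[job_id] = {"profile_name": profile_name, "status": JOB_STATUSES[0], "response": []}
    job_queue.put(job_id)
    return job_id


def worker():
    """Plays the role of a queue worker: take job ids and advance their status."""
    while True:
        job_id = job_queue.get()
        if job_id is None:  # sentinel value: shut the worker down
            job_queue.task_done()
            break
        for i, status in enumerate(JOB_STATUSES[1:], start=1):
            with jobs_lock:
                jobs[job_id]["status"] = status
                if i == len(JOB_STATUSES) - 1:
                    # Hypothetical mock suggestion, like DEFAULT_SUGGESTIONS
                    jobs[job_id]["response"] = [{"name": "Alien", "score": 0.9}]
        job_queue.task_done()
```

The web process only writes the job and returns its ID. All slow work happens in the worker, which is why the frontend has to poll for the result.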

The web page polls the job status endpoint at a fixed interval (POLLING_DELAY_MS) and notifies the user of any updates. When the job status becomes "DONE", the page queries the job response endpoint and downloads the mock suggestions to display them.

  // Once the job is DONE, fetch its response until it arrives
  useEffect(() => {
    let intervalId = null;

    if (jobStatus === "DONE" && jobResponse === null) {
      intervalId = setInterval(() => {
        fetch(BACKEND_API_URL + `/response/${jobId}`)
          .then((res) => res.json())
          // Only store a well-formed response, so that jobResponse.length is safe to read
          .then((data) => { if (Array.isArray(data.response)) setJobResponse(data.response); })
          .catch((error) => console.error(error));
      }, POLLING_DELAY_MS);
    }

    return () => clearInterval(intervalId);
  }, [jobId, jobStatus, jobResponse]);

  // Poll the job status until it becomes DONE, then stop
  useEffect(() => {
    let intervalId = null;

    if (jobId && jobStatus !== "DONE") {
      intervalId = setInterval(() => {
        fetch(BACKEND_API_URL + `/status/${jobId}`)
          .then((res) => res.json())
          .then((data) => setJobStatus(data.status))
          .catch((error) => console.error(error));
      }, POLLING_DELAY_MS);
    }

    return () => clearInterval(intervalId);
  }, [jobId, jobStatus]);
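The same polling logic can be expressed in a small Python helper, for example to exercise the API from a script or a test. Unlike an open-ended interval, this helper stops as soon as the target status is seen and gives up after a timeout. It is a minimal sketch. get_status is a hypothetical callable, which could wrap a GET request to the /status/<job_id> endpoint used by the frontend. The default interval and timeout are arbitrary.

```python
import time


def poll_until(get_status, target="DONE", interval_s=2.0, timeout_s=60.0, sleep=time.sleep):
    """Call get_status() every interval_s seconds until it returns `target`.

    Returns the number of polls made; raises TimeoutError once timeout_s has elapsed.
    `sleep` is injectable so the loop can be tested without waiting.
    """
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        polls += 1
        if get_status() == target:
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status did not become {target!r} within {timeout_s} s")
        sleep(interval_s)
```

Injecting sleep keeps the helper testable. Bounding the wait avoids a client that polls forever when a worker dies.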

Iteration 1

In Iteration 1, two important components are added to the backend of the demo built in Iteration 0:

a Machine Learning model;
a scraper that collects the ratings of the Letterboxd profile entered in the form.

Exploratory phase

This iteration begins with an exploratory phase in order to compare different techniques.

Jupyter notebooks are used to quickly get a feel for the data.

Here is a high-level template of the preliminary steps:

# Import necessary libraries

# Download the dataset from Kaggle

# Load the dataset into a Pandas dataframe

# Visualize, clean and preprocess the data
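The notebooks use Pandas for these steps. The stdlib-only sketch below just makes the cleaning rules concrete:

- drop malformed or out-of-range ratings;
- keep only the latest rating when a user rated the same movie twice.

The dataset is not named yet, so the MovieLens-style column names (userId, movieId, rating) and the 0.5 to 5.0 rating range are assumptions. The function name is hypothetical.

```python
import csv
import io


def load_ratings(file_obj, min_rating=0.5, max_rating=5.0):
    """Read (user, movie, rating) rows from a CSV file object and clean them."""
    ratings = {}
    for row in csv.DictReader(file_obj):
        try:
            user, movie, rating = row["userId"], row["movieId"], float(row["rating"])
        except (KeyError, TypeError, ValueError):
            continue  # missing column or non-numeric rating
        if not (min_rating <= rating <= max_rating):
            continue  # out-of-range rating
        ratings[(user, movie)] = rating  # a later duplicate overwrites the earlier one
    return [(u, m, r) for (u, m), r in ratings.items()]


# Hypothetical sample: a duplicate, a malformed value and an out-of-range value
sample = io.StringIO("userId,movieId,rating\n1,10,4.0\n1,10,4.5\n2,10,abc\n2,20,7\n3,30,3.5\n")
rows = load_ratings(sample)
```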

Before starting, download a Kaggle API token by following this guide:

https://www.kaggle.com/general/74235
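The Kaggle API client looks for the token file kaggle.json in ~/.kaggle, or in the directory named by the KAGGLE_CONFIG_DIR environment variable. It warns when the file is readable by other users, and the usual fix is chmod 600. The hypothetical helpers below check both conditions before a notebook tries to download data. They only inspect the file system and do not call the Kaggle API. The permission check is meaningful on Linux and macOS only.

```python
import os
import stat
from pathlib import Path


def find_kaggle_token():
    """Return the path of kaggle.json if it exists, honoring KAGGLE_CONFIG_DIR."""
    config_dir = os.environ.get("KAGGLE_CONFIG_DIR", str(Path.home() / ".kaggle"))
    token = Path(config_dir) / "kaggle.json"
    return token if token.is_file() else None


def token_is_private(token):
    """True if group and others have no permissions on the file (e.g. mode 600)."""
    mode = token.stat().st_mode
    return not (mode & (stat.S_IRWXG | stat.S_IRWXO))
```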

The first technique tested is neighborhood-based collaborative filtering.

Here is a high-level template of the required steps:

# Create function to calculate similarity between users

# Create a neighborhood-based recommender

# Test the recommender
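The three template steps could be filled in as sketched below, under assumptions that are not from the project:

- Pearson correlation as the user similarity, a common choice for neighborhood methods [6];
- a mean-centered k-NN prediction;
- RMSE on a random holdout of the ratings as the test, since the Performance metrics section is still TODO.

All function names, default values and sample data are hypothetical.

```python
import math
import random


def pearson(r_u, r_v):
    """Pearson correlation of two users over their co-rated items (0.0 if undefined)."""
    common = r_u.keys() & r_v.keys()
    if len(common) < 2:
        return 0.0
    mu_u = sum(r_u[i] for i in common) / len(common)
    mu_v = sum(r_v[i] for i in common) / len(common)
    num = sum((r_u[i] - mu_u) * (r_v[i] - mu_v) for i in common)
    den = math.sqrt(sum((r_u[i] - mu_u) ** 2 for i in common)) * math.sqrt(
        sum((r_v[i] - mu_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(train, user, item, k=20):
    """User mean plus the similarity-weighted deviations of the k most similar users."""
    mean_u = sum(train[user].values()) / len(train[user])
    candidates = [
        (pearson(train[user], r_v), r_v[item] - sum(r_v.values()) / len(r_v))
        for v, r_v in train.items()
        if v != user and item in r_v
    ]
    neighbors = [(s, d) for s, d in sorted(candidates, key=lambda c: c[0], reverse=True)[:k] if s > 0]
    weight = sum(s for s, _ in neighbors)
    if weight == 0:
        return mean_u  # fallback: no positively correlated neighbor rated the item
    return mean_u + sum(s * d for s, d in neighbors) / weight


def holdout_rmse(ratings, test_fraction=0.2, k=20, seed=0):
    """Hide a random fraction of the ratings, predict them from the rest, return the RMSE."""
    rng = random.Random(seed)
    triples = [(u, i, r) for u, items in ratings.items() for i, r in items.items()]
    rng.shuffle(triples)
    n_test = max(1, int(len(triples) * test_fraction))
    test, train_triples = triples[:n_test], triples[n_test:]
    train = {}
    for u, i, r in train_triples:
        train.setdefault(u, {})[i] = r
    errors = [(r - predict(train, u, i, k)) ** 2 for u, i, r in test if u in train]
    return math.sqrt(sum(errors) / len(errors)) if errors else float("nan")


# Hypothetical ratings: two groups of users with opposite tastes
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1, "D": 2},
    "u2": {"A": 4, "B": 5, "C": 2, "D": 1},
    "u3": {"A": 1, "B": 2, "C": 5, "D": 4},
    "u4": {"A": 2, "B": 1, "C": 4, "D": 5},
}
```

Mean-centering compensates for users who rate systematically higher or lower than others. A fixed random seed makes holdout results reproducible when comparing different values of k.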

TODO

Technologies

The following technologies were used for the project:

Frontend

React
CSS
Zustand

TODO

Backend

Flask
Redis
rq, a Python package for Redis queues
MongoDB

TODO

Machine learning

TODO

Containerization, deployment and infrastructure

Docker
Docker Compose

TODO

Datasets

The following datasets were used for the project:

TODO

References

[1]

Xiaoyuan Su and Taghi M. Khoshgoftaar, "A Survey of Collaborative Filtering Techniques", https://downloads.hindawi.com/archive/2009/421425.pdf

[2]

Ana Belén Barragáns-Martínez, Enrique Costa-Montenegro, Juan C. Burguillo, Marta Rey-López, Fernando A. Mikic-Fonte and Ana Peleteiro, "A hybrid content-based and item-based collaborative filtering approach to recommend TV programs enhanced with singular value decomposition", Information Sciences, vol. 180, no. 22, pp. 4290-4311, 2010, ISSN 0020-0255, https://doi.org/10.1016/j.ins.2010.07.024

[3]

Nitin Mishra et al., Journal of Physics: Conference Series, vol. 1717, 012002, 2021

[4]

C.N. Ziegler, S.M. McNee, J.A. Konstan and G. Lausen, "Improving recommendation lists through topic diversification", Proceedings of the 14th International Conference on World Wide Web, pp. 22-32, 2005

[5]

Pablo Castells, Neil J. Hurley and Saúl Vargas, "Novelty and Diversity in Recommender Systems", in Francesco Ricci, Lior Rokach and Bracha Shapira (eds.), Recommender Systems Handbook, 2nd ed., Springer US, 2015, pp. 881-918, doi:10.1007/978-1-4899-7637-6_26, ISBN 978-1-4899-7637-6

[6]

Christian Desrosiers and George Karypis, "A Comprehensive Survey of Neighborhood-based Recommendation Methods", https://www.inf.unibz.it/~ricci/ISR/papers/handbook-neighbor.pdf

giuven95/MovieRecommenderSystem

Folders and files

Latest commit

History

Repository files navigation

MovieRecommenderSystem

Overview

Log

Problem formulation

Peculiarities of the recommendation problem

Main approaches for creating recommender systems

Collaborative filtering

Techniques

Memory-based

Model-based

Content-based filtering

Techniques

Hybrid approaches

Techniques

Performance metrics

Project scope and approach

Iterations

Iteration 0

Iteration 1

Exploratory phase

Technologies

Frontend

Backend

Machine learning

Containerization, deployment and infrastructure

Datasets

References

Useful links

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages